Visual Parser Help

Table of Contents

Overview

Visual Parser is an application which demonstrates the workings of the LL(1) parsing algorithm. Two inputs are required of the user, a grammar file and an input file. Once these files are provided, Visual Parser displays a series of tabbed panels, each dedicated to a specific stage of the parsing process.

Grammar File

The grammar file is a text file with a .ll extension. It contains the representation of the grammar as a series of rules in EBNF, with each rule terminated by a semi-colon. Consult the Grammar Specification for details.

Input File

The input file is a text file containing a sequence of symbolic tokens (not the lexemes) of a program to be parsed. Following is an example:

Original Program:


public class Numbers
{
    public static void main( String[] args )
    {
        System.out.println( "The first Ramanujan number is " + 1729 );
    }
}

Above program as a sequence of symbolic tokens:

PUBLIC CLASS IDENTIFIER
LCURLY
    PUBLIC STATIC VOID IDENTIFIER LPAREN IDENTIFIER LBRACK RBRACK IDENTIFIER RPAREN
    LCURLY
        IDENTIFIER DOT IDENTIFIER DOT IDENTIFIER LPAREN STRING_LITERAL PLUS INTEGER RPAREN SEMI
    RCURLY
RCURLY

A string beginning with "//" and ending with a newline is considered a comment. Any token in the input file that is not a terminal in the input grammar will be reported as a parse error.

Operation

When Visual Parser is first run, a blank screen is all that is seen. Nothing further will appear until the user has opened a grammar.

After the user has selected a grammar file, Visual Parser will convert the EBNF representation of the grammar into a BNF representation. The program will use the BNF grammar to build both the data structures needed for the proper operation of the algorithm and models of the creation process for each such data structure. Both the data structure and the models will be built before the user sees any activity on the screen.

Once the data structure and model construction process has finished, seven tabbed panels will appear:

The last two panels will be disabled until an input file has been loaded.

Tabbed Panels

Input Grammar Panel

Displays the contents of the grammar file, as is. The navigational menus and buttons (Start, Previous, Next, Finish) are disabled in this panel.

Pure BNF Grammar Panel

Displays the converted BNF rules, with rule numbers. Rules are sorted starting with the rule containing the start symbol of the grammar and proceeding through the list in a depth first order. Unreachable rules are deleted and a message will appear in the Comment box to this effect.

Comments from the original grammar file do not appear in the panel. No user actions are possible in this panel.

First Sets Panel

The two panes in this panel display the list of Pure BNF Rules and the collection of First Sets for the nonterminals in the grammar.

As the user progresses through the construction process for the First Sets using the navigational menus and buttons, each rule is examined in turn. The current rule is bold with its left hand side, the current nonterminal, highlighted in red. At each stage of the construction process, the algorithm examines one term, usually the first, on the right hand side of the current rule. This current term is highlighted in green. For example:


    10. restOfTerm ::= DIVIDE factor restOfTerm

The First Sets pane displays both the current contents of the First Set of each nonterminal in the grammar and its derivation history. As the construction process proceeds, any derivation term or set element added at the current step is displayed in bold. For example:


    First( restOfTerm ) = First( <empty> ) U First( STAR ) U First( DIVIDE )
                        = { <empty>, STAR, DIVIDE }

See First Set Algorithm for the algorithm employed here.

Follow Sets Panel

The three panes in this panel display the list of Pure BNF Rules, the completed First Sets, and the collection of Follow Sets for the nonterminals in the grammar.

As the user progresses through the construction process for the Follow Sets using the navigational menus and buttons, each rule is examined in turn, and so is each nonterminal on the right hand side of that rule. The current rule is bold and the current nonterminal is red. For each nonterminal, the terms following it are visited with the current term highlighted in green.


    1. E ::= T  B1  

The Follow Sets pane displays both the current contents of the Follow Set of each nonterminal in the grammar and its derivation history. As the construction process proceeds, any derivation term or set element added at the current step is displayed in bold.


    Follow(átermá)á=áFirst(árestOfExpressioná)áUáFollow(áexpressioná)á
                   =á{áPLUS,áMINUS,á$á}

See Follow Set Algorithm for the algorithm employed here.

Parse Table Panel

This panel contains three text panes across the top of the screen displaying the list of Pure BNF Rules, the completed First Sets, and the completed Follow Sets. Below this is a pane in which the Parse Table will be constructed as the algorithm proceeds.

As the user progresses through the Parse Table construction process using the navigational menus and buttons, each rule is examined in turn. The current rule is bold with its left hand side, the current nonterminal, highlighted in red. The algorithm examines each term, starting with the first, on the right hand side of the current rule. This term is highlighted in green. It this term derives ε, the next term is examined, and so on.


    1. expression ::= term restOfExpression

If this term is a nonterminal, its First Set entry in the First Sets pane is highlighted in bold text. The algorithm then proceeds to examine each terminal in the First Set of the current term, highlighting each element in the set in blue as it proceeds.


    First( term ) = { NUMBER, LPAREN }

If the algorithm advances passed the last term in the current rule, it must examine each term in the Follow Set for the nonterminal on the left hand side of the current rule. When this happens, the Follow Set for the current nonterminal is highlighted in bold and the current terminal in this set is highlighted in blue.


    Follow( restOfExpression ) = { $, RPAREN }

Whenever a new entry is made in the Parse Table, the entry is highlighted in bold text and the background color of the cell in which it was added is changed to yellow.

   PLUS   MINUS   STAR   DIVIDE   LPAREN   RPAREN   NUMBER   $ 
expression           1   1  
restOfExpression    4 5       3     3  
term                
restOfTerm                
factor                

See LL(1) Parse Table Algorithm for the algorithm employed here.

Input Panel

Displays the contents of the input, or source file, including comments. No user actions are possible in this panel.

Parse Action Panel

This panel contains a topmost text pane displaying the source file being parsed. Below this is a pane containing the Pure BNF Rules and the Parse Table. Below this pane are separate panes for the Stack and the Current Line.

As the user progresses through the parsing process using the navigational menus and buttons, the current line of the input file is displayed in bold text, and the current token is highlighted in red.


    NUMBER STAR LPAREN NUMBER PLUS NUMBER RPAREN
    $

This line will also appear in the Input Line pane, though here only the current token will be highlighted in bold red. Any token already parsed will not be displayed.


    STAR LPAREN NUMBER PLUS NUMBER RPAREN

As the parsing process proceeds, both terminals and nonterminals are added to the Stack. If the top token on the stack is a terminal and it matches the current token in the input line, the terminal will be highlighted in bold red text. The next time the user executes the Next command, this terminal will disappear from both the Stack and Current Line panes.

Stack:


    $ restOfExpression restOfTerm NUMBER

Current Line:


    NUMBER STAR LPAREN NUMBER PLUS NUMBER RPAREN

If the top token on the stack is a nonterminal, it will be highlighted in bold green.


    $ restOfExpression restOfTerm

If there is an entry in the Parse Table for the combination of the top nonterminal on the stack and the current input token, the cell containing this entry will change its color to light blue, and the rule number will be displayed in bold text.

   PLUS   MINUS   STAR   DIVIDE   LPAREN   RPAREN   NUMBER    $  
expression         1   1  
restOfExpression  4 5       3   3
term         2   2  
restOfTerm 8 8 9 10   8   8
factor         7   6  

When the Parse Table has found a rule to substitute for the the top nonterminal on the Stack, that rule will be highlighted in bold text in the Pure BNF Grammar pane and its right hand side, which will subsequently be substituted for the top nonterminal on the Stack, will be highlighted in blue.


    9. restOfTerm ::= STAR factor restOfTerm 

Menus

File Menu Action Menu Help Menu

Grammar Specification

The grammar must be specified in EBNF. There are five accepted EBNF operations:
  1. Assignment - indicated by the string "::=".
  2. Repetition - a series of symbols lying between the characters '{' and '}' indicates that those symbols may appear zero or more times.
  3. Option - a series of symbols lying between the characters '[' and ']' indicates that those symbols may or may not appear.
  4. Alternative - symbols separated from one another by the character '|' indicates that one of the symbols may appear.
  5. Concatenation - a series of symbols separated by whitespaces.
  6. Grouping - a series of symbols lying between the characters '(' and ')' indicates that those symbols are to be treated as a single entity.

A whitespace is a space, tab or a newline. Empty string (ε) is indicated by the string <empty>. A string beginning with "//" and ending with a newline is considered a comment.

Formally, a grammar has the following structure:


    Grammar      ::= Rule ";" { Rule ";" }
    Rule         ::= NonTerminal "::=" AlterExpr
    AlterExpr    ::= ConcatExpr { "|" ConcatExpr }
    ConcatExpr   ::= Phrase { Phrase }
    Phrase       ::= "[" AlterExpr "]" | "{" AlterExpr "}" | "(" AlterExpr ")" | NonTerminal | Terminal | "<empty>"
    NonTerminal  ::=  Symbol
    Terminal     ::=  Symbol
    Symbol       ::= Letter { Letter | Digit | "_" } { "'" }
    Letter       ::= "A" ... "Z" | "a" ... "z"
    Digit        ::= "0" ... "9"

Any symbol that appears on the left hand side of a rule is a nonterminal. All other symbols are terminals. The left hand side symbol of the first rule is the start symbol of the grammar.

In the example grammars that are furnished with this project, we conform to the following conventions:

  1. Nonterminals will be represented as mixed-case strings.
  2. Terminals will be represented as upper-case strings.

Algorithms

The following algorithms are taken from Kenneth C. Louden's text "Compiler Construction, Principles and Practices".

First Set Algorithm


For all nonterminals A do
    First( A ) := { }
While there are changes to any First( A ) do
    For each production choice A ->  X1 X2 ... Xn do
	k := 1
	continue := true
	While continue = true and k <= n do
	    Add First( Xk ) - { ε } to First( A )
	    If ε is not in First( Xk ) then
	        continue := false
	    k := k + 1
	If continue = true then
	    Add ε to First( A )

Follow Set Algorithm


Follow( start-symbol ) := { $ }
For all nonterminals A != start-symbol do
    Follow( A ) := { }
While there are changes to any of the Follow Sets do
    For each production A ::= X1 X2 ... Xn do
        For each Xi that is a nonterminal do
	    Add First(Xi+1Xi+2...Xn) - { ε } to Follow(Xi)
            ( Note: If i = n, then Xi+1Xi+2...Xn = ε )
	    If ε is in First( Xi+1Xi+2...Xn ) then 
                Add Follow( A ) to Follow( Xi )

LL(1) Parse Table Algorithm


Repeat the following two steps for each nonterminal A and production choice A ::= α 
    1. For each token a in First( α ), add A ::= α to the entry M[ A, a ].
    2. If ε is in First( α ), for each element a of Follow( A ) (a token or $), add A ::= α to the entry M[ A, a ].

LL(1) Parsing Algorithm


Push $ and the start symbol onto the top of the parsing stack
While the top of the parsing stack != $ and the next token != $ 
    If the top of the parsing stack is terminal a and the next token = a then
        Pop the top symbol from parsing stack and advance the input
    Else if the top of the parsing stack is nonterminal A and parse table entry M[ A, a ] contains production A ::= X1 X2 ... Xn then
        Pop the top symbol from the parsing stack
        For i = n downto 1 do 
            Push Xi onto the parsing stack
    Else 
        Display an error message and stop 
If the top of the parsing stack = $ and next token = $ then 
    Parsing is complete
Else 
    Display an error message and stop