Visual Parser is an application which demonstrates the workings of the LL(1) parsing algorithm. Two inputs are required of the user, a grammar file and an input file. Once these files are provided, Visual Parser displays a series of tabbed panels, each dedicated to a specific stage of the parsing process.
The grammar file is a text file with a .ll extension. It contains the representation of the grammar as a series of rules in EBNF, with each rule terminated by a semi-colon. Consult the Grammar Specification for details.
The input file is a text file containing a sequence of symbolic tokens (not the lexemes) of a program to be parsed. Following is an example:
Original Program:
public class Numbers
{
public static void main( String[] args )
{
System.out.println( "The first Ramanujan number is " + 1729 );
}
}
Above program as a sequence of symbolic tokens:
PUBLIC CLASS IDENTIFIER
LCURLY
PUBLIC STATIC VOID IDENTIFIER LPAREN IDENTIFIER LBRACK RBRACK IDENTIFIER RPAREN
LCURLY
IDENTIFIER DOT IDENTIFIER DOT IDENTIFIER LPAREN STRING_LITERAL PLUS INTEGER RPAREN SEMI
RCURLY
RCURLY
A string beginning with "//" and ending with a newline is considered a comment. Any token in the input file that is not a terminal in the input grammar will be reported as a parse error.
When Visual Parser is first run, a blank screen is all that is seen. Nothing further will appear until the user has opened a grammar.
After the user has selected a grammar file, Visual Parser will convert the EBNF representation of the grammar into a BNF representation. The program will use the BNF grammar to build both the data structures needed for the proper operation of the algorithm and models of the creation process for each such data structure. Both the data structure and the models will be built before the user sees any activity on the screen.
Once the data structure and model construction process has finished, seven tabbed panels will appear:
Displays the contents of the grammar file, as is. The navigational menus and buttons (Start, Previous, Next, Finish) are disabled in this panel.
Displays the converted BNF rules, with rule numbers. Rules are sorted starting with the rule containing the start symbol of the grammar and proceeding through the list in a depth first order. Unreachable rules are deleted and a message will appear in the Comment box to this effect.
Comments from the original grammar file do not appear in the panel. No user actions are possible in this panel.
The two panes in this panel display the list of Pure BNF Rules and the collection of First Sets for the nonterminals in the grammar.
As the user progresses through the construction process for the First Sets using the navigational menus and buttons, each rule is examined in turn. The current rule is bold with its left hand side, the current nonterminal, highlighted in red. At each stage of the construction process, the algorithm examines one term, usually the first, on the right hand side of the current rule. This current term is highlighted in green. For example:
10. restOfTerm ::= DIVIDE factor restOfTerm
The First Sets pane displays both the current contents of the First Set of each nonterminal in the grammar and its derivation history. As the construction process proceeds, any derivation term or set element added at the current step is displayed in bold. For example:
First( restOfTerm ) = First( <empty> ) U First( STAR ) U First( DIVIDE )
= { <empty>, STAR, DIVIDE }
See First Set Algorithm for the algorithm employed here.
The three panes in this panel display the list of Pure BNF Rules, the completed First Sets, and the collection of Follow Sets for the nonterminals in the grammar.
As the user progresses through the construction process for the Follow Sets using the navigational menus and buttons, each rule is examined in turn, and so is each nonterminal on the right hand side of that rule. The current rule is bold and the current nonterminal is red. For each nonterminal, the terms following it are visited with the current term highlighted in green.
1. E ::= T B1
The Follow Sets pane displays both the current contents of the Follow Set of each nonterminal in the grammar and its derivation history. As the construction process proceeds, any derivation term or set element added at the current step is displayed in bold.
Follow(átermá)á=áFirst(árestOfExpressioná)áUáFollow(áexpressioná)á
=á{áPLUS,áMINUS,á$á}
See Follow Set Algorithm for the algorithm employed here.
Parse Table PanelThis panel contains three text panes across the top of the screen displaying the list of Pure BNF Rules, the completed First Sets, and the completed Follow Sets. Below this is a pane in which the Parse Table will be constructed as the algorithm proceeds.
As the user progresses through the Parse Table construction process using the navigational menus and buttons, each rule is examined in turn. The current rule is bold with its left hand side, the current nonterminal, highlighted in red. The algorithm examines each term, starting with the first, on the right hand side of the current rule. This term is highlighted in green. It this term derives ε, the next term is examined, and so on.
1. expression ::= term restOfExpression
If this term is a nonterminal, its First Set entry in the First Sets pane is highlighted in bold text. The algorithm then proceeds to examine each terminal in the First Set of the current term, highlighting each element in the set in blue as it proceeds.
First( term ) = { NUMBER, LPAREN }
If the algorithm advances passed the last term in the current rule, it must examine each term in the Follow Set for the nonterminal on the left hand side of the current rule. When this happens, the Follow Set for the current nonterminal is highlighted in bold and the current terminal in this set is highlighted in blue.
Follow( restOfExpression ) = { $, RPAREN }
Whenever a new entry is made in the Parse Table, the entry is highlighted in bold text and the background color of the cell in which it was added is changed to yellow.
PLUS MINUS STAR DIVIDE LPAREN RPAREN NUMBER $ expression 1 1 restOfExpression 4 5 3 3 term restOfTerm factor
See LL(1) Parse Table Algorithm for the algorithm employed here.
Input PanelDisplays the contents of the input, or source file, including comments. No user actions are possible in this panel.
Parse Action PanelThis panel contains a topmost text pane displaying the source file being parsed. Below this is a pane containing the Pure BNF Rules and the Parse Table. Below this pane are separate panes for the Stack and the Current Line.
As the user progresses through the parsing process using the navigational menus and buttons, the current line of the input file is displayed in bold text, and the current token is highlighted in red.
NUMBER STAR LPAREN NUMBER PLUS NUMBER RPAREN
$
This line will also appear in the Input Line pane, though here only the current token will be highlighted in bold red. Any token already parsed will not be displayed.
STAR LPAREN NUMBER PLUS NUMBER RPAREN
As the parsing process proceeds, both terminals and nonterminals are added to the Stack. If the top token on the stack is a terminal and it matches the current token in the input line, the terminal will be highlighted in bold red text. The next time the user executes the Next command, this terminal will disappear from both the Stack and Current Line panes.
Stack:
$ restOfExpression restOfTerm NUMBER
Current Line:
NUMBER STAR LPAREN NUMBER PLUS NUMBER RPAREN
If the top token on the stack is a nonterminal, it will be highlighted in bold green.
$ restOfExpression restOfTerm
If there is an entry in the Parse Table for the combination of the top nonterminal on the stack and the current input token, the cell containing this entry will change its color to light blue, and the rule number will be displayed in bold text.
PLUS MINUS STAR DIVIDE LPAREN RPAREN NUMBER $ expression 1 1 restOfExpression 4 5 3 3 term 2 2 restOfTerm 8 8 9 10 8 8 factor 7 6
When the Parse Table has found a rule to substitute for the the top nonterminal on the Stack, that rule will be highlighted in bold text in the Pure BNF Grammar pane and its right hand side, which will subsequently be substituted for the top nonterminal on the Stack, will be highlighted in blue.
9. restOfTerm ::= STAR factor restOfTerm
A whitespace is a space, tab or a newline. Empty string (ε) is indicated by the string <empty>. A string beginning with "//" and ending with a newline is considered a comment.
Formally, a grammar has the following structure:
Grammar ::= Rule ";" { Rule ";" }
Rule ::= NonTerminal "::=" AlterExpr
AlterExpr ::= ConcatExpr { "|" ConcatExpr }
ConcatExpr ::= Phrase { Phrase }
Phrase ::= "[" AlterExpr "]" | "{" AlterExpr "}" | "(" AlterExpr ")" | NonTerminal | Terminal | "<empty>"
NonTerminal ::= Symbol
Terminal ::= Symbol
Symbol ::= Letter { Letter | Digit | "_" } { "'" }
Letter ::= "A" ... "Z" | "a" ... "z"
Digit ::= "0" ... "9"
Any symbol that appears on the left hand side of a rule is a nonterminal. All other symbols are terminals. The left hand side symbol of the first rule is the start symbol of the grammar.
In the example grammars that are furnished with this project, we conform to the following conventions:
The following algorithms are taken from Kenneth C. Louden's text "Compiler Construction, Principles and Practices".
For all nonterminals A do
First( A ) := { }
While there are changes to any First( A ) do
For each production choice A -> X1 X2 ... Xn do
k := 1
continue := true
While continue = true and k <= n do
Add First( Xk ) - { ε } to First( A )
If ε is not in First( Xk ) then
continue := false
k := k + 1
If continue = true then
Add ε to First( A )
Follow( start-symbol ) := { $ }
For all nonterminals A != start-symbol do
Follow( A ) := { }
While there are changes to any of the Follow Sets do
For each production A ::= X1 X2 ... Xn do
For each Xi that is a nonterminal do
Add First(Xi+1Xi+2...Xn) - { ε } to Follow(Xi)
( Note: If i = n, then Xi+1Xi+2...Xn = ε )
If ε is in First( Xi+1Xi+2...Xn ) then
Add Follow( A ) to Follow( Xi )
Repeat the following two steps for each nonterminal A and production choice A ::= α
1. For each token a in First( α ), add A ::= α to the entry M[ A, a ].
2. If ε is in First( α ), for each element a of Follow( A ) (a token or $), add A ::= α to the entry M[ A, a ].
Push $ and the start symbol onto the top of the parsing stack
While the top of the parsing stack != $ and the next token != $
If the top of the parsing stack is terminal a and the next token = a then
Pop the top symbol from parsing stack and advance the input
Else if the top of the parsing stack is nonterminal A and parse table entry M[ A, a ] contains production A ::= X1 X2 ... Xn then
Pop the top symbol from the parsing stack
For i = n downto 1 do
Push Xi onto the parsing stack
Else
Display an error message and stop
If the top of the parsing stack = $ and next token = $ then
Parsing is complete
Else
Display an error message and stop