Inevitably when you write Bison/Yacc grammars you will run into a grammatical errors. In this case the grammar was my LBNF grammar (Fortran.cf, Fortran.y, Fortran.l) for Fortran and I’m running the terminal.for module from the old Galaxy program through my ‘go’ script (current files: debugging.7z). Here’s the terminal output from my script run. Interspersed with the output from my echo statements are the outputs from the various programs run (including Bison/Yacc):
--- Arguments ---
fpgm='terminal.for'
flag='-d'
extn='for'
fn='terminal'
--- Clean up ---
--- Compiling LBNF grammar with BNFC ---
148 rules accepted
no change to file ./Absyn.h
no change to file ./Absyn.c
writing file ./Fortran.l (saving old file as ./Fortran.l.bak)
writing file ./Fortran.y (saving old file as ./Fortran.y.bak)
writing file ./Parser.h (saving old file as ./Parser.h.bak)
no change to file ./Skeleton.h
no change to file ./Skeleton.c
no change to file ./Printer.h
no change to file ./Printer.c
no change to file ./Test.c
writing file ./Makefile (saving old file as ./Makefile.bak)
--- Cleaning symbols ---
--- Turning on Debug in Makefile ---
--- Makefile ---
5c5
< FLEX_OPTS = -PFortran --debug
---
> FLEX_OPTS = -PFortran
8c8
< BISON_OPTS = -t -pFortran --debug -r all -g
---
> BISON_OPTS = -t -pFortran
--- Fixup generated BNF Lexical Analyser ---
--- Show differences: Fortran.l ---
34c34,35
< "\n" { ++yy_mylinenumber; return T_NEWLINE; };
---
> "
> " return T_NEWLINE;
91c92,93
< [ \t\r]+ /* ignore white space. */;
---
> \n ++yy_mylinenumber ;
> [ \t\r\n\f] /* ignore white space. */;
--- Making ---
flex -PFortran --debug -oLexer.c Fortran.l
gcc -g -W -Wall -c Lexer.c
Lexer.c:2053:16: warning: ‘input’ defined but not used [-Wunused-function]
static int input (void)
^~~~~
Lexer.c:2002:17: warning: ‘yyunput’ defined but not used [-Wunused-function]
static void yyunput (int c, char * yy_bp )
^~~~~~~
bison -t -pFortran --debug -r all -g Fortran.y -o Parser.c
Fortran.y: warning: 2 shift/reduce conflicts [-Wconflicts-sr]
Fortran.y: warning: 12 reduce/reduce conflicts [-Wconflicts-rr]
gcc -g -W -Wall -c Parser.c
gcc -g -W -Wall -c Test.c
Linking TestFortran...
gcc -g -W -Wall Absyn.o Lexer.o Parser.o Printer.o Test.o -o TestFortran
=========================================
--- Cleaning of input Fortran program ---
Counts:
Comments: 45
Continuations: 0
--- Compiling input Fortran program ---
--- Output in TestFortran.out ---
--(end of buffer or a NUL)
--accepting rule at line 86 ("// ESCAPE CHARACTER
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("//********* TERMINAL CONTROL ROUTINES **************
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("// A TERMINAL WITH CURSOR POSITIONING AND CLEAR SCREEN IS REQUIRED
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("// MODIFY GTCHAR, TPOS, AND CLEAR FOR YOUR TERMINAL(S)
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("//****************************************************
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("// BY WILLIAM WOOD, SEPTEMBER 1980
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("// Modified by Stuart Renes, WeCo, July 20th, 1981
")
--accepting rule at line 86 ("// to add Vt52 support and make ADM-3A /FT1 type.
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("// Added ADDS 520/580 support on August 14th, 1981 as
")
--accepting rule at line 86 ("// /FT2 type.
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("// TPOS - PUT CHARS IN BUF TO POSITION CURSOR AT IROW, ICOL
")
--accepting rule at line 86 ("// WPW 9/19/80
")
--accepting rule at line 91 (" ")
--accepting rule at line 82 ("SUBROUTINE")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("TPOS")
--accepting rule at line 35 ("(")
--accepting rule at line 87 ("IROW")
--accepting rule at line 39 (",")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("ICOL")
--accepting rule at line 37 (")")
")accepting rule at line 91 ("
--accepting rule at line 34 ("
")
--accepting rule at line 91 (" ")
--accepting rule at line 61 ("COMMON")
--accepting rule at line 42 ("/")
--accepting rule at line 87 ("CURSOR")
--accepting rule at line 42 ("/")
--accepting rule at line 87 ("TTYPE")
")accepting rule at line 91 ("
--accepting rule at line 34 ("
")
--accepting rule at line 91 (" ")
--accepting rule at line 74 ("INTEGER")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("TTYPE")
")accepting rule at line 91 ("
--accepting rule at line 34 ("
")
--accepting rule at line 91 (" ")
--accepting rule at line 57 ("BYTE")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("ADMV")
--accepting rule at line 35 ("(")
--accepting rule at line 90 ("2")
--accepting rule at line 37 (")")
--accepting rule at line 39 (",")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("VT100V")
--accepting rule at line 35 ("(")
--accepting rule at line 90 ("2")
--accepting rule at line 37 (")")
--accepting rule at line 39 (",")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("vt52v")
--accepting rule at line 35 ("(")
--accepting rule at line 90 ("2")
--accepting rule at line 37 (")")
--accepting rule at line 39 (",")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("addsv")
--accepting rule at line 35 ("(")
--accepting rule at line 90 ("2")
--accepting rule at line 37 (")")
")accepting rule at line 91 ("
--accepting rule at line 34 ("
")
--accepting rule at line 91 (" ")
--accepting rule at line 77 ("PARAMETER")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("ADM3A")
--accepting rule at line 91 (" ")
--accepting rule at line 40 ("=")
--accepting rule at line 91 (" ")
--accepting rule at line 90 ("1")
")accepting rule at line 91 ("
--accepting rule at line 34 ("
")
--accepting rule at line 91 (" ")
--accepting rule at line 77 ("PARAMETER")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("VT100")
--accepting rule at line 91 (" ")
--accepting rule at line 40 ("=")
--accepting rule at line 91 (" ")
--accepting rule at line 90 ("2")
")accepting rule at line 91 ("
--accepting rule at line 34 ("
")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("parameter")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("vt52")
error: line 28: syntax error at vt52
alan@Midnight$
The important part for this post are the lines:
Fortran.y: warning: 2 shift/reduce conflicts [-Wconflicts-sr]
Fortran.y: warning: 12 reduce/reduce conflicts [-Wconflicts-rr]
This tells us there are two types of errors occuring shift/reduce and reduce/reduce which prevents Bison/Yacc from creating a parser. There can be other types of errors, such as gramatical errors, which will only be discovered by thorough testing.
Understanding the errors
To understand exactly what shift/reduce and reduce/reduce errors a compiler course which includes the theory of LR parsers would be handy. However, the quick explanation is that as Bison/Yacc tries to find a complete rule in your grammar it shifts (lexical and rule) tokens onto the stack until it finds the last token in the rule then Bison/Yacc reduces (removes) all the tokens on the stack corresponding to that rule and replaces them with a single token of that rule type. Essentially it is building the AST (Abstract Synatx Tree) representation of the program being compiled.
To solve these types of errors we look at the Parser.output file which contains the output of the run of Bison/Yacc. This file contains a number of sections:
- Error list
- Abbreviated grammar
- Terminal (leaf) nodes in the AST
- Non-Terminal (internal) nodes in the AST
- The generated grammar states.
The Error list just lists all the shift/reduce and reduce/reduce errors that need to be fixed before a working parser is generated. The Abbreviated grammar shows the raw grammar without any adornments that go along with the grammar (like your C-code). This abbreviated grammar is usually easier to read once the parser file becomes filled in.
The Terminal (leaf) node list gives a cross reference to the rules in the Abbreviated grammar where that terminal node appears. The first number in each line, in brackets (), is the token ID of the token (the lexer and parser use this number to communicate the token).
The Non-Terminal node list again has a node ID in brackets() and then lists all the grammar rules that contain that node on either the right-hand-side(output) or the left-hand-side(input).
The Generated grammar states are created corresponding to a NFA (Non-deterministic Finite Atomaton) which is like a DFA (Deterministic FA) but will take into account the Bison/Yacc grammar doesn’t always have a terminal token at each location. For instance in the following grammar line when the parser tries to parse a ‘statement’ it has three (in this case) statements that it can follow. In a DFA it needs a terminal-token (e.g. ‘a’,’b’,’c’, … , ‘(‘, ‘)’ etc to make a decision on which rule to follow. What a NFA does is just follows all the possibilities until it finds a terminal token :
The gist of this is that the generated states in the Parser.output file will look very dissimilar to any one of your rules (usually). They will be, instead, a collection of your rules. Here is the generated state 2 from my grammar:
1 Program: ListLblStm . [$end]
3 ListLblStm: ListLblStm . LblStm T_NEWLINE
4 LblStm: . Labeled_stm
5 | . Simple_stm
6 | . %empty [T_NEWLINE]
7 Labeled_stm: . _INTEGER_ Simple_stm
8 Simple_stm: . T_IMPL Type_Spec Type_Qual T_LPAREN T_NAME T_MINUS T_NAME T_RPAREN
9 | . T_PARM ListNameValue
10 | . T_DIMS ListNameDim
11 | . Type_Spec Type_Qual ListNameDim
12 | . Type_Spec ListNameDim
13 | . T_DATA ListDataSeg
14 | . T_COMM T_DIV T_NAME T_DIV ListName
15 | . T_WRITE T_LPAREN ListAssignName T_RPAREN
16 | . T_WRITE T_LPAREN ListAssignName T_RPAREN ListNameOrArray
17 | . T_FMT T_LPAREN ListFmtSpecs T_RPAREN
18 | . T_READ T_LPAREN ListAssignName T_RPAREN ListNameOrArray
19 | . T_READ T_EQUALS LExp
20 | . T_IF T_LPAREN LExp T_RPAREN IfThenPart
21 | . T_NAME T_EQUALS LExp
22 | . T_NAME T_LPAREN ListLExp T_RPAREN T_EQUALS LExp
23 | . T_CALL T_NAME T_LPAREN ListSpecLExp T_RPAREN
24 | . T_CALL T_NAME
25 | . T_GO T_TO _INTEGER_
26 | . T_OPEN T_LPAREN ListAssignName T_RPAREN
27 | . T_CLOSE T_LPAREN ListAssignName T_RPAREN
28 | . T_DO _INTEGER_ DoRangePart
29 | . T_STOP
30 | . T_STOP T_SQSTR
31 | . T_END
32 | . T_SUBR T_NAME T_LPAREN ListSpecLExp T_RPAREN
33 | . T_SUBR T_NAME
34 | . T_FUNC T_NAME T_LPAREN ListSpecLExp T_RPAREN
35 | . T_FUNC T_NAME
36 | . T_CONT
37 | . T_RTN
38 | . T_EQU T_LPAREN T_NAME T_COMMA NameOrArrRef T_RPAREN
143 Type_Spec: . T_INT
144 | . T_REAL
145 | . T_DBL
146 | . T_CHAR
147 | . T_BYTE
148 | . T_LOGI
T_BYTE shift, and go to state 4
T_CALL shift, and go to state 5
T_CHAR shift, and go to state 6
T_CLOSE shift, and go to state 7
T_COMM shift, and go to state 8
T_CONT shift, and go to state 9
T_DATA shift, and go to state 10
T_DIMS shift, and go to state 11
T_DO shift, and go to state 12
T_DBL shift, and go to state 13
T_END shift, and go to state 14
T_EQU shift, and go to state 15
T_FMT shift, and go to state 16
T_FUNC shift, and go to state 17
T_GO shift, and go to state 18
T_IF shift, and go to state 19
T_IMPL shift, and go to state 20
T_INT shift, and go to state 21
T_LOGI shift, and go to state 22
T_OPEN shift, and go to state 23
T_PARM shift, and go to state 24
T_READ shift, and go to state 25
T_REAL shift, and go to state 26
T_RTN shift, and go to state 27
T_STOP shift, and go to state 28
T_SUBR shift, and go to state 29
T_WRITE shift, and go to state 30
T_NAME shift, and go to state 31
_INTEGER_ shift, and go to state 32
T_NEWLINE reduce using rule 6 (LblStm)
$default reduce using rule 1 (Program)
LblStm go to state 33
Labeled_stm go to state 34
Simple_stm go to state 35
Type_Spec go to state 36
In the numbered lines at the top corresponding to our grammar the period ‘.’ corresponds to the location in all those rules that is the current position of parsing. Thus in lines 8-32 we are in the Simple_stm: part of our grammar waiting to get the first token of a new statement. If we get a T_WRITE token then two rules (15,16) will correspond. If we then go down to the next section which shows what we do when we get various tokens we see that a T_WRITE leads us to state 30 where we will process the next tokens from a write statement (e.g “WRITE(1, 13) (ISCORE(I), I = 1, 8)” ).
Shift/Reduce and Reduce/Reduce errors
So now lets look at the two types of errors shift/reduce and reduce/reduce errors in my program and go through the debugging process.
The shift/reduce error then is when Bison/Yacc can’t decide whether to shift the next token onto the stack (for instance when we are in the middle of a statement) or remove a set of tokens and replace them with a single token (like when a statement has been completely parsed). Here is the top of my Parser.output file:
_ERROR_
State 174 conflicts: 1 shift/reduce
State 178 conflicts: 3 reduce/reduce
State 226 conflicts: 3 reduce/reduce
State 227 conflicts: 3 reduce/reduce
State 228 conflicts: 1 shift/reduce, 3 reduce/reduce
Error “State 174 conflicts: 1 shift/reduce”
First lets look at the “State 174 conflicts: 1 shift/reduce” error. Going down to “State 174” in this file we find:
90 LExp: . LExp T_OR LExp2
91 | . LExp T_AND LExp2
92 | . LExp2
93 LExp2: . LExp2 T_EQ LExp3
94 | . LExp2 T_NE LExp3
95 | . LExp3
96 LExp3: . LExp3 T_LT LExp4
97 | . LExp3 T_GT LExp4
98 | . LExp3 T_LE LExp4
99 | . LExp3 T_GE LExp4
100 | . LExp4
101 LExp4: . LExp4 T_PLUS LExp5
102 | . LExp4 T_MINUS LExp5
103 | . LExp5
104 LExp5: . LExp5 T_MULT LExp6
105 | . LExp5 T_DIV LExp6
106 | . LExp6
107 LExp6: . Unary_operator LExp7
108 | . LExp7
109 LExp8: . LExp5 T_POW LExp8
110 | . LExp8 T_LPAREN T_RPAREN
110 | LExp8 T_LPAREN . T_RPAREN
111 | . LExp8 T_LPAREN ListSpecLExp T_RPAREN
111 | LExp8 T_LPAREN . ListSpecLExp T_RPAREN
112 | . LExp9
113 LExp9: . TIntVar RangePart
114 | . T_SQSTR
115 | . LExp10
118 TIntVar: . _INTEGER_
119 | . T_TRUE
120 | . T_FALSE
121 | . T_NAME
122 | . T_READ
125 LExp7: . LExp8
126 LExp10: . LExp11
127 LExp11: . T_LPAREN LExp T_RPAREN
128 Unary_operator: . T_PLUS
129 | . T_MINUS
130 | . T_NOT
131 ListSpecLExp: . SpecLExp
132 | . SpecLExp T_COMMA ListSpecLExp
133 SpecLExp: . %empty [T_RPAREN, T_COMMA]
134 | . LExp
T_LPAREN shift, and go to state 93
T_MINUS shift, and go to state 94
T_RPAREN shift, and go to state 229
T_PLUS shift, and go to state 95
T_TRUE shift, and go to state 96
T_FALSE shift, and go to state 97
T_NOT shift, and go to state 98
T_READ shift, and go to state 99
T_NAME shift, and go to state 100
T_SQSTR shift, and go to state 101
_INTEGER_ shift, and go to state 102
T_RPAREN [reduce using rule 133 (SpecLExp)]
$default reduce using rule 133 (SpecLExp)
LExp go to state 129
LExp2 go to state 104
LExp3 go to state 105
LExp4 go to state 106
LExp5 go to state 107
LExp6 go to state 108
LExp8 go to state 109
LExp9 go to state 110
TIntVar go to state 111
LExp7 go to state 112
LExp10 go to state 113
LExp11 go to state 114
Unary_operator go to state 115
ListSpecLExp go to state 230
SpecLExp go to state 131
The second section were Bison/Yacc show the SHIFTs and REDUCEs it wants to do you can see two lines that define the problem:
...
T_RPAREN [reduce using rule 133 (SpecLExp)]
The parser wants to both SHIFT and REDUCE at this state when it gets a T_RPAREN or ‘)’. So now we have to go through our grammar (at the top of this state) to determine where we see a period ‘.’ followed directly by a T_RPAREN. There are two places (grammar lines 110 and 133). In line 110 :
133 SpecLExp: . %empty [T_RPAREN, T_COMMA]
This is a good time to go back to our original grammar (Fortran.cf) and look a the code corresponding to these two rules:
Efunk. LExp8 ::= LExp8 "(" ")";
Efunkpar. LExp8 ::= LExp8 "(" [SpecLExp] ")";
...
(:[]). [SpecLExp] ::= SpecLExp ;
(:). [SpecLExp] ::= SpecLExp "," [SpecLExp];
SpLExpNil. SpecLExp ::= ;
SpLExpNot. SpecLExp ::= LExp;
You will notice that the Efunk. and Efunkpar. rules are almost identical except the second one has a list of SpecLExp tokens. Going further down the grammar file we notice that the SpecLExp can be either NILL (SpLExpNil.) or an LExp (SpLExpNot.). This essentially means there is two ways to have an empty function call in this grammar so the parser doesn’t know if it should shift a “SpecLExp” token onto the stack or to reduce the LExp8 “(” “)” token sequence into a LExp8 token.
The solution, that I decided upon (there may be many ways to change the grammar to fix this problem) was to eliminate the 110 rule (Ffunk.). So I did that and recompiled and the “State 174” error disappeared (NOTE: The state names may change over compiles).
Error “State 178 conflicts: 3 reduce/reduce”
The solution to reduce/reduce errors is similar. In this case state 178 shows us:
107 LExp6: Unary_operator LExp7 . [T_NEWLINE, T_MINUS, T_RPAREN, T_MULT, T_COMMA, T_PLUS, T_DIV, T_OR, T_AND, T_EQ, T_NE, T_LT, T_GT, T_LE, T_GE, T_POW]
108 | LExp7 . [T_MULT, T_DIV, T_POW]
T_MULT reduce using rule 107 (LExp6)
T_MULT [reduce using rule 108 (LExp6)]
T_DIV reduce using rule 107 (LExp6)
T_DIV [reduce using rule 108 (LExp6)]
T_POW reduce using rule 107 (LExp6)
T_POW [reduce using rule 108 (LExp6)]
$default reduce using rule 107 (LExp6)
You will note here that the terminals T_MULT, T_DIV, and T_POW are all mentioned on two REDUCE lines each. These are the 3 reduce/reduce errors for this state. Looking at the BNFC grammar file (Fortran.cf) again we see what the problem is:
Eland. LExp ::= LExp ".AND." LExp2;
Eeq. LExp2 ::= LExp2 ".EQ." LExp3;
Eneq. LExp2 ::= LExp2 ".NE." LExp3;
Elthen. LExp3 ::= LExp3 ".LT." LExp4;
Egrthen. LExp3 ::= LExp3 ".GT." LExp4;
Ele. LExp3 ::= LExp3 ".LE." LExp4;
Ege. LExp3 ::= LExp3 ".GE." LExp4;
Eplus. LExp4 ::= LExp4 "+" LExp5;
Eminus. LExp4 ::= LExp4 "-" LExp5;
Etimes. LExp5 ::= LExp5 "*" LExp6;
Ediv. LExp5 ::= LExp5 "/" LExp6;
Epreop. LExp6 ::= Unary_operator LExp7;
Epower. LExp8 ::= LExp5 "**" LExp8;
Efunkpar. LExp8 ::= LExp8 "(" [SpecLExp] ")";
Evar. LExp9 ::= TIntVar RangePart ;
Estr. LExp9 ::= SQString ;
ERangeNull. RangePart ::= ;
ERange. RangePart ::= ":" TIntVar ;
ETInt. TIntVar ::= Integer;
ETTrue. TIntVar ::= ".TRUE.";
ETFalse. TIntVar ::= ".FALSE.";
ETNameVar. TIntVar ::= Name;
ETRead. TIntVar ::= "READ";
(:[]). [LExp] ::= LExp ;
(:). [LExp] ::= LExp "," [LExp];
_. LExp ::= LExp2 ;
_. LExp2 ::= LExp3 ;
_. LExp3 ::= LExp4 ;
_. LExp4 ::= LExp5 ;
_. LExp5 ::= LExp6 ;
_. LExp6 ::= LExp7 ;
_. LExp7 ::= LExp8 ;
_. LExp8 ::= LExp9 ;
_. LExp9 ::= LExp10 ;
_. LExp10 ::= LExp11 ;
_. LExp11 ::= "(" LExp ")" ;
OUnaryPlus. Unary_operator ::= "+" ;
OUnaryMinus. Unary_operator ::= "-" ;
OUnaryNot. Unary_operator ::= ".NOT." ;
The chunk at the top of this listing shows the problem (the unary operators ‘+’, ‘-‘ and ‘.NOT.’ are the simple reason that this causes a problem in the lines “LExp? ::= LExp5 {*|/|** } … ” where Bison/Yacc doesn’t know whether to reduce as a Epreop. token or as a LExp7 token. The overarching problem is not the unary operator it is me being lazy about thinking how logical and arithmetic expressions should be expressed in the grammar … instead I just mushed everything together.
What this version of the grammar will do is create another type of error (for instance a coding error where logical expression like “.NOT. 6” is valid. This type of error would have to be detected with tests. For now I will just remove the unary operators and will fix the Logical/Arithmetic grammar error later.
Recompiling we see that removing the unary operator does fix the reduce/reduce error but also flags a number of tokens as unused:
Unary_operator
Terminals unused in grammar
_ERROR_
T_NOT
Rules useless in grammar
143 Unary_operator: T_PLUS
144 | T_MINUS
145 | T_NOT
The rest of the error correction process is similar. In fact, looking at the remaining reduce/reduce and shift/reduce errors points to the same chunk of grammar. Therefore it appears my next job is to correct my grammar laziness.
So until I fix my grammar…