Debugging Bison/Yacc Grammars

Inevitably when you write Bison/Yacc grammars you will run into a grammatical errors. In this case the grammar was my LBNF grammar (Fortran.cf, Fortran.y, Fortran.l) for Fortran and I'm running the terminal.for module from the old Galaxy program through my 'go' script (current files: debugging.7z). Here's the terminal output from my script run. Interspersed with the output from my echo statements are the outputs from the various programs run (including Bison/Yacc):

alan@Midnight$ ./go terminal.for  -d
--- Arguments ---
fpgm='terminal.for'
flag='-d'
extn='for'
fn='terminal'
--- Clean up ---
--- Compiling LBNF grammar with BNFC ---

148 rules accepted

no change to file ./Absyn.h
no change to file ./Absyn.c
writing file ./Fortran.l (saving old file as ./Fortran.l.bak)
writing file ./Fortran.y (saving old file as ./Fortran.y.bak)
writing file ./Parser.h (saving old file as ./Parser.h.bak)
no change to file ./Skeleton.h
no change to file ./Skeleton.c
no change to file ./Printer.h
no change to file ./Printer.c
no change to file ./Test.c
writing file ./Makefile (saving old file as ./Makefile.bak)
--- Cleaning symbols ---
--- Turning on Debug in Makefile ---
--- Makefile ---
    5c5
    < FLEX_OPTS = -PFortran --debug
    ---
    > FLEX_OPTS = -PFortran
    8c8
    < BISON_OPTS = -t -pFortran --debug -r all -g
    ---
    > BISON_OPTS = -t -pFortran
--- Fixup generated BNF Lexical Analyser ---
--- Show differences: Fortran.l ---
    34c34,35
    < "\n"       { ++yy_mylinenumber; return T_NEWLINE; };
    ---
    > "
    > "        return T_NEWLINE;
    91c92,93
    < [ \t\r]+        /* ignore white space. */;
    ---
    > \n ++yy_mylinenumber ;
    > [ \t\r\n\f]        /* ignore white space. */;
--- Making ---
flex -PFortran --debug -oLexer.c Fortran.l
gcc -g -W -Wall -c Lexer.c
Lexer.c:2053:16: warning: ‘input’ defined but not used [-Wunused-function]
     static int input  (void)
                ^~~~~
Lexer.c:2002:17: warning: ‘yyunput’ defined but not used [-Wunused-function]
     static void yyunput (int c, char * yy_bp )
                 ^~~~~~~
bison -t -pFortran --debug -r all -g Fortran.y -o Parser.c
Fortran.y: warning: 2 shift/reduce conflicts [-Wconflicts-sr]
Fortran.y: warning: 12 reduce/reduce conflicts [-Wconflicts-rr]
gcc -g -W -Wall -c Parser.c
gcc -g -W -Wall -c Test.c
Linking TestFortran...
gcc -g -W -Wall Absyn.o Lexer.o Parser.o Printer.o Test.o -o TestFortran
=========================================
--- Cleaning of input Fortran program ---
Counts:
  Comments:         45
  Continuations:     0
--- Compiling input Fortran program ---
--- Output in TestFortran.out ---
--(end of buffer or a NUL)
--accepting rule at line 86 ("// ESCAPE CHARACTER
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("//*********  TERMINAL CONTROL ROUTINES  **************
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("//  A TERMINAL WITH CURSOR POSITIONING AND CLEAR SCREEN IS REQUIRED
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("//  MODIFY GTCHAR, TPOS, AND CLEAR FOR YOUR TERMINAL(S)
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("//****************************************************
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("//  BY WILLIAM WOOD, SEPTEMBER 1980
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("// Modified by Stuart Renes, WeCo, July 20th, 1981
")
--accepting rule at line 86 ("// to add Vt52 support and make ADM-3A /FT1 type.
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("// Added ADDS 520/580 support on August 14th, 1981 as
")
--accepting rule at line 86 ("// /FT2 type.
")
--accepting rule at line 86 ("//
")
--accepting rule at line 86 ("//   TPOS - PUT CHARS IN BUF TO POSITION CURSOR AT IROW, ICOL
")
--accepting rule at line 86 ("// WPW 9/19/80
")
--accepting rule at line 91 ("      ")
--accepting rule at line 82 ("SUBROUTINE")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("TPOS")
--accepting rule at line 35 ("(")
--accepting rule at line 87 ("IROW")
--accepting rule at line 39 (",")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("ICOL")
--accepting rule at line 37 (")")
")accepting rule at line 91 ("
--accepting rule at line 34 ("
")
--accepting rule at line 91 ("      ")
--accepting rule at line 61 ("COMMON")
--accepting rule at line 42 ("/")
--accepting rule at line 87 ("CURSOR")
--accepting rule at line 42 ("/")
--accepting rule at line 87 ("TTYPE")
")accepting rule at line 91 ("
--accepting rule at line 34 ("
")
--accepting rule at line 91 ("      ")
--accepting rule at line 74 ("INTEGER")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("TTYPE")
")accepting rule at line 91 ("
--accepting rule at line 34 ("
")
--accepting rule at line 91 ("      ")
--accepting rule at line 57 ("BYTE")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("ADMV")
--accepting rule at line 35 ("(")
--accepting rule at line 90 ("2")
--accepting rule at line 37 (")")
--accepting rule at line 39 (",")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("VT100V")
--accepting rule at line 35 ("(")
--accepting rule at line 90 ("2")
--accepting rule at line 37 (")")
--accepting rule at line 39 (",")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("vt52v")
--accepting rule at line 35 ("(")
--accepting rule at line 90 ("2")
--accepting rule at line 37 (")")
--accepting rule at line 39 (",")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("addsv")
--accepting rule at line 35 ("(")
--accepting rule at line 90 ("2")
--accepting rule at line 37 (")")
")accepting rule at line 91 ("
--accepting rule at line 34 ("
")
--accepting rule at line 91 ("      ")
--accepting rule at line 77 ("PARAMETER")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("ADM3A")
--accepting rule at line 91 (" ")
--accepting rule at line 40 ("=")
--accepting rule at line 91 (" ")
--accepting rule at line 90 ("1")
")accepting rule at line 91 ("
--accepting rule at line 34 ("
")
--accepting rule at line 91 ("      ")
--accepting rule at line 77 ("PARAMETER")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("VT100")
--accepting rule at line 91 (" ")
--accepting rule at line 40 ("=")
--accepting rule at line 91 (" ")
--accepting rule at line 90 ("2")
")accepting rule at line 91 ("
--accepting rule at line 34 ("
")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("parameter")
--accepting rule at line 91 (" ")
--accepting rule at line 87 ("vt52")
error: line 28: syntax error at vt52
alan@Midnight$

The important part for this post are the lines:

bison -t -pFortran --debug -r all -g Fortran.y -o Parser.c
Fortran.y: warning: 2 shift/reduce conflicts [-Wconflicts-sr]
Fortran.y: warning: 12 reduce/reduce conflicts [-Wconflicts-rr]

This tells us there are two types of errors occuring shift/reduce and reduce/reduce which prevents Bison/Yacc from creating a parser. There can be other types of errors, such as gramatical errors, which will only be discovered by thorough testing.

Understanding the errors

To understand exactly what shift/reduce and reduce/reduce errors a compiler course which includes the theory of LR parsers would be handy. However, the quick explanation is that as Bison/Yacc tries to find a complete rule in your grammar it shifts (lexical and rule) tokens onto the stack until it finds the last token in the rule then Bison/Yacc reduces (removes) all the tokens on the stack corresponding to that rule and replaces them with a single token of that rule type. Essentially it is building the AST (Abstract Synatx Tree) representation of the program being compiled.

To solve these types of errors we look at the Parser.output file which contains the output of the run of Bison/Yacc. This file contains a number of sections:

  • Error list
  • Abbreviated grammar
  • Terminal (leaf) nodes in the AST
  • Non-Terminal (internal) nodes in the AST
  • The generated grammar states.

The Error list just lists all the shift/reduce and reduce/reduce errors that need to be fixed before a working parser is generated. The Abbreviated grammar shows the raw grammar without any adornments that go along with the grammar (like your C-code). This abbreviated grammar is usually easier to read once the parser file becomes filled in.

The Terminal (leaf) node list gives a cross reference to the rules in the Abbreviated grammar where that terminal node appears. The first number in each line, in brackets (), is the token ID of the token (the lexer and parser use this number to communicate the token).

The Non-Terminal node list again has a node ID in brackets() and then lists all the grammar rules that contain that node on either the right-hand-side(output) or the left-hand-side(input).

The Generated grammar states are created corresponding to a NFA (Non-deterministic Finite Atomaton) which is like a DFA (Deterministic FA) but will take into account the Bison/Yacc grammar doesn't always have a terminal token at each location. For instance in the following grammar line when the parser tries to parse a 'statement' it has three (in this case) statements that it can follow. In a DFA it needs a terminal-token (e.g. 'a','b','c', ... , '(', ')' etc to make a decision on which rule to follow. What a NFA does is just follows all the possibilities until it finds a terminal token :

statement := dimension_statement | for_statement | assignment_statement

The gist of this is that the generated states in the Parser.output file will look very dissimilar to any one of your rules (usually). They will be, instead, a collection of your rules. Here is the generated state 2 from my grammar:

State 2

    1 Program: ListLblStm .  [$end]
    3 ListLblStm: ListLblStm . LblStm T_NEWLINE
    4 LblStm: . Labeled_stm
    5       | . Simple_stm
    6       | . %empty  [T_NEWLINE]
    7 Labeled_stm: . _INTEGER_ Simple_stm
    8 Simple_stm: . T_IMPL Type_Spec Type_Qual T_LPAREN T_NAME T_MINUS T_NAME T_RPAREN
    9           | . T_PARM ListNameValue
   10           | . T_DIMS ListNameDim
   11           | . Type_Spec Type_Qual ListNameDim
   12           | . Type_Spec ListNameDim
   13           | . T_DATA ListDataSeg
   14           | . T_COMM T_DIV T_NAME T_DIV ListName
   15           | . T_WRITE T_LPAREN ListAssignName T_RPAREN
   16           | . T_WRITE T_LPAREN ListAssignName T_RPAREN ListNameOrArray
   17           | . T_FMT T_LPAREN ListFmtSpecs T_RPAREN
   18           | . T_READ T_LPAREN ListAssignName T_RPAREN ListNameOrArray
   19           | . T_READ T_EQUALS LExp
   20           | . T_IF T_LPAREN LExp T_RPAREN IfThenPart
   21           | . T_NAME T_EQUALS LExp
   22           | . T_NAME T_LPAREN ListLExp T_RPAREN T_EQUALS LExp
   23           | . T_CALL T_NAME T_LPAREN ListSpecLExp T_RPAREN
   24           | . T_CALL T_NAME
   25           | . T_GO T_TO _INTEGER_
   26           | . T_OPEN T_LPAREN ListAssignName T_RPAREN
   27           | . T_CLOSE T_LPAREN ListAssignName T_RPAREN
   28           | . T_DO _INTEGER_ DoRangePart
   29           | . T_STOP
   30           | . T_STOP T_SQSTR
   31           | . T_END
   32           | . T_SUBR T_NAME T_LPAREN ListSpecLExp T_RPAREN
   33           | . T_SUBR T_NAME
   34           | . T_FUNC T_NAME T_LPAREN ListSpecLExp T_RPAREN
   35           | . T_FUNC T_NAME
   36           | . T_CONT
   37           | . T_RTN
   38           | . T_EQU T_LPAREN T_NAME T_COMMA NameOrArrRef T_RPAREN
  143 Type_Spec: . T_INT
  144          | . T_REAL
  145          | . T_DBL
  146          | . T_CHAR
  147          | . T_BYTE
  148          | . T_LOGI

    T_BYTE     shift, and go to state 4
    T_CALL     shift, and go to state 5
    T_CHAR     shift, and go to state 6
    T_CLOSE    shift, and go to state 7
    T_COMM     shift, and go to state 8
    T_CONT     shift, and go to state 9
    T_DATA     shift, and go to state 10
    T_DIMS     shift, and go to state 11
    T_DO       shift, and go to state 12
    T_DBL      shift, and go to state 13
    T_END      shift, and go to state 14
    T_EQU      shift, and go to state 15
    T_FMT      shift, and go to state 16
    T_FUNC     shift, and go to state 17
    T_GO       shift, and go to state 18
    T_IF       shift, and go to state 19
    T_IMPL     shift, and go to state 20
    T_INT      shift, and go to state 21
    T_LOGI     shift, and go to state 22
    T_OPEN     shift, and go to state 23
    T_PARM     shift, and go to state 24
    T_READ     shift, and go to state 25
    T_REAL     shift, and go to state 26
    T_RTN      shift, and go to state 27
    T_STOP     shift, and go to state 28
    T_SUBR     shift, and go to state 29
    T_WRITE    shift, and go to state 30
    T_NAME     shift, and go to state 31
    _INTEGER_  shift, and go to state 32

    T_NEWLINE  reduce using rule 6 (LblStm)
    $default   reduce using rule 1 (Program)

    LblStm       go to state 33
    Labeled_stm  go to state 34
    Simple_stm   go to state 35
    Type_Spec    go to state 36

In the numbered lines at the top corresponding to our grammar the period '.' corresponds to the location in all those rules that is the current position of parsing. Thus in lines 8-32 we are in the Simple_stm: part of our grammar waiting to get the first token of a new statement. If we get a T_WRITE token then two rules (15,16) will correspond. If we then go down to the next section which shows what we do when we get various tokens we see that a T_WRITE leads us to state 30 where we will process the next tokens from a write statement (e.g "WRITE(1, 13) (ISCORE(I), I = 1, 8)" ).

Shift/Reduce and Reduce/Reduce errors

So now lets look at the two types of errors shift/reduce and reduce/reduce errors in my program and go through the debugging process.

The shift/reduce error then is when Bison/Yacc can't decide whether to shift the next token onto the stack (for instance when we are in the middle of a statement) or remove a set of tokens and replace them with a single token (like when a statement has been completely parsed). Here is the top of my Parser.output file:

Terminals unused in grammar

   _ERROR_


State 174 conflicts: 1 shift/reduce
State 178 conflicts: 3 reduce/reduce
State 226 conflicts: 3 reduce/reduce
State 227 conflicts: 3 reduce/reduce
State 228 conflicts: 1 shift/reduce, 3 reduce/reduce

Error "State 174 conflicts: 1 shift/reduce"

First lets look at the "State 174 conflicts: 1 shift/reduce" error. Going down to "State 174" in this file we find:

State 174

   90 LExp: . LExp T_OR LExp2
   91     | . LExp T_AND LExp2
   92     | . LExp2
   93 LExp2: . LExp2 T_EQ LExp3
   94      | . LExp2 T_NE LExp3
   95      | . LExp3
   96 LExp3: . LExp3 T_LT LExp4
   97      | . LExp3 T_GT LExp4
   98      | . LExp3 T_LE LExp4
   99      | . LExp3 T_GE LExp4
  100      | . LExp4
  101 LExp4: . LExp4 T_PLUS LExp5
  102      | . LExp4 T_MINUS LExp5
  103      | . LExp5
  104 LExp5: . LExp5 T_MULT LExp6
  105      | . LExp5 T_DIV LExp6
  106      | . LExp6
  107 LExp6: . Unary_operator LExp7
  108      | . LExp7
  109 LExp8: . LExp5 T_POW LExp8
  110      | . LExp8 T_LPAREN T_RPAREN
  110      | LExp8 T_LPAREN . T_RPAREN
  111      | . LExp8 T_LPAREN ListSpecLExp T_RPAREN
  111      | LExp8 T_LPAREN . ListSpecLExp T_RPAREN
  112      | . LExp9
  113 LExp9: . TIntVar RangePart
  114      | . T_SQSTR
  115      | . LExp10
  118 TIntVar: . _INTEGER_
  119        | . T_TRUE
  120        | . T_FALSE
  121        | . T_NAME
  122        | . T_READ
  125 LExp7: . LExp8
  126 LExp10: . LExp11
  127 LExp11: . T_LPAREN LExp T_RPAREN
  128 Unary_operator: . T_PLUS
  129               | . T_MINUS
  130               | . T_NOT
  131 ListSpecLExp: . SpecLExp
  132             | . SpecLExp T_COMMA ListSpecLExp
  133 SpecLExp: . %empty  [T_RPAREN, T_COMMA]
  134         | . LExp

    T_LPAREN   shift, and go to state 93
    T_MINUS    shift, and go to state 94
    T_RPAREN   shift, and go to state 229
    T_PLUS     shift, and go to state 95
    T_TRUE     shift, and go to state 96
    T_FALSE    shift, and go to state 97
    T_NOT      shift, and go to state 98
    T_READ     shift, and go to state 99
    T_NAME     shift, and go to state 100
    T_SQSTR    shift, and go to state 101
    _INTEGER_  shift, and go to state 102

    T_RPAREN  [reduce using rule 133 (SpecLExp)]
    $default  reduce using rule 133 (SpecLExp)

    LExp            go to state 129
    LExp2           go to state 104
    LExp3           go to state 105
    LExp4           go to state 106
    LExp5           go to state 107
    LExp6           go to state 108
    LExp8           go to state 109
    LExp9           go to state 110
    TIntVar         go to state 111
    LExp7           go to state 112
    LExp10          go to state 113
    LExp11          go to state 114
    Unary_operator  go to state 115
    ListSpecLExp    go to state 230
    SpecLExp        go to state 131

The second section were Bison/Yacc show the SHIFTs and REDUCEs it wants to do you can see two lines that define the problem:

    T_RPAREN   shift, and go to state 229
              ...
    T_RPAREN  [reduce using rule 133 (SpecLExp)]

The parser wants to both SHIFT and REDUCE at this state when it gets a T_RPAREN or ')'. So now we have to go through our grammar (at the top of this state) to determine where we see a period '.' followed directly by a T_RPAREN. There are two places (grammar lines 110 and 133). In line 110 :

  110      | LExp8 T_LPAREN . T_RPAREN
  133 SpecLExp: . %empty  [T_RPAREN, T_COMMA]

This is a good time to go back to our original grammar (Fortran.cf) and look a the code corresponding to these two rules:

Epower.      LExp8 ::= LExp5 "**" LExp8;
Efunk.       LExp8 ::= LExp8 "(" ")";
Efunkpar.    LExp8 ::= LExp8 "(" [SpecLExp] ")";
...
(:[]).   [SpecLExp] ::= SpecLExp ;
(:).     [SpecLExp] ::= SpecLExp "," [SpecLExp];

SpLExpNil. SpecLExp ::= ;
SpLExpNot. SpecLExp ::= LExp;

You will notice that the Efunk. and Efunkpar. rules are almost identical except the second one has a list of SpecLExp tokens. Going further down the grammar file we notice that the SpecLExp can be either NILL (SpLExpNil.) or an LExp (SpLExpNot.). This essentially means there is two ways to have an empty function call in this grammar so the parser doesn't know if it should shift a "SpecLExp" token onto the stack or to reduce the LExp8 "(" ")" token sequence into a LExp8 token.

The solution, that I decided upon (there may be many ways to change the grammar to fix this problem) was to eliminate the 110 rule (Ffunk.). So I did that and recompiled and the "State 174" error disappeared (NOTE: The state names may change over compiles).

Error "State 178 conflicts: 3 reduce/reduce"

The solution to reduce/reduce errors is similar. In this case state 178 shows us:

State 178

  107 LExp6: Unary_operator LExp7 .  [T_NEWLINE, T_MINUS, T_RPAREN, T_MULT, T_COMMA, T_PLUS, T_DIV, T_OR, T_AND, T_EQ, T_NE, T_LT, T_GT, T_LE, T_GE, T_POW]
  108      | LExp7 .  [T_MULT, T_DIV, T_POW]

    T_MULT    reduce using rule 107 (LExp6)
    T_MULT    [reduce using rule 108 (LExp6)]
    T_DIV     reduce using rule 107 (LExp6)
    T_DIV     [reduce using rule 108 (LExp6)]
    T_POW     reduce using rule 107 (LExp6)
    T_POW     [reduce using rule 108 (LExp6)]
    $default  reduce using rule 107 (LExp6)

You will note here that the terminals T_MULT, T_DIV, and T_POW are all mentioned on two REDUCE lines each. These are the 3 reduce/reduce errors for this state. Looking at the BNFC grammar file (Fortran.cf) again we see what the problem is:

Elor.        LExp  ::= LExp ".OR." LExp2;
Eland.       LExp  ::= LExp ".AND." LExp2;
Eeq.         LExp2 ::= LExp2 ".EQ." LExp3;
Eneq.        LExp2 ::= LExp2 ".NE." LExp3;
Elthen.      LExp3 ::= LExp3 ".LT." LExp4;
Egrthen.     LExp3 ::= LExp3 ".GT." LExp4;
Ele.         LExp3 ::= LExp3 ".LE." LExp4;
Ege.         LExp3 ::= LExp3 ".GE." LExp4;
Eplus.       LExp4 ::= LExp4 "+" LExp5;
Eminus.      LExp4 ::= LExp4 "-" LExp5;
Etimes.      LExp5 ::= LExp5 "*" LExp6;
Ediv.        LExp5 ::= LExp5 "/" LExp6;
Epreop.      LExp6 ::= Unary_operator LExp7;
Epower.      LExp8 ::= LExp5 "**" LExp8;
Efunkpar.    LExp8 ::= LExp8 "(" [SpecLExp] ")";
Evar.        LExp9 ::= TIntVar RangePart ;
Estr.        LExp9 ::= SQString ;

ERangeNull. RangePart ::= ;
ERange.     RangePart ::= ":" TIntVar ;

ETInt.       TIntVar ::= Integer;
ETTrue.      TIntVar ::= ".TRUE.";
ETFalse.     TIntVar ::= ".FALSE.";
ETNameVar.   TIntVar ::= Name;
ETRead.      TIntVar ::= "READ";

(:[]).   [LExp] ::= LExp ;
(:).     [LExp] ::= LExp "," [LExp];

_. LExp   ::= LExp2 ;
_. LExp2  ::= LExp3 ;
_. LExp3  ::= LExp4 ;
_. LExp4  ::= LExp5 ;
_. LExp5  ::= LExp6 ;
_. LExp6  ::= LExp7 ;
_. LExp7  ::= LExp8 ;
_. LExp8  ::= LExp9 ;
_. LExp9  ::= LExp10 ;
_. LExp10 ::= LExp11 ;
_. LExp11 ::= "(" LExp ")" ;

OUnaryPlus.   Unary_operator ::= "+" ;
OUnaryMinus.  Unary_operator ::= "-" ;
OUnaryNot.    Unary_operator ::= ".NOT." ;

The chunk at the top of this listing shows the problem (the unary operators '+', '-' and '.NOT.' are the simple reason that this causes a problem in the lines "LExp? ::= LExp5 {*|/|** } ... " where Bison/Yacc doesn't know whether to reduce as a Epreop. token or as a LExp7 token. The overarching problem is not the unary operator it is me being lazy about thinking how logical and arithmetic expressions should be expressed in the grammar ... instead I just mushed everything together.

What this version of the grammar will do is create another type of error (for instance a coding error where logical expression like ".NOT. 6" is valid. This type of error would have to be detected with tests. For now I will just remove the unary operators and will fix the Logical/Arithmetic grammar error later.

Recompiling we see that removing the unary operator does fix the reduce/reduce error but also flags a number of tokens as unused:

Nonterminals useless in grammar

   Unary_operator

Terminals unused in grammar

   _ERROR_
   T_NOT

Rules useless in grammar

  143 Unary_operator: T_PLUS
  144               | T_MINUS
  145               | T_NOT

The rest of the error correction process is similar. In fact, looking at the remaining reduce/reduce and shift/reduce errors points to the same chunk of grammar. Therefore it appears my next job is to correct my grammar laziness.

So until I fix my grammar...

BNFC Quirks

BNFC is a great tool but it has some quirks that have slowed down the process of building a front-end for my fortran2c translator. However, the code generation, especially if you haven't built a compiler before moves you quite a distance forward. Here's what I've found so far...

Position dependent code:

First, in the lexical analyzer (the part that converts from a character stream to a token stream) BNFC is fairly inflexible. In the case of Fortran and other older languages there are position dependent tokens. Some examples come to mind: The comment, the continuation line, label-numbers and the sequence column. These are illustrated in the file segment (Maze.for) below:

C
C   MAZE DESCRIPTION
C
  180   WRITE(6,190) HEIGHT,WIDTH,DEPTH
  190   FORMAT('0',' YOUR MAZE HAS A HEIGHT OF',I5,/,
    1  '             AND A WIDTH OF',I5,/,
    1  '            WITH A DEPTH OF',I5,//,
    2  '  THE DIRECTION COMMANDS FOR MAZE ARE SINGLE LETTERS',/,
    2  '    N(ORTH), U(P),    OR 8 IS UP',/,
    2  '    E(AST) , R(IGHT), OR 6 IS RIGHT',/,
    2  '    S(OUTH), D(OWN),  OR 2 IS DOWN',/,
    2  '    W(EST) , L(EFT),  OR 4 IS LEFT',/,
    2  '    I(N)   ,          OR 9 IS IN TO SCREEN',/,
    2  '    O(UT)  ,          OR 7 IS OUT OF SCREEN',/,

Comments start with a "C" in column one and go to the end-of-line. Continuation lines start at the beginning with a tab or 5 spaces, have a continuation mark (usually 0-9 or '+') then a space and the body of the continuation. Labels are numbers preceded with spaces and ending at column 5.

Unfortunately I don't have a example of code with a sequence column. These were basically 8-digit numbers in columns 73-80. Code including the above continuation/label-number preface to statements was from column 1 to 72 with 73-80 being left over for the sequence number. I believe this sequence number was a holdover from the old punched card days when people would do a 52-card pickup with a program deck (which usually went far beyond 52 cards) and needed a way to sort it back into a program (by machine).

All these things need Flex code that allows for multiple states (i.e. <prefix>, <seqno>, <statement>) which doesn't seem to be a possibility in BNFC.

What I ended up doing was writing a small state-machine program (in C) that converted comments to C-like '//' comments and just joined continuation lines into one long line (since we aren't working on 80-column punch cards anymore). The code below can be compiled with the command (I'm running Ubuntu/linux with the GNU gcc compiler):

g++ -std=c++11 -g fixup.c -o fixup

/*
 * Program to do some preprocessing on a Fortran file to deal with:
 *     "\nC" ==> "\n//"                   -- Comments, and
 *     "[ \t]*\n     [0-9+][ \t]*" ==> "" -- Continuation lines
 *     "[ \t]*\n\t[0-9+][ \t]*"    ==> "" -- Continuation lines
 *
 */


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
   
#define BUF_SZ 1000

#define DEBUG false

char buf[BUF_SZ];
int bufidx=-1;
int state=0;

int CntComments=0;
int CntContinue=0;

void save(int chr){
    buf[++bufidx]=(char)chr;

    if(bufidx==BUF_SZ){
        fprintf(stderr,"ERROR: Buffer Overflow\n");
        exit(1);
    }
}

void unsave(){
    buf[bufidx--]=0;

    if(bufidx<-1){
        fprintf(stderr,"ERROR: Buffer Underflow\n");
        exit(2);
    }
}

void reset(){
    memset(buf,0,BUF_SZ);
    bufidx=-1;
    state=0;
}

void purge(){
    printf("%s",buf);
    reset();
}


void asaprintf( const char * format, ... )
{
    va_list args;
    va_start (args, format);
    if(DEBUG) vprintf (format, args);
    va_end (args);
}

void newstate(int ns){
    state=ns;
    //asaprintf("<%d>",state);
}


int main(int argc,char* argv[]){

    int chr=0;
    int idx=0;

    reset();

    state=2; // Start in state 2 because first line in file may be a comment

    while((chr=getchar())!=EOF){
        asaprintf("%6d) state=%d chr='%c'(0x%02x)\n",idx++,state,chr,chr);
        save(chr);
        if(chr==0){
            reset();
        }else{
            switch(state){
                case 0:
                    switch(chr){
                        case ' ': break;
                        case '\t': break;
                        case '\n': newstate(2); break;
                        default: purge(); break;
                    };
                    break;

                case 2:
                    switch(chr){
                        case ' ': newstate(6); break;
                        case 'C': case 'c':
                            unsave(); purge(); printf("//"); CntComments++; break;
                        case '\t': newstate(10); break;
                        default: purge(); break;
                    };
                    break;
           
                case 6:
                    switch(chr){
                        case ' ': newstate(7); break;
                        default: purge(); break;
                    };
                    break;
           
                case 7:
                    switch(chr){
                        case ' ': newstate(8); break;
                        default: purge(); break;
                    };
                    break;
           
                case 8:
                    switch(chr){
                        case ' ': newstate(9); break;
                        default: purge(); break;
                    };
                    break;
           
                case 9:
                    switch(chr){
                        case ' ': newstate(10); break;
                        default: purge(); break;
                    };
                    break;
           
                case 10:
                    switch(chr){
                        case '0':
                        case '1':
                        case '2':
                        case '3':
                        case '4':
                        case '5':
                        case '6':
                        case '7':
                        case '8':
                        case '9':
                        case '+':
                            newstate(11); break;
                        default: purge(); break;
                    };
                    break;
           
                case 11:
                    switch(chr){
                        case ' ': break;
                        case '\t': break;
                        default: reset(); putchar(chr); CntContinue++; break;
                    };
                    break;
           
            }
        }
    }
    printf("\n");

    fprintf(stderr, "Counts:\n");
    fprintf(stderr, "  Comments:      %5d\n",CntComments);
    fprintf(stderr, "  Continuations: %5d\n",CntContinue);
    return(0);
}

The Maze.for program then became (Maze_pp.for):

//
//  MAZE -  USES A VT100 TO WANDER AROUND.
//      THE VT100 MUST HAVE ADVANCED VIDEO OPTION.
//      ANSI VT100 ESCAPE SEQUENCES ARE USED.
//
//  WRITTEN BY DON MCLEAN
//  OF THE MACNEAL-SCHWENDLER CORP.
//
//  THE PURPOSE OF THIS PROGRAM WAS TO
//      1. LEARN SOMETHING ABOUT THE VT100 GRAPHICS.
//      2. KEEP MY KIDS BUSY ON WEEKENDS. WHILE I TRIED
//         TO GET SOMETHING ELSE DONE.
//
//  USE OF THIS PROGRAM FOR ANY PURPOSE OTHER THAN FUN
//  IS PROHIBITED.
//
    IMPLICIT INTEGER*4 (A-Z)
//
//  MAZE DIMENSIONS
//  HMAX AND WMAX SHOULD NOT BE LARGER THAN 22 AND 80 RESP.
//
    PARAMETER HMAX=22, WMAX=80, DMAX=4
//
    DIMENSION SLEEP(2)
//
//  DIMENSION IS HMAX*WMAX*DMAX
    INTEGER*2  EXIT(HMAX*WMAX*DMAX), MAT(HMAX*WMAX*DMAX)
    INTEGER*2  LCOUNT(DMAX)
//
    BYTE CLEAR(2)
//
    CHARACTER*200 INPUT
//
    COMMON /MAZECM/ STARTH,STARTW,STARTD,ENDH,ENDW,ENDD,NOBELL
//
//  CLEAR IS A VT100 RESET
//
    DATA CLEAR / 27, 'c' /
//
//  START - SEE IF AN OLD GAME IS TO BE USED.
//
    WRITE(6,10)
   10   FORMAT(' WELCOME TO MAZE')
//
   20   WRITE(6,30)
   30   FORMAT(' ARE YOU GOING TO PLAY A SAVED GAME? ',$)
    READ(5,40) NC,INPUT
   40   FORMAT(Q,A)
    IF(INDEX(INPUT(1:NC),'Y').NE.0) GO TO 120
    SAVE = 0
//
//  INPUT DIMENSION OF MAZE
//
   50   WRITE(6,60) HMAX
   60   FORMAT(' PLEASE INPUT HEIGHT OF MAZE - DEFAULT = ',I2,' ',$)
    READ(5,40) NC,INPUT
    READ(INPUT,70,ERR=50) HEIGHT
   70   FORMAT(BNI2)
    IF(HEIGHT.EQ.0) HEIGHT=HMAX
    IF(HEIGHT.LT.2) HEIGHT=2
    IF(HEIGHT.GT.HMAX) HEIGHT=HMAX
   80   WRITE(6,90) WMAX
   90   FORMAT(' PLEASE INPUT WIDTH  OF MAZE - DEFAULT = ',I2,' ',$)
    READ(5,40) NC,INPUT
    READ(INPUT,70,ERR=80) WIDTH
    IF(WIDTH.EQ.0) WIDTH = WMAX
    IF(WIDTH.LT.2) WIDTH=2
    IF(WIDTH.GT.WMAX) WIDTH=WMAX
  100   WRITE(6,110)
  110   FORMAT(' PLEASE INPUT DEPTH  OF MAZE - DEFAULT =  1 ',$)
    READ(5,40) NC,INPUT
    READ(INPUT,70,ERR=100) DEPTH
    IF(DEPTH.LE.0) DEPTH = 1
    IF(DEPTH.GT.DMAX) DEPTH = DMAX
    NTERMS = HEIGHT * WIDTH * DEPTH
...

Symbols:

The next problem that encountered were the symbols in the BNFC lexer/parser, _SYMB_43, for example. Yuck! I would have been shot at a code review for that. Here's what the lexer looked like (Fortran.l.bkp):

/* -*- c -*- This FLex file was machine-generated by the BNF converter */
%option noyywrap
%{
#define yylval Fortranlval
#define YY_BUFFER_APPEND Fortran_BUFFER_APPEND
#define YY_BUFFER_RESET Fortran_BUFFER_RESET
#define initialize_lexer Fortran_initialize_lexer
#include <string.h>
#include "Parser.h"
#define YY_BUFFER_LENGTH 4096
extern int yy_mylinenumber ;
char YY_PARSED_STRING[YY_BUFFER_LENGTH];
void YY_BUFFER_APPEND(char *s)
{
  strcat(YY_PARSED_STRING, s); //Do something better here!
}
void YY_BUFFER_RESET(void)
{
  int x;
  for(x = 0; x < YY_BUFFER_LENGTH; x++)
    YY_PARSED_STRING[x] = 0;
}

%}

LETTER [a-zA-Z]
CAPITAL [A-Z]
SMALL [a-z]
DIGIT [0-9]
IDENT [a-zA-Z0-9'_]
%START YYINITIAL COMMENT CHAR CHARESC CHAREND STRING ESCAPED
%%

<YYINITIAL>"
"        return _SYMB_0;
<YYINITIAL>"("           return _SYMB_1;
<YYINITIAL>"-"           return _SYMB_2;
<YYINITIAL>")"           return _SYMB_3;
<YYINITIAL>"*"           return _SYMB_4;
<YYINITIAL>","           return _SYMB_5;
<YYINITIAL>"="           return _SYMB_6;
<YYINITIAL>"+"           return _SYMB_7;
<YYINITIAL>"/"           return _SYMB_8;
<YYINITIAL>"$"           return _SYMB_9;
<YYINITIAL>".OR."        return _SYMB_10;
<YYINITIAL>".AND."           return _SYMB_11;
<YYINITIAL>".EQ."        return _SYMB_12;
<YYINITIAL>".NE."        return _SYMB_13;
<YYINITIAL>".LT."        return _SYMB_14;
<YYINITIAL>".GT."        return _SYMB_15;
<YYINITIAL>".LE."        return _SYMB_16;
<YYINITIAL>".GE."        return _SYMB_17;
<YYINITIAL>"**"          return _SYMB_18;
<YYINITIAL>":"           return _SYMB_19;
<YYINITIAL>".TRUE."          return _SYMB_20;
<YYINITIAL>".FALSE."         return _SYMB_21;
<YYINITIAL>".NOT."           return _SYMB_22;
<YYINITIAL>"BYTE"        return _SYMB_23;
<YYINITIAL>"CALL"        return _SYMB_24;
<YYINITIAL>"CHARACTER"           return _SYMB_25;
<YYINITIAL>"CLOSE"           return _SYMB_26;
<YYINITIAL>"COMMON"          return _SYMB_27;
<YYINITIAL>"CONTINUE"        return _SYMB_28;
<YYINITIAL>"DATA"        return _SYMB_29;
<YYINITIAL>"DIMENSION"           return _SYMB_30;
<YYINITIAL>"DO"          return _SYMB_31;
<YYINITIAL>"DOUBLE"          return _SYMB_32;
<YYINITIAL>"END"         return _SYMB_33;
<YYINITIAL>"EQUIVALENCE"         return _SYMB_34;
<YYINITIAL>"FORMAT"          return _SYMB_35;
<YYINITIAL>"FUNCTION"        return _SYMB_36;
<YYINITIAL>"GO"          return _SYMB_37;
<YYINITIAL>"IF"          return _SYMB_38;
<YYINITIAL>"IMPLICIT"        return _SYMB_39;
<YYINITIAL>"INTEGER"         return _SYMB_40;
<YYINITIAL>"LOGICAL"         return _SYMB_41;
<YYINITIAL>"OPEN"        return _SYMB_42;
<YYINITIAL>"PARAMETER"           return _SYMB_43;
<YYINITIAL>"READ"        return _SYMB_44;
<YYINITIAL>"REAL"        return _SYMB_45;
<YYINITIAL>"RETURN"          return _SYMB_46;
<YYINITIAL>"STOP"        return _SYMB_47;
<YYINITIAL>"SUBROUTINE"          return _SYMB_48;
<YYINITIAL>"TO"          return _SYMB_49;
<YYINITIAL>"WRITE"           return _SYMB_50;

<YYINITIAL>"//"[^\n]*\n     ++yy_mylinenumber;   /* BNFC single-line comment */;
<YYINITIAL>\%*{CAPITAL}({CAPITAL}|{DIGIT}|\$|\_)*        yylval.string_ = strdup(yytext); return _SYMB_51;
<YYINITIAL>'.+'          yylval.string_ = strdup(yytext); return _SYMB_52;
<YYINITIAL>{DIGIT}+\.{DIGIT}+((e|E)\-?{DIGIT}+)?(f|F)|{DIGIT}+(e|E)\-?{DIGIT}+(f|F)          yylval.string_ = strdup(yytext); return _SYMB_53;
<YYINITIAL>{DIGIT}+          yylval.int_ = atoi(yytext); return _INTEGER_;
\n ++yy_mylinenumber ;
<YYINITIAL>[ \t\r\n\f]           /* ignore white space. */;
<YYINITIAL>.         return _ERROR_;
%%
void initialize_lexer(FILE *inp) { yyrestart(inp); BEGIN YYINITIAL; }

The parser(Fortran.y.bkp) was worse ... I wish they had some way of converting these symbols to something more human readable:

%start Program
%%
Program : ListLblStm { $$ = make_Progr(reverseListLblStm($1)); YY_RESULT_Program_= $$; }
;
ListLblStm : /* empty */ { $$ = 0;  }
  | ListLblStm LblStm _SYMB_0 { $$ = make_ListLblStm($2, $1);  }
;
LblStm : Labeled_stm { $$ = make_SLabel($1); YY_RESULT_LblStm_= $$; }
  | Simple_stm { $$ = make_SSimple($1); YY_RESULT_LblStm_= $$; }
  | /* empty */ { $$ = make_SNill(); YY_RESULT_LblStm_= $$; }
;
Labeled_stm : _INTEGER_ Simple_stm { $$ = make_SLabelOne($1, $2);  }
;
Simple_stm : _SYMB_39 Type_Spec Type_Qual _SYMB_1 _SYMB_51 _SYMB_2 _SYMB_51 _SYMB_3 { $$ = make_SImplicit($2, $3, $5, $7);  }
  | _SYMB_43 ListNameValue { $$ = make_SParameter($2);  }
  | _SYMB_30 ListNameDim { $$ = make_SDiment($2);  }
  | Type_Spec Type_Qual ListNameDim { $$ = make_SDeclQual($1, $2, $3);  }
  | Type_Spec ListNameDim { $$ = make_SDecl($1, $2);  }
  | _SYMB_29 ListDataSeg { $$ = make_SData($2);  }
  | _SYMB_27 _SYMB_8 _SYMB_51 _SYMB_8 ListName { $$ = make_SCommon($3, $5);  }
  | _SYMB_50 _SYMB_1 ListAssignName _SYMB_3 { $$ = make_SWrtEmp($3);  }
  | _SYMB_50 _SYMB_1 ListAssignName _SYMB_3 ListNameOrArray { $$ = make_SWrite($3, $5);  }
  | _SYMB_35 _SYMB_1 ListFmtSpecs _SYMB_3 { $$ = make_SFormat($3);  }
  | _SYMB_44 _SYMB_1 ListAssignName _SYMB_3 ListNameOrArray { $$ = make_SRead($3, $5);  }
  | _SYMB_44 _SYMB_6 LExp { $$ = make_SAsignRead($3);  }
  | _SYMB_38 _SYMB_1 LExp _SYMB_3 IfThenPart { $$ = make_SIf($3, $5);  }
  | _SYMB_51 _SYMB_6 LExp { $$ = make_SAssign($1, $3);  }
  | _SYMB_51 _SYMB_1 ListLExp _SYMB_3 _SYMB_6 LExp { $$ = make_SAsnArr($1, $3, $6);  }
  | _SYMB_24 _SYMB_51 _SYMB_1 ListSpecLExp _SYMB_3 { $$ = make_SFunCall($2, $4);  }
  | _SYMB_24 _SYMB_51 { $$ = make_SFunCallNil($2);  }
  | _SYMB_37 _SYMB_49 _INTEGER_ { $$ = make_SGoto($3);  }
  | _SYMB_42 _SYMB_1 ListAssignName _SYMB_3 { $$ = make_SOpen($3);  }
  | _SYMB_26 _SYMB_1 ListAssignName _SYMB_3 { $$ = make_SClose($3);  }
  | _SYMB_31 _INTEGER_ DoRangePart { $$ = make_SDo($2, $3);  }
  | _SYMB_47 { $$ = make_SStop();  }
  | _SYMB_47 _SYMB_52 { $$ = make_SStopMsg($2);  }
  | _SYMB_33 { $$ = make_SEnd();  }
  | _SYMB_48 _SYMB_51 _SYMB_1 ListSpecLExp _SYMB_3 { $$ = make_SSubr($2, $4);  }
  | _SYMB_48 _SYMB_51 { $$ = make_SSubrNil($2);  }
  | _SYMB_36 _SYMB_51 _SYMB_1 ListSpecLExp _SYMB_3 { $$ = make_SFunct($2, $4);  }
  | _SYMB_36 _SYMB_51 { $$ = make_SFunctNil($2);  }
  | _SYMB_28 { $$ = make_SContinue();  }
  | _SYMB_46 { $$ = make_SReturn();  }
  | _SYMB_34 _SYMB_1 _SYMB_51 _SYMB_5 NameOrArrRef _SYMB_3 { $$ = make_SEquiv($3, $5);  }
;
Type_Qual : _SYMB_4 _INTEGER_ { $$ = make_QType($2);  }
;
ListNameValue : NameValue { $$ = make_ListNameValue($1, 0);  }
  | NameValue _SYMB_5 ListNameValue { $$ = make_ListNameValue($1, $3);  }
;
NameValue : _SYMB_51 _SYMB_6 _INTEGER_ { $$ = make_NVPair($1, $3);  }
;
ListNameDim : NameDim { $$ = make_ListNameDim($1, 0);  }
  | NameDim _SYMB_5 ListNameDim { $$ = make_ListNameDim($1, $3);  }
;
NameDim : _SYMB_51 _SYMB_1 ListDExp _SYMB_3 { $$ = make_PNameDim($1, $3);  }
  | _SYMB_51 { $$ = make_PNameDim2($1);  }
;
ListDExp : DExp { $$ = make_ListDExp($1, 0);  }
  | DExp _SYMB_5 ListDExp { $$ = make_ListDExp($1, $3);  }
;
DExp : DExp _SYMB_7 DExp1 { $$ = make_EDplus($1, $3);  }
  | DExp _SYMB_2 DExp1 { $$ = make_EDminus($1, $3);  }
  | DExp1 { $$ = $1;  }
;
DExp1 : DExp1 _SYMB_4 DExp2 { $$ = make_EDtimes($1, $3);  }
  | DExp1 _SYMB_8 DExp2 { $$ = make_EDdiv($1, $3);  }
  | DExp2 { $$ = $1;  }
;
DExp2 : _SYMB_1 DExp _SYMB_3 { $$ = $2;  }
  | _INTEGER_ { $$ = make_EDInt($1);  }
  | _SYMB_51 { $$ = make_EDName($1);  }
;
ListDataSeg : DataSeg { $$ = make_ListDataSeg($1, 0);  }
  | DataSeg _SYMB_5 ListDataSeg { $$ = make_ListDataSeg($1, $3);  }
;
DataSeg : ListVars _SYMB_8 ListDataVal _SYMB_8 { $$ = make_PDSeg($1, $3);  }
;
ListVars : Vars { $$ = make_ListVars($1, 0);  }
  | Vars _SYMB_5 ListVars { $$ = make_ListVars($1, $3);  }
;
Vars : _SYMB_51 { $$ = make_PVars($1);  }
;
ListDataVal : DataVal { $$ = make_ListDataVal($1, 0);  }
  | DataVal _SYMB_5 ListDataVal { $$ = make_ListDataVal($1, $3);  }
;
DataVal : _SYMB_7 DataValType { $$ = make_PDValPls($2);  }
  | _SYMB_2 DataValType { $$ = make_PDValNeg($2);  }
  | DataValType { $$ = make_PDValNil($1);  }
;
DataValType : _INTEGER_ { $$ = make_PDVInt($1);  }
  | _SYMB_53 { $$ = make_PDVFloat($1);  }
  | _SYMB_52 { $$ = make_PDVChar($1);  }
;
ListName : _SYMB_51 { $$ = make_ListName($1, 0);  }
  | _SYMB_51 _SYMB_5 ListName { $$ = make_ListName($1, $3);  }
;
ListFmtSpecs : FmtSpecs { $$ = make_ListFmtSpecs($1, 0);  }
  | FmtSpecs _SYMB_5 ListFmtSpecs { $$ = make_ListFmtSpecs($1, $3);  }
;
FmtSpecs : _SYMB_52 { $$ = make_FSString($1);  }
  | _SYMB_51 { $$ = make_FSName($1);  }
  | _SYMB_9 { $$ = make_FSINNL();  }
  | _SYMB_8 { $$ = make_FSSlash();  }
;
ListNameOrArray : NameOrArray { $$ = make_ListNameOrArray($1, 0);  }
  | NameOrArray _SYMB_5 ListNameOrArray { $$ = make_ListNameOrArray($1, $3);  }
;
NameOrArray : _SYMB_51 { $$ = make_PNALName($1);  }
  | _SYMB_1 _SYMB_51 _SYMB_1 ListName _SYMB_3 _SYMB_5 DoRangePart _SYMB_3 { $$ = make_PNALArry($2, $4, $7);  }
;
IfThenPart : _SYMB_37 _SYMB_49 _INTEGER_ { $$ = make_PIfGoto($3);  }
  | _SYMB_51 _SYMB_6 LExp { $$ = make_PIfAsgn($1, $3);  }
  | _SYMB_51 _SYMB_1 ListLExp _SYMB_3 _SYMB_6 LExp { $$ = make_PIFAsnArr($1, $3, $6);  }
  | _SYMB_46 { $$ = make_PIfRetn();  }
  | _SYMB_24 _SYMB_51 _SYMB_1 ListSpecLExp _SYMB_3 { $$ = make_PIfCall($2, $4);  }
  | _SYMB_24 _SYMB_51 { $$ = make_PIfCallNil($2);  }
;
LExp : LExp _SYMB_10 LExp2 { $$ = make_Elor($1, $3);  }
  | LExp _SYMB_11 LExp2 { $$ = make_Eland($1, $3);  }
  | LExp2 { $$ = $1;  }
;
LExp2 : LExp2 _SYMB_12 LExp3 { $$ = make_Eeq($1, $3);  }
  | LExp2 _SYMB_13 LExp3 { $$ = make_Eneq($1, $3);  }
  | LExp3 { $$ = $1;  }
;
LExp3 : LExp3 _SYMB_14 LExp4 { $$ = make_Elthen($1, $3);  }
  | LExp3 _SYMB_15 LExp4 { $$ = make_Egrthen($1, $3);  }
  | LExp3 _SYMB_16 LExp4 { $$ = make_Ele($1, $3);  }
  | LExp3 _SYMB_17 LExp4 { $$ = make_Ege($1, $3);  }
  | LExp4 { $$ = $1;  }
;
LExp4 : LExp4 _SYMB_7 LExp5 { $$ = make_Eplus($1, $3);  }
  | LExp4 _SYMB_2 LExp5 { $$ = make_Eminus($1, $3);  }
  | LExp5 { $$ = $1;  }
;
LExp5 : LExp5 _SYMB_4 LExp6 { $$ = make_Etimes($1, $3);  }
  | LExp5 _SYMB_8 LExp6 { $$ = make_Ediv($1, $3);  }
  | LExp6 { $$ = $1;  }
;
LExp6 : Unary_operator LExp8 { $$ = make_Epreop($1, $2);  }
  | LExp8 { $$ = $1;  }
;
LExp8 : LExp5 _SYMB_18 LExp8 { $$ = make_Epower($1, $3);  }
  | LExp8 _SYMB_1 _SYMB_3 { $$ = make_Efunk($1);  }
  | LExp8 _SYMB_1 ListSpecLExp _SYMB_3 { $$ = make_Efunkpar($1, $3);  }
  | LExp9 { $$ = $1;  }
;
LExp9 : TIntVar RangePart { $$ = make_Evar($1, $2);  }
  | _SYMB_52 { $$ = make_Estr($1);  }
  | LExp10 { $$ = $1;  }
;
RangePart : /* empty */ { $$ = make_ERangeNull();  }
  | _SYMB_19 TIntVar { $$ = make_ERange($2);  }
;
TIntVar : _INTEGER_ { $$ = make_ETInt($1);  }
  | _SYMB_20 { $$ = make_ETTrue();  }
  | _SYMB_21 { $$ = make_ETFalse();  }
  | _SYMB_51 { $$ = make_ETNameVar($1);  }
  | _SYMB_44 { $$ = make_ETRead();  }
;
ListLExp : LExp { $$ = make_ListLExp($1, 0);  }
  | LExp _SYMB_5 ListLExp { $$ = make_ListLExp($1, $3);  }
;
LExp10 : LExp11 { $$ = $1;  }
;
LExp11 : _SYMB_1 LExp _SYMB_3 { $$ = $2;  }
;
Unary_operator : _SYMB_7 { $$ = make_OUnaryPlus();  }
  | _SYMB_2 { $$ = make_OUnaryMinus();  }
  | _SYMB_22 { $$ = make_OUnaryNot();  }
;
ListSpecLExp : SpecLExp { $$ = make_ListSpecLExp($1, 0);  }
  | SpecLExp _SYMB_5 ListSpecLExp { $$ = make_ListSpecLExp($1, $3);  }
;
SpecLExp : LExp { $$ = make_SpLExpNot($1);  }
;
ListAssignName : AssignName { $$ = make_ListAssignName($1, 0);  }
  | AssignName _SYMB_5 ListAssignName { $$ = make_ListAssignName($1, $3);  }
;
AssignName : _SYMB_51 { $$ = make_PAsgnNm($1);  }
  | _INTEGER_ { $$ = make_PAsgnInt($1);  }
  | _SYMB_51 _SYMB_6 LExp { $$ = make_PAssign($1, $3);  }
;
DoRangePart : _SYMB_51 _SYMB_6 LExp _SYMB_5 LExp { $$ = make_PDoRange($1, $3, $5);  }
;
NameOrArrRef : _SYMB_51 { $$ = make_PNOAName($1);  }
  | _SYMB_51 _SYMB_1 ListLExp _SYMB_3 { $$ = make_PNOAArr($1, $3);  }
;
Type_Spec : _SYMB_40 { $$ = make_TInt();  }
  | _SYMB_45 { $$ = make_TFloat();  }
  | _SYMB_32 { $$ = make_TDouble();  }
  | _SYMB_25 { $$ = make_TChar();  }
  | _SYMB_23 { $$ = make_TByte();  }
  | _SYMB_41 { $$ = make_TLogi();  }
;

Fortunately, linux helps with that. I built a little script file that changes these cryptic symbols for something a little more tolerable:

# Clean up the symbols in both the parser and lexor
sed -f symbols Fortran.y &gt;Fortran.yy
cp Fortran.y Fortran.y.bkp
cp Fortran.yy Fortran.y

sed -f symbols Fortran.l &gt;Fortran.ll
cp Fortran.l Fortran.l.bkp
cp Fortran.ll Fortran.l

... and the associated 'symbols' file. NOTE the order of the 'symbols' commands. The _SYMB_1 was changed after the _SYMB_1? symbols otherwise the sed editor would have changed partial symbols:

s/_SYMB_10/T_OR/g
s/_SYMB_11/T_AND/g
s/_SYMB_12/T_EQ/g
s/_SYMB_13/T_NE/g
s/_SYMB_14/T_LT/g
s/_SYMB_15/T_GT/g
s/_SYMB_16/T_LE/g
s/_SYMB_17/T_GE/g
s/_SYMB_18/T_POW/g
s/_SYMB_19/T_COLON/g
s/_SYMB_20/T_TRUE/g
s/_SYMB_21/T_FALSE/g
s/_SYMB_22/T_NOT/g
s/_SYMB_23/T_BYTE/g
s/_SYMB_24/T_CALL/g
s/_SYMB_25/T_CHAR/g
s/_SYMB_26/T_CLOSE/g
s/_SYMB_27/T_COMM/g
s/_SYMB_28/T_CONT/g
s/_SYMB_29/T_DATA/g
s/_SYMB_30/T_DIMS/g
s/_SYMB_31/T_DO/g
s/_SYMB_32/T_DBL/g
s/_SYMB_33/T_END/g
s/_SYMB_34/T_EQU/g
s/_SYMB_35/T_FMT/g
s/_SYMB_36/T_FUNC/g
s/_SYMB_37/T_GO/g
s/_SYMB_38/T_IF/g
s/_SYMB_39/T_IMPL/g
s/_SYMB_40/T_INT/g
s/_SYMB_41/T_LOGI/g
s/_SYMB_42/T_OPEN/g
s/_SYMB_43/T_PARM/g
s/_SYMB_44/T_READ/g
s/_SYMB_45/T_REAL/g
s/_SYMB_46/T_RTN/g
s/_SYMB_47/T_STOP/g
s/_SYMB_48/T_SUBR/g
s/_SYMB_49/T_TO/g
s/_SYMB_50/T_WRITE/g
s/_SYMB_51/T_NAME/g
s/_SYMB_52/T_SQSTR/g
s/_SYMB_53/T_CFLT/g

s/_SYMB_0/T_NEWLINE/g
s/_SYMB_1/T_LPAREN/g
s/_SYMB_2/T_MINUS/g
s/_SYMB_3/T_RPAREN/g
s/_SYMB_4/T_MULT/g
s/_SYMB_5/T_COMMA/g
s/_SYMB_6/T_EQUALS/g
s/_SYMB_7/T_PLUS/g
s/_SYMB_8/T_DIV/g
s/_SYMB_9/T_DOLLAR/g

Makefile:

The Makefile was also a problem (Makefile.old). It is auto generated by BNFC which is great but it had no way to turn on parser debugging which is rather important when creating a frontend from scratch. Bison will often complain about shift/reduce and reduce/reduce errors and you need Bison's output file to debug these. So my next little bit of bash script dealt with that ... it just rewrites the Makefile with the appropriate flags set.

if [ "$flag" == "-d" ]; then
    # ----
    # Modify the Makefile to add debug flags so output is more verbose.
    cp Makefile Makefile.old
    cat Makefile \
        | sed "s/-PFortran$/-PFortran --debug/g" \
        | sed "s/-pFortran$/-pFortran --debug -r all -g/g" \
        &gt; Makefile.new
    cp Makefile.new Makefile

    # Show user the difference in the Makefiles
    echo "--- Makefile ---"
    diff Makefile Makefile.old | sed "s/^/    /g"
fi

This then produces the Parser.output file which we use to debug the BNFC input grammar. I'll explain the process of debugging the grammar when there are reduce/reduce and shift/reduce errors in another post:

Terminals unused in grammar

   _ERROR_


State 177 conflicts: 3 reduce/reduce
State 225 conflicts: 3 reduce/reduce
State 226 conflicts: 3 reduce/reduce
State 227 conflicts: 1 shift/reduce, 3 reduce/reduce


Grammar

    0 $accept: Program $end

    1 Program: ListLblStm

    2 ListLblStm: %empty
    3           | ListLblStm LblStm _SYMB_0

    4 LblStm: Labeled_stm
    5       | Simple_stm
    6       | %empty

    7 Labeled_stm: _INTEGER_ Simple_stm

    8 Simple_stm: _SYMB_39 Type_Spec Type_Qual _SYMB_1 _SYMB_51 _SYMB_2 _SYMB_51 _SYMB_3
    9           | _SYMB_43 ListNameValue
   10           | _SYMB_30 ListNameDim
   11           | Type_Spec Type_Qual ListNameDim
   12           | Type_Spec ListNameDim
   13           | _SYMB_29 ListDataSeg
   14           | _SYMB_27 _SYMB_8 _SYMB_51 _SYMB_8 ListName
   15           | _SYMB_50 _SYMB_1 ListAssignName _SYMB_3
   16           | _SYMB_50 _SYMB_1 ListAssignName _SYMB_3 ListNameOrArray
   17           | _SYMB_35 _SYMB_1 ListFmtSpecs _SYMB_3
   18           | _SYMB_44 _SYMB_1 ListAssignName _SYMB_3 ListNameOrArray
   19           | _SYMB_44 _SYMB_6 LExp
...

Generated Lexer fixups:

The generated lexer had a few changes that were needed because I was trying to make it conform to what I wanted to do instead of letting it do it's thing:

<YYINITIAL>"
"        return T_NEWLINE;
<YYINITIAL>"("           return T_LPAREN;
<YYINITIAL>"-"           return T_MINUS;
<YYINITIAL>")"           return T_RPAREN;
...
<YYINITIAL>"TO"          return T_TO;
<YYINITIAL>"WRITE"           return T_WRITE;

<YYINITIAL>"//"[^\n]*\n     ++yy_mylinenumber;   /* BNFC single-line comment */;
<YYINITIAL>\%*{CAPITAL}({CAPITAL}|{DIGIT}|\$|\_)*        yylval.string_ = strdup(yytext); return T_NAME;
<YYINITIAL>'.+'          yylval.string_ = strdup(yytext); return T_SQSTR;
<YYINITIAL>{DIGIT}+\.{DIGIT}+((e|E)\-?{DIGIT}+)?(f|F)|{DIGIT}+(e|E)\-?{DIGIT}+(f|F)          yylval.string_ = strdup(yytext); return T_CFLT;
<YYINITIAL>{DIGIT}+          yylval.int_ = atoi(yytext); return _INTEGER_;
\n ++yy_mylinenumber ;
<YYINITIAL>[ \t\r\n\f]           /* ignore white space. */;
<YYINITIAL>.         return _ERROR_;
%%
void initialize_lexer(FILE *inp) { yyrestart(inp); BEGIN YYINITIAL; }

First BNFC doesn't know what to do with a newline character as a token. So you see in the first two lines a broken Flex statement. These two lines were replaced with a line derived from the file 'l1':

"\n" { ++yy_mylinenumber; return T_NEWLINE; };

which treats the newline better. The other change is to get rid of the line that counts the line numbers (the ++yy_mylinenumber; line) and remove newlines and form-feeds from the "ignore white space" line. The following bash script segment does this:

# Do the other modifications to Fortran.l to fix the above problem.
cp Fortran.l Fortran.l.old
cat Fortran.l \
    | grep -v "^<YYINITIAL>"$" \
   | grep -v "
^..\ ++yy_mylinenumber.;$" \
   | sed "
s/^"[ \t]*return T_NEWLINE;$/${l1}/g" \
    | sed "s/\\\n\\\f]/]+/g" \
    > Fortran.l.new
cp Fortran.l.new Fortran.l
touch Fortran.y

# Show changes to user.
echo "--- Fortran.l ---"
diff Fortran.l Fortran.l.old | sed "s/^/    /g"

This produces the following lexer:

"\n"       { ++yy_mylinenumber; return T_NEWLINE; };
<YYINITIAL>"("           return T_LPAREN;
<YYINITIAL>"-"           return T_MINUS;
<YYINITIAL>")"           return T_RPAREN;
...
<YYINITIAL>"TO"          return T_TO;
<YYINITIAL>"WRITE"           return T_WRITE;

<YYINITIAL>"//"[^\n]*\n     ++yy_mylinenumber;   /* BNFC single-line comment */;
<YYINITIAL>\%*{CAPITAL}({CAPITAL}|{DIGIT}|\$|\_)*        yylval.string_ = strdup(yytext); return T_NAME;
<YYINITIAL>'.+'          yylval.string_ = strdup(yytext); return T_SQSTR;
<YYINITIAL>{DIGIT}+\.{DIGIT}+((e|E)\-?{DIGIT}+)?(f|F)|{DIGIT}+(e|E)\-?{DIGIT}+(f|F)          yylval.string_ = strdup(yytext); return T_CFLT;
<YYINITIAL>{DIGIT}+          yylval.int_ = atoi(yytext); return _INTEGER_;
<YYINITIAL>[ \t\r]+          /* ignore white space. */;
<YYINITIAL>.         return _ERROR_;
%%

Code Generation:

The generation of all the Flex/Bison and the pretty printer code is the saving grace. BNFC gets you most of the way to producing a front-end for your compiler and makes life a lot easier if, like me, you tend to forget how to write a Flex and Bison driver file in between doing other coding.

However, if you are new to building a compiler then BNFC may just make things worse by adding another level of complexity to an already complex problem. For this reason it might be best for novice compiler writers to ignore BNFC and read a good book on Flex/Bison and maybe a text book on compiler writing.

In my next post I will describe the code that is generated by BNFC.

Here are all the files (complete) that I've talked about above.

Up and running…

Well apart from programming Android games ... at the moment I find myself learning WordPress. I used to program in raw HTML/CSS but I decided that I should proceed out of the stone age and thus ... here I am.

Obviously nothing particularly useful at the moment but I plan to add a number of posts on topics that I've learned something about that might be useful to others. Here are some of the things I've been working on that I'll start posting about (no particular order):

  • Android programming. Both in Java and through the NDK.
  • Cocos2d-x programming. I've published my first Android game created with Android-Studio and the Cocos2d-x game engine.
  • OpenCV. I've learned lots about vision software so now it's time to learn some OpenCV stuff. This will come in handy now that people are carrying around fairly powerful computers with included cameras.
    • My SFBC game tool for Android will be using OpenCV to convert camera images to electronic SSDs (Ship System Display -- schematics of the ship and its systems, weapons, engines etc). See Google Images for lots of examples.
  • Game programming. In particular some of the stuff that goes into game programming that isn't obvious (the stuff other than graphics and sound)
    • PCG (Procedural Content Generation). This is a way to generate games scenarios without going to all the trouble and space to create them individually. Pixel Dungeon and all its variants seem to use this. There seems to be a great amount of research work been done on PCG now and it looks like it is the technology to learn.
    • AI (Artificial Intelligence) makes the bots do their thing and makes them act in a way that makes the game challenging for the player. But an AI can't be to smart or the players get frustrated, or too stupid or the players get bored.
  • Compiler writing. I've got a Basic-to-C translator working already but I thought others could use some helpful notes on how to get started here.
    • A Fortran-to-C compiler is also in the back of my mind especially considering I have a few games in Fortran that really want to be converted to C.
    • BNFC is a program to convert between a BNF grammer and the Flex/Bison lexer and parser for C (and various other languages and compiler tools) and looks like a fairly quick way to implement the front end of a compiler ... I'll let you know.
    • Flex/Bison. I suspect that BNFC is a pretty good system but I'm not sure it will help with languages like Fortran with it's position specific code. As well BNFC produces Flex/Bison code so you have to know how to understand the error messages to know how adjust your BNFC code.
  • NCurses programming. Not terribly useful in the present day but some people are nostalgic for the old style games and this is a great way to build them (see Dwarf Fortress)