BNFC is a great tool, but it has some quirks that have slowed down the process of building a front-end for my fortran2c translator. However, the code generation moves you quite a distance forward, especially if you haven’t built a compiler before. Here’s what I’ve found so far…
Position dependent code:
First, BNFC is fairly inflexible in the lexical analyzer (the part that converts a character stream into a token stream). Fortran and other older languages have position-dependent tokens. Some examples come to mind: the comment, the continuation line, label numbers and the sequence column. The first three are illustrated in the file segment (Maze.for) below:
[cc lang=”fortran” tabsize=”7″ lines=”20″ width=”600″]
C
C MAZE DESCRIPTION
C
  180 WRITE(6,190) HEIGHT,WIDTH,DEPTH
  190 FORMAT('0',' YOUR MAZE HAS A HEIGHT OF',I5,/,
     1 ' AND A WIDTH OF',I5,/,
     1 ' WITH A DEPTH OF',I5,//,
     2 ' THE DIRECTION COMMANDS FOR MAZE ARE SINGLE LETTERS',/,
     2 ' N(ORTH), U(P), OR 8 IS UP',/,
     2 ' E(AST) , R(IGHT), OR 6 IS RIGHT',/,
     2 ' S(OUTH), D(OWN), OR 2 IS DOWN',/,
     2 ' W(EST) , L(EFT), OR 4 IS LEFT',/,
     2 ' I(N) , OR 9 IS IN TO SCREEN',/,
     2 ' O(UT) , OR 7 IS OUT OF SCREEN',/,
[/cc]
Comments start with a “C” in column one and go to the end-of-line. Continuation lines start at the beginning with a tab or 5 spaces, have a continuation mark (usually 0-9 or ‘+’), then a space and the body of the continuation. Labels are numbers preceded with spaces and ending at column 5.
Unfortunately I don’t have an example of code with a sequence column. These were basically 8-digit numbers in columns 73-80. Code, including the continuation/label-number preface described above, ran from column 1 to 72, with 73-80 left over for the sequence number. I believe this sequence number was a holdover from the old punched card days when people would do a 52-card pickup with a program deck (which usually went far beyond 52 cards) and needed a way to sort it back into a program (by machine).

All these things need Flex code with multiple states (start conditions such as <prefix>, <seqno>, <statement>), which doesn’t seem to be a possibility in BNFC.
What I ended up doing was writing a small state-machine program (in C) that converted comments to C-like ‘//’ comments and just joined continuation lines into one long line (since we aren’t working on 80-column punch cards anymore). The code below can be compiled with the command (I’m running Ubuntu/linux with the GNU gcc compiler):
g++ -std=c++11 -g fixup.c -o fixup
[cc lang=”c” tabsize=”8″ lines=”20″ width=”600″]
/*
 * Program to do some preprocessing on a Fortran file to deal with:
 * "\nC" ==> "\n//" -- Comments, and
 * "[ \t]*\n     [0-9+][ \t]*" ==> "" -- Continuation lines
 * "[ \t]*\n\t[0-9+][ \t]*" ==> "" -- Continuation lines
 *
 */
#include …
[/cc]
The Maze.for program then became (Maze_pp.for):
[cc lang=”fortran” tabsize=”8″ lines=”20″ width=”600″]
//
// MAZE - USES A VT100 TO WANDER AROUND.
// THE VT100 MUST HAVE ADVANCED VIDEO OPTION.
// ANSI VT100 ESCAPE SEQUENCES ARE USED.
//
// WRITTEN BY DON MCLEAN
// OF THE MACNEAL-SCHWENDLER CORP.
//
// THE PURPOSE OF THIS PROGRAM WAS TO
// 1. LEARN SOMETHING ABOUT THE VT100 GRAPHICS.
// 2. KEEP MY KIDS BUSY ON WEEKENDS. WHILE I TRIED
// TO GET SOMETHING ELSE DONE.
//
// USE OF THIS PROGRAM FOR ANY PURPOSE OTHER THAN FUN
// IS PROHIBITED.
//
      IMPLICIT INTEGER*4 (A-Z)
//
// MAZE DIMENSIONS
// HMAX AND WMAX SHOULD NOT BE LARGER THAN 22 AND 80 RESP.
//
      PARAMETER HMAX=22, WMAX=80, DMAX=4
//
      DIMENSION SLEEP(2)
//
// DIMENSION IS HMAX*WMAX*DMAX
      INTEGER*2 EXIT(HMAX*WMAX*DMAX), MAT(HMAX*WMAX*DMAX)
      INTEGER*2 LCOUNT(DMAX)
//
      BYTE CLEAR(2)
//
      CHARACTER*200 INPUT
//
      COMMON /MAZECM/ STARTH,STARTW,STARTD,ENDH,ENDW,ENDD,NOBELL
//
// CLEAR IS A VT100 RESET
//
      DATA CLEAR / 27, 'c' /
//
// START - SEE IF AN OLD GAME IS TO BE USED.
//
      WRITE(6,10)
   10 FORMAT(' WELCOME TO MAZE')
//
   20 WRITE(6,30)
   30 FORMAT(' ARE YOU GOING TO PLAY A SAVED GAME? ',$)
      READ(5,40) NC,INPUT
   40 FORMAT(Q,A)
      IF(INDEX(INPUT(1:NC),'Y').NE.0) GO TO 120
      SAVE = 0
//
// INPUT DIMENSION OF MAZE
//
   50 WRITE(6,60) HMAX
   60 FORMAT(' PLEASE INPUT HEIGHT OF MAZE - DEFAULT = ',I2,' ',$)
      READ(5,40) NC,INPUT
      READ(INPUT,70,ERR=50) HEIGHT
   70 FORMAT(BNI2)
      IF(HEIGHT.EQ.0) HEIGHT=HMAX
      IF(HEIGHT.LT.2) HEIGHT=2
      IF(HEIGHT.GT.HMAX) HEIGHT=HMAX
   80 WRITE(6,90) WMAX
   90 FORMAT(' PLEASE INPUT WIDTH OF MAZE - DEFAULT = ',I2,' ',$)
      READ(5,40) NC,INPUT
      READ(INPUT,70,ERR=80) WIDTH
      IF(WIDTH.EQ.0) WIDTH = WMAX
      IF(WIDTH.LT.2) WIDTH=2
      IF(WIDTH.GT.WMAX) WIDTH=WMAX
  100 WRITE(6,110)
  110 FORMAT(' PLEASE INPUT DEPTH OF MAZE - DEFAULT = 1 ',$)
      READ(5,40) NC,INPUT
      READ(INPUT,70,ERR=100) DEPTH
      IF(DEPTH.LE.0) DEPTH = 1
      IF(DEPTH.GT.DMAX) DEPTH = DMAX
      NTERMS = HEIGHT * WIDTH * DEPTH
…
[/cc]
Symbols:
The next problem I encountered was the symbol names in the BNFC lexer/parser: _SYMB_43, for example. Yuck! I would have been shot at a code review for that. Here’s what the lexer looked like (Fortran.l.bkp):
[cc lang=”text” tabsize=”8″ lines=”20″ width=”600″]
/* -*- c -*- This FLex file was machine-generated by the BNF converter */
%option noyywrap
%{
#define yylval Fortranlval
#define YY_BUFFER_APPEND Fortran_BUFFER_APPEND
#define YY_BUFFER_RESET Fortran_BUFFER_RESET
#define initialize_lexer Fortran_initialize_lexer
#include …
[/cc]
The parser (Fortran.y.bkp) was worse … I wish they had some way of converting these symbols to something more human readable:
[cc lang=”text” tabsize=”8″ lines=”20″ width=”600″] %start Program %% Program : ListLblStm { $$ = make_Progr(reverseListLblStm($1)); YY_RESULT_Program_= $$; } ; ListLblStm : /* empty */ { $$ = 0; } | ListLblStm LblStm _SYMB_0 { $$ = make_ListLblStm($2, $1); } ; LblStm : Labeled_stm { $$ = make_SLabel($1); YY_RESULT_LblStm_= $$; } | Simple_stm { $$ = make_SSimple($1); YY_RESULT_LblStm_= $$; } | /* empty */ { $$ = make_SNill(); YY_RESULT_LblStm_= $$; } ; Labeled_stm : _INTEGER_ Simple_stm { $$ = make_SLabelOne($1, $2); } ; Simple_stm : _SYMB_39 Type_Spec Type_Qual _SYMB_1 _SYMB_51 _SYMB_2 _SYMB_51 _SYMB_3 { $$ = make_SImplicit($2, $3, $5, $7); } | _SYMB_43 ListNameValue { $$ = make_SParameter($2); } | _SYMB_30 ListNameDim { $$ = make_SDiment($2); } | Type_Spec Type_Qual ListNameDim { $$ = make_SDeclQual($1, $2, $3); } | Type_Spec ListNameDim { $$ = make_SDecl($1, $2); } | _SYMB_29 ListDataSeg { $$ = make_SData($2); } | _SYMB_27 _SYMB_8 _SYMB_51 _SYMB_8 ListName { $$ = make_SCommon($3, $5); } | _SYMB_50 _SYMB_1 ListAssignName _SYMB_3 { $$ = make_SWrtEmp($3); } | _SYMB_50 _SYMB_1 ListAssignName _SYMB_3 ListNameOrArray { $$ = make_SWrite($3, $5); } | _SYMB_35 _SYMB_1 ListFmtSpecs _SYMB_3 { $$ = make_SFormat($3); } | _SYMB_44 _SYMB_1 ListAssignName _SYMB_3 ListNameOrArray { $$ = make_SRead($3, $5); } | _SYMB_44 _SYMB_6 LExp { $$ = make_SAsignRead($3); } | _SYMB_38 _SYMB_1 LExp _SYMB_3 IfThenPart { $$ = make_SIf($3, $5); } | _SYMB_51 _SYMB_6 LExp { $$ = make_SAssign($1, $3); } | _SYMB_51 _SYMB_1 ListLExp _SYMB_3 _SYMB_6 LExp { $$ = make_SAsnArr($1, $3, $6); } | _SYMB_24 _SYMB_51 _SYMB_1 ListSpecLExp _SYMB_3 { $$ = make_SFunCall($2, $4); } | _SYMB_24 _SYMB_51 { $$ = make_SFunCallNil($2); } | _SYMB_37 _SYMB_49 _INTEGER_ { $$ = make_SGoto($3); } | _SYMB_42 _SYMB_1 ListAssignName _SYMB_3 { $$ = make_SOpen($3); } | _SYMB_26 _SYMB_1 ListAssignName _SYMB_3 { $$ = make_SClose($3); } | _SYMB_31 _INTEGER_ DoRangePart { $$ = make_SDo($2, $3); } | _SYMB_47 { $$ = make_SStop(); } | 
_SYMB_47 _SYMB_52 { $$ = make_SStopMsg($2); } | _SYMB_33 { $$ = make_SEnd(); } | _SYMB_48 _SYMB_51 _SYMB_1 ListSpecLExp _SYMB_3 { $$ = make_SSubr($2, $4); } | _SYMB_48 _SYMB_51 { $$ = make_SSubrNil($2); } | _SYMB_36 _SYMB_51 _SYMB_1 ListSpecLExp _SYMB_3 { $$ = make_SFunct($2, $4); } | _SYMB_36 _SYMB_51 { $$ = make_SFunctNil($2); } | _SYMB_28 { $$ = make_SContinue(); } | _SYMB_46 { $$ = make_SReturn(); } | _SYMB_34 _SYMB_1 _SYMB_51 _SYMB_5 NameOrArrRef _SYMB_3 { $$ = make_SEquiv($3, $5); } ; Type_Qual : _SYMB_4 _INTEGER_ { $$ = make_QType($2); } ; ListNameValue : NameValue { $$ = make_ListNameValue($1, 0); } | NameValue _SYMB_5 ListNameValue { $$ = make_ListNameValue($1, $3); } ; NameValue : _SYMB_51 _SYMB_6 _INTEGER_ { $$ = make_NVPair($1, $3); } ; ListNameDim : NameDim { $$ = make_ListNameDim($1, 0); } | NameDim _SYMB_5 ListNameDim { $$ = make_ListNameDim($1, $3); } ; NameDim : _SYMB_51 _SYMB_1 ListDExp _SYMB_3 { $$ = make_PNameDim($1, $3); } | _SYMB_51 { $$ = make_PNameDim2($1); } ; ListDExp : DExp { $$ = make_ListDExp($1, 0); } | DExp _SYMB_5 ListDExp { $$ = make_ListDExp($1, $3); } ; DExp : DExp _SYMB_7 DExp1 { $$ = make_EDplus($1, $3); } | DExp _SYMB_2 DExp1 { $$ = make_EDminus($1, $3); } | DExp1 { $$ = $1; } ; DExp1 : DExp1 _SYMB_4 DExp2 { $$ = make_EDtimes($1, $3); } | DExp1 _SYMB_8 DExp2 { $$ = make_EDdiv($1, $3); } | DExp2 { $$ = $1; } ; DExp2 : _SYMB_1 DExp _SYMB_3 { $$ = $2; } | _INTEGER_ { $$ = make_EDInt($1); } | _SYMB_51 { $$ = make_EDName($1); } ; ListDataSeg : DataSeg { $$ = make_ListDataSeg($1, 0); } | DataSeg _SYMB_5 ListDataSeg { $$ = make_ListDataSeg($1, $3); } ; DataSeg : ListVars _SYMB_8 ListDataVal _SYMB_8 { $$ = make_PDSeg($1, $3); } ; ListVars : Vars { $$ = make_ListVars($1, 0); } | Vars _SYMB_5 ListVars { $$ = make_ListVars($1, $3); } ; Vars : _SYMB_51 { $$ = make_PVars($1); } ; ListDataVal : DataVal { $$ = make_ListDataVal($1, 0); } | DataVal _SYMB_5 ListDataVal { $$ = make_ListDataVal($1, $3); } ; DataVal : _SYMB_7 DataValType { $$ = 
make_PDValPls($2); } | _SYMB_2 DataValType { $$ = make_PDValNeg($2); } | DataValType { $$ = make_PDValNil($1); } ; DataValType : _INTEGER_ { $$ = make_PDVInt($1); } | _SYMB_53 { $$ = make_PDVFloat($1); } | _SYMB_52 { $$ = make_PDVChar($1); } ; ListName : _SYMB_51 { $$ = make_ListName($1, 0); } | _SYMB_51 _SYMB_5 ListName { $$ = make_ListName($1, $3); } ; ListFmtSpecs : FmtSpecs { $$ = make_ListFmtSpecs($1, 0); } | FmtSpecs _SYMB_5 ListFmtSpecs { $$ = make_ListFmtSpecs($1, $3); } ; FmtSpecs : _SYMB_52 { $$ = make_FSString($1); } | _SYMB_51 { $$ = make_FSName($1); } | _SYMB_9 { $$ = make_FSINNL(); } | _SYMB_8 { $$ = make_FSSlash(); } ; ListNameOrArray : NameOrArray { $$ = make_ListNameOrArray($1, 0); } | NameOrArray _SYMB_5 ListNameOrArray { $$ = make_ListNameOrArray($1, $3); } ; NameOrArray : _SYMB_51 { $$ = make_PNALName($1); } | _SYMB_1 _SYMB_51 _SYMB_1 ListName _SYMB_3 _SYMB_5 DoRangePart _SYMB_3 { $$ = make_PNALArry($2, $4, $7); } ; IfThenPart : _SYMB_37 _SYMB_49 _INTEGER_ { $$ = make_PIfGoto($3); } | _SYMB_51 _SYMB_6 LExp { $$ = make_PIfAsgn($1, $3); } | _SYMB_51 _SYMB_1 ListLExp _SYMB_3 _SYMB_6 LExp { $$ = make_PIFAsnArr($1, $3, $6); } | _SYMB_46 { $$ = make_PIfRetn(); } | _SYMB_24 _SYMB_51 _SYMB_1 ListSpecLExp _SYMB_3 { $$ = make_PIfCall($2, $4); } | _SYMB_24 _SYMB_51 { $$ = make_PIfCallNil($2); } ; LExp : LExp _SYMB_10 LExp2 { $$ = make_Elor($1, $3); } | LExp _SYMB_11 LExp2 { $$ = make_Eland($1, $3); } | LExp2 { $$ = $1; } ; LExp2 : LExp2 _SYMB_12 LExp3 { $$ = make_Eeq($1, $3); } | LExp2 _SYMB_13 LExp3 { $$ = make_Eneq($1, $3); } | LExp3 { $$ = $1; } ; LExp3 : LExp3 _SYMB_14 LExp4 { $$ = make_Elthen($1, $3); } | LExp3 _SYMB_15 LExp4 { $$ = make_Egrthen($1, $3); } | LExp3 _SYMB_16 LExp4 { $$ = make_Ele($1, $3); } | LExp3 _SYMB_17 LExp4 { $$ = make_Ege($1, $3); } | LExp4 { $$ = $1; } ; LExp4 : LExp4 _SYMB_7 LExp5 { $$ = make_Eplus($1, $3); } | LExp4 _SYMB_2 LExp5 { $$ = make_Eminus($1, $3); } | LExp5 { $$ = $1; } ; LExp5 : LExp5 _SYMB_4 LExp6 { $$ = 
make_Etimes($1, $3); } | LExp5 _SYMB_8 LExp6 { $$ = make_Ediv($1, $3); } | LExp6 { $$ = $1; } ; LExp6 : Unary_operator LExp8 { $$ = make_Epreop($1, $2); } | LExp8 { $$ = $1; } ; LExp8 : LExp5 _SYMB_18 LExp8 { $$ = make_Epower($1, $3); } | LExp8 _SYMB_1 _SYMB_3 { $$ = make_Efunk($1); } | LExp8 _SYMB_1 ListSpecLExp _SYMB_3 { $$ = make_Efunkpar($1, $3); } | LExp9 { $$ = $1; } ; LExp9 : TIntVar RangePart { $$ = make_Evar($1, $2); } | _SYMB_52 { $$ = make_Estr($1); } | LExp10 { $$ = $1; } ; RangePart : /* empty */ { $$ = make_ERangeNull(); } | _SYMB_19 TIntVar { $$ = make_ERange($2); } ; TIntVar : _INTEGER_ { $$ = make_ETInt($1); } | _SYMB_20 { $$ = make_ETTrue(); } | _SYMB_21 { $$ = make_ETFalse(); } | _SYMB_51 { $$ = make_ETNameVar($1); } | _SYMB_44 { $$ = make_ETRead(); } ; ListLExp : LExp { $$ = make_ListLExp($1, 0); } | LExp _SYMB_5 ListLExp { $$ = make_ListLExp($1, $3); } ; LExp10 : LExp11 { $$ = $1; } ; LExp11 : _SYMB_1 LExp _SYMB_3 { $$ = $2; } ; Unary_operator : _SYMB_7 { $$ = make_OUnaryPlus(); } | _SYMB_2 { $$ = make_OUnaryMinus(); } | _SYMB_22 { $$ = make_OUnaryNot(); } ; ListSpecLExp : SpecLExp { $$ = make_ListSpecLExp($1, 0); } | SpecLExp _SYMB_5 ListSpecLExp { $$ = make_ListSpecLExp($1, $3); } ; SpecLExp : LExp { $$ = make_SpLExpNot($1); } ; ListAssignName : AssignName { $$ = make_ListAssignName($1, 0); } | AssignName _SYMB_5 ListAssignName { $$ = make_ListAssignName($1, $3); } ; AssignName : _SYMB_51 { $$ = make_PAsgnNm($1); } | _INTEGER_ { $$ = make_PAsgnInt($1); } | _SYMB_51 _SYMB_6 LExp { $$ = make_PAssign($1, $3); } ; DoRangePart : _SYMB_51 _SYMB_6 LExp _SYMB_5 LExp { $$ = make_PDoRange($1, $3, $5); } ; NameOrArrRef : _SYMB_51 { $$ = make_PNOAName($1); } | _SYMB_51 _SYMB_1 ListLExp _SYMB_3 { $$ = make_PNOAArr($1, $3); } ; Type_Spec : _SYMB_40 { $$ = make_TInt(); } | _SYMB_45 { $$ = make_TFloat(); } | _SYMB_32 { $$ = make_TDouble(); } | _SYMB_25 { $$ = make_TChar(); } | _SYMB_23 { $$ = make_TByte(); } | _SYMB_41 { $$ = make_TLogi(); } ; 
[/cc]
Fortunately, Linux helps with that. I built a little script file that changes these cryptic symbols for something a little more tolerable:
[cc lang=”bash” tabsize=”8″ lines=”9″ width=”600″]
# Clean up the symbols in both the parser and lexer
sed -f symbols Fortran.y >Fortran.yy
cp Fortran.y Fortran.y.bkp
cp Fortran.yy Fortran.y
sed -f symbols Fortran.l >Fortran.ll
cp Fortran.l Fortran.l.bkp
cp Fortran.ll Fortran.l
[/cc]
… and the associated ‘symbols’ file. NOTE the order of the ‘symbols’ commands: the two-digit symbols (_SYMB_10 through _SYMB_53) are replaced before the single-digit ones. Otherwise the command for _SYMB_1, for example, would also rewrite the front of _SYMB_10 through _SYMB_19, and sed would have changed partial symbols:
[cc lang=”bash” tabsize=”8″ lines=”20″ width=”600″]
s/_SYMB_10/T_OR/g
s/_SYMB_11/T_AND/g
s/_SYMB_12/T_EQ/g
s/_SYMB_13/T_NE/g
s/_SYMB_14/T_LT/g
s/_SYMB_15/T_GT/g
s/_SYMB_16/T_LE/g
s/_SYMB_17/T_GE/g
s/_SYMB_18/T_POW/g
s/_SYMB_19/T_COLON/g
s/_SYMB_20/T_TRUE/g
s/_SYMB_21/T_FALSE/g
s/_SYMB_22/T_NOT/g
s/_SYMB_23/T_BYTE/g
s/_SYMB_24/T_CALL/g
s/_SYMB_25/T_CHAR/g
s/_SYMB_26/T_CLOSE/g
s/_SYMB_27/T_COMM/g
s/_SYMB_28/T_CONT/g
s/_SYMB_29/T_DATA/g
s/_SYMB_30/T_DIMS/g
s/_SYMB_31/T_DO/g
s/_SYMB_32/T_DBL/g
s/_SYMB_33/T_END/g
s/_SYMB_34/T_EQU/g
s/_SYMB_35/T_FMT/g
s/_SYMB_36/T_FUNC/g
s/_SYMB_37/T_GO/g
s/_SYMB_38/T_IF/g
s/_SYMB_39/T_IMPL/g
s/_SYMB_40/T_INT/g
s/_SYMB_41/T_LOGI/g
s/_SYMB_42/T_OPEN/g
s/_SYMB_43/T_PARM/g
s/_SYMB_44/T_READ/g
s/_SYMB_45/T_REAL/g
s/_SYMB_46/T_RTN/g
s/_SYMB_47/T_STOP/g
s/_SYMB_48/T_SUBR/g
s/_SYMB_49/T_TO/g
s/_SYMB_50/T_WRITE/g
s/_SYMB_51/T_NAME/g
s/_SYMB_52/T_SQSTR/g
s/_SYMB_53/T_CFLT/g
s/_SYMB_0/T_NEWLINE/g
s/_SYMB_1/T_LPAREN/g
s/_SYMB_2/T_MINUS/g
s/_SYMB_3/T_RPAREN/g
s/_SYMB_4/T_MULT/g
s/_SYMB_5/T_COMMA/g
s/_SYMB_6/T_EQUALS/g
s/_SYMB_7/T_PLUS/g
s/_SYMB_8/T_DIV/g
s/_SYMB_9/T_DOLLAR/g
[/cc]
Makefile:
The Makefile was also a problem (Makefile.old). It is auto-generated by BNFC, which is great, but it had no way to turn on parser debugging, which is rather important when creating a front-end from scratch. Bison will often complain about shift/reduce and reduce/reduce conflicts, and you need Bison’s output file to debug these. So my next little bit of bash script dealt with that … it just rewrites the Makefile with the appropriate flags set.
[cc lang=”bash” tabsize=”8″ lines=”20″ width=”600″]
if [ "$flag" == "-d" ]; then
    # ----
    # Modify the Makefile to add debug flags so output is more verbose.
    cp Makefile Makefile.old
    cat Makefile \
    | sed "s/-PFortran$/-PFortran --debug/g" \
    | sed "s/-pFortran$/-pFortran --debug -r all -g/g" \
    > Makefile.new
    cp Makefile.new Makefile
    # Show user the difference in the Makefiles
    echo "--- Makefile ---"
    diff Makefile Makefile.old | sed "s/^/ /g"
fi
[/cc]
This then produces the Parser.output file, which we use to debug the BNFC input grammar. I’ll explain the process of debugging the grammar when there are reduce/reduce and shift/reduce conflicts in another post:
[cc lang=”text” tabsize=”8″ lines=”20″ width=”600″]
Terminals unused in grammar

   _ERROR_


State 177 conflicts: 3 reduce/reduce
State 225 conflicts: 3 reduce/reduce
State 226 conflicts: 3 reduce/reduce
State 227 conflicts: 1 shift/reduce, 3 reduce/reduce


Grammar

    0 $accept: Program $end

    1 Program: ListLblStm

    2 ListLblStm: %empty
    3           | ListLblStm LblStm _SYMB_0

    4 LblStm: Labeled_stm
    5       | Simple_stm
    6       | %empty

    7 Labeled_stm: _INTEGER_ Simple_stm

    8 Simple_stm: _SYMB_39 Type_Spec Type_Qual _SYMB_1 _SYMB_51 _SYMB_2 _SYMB_51 _SYMB_3
    9           | _SYMB_43 ListNameValue
   10           | _SYMB_30 ListNameDim
   11           | Type_Spec Type_Qual ListNameDim
   12           | Type_Spec ListNameDim
   13           | _SYMB_29 ListDataSeg
   14           | _SYMB_27 _SYMB_8 _SYMB_51 _SYMB_8 ListName
   15           | _SYMB_50 _SYMB_1 ListAssignName _SYMB_3
   16           | _SYMB_50 _SYMB_1 ListAssignName _SYMB_3 ListNameOrArray
   17           | _SYMB_35 _SYMB_1 ListFmtSpecs _SYMB_3
   18           | _SYMB_44 _SYMB_1 ListAssignName _SYMB_3 ListNameOrArray
   19           | _SYMB_44 _SYMB_6 LExp
…
[/cc]
Generated Lexer fixups:
The generated lexer needed a few changes because I was trying to make it conform to what I wanted to do instead of letting it do its thing:
[cc lang=”text” tabsize=”8″ lines=”20″ width=”600″]
…
[/cc]
First, BNFC doesn’t know what to do with a newline character as a token. So you see in the first two lines a broken Flex statement. These two lines were replaced with a line derived from the file ‘l1’:
"\n" { ++yy_mylinenumber; return T_NEWLINE; };
which treats the newline better. The other change is to get rid of the line that counts the line numbers (the ++yy_mylinenumber; line) and remove newlines and form-feeds from the “ignore white space” line. The following bash script segment does this:
[cc lang=”bash” tabsize=”8″ lines=”20″ width=”600″]
# Do the other modifications to Fortran.l to fix the above problem.
cp Fortran.l Fortran.l.old
cat Fortran.l \
| grep -v "^…
[/cc]
This produces the following lexer:
[cc lang=”text” tabsize=”8″ lines=”20″ width=”600″]
"\n" { ++yy_mylinenumber; return T_NEWLINE; };
…
[/cc]
Code Generation:
The generation of all the Flex/Bison and the pretty printer code is the saving grace. BNFC gets you most of the way to producing a front-end for your compiler and makes life a lot easier if, like me, you tend to forget how to write a Flex and Bison driver file in between doing other coding.
However, if you are new to building a compiler, then BNFC may just make things worse by adding another level of complexity to an already complex problem. For this reason it might be best for novice compiler writers to ignore BNFC and read a good book on Flex/Bison and maybe a textbook on compiler writing.
In my next post I will describe the code that is generated by BNFC.
Here are all the files (complete) that I’ve talked about above.