The front-end to my Fortran-to-C translator turned out to be a lot easier to create with BNFC than I was anticipating. There are, however, a few potential problems with BNFC that should be watched for.
First, BNFC is supposedly designed for well-formed languages, which, as I gathered from the documentation, means there should be no position-dependent constructs like those typical in Fortran:
- Comments that start with a ‘C’ in the first column
- Continuation lines, marked by either a tab or 5 spaces followed by a continuation mark at the front of the line.
- Order of the Flex token matchings.
The first two were easily dispatched with a simple state-based machine (source):
[cc lang="C" tab_size="8" lines="20" width="600"]
/*
 * Program to do some preprocessing on a Fortran file to deal with:
 *   "\nC" ==> "\n//" --- Comments, and
 *   "[ \t]*\n     [0-9+][ \t]*" ==> "" --- Continuation lines
 *   "[ \t]*\n\t[0-9+][ \t]*" ==> "" --- Continuation lines
 *
 */
#include
[/cc]
Dealing with the lack of access to Fortran.l (the Flex lexical analyser for the program) turned out not to be as much of a problem as I had anticipated. Mostly I just adjusted to having less control over the lexer and wrote a little bash script that modifies the lexer so that ‘\n’ triggers an End-of-Statement token (others suggest using a pre-processor to add some sort of line-termination token at the end of each line of the source code to be translated). The other modifications were to remove ‘\f’ ‘\n’ from the white space and to fix up the broken Flex code (which resulted from my including “\n” as the statement-ending token in my BNFC file).
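The comment and continuation-line rewriting described above can be sketched in C roughly as follows. This is not the original preprocessor, just a minimal sketch built from the three patterns in its header comment. The names `fortran_preprocess` and `continuation_prefix` are made up for illustration, and it assumes the whole file fits in memory as a string.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * If `line` starts a continuation line (one tab or five spaces, then a
 * digit or '+'), return how many characters that prefix occupies;
 * otherwise return 0.  Follows the [0-9+] pattern from the post.
 */
static size_t continuation_prefix(const char *line)
{
    size_t n;

    if (line[0] == '\t')
        n = 1;
    else if (strncmp(line, "     ", 5) == 0)
        n = 5;
    else
        return 0;

    if (isdigit((unsigned char)line[n]) || line[n] == '+')
        return n + 1;
    return 0;
}

/*
 * Copy src to dst, turning a 'C' in column 1 into "//" and joining
 * continuation lines onto the line before them.
 * dst must have room for 2 * strlen(src) + 1 bytes.
 */
void fortran_preprocess(const char *src, char *dst)
{
    char *out = dst;
    int at_line_start = 1;

    while (*src != '\0') {
        if (at_line_start) {
            at_line_start = 0;
            if (*src == 'C') {              /* comment marker in column 1 */
                *out++ = '/';
                *out++ = '/';
                src++;
                continue;
            }
        }
        if (*src == '\n') {
            size_t skip = continuation_prefix(src + 1);
            if (skip > 0) {
                /* Drop trailing blanks already copied for this line. */
                while (out > dst && (out[-1] == ' ' || out[-1] == '\t'))
                    out--;
                /* Skip the newline, the prefix, and blanks after the mark. */
                src += 1 + skip;
                while (*src == ' ' || *src == '\t')
                    src++;
                continue;
            }
            at_line_start = 1;
        }
        *out++ = *src++;
    }
    *out = '\0';
}
```

As in the patterns it is based on, the join replaces the line break with nothing, so a statement split as `X = 1 +` / `     1   2` comes out as `X = 1 +2`. Lower-case `c` and `*` comment markers are deliberately not handled, since the post only mentions ‘C’.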
These fix-ups, along with the other code that converts the LBNF (Labelled BNF) grammar, compiles it and runs a test program against it, are as follows (source):
[cc lang="bash" tab_size="8" lines="20" width="600"]
#!/bin/bash
#
# Script file to translate a BNFC/LBNF grammar, compile it and run the resultant
# program against a Fortran file to be translated.
#
# (Script created by alan on 21/Apr/2019 18:35:42)
#
pgm=go
tmp=/tmp/${pgm}_
bin=${HOME}/bin
log=${bin}/data/${pgm}.$(uname -n).log

pgm=${1:-Maze.for}

# Remove program binary to make sure it doesn't run if we have a problem.
rm TestFortran 2>/dev/null

# Compile the LBNF grammar in Fortran.cf into a working front-end for translator.
bnfc -m -c Fortran.cf

# Touch the lex/yacc lexical analyser and parser source files to make sure they are compiled.
touch *.y *.l 2>/dev/null

# ----
# Modify the Makefile to add debug flags so output is more verbose.
cp Makefile Makefile.old
cat Makefile \
    | sed "s/-PFortran$/-PFortran --debug/g" \
    | sed "s/-pFortran$/-pFortran --debug -r all -g/g" \
    > Makefile.new
cp Makefile.new Makefile

# Show user the difference in the Makefiles
echo "--- Makefile ---"
diff Makefile Makefile.old | sed "s/^/ /g"

# ----
# Modify the lexical analyser to make \n a statement end character (deficit in BNFC).
# The l1 file contains:
# "\\n" { ++yy_mylinenumber; return _SYMB_0; };
# This will trap any newlines, increment the line-number and return it as a token.
l1=$(cat l1 )
#echo "l1='$l1'"

# Do the other modifications to Fortran.l to fix the above problem.
cp Fortran.l Fortran.l.old
cat Fortran.l \
    | grep -v "^
[/cc]
Mostly I studied the ‘C’ BNFC/LBNF grammar from the BNFC website and then created my own grammar.
Starting with the Name token (whose format is particular to Fortran, and more so to VAX Fortran), I began to build my grammar from the Program definition.
The program definition is just a list of statements (either numbered or not). I’d run my ‘go’ script above, and it would tell me the line in my Fortran source file where the error occurred and the last tokenizing rule (a line number in the auto-generated Fortran.l file) that was accepted.
From this I could determine which token was triggering the error and thus where my grammar was wrong. Finally I came up with the following LBNF grammar, which seemed to lex/parse all my example Fortran programs without problems (source):
[cc lang="text" tab_size="8" lines="20" width="600"]
---
--- Fortran LBNF grammar (Fortran.cf) to recognize a Fortran 66/IV program.
---
--- Created by Alan Angold 2019-04-23
---
entrypoints Program, LblStm ;

--- Tokens ----------------------------------------------------------------------
position token Name ( '%' )* upper ( upper | digit | '$' | '_' )* ;
position token SQString '\'' ( char )+ '\'';
position token CFloat ((digit+ '.' digit+)(('e'|'E')('-')? digit+)? ('f'|'F'))|((digit+ ('e'|'E')('-')? digit+)('f'|'F'));

--- Tokens ----------------------------------------------------------------------
Progr. Program ::= [ LblStm ] ;
[]. [LblStm] ::= ;
(:). [LblStm] ::= LblStm "\n" [LblStm];
SLabel. LblStm ::= Labeled_stm ;
SSimple. LblStm ::= Simple_stm ;
SNill. LblStm ::= ;
SLabelOne. Labeled_stm ::= Integer Simple_stm ;

--- Simple_stm ------------------------------------------------------------------
SImplicit. Simple_stm ::= "IMPLICIT" Type_Spec Type_Qual "(" Name "-" Name ")" ;
QType. Type_Qual ::= "*" Integer ;

--- Simple_stm ------------------------------------------------------------------
SParameter. Simple_stm ::= "PARAMETER" [NameValue];
(:[]). [NameValue] ::= NameValue ;
(:). [NameValue] ::= NameValue "," [NameValue];
NVPair. NameValue ::= Name "=" Integer ;

--- Simple_stm ------------------------------------------------------------------
SDiment. Simple_stm ::= "DIMENSION" [ NameDim ] ;
(:[]). [NameDim] ::= NameDim ;
(:). [NameDim] ::= NameDim "," [NameDim];
PNameDim. NameDim ::= Name "(" [DExp] ")" ;
PNameDim2. NameDim ::= Name ;
(:[]). [DExp] ::= DExp ;
(:). [DExp] ::= DExp "," [DExp];
EDplus. DExp ::= DExp "+" DExp1;
EDminus. DExp ::= DExp "-" DExp1;
EDtimes. DExp1 ::= DExp1 "*" DExp2;
EDdiv. DExp1 ::= DExp1 "/" DExp2;
_. DExp ::= DExp1 ;
_. DExp1 ::= DExp2 ;
_. DExp2 ::= "(" DExp ")" ;
EDInt. DExp2 ::= Integer;
EDName. DExp2 ::= Name;

--- Simple_stm ------------------------------------------------------------------
SDeclQual. Simple_stm ::= Type_Spec Type_Qual [NameDim] ;
SDecl. Simple_stm ::= Type_Spec [NameDim] ;

--- Simple_stm ------------------------------------------------------------------
SData. Simple_stm ::= "DATA" [DataSeg];
(:[]). [DataSeg] ::= DataSeg ;
(:). [DataSeg] ::= DataSeg "," [DataSeg];
PDSeg. DataSeg ::= [Vars] "/" [DataVal] "/" ;
(:[]). [Vars] ::= Vars ;
(:). [Vars] ::= Vars "," [Vars];
PVars. Vars ::= Name ;
(:[]). [DataVal] ::= DataVal ;
(:). [DataVal] ::= DataVal "," [DataVal];
PDValPls. DataVal ::= "+" DataValType ;
PDValNeg. DataVal ::= "-" DataValType ;
PDValNil. DataVal ::= DataValType ;
PDVInt. DataValType ::= Integer ;
PDVFloat. DataValType ::= CFloat ;
PDVChar. DataValType ::= SQString ;

--- Simple_stm ------------------------------------------------------------------
SCommon. Simple_stm ::= "COMMON" "/" Name "/" [Name] ;
(:[]). [Name] ::= Name ;
(:). [Name] ::= Name "," [Name];

--- Simple_stm ------------------------------------------------------------------
SWrtEmp. Simple_stm ::= "WRITE" "(" [AssignName] ")" ;
SWrite. Simple_stm ::= "WRITE" "(" [AssignName] ")" [NameOrArray] ;
SFormat. Simple_stm ::= "FORMAT" "(" [FmtSpecs] ")" ;
(:[]). [FmtSpecs] ::= FmtSpecs ;
(:). [FmtSpecs] ::= FmtSpecs "," [FmtSpecs];
FSString. FmtSpecs ::= SQString ;
FSName. FmtSpecs ::= Name ;
FSINNL. FmtSpecs ::= "$";
FSSlash. FmtSpecs ::= "/";

--- Simple_stm ------------------------------------------------------------------
SRead. Simple_stm ::= "READ" "(" [AssignName] ")" [NameOrArray] ;
SAsignRead. Simple_stm ::= "READ" "=" LExp;
(:[]). [NameOrArray] ::= NameOrArray ;
(:). [NameOrArray] ::= NameOrArray "," [NameOrArray];
PNALName. NameOrArray ::= Name ;
PNALArry. NameOrArray ::= "(" Name "(" [Name] ")" "," DoRangePart ")" ;

--- Simple_stm ------------------------------------------------------------------
SIf. Simple_stm ::= "IF" "(" LExp ")" IfThenPart;
PIfGoto. IfThenPart ::= "GO" "TO" Integer ;
PIfAsgn. IfThenPart ::= Name "=" LExp;
PIFAsnArr. IfThenPart ::= Name "(" [LExp] ")" "=" LExp ;
PIfRetn. IfThenPart ::= "RETURN" ;
PIfCall. IfThenPart ::= "CALL" Name "(" [SpecLExp] ")" ;
PIfCallNil. IfThenPart ::= "CALL" Name ;
Elor. LExp ::= LExp ".OR." LExp2;
Eland. LExp ::= LExp ".AND." LExp2;
Eeq. LExp2 ::= LExp2 ".EQ." LExp3;
Eneq. LExp2 ::= LExp2 ".NE." LExp3;
Elthen. LExp3 ::= LExp3 ".LT." LExp4;
Egrthen. LExp3 ::= LExp3 ".GT." LExp4;
Ele. LExp3 ::= LExp3 ".LE." LExp4;
Ege. LExp3 ::= LExp3 ".GE." LExp4;
Eplus. LExp4 ::= LExp4 "+" LExp5;
Eminus. LExp4 ::= LExp4 "-" LExp5;
Etimes. LExp5 ::= LExp5 "*" LExp6;
Ediv. LExp5 ::= LExp5 "/" LExp6;
Epreop. LExp6 ::= Unary_operator LExp7;
Epower. LExp8 ::= LExp5 "**" LExp8;
Efunk. LExp8 ::= LExp8 "(" ")";
Efunkpar. LExp8 ::= LExp8 "(" [SpecLExp] ")";
Evar. LExp9 ::= TIntVar RangePart ;
Estr. LExp9 ::= SQString ;
ERangeNull. RangePart ::= ;
ERange. RangePart ::= ":" TIntVar ;
ETInt. TIntVar ::= Integer;
ETTrue. TIntVar ::= ".TRUE.";
ETFalse. TIntVar ::= ".FALSE.";
ETNameVar. TIntVar ::= Name;
ETRead. TIntVar ::= "READ";
(:[]). [LExp] ::= LExp ;
(:). [LExp] ::= LExp "," [LExp];
_. LExp ::= LExp2 ;
_. LExp2 ::= LExp3 ;
_. LExp3 ::= LExp4 ;
_. LExp4 ::= LExp5 ;
_. LExp5 ::= LExp6 ;
_. LExp6 ::= LExp7 ;
_. LExp7 ::= LExp8 ;
_. LExp8 ::= LExp9 ;
_. LExp9 ::= LExp10 ;
_. LExp10 ::= LExp11 ;
_. LExp11 ::= "(" LExp ")" ;
OUnaryPlus. Unary_operator ::= "+" ;
OUnaryMinus. Unary_operator ::= "-" ;
OUnaryNot. Unary_operator ::= ".NOT." ;

--- Simple_stm ------------------------------------------------------------------
SAssign. Simple_stm ::= Name "=" LExp;
SAsnArr. Simple_stm ::= Name "(" [LExp] ")" "=" LExp ;

--- Simple_stm ------------------------------------------------------------------
SFunCall. Simple_stm ::= "CALL" Name "(" [SpecLExp] ")" ;
SFunCallNil. Simple_stm ::= "CALL" Name ;
(:[]). [SpecLExp] ::= SpecLExp ;
(:). [SpecLExp] ::= SpecLExp "," [SpecLExp];
SpLExpNil. SpecLExp ::= ;
SpLExpNot. SpecLExp ::= LExp;

--- Simple_stm ------------------------------------------------------------------
SGoto. Simple_stm ::= "GO" "TO" Integer ;

--- Simple_stm ------------------------------------------------------------------
SOpen. Simple_stm ::= "OPEN" "(" [AssignName] ")" ;
(:[]). [AssignName] ::= AssignName ;
(:). [AssignName] ::= AssignName "," [AssignName];
PAsgnNm. AssignName ::= Name ;
PAsgnInt. AssignName ::= Integer ;
PAssign. AssignName ::= Name "=" LExp;
SClose. Simple_stm ::= "CLOSE" "(" [AssignName] ")" ;

--- Simple_stm ------------------------------------------------------------------
SDo. Simple_stm ::= "DO" Integer DoRangePart ;
PDoRange. DoRangePart ::= Name "=" LExp "," LExp ;

--- Simple_stm ------------------------------------------------------------------
SStop. Simple_stm ::= "STOP" ;
SStopMsg. Simple_stm ::= "STOP" SQString ;

--- Simple_stm ------------------------------------------------------------------
SEnd. Simple_stm ::= "END" ;

--- Simple_stm ------------------------------------------------------------------
SSubr. Simple_stm ::= "SUBROUTINE" Name "(" [SpecLExp] ")" ;
SSubrNil. Simple_stm ::= "SUBROUTINE" Name ;
SFunct. Simple_stm ::= "FUNCTION" Name "(" [SpecLExp] ")" ;
SFunctNil. Simple_stm ::= "FUNCTION" Name ;

--- Simple_stm ------------------------------------------------------------------
SContinue. Simple_stm ::= "CONTINUE" ;

--- Simple_stm ------------------------------------------------------------------
SReturn. Simple_stm ::= "RETURN" ;

--- Simple_stm ------------------------------------------------------------------
SEquiv. Simple_stm ::= "EQUIVALENCE" "(" Name "," NameOrArrRef ")" ;
PNOAName. NameOrArrRef ::= Name ;
PNOAArr. NameOrArrRef ::= Name "(" [LExp] ")" ;

--- Simple_stm ------------------------------------------------------------------
TInt. Type_Spec ::= "INTEGER" ;
TFloat. Type_Spec ::= "REAL" ;
TDouble. Type_Spec ::= "DOUBLE" ;
TChar. Type_Spec ::= "CHARACTER" ;
TByte. Type_Spec ::= "BYTE" ;
TLogi. Type_Spec ::= "LOGICAL" ;

comment "//" ;
[/cc]
The program seemed to lex and parse all the Fortran-66 source code I had handy successfully, but the TestFortran program failed (seg fault) when it came to printing out the AST (showProgram(parse_tree) or printProgram(parse_tree)). I suspect this is just due to some null-pointer problems in the AST, and once I have written my own AST tree crawler I should be able to track them down.
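If the seg fault really is a null-pointer problem, a tree crawler that checks every pointer before following it should at least show where the holes are. The sketch below only illustrates that idea. The `Stm` and `StmList` types are hand-written, hypothetical stand-ins and not the structures BNFC generates in Absyn.h, and `walk_program` is a made-up name.

```c
#include <stdio.h>

/* Hypothetical stand-ins for AST nodes (NOT BNFC's generated types). */
enum StmKind { STM_SIMPLE, STM_LABELED, STM_NIL };

struct Stm {
    enum StmKind kind;
    int label;              /* used only by STM_LABELED */
    const char *text;       /* statement text; may be NULL */
};

struct StmList {
    struct Stm *stm;        /* may be NULL in a damaged tree */
    struct StmList *next;   /* NULL terminates the list */
};

/* Return text, or a placeholder (and count the hole) if it is NULL. */
static const char *text_or_placeholder(const char *text, int *nulls)
{
    if (text != NULL)
        return text;
    (*nulls)++;
    return "<missing>";
}

/*
 * Walk the statement list, printing what can be printed to `out`.
 * Returns the number of unexpected NULL pointers that were skipped.
 */
int walk_program(const struct StmList *list, FILE *out)
{
    int nulls = 0;

    for (; list != NULL; list = list->next) {
        const struct Stm *s = list->stm;

        if (s == NULL) {            /* hole in the list: count it, move on */
            nulls++;
            continue;
        }
        switch (s->kind) {
        case STM_NIL:               /* empty statement: no text expected */
            fputc('\n', out);
            break;
        case STM_LABELED:
            fprintf(out, "%5d %s\n", s->label,
                    text_or_placeholder(s->text, &nulls));
            break;
        case STM_SIMPLE:
            fprintf(out, "      %s\n",
                    text_or_placeholder(s->text, &nulls));
            break;
        }
    }
    return nulls;
}
```

A non-zero return value means the tree had gaps that a printer without these checks would have dereferenced.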