The front end of my Fortran 2 C translator turned out to be a lot easier to create with BNFC than I had anticipated. There are, however, a few potential problems with BNFC that should be watched for.
First, BNFC is designed for well-formed languages, which, as I gathered from the documentation, means there should be no position-dependent features of the kind that are typical in Fortran:
- Comments that start with a ‘C’ in the first column
- Continuation lines, marked by either a tab or 5 spaces followed by a continuation mark at the front of the line.
- Order of the Flex token matchings.
The first two were easily dispatched with a simple state machine (source):
/*
* Program to do some preprocessing on a Fortran file to deal with:
* "\nC" ==> "\n//" -- Comments, and
* "[ \t]*\n [0-9+][ \t]*" ==> "" -- Continuation lines
* "[ \t]*\n\t[0-9+][ \t]*" ==> "" -- Continuation lines
*
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#define BUF_SZ 1000
#define DEBUG 0 /* 'false' is not defined without <stdbool.h>; set to 1 for tracing */
char buf[BUF_SZ];
int bufidx=-1;
int state=0;
int CntComments=0;
int CntContinue=0;
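void save(int chr){
// Check before writing, and keep one byte free so buf stays NUL terminated for purge().
if(bufidx+2>=BUF_SZ){
fprintf(stderr,"ERROR: Buffer Overflow\n");
exit(1);
}
buf[++bufidx]=(char)chr;
}
void unsave(){
// Check before writing so an empty buffer never touches buf[-1].
if(bufidx<0){
fprintf(stderr,"ERROR: Buffer Underflow\n");
exit(2);
}
buf[bufidx--]=0;
}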
void reset(){
memset(buf,0,BUF_SZ);
bufidx=-1;
state=0;
}
void purge(){
printf("%s",buf);
reset();
}
void asaprintf( const char * format, ... )
{
va_list args;
va_start (args, format);
if(DEBUG) vprintf (format, args);
va_end (args);
}
void newstate(int ns){
state=ns;
//asaprintf("<%d>",state);
}
int main(int argc,char* argv[]){
int chr=0;
int idx=0;
reset();
state=2; // Start in state 2 because first line in file may be a comment
while((chr=getchar())!=EOF){
asaprintf("%6d) state=%d chr='%c'(0x%02x)\n",idx++,state,chr,chr);
save(chr);
if(chr==0){
reset();
}else{
switch(state){
case 0:
switch(chr){
case ' ': break;
case '\t': break;
case '\n': newstate(2); break;
default: purge(); break;
};
break;
case 2:
switch(chr){
case ' ': newstate(6); break;
case 'C': case 'c':
unsave(); purge(); printf("//"); CntComments++; break;
case '\t': newstate(10); break;
case '\n': purge(); newstate(2); break; // Blank line: the next char is in column 1 again
default: purge(); break;
};
break;
case 6:
switch(chr){
case ' ': newstate(7); break;
default: purge(); break;
};
break;
case 7:
switch(chr){
case ' ': newstate(8); break;
default: purge(); break;
};
break;
case 8:
switch(chr){
case ' ': newstate(9); break;
default: purge(); break;
};
break;
case 9:
switch(chr){
case ' ': newstate(10); break;
default: purge(); break;
};
break;
case 10:
switch(chr){
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
case '+':
newstate(11); break;
default: purge(); break;
};
break;
case 11:
switch(chr){
case ' ': break;
case '\t': break;
default: reset(); putchar(chr); CntContinue++; break;
};
break;
}
}
}
printf("\n");
fprintf(stderr, "Counts:\n");
fprintf(stderr, " Comments: %5d\n",CntComments);
fprintf(stderr, " Continuations: %5d\n",CntContinue);
return(0);
}
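The two column rules that the state machine implements can also be written down as a per-line classifier, which is handy for checking individual lines by hand. This is only a sketch: `LineKind` and `classify_line` are hypothetical names that do not appear in the fixup program, and it mirrors that program's choice of accepting any digit or '+' as the continuation mark.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names for illustration; not part of the fixup program. */
typedef enum { LINE_STATEMENT, LINE_COMMENT, LINE_CONTINUATION } LineKind;

/* True if c is accepted as a continuation mark (same set as state 10). */
static int is_cont_mark(char c)
{
    return isdigit((unsigned char)c) || c == '+';
}

/* Classify one source line, given without its trailing newline. */
LineKind classify_line(const char *line)
{
    size_t i;

    assert(line != NULL);

    /* 'C' or 'c' in the first column marks a comment line. */
    if (line[0] == 'C' || line[0] == 'c')
        return LINE_COMMENT;

    /* A tab followed by a continuation mark. */
    if (line[0] == '\t')
        return is_cont_mark(line[1]) ? LINE_CONTINUATION : LINE_STATEMENT;

    /* Otherwise five leading spaces, then the continuation mark.
       The loop stops safely at the terminating NUL of a short line. */
    for (i = 0; i < 5; i++)
        if (line[i] != ' ')
            return LINE_STATEMENT;
    return is_cont_mark(line[5]) ? LINE_CONTINUATION : LINE_STATEMENT;
}
```

Unlike the state machine, this version needs whole lines in memory, which is why the streaming approach above is the better fit for a filter that reads stdin one character at a time.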
Dealing with the lack of access to Fortran.l (the Flex lexical analyser for the program) turned out to be less of a problem than I had anticipated. Mostly I just adjusted to having less control over the lexer and wrote a little bash script that modifies it to use ‘\n’ to trigger an End-of-Statement token (others suggest using a preprocessor to add some sort of line-termination token at the end of each line of the source code to be translated). The other modifications were to remove ‘\f’ and ‘\n’ from the white space set and to fix up the broken Flex code (resulting from my inclusion of “\n” as the statement-ending token in my BNFC file).
These fixups, together with the other code to convert the LBNF (Labelled BNF) grammar, compile it and run a test program against it, are as follows (source):
#
# Script file to translate a BNFC/LBNF grammar, compile it and run the resultant
# program against a Fortran file to be translated.
#
# (Script created by alan on 21/Apr/2019 18:35:42)
#
pgm=go
tmp=/tmp/${pgm}_
bin=${HOME}/bin
log=${bin}/data/${pgm}.$(uname -n).log
pgm=${1:-Maze.for}
# Remove program binary to make sure it doesn't run if we have a problem.
rm TestFortran 2>/dev/null
# Compile the LBNF grammar in Fortran.cf into a working front-end for translator.
bnfc -m -c Fortran.cf
# Touch the lex/yacc lexical analyser and parser source files to make sure they are compiled.
touch *.y *.l 2>/dev/null
# ----
# Modify the Makefile to add debug flags so output is more verbose.
cp Makefile Makefile.old
cat Makefile \
| sed "s/-PFortran$/-PFortran --debug/g" \
| sed "s/-pFortran$/-pFortran --debug -r all -g/g" \
> Makefile.new
cp Makefile.new Makefile
# Show user the difference in the Makefiles
echo "--- Makefile ---"
diff Makefile Makefile.old | sed "s/^/ /g"
# ----
# Modify the lexical analyser to make \n a statement end character (deficit in BNFC).
# The l1 file contains:
# "\\n" { ++yy_mylinenumber; return _SYMB_0; };
# This will trap any newlines, increment the line-number and return it as a token.
l1=$(cat l1 )
#echo "l1='$l1'"
# Do the other modifications to Fortran.l to fix the above problem.
cp Fortran.l Fortran.l.old
cat Fortran.l \
| grep -v "^<YYINITIAL>"$" \
| grep -v "^..\ ++yy_mylinenumber.;$" \
| sed "s/^"[ \t]*return _SYMB_0;$/${l1}/g" \
| sed "s/\\\n\\\f]/]+/g" \
> Fortran.l.new
cp Fortran.l.new Fortran.l
# Show changes to user.
echo "--- Fortran.l ---"
diff Fortran.l Fortran.l.old | sed "s/^/ /g"
echo "--- End dofix ---"
# Make our test program (TestFortran)
make
echo "========================================="
# Use state machine program to correct two problem features of Fortran.
# Fortran C-comments need to be changed to C-style line comments:
# "\nC" ==> "\n//" -- Comments, and
# Continuation lines need to be merged together.
# "[ \t]*\n [0-9+][ \t]*" ==> "" -- Continuation lines
# "[ \t]*\n\t[0-9+][ \t]*" ==> "" -- Continuation lines
#
./fixup <${pgm} >${tmp}for
# Run the complete program
./TestFortran <${tmp}for
#
# end of 'go' script file.
#
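For comparison, the alternative mentioned earlier, where a preprocessor appends a terminator token to each line instead of patching the lexer, might look roughly like the sketch below. The function name `add_terminators` and the choice of terminator character are assumptions made for illustration; the script above does not use this approach.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy `in` to `out`, writing `term` just before every newline that ends
 * a non-blank line.  Returns the number of bytes written (not counting
 * the terminating NUL), or (size_t)-1 if `out` (capacity `cap`) is too small.
 */
size_t add_terminators(const char *in, char *out, size_t cap, char term)
{
    size_t n = 0;
    int nonblank = 0;   /* has the current line any non-whitespace yet? */

    assert(in != NULL && out != NULL);
    if (cap == 0)
        return (size_t)-1;

    for (; *in != '\0'; in++) {
        if (*in == '\n') {
            if (nonblank) {
                if (n + 1 >= cap)
                    return (size_t)-1;
                out[n++] = term;    /* end-of-statement marker */
            }
            nonblank = 0;
        } else if (*in != ' ' && *in != '\t' && *in != '\r') {
            nonblank = 1;
        }
        if (n + 1 >= cap)           /* always leave room for the NUL */
            return (size_t)-1;
        out[n++] = *in;
    }
    out[n] = '\0';
    return n;
}
```

With a terminator of `;`, the input `"A = 1\n\nB = 2\n"` becomes `"A = 1;\n\nB = 2;\n"`, and the grammar would then list `";"` rather than `"\n"` as the statement separator, leaving newlines as ordinary white space.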
Mostly I studied the ‘C’ BNFC/LBNF grammar from the BNFC website and then created my own grammar.
I started with the Name token (whose format is particular to Fortran, and more so to VAX Fortran) and then built up my grammar from the Program definition.
The program definition is just a list of statements (either numbered or not). After each change I’d run my ‘go’ script above, and it would tell me the line in my Fortran source file where the error occurred and the last tokenizing rule (a line number in the auto-generated Fortran.l file) that was accepted.
From this I could determine which token was triggering the error and thus where my grammar was wrong. Eventually I arrived at the following LBNF grammar, which seemed to lex and parse all my example Fortran programs without problems (source):
-- Fortran LBNF grammar (Fortran.cf) to recognize a Fortran 66/IV program.
--
-- Created by Alan Angold 2019-04-23
--
entrypoints Program, LblStm ;
-- Tokens ----------------------------------------------------------------------
position token Name ( '%' )* upper ( upper | digit | '$' | '_' )* ;
position token SQString '\'' ( char )+ '\'';
position token CFloat ((digit+ '.' digit+)(('e'|'E')('-')? digit+)?
('f'|'F'))|((digit+ ('e'|'E')('-')? digit+)('f'|'F'));
-- Program ---------------------------------------------------------------------
Progr. Program ::= [ LblStm ] ;
[]. [LblStm] ::= ;
(:). [LblStm] ::= LblStm "\n" [LblStm];
SLabel. LblStm ::= Labeled_stm ;
SSimple. LblStm ::= Simple_stm ;
SNill. LblStm ::= ;
SLabelOne. Labeled_stm ::= Integer Simple_stm ;
-- Simple_stm ------------------------------------------------------------------
SImplicit. Simple_stm ::= "IMPLICIT" Type_Spec Type_Qual "(" Name "-" Name ")" ;
QType. Type_Qual ::= "*" Integer ;
-- Simple_stm ------------------------------------------------------------------
SParameter. Simple_stm ::= "PARAMETER" [NameValue];
(:[]). [NameValue] ::= NameValue ;
(:). [NameValue] ::= NameValue "," [NameValue];
NVPair. NameValue ::= Name "=" Integer ;
-- Simple_stm ------------------------------------------------------------------
SDiment. Simple_stm ::= "DIMENSION" [ NameDim ] ;
(:[]). [NameDim] ::= NameDim ;
(:). [NameDim] ::= NameDim "," [NameDim];
PNameDim. NameDim ::= Name "(" [DExp] ")" ;
PNameDim2. NameDim ::= Name ;
(:[]). [DExp] ::= DExp ;
(:). [DExp] ::= DExp "," [DExp];
EDplus. DExp ::= DExp "+" DExp1;
EDminus. DExp ::= DExp "-" DExp1;
EDtimes. DExp1 ::= DExp1 "*" DExp2;
EDdiv. DExp1 ::= DExp1 "/" DExp2;
_. DExp ::= DExp1 ;
_. DExp1 ::= DExp2 ;
_. DExp2 ::= "(" DExp ")" ;
EDInt. DExp2 ::= Integer;
EDName. DExp2 ::= Name;
-- Simple_stm ------------------------------------------------------------------
SDeclQual. Simple_stm ::= Type_Spec Type_Qual [NameDim] ;
SDecl. Simple_stm ::= Type_Spec [NameDim] ;
-- Simple_stm ------------------------------------------------------------------
SData. Simple_stm ::= "DATA" [DataSeg];
(:[]). [DataSeg] ::= DataSeg ;
(:). [DataSeg] ::= DataSeg "," [DataSeg];
PDSeg. DataSeg ::= [Vars] "/" [DataVal] "/" ;
(:[]). [Vars] ::= Vars ;
(:). [Vars] ::= Vars "," [Vars];
PVars. Vars ::= Name ;
(:[]). [DataVal] ::= DataVal ;
(:). [DataVal] ::= DataVal "," [DataVal];
PDValPls. DataVal ::= "+" DataValType ;
PDValNeg. DataVal ::= "-" DataValType ;
PDValNil. DataVal ::= DataValType ;
PDVInt. DataValType ::= Integer ;
PDVFloat. DataValType ::= CFloat ;
PDVChar. DataValType ::= SQString ;
-- Simple_stm ------------------------------------------------------------------
SCommon. Simple_stm ::= "COMMON" "/" Name "/" [Name] ;
(:[]). [Name] ::= Name ;
(:). [Name] ::= Name "," [Name];
-- Simple_stm ------------------------------------------------------------------
SWrtEmp. Simple_stm ::= "WRITE" "(" [AssignName] ")" ;
SWrite. Simple_stm ::= "WRITE" "(" [AssignName] ")" [NameOrArray] ;
SFormat. Simple_stm ::= "FORMAT" "(" [FmtSpecs] ")" ;
(:[]). [FmtSpecs] ::= FmtSpecs ;
(:). [FmtSpecs] ::= FmtSpecs "," [FmtSpecs];
FSString. FmtSpecs ::= SQString ;
FSName. FmtSpecs ::= Name ;
FSINNL. FmtSpecs ::= "$";
FSSlash. FmtSpecs ::= "/";
-- Simple_stm ------------------------------------------------------------------
SRead. Simple_stm ::= "READ" "(" [AssignName] ")" [NameOrArray] ;
SAsignRead. Simple_stm ::= "READ" "=" LExp;
(:[]). [NameOrArray] ::= NameOrArray ;
(:). [NameOrArray] ::= NameOrArray "," [NameOrArray];
PNALName. NameOrArray ::= Name ;
PNALArry. NameOrArray ::= "(" Name "(" [Name] ")" "," DoRangePart ")" ;
-- Simple_stm ------------------------------------------------------------------
SIf. Simple_stm ::= "IF" "(" LExp ")" IfThenPart;
PIfGoto. IfThenPart ::= "GO" "TO" Integer ;
PIfAsgn. IfThenPart ::= Name "=" LExp;
PIFAsnArr. IfThenPart ::= Name "(" [LExp] ")" "=" LExp ;
PIfRetn. IfThenPart ::= "RETURN" ;
PIfCall. IfThenPart ::= "CALL" Name "(" [SpecLExp] ")" ;
PIfCallNil. IfThenPart ::= "CALL" Name ;
Elor. LExp ::= LExp ".OR." LExp2;
Eland. LExp ::= LExp ".AND." LExp2;
Eeq. LExp2 ::= LExp2 ".EQ." LExp3;
Eneq. LExp2 ::= LExp2 ".NE." LExp3;
Elthen. LExp3 ::= LExp3 ".LT." LExp4;
Egrthen. LExp3 ::= LExp3 ".GT." LExp4;
Ele. LExp3 ::= LExp3 ".LE." LExp4;
Ege. LExp3 ::= LExp3 ".GE." LExp4;
Eplus. LExp4 ::= LExp4 "+" LExp5;
Eminus. LExp4 ::= LExp4 "-" LExp5;
Etimes. LExp5 ::= LExp5 "*" LExp6;
Ediv. LExp5 ::= LExp5 "/" LExp6;
Epreop. LExp6 ::= Unary_operator LExp7;
Epower. LExp8 ::= LExp5 "**" LExp8;
Efunk. LExp8 ::= LExp8 "(" ")";
Efunkpar. LExp8 ::= LExp8 "(" [SpecLExp] ")";
Evar. LExp9 ::= TIntVar RangePart ;
Estr. LExp9 ::= SQString ;
ERangeNull. RangePart ::= ;
ERange. RangePart ::= ":" TIntVar ;
ETInt. TIntVar ::= Integer;
ETTrue. TIntVar ::= ".TRUE.";
ETFalse. TIntVar ::= ".FALSE.";
ETNameVar. TIntVar ::= Name;
ETRead. TIntVar ::= "READ";
(:[]). [LExp] ::= LExp ;
(:). [LExp] ::= LExp "," [LExp];
_. LExp ::= LExp2 ;
_. LExp2 ::= LExp3 ;
_. LExp3 ::= LExp4 ;
_. LExp4 ::= LExp5 ;
_. LExp5 ::= LExp6 ;
_. LExp6 ::= LExp7 ;
_. LExp7 ::= LExp8 ;
_. LExp8 ::= LExp9 ;
_. LExp9 ::= LExp10 ;
_. LExp10 ::= LExp11 ;
_. LExp11 ::= "(" LExp ")" ;
OUnaryPlus. Unary_operator ::= "+" ;
OUnaryMinus. Unary_operator ::= "-" ;
OUnaryNot. Unary_operator ::= ".NOT." ;
-- Simple_stm ------------------------------------------------------------------
SAssign. Simple_stm ::= Name "=" LExp;
SAsnArr. Simple_stm ::= Name "(" [LExp] ")" "=" LExp ;
-- Simple_stm ------------------------------------------------------------------
SFunCall. Simple_stm ::= "CALL" Name "(" [SpecLExp] ")" ;
SFunCallNil. Simple_stm ::= "CALL" Name ;
(:[]). [SpecLExp] ::= SpecLExp ;
(:). [SpecLExp] ::= SpecLExp "," [SpecLExp];
SpLExpNil. SpecLExp ::= ;
SpLExpNot. SpecLExp ::= LExp;
-- Simple_stm ------------------------------------------------------------------
SGoto. Simple_stm ::= "GO" "TO" Integer ;
-- Simple_stm ------------------------------------------------------------------
SOpen. Simple_stm ::= "OPEN" "(" [AssignName] ")" ;
(:[]). [AssignName] ::= AssignName ;
(:). [AssignName] ::= AssignName "," [AssignName];
PAsgnNm. AssignName ::= Name ;
PAsgnInt. AssignName ::= Integer ;
PAssign. AssignName ::= Name "=" LExp;
SClose. Simple_stm ::= "CLOSE" "(" [AssignName] ")" ;
-- Simple_stm ------------------------------------------------------------------
SDo. Simple_stm ::= "DO" Integer DoRangePart ;
PDoRange. DoRangePart ::= Name "=" LExp "," LExp ;
-- Simple_stm ------------------------------------------------------------------
SStop. Simple_stm ::= "STOP" ;
SStopMsg. Simple_stm ::= "STOP" SQString ;
-- Simple_stm ------------------------------------------------------------------
SEnd. Simple_stm ::= "END" ;
-- Simple_stm ------------------------------------------------------------------
SSubr. Simple_stm ::= "SUBROUTINE" Name "(" [SpecLExp] ")" ;
SSubrNil. Simple_stm ::= "SUBROUTINE" Name ;
SFunct. Simple_stm ::= "FUNCTION" Name "(" [SpecLExp] ")" ;
SFunctNil. Simple_stm ::= "FUNCTION" Name ;
-- Simple_stm ------------------------------------------------------------------
SContinue. Simple_stm ::= "CONTINUE" ;
-- Simple_stm ------------------------------------------------------------------
SReturn. Simple_stm ::= "RETURN" ;
-- Simple_stm ------------------------------------------------------------------
SEquiv. Simple_stm ::= "EQUIVALENCE" "(" Name "," NameOrArrRef ")" ;
PNOAName. NameOrArrRef ::= Name ;
PNOAArr. NameOrArrRef ::= Name "(" [LExp] ")" ;
-- Simple_stm ------------------------------------------------------------------
TInt. Type_Spec ::= "INTEGER" ;
TFloat. Type_Spec ::= "REAL" ;
TDouble. Type_Spec ::= "DOUBLE" ;
TChar. Type_Spec ::= "CHARACTER" ;
TByte. Type_Spec ::= "BYTE" ;
TLogi. Type_Spec ::= "LOGICAL" ;
comment "//" ;
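The numbered levels in the grammar above (DExp, DExp1, DExp2, and likewise LExp through LExp11) are what give the operators their precedence, and the left-recursive rules make them left associative. A hand-written recursive-descent evaluator for the DExp rules makes this visible. It is only a sketch: the function names are hypothetical, it handles the Integer alternative but not Name, and it is not the parser that BNFC generates.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

static long dexp(const char **s);            /* forward declaration */

static void skip_spaces(const char **s)
{
    while (**s == ' ')
        (*s)++;
}

/* DExp2 ::= "(" DExp ")" | Integer   (Name is left out of this sketch) */
static long dexp2(const char **s)
{
    long v = 0;

    skip_spaces(s);
    if (**s == '(') {
        (*s)++;
        v = dexp(s);
        skip_spaces(s);
        if (**s == ')')
            (*s)++;
    } else if (isdigit((unsigned char)**s)) {
        char *end;
        v = strtol(*s, &end, 10);
        *s = end;
    }
    return v;
}

/* DExp1 ::= DExp1 "*" DExp2 | DExp1 "/" DExp2 | DExp2 */
static long dexp1(const char **s)
{
    long v = dexp2(s);

    for (;;) {                       /* the loop gives left associativity */
        skip_spaces(s);
        if (**s == '*') {
            (*s)++;
            v *= dexp2(s);
        } else if (**s == '/') {
            long d;
            (*s)++;
            d = dexp2(s);
            v = (d != 0) ? v / d : 0;    /* avoid dividing by zero */
        } else {
            return v;
        }
    }
}

/* DExp ::= DExp "+" DExp1 | DExp "-" DExp1 | DExp1 */
static long dexp(const char **s)
{
    long v = dexp1(s);

    for (;;) {
        skip_spaces(s);
        if (**s == '+') {
            (*s)++;
            v += dexp1(s);
        } else if (**s == '-') {
            (*s)++;
            v -= dexp1(s);
        } else {
            return v;
        }
    }
}

/* Evaluate a dimension expression such as "2+3*4". */
long eval_dexp(const char *text)
{
    assert(text != NULL);
    return dexp(&text);
}
```

Because `*` and `/` sit one level below `+` and `-`, `eval_dexp("2+3*4")` yields 14 rather than 20, and the loops make `eval_dexp("10-4-3")` come out as 3.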
The program seemed to lex and parse all the Fortran-66 source code I had handy, but the TestFortran program failed with a segmentation fault when it came to printing out the AST (showProgram(parse_tree) or printProgram(parse_tree)). I suspect this is just due to some null pointers in the AST, and once I have written my own AST tree crawler I should be able to track them down.
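A tree crawler that reports null pointers instead of dereferencing them could start out something like the sketch below. The `Node` type is purely hypothetical and much simpler than the structs BNFC generates in Absyn.h; the point is only the pattern of checking each pointer before following it.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical node type for illustration; BNFC's generated types differ. */
typedef struct Node {
    const char  *label;     /* rule label such as "SAssign"; may be NULL */
    int          nkids;     /* number of child slots in use */
    struct Node *kid[3];
} Node;

/*
 * Print the tree depth-first, one label per line, indented by depth.
 * NULL nodes and NULL labels are reported rather than dereferenced.
 * Returns the number of non-NULL nodes visited.
 */
int walk(const Node *n, int depth, FILE *out)
{
    int i, visited = 1;

    assert(out != NULL);
    if (n == NULL) {
        fprintf(out, "%*s<null>\n", depth * 2, "");
        return 0;
    }
    fprintf(out, "%*s%s\n", depth * 2, "",
            n->label != NULL ? n->label : "<unlabelled>");
    for (i = 0; i < n->nkids; i++)
        visited += walk(n->kid[i], depth + 1, out);
    return visited;
}
```

Run over a tree with a missing child, this prints a `<null>` line at the depth where the pointer is absent, which should point to the grammar rule whose node was built incompletely.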