BNFC is a great tool but it has some quirks that have slowed down the process of building a front-end for my fortran2c translator. However, the code generation, especially if you haven’t built a compiler before moves you quite a distance forward. Here’s what I’ve found so far…
Position dependent code:
First, in the lexical analyzer (the part that converts from a character stream to a token stream) BNFC is fairly inflexible. In the case of Fortran and other older languages there are position dependent tokens. Some examples come to mind: The comment, the continuation line, label-numbers and the sequence column. These are illustrated in the file segment (Maze.for) below:
C MAZE DESCRIPTION
C
180 WRITE(6,190) HEIGHT,WIDTH,DEPTH
190 FORMAT('0',' YOUR MAZE HAS A HEIGHT OF',I5,/,
1 ' AND A WIDTH OF',I5,/,
1 ' WITH A DEPTH OF',I5,//,
2 ' THE DIRECTION COMMANDS FOR MAZE ARE SINGLE LETTERS',/,
2 ' N(ORTH), U(P), OR 8 IS UP',/,
2 ' E(AST) , R(IGHT), OR 6 IS RIGHT',/,
2 ' S(OUTH), D(OWN), OR 2 IS DOWN',/,
2 ' W(EST) , L(EFT), OR 4 IS LEFT',/,
2 ' I(N) , OR 9 IS IN TO SCREEN',/,
2 ' O(UT) , OR 7 IS OUT OF SCREEN',/,
Comments start with a “C” in column one and go to the end-of-line. Continuation lines start at the beginning with a tab or 5 spaces, have a continuation mark (usually 0-9 or ‘+’) then a space and the body of the continuation. Labels are numbers preceded with spaces and ending at column 5.
Unfortunately I don’t have a example of code with a sequence column. These were basically 8-digit numbers in columns 73-80. Code including the above continuation/label-number preface to statements was from column 1 to 72 with 73-80 being left over for the sequence number. I believe this sequence number was a holdover from the old punched card days when people would do a 52-card pickup with a program deck (which usually went far beyond 52 cards) and needed a way to sort it back into a program (by machine).
All these things need Flex code that allows for multiple states (i.e. <prefix>, <seqno>, <statement>) which doesn’t seem to be a possibility in BNFC.
What I ended up doing was writing a small state-machine program (in C) that converted comments to C-like ‘//’ comments and just joined continuation lines into one long line (since we aren’t working on 80-column punch cards anymore). The code below can be compiled with the command (I’m running Ubuntu/linux with the GNU gcc compiler):
g++ -std=c++11 -g fixup.c -o fixup
* Program to do some preprocessing on a Fortran file to deal with:
* "\nC" ==> "\n//" -- Comments, and
* "[ \t]*\n [0-9+][ \t]*" ==> "" -- Continuation lines
* "[ \t]*\n\t[0-9+][ \t]*" ==> "" -- Continuation lines
*
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#define BUF_SZ 1000
#define DEBUG false
char buf[BUF_SZ];
int bufidx=-1;
int state=0;
int CntComments=0;
int CntContinue=0;
void save(int chr){
buf[++bufidx]=(char)chr;
if(bufidx==BUF_SZ){
fprintf(stderr,"ERROR: Buffer Overflow\n");
exit(1);
}
}
void unsave(){
buf[bufidx--]=0;
if(bufidx<-1){
fprintf(stderr,"ERROR: Buffer Underflow\n");
exit(2);
}
}
void reset(){
memset(buf,0,BUF_SZ);
bufidx=-1;
state=0;
}
void purge(){
printf("%s",buf);
reset();
}
void asaprintf( const char * format, ... )
{
va_list args;
va_start (args, format);
if(DEBUG) vprintf (format, args);
va_end (args);
}
void newstate(int ns){
state=ns;
//asaprintf("<%d>",state);
}
int main(int argc,char* argv[]){
int chr=0;
int idx=0;
reset();
state=2; // Start in state 2 because first line in file may be a comment
while((chr=getchar())!=EOF){
asaprintf("%6d) state=%d chr='%c'(0x%02x)\n",idx++,state,chr,chr);
save(chr);
if(chr==0){
reset();
}else{
switch(state){
case 0:
switch(chr){
case ' ': break;
case '\t': break;
case '\n': newstate(2); break;
default: purge(); break;
};
break;
case 2:
switch(chr){
case ' ': newstate(6); break;
case 'C': case 'c':
unsave(); purge(); printf("//"); CntComments++; break;
case '\t': newstate(10); break;
default: purge(); break;
};
break;
case 6:
switch(chr){
case ' ': newstate(7); break;
default: purge(); break;
};
break;
case 7:
switch(chr){
case ' ': newstate(8); break;
default: purge(); break;
};
break;
case 8:
switch(chr){
case ' ': newstate(9); break;
default: purge(); break;
};
break;
case 9:
switch(chr){
case ' ': newstate(10); break;
default: purge(); break;
};
break;
case 10:
switch(chr){
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
case '+':
newstate(11); break;
default: purge(); break;
};
break;
case 11:
switch(chr){
case ' ': break;
case '\t': break;
default: reset(); putchar(chr); CntContinue++; break;
};
break;
}
}
}
printf("\n");
fprintf(stderr, "Counts:\n");
fprintf(stderr, " Comments: %5d\n",CntComments);
fprintf(stderr, " Continuations: %5d\n",CntContinue);
return(0);
}
The Maze.for program then became (Maze_pp.for):
// MAZE - USES A VT100 TO WANDER AROUND.
// THE VT100 MUST HAVE ADVANCED VIDEO OPTION.
// ANSI VT100 ESCAPE SEQUENCES ARE USED.
//
// WRITTEN BY DON MCLEAN
// OF THE MACNEAL-SCHWENDLER CORP.
//
// THE PURPOSE OF THIS PROGRAM WAS TO
// 1. LEARN SOMETHING ABOUT THE VT100 GRAPHICS.
// 2. KEEP MY KIDS BUSY ON WEEKENDS. WHILE I TRIED
// TO GET SOMETHING ELSE DONE.
//
// USE OF THIS PROGRAM FOR ANY PURPOSE OTHER THAN FUN
// IS PROHIBITED.
//
IMPLICIT INTEGER*4 (A-Z)
//
// MAZE DIMENSIONS
// HMAX AND WMAX SHOULD NOT BE LARGER THAN 22 AND 80 RESP.
//
PARAMETER HMAX=22, WMAX=80, DMAX=4
//
DIMENSION SLEEP(2)
//
// DIMENSION IS HMAX*WMAX*DMAX
INTEGER*2 EXIT(HMAX*WMAX*DMAX), MAT(HMAX*WMAX*DMAX)
INTEGER*2 LCOUNT(DMAX)
//
BYTE CLEAR(2)
//
CHARACTER*200 INPUT
//
COMMON /MAZECM/ STARTH,STARTW,STARTD,ENDH,ENDW,ENDD,NOBELL
//
// CLEAR IS A VT100 RESET
//
DATA CLEAR / 27, 'c' /
//
// START - SEE IF AN OLD GAME IS TO BE USED.
//
WRITE(6,10)
10 FORMAT(' WELCOME TO MAZE')
//
20 WRITE(6,30)
30 FORMAT(' ARE YOU GOING TO PLAY A SAVED GAME? ',$)
READ(5,40) NC,INPUT
40 FORMAT(Q,A)
IF(INDEX(INPUT(1:NC),'Y').NE.0) GO TO 120
SAVE = 0
//
// INPUT DIMENSION OF MAZE
//
50 WRITE(6,60) HMAX
60 FORMAT(' PLEASE INPUT HEIGHT OF MAZE - DEFAULT = ',I2,' ',$)
READ(5,40) NC,INPUT
READ(INPUT,70,ERR=50) HEIGHT
70 FORMAT(BNI2)
IF(HEIGHT.EQ.0) HEIGHT=HMAX
IF(HEIGHT.LT.2) HEIGHT=2
IF(HEIGHT.GT.HMAX) HEIGHT=HMAX
80 WRITE(6,90) WMAX
90 FORMAT(' PLEASE INPUT WIDTH OF MAZE - DEFAULT = ',I2,' ',$)
READ(5,40) NC,INPUT
READ(INPUT,70,ERR=80) WIDTH
IF(WIDTH.EQ.0) WIDTH = WMAX
IF(WIDTH.LT.2) WIDTH=2
IF(WIDTH.GT.WMAX) WIDTH=WMAX
100 WRITE(6,110)
110 FORMAT(' PLEASE INPUT DEPTH OF MAZE - DEFAULT = 1 ',$)
READ(5,40) NC,INPUT
READ(INPUT,70,ERR=100) DEPTH
IF(DEPTH.LE.0) DEPTH = 1
IF(DEPTH.GT.DMAX) DEPTH = DMAX
NTERMS = HEIGHT * WIDTH * DEPTH
...
Symbols:
The next problem that encountered were the symbols in the BNFC lexer/parser, _SYMB_43, for example. Yuck! I would have been shot at a code review for that. Here’s what the lexer looked like (Fortran.l.bkp):
%option noyywrap
%{
#define yylval Fortranlval
#define YY_BUFFER_APPEND Fortran_BUFFER_APPEND
#define YY_BUFFER_RESET Fortran_BUFFER_RESET
#define initialize_lexer Fortran_initialize_lexer
#include <string.h>
#include "Parser.h"
#define YY_BUFFER_LENGTH 4096
extern int yy_mylinenumber ;
char YY_PARSED_STRING[YY_BUFFER_LENGTH];
void YY_BUFFER_APPEND(char *s)
{
strcat(YY_PARSED_STRING, s); //Do something better here!
}
void YY_BUFFER_RESET(void)
{
int x;
for(x = 0; x < YY_BUFFER_LENGTH; x++)
YY_PARSED_STRING[x] = 0;
}
%}
LETTER [a-zA-Z]
CAPITAL [A-Z]
SMALL [a-z]
DIGIT [0-9]
IDENT [a-zA-Z0-9'_]
%START YYINITIAL COMMENT CHAR CHARESC CHAREND STRING ESCAPED
%%
<YYINITIAL>"
" return _SYMB_0;
<YYINITIAL>"(" return _SYMB_1;
<YYINITIAL>"-" return _SYMB_2;
<YYINITIAL>")" return _SYMB_3;
<YYINITIAL>"*" return _SYMB_4;
<YYINITIAL>"," return _SYMB_5;
<YYINITIAL>"=" return _SYMB_6;
<YYINITIAL>"+" return _SYMB_7;
<YYINITIAL>"/" return _SYMB_8;
<YYINITIAL>"$" return _SYMB_9;
<YYINITIAL>".OR." return _SYMB_10;
<YYINITIAL>".AND." return _SYMB_11;
<YYINITIAL>".EQ." return _SYMB_12;
<YYINITIAL>".NE." return _SYMB_13;
<YYINITIAL>".LT." return _SYMB_14;
<YYINITIAL>".GT." return _SYMB_15;
<YYINITIAL>".LE." return _SYMB_16;
<YYINITIAL>".GE." return _SYMB_17;
<YYINITIAL>"**" return _SYMB_18;
<YYINITIAL>":" return _SYMB_19;
<YYINITIAL>".TRUE." return _SYMB_20;
<YYINITIAL>".FALSE." return _SYMB_21;
<YYINITIAL>".NOT." return _SYMB_22;
<YYINITIAL>"BYTE" return _SYMB_23;
<YYINITIAL>"CALL" return _SYMB_24;
<YYINITIAL>"CHARACTER" return _SYMB_25;
<YYINITIAL>"CLOSE" return _SYMB_26;
<YYINITIAL>"COMMON" return _SYMB_27;
<YYINITIAL>"CONTINUE" return _SYMB_28;
<YYINITIAL>"DATA" return _SYMB_29;
<YYINITIAL>"DIMENSION" return _SYMB_30;
<YYINITIAL>"DO" return _SYMB_31;
<YYINITIAL>"DOUBLE" return _SYMB_32;
<YYINITIAL>"END" return _SYMB_33;
<YYINITIAL>"EQUIVALENCE" return _SYMB_34;
<YYINITIAL>"FORMAT" return _SYMB_35;
<YYINITIAL>"FUNCTION" return _SYMB_36;
<YYINITIAL>"GO" return _SYMB_37;
<YYINITIAL>"IF" return _SYMB_38;
<YYINITIAL>"IMPLICIT" return _SYMB_39;
<YYINITIAL>"INTEGER" return _SYMB_40;
<YYINITIAL>"LOGICAL" return _SYMB_41;
<YYINITIAL>"OPEN" return _SYMB_42;
<YYINITIAL>"PARAMETER" return _SYMB_43;
<YYINITIAL>"READ" return _SYMB_44;
<YYINITIAL>"REAL" return _SYMB_45;
<YYINITIAL>"RETURN" return _SYMB_46;
<YYINITIAL>"STOP" return _SYMB_47;
<YYINITIAL>"SUBROUTINE" return _SYMB_48;
<YYINITIAL>"TO" return _SYMB_49;
<YYINITIAL>"WRITE" return _SYMB_50;
<YYINITIAL>"//"[^\n]*\n ++yy_mylinenumber; /* BNFC single-line comment */;
<YYINITIAL>\%*{CAPITAL}({CAPITAL}|{DIGIT}|\$|\_)* yylval.string_ = strdup(yytext); return _SYMB_51;
<YYINITIAL>'.+' yylval.string_ = strdup(yytext); return _SYMB_52;
<YYINITIAL>{DIGIT}+\.{DIGIT}+((e|E)\-?{DIGIT}+)?(f|F)|{DIGIT}+(e|E)\-?{DIGIT}+(f|F) yylval.string_ = strdup(yytext); return _SYMB_53;
<YYINITIAL>{DIGIT}+ yylval.int_ = atoi(yytext); return _INTEGER_;
\n ++yy_mylinenumber ;
<YYINITIAL>[ \t\r\n\f] /* ignore white space. */;
<YYINITIAL>. return _ERROR_;
%%
void initialize_lexer(FILE *inp) { yyrestart(inp); BEGIN YYINITIAL; }
The parser(Fortran.y.bkp) was worse … I wish they had some way of converting these symbols to something more human readable:
%%
Program : ListLblStm { $$ = make_Progr(reverseListLblStm($1)); YY_RESULT_Program_= $$; }
;
ListLblStm : /* empty */ { $$ = 0; }
| ListLblStm LblStm _SYMB_0 { $$ = make_ListLblStm($2, $1); }
;
LblStm : Labeled_stm { $$ = make_SLabel($1); YY_RESULT_LblStm_= $$; }
| Simple_stm { $$ = make_SSimple($1); YY_RESULT_LblStm_= $$; }
| /* empty */ { $$ = make_SNill(); YY_RESULT_LblStm_= $$; }
;
Labeled_stm : _INTEGER_ Simple_stm { $$ = make_SLabelOne($1, $2); }
;
Simple_stm : _SYMB_39 Type_Spec Type_Qual _SYMB_1 _SYMB_51 _SYMB_2 _SYMB_51 _SYMB_3 { $$ = make_SImplicit($2, $3, $5, $7); }
| _SYMB_43 ListNameValue { $$ = make_SParameter($2); }
| _SYMB_30 ListNameDim { $$ = make_SDiment($2); }
| Type_Spec Type_Qual ListNameDim { $$ = make_SDeclQual($1, $2, $3); }
| Type_Spec ListNameDim { $$ = make_SDecl($1, $2); }
| _SYMB_29 ListDataSeg { $$ = make_SData($2); }
| _SYMB_27 _SYMB_8 _SYMB_51 _SYMB_8 ListName { $$ = make_SCommon($3, $5); }
| _SYMB_50 _SYMB_1 ListAssignName _SYMB_3 { $$ = make_SWrtEmp($3); }
| _SYMB_50 _SYMB_1 ListAssignName _SYMB_3 ListNameOrArray { $$ = make_SWrite($3, $5); }
| _SYMB_35 _SYMB_1 ListFmtSpecs _SYMB_3 { $$ = make_SFormat($3); }
| _SYMB_44 _SYMB_1 ListAssignName _SYMB_3 ListNameOrArray { $$ = make_SRead($3, $5); }
| _SYMB_44 _SYMB_6 LExp { $$ = make_SAsignRead($3); }
| _SYMB_38 _SYMB_1 LExp _SYMB_3 IfThenPart { $$ = make_SIf($3, $5); }
| _SYMB_51 _SYMB_6 LExp { $$ = make_SAssign($1, $3); }
| _SYMB_51 _SYMB_1 ListLExp _SYMB_3 _SYMB_6 LExp { $$ = make_SAsnArr($1, $3, $6); }
| _SYMB_24 _SYMB_51 _SYMB_1 ListSpecLExp _SYMB_3 { $$ = make_SFunCall($2, $4); }
| _SYMB_24 _SYMB_51 { $$ = make_SFunCallNil($2); }
| _SYMB_37 _SYMB_49 _INTEGER_ { $$ = make_SGoto($3); }
| _SYMB_42 _SYMB_1 ListAssignName _SYMB_3 { $$ = make_SOpen($3); }
| _SYMB_26 _SYMB_1 ListAssignName _SYMB_3 { $$ = make_SClose($3); }
| _SYMB_31 _INTEGER_ DoRangePart { $$ = make_SDo($2, $3); }
| _SYMB_47 { $$ = make_SStop(); }
| _SYMB_47 _SYMB_52 { $$ = make_SStopMsg($2); }
| _SYMB_33 { $$ = make_SEnd(); }
| _SYMB_48 _SYMB_51 _SYMB_1 ListSpecLExp _SYMB_3 { $$ = make_SSubr($2, $4); }
| _SYMB_48 _SYMB_51 { $$ = make_SSubrNil($2); }
| _SYMB_36 _SYMB_51 _SYMB_1 ListSpecLExp _SYMB_3 { $$ = make_SFunct($2, $4); }
| _SYMB_36 _SYMB_51 { $$ = make_SFunctNil($2); }
| _SYMB_28 { $$ = make_SContinue(); }
| _SYMB_46 { $$ = make_SReturn(); }
| _SYMB_34 _SYMB_1 _SYMB_51 _SYMB_5 NameOrArrRef _SYMB_3 { $$ = make_SEquiv($3, $5); }
;
Type_Qual : _SYMB_4 _INTEGER_ { $$ = make_QType($2); }
;
ListNameValue : NameValue { $$ = make_ListNameValue($1, 0); }
| NameValue _SYMB_5 ListNameValue { $$ = make_ListNameValue($1, $3); }
;
NameValue : _SYMB_51 _SYMB_6 _INTEGER_ { $$ = make_NVPair($1, $3); }
;
ListNameDim : NameDim { $$ = make_ListNameDim($1, 0); }
| NameDim _SYMB_5 ListNameDim { $$ = make_ListNameDim($1, $3); }
;
NameDim : _SYMB_51 _SYMB_1 ListDExp _SYMB_3 { $$ = make_PNameDim($1, $3); }
| _SYMB_51 { $$ = make_PNameDim2($1); }
;
ListDExp : DExp { $$ = make_ListDExp($1, 0); }
| DExp _SYMB_5 ListDExp { $$ = make_ListDExp($1, $3); }
;
DExp : DExp _SYMB_7 DExp1 { $$ = make_EDplus($1, $3); }
| DExp _SYMB_2 DExp1 { $$ = make_EDminus($1, $3); }
| DExp1 { $$ = $1; }
;
DExp1 : DExp1 _SYMB_4 DExp2 { $$ = make_EDtimes($1, $3); }
| DExp1 _SYMB_8 DExp2 { $$ = make_EDdiv($1, $3); }
| DExp2 { $$ = $1; }
;
DExp2 : _SYMB_1 DExp _SYMB_3 { $$ = $2; }
| _INTEGER_ { $$ = make_EDInt($1); }
| _SYMB_51 { $$ = make_EDName($1); }
;
ListDataSeg : DataSeg { $$ = make_ListDataSeg($1, 0); }
| DataSeg _SYMB_5 ListDataSeg { $$ = make_ListDataSeg($1, $3); }
;
DataSeg : ListVars _SYMB_8 ListDataVal _SYMB_8 { $$ = make_PDSeg($1, $3); }
;
ListVars : Vars { $$ = make_ListVars($1, 0); }
| Vars _SYMB_5 ListVars { $$ = make_ListVars($1, $3); }
;
Vars : _SYMB_51 { $$ = make_PVars($1); }
;
ListDataVal : DataVal { $$ = make_ListDataVal($1, 0); }
| DataVal _SYMB_5 ListDataVal { $$ = make_ListDataVal($1, $3); }
;
DataVal : _SYMB_7 DataValType { $$ = make_PDValPls($2); }
| _SYMB_2 DataValType { $$ = make_PDValNeg($2); }
| DataValType { $$ = make_PDValNil($1); }
;
DataValType : _INTEGER_ { $$ = make_PDVInt($1); }
| _SYMB_53 { $$ = make_PDVFloat($1); }
| _SYMB_52 { $$ = make_PDVChar($1); }
;
ListName : _SYMB_51 { $$ = make_ListName($1, 0); }
| _SYMB_51 _SYMB_5 ListName { $$ = make_ListName($1, $3); }
;
ListFmtSpecs : FmtSpecs { $$ = make_ListFmtSpecs($1, 0); }
| FmtSpecs _SYMB_5 ListFmtSpecs { $$ = make_ListFmtSpecs($1, $3); }
;
FmtSpecs : _SYMB_52 { $$ = make_FSString($1); }
| _SYMB_51 { $$ = make_FSName($1); }
| _SYMB_9 { $$ = make_FSINNL(); }
| _SYMB_8 { $$ = make_FSSlash(); }
;
ListNameOrArray : NameOrArray { $$ = make_ListNameOrArray($1, 0); }
| NameOrArray _SYMB_5 ListNameOrArray { $$ = make_ListNameOrArray($1, $3); }
;
NameOrArray : _SYMB_51 { $$ = make_PNALName($1); }
| _SYMB_1 _SYMB_51 _SYMB_1 ListName _SYMB_3 _SYMB_5 DoRangePart _SYMB_3 { $$ = make_PNALArry($2, $4, $7); }
;
IfThenPart : _SYMB_37 _SYMB_49 _INTEGER_ { $$ = make_PIfGoto($3); }
| _SYMB_51 _SYMB_6 LExp { $$ = make_PIfAsgn($1, $3); }
| _SYMB_51 _SYMB_1 ListLExp _SYMB_3 _SYMB_6 LExp { $$ = make_PIFAsnArr($1, $3, $6); }
| _SYMB_46 { $$ = make_PIfRetn(); }
| _SYMB_24 _SYMB_51 _SYMB_1 ListSpecLExp _SYMB_3 { $$ = make_PIfCall($2, $4); }
| _SYMB_24 _SYMB_51 { $$ = make_PIfCallNil($2); }
;
LExp : LExp _SYMB_10 LExp2 { $$ = make_Elor($1, $3); }
| LExp _SYMB_11 LExp2 { $$ = make_Eland($1, $3); }
| LExp2 { $$ = $1; }
;
LExp2 : LExp2 _SYMB_12 LExp3 { $$ = make_Eeq($1, $3); }
| LExp2 _SYMB_13 LExp3 { $$ = make_Eneq($1, $3); }
| LExp3 { $$ = $1; }
;
LExp3 : LExp3 _SYMB_14 LExp4 { $$ = make_Elthen($1, $3); }
| LExp3 _SYMB_15 LExp4 { $$ = make_Egrthen($1, $3); }
| LExp3 _SYMB_16 LExp4 { $$ = make_Ele($1, $3); }
| LExp3 _SYMB_17 LExp4 { $$ = make_Ege($1, $3); }
| LExp4 { $$ = $1; }
;
LExp4 : LExp4 _SYMB_7 LExp5 { $$ = make_Eplus($1, $3); }
| LExp4 _SYMB_2 LExp5 { $$ = make_Eminus($1, $3); }
| LExp5 { $$ = $1; }
;
LExp5 : LExp5 _SYMB_4 LExp6 { $$ = make_Etimes($1, $3); }
| LExp5 _SYMB_8 LExp6 { $$ = make_Ediv($1, $3); }
| LExp6 { $$ = $1; }
;
LExp6 : Unary_operator LExp8 { $$ = make_Epreop($1, $2); }
| LExp8 { $$ = $1; }
;
LExp8 : LExp5 _SYMB_18 LExp8 { $$ = make_Epower($1, $3); }
| LExp8 _SYMB_1 _SYMB_3 { $$ = make_Efunk($1); }
| LExp8 _SYMB_1 ListSpecLExp _SYMB_3 { $$ = make_Efunkpar($1, $3); }
| LExp9 { $$ = $1; }
;
LExp9 : TIntVar RangePart { $$ = make_Evar($1, $2); }
| _SYMB_52 { $$ = make_Estr($1); }
| LExp10 { $$ = $1; }
;
RangePart : /* empty */ { $$ = make_ERangeNull(); }
| _SYMB_19 TIntVar { $$ = make_ERange($2); }
;
TIntVar : _INTEGER_ { $$ = make_ETInt($1); }
| _SYMB_20 { $$ = make_ETTrue(); }
| _SYMB_21 { $$ = make_ETFalse(); }
| _SYMB_51 { $$ = make_ETNameVar($1); }
| _SYMB_44 { $$ = make_ETRead(); }
;
ListLExp : LExp { $$ = make_ListLExp($1, 0); }
| LExp _SYMB_5 ListLExp { $$ = make_ListLExp($1, $3); }
;
LExp10 : LExp11 { $$ = $1; }
;
LExp11 : _SYMB_1 LExp _SYMB_3 { $$ = $2; }
;
Unary_operator : _SYMB_7 { $$ = make_OUnaryPlus(); }
| _SYMB_2 { $$ = make_OUnaryMinus(); }
| _SYMB_22 { $$ = make_OUnaryNot(); }
;
ListSpecLExp : SpecLExp { $$ = make_ListSpecLExp($1, 0); }
| SpecLExp _SYMB_5 ListSpecLExp { $$ = make_ListSpecLExp($1, $3); }
;
SpecLExp : LExp { $$ = make_SpLExpNot($1); }
;
ListAssignName : AssignName { $$ = make_ListAssignName($1, 0); }
| AssignName _SYMB_5 ListAssignName { $$ = make_ListAssignName($1, $3); }
;
AssignName : _SYMB_51 { $$ = make_PAsgnNm($1); }
| _INTEGER_ { $$ = make_PAsgnInt($1); }
| _SYMB_51 _SYMB_6 LExp { $$ = make_PAssign($1, $3); }
;
DoRangePart : _SYMB_51 _SYMB_6 LExp _SYMB_5 LExp { $$ = make_PDoRange($1, $3, $5); }
;
NameOrArrRef : _SYMB_51 { $$ = make_PNOAName($1); }
| _SYMB_51 _SYMB_1 ListLExp _SYMB_3 { $$ = make_PNOAArr($1, $3); }
;
Type_Spec : _SYMB_40 { $$ = make_TInt(); }
| _SYMB_45 { $$ = make_TFloat(); }
| _SYMB_32 { $$ = make_TDouble(); }
| _SYMB_25 { $$ = make_TChar(); }
| _SYMB_23 { $$ = make_TByte(); }
| _SYMB_41 { $$ = make_TLogi(); }
;
Fortunately, linux helps with that. I built a little script file that changes these cryptic symbols for something a little more tolerable:
sed -f symbols Fortran.y >Fortran.yy
cp Fortran.y Fortran.y.bkp
cp Fortran.yy Fortran.y
sed -f symbols Fortran.l >Fortran.ll
cp Fortran.l Fortran.l.bkp
cp Fortran.ll Fortran.l
… and the associated ‘symbols’ file. NOTE the order of the ‘symbols’ commands. The _SYMB_1 was changed after the _SYMB_1? symbols otherwise the sed editor would have changed partial symbols:
s/_SYMB_11/T_AND/g
s/_SYMB_12/T_EQ/g
s/_SYMB_13/T_NE/g
s/_SYMB_14/T_LT/g
s/_SYMB_15/T_GT/g
s/_SYMB_16/T_LE/g
s/_SYMB_17/T_GE/g
s/_SYMB_18/T_POW/g
s/_SYMB_19/T_COLON/g
s/_SYMB_20/T_TRUE/g
s/_SYMB_21/T_FALSE/g
s/_SYMB_22/T_NOT/g
s/_SYMB_23/T_BYTE/g
s/_SYMB_24/T_CALL/g
s/_SYMB_25/T_CHAR/g
s/_SYMB_26/T_CLOSE/g
s/_SYMB_27/T_COMM/g
s/_SYMB_28/T_CONT/g
s/_SYMB_29/T_DATA/g
s/_SYMB_30/T_DIMS/g
s/_SYMB_31/T_DO/g
s/_SYMB_32/T_DBL/g
s/_SYMB_33/T_END/g
s/_SYMB_34/T_EQU/g
s/_SYMB_35/T_FMT/g
s/_SYMB_36/T_FUNC/g
s/_SYMB_37/T_GO/g
s/_SYMB_38/T_IF/g
s/_SYMB_39/T_IMPL/g
s/_SYMB_40/T_INT/g
s/_SYMB_41/T_LOGI/g
s/_SYMB_42/T_OPEN/g
s/_SYMB_43/T_PARM/g
s/_SYMB_44/T_READ/g
s/_SYMB_45/T_REAL/g
s/_SYMB_46/T_RTN/g
s/_SYMB_47/T_STOP/g
s/_SYMB_48/T_SUBR/g
s/_SYMB_49/T_TO/g
s/_SYMB_50/T_WRITE/g
s/_SYMB_51/T_NAME/g
s/_SYMB_52/T_SQSTR/g
s/_SYMB_53/T_CFLT/g
s/_SYMB_0/T_NEWLINE/g
s/_SYMB_1/T_LPAREN/g
s/_SYMB_2/T_MINUS/g
s/_SYMB_3/T_RPAREN/g
s/_SYMB_4/T_MULT/g
s/_SYMB_5/T_COMMA/g
s/_SYMB_6/T_EQUALS/g
s/_SYMB_7/T_PLUS/g
s/_SYMB_8/T_DIV/g
s/_SYMB_9/T_DOLLAR/g
Makefile:
The Makefile was also a problem (Makefile.old). It is auto generated by BNFC which is great but it had no way to turn on parser debugging which is rather important when creating a frontend from scratch. Bison will often complain about shift/reduce and reduce/reduce errors and you need Bison’s output file to debug these. So my next little bit of bash script dealt with that … it just rewrites the Makefile with the appropriate flags set.
# ----
# Modify the Makefile to add debug flags so output is more verbose.
cp Makefile Makefile.old
cat Makefile \
| sed "s/-PFortran$/-PFortran --debug/g" \
| sed "s/-pFortran$/-pFortran --debug -r all -g/g" \
> Makefile.new
cp Makefile.new Makefile
# Show user the difference in the Makefiles
echo "--- Makefile ---"
diff Makefile Makefile.old | sed "s/^/ /g"
fi
This then produces the Parser.output file which we use to debug the BNFC input grammar. I’ll explain the process of debugging the grammar when there are reduce/reduce and shift/reduce errors in another post:
_ERROR_
State 177 conflicts: 3 reduce/reduce
State 225 conflicts: 3 reduce/reduce
State 226 conflicts: 3 reduce/reduce
State 227 conflicts: 1 shift/reduce, 3 reduce/reduce
Grammar
0 $accept: Program $end
1 Program: ListLblStm
2 ListLblStm: %empty
3 | ListLblStm LblStm _SYMB_0
4 LblStm: Labeled_stm
5 | Simple_stm
6 | %empty
7 Labeled_stm: _INTEGER_ Simple_stm
8 Simple_stm: _SYMB_39 Type_Spec Type_Qual _SYMB_1 _SYMB_51 _SYMB_2 _SYMB_51 _SYMB_3
9 | _SYMB_43 ListNameValue
10 | _SYMB_30 ListNameDim
11 | Type_Spec Type_Qual ListNameDim
12 | Type_Spec ListNameDim
13 | _SYMB_29 ListDataSeg
14 | _SYMB_27 _SYMB_8 _SYMB_51 _SYMB_8 ListName
15 | _SYMB_50 _SYMB_1 ListAssignName _SYMB_3
16 | _SYMB_50 _SYMB_1 ListAssignName _SYMB_3 ListNameOrArray
17 | _SYMB_35 _SYMB_1 ListFmtSpecs _SYMB_3
18 | _SYMB_44 _SYMB_1 ListAssignName _SYMB_3 ListNameOrArray
19 | _SYMB_44 _SYMB_6 LExp
...
Generated Lexer fixups:
The generated lexer had a few changes that were needed because I was trying to make it conform to what I wanted to do instead of letting it do it’s thing:
" return T_NEWLINE;
<YYINITIAL>"(" return T_LPAREN;
<YYINITIAL>"-" return T_MINUS;
<YYINITIAL>")" return T_RPAREN;
...
<YYINITIAL>"TO" return T_TO;
<YYINITIAL>"WRITE" return T_WRITE;
<YYINITIAL>"//"[^\n]*\n ++yy_mylinenumber; /* BNFC single-line comment */;
<YYINITIAL>\%*{CAPITAL}({CAPITAL}|{DIGIT}|\$|\_)* yylval.string_ = strdup(yytext); return T_NAME;
<YYINITIAL>'.+' yylval.string_ = strdup(yytext); return T_SQSTR;
<YYINITIAL>{DIGIT}+\.{DIGIT}+((e|E)\-?{DIGIT}+)?(f|F)|{DIGIT}+(e|E)\-?{DIGIT}+(f|F) yylval.string_ = strdup(yytext); return T_CFLT;
<YYINITIAL>{DIGIT}+ yylval.int_ = atoi(yytext); return _INTEGER_;
\n ++yy_mylinenumber ;
<YYINITIAL>[ \t\r\n\f] /* ignore white space. */;
<YYINITIAL>. return _ERROR_;
%%
void initialize_lexer(FILE *inp) { yyrestart(inp); BEGIN YYINITIAL; }
First BNFC doesn’t know what to do with a newline character as a token. So you see in the first two lines a broken Flex statement. These two lines were replaced with a line derived from the file ‘l1’:
“\n” { ++yy_mylinenumber; return T_NEWLINE; };
which treats the newline better. The other change is to get rid of the line that counts the line numbers (the ++yy_mylinenumber; line) and remove newlines and form-feeds from the “ignore white space” line. The following bash script segment does this:
cp Fortran.l Fortran.l.old
cat Fortran.l \
| grep -v "^<YYINITIAL>"$" \
| grep -v "^..\ ++yy_mylinenumber.;$" \
| sed "s/^"[ \t]*return T_NEWLINE;$/${l1}/g" \
| sed "s/\\\n\\\f]/]+/g" \
> Fortran.l.new
cp Fortran.l.new Fortran.l
touch Fortran.y
# Show changes to user.
echo "--- Fortran.l ---"
diff Fortran.l Fortran.l.old | sed "s/^/ /g"
This produces the following lexer:
<YYINITIAL>"(" return T_LPAREN;
<YYINITIAL>"-" return T_MINUS;
<YYINITIAL>")" return T_RPAREN;
...
<YYINITIAL>"TO" return T_TO;
<YYINITIAL>"WRITE" return T_WRITE;
<YYINITIAL>"//"[^\n]*\n ++yy_mylinenumber; /* BNFC single-line comment */;
<YYINITIAL>\%*{CAPITAL}({CAPITAL}|{DIGIT}|\$|\_)* yylval.string_ = strdup(yytext); return T_NAME;
<YYINITIAL>'.+' yylval.string_ = strdup(yytext); return T_SQSTR;
<YYINITIAL>{DIGIT}+\.{DIGIT}+((e|E)\-?{DIGIT}+)?(f|F)|{DIGIT}+(e|E)\-?{DIGIT}+(f|F) yylval.string_ = strdup(yytext); return T_CFLT;
<YYINITIAL>{DIGIT}+ yylval.int_ = atoi(yytext); return _INTEGER_;
<YYINITIAL>[ \t\r]+ /* ignore white space. */;
<YYINITIAL>. return _ERROR_;
%%
Code Generation:
The generation of all the Flex/Bison and the pretty printer code is the saving grace. BNFC gets you most of the way to producing a front-end for your compiler and makes life a lot easier if, like me, you tend to forget how to write a Flex and Bison driver file in between doing other coding.
However, if you are new to building a compiler then BNFC may just make things worse by adding another level of complexity to an already complex problem. For this reason it might be best for novice compiler writers to ignore BNFC and read a good book on Flex/Bison and maybe a text book on compiler writing.
In my next post I will describe the code that is generated by BNFC.
Here are all the files (complete) that I’ve talked about above.