Compiler/code generator
A code generator translates the output of the syntax analyzer and/or semantic analyzer into lower level code, either assembly, object, or virtual.
Take the output of the Syntax analyzer task - which is a flattened Abstract Syntax Tree (AST) - and convert it to virtual machine code, that can be run by the Virtual machine interpreter. The output is in text format, and represents virtual assembly code.
The program should read input from a file and/or stdin, and write output to a file and/or stdout.
- Example - given the simple program (below), stored in a file called while.t, create the list of tokens, using one of the Lexical analyzer solutions
lex < while.t > while.lex
- Run one of the Syntax analyzer solutions
parse < while.lex > while.ast
- while.ast can be input into the code generator.
- The following table shows the input to lex, lex output, the AST produced by the parser, and the generated virtual assembly code.
Run as: lex < while.t | parse | gen
Input to lex | Output from lex, input to parse | Output from parse | Output from gen, input to VM |
---|---|---|---|
<lang c>count = 1; while (count < 10) { print("count is: ", count, "\n"); count = count + 1; }</lang> |
1 1 Identifier count 1 7 Op_assign 1 9 Integer 1 1 10 Semicolon 2 1 Keyword_while 2 7 LeftParen 2 8 Identifier count 2 14 Op_less 2 16 Integer 10 2 18 RightParen 2 20 LeftBrace 3 5 Keyword_print 3 10 LeftParen 3 11 String "count is: " 3 23 Comma 3 25 Identifier count 3 30 Comma 3 32 String "\n" 3 36 RightParen 3 37 Semicolon 4 5 Identifier count 4 11 Op_assign 4 13 Identifier count 4 19 Op_add 4 21 Integer 1 4 22 Semicolon 5 1 RightBrace 6 1 End_of_input |
Sequence Sequence ; Assign Identifier count Integer 1 While Less Identifier count Integer 10 Sequence Sequence ; Sequence Sequence Sequence ; Prts String "count is: " ; Prti Identifier count ; Prts String "\n" ; Assign Identifier count Add Identifier count Integer 1 |
Datasize: 1 Strings: 2 "count is: " "\n" 0 push 1 5 store [0] 10 fetch [0] 15 push 10 20 lt 21 jz (43) 65 26 push 0 31 prts 32 fetch [0] 37 prti 38 push 1 43 prts 44 fetch [0] 49 push 1 54 add 55 store [0] 60 jmp (-51) 10 65 halt |
- Input format
As shown in the table, above, the output from the syntax analyzer is a flattened AST.
In the AST, Identifier, Integer, and String, are terminal nodes, e.g, they do not have child nodes.
Loading this data into an internal parse tree should be as simple as:
<lang python> def load_ast()
line = readline() # Each line has at least one token line_list = tokenize the line, respecting double quotes
text = line_list[0] # first token is always the node type
if text == ";" return None
node_type = text # could convert to internal form if desired
# A line with two tokens is a leaf node # Leaf nodes are: Identifier, Integer String # The 2nd token is the value if len(line_list) > 1 return make_leaf(node_type, line_list[1])
left = load_ast() right = load_ast() return make_node(node_type, left, right)
</lang>
- Output format - refer to the table above
- The first line is the header: Size of data, and number of constant strings.
- size of data is the number of 32-bit unique variables used. In this example, one variable, count
- number of constant strings is just that - how many there are
- After that, the constant strings
- Finally, the assembly code
- Registers
- sp: the stack pointer - points to the next top of stack. The stack is a 32-bit integer array.
- pc: the program counter - points to the current instruction to be performed. The code is an array of bytes.
- Data
32-bit integers and strings
- Instructions
Each instruction is one byte. The following instructions also have a 32-bit integer operand:
fetch [index]
where index is an index into the data array.
store [index]
where index is an index into the data array.
push n
where value is a 32-bit integer that will be pushed onto the stack.
jmp (n) addr
where (n) is a 32-bit integer specifying the distance between the current location and the desired location. addr is an unsigned value of the actual code address.
jz (n) addr
where (n) is a 32-bit integer specifying the distance between the current location and the desired location. addr is an unsigned value of the actual code address.
The following instructions do not have an operand. They perform their operation directly against the stack:
For the following instructions, the operation is performed against the top two entries in the stack:
add sub mul div lt gt le ne and
For the following instructions, the operation is performed against the top entry in the stack:
neg
prtc
Print the word at stack top as a character.
prti
Print the word at stack top as an integer.
prts
Stack top points to an index into the string pool. Print that entry.
halt
Unconditional stop.
- Additional examples
Your solution should pass all the test cases above and the additional tests found Here.
The C and Python versions can be considered reference implementations.
- Related Tasks
ALGOL W
<lang algolw>begin % code generator %
% parse tree nodes % record node( integer type ; reference(node) left, right ; integer iValue % nString/nIndentifier number or nInteger value % ); integer nIdentifier, nString, nInteger, nSequence, nIf, nPrtc, nPrts , nPrti, nWhile, nAssign, nNegate, nNot, nMultiply , nDivide, nMod, nAdd, nSubtract, nLess, nLessEqual , nGreater, nGreaterEqual, nEqual, nNotEqual, nAnd, nOr ; string(14) array ndName ( 1 :: 25 ); integer array nOp ( 1 :: 25 ); integer MAX_NODE_TYPE; % string literals and identifiers - uses a linked list - a hash table might be better... % string(1) array text ( 0 :: 4095 ); integer textNext, TEXT_MAX; record textElement ( integer start, length; reference(textElement) next ); reference(textElement) idList, stList; % op codes % integer oFetch, oStore, oPush , oAdd, oSub, oMul, oDiv, oMod, oLt, oGt, oLe, oGe, oEq, oNe , oAnd, oOr, oNeg, oNot, oJmp, oJz, oPrtc, oPrts, oPrti, oHalt ; string(6) array opName ( 1 :: 24 ); % code - although this is intended to be byte code, as we are going to output % % an assembler source, we use integers for convenience % % labelLocations are: ( - referencing location + 1 ) if they have been referenced but not defined yet, % % zero if they are unreferenced and undefined, % % ( referencing location + 1 ) if they are defined % integer array byteCode ( 0 :: 4095 ); integer array labelLocation( 1 :: 4096 ); integer nextLocation, MAX_LOCATION, nextLabelNumber, MAX_LABEL_NUMBER;
% returns a new node with left and right branches % reference(node) procedure opNode ( integer value opType; reference(node) value opLeft, opRight ) ; begin node( opType, opLeft, opRight, 0 ) end opNode ;
% returns a new operand node % reference(node) procedure operandNode ( integer value opType, opValue ) ; begin node( opType, null, null, opValue ) end operandNode ;
% reports an error and stops % procedure genError( string(80) value message ); begin integer errorPos; write( s_w := 0, "**** Code generation error: " ); errorPos := 0; while errorPos < 80 and message( errorPos // 1 ) not = "." do begin writeon( s_w := 0, message( errorPos // 1 ) ); errorPos := errorPos + 1 end while_not_at_end_of_message ; writeon( s_w := 0, "." ); assert( false ) end genError ;
% reads a node from standard input % reference(node) procedure readNode ; begin reference(node) resultNode;
% parses a string from line and stores it in a string in the text array % % - if it is not already present in the specified textElement list. % % returns the position of the string in the text array % integer procedure readString ( reference(textElement) value result txList; string(1) value terminator ) ; begin string(256) str; integer sLen, sPos, ePos; logical found; reference(textElement) txPos, txLastPos; % get the text of the string % str := " "; sLen := 0; str( sLen // 1 ) := line( lPos // 1 ); sLen := sLen + 1; lPos := lPos + 1; while lPos <= 255 and line( lPos // 1 ) not = terminator do begin str( sLen // 1 ) := line( lPos // 1 ); sLen := sLen + 1; lPos := lPos + 1 end while_more_string ; if lPos > 255 then genError( "Unterminated String in node file." ); % attempt to find the text in the list of strings/identifiers % txLastPos := txPos := txList; found := false; ePos := 0; while not found and txPos not = null do begin ePos := ePos + 1; found := ( length(txPos) = sLen ); sPos := 0; while found and sPos < sLen do begin found := str( sPos // 1 ) = text( start(txPos) + sPos ); sPos := sPos + 1 end while_not_found ; txLastPos := txPos; if not found then txPos := next(txPos) end while_string_not_found ; if not found then begin % the string/identifier is not in the list - add it % ePos := ePos + 1; if txList = null then txList := textElement( textNext, sLen, null ) else next(txLastPos) := textElement( textNext, sLen, null ); if textNext + sLen > TEXT_MAX then genError( "Text space exhausted." ) else begin for cPos := 0 until sLen - 1 do begin text( textNext ) := str( cPos // 1 ); textNext := textNext + 1 end for_cPos end end if_not_found ; ePos end readString ;
% gets an integer from the line - no checks for valid digits % integer procedure readInteger ; begin integer n; n := 0; while line( lPos // 1 ) not = " " do begin n := ( n * 10 ) + ( decode( line( lPos // 1 ) ) - decode( "0" ) ); lPos := lPos + 1 end while_not_end_of_integer ; n end readInteger ;
string(256) line; string(16) name; integer lPos, tPos, ndType; tPos := lPos := 0; readcard( line ); % get the node type name % while line( lPos // 1 ) = " " do lPos := lPos + 1; name := ""; while lPos < 256 and line( lPos // 1 ) not = " " do begin name( tPos // 1 ) := line( lPos // 1 ); lPos := lPos + 1; tPos := tPos + 1 end while_more_name ; % determine the node type % ndType := 1; resultNode := null; if name not = ";" then begin % not a null node % while ndType <= MAX_NODE_TYPE and name not = ndName( ndType ) do ndType := ndType + 1; if ndType > MAX_NODE_TYPE then genError( "Malformed node." ); % handle the additional parameter for identifier/string/integer, or sub-nodes for operator nodes % if ndType = nInteger or ndType = nIdentifier or ndType = nString then begin while line( lPos // 1 ) = " " do lPos := lPos + 1; if ndType = nInteger then resultNode := operandNode( ndType, readInteger ) else if ndType = nIdentifier then resultNode := operandNode( ndType, readString( idList, " " ) ) else % ndType = nString % resultNode := operandNode( ndType, readString( stList, """" ) ) end else begin % operator node % reference(node) leftNode; leftNode := readNode; resultNode := opNode( ndType, leftNode, readNode ) end end if_non_null_node ; resultNode end readNode ;
% returns the next free label number % integer procedure newLabel ; begin nextLabelNumber := nextLabelNumber + 1; if nextLabelNumber > MAX_LABEL_NUMBER then genError( "Program too complex" ); nextLabelNumber end newLabel ;
% defines the specified label to be at the next location % procedure defineLabel ( integer value labelNumber ) ; begin if labelLocation( labelNumber ) > 0 then genError( "Label already defined" ) else begin % this is the first definition of the label, define it and if it has already been referenced, fill in the reference % integer currValue; currValue := labelLocation( labelNumber ); labelLocation( labelNumber ) := nextLocation + 1; % we store pc + 1 to ensure the label location is positive % if currValue < 0 then % already referenced % byteCode( - ( currValue + 1 ) ) := labelLocation( labelNumber ) end end defineLabel ;
% stores a byte in the code % procedure genByte ( integer value byteValue ) ; begin if nextLocation > MAX_LOCATION then genError( "Program too large" ); byteCode( nextLocation ) := byteValue; nextLocation := nextLocation + 1 end genByte ;
% stores an integer in the code % procedure genInteger ( integer value integerValue ) ; begin % we are storing the bytes of the code in separate integers for convenience % genByte( integerValue ); genByte( 0 ); genByte( 0 ); genByte( 0 ) end genInteger ;
% generates an operation acting on an address % procedure genDataOp ( integer value opCode, address ) ; begin genByte( opCode ); genInteger( address ) end genDataOp ;
% generates a nullary operation % procedure genOp0 ( integer value opCode ) ; begin genByte( opCode ) end genOp0 ;
% generates a unary/binary operation % procedure genOp ( reference(node) value n ) ; begin gen( left(n) ); gen( right(n) ); % right will be null for a unary op so no code will be generated % genByte( nOp( type(n) ) ) end genOp ;
% generates a jump operation % procedure genJump ( integer value opCode, labelNumber ) ; begin genByte( opCode ); % if the label is not defined yet - set it's location to the negative of the referencing location % % so it can be resolved later % if labelLocation( labelNumber ) = 0 then labelLocation( labelNumber ) := - ( nextLocation + 1 ); genInteger( labelLocation( labelNumber ) ) end genJump ;
% generates code for the node n % procedure gen ( reference(node) value n ) ; begin
if n = null then % empty node % begin end else if type(n) = nIdentifier then genDataOp( oFetch, iValue(n) ) else if type(n) = nString then genDataOp( oPush, iValue(n) - 1 ) else if type(n) = nInteger then genDataOp( oPush, iValue(n) ) else if type(n) = nSequence then begin gen( left(n) ); gen( right(n) ) end else if type(n) = nIf then % if-else % begin integer elseLabel; elseLabel := newLabel; gen( left(n) ); genJump( oJz, elseLabel ); gen( left( right(n) ) ); if right(right(n)) = null then % no "else" part % defineLabel( elseLabel ) else begin % have an "else" part % integer endIfLabel; endIfLabel := newLabel; genJump( oJmp, endIfLabel ); defineLabel( elseLabel ); gen( right(right(n)) ); defineLabel( endIfLabel ) end end else if type(n) = nWhile then % while-loop % begin integer loopLabel, exitLabel; loopLabel := newLabel; exitLabel := newLabel; defineLabel( loopLabel ); gen( left(n) ); genJump( oJz, exitLabel ); gen( right(n) ); genJump( oJmp, loopLabel ); defineLabel( exitLabel ) end else if type(n) = nAssign then % assignment % begin gen( right( n ) ); genDataOp( oStore, iValue(left(n)) ) end else genOp( n ) end gen ;
% outputs the generated code to standard output % procedure emitCode ; begin
% counts the number of elements in a text element list % integer procedure countElements ( reference(textElement) value txHead ) ; begin integer count; reference(textElement) txPos; count := 0; txPos := txHead; while txPos not = null do begin count := count + 1; txPos := next(txPos) end while_txPos_not_null ; count end countElements ;
integer pc, op; reference(textElement) txPos;
% code header % write( i_w := 1, s_w := 0 , "Datasize: ", countElements( idList ) , " Strings: ", countElements( stList ) ); % output the string literals % txPos := stList; while txPos not = null do begin integer cPos; write( """" ); cPos := 1; % start from 1 to skip over the leading " % while cPos < length(txPos) do begin writeon( s_w := 0, text( start(txPos) + cPos ) ); cPos := cPos + 1 end while_not_end_of_string ; writeon( s_w := 0, """" ); txPos := next(txPos) end while_not_at_end_of_literals ;
% code body % pc := 0; while pc < nextLocation do begin op := byteCode( pc ); write( i_w := 4, s_w := 0, pc, " ", opName( op ) ); pc := pc + 1; if op = oFetch or op = oStore then begin % data load/store - add the address in square brackets % writeon( i_w := 1, s_w := 0, "[", byteCode( pc ) - 1, "]" ); pc := pc + 4 end else if op = oPush then begin % push constant - add the constant % writeon( i_w := 1, s_w := 0, byteCode( pc ) ); pc := pc + 4 end else if op = oJmp or op = oJz then begin % jump - show the relative address in brackets and the absolute address % writeon( i_w := 1, s_w := 0, "(", ( byteCode( pc ) - 1 ) - pc, ") ", byteCode( pc ) - 1 ); pc := pc + 4 end end while_pc_lt_nextLocation end emitCode ;
oFetch := 1; opName( oFetch ) := "fetch"; oStore := 2; opName( oStore ) := "store"; oPush := 3; opName( oPush ) := "push"; oAdd := 4; opName( oAdd ) := "add"; oSub := 5; opName( oSub ) := "sub"; oMul := 6; opName( oMul ) := "mul"; oDiv := 7; opName( oDiv ) := "div"; oMod := 8; opName( oMod ) := "mod"; oLt := 9; opName( oLt ) := "lt"; oGt := 10; opName( oGt ) := "gt"; oLe := 11; opName( oLe ) := "le"; oGe := 12; opName( oGe ) := "ge"; oEq := 13; opName( oEq ) := "eq"; oNe := 14; opName( oNe ) := "ne"; oAnd := 15; opName( oAnd ) := "and"; oOr := 16; opName( oOr ) := "or"; oNeg := 17; opName( oNeg ) := "neg"; oNot := 18; opName( oNot ) := "not"; oJmp := 19; opName( oJmp ) := "jmp"; oJz := 20; opName( oJz ) := "jz"; oPrtc := 21; opName( oPrtc ) := "prtc"; oPrts := 22; opName( oPrts ) := "prts"; oPrti := 23; opName( oPrti ) := "prti"; oHalt := 24; opName( oHalt ) := "halt";
nIdentifier := 1; ndName( nIdentifier ) := "Identifier"; nString := 2; ndName( nString ) := "String"; nInteger := 3; ndName( nInteger ) := "Integer"; nSequence := 4; ndName( nSequence ) := "Sequence"; nIf := 5; ndName( nIf ) := "If"; nPrtc := 6; ndName( nPrtc ) := "Prtc"; nPrts := 7; ndName( nPrts ) := "Prts"; nPrti := 8; ndName( nPrti ) := "Prti"; nWhile := 9; ndName( nWhile ) := "While"; nAssign := 10; ndName( nAssign ) := "Assign"; nNegate := 11; ndName( nNegate ) := "Negate"; nNot := 12; ndName( nNot ) := "Not"; nMultiply := 13; ndName( nMultiply ) := "Multiply"; nDivide := 14; ndName( nDivide ) := "Divide"; nMod := 15; ndName( nMod ) := "Mod"; nAdd := 16; ndName( nAdd ) := "Add"; nSubtract := 17; ndName( nSubtract ) := "Subtract"; nLess := 18; ndName( nLess ) := "Less"; nLessEqual := 19; ndName( nLessEqual ) := "LessEqual"; nGreater := 20; ndName( nGreater ) := "Greater"; nGreaterEqual := 21; ndName( nGreaterEqual ) := "GreaterEqual"; nEqual := 22; ndName( nEqual ) := "Equal"; nNotEqual := 23; ndName( nNotEqual ) := "NotEqual"; nAnd := 24; ndName( nAnd ) := "And"; nOr := 25; ndName( nOr ) := "Or"; MAX_NODE_TYPE := 25; TEXT_MAX := 4095; textNext := 0; stList := idList := null; for nPos := 1 until MAX_NODE_TYPE do nOp( nPos ) := -1; nOp( nPrtc ) := oPrtc; nOp( nPrts ) := oPrts; nOp( nPrti ) := oPrti; nOp( nNegate ) := oNeg; nOp( nNot ) := oNot; nOp( nMultiply ) := oMul; nOp( nDivide ) := oDiv; nOp( nMod ) := oMod; nOp( nAdd ) := oAdd; nOp( nSubtract ) := oSub; nOp( nLess ) := oLt; nOp( nLessEqual ) := oLe; nOp( nGreater ) := oGt; nOp( nGreaterEqual ) := oGe; nOp( nEqual ) := oEq; nOp( nNotEqual ) := oNe; nOp( nAnd ) := oAnd; nOp( nOr ) := oOr; nextLocation := 0; MAX_LOCATION := 4095; for pc := 0 until MAX_LOCATION do byteCode( pc ) := 0; nextLabelNumber := 0; MAX_LABEL_NUMBER := 4096; for lPos := 1 until MAX_LABEL_NUMBER do labelLocation( lPos ) := 0;
% parse the output from the syntax analyser and generate code from the parse tree % gen( readNode ); genOp0( oHalt ); emitCode
end.</lang>
- Output:
The While Counter example
Datasize: 1 Strings: 2 "count is: " "\n" 0 push 1 5 store [0] 10 fetch [0] 15 push 10 20 lt 21 jz (43) 65 26 push 0 31 prts 32 fetch [0] 37 prti 38 push 1 43 prts 44 fetch [0] 49 push 1 54 add 55 store [0] 60 jmp (-51) 10 65 halt
C
Tested with gcc 4.81 and later, compiles warning free with -Wall -Wextra <lang C>#include <stdlib.h>
- include <stdio.h>
- include <string.h>
- include <stdarg.h>
- include <stdint.h>
- include <ctype.h>
typedef unsigned char uchar;
typedef enum {
nd_Ident, nd_String, nd_Integer, nd_Sequence, nd_If, nd_Prtc, nd_Prts, nd_Prti, nd_While, nd_Assign, nd_Negate, nd_Not, nd_Mul, nd_Div, nd_Mod, nd_Add, nd_Sub, nd_Lss, nd_Leq, nd_Gtr, nd_Geq, nd_Eql, nd_Neq, nd_And, nd_Or
} NodeType;
typedef enum { FETCH, STORE, PUSH, ADD, SUB, MUL, DIV, MOD, LT, GT, LE, GE, EQ, NE, AND,
OR, NEG, NOT, JMP, JZ, PRTC, PRTS, PRTI, HALT
} Code_t;
typedef uchar code;
typedef struct Tree {
NodeType node_type; struct Tree *left; struct Tree *right; char *value;
} Tree;
- define da_dim(name, type) type *name = NULL; \
int _qy_ ## name ## _p = 0; \ int _qy_ ## name ## _max = 0
- define da_redim(name) do {if (_qy_ ## name ## _p >= _qy_ ## name ## _max) \
name = realloc(name, (_qy_ ## name ## _max += 32) * sizeof(name[0]));} while (0)
- define da_rewind(name) _qy_ ## name ## _p = 0
- define da_append(name, x) do {da_redim(name); name[_qy_ ## name ## _p++] = x;} while (0)
- define da_len(name) _qy_ ## name ## _p
- define da_add(name) do {da_redim(name); _qy_ ## name ## _p++;} while (0)
FILE *source_fp, *dest_fp; static int here; da_dim(object, code); da_dim(globals, const char *); da_dim(string_pool, const char *);
// dependency: Ordered by NodeType, must remain in same order as NodeType enum struct {
char *enum_text; NodeType node_type; Code_t opcode;
} atr[] = {
{"Identifier" , nd_Ident, -1 }, {"String" , nd_String, -1 }, {"Integer" , nd_Integer, -1 }, {"Sequence" , nd_Sequence, -1 }, {"If" , nd_If, -1 }, {"Prtc" , nd_Prtc, -1 }, {"Prts" , nd_Prts, -1 }, {"Prti" , nd_Prti, -1 }, {"While" , nd_While, -1 }, {"Assign" , nd_Assign, -1 }, {"Negate" , nd_Negate, NEG}, {"Not" , nd_Not, NOT}, {"Multiply" , nd_Mul, MUL}, {"Divide" , nd_Div, DIV}, {"Mod" , nd_Mod, MOD}, {"Add" , nd_Add, ADD}, {"Subtract" , nd_Sub, SUB}, {"Less" , nd_Lss, LT }, {"LessEqual" , nd_Leq, LE }, {"Greater" , nd_Gtr, GT }, {"GreaterEqual", nd_Geq, GE }, {"Equal" , nd_Eql, EQ }, {"NotEqual" , nd_Neq, NE }, {"And" , nd_And, AND}, {"Or" , nd_Or, OR },
};
void error(const char *fmt, ... ) {
va_list ap; char buf[1000];
va_start(ap, fmt); vsprintf(buf, fmt, ap); va_end(ap); printf("error: %s\n", buf); exit(1);
}
Code_t type_to_op(NodeType type) {
return atr[type].opcode;
}
Tree *make_node(NodeType node_type, Tree *left, Tree *right) {
Tree *t = calloc(sizeof(Tree), 1); t->node_type = node_type; t->left = left; t->right = right; return t;
}
Tree *make_leaf(NodeType node_type, char *value) {
Tree *t = calloc(sizeof(Tree), 1); t->node_type = node_type; t->value = strdup(value); return t;
}
/*** Code generator ***/
void emit_byte(int c) {
da_append(object, (uchar)c); ++here;
}
void emit_int(int32_t n) {
union { int32_t n; unsigned char c[sizeof(int32_t)]; } x;
x.n = n;
for (size_t i = 0; i < sizeof(x.n); ++i) { emit_byte(x.c[i]); }
}
int hole() {
int t = here; emit_int(0); return t;
}
void fix(int src, int dst) {
*(int32_t *)(object + src) = dst-src;
}
int fetch_var_offset(const char *id) {
for (int i = 0; i < da_len(globals); ++i) { if (strcmp(id, globals[i]) == 0) return i; } da_add(globals); int n = da_len(globals) - 1; globals[n] = strdup(id); return n;
}
int fetch_string_offset(const char *st) {
for (int i = 0; i < da_len(string_pool); ++i) { if (strcmp(st, string_pool[i]) == 0) return i; } da_add(string_pool); int n = da_len(string_pool) - 1; string_pool[n] = strdup(st); return n;
}
void code_gen(Tree *x) {
int p1, p2, n;
if (x == NULL) return; switch (x->node_type) { case nd_Ident: emit_byte(FETCH); n = fetch_var_offset(x->value); emit_int(n); break; case nd_Integer: emit_byte(PUSH); emit_int(atoi(x->value)); break; case nd_String: emit_byte(PUSH); n = fetch_string_offset(x->value); emit_int(n); break; case nd_Assign: n = fetch_var_offset(x->left->value); code_gen(x->right); emit_byte(STORE); emit_int(n); break; case nd_If: code_gen(x->left); // if expr emit_byte(JZ); // if false, jump p1 = hole(); // make room for jump dest code_gen(x->right->left); // if true statements if (x->right->right != NULL) { emit_byte(JMP); p2 = hole(); } fix(p1, here); if (x->right->right != NULL) { code_gen(x->right->right); fix(p2, here); } break; case nd_While: p1 = here; code_gen(x->left); // while expr emit_byte(JZ); // if false, jump p2 = hole(); // make room for jump dest code_gen(x->right); // statements emit_byte(JMP); // back to the top fix(hole(), p1); // plug the top fix(p2, here); // plug the 'if false, jump' break; case nd_Sequence: code_gen(x->left); code_gen(x->right); break; case nd_Prtc: code_gen(x->left); emit_byte(PRTC); break; case nd_Prti: code_gen(x->left); emit_byte(PRTI); break; case nd_Prts: code_gen(x->left); emit_byte(PRTS); break; case nd_Lss: case nd_Gtr: case nd_Leq: case nd_Geq: case nd_Eql: case nd_Neq: case nd_And: case nd_Or: case nd_Sub: case nd_Add: case nd_Div: case nd_Mul: case nd_Mod: code_gen(x->left); code_gen(x->right); emit_byte(type_to_op(x->node_type)); break; case nd_Negate: case nd_Not: code_gen(x->left); emit_byte(type_to_op(x->node_type)); break; default: error("error in code generator - found %d, expecting operator\n", x->node_type); }
}
void code_finish() {
emit_byte(HALT);
}
void list_code() {
fprintf(dest_fp, "Datasize: %d Strings: %d\n", da_len(globals), da_len(string_pool)); for (int i = 0; i < da_len(string_pool); ++i) fprintf(dest_fp, "%s\n", string_pool[i]);
code *pc = object;
again: fprintf(dest_fp, "%5d ", (int)(pc - object)); switch (*pc++) { case FETCH: fprintf(dest_fp, "fetch [%d]\n", *(int32_t *)pc); pc += sizeof(int32_t); goto again; case STORE: fprintf(dest_fp, "store [%d]\n", *(int32_t *)pc); pc += sizeof(int32_t); goto again; case PUSH : fprintf(dest_fp, "push %d\n", *(int32_t *)pc); pc += sizeof(int32_t); goto again; case ADD : fprintf(dest_fp, "add\n"); goto again; case SUB : fprintf(dest_fp, "sub\n"); goto again; case MUL : fprintf(dest_fp, "mul\n"); goto again; case DIV : fprintf(dest_fp, "div\n"); goto again; case MOD : fprintf(dest_fp, "mod\n"); goto again; case LT : fprintf(dest_fp, "lt\n"); goto again; case GT : fprintf(dest_fp, "gt\n"); goto again; case LE : fprintf(dest_fp, "le\n"); goto again; case GE : fprintf(dest_fp, "ge\n"); goto again; case EQ : fprintf(dest_fp, "eq\n"); goto again; case NE : fprintf(dest_fp, "ne\n"); goto again; case AND : fprintf(dest_fp, "and\n"); goto again; case OR : fprintf(dest_fp, "or\n"); goto again; case NOT : fprintf(dest_fp, "not\n"); goto again; case NEG : fprintf(dest_fp, "neg\n"); goto again; case JMP : fprintf(dest_fp, "jmp (%d) %d\n", *(int32_t *)pc, (int32_t)(pc + *(int32_t *)pc - object)); pc += sizeof(int32_t); goto again; case JZ : fprintf(dest_fp, "jz (%d) %d\n", *(int32_t *)pc, (int32_t)(pc + *(int32_t *)pc - object)); pc += sizeof(int32_t); goto again; case PRTC : fprintf(dest_fp, "prtc\n"); goto again; case PRTI : fprintf(dest_fp, "prti\n"); goto again; case PRTS : fprintf(dest_fp, "prts\n"); goto again; case HALT : fprintf(dest_fp, "halt\n"); break; default:error("listcode:Unknown opcode %d\n", *(pc - 1)); }
}
void init_io(FILE **fp, FILE *std, const char mode[], const char fn[]) {
if (fn[0] == '\0') *fp = std; else if ((*fp = fopen(fn, mode)) == NULL) error(0, 0, "Can't open %s\n", fn);
}
NodeType get_enum_value(const char name[]) {
for (size_t i = 0; i < sizeof(atr) / sizeof(atr[0]); i++) { if (strcmp(atr[i].enum_text, name) == 0) { return atr[i].node_type; } } error("Unknown token %s\n", name); return -1;
}
char *read_line(int *len) {
static char *text = NULL; static int textmax = 0;
for (*len = 0; ; (*len)++) { int ch = fgetc(source_fp); if (ch == EOF || ch == '\n') { if (*len == 0) return NULL; break; } if (*len + 1 >= textmax) { textmax = (textmax == 0 ? 128 : textmax * 2); text = realloc(text, textmax); } text[*len] = ch; } text[*len] = '\0'; return text;
}
char *rtrim(char *text, int *len) { // remove trailing spaces
for (; *len > 0 && isspace(text[*len - 1]); --(*len)) ;
text[*len] = '\0'; return text;
}
Tree *load_ast() {
int len; char *yytext = read_line(&len); yytext = rtrim(yytext, &len);
// get first token char *tok = strtok(yytext, " ");
if (tok[0] == ';') { return NULL; } NodeType node_type = get_enum_value(tok);
// if there is extra data, get it char *p = tok + strlen(tok); if (p != &yytext[len]) { for (++p; isspace(*p); ++p) ; return make_leaf(node_type, p); }
Tree *left = load_ast(); Tree *right = load_ast(); return make_node(node_type, left, right);
}
int main(int argc, char *argv[]) {
init_io(&source_fp, stdin, "r", argc > 1 ? argv[1] : ""); init_io(&dest_fp, stdout, "wb", argc > 2 ? argv[2] : "");
code_gen(load_ast()); code_finish(); list_code();
return 0;
}</lang>
- Output — While counter example:
Datasize: 1 Strings: 2 "count is: " "\n" 0 push 1 5 store [0] 10 fetch [0] 15 push 10 20 lt 21 jz (43) 65 26 push 0 31 prts 32 fetch [0] 37 prti 38 push 1 43 prts 44 fetch [0] 49 push 1 54 add 55 store [0] 60 jmp (-51) 10 65 halt
Phix
Reusing parse.e from the Syntax Analyzer task
Deviates somewhat from the task specification in that it generates executable machine code.
<lang Phix>--
-- demo\rosetta\Compiler\cgen.e
-- ============================
--
-- The reusable part of cgen.exw
--
include parse.e
global sequence vars = {},
strings = {}, stringptrs = {}
global integer chain = 0 global sequence code = {}
function var_idx(sequence inode)
if inode[1]!=tk_Identifier then ?9/0 end if string ident = inode[2] integer n = find(ident,vars) if n=0 then vars = append(vars,ident) n = length(vars) end if return n
end function
function string_idx(sequence inode)
if inode[1]!=tk_String then ?9/0 end if string s = inode[2] integer n = find(s,strings) if n=0 then strings = append(strings,s) stringptrs = append(stringptrs,0) n = length(strings) end if return n
end function
function gen_size(object t) -- note: must be kept precisely in sync with gen_rec! -- (relentlessly tested via estsize/actsize) integer size = 0
if t!=NULL then integer n_type = t[1] string node_type = tkNames[n_type] switch n_type do case tk_Sequence: size += gen_size(t[2]) size += gen_size(t[3]) case tk_assign: size += gen_size(t[3])+6 case tk_Integer: size += 5 case tk_Identifier: size += 6 case tk_String: size += 5 case tk_while: -- emit: @@:<condition><topjmp(@f)><body><tailjmp(@b)>@@: size += gen_size(t[2])+3 integer body = gen_size(t[3]) integer stail = iff(size+body+2>128?5:2) integer stop = iff(body+stail >127?6:2) size += stop+body+stail case tk_lt: case tk_le: case tk_ne: case tk_eq: case tk_gt: case tk_ge: size += gen_size(t[2]) size += gen_size(t[3]) size += 10 case tk_add: case tk_and: case tk_sub: size += gen_size(t[2]) size += gen_size(t[3]) size += 4 case tk_mul: size += gen_size(t[2]) size += gen_size(t[3]) size += 5 case tk_div: case tk_mod: size += gen_size(t[2]) size += gen_size(t[3]) size += 6 case tk_putc: case tk_Printi: case tk_Prints: size += gen_size(t[2]) size += 5 case tk_if: size += gen_size(t[2])+3 if t[3][1]!=tk_if then ?9/0 end if integer truesize = gen_size(t[3][2]) integer falsesize = gen_size(t[3][3]) integer elsejmp = iff(falsesize=0?0:iff(falsesize>127?5:2)) integer mainjmp = iff(truesize+elsejmp>127?6:2) size += mainjmp+truesize+elsejmp+falsesize case tk_not: size += gen_size(t[2]) size += 9 case tk_neg: size += gen_size(t[2]) size += 4 else: ?9/0 end switch end if return size
end function
procedure gen_rec(object t) -- the recursive part of code_gen
if t!=NULL then integer initsize = length(code) integer estsize = gen_size(t) -- (test the gen_size function) integer n_type = t[1] string node_type = tkNames[n_type] switch n_type do case tk_Sequence: gen_rec(t[2]) gen_rec(t[3]) case tk_assign: integer n = var_idx(t[2]) gen_rec(t[3]) code &= {0o217,0o005,chain,1,n,0} -- pop [i] chain = length(code)-3 case tk_Integer: integer n = t[2] code &= 0o150&int_to_bytes(n) -- push imm32 case tk_while: -- emit: @@:<condition><topjmp(@f)><body><tailjmp(@b)>@@: integer looptop = length(code) gen_rec(t[2]) code &= {0o130, -- pop eax 0o205,0o300} -- test eax,eax integer bodysize = gen_size(t[3]) -- can we use short jumps? -- disclaimer: size calcs are not heavily tested; if in -- doubt reduce 128/7 by 8, and if that works -- then yep, you just found a boundary case. integer stail = iff(length(code)+bodysize+4-looptop>128?5:2) integer offset = bodysize+stail integer stop = iff(offset>127?6:2) if stop=2 then code &= {0o164,offset} -- jz (short) end else code &= {0o017,0o204}&int_to_bytes(offset) -- jz (long) end end if gen_rec(t[3]) offset = looptop-(length(code)+stail) if stail=2 then code &= 0o353&offset -- jmp looptop (short) else code &= 0o351&int_to_bytes(offset) -- jmp looptop (long) end if case tk_lt: case tk_le: case tk_gt: case tk_ge: case tk_ne: case tk_eq: gen_rec(t[2]) gen_rec(t[3]) integer xrm if n_type=tk_ne then xrm = 0o225 -- (#95) elsif n_type=tk_lt then xrm = 0o234 -- (#9C) elsif n_type=tk_ge then xrm = 0o235 -- (#9D) elsif n_type=tk_le then xrm = 0o236 -- (#9E) elsif n_type=tk_gt then xrm = 0o237 -- (#9F) else ?9/0 end if code &= { 0o061,0o300, -- xor eax,eax 0o132, -- pop edx 0o131, -- pop ecx 0o071,0o321, -- cmp ecx,edx 0o017,xrm,0o300, -- setcc al 0o120} -- push eax case tk_add: case tk_or: case tk_and: case tk_sub: gen_rec(t[2]) gen_rec(t[3]) integer op = find(n_type,{tk_add,tk_or,0,0,tk_and,tk_sub}) op = 0o001 + (op-1)*0o010 code &= { 0o130, -- pop eax op,0o004,0o044} -- add/or/and/sub [esp],eax case tk_mul: gen_rec(t[2]) gen_rec(t[3]) code &= { 0o131, -- pop ecx 0o130, -- pop eax 0o367,0o341, -- mul ecx 0o120} -- push eax case tk_div: case tk_mod: gen_rec(t[2]) gen_rec(t[3]) integer push = 0o120+(n_type=tk_mod)*2 code &= { 0o131, -- pop ecx 0o130, -- pop eax 0o231, -- cdq (eax -> edx:eax) 0o367,0o371, -- idiv ecx push} -- push eax|edx case tk_Identifier: integer n = var_idx(t) code &= {0o377,0o065,chain,1,n,0} -- push [n] chain = length(code)-3 case tk_putc: case tk_Printi: case tk_Prints: gen_rec(t[2]) integer n = find(n_type,{tk_putc,tk_Printi,tk_Prints}) code &= {0o350,chain,3,n,0} -- call :printc/i/s chain = length(code)-3 case tk_String: integer n = string_idx(t) code &= {0o150,chain,2,n,0} -- push RawStringPtr(string) chain = length(code)-3 case tk_if: -- emit: <condition><mainjmp><truepart>[<elsejmp><falsepart>] gen_rec(t[2]) code &= {0o130, -- pop eax 0o205,0o300} -- test eax,eax if t[3][1]!=tk_if then ?9/0 end if integer truesize = gen_size(t[3][2]) integer falsesize = gen_size(t[3][3]) integer elsejmp = iff(falsesize=0?0:iff(falsesize>127?5:2)) integer offset = truesize+elsejmp integer mainjmp = iff(offset>127?6:2) if mainjmp=2 then code &= {0o164,offset} -- jz (short) else/end else code &= {0o017,0o204}&int_to_bytes(offset) -- jz (long) else/end end if gen_rec(t[3][2]) if falsesize!=0 then offset = falsesize if elsejmp=2 then code &= 0o353&offset -- jmp end if (short) else code &= 0o351&int_to_bytes(offset) -- jmp end if (long) end if gen_rec(t[3][3]) end if case tk_not: gen_rec(t[2]) code &= {0o132, -- pop edx 0o061,0o300, -- xor eax,eax 0o205,0o322, -- test edx,edx 0o017,0o224,0o300, -- setz al 0o120} -- push eax case tk_neg: gen_rec(t[2]) code &= {0o130, -- pop eax 0o367,0o330, -- neg eax 0o120} -- push eax else: error("error in code generator - found %d, expecting operator\n", {n_type}) end switch integer actsize = length(code) if initsize+estsize!=actsize then ?"9/0" end if -- (test gen_size) end if
end procedure
global procedure code_gen(object t) -- -- Generates proper machine code. -- -- Example: i=10; print "\n"; print i; print "\n" -- Result in vars, strings, chain, code (declared above) -- where vars is: {"i"}, -- strings is {"\n"}, -- code is { 0o150,#0A,#00,#00,#00, -- 1: push 10 -- 0o217,0o005,0,1,1,0 -- 6: pop [i] -- 0o150,8,2,1,0, -- 12: push ("\n") -- 0o350,13,3,3,0, -- 17: call :prints -- 0o377,0o065,18,1,1,0, -- 22: push [i] -- 0o350,24,3,2,0, -- 28: call :printi -- 0o150,29,2,1,0, -- 33: push ("\n") -- 0o350,34,3,3,0, -- 38: call :prints -- 0o303} -- 43: ret -- and chain is 39 (->34->29->24->18->13->8->0) -- The chain connects all places where we need an actual address before -- the code is executed, with the byte after the link differentiating -- between var(1), string(2), and builtin(3), and the byte after that -- determining the instance of the given type - not that any of them -- are actually limited to a byte in the above intermediate form, and -- of course the trailing 0 of each {link,type,id,0} is just there to -- reserve the space we will need. --
gen_rec(t) code = append(code,0o303) -- ret (0o303=#C3)
end procedure
include builtins/VM/puts1.e -- low-level console i/o routines
function setbuiltins() atom printc,printi,prints
#ilASM{ jmp :setbuiltins ::printc lea edi,[esp+4] mov esi,1 call :%puts1ediesi -- (edi=raw text, esi=length) ret 4 ::printi mov eax,[esp+4] push 0 -- no cr call :%putsint -- (nb limited to +/-9,999,999,999) ret 4 ::prints mov edi,[esp+4] mov esi,[edi-12] call :%puts1ediesi -- (edi=raw text, esi=length) ret 4 ::setbuiltins mov eax,:printc lea edi,[printc] call :%pStoreMint mov eax,:printi lea edi,[printi] call :%pStoreMint mov eax,:prints lea edi,[prints] call :%pStoreMint } return {printc,printi,prints}
end function
global constant builtin_names = {"printc","printi","prints"} global constant builtins = setbuiltins()
global atom var_mem, code_mem
function RawStringPtr(integer n) -- (based on IupRawStringPtr from pGUI.e) -- -- Returns a raw string pointer for s, somewhat like allocate_string(s), but using the existing memory. -- NOTE: The return is only valid as long as the value passed as the parameter remains in existence. -- atom res
string s = strings[n] #ilASM{ mov eax,[s] lea edi,[res] shl eax,2 call :%pStoreMint } stringptrs[n] = res return res
end function
global procedure fixup()
var_mem = allocate(length(vars)*4) mem_set(var_mem,0,length(vars)*4) code_mem = allocate(length(code)) poke(code_mem,code) while chain!=0 do integer this = chain chain = code[this] integer ftype = code[this+1] integer id = code[this+2] switch ftype do case 1: -- vars poke4(code_mem+this-1,var_mem+(id-1)*4) case 2: -- strings poke4(code_mem+this-1,RawStringPtr(id)) case 3: -- builtins poke4(code_mem+this-1,builtins[id]-(code_mem+this+3)) end switch end while
end procedure</lang> And a simple test driver for the specific task: <lang Phix>-- -- demo\rosetta\Compiler\cgen.exw -- ============================== -- -- Generates 32-bit machine code (see note in vm.exw) --
include cgen.e
function get_var_name(atom addr)
integer n = (addr-var_mem)/4+1 if n<1 or n>length(vars) then ?9/0 end if return vars[n]
end function
function hxl(integer pc, object oh, string fmt, sequence args={}) -- helper routine to display the octal/hex bytes just decoded, -- along with the code offset and the human-readable text.
if length(args) then fmt = sprintf(fmt,args) end if sequence octhex = {} atom base = code_mem+pc integer len = 0 if integer(oh) then -- all octal for i=1 to oh do octhex = append(octhex,sprintf("0o%03o",peek(base))) base += 1 end for len = oh else -- some octal and some hex for i=1 to length(oh) by 2 do for j=1 to oh[i] do octhex = append(octhex,sprintf("0o%03o",peek(base))) base += 1 end for len += oh[i] for j=1 to oh[i+1] do octhex = append(octhex,sprintf("#%02x",peek(base))) base += 1 end for len += oh[i+1] end for end if printf(output_file,"%4d: %-30s %s\n",{pc+1,join(octhex,","),fmt}) return len
end function
constant cccodes = {"o?" ,"no?","b?" ,"ae?","z" ,"ne" ,"be?","a?", -- 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 ,
"s?" ,"ns?","pe?","po?","l" ,"ge" ,"le" ,"g" }
-- 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15
constant regs = {"eax","ecx","edx"} -- (others as/when needed)
procedure decode() -- for a much more complete (and better organised) disassembler, see p2asm.e integer pc = 0, -- nb 0-based
opcode, xrm
while pc<length(code) do opcode = peek(code_mem+pc) xrm = -1 switch opcode do case 0o150: atom vaddr = peek4s(code_mem+pc+1) integer n = find(vaddr,stringptrs) object arg = iff(n?enquote(strings[n]) :sprintf("%d",vaddr)) pc += hxl(pc,{1,4},"push %s",{arg}) case 0o217: case 0o377: integer n = find(opcode,{0o217,0o377}) string op = {"pop","push"}[n] xrm = peek(code_mem+pc+1) if n!=find(xrm,{0o005,0o065}) then exit end if atom addr = peek4u(code_mem+pc+2) pc += hxl(pc,{2,4},"pop [%s]",{get_var_name(addr)}) case 0o061: case 0o071: case 0o205: integer n = find(opcode,{0o061,0o071,0o205}) string op = {"xor","cmp","test"}[n] xrm = peek(code_mem+pc+1) if and_bits(xrm,0o300)!=0o300 then exit end if string r1 = regs[and_bits(xrm,0o070)/0o010+1] string r2 = regs[and_bits(xrm,0o007)+1] pc += hxl(pc,2,"%s %s,%s",{op,r1,r2}) case 0o017: xrm = peek(code_mem+pc+1) switch xrm do case 0o224: case 0o225: case 0o234: case 0o235: case 0o236: case 0o237: string cc = cccodes[and_bits(xrm,0o017)+1] if peek(code_mem+pc+2)=0o300 then pc += hxl(pc,3,"set%s al",{cc}) else exit end if case 0o204: integer offset = peek4s(code_mem+pc+2) pc += hxl(pc,{2,4},"jz %d",{pc+6+offset+1}) else exit end switch case 0o120: case 0o122: case 0o130: case 0o131: case 0o132: string op = {"push","pop"}[find(and_bits(opcode,0o070),{0o020,0o030})] string reg = regs[and_bits(opcode,0o007)+1] pc += hxl(pc,1,"%s %s",{op,reg}) case 0o231: pc += hxl(pc,1,"cdq") case 0o164: case 0o353: string jop = iff(opcode=0o164?"jz":"jmp") integer offset = peek1s(code_mem+pc+1) pc += hxl(pc,{1,1},"%s %d",{jop,pc+2+offset+1}) case 0o351: integer offset = peek4s(code_mem+pc+1) pc += hxl(pc,{1,4},"jmp %d",{pc+5+offset+1}) case 0o303: pc += hxl(pc,1,"ret") case 0o350: integer offset = peek4s(code_mem+pc+1) atom addr = offset+code_mem+pc+5 integer n = find(addr,builtins) pc += hxl(pc,{1,4},"call :%s",{builtin_names[n]}) case 0o001: case 0o041: case 0o051: integer n = find(opcode,{0o001,0o041,0o051}) string op = {"add","and","sub"}[n] xrm = peek(code_mem+pc+1) switch xrm do case 0o004: if peek(code_mem+pc+2)=0o044 then pc += hxl(pc,3,"%s [esp],eax",{op}) else exit end if else exit end switch case 0o367: xrm = peek(code_mem+pc+1) if and_bits(xrm,0o300)!=0o300 then exit end if integer n = find(and_bits(xrm,0o070),{0o030,0o040,0o070}) if n=0 then exit end if string op = {"neg","mul","idiv"}[n] string reg = regs[and_bits(xrm,0o007)+1] pc += hxl(pc,2,"%s %s",{op,reg}) else exit end switch end while if pc<length(code) then ?"incomplete:" if xrm=-1 then ?{pc+1,sprintf("0o%03o",opcode)} else ?{pc+1,sprintf("0o%03o 0o%03o",{opcode,xrm})} end if end if
end procedure
procedure main(sequence cl)
open_files(cl) toks = lex() object t = parse() code_gen(t) fixup() decode() free({var_mem,code_mem}) close_files()
end procedure
--main(command_line()) main({0,0,"gcd.c"})</lang>
- Output:
1: 0o150,#2F,#04,#00,#00 push 1071 6: 0o217,0o005,#70,#BE,#73,#00 pop [a] 12: 0o150,#05,#04,#00,#00 push 1029 17: 0o217,0o005,#74,#BE,#73,#00 pop [b] 23: 0o377,0o065,#74,#BE,#73,#00 push [b] 29: 0o150,#00,#00,#00,#00 push 0 34: 0o061,0o300 xor eax,eax 36: 0o132 pop edx 37: 0o131 pop ecx 38: 0o071,0o321 cmp edx,ecx 40: 0o017,0o225,0o300 setne al 43: 0o120 push eax 44: 0o130 pop eax 45: 0o205,0o300 test eax,eax 47: 0o164,#32 jz 99 49: 0o377,0o065,#74,#BE,#73,#00 push [b] 55: 0o217,0o005,#78,#BE,#73,#00 pop [new_a] 61: 0o377,0o065,#70,#BE,#73,#00 push [a] 67: 0o377,0o065,#74,#BE,#73,#00 push [b] 73: 0o131 pop ecx 74: 0o130 pop eax 75: 0o231 cdq 76: 0o367,0o371 idiv ecx 78: 0o122 push edx 79: 0o217,0o005,#74,#BE,#73,#00 pop [b] 85: 0o377,0o065,#78,#BE,#73,#00 push [new_a] 91: 0o217,0o005,#70,#BE,#73,#00 pop [a] 97: 0o353,#B4 jmp 23 99: 0o377,0o065,#70,#BE,#73,#00 push [a] 105: 0o350,#2F,#49,#0B,#00 call :printi 110: 0o303 ret
Python
Tested with Python 2.7 and 3.x <lang Python>from __future__ import print_function import sys, struct, shlex, operator
nd_Ident, nd_String, nd_Integer, nd_Sequence, nd_If, nd_Prtc, nd_Prts, nd_Prti, nd_While, \ nd_Assign, nd_Negate, nd_Not, nd_Mul, nd_Div, nd_Mod, nd_Add, nd_Sub, nd_Lss, nd_Leq, \ nd_Gtr, nd_Geq, nd_Eql, nd_Neq, nd_And, nd_Or = range(25)
all_syms = {
"Identifier" : nd_Ident, "String" : nd_String, "Integer" : nd_Integer, "Sequence" : nd_Sequence, "If" : nd_If, "Prtc" : nd_Prtc, "Prts" : nd_Prts, "Prti" : nd_Prti, "While" : nd_While, "Assign" : nd_Assign, "Negate" : nd_Negate, "Not" : nd_Not, "Multiply" : nd_Mul, "Divide" : nd_Div, "Mod" : nd_Mod, "Add" : nd_Add, "Subtract" : nd_Sub, "Less" : nd_Lss, "LessEqual" : nd_Leq, "Greater" : nd_Gtr, "GreaterEqual": nd_Geq, "Equal" : nd_Eql, "NotEqual" : nd_Neq, "And" : nd_And, "Or" : nd_Or}
FETCH, STORE, PUSH, ADD, SUB, MUL, DIV, MOD, LT, GT, LE, GE, EQ, NE, AND, OR, NEG, NOT, \ JMP, JZ, PRTC, PRTS, PRTI, HALT = range(24)
operators = {nd_Lss: LT, nd_Gtr: GT, nd_Leq: LE, nd_Geq: GE, nd_Eql: EQ, nd_Neq: NE,
nd_And: AND, nd_Or: OR, nd_Sub: SUB, nd_Add: ADD, nd_Div: DIV, nd_Mul: MUL, nd_Mod: MOD}
unary_operators = {nd_Negate: NEG, nd_Not: NOT}
input_file = None code = bytearray() string_pool = {} globals = {} string_n = 0 globals_n = 0 word_size = 4
- show error and exit
def error(msg):
print("%s" % (msg)) exit(1)
def int_to_bytes(val):
return struct.pack("<i", val)
def bytes_to_int(bstr):
return struct.unpack("<i", bstr)
class Node:
def __init__(self, node_type, left = None, right = None, value = None): self.node_type = node_type self.left = left self.right = right self.value = value
def make_node(oper, left, right = None):
return Node(oper, left, right)
def make_leaf(oper, n):
return Node(oper, value = n)
def emit_byte(x):
code.append(x)
def emit_word(x):
s = int_to_bytes(x) for x in s: code.append(x)
def emit_word_at(at, n):
code[at:at+word_size] = int_to_bytes(n)
def hole():
t = len(code) emit_word(0) return t
def fetch_var_offset(name):
global globals_n
n = globals.get(name, None) if n == None: globals[name] = globals_n n = globals_n globals_n += 1 return n
def fetch_string_offset(the_string):
global string_n
n = string_pool.get(the_string, None) if n == None: string_pool[the_string] = string_n n = string_n string_n += 1 return n
def code_gen(x):
if x == None: return elif x.node_type == nd_Ident: emit_byte(FETCH) n = fetch_var_offset(x.value) emit_word(n) elif x.node_type == nd_Integer: emit_byte(PUSH) emit_word(x.value) elif x.node_type == nd_String: emit_byte(PUSH) n = fetch_string_offset(x.value) emit_word(n) elif x.node_type == nd_Assign: n = fetch_var_offset(x.left.value) code_gen(x.right) emit_byte(STORE) emit_word(n) elif x.node_type == nd_If: code_gen(x.left) # expr emit_byte(JZ) # if false, jump p1 = hole() # make room for jump dest code_gen(x.right.left) # if true statements if (x.right.right != None): emit_byte(JMP) # jump over else statements p2 = hole() emit_word_at(p1, len(code) - p1) if (x.right.right != None): code_gen(x.right.right) # else statements emit_word_at(p2, len(code) - p2) elif x.node_type == nd_While: p1 = len(code) code_gen(x.left) emit_byte(JZ) p2 = hole() code_gen(x.right) emit_byte(JMP) # jump back to the top emit_word(p1 - len(code)) emit_word_at(p2, len(code) - p2) elif x.node_type == nd_Sequence: code_gen(x.left) code_gen(x.right) elif x.node_type == nd_Prtc: code_gen(x.left) emit_byte(PRTC) elif x.node_type == nd_Prti: code_gen(x.left) emit_byte(PRTI) elif x.node_type == nd_Prts: code_gen(x.left) emit_byte(PRTS) elif x.node_type in operators: code_gen(x.left) code_gen(x.right) emit_byte(operators[x.node_type]) elif x.node_type in unary_operators: code_gen(x.left) emit_byte(unary_operators[x.node_type]) else: error("error in code generator - found %d, expecting operator" % (x.node_type))
def code_finish():
emit_byte(HALT)
def list_code():
print("Datasize: %d Strings: %d" % (len(globals), len(string_pool)))
for k in sorted(string_pool, key=string_pool.get): print(k)
pc = 0 while pc < len(code): print("%4d " % (pc), end=) op = code[pc] pc += 1 if op == FETCH: x = bytes_to_int(code[pc:pc+word_size])[0] print("fetch [%d]" % (x)); pc += word_size elif op == STORE: x = bytes_to_int(code[pc:pc+word_size])[0] print("store [%d]" % (x)); pc += word_size elif op == PUSH: x = bytes_to_int(code[pc:pc+word_size])[0] print("push %d" % (x)); pc += word_size elif op == ADD: print("add") elif op == SUB: print("sub") elif op == MUL: print("mul") elif op == DIV: print("div") elif op == MOD: print("mod") elif op == LT: print("lt") elif op == GT: print("gt") elif op == LE: print("le") elif op == GE: print("ge") elif op == EQ: print("eq") elif op == NE: print("ne") elif op == AND: print("and") elif op == OR: print("or") elif op == NEG: print("neg") elif op == NOT: print("not") elif op == JMP: x = bytes_to_int(code[pc:pc+word_size])[0] print("jmp (%d) %d" % (x, pc + x)); pc += word_size elif op == JZ: x = bytes_to_int(code[pc:pc+word_size])[0] print("jz (%d) %d" % (x, pc + x)); pc += word_size elif op == PRTC: print("prtc") elif op == PRTI: print("prti") elif op == PRTS: print("prts") elif op == HALT: print("halt") else: error("list_code: Unknown opcode %d", (op));
def load_ast():
line = input_file.readline() line_list = shlex.split(line, False, False)
text = line_list[0] if text == ";": return None node_type = all_syms[text]
if len(line_list) > 1: value = line_list[1] if value.isdigit(): value = int(value) return make_leaf(node_type, value)
left = load_ast() right = load_ast() return make_node(node_type, left, right)
- main driver
input_file = sys.stdin if len(sys.argv) > 1:
try: input_file = open(sys.argv[1], "r", 4096) except IOError as e: error("Can't open %s" % sys.argv[1])
n = load_ast() code_gen(n) code_finish() list_code()</lang>
- Output — While counter example:
Datasize: 1 Strings: 2 "count is: " "\n" 0 push 1 5 store [0] 10 fetch [0] 15 push 10 20 lt 21 jz (43) 65 26 push 0 31 prts 32 fetch [0] 37 prti 38 push 1 43 prts 44 fetch [0] 49 push 1 54 add 55 store [0] 60 jmp (-51) 10 65 halt
zkl
<lang zkl>// This is a little endian machine
const WORD_SIZE=4; const{ var _n=-1; var[proxy]N=fcn{ _n+=1 }; } // enumerator const FETCH=N, STORE=N, PUSH=N, ADD=N, SUB=N, MUL=N, DIV=N, MOD=N,
LT=N, GT=N, LE=N, GE=N, EQ=N, NE=N, AND=N, OR=N, NEG=N, NOT=N, JMP=N, JZ=N, PRTC=N, PRTS=N, PRTI=N, HALT=N;
const nd_String=N, nd_Sequence=N, nd_If=N, nd_While=N; var all_syms=Dictionary(
"Identifier" ,FETCH, "String" ,nd_String, "Integer" ,PUSH, "Sequence" ,nd_Sequence, "If" ,nd_If, "Prtc" ,PRTC, "Prts" ,PRTS, "Prti" ,PRTI, "While" ,nd_While, "Assign" ,STORE, "Negate" ,NEG, "Not" ,NOT, "Multiply" ,MUL, "Divide" ,DIV, "Mod" ,MOD, "Add" ,ADD, "Subtract" ,SUB, "Less" ,LT, "LessEqual" ,LE, "Greater" ,GT, "GreaterEqual",GE, "Equal" ,EQ, "NotEqual" ,NE, "And" ,AND, "Or" ,OR, "halt" ,HALT);
var binOps=T(LT,GT,LE,GE,EQ,NE, AND,OR, SUB,ADD,DIV,MUL,MOD),
unaryOps=T(NEG,NOT);
class Node{
fcn init(_node_type, _value, _left=Void, _right=Void){ var type=_node_type, left=_left, right=_right, value=_value; }
}
var vars=Dictionary(), strings=Dictionary(); // ( value:offset, ...) fcn doVar(value){
var offset=-1; // fcn local static var offset=_doValue(value,vars,offset)
} fcn doString(str){ str=str[1,-1]; // str is \"text\"
var offset=-1; // fcn local static var str=str.replace("\\n","\n"); offset=_doValue(str,strings,offset)
} fcn _doValue(value,vars,offset){ //--> offset of value in vars
if(Void!=(n:=vars.find(value))) return(n); // fetch existing value vars[value]=offset+=1; // store new value
}
fcn asm(node,code){
if(Void==node) return(code); emitB:='wrap(n){ code.append(n) }; emitW:='wrap(n){ code.append(n.toLittleEndian(WORD_SIZE)) }; // signed switch(node.type){ case(FETCH) { emitB(FETCH); emitW(doVar(node.value)); } case(PUSH) { emitB(PUSH); emitW(node.value); } case(nd_String){ emitB(PUSH); emitW(doString(node.value)); } case(STORE){ asm(node.right,code);
emitB(STORE); emitW(doVar(node.left.value));
} case(nd_If){
asm(node.left,code); # expr emitB(JZ); # if false, jump p1,p2 := code.len(),0; emitW(0); # place holder for jump dest asm(node.right.left,code); # if true statements if (node.right.right!=Void){ emitB(JMP); # jump over else statements p2=code.len(); emitW(0); } code[p1,WORD_SIZE]=(code.len() - p1).toLittleEndian(WORD_SIZE); if(node.right.right!=Void){ asm(node.right.right,code); # else statements code[p2,WORD_SIZE]=(code.len() - p2).toLittleEndian(WORD_SIZE) }
} case(nd_While){
p1:=code.len(); asm(node.left,code); emitB(JZ); p2:=code.len(); emitW(0); # place holder asm(node.right,code); emitB(JMP); # jump back to the top emitW(p1 - code.len()); code[p2,WORD_SIZE]=(code.len() - p2).toLittleEndian(WORD_SIZE);
} case(nd_Sequence){ asm(node.left,code); asm(node.right,code); } case(PRTC,PRTI,PRTS){ asm(node.left,code); emitB(node.type); } else{
if(binOps.holds(node.type)){ asm(node.left,code); asm(node.right,code); emitB(node.type); } else if(unaryOps.holds(node.type)) { asm(node.left,code); emitB(node.type); } else throw(Exception.AssertionError( "error in code generator - found %d, expecting operator" .fmt(node.type)))
} } code
} fcn code_finish(code){
code.append(HALT); // prepend the strings to the code, // using my magic [66,1 byte len,text], no trailing '\0' needed idxs:=strings.pump(Dictionary(),"reverse"); idxs.keys.sort().reverse().pump(Void,'wrap(n){ text:=idxs[n]; code.insert(0,66,text.len(),text); })
}</lang> <lang zkl>fcn unasm(code){
all_ops,nthString := all_syms.pump(Dictionary(),"reverse"),-1; println("Datasize: %d bytes, Strings: %d bytes" .fmt(vars.len()*WORD_SIZE,strings.reduce(fcn(s,[(k,v)]){ s+k.len() },0))); word:='wrap(pc){ code.toLittleEndian(pc,WORD_SIZE,False) }; // signed pc:=0; while(pc<code.len()){ op:=code[pc]; print("%4d: %2d ".fmt(pc,op)); pc+=1; switch(op){ case(66){
n,str := code[pc], code[pc+=1,n].text; println("String #%d %3d \"%s\"".fmt(nthString+=1,n, Compiler.Asm.quotify(str))); pc+=n; }
case(FETCH,STORE,PUSH){
println("%s [%d]".fmt(all_ops[op],word(pc))); pc+=WORD_SIZE; } case(ADD,SUB,MUL,DIV,MOD,LT,GT,LE,GE,EQ,NE,AND,OR,NEG,NOT, PRTC,PRTI,PRTS,HALT){ println(all_ops[op]) }
case(JMP){
n:=word(pc);
println("jmp (%d) %d".fmt(n, pc + n)); pc+=WORD_SIZE;
} case(JZ){ n:=word(pc);
println("jz (%d) %d".fmt(n, pc + n)); pc+=WORD_SIZE;
} else throw(Exception.AssertionError("Unknown opcode %d".fmt(op)));
} }
}</lang> <lang zkl>fcn load_ast(file){
line:=file.readln().strip(); // one or two tokens if(line[0]==";") return(Void); parts,type,value := line.split(),parts[0],parts[1,*].concat(" "); type=all_syms[type]; if(value){ try{ value=value.toInt() }catch{} return(Node(type,value)); } left,right := load_ast(file),load_ast(file); Node(type,Void,left,right)
}</lang> <lang zkl>ast:=load_ast(File(vm.nthArg(0))); code:=asm(ast,Data()); code_finish(code); unasm(code); File("code.bin","wb").write(code); println("Wrote %d bytes to code.bin".fmt(code.len()));</lang> File ast.txt is the text at the start of this task.
- Output:
$ zkl codeGen.zkl ast.txt Datasize: 4 bytes, Strings: 11 bytes 0: 66 String #0 10 "\ncount is:" 12: 66 String #1 1 "\n" 15: 2 Integer [1] 20: 1 Assign [0] 25: 0 Identifier [0] 30: 2 Integer [10] 35: 8 LessEqual 36: 19 jz (43) 80 41: 2 Integer [0] 46: 21 Prts 47: 0 Identifier [0] 52: 22 Prti 53: 2 Integer [1] 58: 21 Prts 59: 0 Identifier [0] 64: 2 Integer [1] 69: 3 Add 70: 1 Assign [0] 75: 18 jmp (-51) 25 80: 23 halt Wrote 81 bytes to code.bin $ zkl hexDump code1.bin 0: 42 0a 63 6f 75 6e 74 20 | 69 73 3a 20 42 01 0a 02 B.count is: B... 16: 01 00 00 00 01 00 00 00 | 00 00 00 00 00 00 02 0a ................ 32: 00 00 00 08 13 2b 00 00 | 00 02 00 00 00 00 15 00 .....+.......... 48: 00 00 00 00 16 02 01 00 | 00 00 15 00 00 00 00 00 ................ 64: 02 01 00 00 00 03 01 00 | 00 00 00 12 cd ff ff ff ................ 80: 17