deliverables/report/gen.tex

   1 \section{Code generation}\label{sec:gen}
   2 \subsection{Binary Representation}
   3 \SPLC{} uses a simple stack layout in which all functions have their own stack
   4 frame and all the arguments and local variables are present within this frame.
   5 Arguments and local variables are either saved on the stack directly, or as a
   6 pointer to the heap. Simple values, i.e. Ints, Bools and Chars are stored
   7 directly on the stack, while other values are stored on the heap.
   8
   9 The heap simply grows with each added value, currently no garbage collection is
  10 done. Lists are represented simply as linked lists, by first storing the pointer
  11 to the next list element and then the current list element. Tuples are simply
  12 stored by staring the two values in succession.
  13
  14 When calling a functions its arguments are placed on the stack, after which
  15 a \tt{bsr} instruction calls the function. The functions can then load its
  16 arguments, if and when these are needed in an expression, by copying them
  17 from below its stack frame.
  18
  19 Note that the combination of this copying behaviour and passing complex values
  20 as heap pointers means that in some cases \SPLC{} programs behave as
  21 pass-by-value while in others it behave as pass-by-reference. Simple values
  22 are trivially pass-by-value, whilst lists and tuples can behave as both,
  23 depending on the usage. E.g. assigning a new list/tuple to an argument will
  24 behave as pass-by-value, whilst changing the argument through a field selector
  25 will behave as pass by reference.
  26
  27 Listing~\ref{lst:stackFunCall} shows the layout of the stack executing the
  28 two functions. the layout shown is the layout of the stack just before
  29 the \tt{add} instruction to add x and y. Note that, in compliance with \SSM{}s
  30 calling model, the return address of a function is saved below the stack frame
  31 of that function. When plus has returned the two arguments are popped of the
  32 stack and the value in \tt{RR} (for functions which do not return \tt{Void})
  33 is placed on the stack.
  34
  35 \begin{SPLCode} \label{lst:stackFunCall}
  36 plus(x, y}{ return x + y; }
  37
  38 main(){ var a = plus(3, 4); }
  39
  40 /*# | Stack
  41 111 | 002   ;return address of main
  42 ;begin stack frame of main
  43 112 | 110       ;Frame pointer for main; points to bottom of stack
  44 113 | 003       ;arg1 for plus
  45 114 | 004       ;arg2 for plus
  46 115 | 0fa       ;return address of plus
  47 ;begin stack frame of plus
  48 116 | 112       ;Frame pointer for plus; points to mains stack frame
  49 117 | 003       ;x, loaded by ldl -3
  50 118 | 004       ;y, loaded by ldl -2
  51 */
  52 \end{SPLCode}
  53
  54 \subsection{Generation}
  55 The code generation phase transforms a well typed \AST{} to either an error,
  56 or an \tt{SSMProgram}The latter is a list of labels and instructions. The
  57 generation happens within a
  58 \CI{WriterT SSMProgram (StateT (Addressbook, [Label]) (Either GenError)) a}
  59 environment. Which we have implemented as a \tt{RWST} with \tt{()} for the
  60 reading environment. In this \tt{Addressbook} is a simple map from
  61 identifiers to \tt{Address}, which in turn is either a label to jump to or an
  62 address where a variable can be found (relative to the current frame). The list
  63 of labels is simply an infinite generator for fresh labels, used when generating
  64 code for e.g. an if-statement.
  65
  66 Code generation is done by walking through the \AST{} depth first and generating
  67 instructions for each encountered element. E.g. generation for an \tt{Op2Expr}
  68 is simply recursively generating for each element of the expression:
  69 \CI{g e1 >>| g e2 >>| g op}.
  70
  71 \subsection{Higher order functions}
  72 The nature of the type checking algorithm already included type checking and
  73 inferring the type of higher order functions. Since we allow constants there is
  74 a difference between \SI{read} and \SI{read()}. To make this difference we
  75 introduced a separate type of expression called a \SI{FuncType}, this type
  76 encodes a function with arity $0$.
  77
  78 As all other data types a function type object occupies one item on the stack
  79 which contains a heap pointer. The heap pointer points to a position on the
  80 heap that stores, in increasing addresses, the function identification number,
  81 the number of arguments already given and finally the argument values.
  82 For example the following snippet shows the representation of the incomplete
  83 function \SI{plus} that already has two arguments.
  84 \begin{SPLCode}
  85 //SPLCode
  86 plus(x, y, z){ return x + y + z; }
  87
  88 main(){
  89         var a = plus(4, 3);
  90 }
  91
  92 /*# | Stack
  93 001 | 001    //Pointer to the
  94 ... |
  95
  96   # | Heap
  97 001 | 008    //Function identification number
  98 002 | 002    //Available arguments
  99 003 | 004    //Argument 1
 100 004 | 003    //Argument 2
 101 */
 102 \end{SPLCode}
 103
 104 When an incomplete function is declared we save the remaining arity in our
 105 state to be able to know when a function is complete and can be executed. When
 106 a function is still not complete the function is copied in totality and the
 107 available arguments position on the heap is increased accordingly and the
 108 arguments are added on the heap.
 109
 110 Function numbers are integers that identify the function. Normally we address
 111 functions using labels but it is not possible to store labels on the heap or on
 112 the stack. Therefore we include a function in the preamble that is basically a
 113 dictionary from integers to labels. All functions have their unique number that
 114 identifies them via the function dictionary called \SI{1func}. When we want to
 115 call a function that is identified by an address instead of a label we first
 116 collect all the arguments from the heap and add the arguments given in the
 117 function call. Calculating the number of arguments that have to be popped from
 118 the heap is done at runtime using the \emph{Available Arguments} value stored
 119 on the heap. Instead of jumping to a label we jump to the \SI{1func} function
 120 and make sure the identification is situated in register R5. \SI{1func} then
 121 makes sure the correct function is branched to and the caller can resume the
 122 normal execution path.