deliverables/report/gen.tex

   1 \section{Code generation}\label{sec:gen}
   2 \subsection{Binary Representation}
   3 \SPLC{} uses a simple stack layout in which all functions have their own stack
   4 frame and all the arguments and local variables are present within this frame.
   5 Arguments and local variables are either saved on the stack directly, or as a
   6 pointer to the heap. This means that every allocated variable uses precisely
   7 one stack address.
   8
   9 Simple values, i.e. \SI{Int}s, \SI{Bool}s and \SI{Char}s are stored directly on
  10 the stack in their representative numeric value. Values on the stack in \SSM{}
  11 are always numbers. For \SI{Int}s we use standard signed integers. For
  12 characters \textsc{ASCII} values are used. A boolean true is
  13 stored as a negative one and the boolean false is stored as a zero.
  14
  15 The heap simply grows with each added value, currently no garbage collection is
  16 done. Functions, lists and tuples are currently stored in the heap. Lists are
  17 represented simply as linked lists, by first storing the pointer to the next
  18 list element and having the current list element in front of that pointer. The
  19 empty list is represented as the null pointer. Tuples are stored by
  20 storing the two values in succession on the heap and pointing to the first of
  21 the two. Detailed description of function storage is explained in
  22 Subsection~\ref{ssec:high}.
  23
  24 When calling a functions its arguments are placed on the stack, after which a
  25 \tt{bsr} instruction calls the function. The function can then load its
  26 arguments, if and when these are needed in an expression, by copying them from
  27 below its stack frame. The \tt{bsr} instruction is an instruction that works
  28 together with the \tt{ret} instruction. \tt{bsr} places the previous program
  29 counter on the stack so that it can be restored with a \tt{ret}.
  30
  31 Note that the combination of this copying behaviour and passing complex values
  32 as heap pointers means that in some cases \SPLC{} programs behave as
  33 pass-by-value while in others it behave as pass-by-reference. Simple values
  34 are trivially pass-by-value, whilst lists and tuples can behave as both,
  35 depending on the usage. E.g. assigning a new list/tuple to an argument will
  36 behave as pass-by-value, whilst changing the argument through a field selector
  37 will behave as pass by reference, thus the pointer is passed by value.
  38
  39 Listing~\ref{lst:stackFunCall} shows the layout of the stack when executing the
  40 two functions. the layout shown is the layout of the stack just before the
  41 \tt{add} instruction to add x and y in plus. Note that, in compliance with
  42 \SSM{}s calling model, the return address of a function is saved below the
  43 stack frame of that function. When plus has returned, the two arguments are
  44 popped of the stack and the value in \tt{RR} (for functions which do not return
  45 \tt{Void}) is placed on the stack.
  46
  47 \begin{lstlisting}[
  48         label={lst:stackFunCall},
  49         caption={Stack layout during function calls},
  50         language=SPLCode]
  51 plus(x, y}{ return x + y; }
  52
  53 main(){ var a = plus(3, 4); }
  54
  55 /*# | Stack
  56 111 | 002   ;return address of main
  57 ;begin stack frame of main
  58 112 | 110       ;Frame pointer for main; points to bottom of stack
  59 113 | 003       ;arg1 for plus
  60 114 | 004       ;arg2 for plus
  61 115 | 0fa       ;return address of plus
  62 ;begin stack frame of plus
  63 116 | 112       ;Frame pointer for plus; points to mains stack frame
  64 117 | 003       ;x, loaded by ldl -3
  65 118 | 004       ;y, loaded by ldl -2
  66 */
  67 \end{lstlisting}
  68
  69 \subsection{Generation}
  70 The code generation phase transforms a well typed \AST{} to either an error,
  71 or a \tt{SSMProgram}. The latter is a list of labels and instructions. The
  72 generation happens within a\\
  73 \CI{WriterT SSMProgram (StateT (Addressbook, [Label]) (Either GenError)) a}\\
  74 environment. Which we have implemented as a \tt{RWST} with \tt{()} for the
  75 reading environment. In this \tt{Addressbook} is a simple map from
  76 identifiers to \tt{Address}, which in turn is either a label to jump to or an
  77 address where a variable can be found (relative to the current frame). The list
  78 of labels is simply an infinite generator for fresh labels, used when
  79 generating code for e.g. an if-statement.
  80
  81 Code generation is done by walking through the \AST{} depth first and
  82 generating instructions for each encountered element. E.g. generation for an
  83 \tt{Op2Expr} is simply recursively generating for each element of the
  84 expression: \CI{g e1 >>| g e2 >>| g op}. For example the snippet in
  85 Listing~\ref{lst:codegen} shows the code generation instance for the
  86 implementation of the \SI{:} operator. First we generate the code for the left
  87 part of the cons operator, the head, and afterwards the right side of the
  88 operator, the tail. They are glued together by pushing the head and the tail
  89 subsequently on the heap and what we are left with is the pointer to the tail.
  90
  91 \begin{lstlisting}[
  92         language=Clean,
  93         label={lst:codegen},
  94         caption={Code generation of the cons operator}]
  95 instance g Expr where
  96         ...
  97         g (Op2Expr _ e1 BiCons e2) = g e2 >>| g e1
  98                 >>| tell [Instr "sth" [] ""]
  99                 >>| tell [Instr "ajs" [Lit -1] ""]
 100                 >>| tell [Instr "sth" [] ""]
 101         ...
 102 \end{lstlisting}
 103
 104 \subsection{Higher order functions}\label{ssec:high}
 105 As extension this compiler can handle function types. Function types are
 106 pointers to functions that may or may not have arguments already curried in. We
 107 use the simple approach instead of the full fledged closure approach because of
 108 time reasons. Adding the functionality in the lexer and parser was trivial
 109 because of the modular nature. There was one small distinction that we had to
 110 make which was the difference between a $0$-arity function call and a $0$-arity
 111 function pointer. Currently \SI{read} is parsed as a function pointer and
 112 \SI{read()} is parsed as a function call. Adding the functionality in the
 113 semantic analysis was actually done from the start since the
 114 \emph{Hindley-Milner} algorithm already supported it.
 115
 116 As all other data types, a function type object occupies one item on the stack
 117 which contains a heap pointer. The heap pointer points to a position on the
 118 heap that stores the function identification number, the number of arguments
 119 already given and finally the argument values.  For example the snippet in
 120 Listing~\ref{lst:highex} shows the representation of the incomplete function
 121 \SI{plus} that already has two arguments.
 122
 123 \begin{lstlisting}[
 124         language=SPLCode,
 125         label={lst:highex},
 126         caption={Heap layout of curried functions}]
 127 //SPLCode
 128 plus(x, y, z){ return x + y + z; }
 129
 130 main(){
 131         var a = plus(4, 3);
 132 }
 133
 134 /*# | Heap
 135 001 | 008    //Function identification number
 136 002 | 002    //Available arguments
 137 003 | 004    //Argument 1
 138 004 | 003    //Argument 2
 139 */
 140 \end{lstlisting}
 141
 142 When an incomplete function is declared we save the remaining arity in our
 143 state to be able to know when a function is complete and can be executed. When
 144 a function is still not complete the function is copied in totality and the
 145 available arguments position on the heap is increased accordingly and the
 146 arguments are added on the heap. The reason we store the number of already
 147 given arguments in the heap is because we need that information at run-time
 148 since you can not know this at compile-time.
 149
 150 Normally we jump to functions using labels. But since we can not put labels on
 151 the heap or stack we have introduced a function in the preamble that translate
 152 integers to the corresponding label called \SI{1func}. When we want to
 153 call a function that is identified by an address instead of a label we first
 154 collect all the arguments from the heap and add the arguments given in the
 155 function call. Calculating the number of arguments that have to be popped from
 156 the heap is done at runtime using the \emph{Available Arguments} value stored
 157 on the heap. Instead of jumping to a label we jump with a \texttt{bsr} to the
 158 \SI{1func} function and make sure the identification is situated in register
 159 R5. \SI{1func} then makes sure the correct function is branched to with a
 160 \texttt{bra} and the caller can resume the normal execution path. In this way
 161 the function can still use the \texttt{ret} instruction to leave the function.