deliverables/report/gen.tex

   1 \section{Code generation}\label{sec:gen}
   2 \subsection{Binary Representation}
   3 \SPLC{} uses a simple stack layout in which all functions have their own stack
   4 frame and all the arguments and local variables are present within this frame.
   5 Arguments and local variables are either saved on the stack directly, or as a
   6 pointer to the heap. This means that every allocated variable uses precisely
   7 one stack address.
   8
   9 Simple values, i.e. \SI{Int}s, \SI{Bool}s and \SI{Char}s are stored directly on
  10 the stack in their representative numeric value. Values on the stack in \SSM{}
  11 are always numbers. For \SI{Int}s we use standard signed integers. For
  12 characters \textsc{ASCII} values are used. A boolean true is
  13 stored as a negative one and the boolean false is stored as a zero.
  14
  15 The heap simply grows with each added value, currently no garbage collection is
  16 done. Functions, lists and tuples are currently stored in the heap. Lists are
  17 represented simply as linked lists, by first storing the pointer to the next
  18 list element and having the current list element in front of that pointer. The
  19 empty list is represented as the null pointer. Tuples are stored by
  20 storing the two values in succession on the heap and pointing to the first of
  21 the two. Detailed description of function storage is explained in
  22 Subsection~\ref{ssec:high}.
  23
  24 When calling a functions its arguments are placed on the stack, after which a
  25 \tt{bsr} instruction calls the function. The function can then load its
  26 arguments, if and when these are needed in an expression, by copying them from
  27 below its stack frame. The \tt{bsr} instruction is an instruction that works
  28 together with the \tt{ret} instruction. \tt{bsr} places the previous program
  29 counter on the stack so that it can be restored with a \tt{ret}.
  30
  31 Note that the combination of this copying behaviour and passing complex values
  32 as heap pointers means that in some cases \SPLC{} programs behave as
  33 pass-by-value while in others it behave as pass-by-reference. Simple values
  34 are trivially pass-by-value, whilst lists and tuples can behave as both,
  35 depending on the usage. E.g. assigning a new list/tuple to an argument will
  36 behave as pass-by-value, whilst changing the argument through a field selector
  37 will behave as pass by reference, thus the pointer is passed by value.
  38
  39 Listing~\ref{lst:stackFunCall} shows the layout of the stack when executing the
  40 two functions. the layout shown is the layout of the stack just before the
  41 \tt{add} instruction to add x and y in plus. Note that, in compliance with
  42 \SSM{}s calling model, the return address of a function is saved below the
  43 stack frame of that function. When plus has returned, the two arguments are
  44 popped of the stack and the value in \tt{RR} (for functions which do not return
  45 \tt{Void}) is placed on the stack.
  46
  47 \begin{SPLCode} \label{lst:stackFunCall}
  48 plus(x, y}{ return x + y; }
  49
  50 main(){ var a = plus(3, 4); }
  51
  52 /*# | Stack
  53 111 | 002   ;return address of main
  54 ;begin stack frame of main
  55 112 | 110       ;Frame pointer for main; points to bottom of stack
  56 113 | 003       ;arg1 for plus
  57 114 | 004       ;arg2 for plus
  58 115 | 0fa       ;return address of plus
  59 ;begin stack frame of plus
  60 116 | 112       ;Frame pointer for plus; points to mains stack frame
  61 117 | 003       ;x, loaded by ldl -3
  62 118 | 004       ;y, loaded by ldl -2
  63 */
  64 \end{SPLCode}
  65
  66 \subsection{Generation}
  67 The code generation phase transforms a well typed \AST{} to either an error,
  68 or a \tt{SSMProgram}. The latter is a list of labels and instructions. The
  69 generation happens within a
  70 \CI{WriterT SSMProgram (StateT (Addressbook, [Label]) (Either GenError)) a}
  71 environment. Which we have implemented as a \tt{RWST} with \tt{()} for the
  72 reading environment. In this \tt{Addressbook} is a simple map from
  73 identifiers to \tt{Address}, which in turn is either a label to jump to or an
  74 address where a variable can be found (relative to the current frame). The list
  75 of labels is simply an infinite generator for fresh labels, used when
  76 generating code for e.g. an if-statement.
  77
  78 Code generation is done by walking through the \AST{} depth first and
  79 generating instructions for each encountered element. E.g. generation for an
  80 \tt{Op2Expr} is simply recursively generating for each element of the
  81 expression: \CI{g e1 >>| g e2 >>| g op}. For example the snippet in
  82 Listing~\ref{lst:codegen} shows the code generation instance for the
  83 implementation of the \SI{:} operator. First we generate the code for the left
  84 part of the cons operator, the head, and afterwards the right side of the
  85 operator, the tail. They are glued together by pushing the head and the tail
  86 subsequently on the heap and what we are left with is the pointer to the tail.
  87
  88 \begin{lstlisting}[
  89         language=Clean,
  90         label={lst:codegen},
  91         caption={Code generation of the cons operator}]
  92 instance g Expr where
  93         ...
  94         g (Op2Expr _ e1 BiCons e2) = g e2 >>| g e1
  95                 >>| tell [Instr "sth" [] ""]
  96                 >>| tell [Instr "ajs" [Lit -1] ""]
  97                 >>| tell [Instr "sth" [] ""]
  98         ...
  99 \end{lstlisting}
 100
 101 \subsection{Higher order functions}\label{ssec:high}
 102 The nature of the type checking algorithm already included type checking and
 103 inferring the type of higher order functions. Since we allow constants there is
 104 a difference between \SI{read} and \SI{read()}. To make this difference we
 105 introduced a separate type of expression called a \SI{FuncType}, this type
 106 encodes a function with arity $0$.
 107
 108 As all other data types, a function type object occupies one item on the stack
 109 which contains a heap pointer. The heap pointer points to a position on the
 110 heap that stores, in increasing addresses, the function identification number,
 111 the number of arguments already given and finally the argument values.
 112 For example the following snippet shows the representation of the incomplete
 113 function \SI{plus} that already has two arguments.
 114 \begin{SPLCode}
 115 //SPLCode
 116 plus(x, y, z){ return x + y + z; }
 117
 118 main(){
 119         var a = plus(4, 3);
 120 }
 121
 122 /*# | Stack
 123 001 | 001    //Pointer to the
 124 ... |
 125
 126   # | Heap
 127 001 | 008    //Function identification number
 128 002 | 002    //Available arguments
 129 003 | 004    //Argument 1
 130 004 | 003    //Argument 2
 131 */
 132 \end{SPLCode}
 133
 134 When an incomplete function is declared we save the remaining arity in our
 135 state to be able to know when a function is complete and can be executed. When
 136 a function is still not complete the function is copied in totality and the
 137 available arguments position on the heap is increased accordingly and the
 138 arguments are added on the heap.
 139
 140 Function numbers are integers that identify the function. Normally we address
 141 functions using labels but it is not possible to store labels on the heap or on
 142 the stack. Therefore we include a function in the preamble that is basically a
 143 dictionary from integers to labels. All functions have their unique number that
 144 identifies them via the function dictionary called \SI{1func}. When we want to
 145 call a function that is identified by an address instead of a label we first
 146 collect all the arguments from the heap and add the arguments given in the
 147 function call. Calculating the number of arguments that have to be popped from
 148 the heap is done at runtime using the \emph{Available Arguments} value stored
 149 on the heap. Instead of jumping to a label we jump to the \SI{1func} function
 150 and make sure the identification is situated in register R5. \SI{1func} then
 151 makes sure the correct function is branched to and the caller can resume the
 152 normal execution path.