update gen
[cc1516.git] / deliverables / report / gen.tex
1 \section{Code generation}\label{sec:gen}
2 \subsection{Binary Representation}
3 \SPLC{} uses a simple stack layout in which all functions have their own stack
4 frame and all the arguments and local variables are present within this frame.
5 Arguments and local variables are either saved on the stack directly, or as a
6 pointer to the heap. This means that every allocated variable uses precisely
7 one stack address.
8
9 Simple values, i.e. \SI{Int}s, \SI{Bool}s and \SI{Char}s are stored directly on
10 the stack in their representative numeric value. Values on the stack in \SSM{}
11 are always numbers. For \SI{Int}s we use standard signed integers. For
12 characters \textsc{ASCII} values are used. A boolean true is
13 stored as a negative one and the boolean false is stored as a zero.
14
15 The heap simply grows with each added value, currently no garbage collection is
16 done. Functions, lists and tuples are currently stored in the heap. Lists are
17 represented simply as linked lists, by first storing the pointer to the next
18 list element and having the current list element in front of that pointer. The
19 empty list is represented as the null pointer. Tuples are stored by
20 storing the two values in succession on the heap and pointing to the first of
21 the two. Detailed description of function storage is explained in
22 Subsection~\ref{ssec:high}.
23
24 When calling a functions its arguments are placed on the stack, after which a
25 \tt{bsr} instruction calls the function. The function can then load its
26 arguments, if and when these are needed in an expression, by copying them from
27 below its stack frame. The \tt{bsr} instruction is an instruction that works
28 together with the \tt{ret} instruction. \tt{bsr} places the previous program
29 counter on the stack so that it can be restored with a \tt{ret}.
30
31 Note that the combination of this copying behaviour and passing complex values
32 as heap pointers means that in some cases \SPLC{} programs behave as
33 pass-by-value while in others it behave as pass-by-reference. Simple values
34 are trivially pass-by-value, whilst lists and tuples can behave as both,
35 depending on the usage. E.g. assigning a new list/tuple to an argument will
36 behave as pass-by-value, whilst changing the argument through a field selector
37 will behave as pass by reference, thus the pointer is passed by value.
38
39 Listing~\ref{lst:stackFunCall} shows the layout of the stack when executing the
40 two functions. the layout shown is the layout of the stack just before the
41 \tt{add} instruction to add x and y in plus. Note that, in compliance with
42 \SSM{}s calling model, the return address of a function is saved below the
43 stack frame of that function. When plus has returned, the two arguments are
44 popped of the stack and the value in \tt{RR} (for functions which do not return
45 \tt{Void}) is placed on the stack.
46
47 \begin{SPLCode} \label{lst:stackFunCall}
48 plus(x, y}{ return x + y; }
49
50 main(){ var a = plus(3, 4); }
51
52 /*# | Stack
53 111 | 002 ;return address of main
54 ;begin stack frame of main
55 112 | 110 ;Frame pointer for main; points to bottom of stack
56 113 | 003 ;arg1 for plus
57 114 | 004 ;arg2 for plus
58 115 | 0fa ;return address of plus
59 ;begin stack frame of plus
60 116 | 112 ;Frame pointer for plus; points to mains stack frame
61 117 | 003 ;x, loaded by ldl -3
62 118 | 004 ;y, loaded by ldl -2
63 */
64 \end{SPLCode}
65
66 \subsection{Generation}
67 The code generation phase transforms a well typed \AST{} to either an error,
68 or a \tt{SSMProgram}. The latter is a list of labels and instructions. The
69 generation happens within a
70 \CI{WriterT SSMProgram (StateT (Addressbook, [Label]) (Either GenError)) a}
71 environment. Which we have implemented as a \tt{RWST} with \tt{()} for the
72 reading environment. In this \tt{Addressbook} is a simple map from
73 identifiers to \tt{Address}, which in turn is either a label to jump to or an
74 address where a variable can be found (relative to the current frame). The list
75 of labels is simply an infinite generator for fresh labels, used when
76 generating code for e.g. an if-statement.
77
78 Code generation is done by walking through the \AST{} depth first and
79 generating instructions for each encountered element. E.g. generation for an
80 \tt{Op2Expr} is simply recursively generating for each element of the
81 expression: \CI{g e1 >>| g e2 >>| g op}. For example the snippet in
82 Listing~\ref{lst:codegen} shows the code generation instance for the
83 implementation of the \SI{:} operator. First we generate the code for the left
84 part of the cons operator, the head, and afterwards the right side of the
85 operator, the tail. They are glued together by pushing the head and the tail
86 subsequently on the heap and what we are left with is the pointer to the tail.
87
88 \begin{lstlisting}[
89 language=Clean,
90 label={lst:codegen},
91 caption={Code generation of the cons operator}]
92 instance g Expr where
93 ...
94 g (Op2Expr _ e1 BiCons e2) = g e2 >>| g e1
95 >>| tell [Instr "sth" [] ""]
96 >>| tell [Instr "ajs" [Lit -1] ""]
97 >>| tell [Instr "sth" [] ""]
98 ...
99 \end{lstlisting}
100
101 \subsection{Higher order functions}\label{ssec:high}
102 The nature of the type checking algorithm already included type checking and
103 inferring the type of higher order functions. Since we allow constants there is
104 a difference between \SI{read} and \SI{read()}. To make this difference we
105 introduced a separate type of expression called a \SI{FuncType}, this type
106 encodes a function with arity $0$.
107
108 As all other data types, a function type object occupies one item on the stack
109 which contains a heap pointer. The heap pointer points to a position on the
110 heap that stores, in increasing addresses, the function identification number,
111 the number of arguments already given and finally the argument values.
112 For example the following snippet shows the representation of the incomplete
113 function \SI{plus} that already has two arguments.
114 \begin{SPLCode}
115 //SPLCode
116 plus(x, y, z){ return x + y + z; }
117
118 main(){
119 var a = plus(4, 3);
120 }
121
122 /*# | Stack
123 001 | 001 //Pointer to the
124 ... |
125
126 # | Heap
127 001 | 008 //Function identification number
128 002 | 002 //Available arguments
129 003 | 004 //Argument 1
130 004 | 003 //Argument 2
131 */
132 \end{SPLCode}
133
134 When an incomplete function is declared we save the remaining arity in our
135 state to be able to know when a function is complete and can be executed. When
136 a function is still not complete the function is copied in totality and the
137 available arguments position on the heap is increased accordingly and the
138 arguments are added on the heap.
139
140 Function numbers are integers that identify the function. Normally we address
141 functions using labels but it is not possible to store labels on the heap or on
142 the stack. Therefore we include a function in the preamble that is basically a
143 dictionary from integers to labels. All functions have their unique number that
144 identifies them via the function dictionary called \SI{1func}. When we want to
145 call a function that is identified by an address instead of a label we first
146 collect all the arguments from the heap and add the arguments given in the
147 function call. Calculating the number of arguments that have to be popped from
148 the heap is done at runtime using the \emph{Available Arguments} value stored
149 on the heap. Instead of jumping to a label we jump to the \SI{1func} function
150 and make sure the identification is situated in register R5. \SI{1func} then
151 makes sure the correct function is branched to and the caller can resume the
152 normal execution path.