fix nog meer boxen en referenties
[cc1516.git] / deliverables / report / gen.tex
1 \section{Code generation}\label{sec:gen}
2 \subsection{Binary Representation}
3 \SPLC{} uses a simple stack layout in which all functions have their own stack
4 frame and all the arguments and local variables are present within this frame.
5 Arguments and local variables are either saved on the stack directly, or as a
6 pointer to the heap. This means that every allocated variable uses precisely
7 one stack address.
8
9 Simple values, i.e. \SI{Int}s, \SI{Bool}s and \SI{Char}s are stored directly on
10 the stack in their representative numeric value. Values on the stack in \SSM{}
11 are always numbers. For \SI{Int}s we use standard signed integers. For
12 characters \textsc{ASCII} values are used. A boolean true is
13 stored as a negative one and the boolean false is stored as a zero.
14
15 The heap simply grows with each added value, currently no garbage collection is
16 done. Functions, lists and tuples are currently stored in the heap. Lists are
17 represented simply as linked lists, by first storing the pointer to the next
18 list element and having the current list element in front of that pointer. The
19 empty list is represented as the null pointer. Tuples are stored by
20 storing the two values in succession on the heap and pointing to the first of
21 the two. Detailed description of function storage is explained in
22 Subsection~\ref{ssec:high}.
23
24 When calling a functions its arguments are placed on the stack, after which a
25 \tt{bsr} instruction calls the function. The function can then load its
26 arguments, if and when these are needed in an expression, by copying them from
27 below its stack frame. The \tt{bsr} instruction is an instruction that works
28 together with the \tt{ret} instruction. \tt{bsr} places the previous program
29 counter on the stack so that it can be restored with a \tt{ret}.
30
31 Note that the combination of this copying behaviour and passing complex values
32 as heap pointers means that in some cases \SPLC{} programs behave as
33 pass-by-value while in others it behave as pass-by-reference. Simple values
34 are trivially pass-by-value, whilst lists and tuples can behave as both,
35 depending on the usage. E.g. assigning a new list/tuple to an argument will
36 behave as pass-by-value, whilst changing the argument through a field selector
37 will behave as pass by reference, thus the pointer is passed by value.
38
39 Listing~\ref{lst:stackFunCall} shows the layout of the stack when executing the
40 two functions. the layout shown is the layout of the stack just before the
41 \tt{add} instruction to add x and y in plus. Note that, in compliance with
42 \SSM{}s calling model, the return address of a function is saved below the
43 stack frame of that function. When plus has returned, the two arguments are
44 popped of the stack and the value in \tt{RR} (for functions which do not return
45 \tt{Void}) is placed on the stack.
46
47 \begin{lstlisting}[
48 label={lst:stackFunCall},
49 caption={Stack layout during function calls},
50 language=SPLCode]
51 plus(x, y}{ return x + y; }
52
53 main(){ var a = plus(3, 4); }
54
55 /*# | Stack
56 111 | 002 ;return address of main
57 ;begin stack frame of main
58 112 | 110 ;Frame pointer for main; points to bottom of stack
59 113 | 003 ;arg1 for plus
60 114 | 004 ;arg2 for plus
61 115 | 0fa ;return address of plus
62 ;begin stack frame of plus
63 116 | 112 ;Frame pointer for plus; points to mains stack frame
64 117 | 003 ;x, loaded by ldl -3
65 118 | 004 ;y, loaded by ldl -2
66 */
67 \end{lstlisting}
68
69 \subsection{Generation}
70 The code generation phase transforms a well typed \AST{} to either an error,
71 or a \tt{SSMProgram}. The latter is a list of labels and instructions. The
72 generation happens within a\\
73 \CI{WriterT SSMProgram (StateT (Addressbook, [Label]) (Either GenError)) a}\\
74 environment. Which we have implemented as a \tt{RWST} with \tt{()} for the
75 reading environment. In this \tt{Addressbook} is a simple map from
76 identifiers to \tt{Address}, which in turn is either a label to jump to or an
77 address where a variable can be found (relative to the current frame). The list
78 of labels is simply an infinite generator for fresh labels, used when
79 generating code for e.g. an if-statement.
80
81 Code generation is done by walking through the \AST{} depth first and
82 generating instructions for each encountered element. E.g. generation for an
83 \tt{Op2Expr} is simply recursively generating for each element of the
84 expression: \CI{g e1 >>| g e2 >>| g op}. For example the snippet in
85 Listing~\ref{lst:codegen} shows the code generation instance for the
86 implementation of the \SI{:} operator. First we generate the code for the left
87 part of the cons operator, the head, and afterwards the right side of the
88 operator, the tail. They are glued together by pushing the head and the tail
89 subsequently on the heap and what we are left with is the pointer to the tail.
90
91 \begin{lstlisting}[
92 language=Clean,
93 label={lst:codegen},
94 caption={Code generation of the cons operator}]
95 instance g Expr where
96 ...
97 g (Op2Expr _ e1 BiCons e2) = g e2 >>| g e1
98 >>| tell [Instr "sth" [] ""]
99 >>| tell [Instr "ajs" [Lit -1] ""]
100 >>| tell [Instr "sth" [] ""]
101 ...
102 \end{lstlisting}
103
104 \subsection{Higher order functions}\label{ssec:high}
105 As extension this compiler can handle function types. Function types are
106 pointers to functions that may or may not have arguments already curried in. We
107 use the simple approach instead of the full fledged closure approach because of
108 time reasons. Adding the functionality in the lexer and parser was trivial
109 because of the modular nature. There was one small distinction that we had to
110 make which was the difference between a $0$-arity function call and a $0$-arity
111 function pointer. Currently \SI{read} is parsed as a function pointer and
112 \SI{read()} is parsed as a function call. Adding the functionality in the
113 semantic analysis was actually done from the start since the
114 \emph{Hindley-Milner} algorithm already supported it.
115
116 As all other data types, a function type object occupies one item on the stack
117 which contains a heap pointer. The heap pointer points to a position on the
118 heap that stores the function identification number, the number of arguments
119 already given and finally the argument values. For example the snippet in
120 Listing~\ref{lst:highex} shows the representation of the incomplete function
121 \SI{plus} that already has two arguments.
122
123 \begin{lstlisting}[
124 language=SPLCode,
125 label={lst:highex},
126 caption={Heap layout of curried functions}]
127 //SPLCode
128 plus(x, y, z){ return x + y + z; }
129
130 main(){
131 var a = plus(4, 3);
132 }
133
134 /*# | Heap
135 001 | 008 //Function identification number
136 002 | 002 //Available arguments
137 003 | 004 //Argument 1
138 004 | 003 //Argument 2
139 */
140 \end{lstlisting}
141
142 When an incomplete function is declared we save the remaining arity in our
143 state to be able to know when a function is complete and can be executed. When
144 a function is still not complete the function is copied in totality and the
145 available arguments position on the heap is increased accordingly and the
146 arguments are added on the heap. The reason we store the number of already
147 given arguments in the heap is because we need that information at run-time
148 since you can not know this at compile-time.
149
150 Normally we jump to functions using labels. But since we can not put labels on
151 the heap or stack we have introduced a function in the preamble that translate
152 integers to the corresponding label called \SI{1func}. When we want to
153 call a function that is identified by an address instead of a label we first
154 collect all the arguments from the heap and add the arguments given in the
155 function call. Calculating the number of arguments that have to be popped from
156 the heap is done at runtime using the \emph{Available Arguments} value stored
157 on the heap. Instead of jumping to a label we jump with a \texttt{bsr} to the
158 \SI{1func} function and make sure the identification is situated in register
159 R5. \SI{1func} then makes sure the correct function is branched to with a
160 \texttt{bra} and the caller can resume the normal execution path. In this way
161 the function can still use the \texttt{ret} instruction to leave the function.