ee377e6f86f3f2d60df1547f7fa1fe79058ea94f
[cc1516.git] / deliverables / report / gen.tex
1 \section{Code generation}\label{sec:gen}
2 \subsection{Binary Representation}
3 \SPLC{} uses a simple stack layout in which all functions have their own stack
4 frame and all the arguments and local variables are present within this frame.
5 Arguments and local variables are either saved on the stack directly, or as a
6 pointer to the heap. Simple values, i.e. Ints, Bools and Chars are stored
7 directly on the stack, while other values are stored on the heap.
8
9 The heap simply grows with each added value, currently no garbage collection is
10 done. Lists are represented simply as linked lists, by first storing the pointer
11 to the next list element and then the current list element. Tuples are simply
12 stored by staring the two values in succession.
13
14 When calling a functions its arguments are placed on the stack, after which
15 a \tt{bsr} instruction calls the function. The functions can then load its
16 arguments, if and when these are needed in an expression, by copying them
17 from below its stack frame.
18
19 Note that the combination of this copying behaviour and passing complex values
20 as heap pointers means that in some cases \SPLC{} programs behave as
21 pass-by-value while in others it behave as pass-by-reference. Simple values
22 are trivially pass-by-value, whilst lists and tuples can behave as both,
23 depending on the usage. E.g. assigning a new list/tuple to an argument will
24 behave as pass-by-value, whilst changing the argument through a field selector
25 will behave as pass by reference.
26
27 Listing~\ref{lst:stackFunCall} shows the layout of the stack executing the
28 two functions. the layout shown is the layout of the stack just before
29 the \tt{add} instruction to add x and y. Note that, in compliance with \SSM{}s
30 calling model, the return address of a function is saved below the stack frame
31 of that function. When plus has returned the two arguments are popped of the
32 stack and the value in \tt{RR} (for functions which do not return \tt{Void})
33 is placed on the stack.
34
35 \begin{SPLCode} \label{lst:stackFunCall}
36 plus(x, y}{ return x + y; }
37
38 main(){ var a = plus(3, 4); }
39
40 /*# | Stack
41 111 | 002 ;return address of main
42 ;begin stack frame of main
43 112 | 110 ;Frame pointer for main; points to bottom of stack
44 113 | 003 ;arg1 for plus
45 114 | 004 ;arg2 for plus
46 115 | 0fa ;return address of plus
47 ;begin stack frame of plus
48 116 | 112 ;Frame pointer for plus; points to mains stack frame
49 117 | 003 ;x, loaded by ldl -3
50 118 | 004 ;y, loaded by ldl -2
51 */
52 \end{SPLCode}
53
54 \subsection{Generation}
55 The code generation phase transforms a well typed \AST{} to either an error,
56 or an \tt{SSMProgram}The latter is a list of labels and instructions. The
57 generation happens within a
58 \CI{WriterT SSMProgram (StateT (Addressbook, [Label]) (Either GenError)) a}
59 environment. Which we have implemented as a \tt{RWST} with \tt{()} for the
60 reading environment. In this \tt{Addressbook} is a simple map from
61 identifiers to \tt{Address}, which in turn is either a label to jump to or an
62 address where a variable can be found (relative to the current frame). The list
63 of labels is simply an infinite generator for fresh labels, used when generating
64 code for e.g. an if-statement.
65
66 Code generation is done by walking through the \AST{} depth first and generating
67 instructions for each encountered element. E.g. generation for an \tt{Op2Expr}
68 is simply recursively generating for each element of the expression:
69 \CI{g e1 >>| g e2 >>| g op}.
70
71 \subsection{Higher order functions}
72 The nature of the type checking algorithm already included type checking and
73 inferring the type of higher order functions. Since we allow constants there is
74 a difference between \SI{read} and \SI{read()}. To make this difference we
75 introduced a separate type of expression called a \SI{FuncType}, this type
76 encodes a function with arity $0$.
77
78 As all other data types a function type object occupies one item on the stack
79 which contains a heap pointer. The heap pointer points to a position on the
80 heap that stores, in increasing addresses, the function identification number,
81 the number of arguments already given and finally the argument values.
82 For example the following snippet shows the representation of the incomplete
83 function \SI{plus} that already has two arguments.
84 \begin{SPLCode}
85 //SPLCode
86 plus(x, y, z){ return x + y + z; }
87
88 main(){
89 var a = plus(4, 3);
90 }
91
92 /*# | Stack
93 001 | 001 //Pointer to the
94 ... |
95
96 # | Heap
97 001 | 008 //Function identification number
98 002 | 002 //Available arguments
99 003 | 004 //Argument 1
100 004 | 003 //Argument 2
101 */
102 \end{SPLCode}
103
104 When an incomplete function is declared we save the remaining arity in our
105 state to be able to know when a function is complete and can be executed. When
106 a function is still not complete the function is copied in totality and the
107 available arguments position on the heap is increased accordingly and the
108 arguments are added on the heap.
109
110 Function numbers are integers that identify the function. Normally we address
111 functions using labels but it is not possible to store labels on the heap or on
112 the stack. Therefore we include a function in the preamble that is basically a
113 dictionary from integers to labels. All functions have their unique number that
114 identifies them via the function dictionary called \SI{1func}. When we want to
115 call a function that is identified by an address instead of a label we first
116 collect all the arguments from the heap and add the arguments given in the
117 function call. Calculating the number of arguments that have to be popped from
118 the heap is done at runtime using the \emph{Available Arguments} value stored
119 on the heap. Instead of jumping to a label we jump to the \SI{1func} function
120 and make sure the identification is situated in register R5. \SI{1func} then
121 makes sure the correct function is branched to and the caller can resume the
122 normal execution path.