sdfsdf
[cc1516.git] / deliverables / report / sem.tex
1 \section{Semantic analysis}\label{sec:sem}
2 %The grammar used to parse
3 %SPL
4 %, if it is di erent from the given grammar.
5 %{
6 %The scoping rules you use and check in your compiler.
7 %{
8 %The typing rules used for
9 %SPL
10 %.
11 %{
12 %A brief guide telling what each of the examples tests (see next point).
13 %{
14 %A concise but precise description of how the work was divided amongst the
15 %members of
16 %the team.
17
18 \newcommand{\unif}{{\scriptscriptstyle\cup}}
19
20 The semantic analysis phase asses if a -grammatically correct- program is also
21 semantically correct and decorates the program with extra information it
22 finds during this checking. The input of the semantic analysis is an \AST{} of
23 the
24 parsed program, its output is either a semantically correct and completely typed
25 \AST{} plus the environment in which it was typed (for debug purposes), or it is
26 an error describing the problem with the \AST{}.
27
28 During this analysis \SPLC{} checks four properties of the program:
29 \begin{enumerate}
30 \item That no functions are declared with the same name
31 \item That \tt{main} is of type \tt{(-> Void)}
32 \item That the program includes a main function
33 \item AST transformation for lambdas
34 \item That the program is correctly typed
35 \end{enumerate}
36 The first three steps are simple sanity checks which are coded in slightly over
37 two dozen lines of \Clean{} code. The last step is a Hindley-Milner type
38 inference
39 algorithm, which is a magnitude more complex. This last step also decorates the
40 \AST{} with inferred type information.
41
42 \subsection{AST transformation for lambdas}
43
44 % \subsection{Sanity checks}
45 % The sanity checks are defined as simple recursive functions over the function
46 % declarations in the \AST. We will very shortly describe them here.
47
48 % \begin{description}
49 % \item [No duplicate functions] This function checks that the program does
50 % not contain top level functions sharing the same name. Formally:
51 % $ \forall f \in \textrm{functions} \ldotp \neg
52 % \exists g \in \textrm{functions} \setminus \{f\} \ldotp
53 % f.\textrm{name} = g.\textrm{name}$
54 % \item [\tt{main} should have type \tt{->Void}] Our semantics of \SPL{}
55 % require the main function to take no arguments and to not return a
56 % value. Formally: $\forall f \in \textrm{functions} \ldotp
57 % f.\textrm{name} = \textrm{'main'} \Rightarrow
58 % \textit{typeof}\textrm{(main)} = \rightarrow\textrm{Void} $
59 % \item [The program should have a main function] $\exists f \in
60 % \textrm{functions} \ldotp f.\textrm{name} = \textrm{'main'}$
61 % \end{description}
62
63 \subsection{Type inference}
64 \SPLC{} features a Hindley-Milner type inference algorithm based on the
65 algorithm presented in the slides. It supports full polymorphism and can
66 infer omitted types. It does not support multiple recursion and all
67 functions and identifiers need to be declared before they can be used. In
68 essence the program is typed as nested \tt{let .. in ..} statements.
69
70 \subsubsection{Types}
71 Listing~\ref{lst:types} shows the ADT representing types in \SPLC{}. Most of
72 these are self explanatory, however typing of functions requires some special
73 attention. \SPLC{}s extension is higher order functions, this is described
74 in more detail in Section~\ref{sec:ext}. These require a change to the
75 type system, which we will already describe here.
76
77 Higher order functions require that the type system can distinguish if an
78 expression represents a type $a$ or a functions which takes no input and returns
79 a value of type $a$ (we write: $\rightarrow a$). The latter is represented by
80 the \CI{FuncType}.
81
82 Another important difference with standard \SPL{} is that \SPLC{} supports
83 currying of functions, functions are not typed with a list of argument types
84 and a return type, but with just a single argument and return type. This return
85 type then in itself can be a function.
86
87 \begin{lstlisting}[
88 label={lst:types},
89 caption={SPLC types},
90 language=Clean]
91 :: Type
92 = TupleType (Type, Type) -- (t1, t2)
93 | ListType type -- [t]
94 | IdType TVar -- polymorphic type
95 | IntType
96 | BoolType
97 | CharType
98 | VoidType
99 | FuncType Type -- (-> t)
100 | (\->>) infixl 7 Type Type -- (t1 -> t2)
101 \end{lstlisting}
102
103 \subsubsection{Environment}
104 As all stages of \SPLC{}, the type inference is programmed in Clean. It takes
105 an \AST{} and produces either a fully typed \AST{}, or an error if the provided
106 \AST{} contains type errors. The Program is typed in an environment $\Gamma :
107 id \rightarrow \Sigma$ which maps identifiers of functions or variables to type
108 schemes.
109
110 In \SPLC{} type inference and unification are done in one pass. They are unified
111 in the \tt{Typing} monad, which is
112 defined as \CI{Typing a :== StateT (Gamma, [TVar]) (Either SemError) a}. The
113 carried list of \tt{TVar} is an infinite list of fresh \tt{TVars}, which are
114 just an alias of \tt{String}.
115
116 \subsubsection{Substitution and Unification}
117 In \SPLC{} substitutions are defined as map from type variables to concrete
118 types. Composing of substitutions is then simply taking the union of two
119 substitutions and substituting the first substitution over the types in the
120 second substitution. We write $\star$ for a substitution and $\star_0$ for the
121 empty substitution.
122
123 Unification ($\unif$) takes two types and either provides a substitution which
124 could
125 unify them, or an error of the types are not unifiable. The interesting case
126 of unification is when trying to unify an type variable with a more
127 specialised type. Equation~\ref{eq:unifyId} shows this. All other cases
128 are trivially promoting unification to the contained types, or checking directly
129 if the types to be unified are equal.
130
131 \begin{equation} \label{eq:unifyId}
132 \infer{\alpha \unif \tau = [tv \mapsto \tau]}
133 {\alpha \notin ftv(\tau)}
134 \end{equation}
135
136 \subsubsection{Inferring}
137 Inferring the type of expression or statement yields a 4-tuple of a new Gamma,
138 Substitution,
139 a Type and an updated version of that element. In the case of expressions the
140 yielded type is the actual type of the expression. In the case of statements the
141 yielded type is the type returned by a return statement if one is present, or
142 Void otherwise.
143
144 In the Clean implementation the new Gamma in the result of an inference is
145 carried through the Typing Monad.
146
147 Below we will show the inference rules for expressions and statements. In these
148 lowercase Greek letters are always fresh type variables.
149
150 Section~\ref{sec:infRules} shows all typing inference rules. Below we will show
151 a few to express the basics and show some interesting cases.
152
153 \paragraph{Op2 expressions}
154 Equation~\ref{eq:inferOp2} shows the inference rule for op2 expressions.
155 \textit{op2type} is a function which takes an operator and returns the
156 corresponding function type, i.e. $\textit{op2type}(\&\&) = Bool \rightarrow
157 Bool \rightarrow Bool$.
158
159 \begin{equation} \label{eq:inferOp2}
160 \infer[Op2]
161 {\Gamma \vdash e_1 \oplus e_2 \Rightarrow
162 (\Gamma^{\star}, \star, \alpha^\star, (e_1' \oplus e_2')^\star)}
163 {\Gamma \vdash e_1 \Rightarrow (\Gamma^{\star_1},\star_1, \tau_1, e_1')
164 &\Gamma^{\star_1} \vdash e_2 \Rightarrow (\Gamma^{\star_2},
165 \star_2, \tau_2, e_2')
166 &(\tau_1 \rightarrow \tau_2 \rightarrow \alpha) \unif
167 \textit{op2type}(\oplus) = \star_3
168 &\star = \star_3 \cdot \star_2 \cdot \star_1
169 }
170 \end{equation}
171
172 \paragraph{Op1 expressions}
173 \begin{equation} \label{eq:inferOp1}
174 \infer[Op1]
175 {\Gamma \vdash \ominus e \Rightarrow
176 (\Gamma^{\star}, \star, \alpha^\star, (\ominus e')^\star)}
177 {\Gamma \vdash e \Rightarrow (\star_1, \tau, e_1')
178 &(\tau \rightarrow \alpha) \unif
179 \textit{op1type}(\ominus) = \star_2
180 &\star = \star_2 \cdot \star_1
181 }
182 \end{equation}
183
184 \paragraph{Tuples}
185 \begin{equation} \label{eq:tuples}
186 \infer[Tuple]
187 {\Gamma \vdash (e_1, e_2) \Rightarrow
188 (\Gamma^{\star}, \star, (\tau_1, \tau_2)^\star,
189 (e_1', e_2')^\star)}
190 {\Gamma \vdash e_1 \Rightarrow (\Gamma^{\star_1},\star_1, \tau_1, e_1')
191 &\Gamma^{\star_1} \vdash e_2 \Rightarrow (\Gamma^{\star_2},
192 \star_2, \tau_2, e_2')
193 &\star = \star_3 \cdot \star_2 \cdot \star_1
194 }
195 \end{equation}
196
197 \paragraph{Variables}
198 Inferring variable requires a bit of care. First of all the variable has to
199 be present in $\Gamma$, secondly field selectors need to be respected. To apply
200 the field selectors on a type the function \tt{apfs} is used, which maps a
201 type and a field selector to a new type: $\texttt{apfs} : \tau \times
202 \textit{fieldselector} \rightarrow \tau$.
203
204 Note that the inference rule only checks if a field selector can be applied on
205 a certain type, and does not use the field selector to infer the type. This means that a program: \tt{f(x)\{ return x.hd; \}} can not be typed, and instead
206 \tt{f} must be explicitly typed as \tt{:: [a] -> a} (or something more
207 specialised) by the programmer. This could be improved by
208 changing the \tt{apfs} function to infer for type variables.
209
210 \begin{equation}
211 \infer[Var]
212 {\Gamma \vdash \textit{id}.\textit{fs}^* \Rightarrow
213 (\Gamma, \star_0, \tau', \textit{id}.\textit{fs}^*)}
214 {\Gamma(id) = \lfloor \tau \rfloor
215 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau'
216 }
217 \end{equation}
218
219 \paragraph{Functions}
220 When inferring two functions we distinguish two rules: one for a function which
221 is applied without any arguments (Equation~\ref{eq:inferApp0}) and one for when
222 a function is applied with arguments (Equation~\ref{eq:inferAppN}). Inferring a
223 list of expressions is done by folding the infer function over the expressions
224 and accumulating the resulting types, substitutions, etc.
225
226 \begin{equation} \label{eq:inferApp0}
227 \infer[App 0]
228 {\Gamma \vdash \textit{id}().\textit{fs}^* \Rightarrow
229 ((\Gamma^\star, \star, \tau^r,
230 \textit{id}().\textit{fs}^*)}
231 {\Gamma(id) = \lfloor \tau^e \rfloor
232 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^r
233 }
234 \end{equation}
235
236 \begin{equation} \label{eq:inferAppN}
237 \infer[AppN]
238 {\Gamma \vdash \textit{id}(e^*).\textit{fs}^* \Rightarrow
239 ((\Gamma^\star, \star, \tau^r,
240 \textit{id}({e^*}').\textit{fs}^*)}
241 {\Gamma(id) = \lfloor \tau^e \rfloor
242 &\Gamma \vdash e^* \Rightarrow
243 (\Gamma^{\star_1}, \star_1, \tau^*, {e^*}')
244 &(\texttt{fold } (\rightarrow) \texttt{ } \alpha \texttt{ } \tau^*)
245 \unif \tau^e = \star_2
246 &\star = \star_2 \cdot \star_1
247 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^e
248 }
249 \end{equation}
250
251 \paragraph{Literals}
252 The types of literals -Ints, Bools, Chars and empty lists- are trivially
253 determined and are shown below.
254
255 \begin{equation}
256 \infer[Int]{\Gamma \vdash n \Rightarrow (\Gamma, \star_0, Int, n)}{}
257 \end{equation}
258
259 \begin{equation}
260 \infer[Bool]{\Gamma \vdash b \Rightarrow (\Gamma, \star_0, Bool, b)}{}
261 \end{equation}
262
263 \begin{equation}
264 \infer[Char]{\Gamma \vdash c \Rightarrow (\Gamma, \star_0, Char, c)}{}
265 \end{equation}
266
267 \begin{equation}
268 \infer[\emptyset]
269 {\Gamma \vdash [] \Rightarrow (\Gamma, \star_0, \alpha, [])}{}
270 \end{equation}
271
272 \paragraph{If statements}
273 balbal de if en else mzelfed type, jwz statements hebben he tupen wat ze
274 returnen.
275 \begin{equation}
276 \infer[If]
277 {\Gamma \vdash \underline{\textrm{if }} e
278 \underline{\textrm{ then }} s_1
279 \underline{\textrm{ else }} s_2
280 \Rightarrow
281 (\Gamma^\star, \star, \tau_e^\star, \underline{\textrm{if }} e'
282 \underline{\textrm{ then }} s_1'
283 \underline{\textrm{ else }} s_2')
284 }
285 {\Gamma \vdash e \Rightarrow (\Gamma^{\star_1}, \star_1, \tau_1, e')
286 &\tau_1 \unif \textrm{Bool} = \star_2
287 &\Gamma^{\star_2\cdot \star_1} \vdash s_1 \Rightarrow
288 (\Gamma^{\star_3}, \star_3, \tau_t, s_1')
289 &\Gamma^{\star_3} \vdash s_2 \Rightarrow
290 (\Gamma^{\star_4}, \star_4, \tau_e, s_2')
291 &\tau_t \unif \tau_e = \star_5
292 }
293 \end{equation}
294
295 \paragraph{While statements}
296 \begin{equation}
297 \infer[While]
298 {\Gamma \vdash \underline{\textrm{while }} e \textrm{ } s
299 \Rightarrow
300 (\Gamma^\star, \star, \tau_s^\star,
301 \underline{\textrm{while }} e' \textrm{ } s_1')
302 }
303 {\Gamma \vdash e \Rightarrow (\Gamma^{\star_1}, \star_1, \tau_1, e')
304 &\tau_1 \unif \textrm{Bool} = \star_2
305 &\Gamma^{\star_2\cdot \star_1} \vdash s \Rightarrow
306 (\Gamma^{\star_3}, \star_3, \tau_t, s')
307 &\star = \star_3 \cdot \star_2 \cdot \star_1
308 }
309 \end{equation}
310
311 \paragraph{Assignment statement}
312 blabla unapfs, en extenden van Gamma
313 \begin{equation}
314 \infer[Ass]
315 {\Gamma \vdash \textit{id.fs}^* \underline{=} e \Rightarrow
316 (\Gamma^\star.k:\tau_r^\star, \star, \textrm{Void},
317 \textit{id.fs}^* \underline{=} e')
318 }
319 {\Gamma(\textit{id}) = \lfloor \tau_e \rfloor
320 &\Gamma \vdash e \Rightarrow (\Gamma^{\star_1}, \star_1, \tau_g, e')
321 &\texttt{fold unapfs } \tau_g \textit{reverse fs} = \tau^r
322 &\tau_e \unif \tau_g = \star_2
323 &\star = \star_2 \cdot \star_1
324 }
325 \end{equation}
326
327 \paragraph{Function Statements}
328 Blabl, praktisch gelijk aan function call, behalve dat ze altijd Void als
329 type hebben.
330
331 \begin{equation} \label{eq:inferStmtApp0}
332 \infer[Stmt App 0]
333 {\Gamma \vdash \textit{id}().\textit{fs}^* \Rightarrow
334 ((\Gamma^\star, \star, \textrm{Void},
335 \textit{id}().\textit{fs}^*)}
336 {\Gamma(id) = \lfloor \tau^e \rfloor
337 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^r
338 }
339 \end{equation}
340
341 \begin{equation} \label{eq:inferAppN}
342 \infer[Stmt AppN]
343 {\Gamma \vdash \textit{id}(e^*).\textit{fs}^* \Rightarrow
344 ((\Gamma^\star, \star, \textrm{Void},
345 \textit{id}({e^*}').\textit{fs}^*)}
346 {\Gamma(id) = \lfloor \tau^e \rfloor
347 &\Gamma \vdash e^* \Rightarrow
348 (\Gamma^{\star_1}, \star_1, \tau^*, {e^*}')
349 &(\texttt{fold } (\rightarrow) \texttt{ } \alpha \texttt{ } \tau^*)
350 \unif \tau^e = \star_2
351 &\star = \star_2 \cdot \star_1
352 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^e
353 }
354 \end{equation}
355
356 \paragraph{Return statements}
357 blabla, verschil tussen wel en niet iets returnen
358
359 \begin{equation}
360 \infer[Return \emptyset]
361 {\Gamma \vdash \underline{\textrm{return}} \Rightarrow
362 (\Gamma, \star_0, \textrm{Void}, \underline{\textrm{return}})
363 }
364 {}
365 \end{equation}
366
367 \begin{equation}
368 \infer[Return]
369 {\Gamma \vdash \underline{\textrm{return }} e \Rightarrow
370 (\Gamma^\star, \star, \tau, \underline{\textrm{return }} e')
371 }
372 {\Gamma \vdash e \Rightarrow (\Gamma^\star, \star, \tau, e')}
373 \end{equation}