9897c0a2a0431630d72cdb2a441bfa3d8d6851b7
[cc1516.git] / deliverables / report / sem.tex
1 \section{Semantic analysis}\label{sec:sem}
2 %The grammar used to parse
3 %SPL
4 %, if it is di erent from the given grammar.
5 %{
6 %The scoping rules you use and check in your compiler.
7 %{
8 %The typing rules used for
9 %SPL
10 %.
11 %{
12 %A brief guide telling what each of the examples tests (see next point).
13 %{
14 %A concise but precise description of how the work was divided amongst the
15 %members of
16 %the team.
17
18 \newcommand{\unif}{{\scriptscriptstyle\cup}}
19 %\subsubsection{Anonymous functions}
20 %When the programmers wants to use small functions that consist of one line to
21 %for example use the operators in a functional way the programmer has to make a
22 %named function for every operator. In some cases it is easy to do this inline
23 %using anonymous functions. A small syntactic sugar has been added that will
24 %inline anonymous functions to non-anonymous functions during the parsing phase.
25 %The following snippet is an example of a filter on even values before and after
26 %transformation.
27 %
28 %\begin{SPLCode}
29 %//Before transformation
30 %main(){
31 % var a = filter(\x->x % 2 == 0, 1 : 2 : 3 : 4 : 5 : []);
32 %}
33 %
34 %//After transformation
35 %1lambda_23(x){
36 % return x % 2 == 0;
37 %}
38 %
39 %main(){
40 % var a = filter(1lambda_23, 1 : 2 : 3 : 4 : 5 : []);
41 %}
42 %\end{SPLCode}
43
44 The semantic analysis phase asses if a -grammatically correct- program is also
45 semantically correct and decorates the program with extra information it
46 finds during this checking. The input of the semantic analysis is an \AST{} of
47 the
48 parsed program, its output is either a semantically correct and completely typed
49 \AST{} plus the environment in which it was typed (for debug purposes), or it is
50 an error describing the problem with the \AST{}.
51
52 During this analysis \SPLC{} applies one \AST{} transformation and checks four
53 properties of the program:
54 \begin{enumerate}
55 \item That no functions are declared with the same name
56 \item That \tt{main} is of type \tt{(-> Void)}
57 \item That the program includes a main function
58 \item That the program is correctly typed
59 \end{enumerate}
60 The first three steps are simple sanity checks which are coded in slightly over
61 two dozen lines of \Clean{} code. The last step is a Hindley-Milner type
62 inference
63 algorithm, which is a magnitude more complex. This last step also decorates the
64 \AST{} with inferred type information.
65
66 \subsection{AST transformation for lambdas}
67 Before checking the semantic correctness of the program one small \AST{}
68 transformations needs to be done. Currently this transformation is build into
69 the semantic analysis phase, however, it would be more in place as a separate
70 compiler phase, and if more transformations are needed in the future these will
71 be placed in a separate phase.
72
73 The implementation of lambdas in \SPLC{} is implemented through a clever trick.
74 All \tt{LambdaExpressions} in the \AST{} are lifted to actual functions. This
75 transformation is illustrated in Listing~\ref{lst:lambdaLift}. This
76 has the advantage that, with our support for higher order functions, lambda
77 functions do not require any special attention after this transformation. One
78 downside is that in the current implementation lambdas can not access variables
79 in the scope in which they are declared. This could be solved by adding all
80 free variables in the lambda expression as parameters to the lifted function.
81
82 \begin{lstlisting}[
83 label={lst:lambdaLift},
84 caption={SPLC types},
85 language=Clean]
86 main() {
87 var x = \a b -> a + b;
88 }
89 --is transformed to:
90 2lambda_12(a, b) {
91 return a + b;
92 }
93 main() {
94 var x = 2lambda_12;
95 }
96 \end{lstlisting}
97
98 % \subsection{Sanity checks}
99 % The sanity checks are defined as simple recursive functions over the function
100 % declarations in the \AST. We will very shortly describe them here.
101
102 % \begin{description}
103 % \item [No duplicate functions] This function checks that the program does
104 % not contain top level functions sharing the same name. Formally:
105 % $ \forall f \in \textrm{functions} \ldotp \neg
106 % \exists g \in \textrm{functions} \setminus \{f\} \ldotp
107 % f.\textrm{name} = g.\textrm{name}$
108 % \item [\tt{main} should have type \tt{->Void}] Our semantics of \SPL{}
109 % require the main function to take no arguments and to not return a
110 % value. Formally: $\forall f \in \textrm{functions} \ldotp
111 % f.\textrm{name} = \textrm{'main'} \Rightarrow
112 % \textit{typeof}\textrm{(main)} = \rightarrow\textrm{Void} $
113 % \item [The program should have a main function] $\exists f \in
114 % \textrm{functions} \ldotp f.\textrm{name} = \textrm{'main'}$
115 % \end{description}
116
117 \subsection{Type inference}
118 \SPLC{} features a Hindley-Milner type inference algorithm based on the
119 algorithm presented in the slides. It supports full polymorphism and can
120 infer omitted types. It does not support multiple recursion and all
121 functions and identifiers need to be declared before they can be used. In
122 essence the program is typed as nested \tt{let .. in ..} statements.
123
124 \subsubsection{Types}
125 Listing~\ref{lst:types} shows the ADT representing types in \SPLC{}. Most of
126 these are self explanatory, however typing of functions requires some special
127 attention. \SPLC{}s extension of higher order functions requires that the type
128 system can distinguish if an
129 expression represents a type $a$ or a functions which takes no input and returns
130 a value of type $a$ (we write: $\rightarrow a$). The latter is represented by
131 the \CI{FuncType}.
132
133 Another important difference with standard \SPL{} is that \SPLC{} supports
134 currying of functions, functions are not typed with a list of argument types
135 and a return type, but with just a single argument and return type. This return
136 type then in itself can be a function.
137
138 \begin{lstlisting}[
139 label={lst:types},
140 caption={SPLC types},
141 language=Clean]
142 :: Type
143 = TupleType (Type, Type) -- (t1, t2)
144 | ListType type -- [t]
145 | IdType TVar -- type variable
146 | IntType
147 | BoolType
148 | CharType
149 | VoidType
150 | FuncType Type -- (-> t)
151 | (->>) infixl 7 Type Type -- (t1 -> t2)
152 \end{lstlisting}
153
154 \subsubsection{Environment}
155 As all stages of \SPLC{}, the type inference is programmed in Clean. It takes
156 an \AST{} and produces either a fully typed \AST{}, or an error if the provided
157 \AST{} contains type errors. The Program is typed in an environment $\Gamma :
158 id \rightarrow \Sigma$ which maps identifiers of functions or variables to type
159 schemes.
160
161 In \SPLC{} type inference and unification are done in one pass. They are unified
162 in the \tt{Typing} monad, which is
163 defined as \CI{Typing a :== StateT (Gamma, [TVar]) (Either SemError) a}. The
164 carried list of \tt{TVar} is an infinite list of fresh \tt{TVars}, which are
165 just an alias of \tt{String}.
166
167 \subsubsection{Substitution and Unification}
168 In \SPLC{} substitutions are defined as map from type variables to concrete
169 types. Composing of substitutions is then simply taking the union of two
170 substitutions and substituting the first substitution over the types in the
171 second substitution. We write $\star$ for a substitution and $\star_0$ for the
172 empty substitution.
173
174 Unification ($\unif$) takes two types and either provides a substitution which
175 could
176 unify them, or an error of the types are not unifiable. The interesting case
177 of unification is when trying to unify an type variable with a more
178 specialised type. Equation~\ref{eq:unifyId} shows this. All other cases
179 are trivially promoting unification to the contained types, or checking directly
180 if the types to be unified are equal.
181
182 \begin{equation} \label{eq:unifyId}
183 \infer{\alpha \unif \tau = [tv \mapsto \tau]}
184 {\alpha \notin ftv(\tau)}
185 \end{equation}
186
187 \subsubsection{Inferring}
188 Inferring the type of expression or statement yields a 4-tuple of a new Gamma,
189 Substitution,
190 a Type and an updated version of that element. In the case of expressions the
191 yielded type is the actual type of the expression. In the case of statements the
192 yielded type is the type returned by a return statement if one is present, or
193 Void otherwise.
194
195 In the Clean implementation the new Gamma in the result of an inference is
196 carried through the Typing Monad.
197
198 Below we will show the inference rules for expressions and statements. In these
199 lowercase Greek letters are always fresh type variables.
200
201 Section~\ref{sec:infRules} shows all typing inference rules. Below we will show
202 a few to express the basics and show some interesting cases.
203
204 \paragraph{Variables}
205 Inferring variable requires a bit of care. First of all the variable has to
206 be present in $\Gamma$, secondly field selectors need to be respected. To apply
207 the field selectors on a type the function \tt{apfs} is used, which maps a
208 type and a field selector to a new type: $\texttt{apfs} : \tau \times
209 \textit{fieldselector} \rightarrow \tau$.
210
211 Note that the inference rule only checks if a field selector can be applied on
212 a certain type, and does not use the field selector to infer the type. This
213 means that a program: \tt{f(x)\{ return x.hd; \}} can not be typed, and instead
214 \tt{f} must be explicitly typed as \tt{:: [a] -> a} (or something more
215 specialised) by the programmer. This could be improved by
216 changing the \tt{apfs} function to infer for type variables.
217
218 \begin{equation}
219 \infer[Var]
220 {\Gamma \vdash \textit{id}.\textit{fs}^* \Rightarrow
221 (\Gamma, \star_0, \tau', \textit{id}.\textit{fs}^*)}
222 {\Gamma(id) = \lfloor \tau \rfloor
223 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau'
224 }
225 \end{equation}
226
227 \paragraph{Functions}
228 When inferring two functions we distinguish two rules: one for a function which
229 is applied without any arguments (Equation~\ref{eq:inferApp0}) and one for when
230 a function is applied with arguments (Equation~\ref{eq:inferAppN}). Inferring a
231 list of expressions is done by folding the infer function over the expressions
232 and accumulating the resulting types, substitutions, etc.
233
234 \begin{equation} \label{eq:inferApp0}
235 \infer[App 0]
236 {\Gamma \vdash \textit{id}().\textit{fs}^* \Rightarrow
237 ((\Gamma^\star, \star, \tau^r,
238 \textit{id}().\textit{fs}^*)}
239 {\Gamma(id) = \lfloor \tau^e \rfloor
240 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^r
241 }
242 \end{equation}
243
244 \begin{equation} \label{eq:inferAppN}
245 \infer[AppN]
246 {\Gamma \vdash \textit{id}(e^*).\textit{fs}^* \Rightarrow
247 ((\Gamma^\star, \star, \tau^r,
248 \textit{id}({e^*}').\textit{fs}^*)}
249 {\Gamma(id) = \lfloor \tau^e \rfloor
250 &\Gamma \vdash e^* \Rightarrow
251 (\Gamma^{\star_1}, \star_1, \tau^*, {e^*}')
252 &(\texttt{fold } (\rightarrow) \texttt{ } \alpha \texttt{ } \tau^*)
253 \unif \tau^e = \star_2
254 &\star = \star_2 \cdot \star_1
255 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^e
256 }
257 \end{equation}
258
259 \paragraph{Return statements}
260 Statements are typed with the type they return. The base case for this is
261 trivially a return statement. An empty return statement is of type \tt{Void}
262 and other return statements are typed with the type or their expression.
263
264 When checking combinations of statements, e.g. if-then-else statements, it is
265 checked that all branches return the same type, then the
266 type of the combination is set to this type.
267
268 \begin{equation}
269 \infer[Return \emptyset]
270 {\Gamma \vdash \underline{\textrm{return}} \Rightarrow
271 (\Gamma, \star_0, \textrm{Void}, \underline{\textrm{return}})
272 }
273 {}
274 \end{equation}
275
276 \begin{equation}
277 \infer[Return]
278 {\Gamma \vdash \underline{\textrm{return }} e \Rightarrow
279 (\Gamma^\star, \star, \tau, \underline{\textrm{return }} e')
280 }
281 {\Gamma \vdash e \Rightarrow (\Gamma^\star, \star, \tau, e')}
282 \end{equation}
283
284 \paragraph{Assignment statement}
285 When assigning a new value to a variable this variable needs to be present in
286 Gamma and its type needs to unify with the type of the expression.
287
288 One thing that needs to be taken care of when assigning are the field selectors.
289 \tt{unapfs} is a function which takes a type and a fieldselector and returns the
290 type of the variable needed to assign to that fieldselector, i.e. \CI{unapfs
291 a FieldHd = ListType a}
292 \begin{equation}
293 \infer[Ass]
294 {\Gamma \vdash \textit{id.fs}^* \underline{=} e \Rightarrow
295 (\Gamma^\star.k:\tau_r^\star, \star, \textrm{Void},
296 \textit{id.fs}^* \underline{=} e')
297 }
298 {\Gamma(\textit{id}) = \lfloor \tau_e \rfloor
299 &\Gamma \vdash e \Rightarrow (\Gamma^{\star_1}, \star_1, \tau_g, e')
300 &\texttt{fold unapfs } \tau_g \textit{ reverse fs} = \tau^r
301 &\tau_e \unif \tau_g = \star_2
302 &\star = \star_2 \cdot \star_1
303 }
304 \end{equation}