36877775de10f6801e66b5894d660b79db4bbfae
[cc1516.git] / deliverables / report / sem.tex
1 \section{Semantic analysis}\label{sec:sem}
2 %The grammar used to parse
3 %SPL
4 %, if it is di erent from the given grammar.
5 %{
6 %The scoping rules you use and check in your compiler.
7 %{
8 %The typing rules used for
9 %SPL
10 %.
11 %{
12 %A brief guide telling what each of the examples tests (see next point).
13 %{
14 %A concise but precise description of how the work was divided amongst the
15 %members of
16 %the team.
17
18 \newcommand{\unif}{{\scriptscriptstyle\cup}}
19
20 The semantic analysis phase asses if a -grammatically correct- program is also
21 semantically correct and decorates the program with extra information it
22 finds during this checking. The input of the semantic analysis is an \AST{} of
23 the
24 parsed program, its output is either a semantically correct and completely typed
25 \AST{} plus the environment in which it was typed (for debug purposes), or it is
26 an error describing the problem with the \AST{}.
27
28 During this analysis \SPLC{} checks four properties of the program:
29 \begin{enumerate}
30 \item That no functions are declared with the same name
31 \item That \tt{main} is of type \tt{(-> Void)}
32 \item That the program includes a main function
33 \item That the program is correctly typed
34 \end{enumerate}
35 The first three steps are simple sanity checks which are coded in slightly over
36 two dozen lines of Clean code. The last step is a Hindley Milner type inference
37 algorithm, which is a magnitude more complex. This last step also decorates the
38 \AST{} with inferred type information.
39
40 \subsection{Sanity checks}
41 The sanity checks are defined as simple recursive functions over the function
42 declarations in the \AST. We will very shortly describe them here.
43
44 \begin{description}
45 \item [No duplicate functions] This function checks that the program does
46 not contain top level functions sharing the same name. Formally:
47 $ \forall f \in \textrm{functions} \ldotp \neg
48 \exists g \in \textrm{functions} \setminus \{f\} \ldotp
49 f.\textrm{name} = g.\textrm{name}$
50 \item [\tt{main} should have type \tt{->Void}] Our semantics of \SPL{}
51 require the main function to take no arguments and to not return a
52 value. Formally: $\forall f \in \textrm{functions} \ldotp
53 f.\textrm{name} = \textrm{'main'} \Rightarrow
54 \textit{typeof}\textrm{(main)} = \rightarrow\textrm{Void} $
55 \item [The program should have a main function] $\exists f \in
56 \textrm{functions} \ldotp f.\textrm{name} = \textrm{'main'}$
57 \end{description}
58
59 \subsection{Type inference}
60 \SPLC{} features a Hindley-Milner type inference algorithm based on the
61 algorithm presented in the slides. It supports full polymorphism and can
62 infer omitted types. It does not support multiple recursion and all
63 functions and identifiers need to be declared before they can be used. In
64 essence the program is typed as nested \tt{let .. in ..} statements.
65
66 \subsubsection{Types}
67 Listing~\ref{lst:types} shows the ADT representing types in \SPLC{}. Most of
68 these are self explanatory, however typing of functions requires some special
69 attention. \SPLC{}s extension is Higher order functions, this is described
70 in more detail in Section~\ref{sec:ext}. These require a change to the
71 type system, which we will already describe here.
72
73 Higher order functions require that the type system can distinguish if an
74 expression represents a type $a$ or a functions which takes no input and returns
75 a value of type $a$ (we write: $\rightarrow a$). The latter is represented by
76 the \CI{FuncType}.
77
78 Another important difference with standard \SPL{} is that \SPLC{} supports
79 currying of functions, functions are not typed with a list of argument types
80 and a return type, but with just a single argument and return type. This return
81 type then in itself can be a function.
82
83 \begin{lstlisting}[
84 label={lst:types},
85 caption={SPLC types},
86 language=clean]
87 :: Type
88 = TupleType (Type, Type) -- (t1, t2)
89 | ListType type -- [t]
90 | IdType TVar -- polymorphic type
91 | IntType
92 | BoolType
93 | CharType
94 | VoidType
95 | FuncType Type -- (-> t)
96 | (->>) infixl 7 Type Type -- (t1 -> t2)
97 \end{lstlisting}
98
99 \subsubsection{Environment}
100 As all stages of \SPLC{}, the type inference is programmed in Clean. It takes
101 an \AST{} and produces either a fully typed \AST{}, or an error if the provided
102 \AST{} contains type errors. The Program is typed in an environment $\Gamma :
103 id \rightarrow \Sigma$ which maps identifiers of functions or variables to type
104 schemes.
105
106 In \SPLC{} type inference and unification are done in one pass. They are unified
107 in the \tt{Typing} monad, which is
108 defined as \CI{Typing a :== StateT (Gamma, [TVar]) (Either SemError) a}. The
109 carried list of \tt{TVar} is an infinite list of fresh \tt{TVars}, which are
110 just an alias of \tt{String}.
111
112 \subsubsection{Substitution and Unification}
113 In \SPLC{} substitutions are defined as map from type variables to concrete
114 types. Composing of substitutions is then simply taking the union of two
115 substitutions and substituting the first substitution over the types in the
116 second substitution. We write $\star$ for a substitution and $\star_0$ for the
117 empty substitution.
118
119 Unification ($\unif$) takes two types and either provides a substitution which
120 could
121 unify them, or an error of the types are not unifiable. The interesting case
122 of unification is when trying to unify an type variable with a more
123 specialised type. Equation~\ref{eq:unifyId} shows this. All other cases
124 are trivially promoting unification to the contained types, or checking directly
125 if the types to be unified are equal.
126
127 \begin{equation} \label{eq:unifyId}
128 \infer{\alpha \unif \tau = [tv \mapsto \tau]}
129 {\alpha \notin ftv(\tau)}
130 \end{equation}
131
132 \subsubsection{Inferring}
133 Inferring the type of expression or statement yields a 4-tuple of a new Gamma,
134 Substitution,
135 a Type and an updated version of that element. In the case of expressions the
136 yielded type is the actual type of the expression. In the case of statements the
137 yielded type is the type returned by a return statement if one is present, or
138 Void otherwise.
139
140 In the Clean implementation the new Gamma in the result of an inference is
141 hidden in the Typing Monad.
142
143 Below we will show the inference rules for expressions and statements. In these
144 lowercase Greek letters are always fresh type variables.
145
146 \paragraph{Op2 expressions}
147 Equation~\ref{eq:inferOp2} shows the inference rule for op2 expressions.
148 \textit{op2type} is a function which takes an operator and returns the
149 corresponding function type, i.e. $\textit{op2type}(\&\&) = Bool \rightarrow
150 Bool \rightarrow Bool$.
151
152 \begin{equation} \label{eq:inferOp2}
153 \infer[Op2]
154 {\Gamma \vdash e_1 \oplus e_2 \Rightarrow
155 (\Gamma^{\star}, \star, \alpha^\star, (e_1' \oplus e_2')^\star)}
156 {\Gamma \vdash e_1 \Rightarrow (\Gamma^{\star_1},\star_1, \tau_1, e_1')
157 &\Gamma^{\star_1} \vdash e_2 \Rightarrow (\Gamma^{\star_2},
158 \star_2, \tau_2, e_2')
159 &(\tau_1 \rightarrow \tau_2 \rightarrow \alpha) \unif
160 \textit{op2type}(\oplus) = \star_3
161 &\star = \star_3 \cdot \star_2 \cdot \star_1
162 }
163 \end{equation}
164
165 \paragraph{Op1 expressions}
166 \begin{equation} \label{eq:inferOp1}
167 \infer[Op1]
168 {\Gamma \vdash \ominus e \Rightarrow
169 (\Gamma^{\star}, \star, \alpha^\star, (\ominus e')^\star)}
170 {\Gamma \vdash e \Rightarrow (\star_1, \tau, e_1')
171 &(\tau \rightarrow \alpha) \unif
172 \textit{op1type}(\ominus) = \star_2
173 &\star = \star_2 \cdot \star_1
174 }
175 \end{equation}
176
177 \paragraph{Tuples}
178 \begin{equation} \label{eq:tuples}
179 \infer[Tuple]
180 {\Gamma \vdash (e_1, e_2) \Rightarrow
181 (\Gamma^{\star}, \star, (\tau_1, \tau_2)^\star,
182 (e_1', e_2')^\star)}
183 {\Gamma \vdash e_1 \Rightarrow (\Gamma^{\star_1},\star_1, \tau_1, e_1')
184 &\Gamma^{\star_1} \vdash e_2 \Rightarrow (\Gamma^{\star_2},
185 \star_2, \tau_2, e_2')
186 &\star = \star_3 \cdot \star_2 \cdot \star_1
187 }
188 \end{equation}
189
190 \paragraph{Variables}
191 Inferring variable requires a bit of care. First of all the variable has to
192 be present in $\Gamma$, secondly field selectors need to be respected. To apply
193 the field selectors on a type the function \tt{apfs} is used, which maps a
194 type and a field selector to a new type: $\texttt{apfs} : \tau \times
195 \textit{fieldselector} \rightarrow \tau$.
196
197 Note that the inference rule only checks if a field selector can be applied on
198 a certain type, and does not use the field selector to infer the type. This means that a program: \tt{f(x)\{ return x.hd; \}} can not be typed, and instead
199 \tt{f} must be explicitly typed as \tt{:: [a] -> a} (or something more
200 specialised) by the programmer. This could be improved by
201 changing the \tt{apfs} function to infer for type variables.
202
203 \begin{equation}
204 \infer[Var]
205 {\Gamma \vdash \textit{id}.\textit{fs}^* \Rightarrow
206 (\Gamma, \star_0, \tau', \textit{id}.\textit{fs}^*)}
207 {\Gamma(id) = \lfloor \tau \rfloor
208 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau'
209 }
210 \end{equation}
211
212 \paragraph{Functions}
213 When inferring two functions we distinguish two rules: one for a function which
214 is applied without any arguments (Equation~\ref{eq:inferApp0}) and one for when
215 a function is applied with arguments (Equation~\ref{eq:inferAppN}). Inferring a
216 list of expressions is done by folding the infer function over the expressions
217 and accumulating the resulting types, substitutions, etc.
218
219 \begin{equation} \label{eq:inferApp0}
220 \infer[App 0]
221 {\Gamma \vdash \textit{id}().\textit{fs}^* \Rightarrow
222 ((\Gamma^\star, \star, \tau^r,
223 \textit{id}().\textit{fs}^*)}
224 {\Gamma(id) = \lfloor \tau^e \rfloor
225 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^r
226 }
227 \end{equation}
228
229 \begin{equation} \label{eq:inferAppN}
230 \infer[AppN]
231 {\Gamma \vdash \textit{id}(e^*).\textit{fs}^* \Rightarrow
232 ((\Gamma^\star, \star, \tau^r,
233 \textit{id}({e^*}').\textit{fs}^*)}
234 {\Gamma(id) = \lfloor \tau^e \rfloor
235 &\Gamma \vdash e^* \Rightarrow
236 (\Gamma^{\star_1}, \star_1, \tau^*, {e^*}')
237 &(\texttt{fold } (\rightarrow) \texttt{ } \alpha \texttt{ } \tau^*)
238 \unif \tau^e = \star_2
239 &\star = \star_2 \cdot \star_1
240 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^e
241 }
242 \end{equation}
243
244 \paragraph{Literals}
245 The types of literals -Ints, Bools, Chars and empty lists- are trivially
246 determined and are shown below.
247
248 \begin{equation}
249 \infer[Int]{\Gamma \vdash n \Rightarrow (\Gamma, \star_0, Int, n)}{}
250 \end{equation}
251
252 \begin{equation}
253 \infer[Bool]{\Gamma \vdash b \Rightarrow (\Gamma, \star_0, Bool, b)}{}
254 \end{equation}
255
256 \begin{equation}
257 \infer[Char]{\Gamma \vdash c \Rightarrow (\Gamma, \star_0, Char, c)}{}
258 \end{equation}
259
260 \begin{equation}
261 \infer[\emptyset]
262 {\Gamma \vdash [] \Rightarrow (\Gamma, \star_0, \alpha, [])}{}
263 \end{equation}