deliverables/report/sem.tex

   1 \section{Semantic analysis}\label{sec:sem}
   2 %The grammar used to parse
   3 %SPL
   4 %, if it is di erent from the given grammar.
   5 %{
   6 %The scoping rules you use and check in your compiler.
   7 %{
   8 %The typing rules used for
   9 %SPL
  10 %.
  11 %{
  12 %A brief guide telling what each of the examples tests (see next point).
  13 %{
  14 %A concise but precise description of how the work was divided amongst the
  15 %members of
  16 %the team.
  17
  18 \newcommand{\unif}{{\scriptscriptstyle\cup}}
  19
  20 The semantic analysis phase asses if a -grammatically correct- program is also
  21 semantically correct and decorates the program with extra information it
  22 finds during this checking. The input of the semantic analysis is an \AST{} of
  23 the
  24 parsed program, its output is either a semantically correct and completely typed
  25 \AST{} plus the environment in which it was typed (for debug purposes), or it is
  26 an error describing the problem with the \AST{}.
  27
  28 During this analysis \SPLC{} checks four properties of the program:
  29 \begin{enumerate}
  30         \item That no functions are declared with the same name
  31         \item That \tt{main} is of type \tt{(-> Void)}
  32         \item That the program includes a main function
  33         \item AST transformation for lambdas
  34         \item That the program is correctly typed
  35 \end{enumerate}
  36 The first three steps are simple sanity checks which are coded in slightly over
  37 two dozen lines of \Clean{} code. The last step is a Hindley-Milner type
  38 inference
  39 algorithm, which is a magnitude more complex. This last step also decorates the
  40 \AST{} with inferred type information.
  41
  42 \subsection{AST transformation for lambdas}
  43
  44 % \subsection{Sanity checks}
  45 % The sanity checks are defined as simple recursive functions over the function
  46 % declarations in the \AST. We will very shortly describe them here.
  47
  48 % \begin{description}
  49 %       \item [No duplicate functions] This function checks that the program does
  50 %               not contain top level functions sharing the same name. Formally:
  51 %               $ \forall f \in \textrm{functions} \ldotp \neg
  52 %                       \exists g \in \textrm{functions} \setminus \{f\} \ldotp
  53 %                       f.\textrm{name} = g.\textrm{name}$
  54 %       \item [\tt{main} should have type \tt{->Void}] Our semantics of \SPL{}
  55 %               require the main function to take no arguments and to not return a
  56 %               value. Formally: $\forall f \in \textrm{functions} \ldotp
  57 %                       f.\textrm{name} = \textrm{'main'} \Rightarrow
  58 %                       \textit{typeof}\textrm{(main)} = \rightarrow\textrm{Void} $
  59 %       \item [The program should have a main function] $\exists f \in
  60 %               \textrm{functions} \ldotp f.\textrm{name} = \textrm{'main'}$
  61 % \end{description}
  62
  63 \subsection{Type inference}
  64 \SPLC{} features a Hindley-Milner type inference algorithm based on the
  65 algorithm presented in the slides. It supports full polymorphism and can
  66 infer omitted types. It does not support multiple recursion and all
  67 functions and identifiers need to be declared before they can be used. In
  68 essence the program is typed as nested \tt{let .. in ..} statements.
  69
  70 \subsubsection{Types}
  71 Listing~\ref{lst:types} shows the ADT representing types in \SPLC{}. Most of
  72 these are self explanatory, however typing of functions requires some special
  73 attention. \SPLC{}s extension is higher order functions, this is described
  74 in more detail in Section~\ref{sec:ext}. These require a change to the
  75 type system, which we will already describe here.
  76
  77 Higher order functions require that the type system can distinguish if an
  78 expression represents a type $a$ or a functions which takes no input and returns
  79 a value of type $a$ (we write: $\rightarrow a$). The latter is represented by
  80 the \CI{FuncType}.
  81
  82 Another important difference with standard \SPL{} is that \SPLC{} supports
  83 currying of functions, functions are not typed with a list of argument types
  84 and a return type, but with just a single argument and return type. This return
  85 type then in itself can be a function.
  86
  87 \begin{lstlisting}[
  88         label={lst:types},
  89         caption={SPLC types},
  90         language=Clean]
  91 :: Type
  92         = TupleType (Type, Type)        -- (t1, t2)
  93         | ListType type                         -- [t]
  94         | IdType TVar                           -- polymorphic type
  95         | IntType
  96         | BoolType
  97         | CharType
  98     | VoidType
  99         | FuncType Type                         -- (-> t)
 100     | (\->>) infixl 7 Type Type         -- (t1 -> t2)
 101 \end{lstlisting}
 102
 103 \subsubsection{Environment}
 104 As all stages of \SPLC{}, the type inference is programmed in Clean. It takes
 105 an \AST{} and produces either a fully typed \AST{}, or an error if the provided
 106 \AST{} contains type errors. The Program is typed in an environment $\Gamma :
 107 id \rightarrow \Sigma$ which maps identifiers of functions or variables to type
 108 schemes.
 109
 110 In \SPLC{} type inference and unification are done in one pass. They are unified
 111 in the \tt{Typing} monad, which is
 112 defined as \CI{Typing a :== StateT (Gamma, [TVar]) (Either SemError) a}. The
 113 carried list of \tt{TVar} is an infinite list of fresh \tt{TVars}, which are
 114 just an alias of \tt{String}.
 115
 116 \subsubsection{Substitution and Unification}
 117 In \SPLC{} substitutions are defined as map from type variables to concrete
 118 types. Composing of substitutions is then simply taking the union of two
 119 substitutions and substituting the first substitution over the types in the
 120 second substitution. We write $\star$ for a substitution and $\star_0$ for the
 121 empty substitution.
 122
 123 Unification ($\unif$) takes two types and either provides a substitution which
 124 could
 125 unify them, or an error of the types are not unifiable. The interesting case
 126 of unification is when trying to unify an type variable with a more
 127 specialised type. Equation~\ref{eq:unifyId} shows this. All other cases
 128 are trivially promoting unification to the contained types, or checking directly
 129 if the types to be unified are equal.
 130
 131 \begin{equation} \label{eq:unifyId}
 132         \infer{\alpha \unif \tau = [tv \mapsto \tau]}
 133                         {\alpha \notin ftv(\tau)}
 134 \end{equation}
 135
 136 \subsubsection{Inferring}
 137 Inferring the type of expression or statement yields a 4-tuple of a new Gamma,
 138 Substitution,
 139 a Type and an updated version of that element. In the case of expressions the
 140 yielded type is the actual type of the expression. In the case of statements the
 141 yielded type is the type returned by a return statement if one is present, or
 142 Void otherwise.
 143
 144 In the Clean implementation the new Gamma in the result of an inference is
 145 carried through the Typing Monad.
 146
 147 Below we will show the inference rules for expressions and statements. In these
 148 lowercase Greek letters are always fresh type variables.
 149
 150 Section~\ref{sec:infRules} shows all typing inference rules. Below we will show
 151 a few to express the basics and show some interesting cases.
 152
 153 \paragraph{Op2 expressions}
 154 Equation~\ref{eq:inferOp2} shows the inference rule for op2 expressions.
 155 \textit{op2type} is a function which takes an operator and returns the
 156 corresponding function type, i.e. $\textit{op2type}(\&\&) = Bool \rightarrow
 157 Bool \rightarrow Bool$.
 158
 159 \begin{equation} \label{eq:inferOp2}
 160         \infer[Op2]
 161                 {\Gamma \vdash e_1 \oplus e_2 \Rightarrow
 162                         (\Gamma^{\star}, \star, \alpha^\star, (e_1' \oplus e_2')^\star)}
 163                 {\Gamma \vdash e_1 \Rightarrow (\Gamma^{\star_1},\star_1, \tau_1, e_1')
 164                 &\Gamma^{\star_1} \vdash e_2 \Rightarrow (\Gamma^{\star_2},
 165                         \star_2, \tau_2, e_2')
 166                 &(\tau_1 \rightarrow \tau_2 \rightarrow \alpha) \unif
 167                         \textit{op2type}(\oplus) = \star_3
 168                 &\star = \star_3 \cdot \star_2 \cdot \star_1
 169                 }
 170 \end{equation}
 171
 172 \paragraph{Op1 expressions}
 173 \begin{equation} \label{eq:inferOp1}
 174         \infer[Op1]
 175                 {\Gamma \vdash \ominus e \Rightarrow
 176                         (\Gamma^{\star}, \star, \alpha^\star, (\ominus e')^\star)}
 177                 {\Gamma \vdash e \Rightarrow (\star_1, \tau, e_1')
 178                 &(\tau \rightarrow \alpha) \unif
 179                         \textit{op1type}(\ominus) = \star_2
 180                 &\star = \star_2 \cdot \star_1
 181                 }
 182 \end{equation}
 183
 184 \paragraph{Tuples}
 185 \begin{equation} \label{eq:tuples}
 186         \infer[Tuple]
 187                 {\Gamma \vdash (e_1, e_2) \Rightarrow
 188                         (\Gamma^{\star}, \star, (\tau_1, \tau_2)^\star,
 189                                 (e_1', e_2')^\star)}
 190                 {\Gamma \vdash e_1 \Rightarrow (\Gamma^{\star_1},\star_1, \tau_1, e_1')
 191                 &\Gamma^{\star_1} \vdash e_2 \Rightarrow (\Gamma^{\star_2},
 192                         \star_2, \tau_2, e_2')
 193                 &\star = \star_3 \cdot \star_2 \cdot \star_1
 194                 }
 195 \end{equation}
 196
 197 \paragraph{Variables}
 198 Inferring variable requires a bit of care. First of all the variable has to
 199 be present in $\Gamma$, secondly field selectors need to be respected. To apply
 200 the field selectors on a type the function \tt{apfs} is used, which maps a
 201 type and a field selector to a new type: $\texttt{apfs} : \tau \times
 202 \textit{fieldselector} \rightarrow \tau$.
 203
 204 Note that the inference rule only checks if a field selector can be applied on
 205 a certain type, and does not use the field selector to infer the type. This means that a program: \tt{f(x)\{ return x.hd; \}} can not be typed, and instead
 206 \tt{f} must be explicitly typed as \tt{:: [a] -> a} (or something more
 207 specialised) by the programmer. This could be improved by
 208 changing the \tt{apfs} function to infer for type variables.
 209
 210 \begin{equation}
 211         \infer[Var]
 212                 {\Gamma \vdash \textit{id}.\textit{fs}^* \Rightarrow
 213                         (\Gamma, \star_0, \tau', \textit{id}.\textit{fs}^*)}
 214                 {\Gamma(id) = \lfloor \tau \rfloor
 215                 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau'
 216                 }
 217 \end{equation}
 218
 219 \paragraph{Functions}
 220 When inferring two functions we distinguish two rules: one for a function which
 221 is applied without any arguments (Equation~\ref{eq:inferApp0}) and one for when
 222 a function is applied with arguments (Equation~\ref{eq:inferAppN}). Inferring a
 223 list of expressions is done by folding the infer function over the expressions
 224 and accumulating the resulting types, substitutions, etc.
 225
 226 \begin{equation} \label{eq:inferApp0}
 227         \infer[App 0]
 228                 {\Gamma \vdash \textit{id}().\textit{fs}^* \Rightarrow
 229                         ((\Gamma^\star, \star, \tau^r,
 230                                 \textit{id}().\textit{fs}^*)}
 231                 {\Gamma(id) = \lfloor \tau^e \rfloor
 232                 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^r
 233                 }
 234 \end{equation}
 235
 236 \begin{equation} \label{eq:inferAppN}
 237         \infer[AppN]
 238                 {\Gamma \vdash \textit{id}(e^*).\textit{fs}^* \Rightarrow
 239                         ((\Gamma^\star, \star, \tau^r,
 240                                 \textit{id}({e^*}').\textit{fs}^*)}
 241                 {\Gamma(id) = \lfloor \tau^e \rfloor
 242                 &\Gamma \vdash e^* \Rightarrow
 243                         (\Gamma^{\star_1}, \star_1, \tau^*, {e^*}')
 244                 &(\texttt{fold } (\rightarrow) \texttt{ } \alpha \texttt{ } \tau^*)
 245                         \unif \tau^e = \star_2
 246                 &\star = \star_2 \cdot \star_1
 247                 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^e
 248                 }
 249 \end{equation}
 250
 251 \paragraph{Literals}
 252 The types of literals -Ints, Bools, Chars and empty lists- are trivially
 253 determined and are shown below.
 254
 255 \begin{equation}
 256         \infer[Int]{\Gamma \vdash n \Rightarrow (\Gamma, \star_0, Int, n)}{}
 257 \end{equation}
 258
 259 \begin{equation}
 260         \infer[Bool]{\Gamma \vdash b \Rightarrow (\Gamma, \star_0, Bool, b)}{}
 261 \end{equation}
 262
 263 \begin{equation}
 264         \infer[Char]{\Gamma \vdash c \Rightarrow (\Gamma, \star_0, Char, c)}{}
 265 \end{equation}
 266
 267 \begin{equation}
 268         \infer[\emptyset]
 269                 {\Gamma \vdash [] \Rightarrow (\Gamma, \star_0, \alpha, [])}{}
 270 \end{equation}
 271
 272 \paragraph{If statements}
 273 balbal de if en else mzelfed type, jwz statements hebben he tupen wat ze
 274 returnen.
 275 \begin{equation}
 276         \infer[If]
 277                 {\Gamma \vdash \underline{\textrm{if }} e
 278                         \underline{\textrm{ then }} s_1
 279                         \underline{\textrm{ else }} s_2
 280                         \Rightarrow
 281                         (\Gamma^\star, \star, \tau_e^\star, \underline{\textrm{if }} e'
 282                         \underline{\textrm{ then }} s_1'
 283                         \underline{\textrm{ else }} s_2')
 284                 }
 285                 {\Gamma \vdash e \Rightarrow (\Gamma^{\star_1}, \star_1, \tau_1, e')
 286                 &\tau_1 \unif \textrm{Bool} = \star_2
 287                 &\Gamma^{\star_2\cdot \star_1} \vdash s_1 \Rightarrow
 288                         (\Gamma^{\star_3}, \star_3, \tau_t, s_1')
 289                 &\Gamma^{\star_3} \vdash s_2 \Rightarrow
 290                         (\Gamma^{\star_4}, \star_4, \tau_e, s_2')
 291                 &\tau_t \unif \tau_e = \star_5
 292                 }
 293 \end{equation}
 294
 295 \paragraph{While statements}
 296 \begin{equation}
 297         \infer[While]
 298                 {\Gamma \vdash \underline{\textrm{while }} e \textrm{ } s
 299                         \Rightarrow
 300                         (\Gamma^\star, \star, \tau_s^\star,
 301                                 \underline{\textrm{while }} e' \textrm{ } s_1')
 302                 }
 303                 {\Gamma \vdash e \Rightarrow (\Gamma^{\star_1}, \star_1, \tau_1, e')
 304                 &\tau_1 \unif \textrm{Bool} = \star_2
 305                 &\Gamma^{\star_2\cdot \star_1} \vdash s \Rightarrow
 306                         (\Gamma^{\star_3}, \star_3, \tau_t, s')
 307                 &\star = \star_3 \cdot \star_2 \cdot \star_1
 308                 }
 309 \end{equation}
 310
 311 \paragraph{Assignment statement}
 312 blabla unapfs, en extenden van Gamma
 313 \begin{equation}
 314         \infer[Ass]
 315                 {\Gamma \vdash \textit{id.fs}^* \underline{=} e \Rightarrow
 316                         (\Gamma^\star.k:\tau_r^\star, \star, \textrm{Void},
 317                                 \textit{id.fs}^* \underline{=} e')
 318                 }
 319                 {\Gamma(\textit{id}) = \lfloor \tau_e \rfloor
 320                 &\Gamma \vdash e \Rightarrow (\Gamma^{\star_1}, \star_1, \tau_g, e')
 321                 &\texttt{fold unapfs } \tau_g \textit{reverse fs} = \tau^r
 322                 &\tau_e \unif \tau_g = \star_2
 323                 &\star = \star_2 \cdot \star_1
 324                 }
 325 \end{equation}
 326
 327 \paragraph{Function Statements}
 328 Blabl, praktisch gelijk aan function call, behalve dat ze altijd Void als
 329 type hebben.
 330
 331 \begin{equation} \label{eq:inferStmtApp0}
 332         \infer[Stmt App 0]
 333                 {\Gamma \vdash \textit{id}().\textit{fs}^* \Rightarrow
 334                         ((\Gamma^\star, \star, \textrm{Void},
 335                                 \textit{id}().\textit{fs}^*)}
 336                 {\Gamma(id) = \lfloor \tau^e \rfloor
 337                 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^r
 338                 }
 339 \end{equation}
 340
 341 \begin{equation} \label{eq:inferAppN}
 342         \infer[Stmt AppN]
 343                 {\Gamma \vdash \textit{id}(e^*).\textit{fs}^* \Rightarrow
 344                         ((\Gamma^\star, \star, \textrm{Void},
 345                                 \textit{id}({e^*}').\textit{fs}^*)}
 346                 {\Gamma(id) = \lfloor \tau^e \rfloor
 347                 &\Gamma \vdash e^* \Rightarrow
 348                         (\Gamma^{\star_1}, \star_1, \tau^*, {e^*}')
 349                 &(\texttt{fold } (\rightarrow) \texttt{ } \alpha \texttt{ } \tau^*)
 350                         \unif \tau^e = \star_2
 351                 &\star = \star_2 \cdot \star_1
 352                 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^e
 353                 }
 354 \end{equation}
 355
 356 \paragraph{Return statements}
 357 blabla, verschil tussen wel en niet iets returnen
 358
 359 \begin{equation}
 360         \infer[Return \emptyset]
 361                 {\Gamma \vdash \underline{\textrm{return}} \Rightarrow
 362                         (\Gamma, \star_0, \textrm{Void}, \underline{\textrm{return}})
 363                 }
 364                 {}
 365 \end{equation}
 366
 367 \begin{equation}
 368         \infer[Return]
 369                 {\Gamma \vdash \underline{\textrm{return }} e \Rightarrow
 370                         (\Gamma^\star, \star, \tau, \underline{\textrm{return }} e')
 371                 }
 372                 {\Gamma \vdash e \Rightarrow (\Gamma^\star, \star, \tau, e')}
 373 \end{equation}