deliverables/report/sem.tex

   1 \section{Semantic analysis}\label{sec:sem}
   2 %The grammar used to parse
   3 %SPL
   4 %, if it is di erent from the given grammar.
   5 %{
   6 %The scoping rules you use and check in your compiler.
   7 %{
   8 %The typing rules used for
   9 %SPL
  10 %.
  11 %{
  12 %A brief guide telling what each of the examples tests (see next point).
  13 %{
  14 %A concise but precise description of how the work was divided amongst the
  15 %members of
  16 %the team.
  17
  18 \newcommand{\unif}{{\scriptscriptstyle\cup}}
  19
  20 The semantic analysis phase asses if a -grammatically correct- program is also
  21 semantically correct and decorates the program with extra information it
  22 finds during this checking. The input of the semantic analysis is an \AST{} of
  23 the
  24 parsed program, its output is either a semantically correct and completely typed
  25 \AST{} plus the environment in which it was typed (for debug purposes), or it is
  26 an error describing the problem with the \AST{}.
  27
  28 During this analysis \SPLC{} checks four properties of the program:
  29 \begin{enumerate}
  30         \item That no functions are declared with the same name
  31         \item That \tt{main} is of type \tt{(-> Void)}
  32         \item That the program includes a main function
  33         \item That the program is correctly typed
  34 \end{enumerate}
  35 The first three steps are simple sanity checks which are coded in slightly over
  36 two dozen lines of Clean code. The last step is a Hindley Milner type inference
  37 algorithm, which is a magnitude more complex. This last step also decorates the
  38 \AST{} with inferred type information.
  39
  40 \subsection{Sanity checks}
  41 The sanity checks are defined as simple recursive functions over the function
  42 declarations in the \AST. We will very shortly describe them here.
  43
  44 \begin{description}
  45         \item [No duplicate functions] This function checks that the program does
  46                 not contain top level functions sharing the same name. Formally:
  47                 $ \forall f \in \textrm{functions} \ldotp \neg
  48                         \exists g \in \textrm{functions} \setminus \{f\} \ldotp
  49                         f.\textrm{name} = g.\textrm{name}$
  50         \item [\tt{main} should have type \tt{->Void}] Our semantics of \SPL{}
  51                 require the main function to take no arguments and to not return a
  52                 value. Formally: $\forall f \in \textrm{functions} \ldotp
  53                         f.\textrm{name} = \textrm{'main'} \Rightarrow
  54                         \textit{typeof}\textrm{(main)} = \rightarrow\textrm{Void} $
  55         \item [The program should have a main function] $\exists f \in
  56                 \textrm{functions} \ldotp f.\textrm{name} = \textrm{'main'}$
  57 \end{description}
  58
  59 \subsection{Type inference}
  60 \SPLC{} features a Hindley-Milner type inference algorithm based on the
  61 algorithm presented in the slides. It supports full polymorphism and can
  62 infer omitted types. It does not support multiple recursion and all
  63 functions and identifiers need to be declared before they can be used. In
  64 essence the program is typed as nested \tt{let .. in ..} statements.
  65
  66 \subsubsection{Types}
  67 Listing~\ref{lst:types} shows the ADT representing types in \SPLC{}. Most of
  68 these are self explanatory, however typing of functions requires some special
  69 attention. \SPLC{}s extension is Higher order functions, this is described
  70 in more detail in Section~\ref{sec:ext}. These require a change to the
  71 type system, which we will already describe here.
  72
  73 Higher order functions require that the type system can distinguish if an
  74 expression represents a type $a$ or a functions which takes no input and returns
  75 a value of type $a$ (we write: $\rightarrow a$). The latter is represented by
  76 the \CI{FuncType}.
  77
  78 Another important difference with standard \SPL{} is that \SPLC{} supports
  79 currying of functions, functions are not typed with a list of argument types
  80 and a return type, but with just a single argument and return type. This return
  81 type then in itself can be a function.
  82
  83 \begin{lstlisting}[
  84         label={lst:types},
  85         caption={SPLC types},
  86         language=clean]
  87 :: Type
  88         = TupleType (Type, Type)        -- (t1, t2)
  89         | ListType type                         -- [t]
  90         | IdType TVar                           -- polymorphic type
  91         | IntType
  92         | BoolType
  93         | CharType
  94     | VoidType
  95         | FuncType Type                         -- (-> t)
  96     | (->>) infixl 7 Type Type  -- (t1 -> t2)
  97 \end{lstlisting}
  98
  99 \subsubsection{Environment}
 100 As all stages of \SPLC{}, the type inference is programmed in Clean. It takes
 101 an \AST{} and produces either a fully typed \AST{}, or an error if the provided
 102 \AST{} contains type errors. The Program is typed in an environment $\Gamma :
 103 id \rightarrow \Sigma$ which maps identifiers of functions or variables to type
 104 schemes.
 105
 106 In \SPLC{} type inference and unification are done in one pass. They are unified
 107 in the \tt{Typing} monad, which is
 108 defined as \CI{Typing a :== StateT (Gamma, [TVar]) (Either SemError) a}. The
 109 carried list of \tt{TVar} is an infinite list of fresh \tt{TVars}, which are
 110 just an alias of \tt{String}.
 111
 112 \subsubsection{Substitution and Unification}
 113 In \SPLC{} substitutions are defined as map from type variables to concrete
 114 types. Composing of substitutions is then simply taking the union of two
 115 substitutions and substituting the first substitution over the types in the
 116 second substitution. We write $\star$ for a substitution and $\star_0$ for the
 117 empty substitution.
 118
 119 Unification ($\unif$) takes two types and either provides a substitution which
 120 could
 121 unify them, or an error of the types are not unifiable. The interesting case
 122 of unification is when trying to unify an type variable with a more
 123 specialised type. Equation~\ref{eq:unifyId} shows this. All other cases
 124 are trivially promoting unification to the contained types, or checking directly
 125 if the types to be unified are equal.
 126
 127 \begin{equation} \label{eq:unifyId}
 128         \infer{\alpha \unif \tau = [tv \mapsto \tau]}
 129                         {\alpha \notin ftv(\tau)}
 130 \end{equation}
 131
 132 \subsubsection{Inferring}
 133 Inferring the type of expression or statement yields a 4-tuple of a new Gamma,
 134 Substitution,
 135 a Type and an updated version of that element. In the case of expressions the
 136 yielded type is the actual type of the expression. In the case of statements the
 137 yielded type is the type returned by a return statement if one is present, or
 138 Void otherwise.
 139
 140 In the Clean implementation the new Gamma in the result of an inference is
 141 hidden in the Typing Monad.
 142
 143 Below we will show the inference rules for expressions and statements. In these
 144 lowercase Greek letters are always fresh type variables.
 145
 146 \paragraph{Op2 expressions}
 147 Equation~\ref{eq:inferOp2} shows the inference rule for op2 expressions.
 148 \textit{op2type} is a function which takes an operator and returns the
 149 corresponding function type, i.e. $\textit{op2type}(\&\&) = Bool \rightarrow
 150 Bool \rightarrow Bool$.
 151
 152 \begin{equation} \label{eq:inferOp2}
 153         \infer[Op2]
 154                 {\Gamma \vdash e_1 \oplus e_2 \Rightarrow
 155                         (\Gamma^{\star}, \star, \alpha^\star, (e_1' \oplus e_2')^\star)}
 156                 {\Gamma \vdash e_1 \Rightarrow (\Gamma^{\star_1},\star_1, \tau_1, e_1')
 157                 &\Gamma^{\star_1} \vdash e_2 \Rightarrow (\Gamma^{\star_2},
 158                         \star_2, \tau_2, e_2')
 159                 &(\tau_1 \rightarrow \tau_2 \rightarrow \alpha) \unif
 160                         \textit{op2type}(\oplus) = \star_3
 161                 &\star = \star_3 \cdot \star_2 \cdot \star_1
 162                 }
 163 \end{equation}
 164
 165 \paragraph{Op1 expressions}
 166 \begin{equation} \label{eq:inferOp1}
 167         \infer[Op1]
 168                 {\Gamma \vdash \ominus e \Rightarrow
 169                         (\Gamma^{\star}, \star, \alpha^\star, (\ominus e')^\star)}
 170                 {\Gamma \vdash e \Rightarrow (\star_1, \tau, e_1')
 171                 &(\tau \rightarrow \alpha) \unif
 172                         \textit{op1type}(\ominus) = \star_2
 173                 &\star = \star_2 \cdot \star_1
 174                 }
 175 \end{equation}
 176
 177 \paragraph{Tuples}
 178 \begin{equation} \label{eq:tuples}
 179         \infer[Tuple]
 180                 {\Gamma \vdash (e_1, e_2) \Rightarrow
 181                         (\Gamma^{\star}, \star, (\tau_1, \tau_2)^\star,
 182                                 (e_1', e_2')^\star)}
 183                 {\Gamma \vdash e_1 \Rightarrow (\Gamma^{\star_1},\star_1, \tau_1, e_1')
 184                 &\Gamma^{\star_1} \vdash e_2 \Rightarrow (\Gamma^{\star_2},
 185                         \star_2, \tau_2, e_2')
 186                 &\star = \star_3 \cdot \star_2 \cdot \star_1
 187                 }
 188 \end{equation}
 189
 190 \paragraph{Variables}
 191 Inferring variable requires a bit of care. First of all the variable has to
 192 be present in $\Gamma$, secondly field selectors need to be respected. To apply
 193 the field selectors on a type the function \tt{apfs} is used, which maps a
 194 type and a field selector to a new type: $\texttt{apfs} : \tau \times
 195 \textit{fieldselector} \rightarrow \tau$.
 196
 197 Note that the inference rule only checks if a field selector can be applied on
 198 a certain type, and does not use the field selector to infer the type. This means that a program: \tt{f(x)\{ return x.hd; \}} can not be typed, and instead
 199 \tt{f} must be explicitly typed as \tt{:: [a] -> a} (or something more
 200 specialised) by the programmer. This could be improved by
 201 changing the \tt{apfs} function to infer for type variables.
 202
 203 \begin{equation}
 204         \infer[Var]
 205                 {\Gamma \vdash \textit{id}.\textit{fs}^* \Rightarrow
 206                         (\Gamma, \star_0, \tau', \textit{id}.\textit{fs}^*)}
 207                 {\Gamma(id) = \lfloor \tau \rfloor
 208                 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau'
 209                 }
 210 \end{equation}
 211
 212 \paragraph{Functions}
 213 When inferring two functions we distinguish two rules: one for a function which
 214 is applied without any arguments (Equation~\ref{eq:inferApp0}) and one for when
 215 a function is applied with arguments (Equation~\ref{eq:inferAppN}). Inferring a
 216 list of expressions is done by folding the infer function over the expressions
 217 and accumulating the resulting types, substitutions, etc.
 218
 219 \begin{equation} \label{eq:inferApp0}
 220         \infer[App 0]
 221                 {\Gamma \vdash \textit{id}().\textit{fs}^* \Rightarrow
 222                         ((\Gamma^\star, \star, \tau^r,
 223                                 \textit{id}().\textit{fs}^*)}
 224                 {\Gamma(id) = \lfloor \tau^e \rfloor
 225                 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^r
 226                 }
 227 \end{equation}
 228
 229 \begin{equation} \label{eq:inferAppN}
 230         \infer[AppN]
 231                 {\Gamma \vdash \textit{id}(e^*).\textit{fs}^* \Rightarrow
 232                         ((\Gamma^\star, \star, \tau^r,
 233                                 \textit{id}({e^*}').\textit{fs}^*)}
 234                 {\Gamma(id) = \lfloor \tau^e \rfloor
 235                 &\Gamma \vdash e^* \Rightarrow
 236                         (\Gamma^{\star_1}, \star_1, \tau^*, {e^*}')
 237                 &(\texttt{fold } (\rightarrow) \texttt{ } \alpha \texttt{ } \tau^*)
 238                         \unif \tau^e = \star_2
 239                 &\star = \star_2 \cdot \star_1
 240                 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^e
 241                 }
 242 \end{equation}
 243
 244 \paragraph{Literals}
 245 The types of literals -Ints, Bools, Chars and empty lists- are trivially
 246 determined and are shown below.
 247
 248 \begin{equation}
 249         \infer[Int]{\Gamma \vdash n \Rightarrow (\Gamma, \star_0, Int, n)}{}
 250 \end{equation}
 251
 252 \begin{equation}
 253         \infer[Bool]{\Gamma \vdash b \Rightarrow (\Gamma, \star_0, Bool, b)}{}
 254 \end{equation}
 255
 256 \begin{equation}
 257         \infer[Char]{\Gamma \vdash c \Rightarrow (\Gamma, \star_0, Char, c)}{}
 258 \end{equation}
 259
 260 \begin{equation}
 261         \infer[\emptyset]
 262                 {\Gamma \vdash [] \Rightarrow (\Gamma, \star_0, \alpha, [])}{}
 263 \end{equation}