deliverables/report/sem.tex

   1 \section{Semantic analysis}\label{sec:sem}
   2 The semantic analysis phase asses if a -grammatically correct- program is also
   3 semantically correct and decorates the program with extra information it finds
   4 during this checking. The input of the semantic analysis is an \AST{} of the
   5 parsed program, its output is either a semantically correct and completely
   6 typed \AST{} plus the environment in which it was typed (for debug purposes),
   7 or it is an error describing the problem with the \AST{}.
   8
   9 During this analysis \SPLC{} applies one \AST{} transformation and checks four
  10 properties of the program:
  11 \begin{enumerate}
  12         \item That no functions are declared with the same name
  13         \item That \tt{main} is of type \tt{(-> Void)}
  14         \item That the program includes a main function
  15         \item That the program is correctly typed
  16 \end{enumerate}
  17 The first three steps are simple sanity checks which are coded in slightly over
  18 two dozen lines of \Clean{} code. The last step is a \emph{Hindley-Milner} type
  19 inference algorithm, which is a magnitude more complex. This last step also
  20 decorates the \AST{} with inferred type information.
  21
  22 \subsection{\AST{} transformation for lambdas}
  23 Before checking the semantic correctness of the program one small \AST{}
  24 transformations needs to be done. Currently this transformation is build into
  25 the semantic analysis phase, however, it would be more in place as a separate
  26 compiler phase, and if more transformations are needed in the future these will
  27 be placed in a separate phase.
  28
  29 The implementation of lambdas in \SPLC{} is implemented through a clever trick.
  30 All \\
  31 \tt{LambdaExpressions} in the \AST{} are lifted to actual functions. This
  32 transformation is illustrated in Listing~\ref{lst:lambdaLift}. This
  33 has the advantage that, with our support for higher order functions, lambda
  34 functions do not require any special attention after this transformation. One
  35 downside is that in the current implementation lambdas can not access variables
  36 in the scope in which they are declared. This could be solved by adding all
  37 free variables in the lambda expression as parameters to the lifted function.
  38
  39 \begin{lstlisting}[label={lst:lambdaLift},caption={\SPLC{} types},language=SPLCode]
  40 main() {
  41         var x = \a b -> a + b;
  42 }
  43
  44 // is transformed to:
  45 2lambda_12(a, b) {
  46         return a + b;
  47 }
  48 main() {
  49         var x = 2lambda_12;
  50 }
  51 \end{lstlisting}
  52
  53 \subsection{\AST{} transformation for printing}
  54 During type inference and checking it is clear what the type is of all the
  55 functions and thus we can specify the print functions since it is overloaded.
  56 At time of compilation the overloaded print function is changed to the
  57 corresponding specific function. For example \SI{print(True)} is changed to
  58 \SI{1printbool(True)}. Printing is the only overloaded functionality built-in.
  59 While comparison and equality are overloaded too they generate the same
  60 assembly code for different types and thus need not to be specified.
  61
  62 \subsection{Type inference}
  63 \SPLC{} features a \emph{Hindley-Milner} type inference algorithm based on the
  64 algorithm presented in the slides. It supports full polymorphism and can infer
  65 omitted types. It does not support multiple recursion and all functions and
  66 identifiers need to be declared before they can be used. In essence the program
  67 is typed as nested \tt{let .. in ..} statements.
  68
  69 \subsubsection{Types}
  70 Listing~\ref{lst:types} shows the \ADT{} representing types in \SPLC{}. Most of
  71 these are self explanatory, however typing of functions requires some special
  72 attention. \SPLC{}s extension of higher order functions requires that the type
  73 system can distinguish if an
  74 expression represents a type $a$ or a functions which takes no input and returns
  75 a value of type $a$ (we write: $\rightarrow a$). The latter is represented by
  76 the \CI{FuncType}.
  77
  78 Another important difference with standard \SPL{} is that \SPLC{} supports
  79 currying of functions, functions are not typed with a list of argument types
  80 and a return type, but with just a single argument and return type. This return
  81 type then in itself can be a function.
  82
  83 \begin{lstlisting}[
  84         label={lst:types},
  85         caption={SPLC types},
  86         language=Clean]
  87 :: Type
  88         = TupleType (Type, Type)        -- (t1, t2)
  89         | ListType type                         -- [t]
  90         | IdType TVar                           -- type variable
  91         | IntType
  92         | BoolType
  93         | CharType
  94     | VoidType
  95         | FuncType Type                         -- (-> t)
  96     | (->>) infixl 7 Type Type  -- (t1 -> t2)
  97 \end{lstlisting}
  98
  99 \subsubsection{Environment}
 100 As all stages of \SPLC{}, the type inference is programmed in Clean. It takes
 101 an \AST{} and produces either a fully typed \AST{}, or an error if the provided
 102 \AST{} contains type errors. The Program is typed in an environment $\Gamma :
 103 id \rightarrow \Sigma$ which maps identifiers of functions or variables to type
 104 schemes.
 105
 106 In \SPLC{} type inference and unification are done in one pass. They are
 107 unified in the \tt{Typing} monad, which is defined as \CI{Typing a :== StateT
 108 (Gamma, [TVar]) (Either SemError) a}. The carried list of \tt{TVar} is an
 109 infinite list of fresh \tt{TVars}, which are just an alias of \tt{String}.
 110
 111 \subsubsection{Substitution and Unification}
 112 In \SPLC{} substitutions are defined as map from type variables to concrete
 113 types. Composing of substitutions is then simply taking the union of two
 114 substitutions and substituting the first substitution over the types in the
 115 second substitution. We write $\star$ for a substitution and $\star_0$ for the
 116 empty substitution.
 117
 118 Unification ($\unif$) takes two types and either provides a substitution which
 119 could unify them, or an error of the types are not unifiable. The interesting
 120 case of unification is when trying to unify an type variable with a more
 121 specialised type. Equation~\ref{eq:unifyId} shows this. All other cases are
 122 trivially promoting unification to the contained types, or checking directly if
 123 the types to be unified are equal.
 124
 125 \begin{equation} \label{eq:unifyId}
 126         \infer{\alpha \unif \tau = [tv \mapsto \tau]}
 127                         {\alpha \notin ftv(\tau)}
 128 \end{equation}
 129
 130 \subsubsection{Inferring}
 131 Inferring the type of expression or statement yields a 4-tuple of a new Gamma,
 132 Substitution, a Type and an updated version of that element. In the case of
 133 expressions the yielded type is the actual type of the expression. In the case
 134 of statements the yielded type is the type returned by a return statement if
 135 one is present, or Void otherwise.
 136
 137 In the Clean implementation the new Gamma in the result of an inference is
 138 carried through the Typing Monad.
 139
 140 Below we will show the inference rules for expressions and statements. In these
 141 lowercase Greek letters are always fresh type variables.
 142
 143 Section~\ref{sec:infRules} shows all typing inference rules. Below we will show
 144 a few to express the basics and show some interesting cases.
 145
 146 \paragraph{Variables}
 147 Inferring variable requires a bit of care. First of all the variable has to
 148 be present in $\Gamma$, secondly field selectors need to be respected. To apply
 149 the field selectors on a type the function \tt{apfs} is used, which maps a
 150 type and a field selector to a new type: $\texttt{apfs} : \tau \times
 151 \textit{fieldselector} \rightarrow \tau$.
 152
 153 Note that the inference rule only checks if a field selector can be applied on
 154 a certain type, and does not use the field selector to infer the type. This
 155 means that a program: \tt{f(x)\{ return x.hd; \}} can not be typed, and instead
 156 \tt{f} must be explicitly typed as \tt{:: [a] -> a} (or something more
 157 specialised) by the programmer. This could be improved by changing the
 158 \tt{apfs} function to infer for type variables.
 159
 160 \begin{equation}
 161         \infer[Var]
 162                 {\Gamma \vdash \textit{id}.\textit{fs}^* \Rightarrow
 163                         (\Gamma, \star_0, \tau', \textit{id}.\textit{fs}^*)}
 164                 {\Gamma(id) = \lfloor \tau \rfloor
 165                 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau'
 166                 }
 167 \end{equation}
 168
 169 \paragraph{Functions}
 170 When inferring two functions we distinguish two rules: one for a function which
 171 is applied without any arguments (Equation~\ref{eq:inferApp0}) and one for when
 172 a function is applied with arguments (Equation~\ref{eq:inferAppN}). Inferring a
 173 list of expressions is done by folding the infer function over the expressions
 174 and accumulating the resulting types, substitutions, etc.
 175
 176 \begin{equation} \label{eq:inferApp0}
 177         \infer[App 0]
 178                 {\Gamma \vdash \textit{id}().\textit{fs}^* \Rightarrow
 179                         ((\Gamma^\star, \star, \tau^r,
 180                                 \textit{id}().\textit{fs}^*)}
 181                 {\Gamma(id) = \lfloor \tau^e \rfloor
 182                 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^r
 183                 }
 184 \end{equation}
 185
 186 \begin{equation}\label{eq:inferAppN}
 187         \resizebox{\linewidth}{!}{%
 188                 \infer[AppN]
 189                         {\Gamma \vdash \textit{id}(e^*).\textit{fs}^* \Rightarrow
 190                                 ((\Gamma^\star, \star, \tau^r,
 191                                         \textit{id}({e^*}').\textit{fs}^*)}
 192                         {\Gamma(id) = \lfloor \tau^e \rfloor
 193                         &\Gamma \vdash e^* \Rightarrow
 194                                 (\Gamma^{\star_1}, \star_1, \tau^*, {e^*}')
 195                         &(\texttt{fold } (\rightarrow) \texttt{ } \alpha \texttt{ } \tau^*)
 196                                 \unif \tau^e = \star_2
 197                         &\star = \star_2 \cdot \star_1
 198                         &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^e
 199                 }
 200         }
 201 \end{equation}
 202
 203 \paragraph{Return statements}
 204 Statements are typed with the type they return. The base case for this is
 205 trivially a return statement. An empty return statement is of type \tt{Void}
 206 and other return statements are typed with the type or their expression.
 207
 208 When checking combinations of statements, e.g. if-then-else statements, it is
 209 checked that all branches return the same type, then the type of the
 210 combination is set to this type.
 211
 212 \begin{equation}
 213         \infer[Return \emptyset]
 214                 {\Gamma \vdash \underline{\textrm{return}} \Rightarrow
 215                         (\Gamma, \star_0, \textrm{Void}, \underline{\textrm{return}})
 216                 }
 217                 {}
 218 \end{equation}
 219
 220 \begin{equation}
 221         \infer[Return]
 222                 {\Gamma \vdash \underline{\textrm{return }} e \Rightarrow
 223                         (\Gamma^\star, \star, \tau, \underline{\textrm{return }} e')
 224                 }
 225                 {\Gamma \vdash e \Rightarrow (\Gamma^\star, \star, \tau, e')}
 226 \end{equation}
 227
 228 \paragraph{Assignment statement}
 229 When assigning a new value to a variable this variable needs to be present in
 230 Gamma and its type needs to unify with the type of the expression.
 231
 232 One thing that needs to be taken care of when assigning are the field
 233 selectors.  \tt{unapfs} is a function which takes a type and a fieldselector
 234 and returns the type of the variable needed to assign to that fieldselector,
 235 i.e. \CI{unapfs a FieldHd = ListType a}
 236 \begin{equation}
 237         \resizebox{\linewidth}{!}{%
 238                 \infer[Ass]
 239                         {\Gamma \vdash \textit{id.fs}^* \underline{=} e \Rightarrow
 240                                 (\Gamma^\star.k:\tau_r^\star, \star, \textrm{Void},
 241                                         \textit{id.fs}^* \underline{=} e')
 242                         }
 243                         {\Gamma(\textit{id}) = \lfloor \tau_e \rfloor
 244                         &\Gamma \vdash e \Rightarrow (\Gamma^{\star_1}, \star_1, \tau_g, e')
 245                         &\texttt{fold unapfs } \tau_g \textit{ reverse fs} = \tau^r
 246                         &\tau_e \unif \tau_g = \star_2
 247                         &\star = \star_2 \cdot \star_1
 248                 }
 249         }
 250 \end{equation}