deliverables/report/sem.tex

   1 \section{Semantic analysis}\label{sec:sem}
   2 %The grammar used to parse
   3 %SPL
   4 %, if it is di erent from the given grammar.
   5 %{
   6 %The scoping rules you use and check in your compiler.
   7 %{
   8 %The typing rules used for
   9 %SPL
  10 %.
  11 %{
  12 %A brief guide telling what each of the examples tests (see next point).
  13 %{
  14 %A concise but precise description of how the work was divided amongst the
  15 %members of
  16 %the team.
  17
  18 \newcommand{\unif}{{\scriptscriptstyle\cup}}
  19 %\subsubsection{Anonymous functions}
  20 %When the programmers wants to use small functions that consist of one line to
  21 %for example use the operators in a functional way the programmer has to make a
  22 %named function for every operator. In some cases it is easy to do this inline
  23 %using anonymous functions. A small syntactic sugar has been added that will
  24 %inline anonymous functions to non-anonymous functions during the parsing phase.
  25 %The following snippet is an example of a filter on even values before and after
  26 %transformation.
  27 %
  28 %\begin{SPLCode}
  29 %//Before transformation
  30 %main(){
  31 %       var a = filter(\x->x % 2 == 0, 1 : 2 : 3 : 4 : 5 : []);
  32 %}
  33 %
  34 %//After transformation
  35 %1lambda_23(x){
  36 %       return x % 2 == 0;
  37 %}
  38 %
  39 %main(){
  40 %       var a = filter(1lambda_23, 1 : 2 : 3 : 4 : 5 : []);
  41 %}
  42 %\end{SPLCode}
  43
  44 The semantic analysis phase asses if a -grammatically correct- program is also
  45 semantically correct and decorates the program with extra information it
  46 finds during this checking. The input of the semantic analysis is an \AST{} of
  47 the
  48 parsed program, its output is either a semantically correct and completely typed
  49 \AST{} plus the environment in which it was typed (for debug purposes), or it is
  50 an error describing the problem with the \AST{}.
  51
  52 During this analysis \SPLC{} applies one \AST{} transformation and checks four
  53 properties of the program:
  54 \begin{enumerate}
  55         \item That no functions are declared with the same name
  56         \item That \tt{main} is of type \tt{(-> Void)}
  57         \item That the program includes a main function
  58         \item That the program is correctly typed
  59 \end{enumerate}
  60 The first three steps are simple sanity checks which are coded in slightly over
  61 two dozen lines of \Clean{} code. The last step is a Hindley-Milner type
  62 inference
  63 algorithm, which is a magnitude more complex. This last step also decorates the
  64 \AST{} with inferred type information.
  65
  66 \subsection{AST transformation for lambdas}
  67 Before checking the semantic correctness of the program one small \AST{}
  68 transformations needs to be done. Currently this transformation is build into
  69 the semantic analysis phase, however, it would be more in place as a separate
  70 compiler phase, and if more transformations are needed in the future these will
  71 be placed in a separate phase.
  72
  73 The implementation of lambdas in \SPLC{} is implemented through a clever trick.
  74 All \tt{LambdaExpressions} in the \AST{} are lifted to actual functions. This
  75 transformation is illustrated in Listing~\ref{lst:lambdaLift}. This
  76 has the advantage that, with our support for higher order functions, lambda
  77 functions do not require any special attention after this transformation. One
  78 downside is that in the current implementation lambdas can not access variables
  79 in the scope in which they are declared. This could be solved by adding all
  80 free variables in the lambda expression as parameters to the lifted function.
  81
  82 \begin{lstlisting}[
  83         label={lst:lambdaLift},
  84         caption={SPLC types},
  85         language=Clean]
  86 main() {
  87         var x = \a b -> a + b;
  88 }
  89 --is transformed to:
  90 2lambda_12(a, b) {
  91         return a + b;
  92 }
  93 main() {
  94         var x = 2lambda_12;
  95 }
  96 \end{lstlisting}
  97
  98 % \subsection{Sanity checks}
  99 % The sanity checks are defined as simple recursive functions over the function
 100 % declarations in the \AST. We will very shortly describe them here.
 101
 102 % \begin{description}
 103 %       \item [No duplicate functions] This function checks that the program does
 104 %               not contain top level functions sharing the same name. Formally:
 105 %               $ \forall f \in \textrm{functions} \ldotp \neg
 106 %                       \exists g \in \textrm{functions} \setminus \{f\} \ldotp
 107 %                       f.\textrm{name} = g.\textrm{name}$
 108 %       \item [\tt{main} should have type \tt{->Void}] Our semantics of \SPL{}
 109 %               require the main function to take no arguments and to not return a
 110 %               value. Formally: $\forall f \in \textrm{functions} \ldotp
 111 %                       f.\textrm{name} = \textrm{'main'} \Rightarrow
 112 %                       \textit{typeof}\textrm{(main)} = \rightarrow\textrm{Void} $
 113 %       \item [The program should have a main function] $\exists f \in
 114 %               \textrm{functions} \ldotp f.\textrm{name} = \textrm{'main'}$
 115 % \end{description}
 116
 117 \subsection{Type inference}
 118 \SPLC{} features a Hindley-Milner type inference algorithm based on the
 119 algorithm presented in the slides. It supports full polymorphism and can
 120 infer omitted types. It does not support multiple recursion and all
 121 functions and identifiers need to be declared before they can be used. In
 122 essence the program is typed as nested \tt{let .. in ..} statements.
 123
 124 \subsubsection{Types}
 125 Listing~\ref{lst:types} shows the ADT representing types in \SPLC{}. Most of
 126 these are self explanatory, however typing of functions requires some special
 127 attention. \SPLC{}s extension of higher order functions requires that the type
 128 system can distinguish if an
 129 expression represents a type $a$ or a functions which takes no input and returns
 130 a value of type $a$ (we write: $\rightarrow a$). The latter is represented by
 131 the \CI{FuncType}.
 132
 133 Another important difference with standard \SPL{} is that \SPLC{} supports
 134 currying of functions, functions are not typed with a list of argument types
 135 and a return type, but with just a single argument and return type. This return
 136 type then in itself can be a function.
 137
 138 \begin{lstlisting}[
 139         label={lst:types},
 140         caption={SPLC types},
 141         language=Clean]
 142 :: Type
 143         = TupleType (Type, Type)        -- (t1, t2)
 144         | ListType type                         -- [t]
 145         | IdType TVar                           -- type variable
 146         | IntType
 147         | BoolType
 148         | CharType
 149     | VoidType
 150         | FuncType Type                         -- (-> t)
 151     | (->>) infixl 7 Type Type  -- (t1 -> t2)
 152 \end{lstlisting}
 153
 154 \subsubsection{Environment}
 155 As all stages of \SPLC{}, the type inference is programmed in Clean. It takes
 156 an \AST{} and produces either a fully typed \AST{}, or an error if the provided
 157 \AST{} contains type errors. The Program is typed in an environment $\Gamma :
 158 id \rightarrow \Sigma$ which maps identifiers of functions or variables to type
 159 schemes.
 160
 161 In \SPLC{} type inference and unification are done in one pass. They are unified
 162 in the \tt{Typing} monad, which is
 163 defined as \CI{Typing a :== StateT (Gamma, [TVar]) (Either SemError) a}. The
 164 carried list of \tt{TVar} is an infinite list of fresh \tt{TVars}, which are
 165 just an alias of \tt{String}.
 166
 167 \subsubsection{Substitution and Unification}
 168 In \SPLC{} substitutions are defined as map from type variables to concrete
 169 types. Composing of substitutions is then simply taking the union of two
 170 substitutions and substituting the first substitution over the types in the
 171 second substitution. We write $\star$ for a substitution and $\star_0$ for the
 172 empty substitution.
 173
 174 Unification ($\unif$) takes two types and either provides a substitution which
 175 could
 176 unify them, or an error of the types are not unifiable. The interesting case
 177 of unification is when trying to unify an type variable with a more
 178 specialised type. Equation~\ref{eq:unifyId} shows this. All other cases
 179 are trivially promoting unification to the contained types, or checking directly
 180 if the types to be unified are equal.
 181
 182 \begin{equation} \label{eq:unifyId}
 183         \infer{\alpha \unif \tau = [tv \mapsto \tau]}
 184                         {\alpha \notin ftv(\tau)}
 185 \end{equation}
 186
 187 \subsubsection{Inferring}
 188 Inferring the type of expression or statement yields a 4-tuple of a new Gamma,
 189 Substitution,
 190 a Type and an updated version of that element. In the case of expressions the
 191 yielded type is the actual type of the expression. In the case of statements the
 192 yielded type is the type returned by a return statement if one is present, or
 193 Void otherwise.
 194
 195 In the Clean implementation the new Gamma in the result of an inference is
 196 carried through the Typing Monad.
 197
 198 Below we will show the inference rules for expressions and statements. In these
 199 lowercase Greek letters are always fresh type variables.
 200
 201 Section~\ref{sec:infRules} shows all typing inference rules. Below we will show
 202 a few to express the basics and show some interesting cases.
 203
 204 \paragraph{Variables}
 205 Inferring variable requires a bit of care. First of all the variable has to
 206 be present in $\Gamma$, secondly field selectors need to be respected. To apply
 207 the field selectors on a type the function \tt{apfs} is used, which maps a
 208 type and a field selector to a new type: $\texttt{apfs} : \tau \times
 209 \textit{fieldselector} \rightarrow \tau$.
 210
 211 Note that the inference rule only checks if a field selector can be applied on
 212 a certain type, and does not use the field selector to infer the type. This
 213 means that a program: \tt{f(x)\{ return x.hd; \}} can not be typed, and instead
 214 \tt{f} must be explicitly typed as \tt{:: [a] -> a} (or something more
 215 specialised) by the programmer. This could be improved by
 216 changing the \tt{apfs} function to infer for type variables.
 217
 218 \begin{equation}
 219         \infer[Var]
 220                 {\Gamma \vdash \textit{id}.\textit{fs}^* \Rightarrow
 221                         (\Gamma, \star_0, \tau', \textit{id}.\textit{fs}^*)}
 222                 {\Gamma(id) = \lfloor \tau \rfloor
 223                 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau'
 224                 }
 225 \end{equation}
 226
 227 \paragraph{Functions}
 228 When inferring two functions we distinguish two rules: one for a function which
 229 is applied without any arguments (Equation~\ref{eq:inferApp0}) and one for when
 230 a function is applied with arguments (Equation~\ref{eq:inferAppN}). Inferring a
 231 list of expressions is done by folding the infer function over the expressions
 232 and accumulating the resulting types, substitutions, etc.
 233
 234 \begin{equation} \label{eq:inferApp0}
 235         \infer[App 0]
 236                 {\Gamma \vdash \textit{id}().\textit{fs}^* \Rightarrow
 237                         ((\Gamma^\star, \star, \tau^r,
 238                                 \textit{id}().\textit{fs}^*)}
 239                 {\Gamma(id) = \lfloor \tau^e \rfloor
 240                 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^r
 241                 }
 242 \end{equation}
 243
 244 \begin{equation} \label{eq:inferAppN}
 245         \infer[AppN]
 246                 {\Gamma \vdash \textit{id}(e^*).\textit{fs}^* \Rightarrow
 247                         ((\Gamma^\star, \star, \tau^r,
 248                                 \textit{id}({e^*}').\textit{fs}^*)}
 249                 {\Gamma(id) = \lfloor \tau^e \rfloor
 250                 &\Gamma \vdash e^* \Rightarrow
 251                         (\Gamma^{\star_1}, \star_1, \tau^*, {e^*}')
 252                 &(\texttt{fold } (\rightarrow) \texttt{ } \alpha \texttt{ } \tau^*)
 253                         \unif \tau^e = \star_2
 254                 &\star = \star_2 \cdot \star_1
 255                 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^e
 256                 }
 257 \end{equation}
 258
 259 \paragraph{Return statements}
 260 Statements are typed with the type they return. The base case for this is
 261 trivially a return statement. An empty return statement is of type \tt{Void}
 262 and other return statements are typed with the type or their expression.
 263
 264 When checking combinations of statements, e.g. if-then-else statements, it is
 265 checked that all branches return the same type, then the
 266 type of the combination is set to this type.
 267
 268 \begin{equation}
 269         \infer[Return \emptyset]
 270                 {\Gamma \vdash \underline{\textrm{return}} \Rightarrow
 271                         (\Gamma, \star_0, \textrm{Void}, \underline{\textrm{return}})
 272                 }
 273                 {}
 274 \end{equation}
 275
 276 \begin{equation}
 277         \infer[Return]
 278                 {\Gamma \vdash \underline{\textrm{return }} e \Rightarrow
 279                         (\Gamma^\star, \star, \tau, \underline{\textrm{return }} e')
 280                 }
 281                 {\Gamma \vdash e \Rightarrow (\Gamma^\star, \star, \tau, e')}
 282 \end{equation}
 283
 284 \paragraph{Assignment statement}
 285 When assigning a new value to a variable this variable needs to be present in
 286 Gamma and its type needs to unify with the type of the expression.
 287
 288 One thing that needs to be taken care of when assigning are the field selectors.
 289 \tt{unapfs} is a function which takes a type and a fieldselector and returns the
 290 type of the variable needed to assign to that fieldselector, i.e. \CI{unapfs
 291 a FieldHd = ListType a}
 292 \begin{equation}
 293         \infer[Ass]
 294                 {\Gamma \vdash \textit{id.fs}^* \underline{=} e \Rightarrow
 295                         (\Gamma^\star.k:\tau_r^\star, \star, \textrm{Void},
 296                                 \textit{id.fs}^* \underline{=} e')
 297                 }
 298                 {\Gamma(\textit{id}) = \lfloor \tau_e \rfloor
 299                 &\Gamma \vdash e \Rightarrow (\Gamma^{\star_1}, \star_1, \tau_g, e')
 300                 &\texttt{fold unapfs } \tau_g \textit{ reverse fs} = \tau^r
 301                 &\tau_e \unif \tau_g = \star_2
 302                 &\star = \star_2 \cdot \star_1
 303                 }
 304 \end{equation}