deliverables/report/sem.tex

   1 \section{Semantic analysis}\label{sec:sem}
   2 %The grammar used to parse
   3 %SPL
   4 %, if it is di erent from the given grammar.
   5 %{
   6 %The scoping rules you use and check in your compiler.
   7 %{
   8 %The typing rules used for
   9 %SPL
  10 %.
  11 %{
  12 %A brief guide telling what each of the examples tests (see next point).
  13 %{
  14 %A concise but precise description of how the work was divided amongst the
  15 %members of
  16 %the team.
  17
  18 \newcommand{\unif}{{\scriptscriptstyle\cup}}
  19 %\subsubsection{Anonymous functions}
  20 %When the programmers wants to use small functions that consist of one line to
  21 %for example use the operators in a functional way the programmer has to make a
  22 %named function for every operator. In some cases it is easy to do this inline
  23 %using anonymous functions. A small syntactic sugar has been added that will
  24 %inline anonymous functions to non-anonymous functions during the parsing phase.
  25 %The following snippet is an example of a filter on even values before and after
  26 %transformation.
  27 %
  28 %\begin{SPLCode}
  29 %//Before transformation
  30 %main(){
  31 %       var a = filter(\x->x % 2 == 0, 1 : 2 : 3 : 4 : 5 : []);
  32 %}
  33 %
  34 %//After transformation
  35 %1lambda_23(x){
  36 %       return x % 2 == 0;
  37 %}
  38 %
  39 %main(){
  40 %       var a = filter(1lambda_23, 1 : 2 : 3 : 4 : 5 : []);
  41 %}
  42 %\end{SPLCode}
  43
  44 Every anonymous functions gets a unique name outside the namespace of legal
  45 names for functions making sure the name is unique. By implementing the
  46 transformation in the parsing phase the function is automatically checked
  47 or inferred on type.
  48
  49 The semantic analysis phase asses if a -grammatically correct- program is also
  50 semantically correct and decorates the program with extra information it
  51 finds during this checking. The input of the semantic analysis is an \AST{} of
  52 the
  53 parsed program, its output is either a semantically correct and completely typed
  54 \AST{} plus the environment in which it was typed (for debug purposes), or it is
  55 an error describing the problem with the \AST{}.
  56
  57 During this analysis \SPLC{} checks four properties of the program:
  58 \begin{enumerate}
  59         \item That no functions are declared with the same name
  60         \item That \tt{main} is of type \tt{(-> Void)}
  61         \item That the program includes a main function
  62         \item AST transformation for lambdas
  63         \item That the program is correctly typed
  64 \end{enumerate}
  65 The first three steps are simple sanity checks which are coded in slightly over
  66 two dozen lines of \Clean{} code. The last step is a Hindley-Milner type
  67 inference
  68 algorithm, which is a magnitude more complex. This last step also decorates the
  69 \AST{} with inferred type information.
  70
  71 \subsection{AST transformation for lambdas}
  72 Before checking the semantic correctness of the program one small \AST{}
  73 transformations needs to be done. Currently this transformation is build into
  74 the semantic analysis phase, however, it would be more in place as a separate
  75 compiler phase, and if more transformations are needed in the future these will
  76 be placed in a separate phase.
  77
  78 The implementation of lambdas in \SPLC{} is implemented through a clever trick.
  79 All \tt{LambdaExpressions} in the \AST{} are lifted to actual functions. This
  80 transformation is illustrated in Listing~\ref{lst:lambdaLift}. This
  81 has the advantage that, with our support for higher order functions, lambda
  82 functions do not require any special attention after this transformation. One
  83 downside is that in the current implementation lambdas can not access variables
  84 in the scope in which they are declared. This could be solved by adding all
  85 free variables in the lambda expression as parameters to the lifted function.
  86
  87 \begin{lstlisting}[
  88         label={lst:lambdaLift},
  89         caption={SPLC types},
  90         language=Clean]
  91 main() {
  92         var x = \a b -> a + b;
  93 }
  94 --is transformed to:
  95 2lambda_12(a, b) {
  96         return a + b;
  97 }
  98 main() {
  99         var x = 2lambda_12;
 100 }
 101 \end{lstlisting}
 102
 103 % \subsection{Sanity checks}
 104 % The sanity checks are defined as simple recursive functions over the function
 105 % declarations in the \AST. We will very shortly describe them here.
 106
 107 % \begin{description}
 108 %       \item [No duplicate functions] This function checks that the program does
 109 %               not contain top level functions sharing the same name. Formally:
 110 %               $ \forall f \in \textrm{functions} \ldotp \neg
 111 %                       \exists g \in \textrm{functions} \setminus \{f\} \ldotp
 112 %                       f.\textrm{name} = g.\textrm{name}$
 113 %       \item [\tt{main} should have type \tt{->Void}] Our semantics of \SPL{}
 114 %               require the main function to take no arguments and to not return a
 115 %               value. Formally: $\forall f \in \textrm{functions} \ldotp
 116 %                       f.\textrm{name} = \textrm{'main'} \Rightarrow
 117 %                       \textit{typeof}\textrm{(main)} = \rightarrow\textrm{Void} $
 118 %       \item [The program should have a main function] $\exists f \in
 119 %               \textrm{functions} \ldotp f.\textrm{name} = \textrm{'main'}$
 120 % \end{description}
 121
 122 \subsection{Type inference}
 123 \SPLC{} features a Hindley-Milner type inference algorithm based on the
 124 algorithm presented in the slides. It supports full polymorphism and can
 125 infer omitted types. It does not support multiple recursion and all
 126 functions and identifiers need to be declared before they can be used. In
 127 essence the program is typed as nested \tt{let .. in ..} statements.
 128
 129 \subsubsection{Types}
 130 Listing~\ref{lst:types} shows the ADT representing types in \SPLC{}. Most of
 131 these are self explanatory, however typing of functions requires some special
 132 attention. \SPLC{}s extension is higher order functions, this is described
 133 in more detail in Section~\ref{sec:ext}. These require a change to the
 134 type system, which we will already describe here.
 135
 136 Higher order functions require that the type system can distinguish if an
 137 expression represents a type $a$ or a functions which takes no input and returns
 138 a value of type $a$ (we write: $\rightarrow a$). The latter is represented by
 139 the \CI{FuncType}.
 140
 141 Another important difference with standard \SPL{} is that \SPLC{} supports
 142 currying of functions, functions are not typed with a list of argument types
 143 and a return type, but with just a single argument and return type. This return
 144 type then in itself can be a function.
 145
 146 \begin{lstlisting}[
 147         label={lst:types},
 148         caption={SPLC types},
 149         language=Clean]
 150 :: Type
 151         = TupleType (Type, Type)        -- (t1, t2)
 152         | ListType type                         -- [t]
 153         | IdType TVar                           -- polymorphic type
 154         | IntType
 155         | BoolType
 156         | CharType
 157     | VoidType
 158         | FuncType Type                         -- (-> t)
 159     | (\->>) infixl 7 Type Type         -- (t1 -> t2)
 160 \end{lstlisting}
 161
 162 \subsubsection{Environment}
 163 As all stages of \SPLC{}, the type inference is programmed in Clean. It takes
 164 an \AST{} and produces either a fully typed \AST{}, or an error if the provided
 165 \AST{} contains type errors. The Program is typed in an environment $\Gamma :
 166 id \rightarrow \Sigma$ which maps identifiers of functions or variables to type
 167 schemes.
 168
 169 In \SPLC{} type inference and unification are done in one pass. They are unified
 170 in the \tt{Typing} monad, which is
 171 defined as \CI{Typing a :== StateT (Gamma, [TVar]) (Either SemError) a}. The
 172 carried list of \tt{TVar} is an infinite list of fresh \tt{TVars}, which are
 173 just an alias of \tt{String}.
 174
 175 \subsubsection{Substitution and Unification}
 176 In \SPLC{} substitutions are defined as map from type variables to concrete
 177 types. Composing of substitutions is then simply taking the union of two
 178 substitutions and substituting the first substitution over the types in the
 179 second substitution. We write $\star$ for a substitution and $\star_0$ for the
 180 empty substitution.
 181
 182 Unification ($\unif$) takes two types and either provides a substitution which
 183 could
 184 unify them, or an error of the types are not unifiable. The interesting case
 185 of unification is when trying to unify an type variable with a more
 186 specialised type. Equation~\ref{eq:unifyId} shows this. All other cases
 187 are trivially promoting unification to the contained types, or checking directly
 188 if the types to be unified are equal.
 189
 190 \begin{equation} \label{eq:unifyId}
 191         \infer{\alpha \unif \tau = [tv \mapsto \tau]}
 192                         {\alpha \notin ftv(\tau)}
 193 \end{equation}
 194
 195 \subsubsection{Inferring}
 196 Inferring the type of expression or statement yields a 4-tuple of a new Gamma,
 197 Substitution,
 198 a Type and an updated version of that element. In the case of expressions the
 199 yielded type is the actual type of the expression. In the case of statements the
 200 yielded type is the type returned by a return statement if one is present, or
 201 Void otherwise.
 202
 203 In the Clean implementation the new Gamma in the result of an inference is
 204 carried through the Typing Monad.
 205
 206 Below we will show the inference rules for expressions and statements. In these
 207 lowercase Greek letters are always fresh type variables.
 208
 209 Section~\ref{sec:infRules} shows all typing inference rules. Below we will show
 210 a few to express the basics and show some interesting cases.
 211
 212 \paragraph{Variables}
 213 Inferring variable requires a bit of care. First of all the variable has to
 214 be present in $\Gamma$, secondly field selectors need to be respected. To apply
 215 the field selectors on a type the function \tt{apfs} is used, which maps a
 216 type and a field selector to a new type: $\texttt{apfs} : \tau \times
 217 \textit{fieldselector} \rightarrow \tau$.
 218
 219 Note that the inference rule only checks if a field selector can be applied on
 220 a certain type, and does not use the field selector to infer the type. This means that a program: \tt{f(x)\{ return x.hd; \}} can not be typed, and instead
 221 \tt{f} must be explicitly typed as \tt{:: [a] -> a} (or something more
 222 specialised) by the programmer. This could be improved by
 223 changing the \tt{apfs} function to infer for type variables.
 224
 225 \begin{equation}
 226         \infer[Var]
 227                 {\Gamma \vdash \textit{id}.\textit{fs}^* \Rightarrow
 228                         (\Gamma, \star_0, \tau', \textit{id}.\textit{fs}^*)}
 229                 {\Gamma(id) = \lfloor \tau \rfloor
 230                 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau'
 231                 }
 232 \end{equation}
 233
 234 \paragraph{Functions}
 235 When inferring two functions we distinguish two rules: one for a function which
 236 is applied without any arguments (Equation~\ref{eq:inferApp0}) and one for when
 237 a function is applied with arguments (Equation~\ref{eq:inferAppN}). Inferring a
 238 list of expressions is done by folding the infer function over the expressions
 239 and accumulating the resulting types, substitutions, etc.
 240
 241 \begin{equation} \label{eq:inferApp0}
 242         \infer[App 0]
 243                 {\Gamma \vdash \textit{id}().\textit{fs}^* \Rightarrow
 244                         ((\Gamma^\star, \star, \tau^r,
 245                                 \textit{id}().\textit{fs}^*)}
 246                 {\Gamma(id) = \lfloor \tau^e \rfloor
 247                 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^r
 248                 }
 249 \end{equation}
 250
 251 \begin{equation} \label{eq:inferAppN}
 252         \infer[AppN]
 253                 {\Gamma \vdash \textit{id}(e^*).\textit{fs}^* \Rightarrow
 254                         ((\Gamma^\star, \star, \tau^r,
 255                                 \textit{id}({e^*}').\textit{fs}^*)}
 256                 {\Gamma(id) = \lfloor \tau^e \rfloor
 257                 &\Gamma \vdash e^* \Rightarrow
 258                         (\Gamma^{\star_1}, \star_1, \tau^*, {e^*}')
 259                 &(\texttt{fold } (\rightarrow) \texttt{ } \alpha \texttt{ } \tau^*)
 260                         \unif \tau^e = \star_2
 261                 &\star = \star_2 \cdot \star_1
 262                 &\texttt{fold apfs } \tau \textit{ fs}^* = \tau^e
 263                 }
 264 \end{equation}
 265
 266 \paragraph{Return statements}
 267 Statements are typed with the type they return. The base case for this is
 268 trivially a return statement. An empty return statement is of type \tt{Void}
 269 and other return statements are typed with the type or their expression.
 270
 271 When checking combinations of statements, e.g. if-then-else statements, it is
 272 checked that all branches return the same type, then the
 273 type of the combination is set to this type.
 274
 275 \begin{equation}
 276         \infer[Return \emptyset]
 277                 {\Gamma \vdash \underline{\textrm{return}} \Rightarrow
 278                         (\Gamma, \star_0, \textrm{Void}, \underline{\textrm{return}})
 279                 }
 280                 {}
 281 \end{equation}
 282
 283 \begin{equation}
 284         \infer[Return]
 285                 {\Gamma \vdash \underline{\textrm{return }} e \Rightarrow
 286                         (\Gamma^\star, \star, \tau, \underline{\textrm{return }} e')
 287                 }
 288                 {\Gamma \vdash e \Rightarrow (\Gamma^\star, \star, \tau, e')}
 289 \end{equation}
 290
 291 \paragraph{Assignment statement}
 292 When assigning a new value to a variable this variable needs to be present in
 293 Gamma and its type needs to unify with the type of the expression.
 294
 295 One thing that needs to be taken care of when assigning are the field selectors.
 296 \tt{unapfs} is a function which takes a type and a fieldselector and returns the
 297 type of the variable needed to assign to that fieldselector, i.e. \CI{unapfs
 298 a FieldHd = ListType a}
 299 \begin{equation}
 300         \infer[Ass]
 301                 {\Gamma \vdash \textit{id.fs}^* \underline{=} e \Rightarrow
 302                         (\Gamma^\star.k:\tau_r^\star, \star, \textrm{Void},
 303                                 \textit{id.fs}^* \underline{=} e')
 304                 }
 305                 {\Gamma(\textit{id}) = \lfloor \tau_e \rfloor
 306                 &\Gamma \vdash e \Rightarrow (\Gamma^{\star_1}, \star_1, \tau_g, e')
 307                 &\texttt{fold unapfs } \tau_g \textit{reverse fs} = \tau^r
 308                 &\tau_e \unif \tau_g = \star_2
 309                 &\star = \star_2 \cdot \star_1
 310                 }
 311 \end{equation}