\section{Semantic analysis}\label{sec:sem}
-%The grammar used to parse
-%SPL
-%, if it is di erent from the given grammar.
-%{
-%The scoping rules you use and check in your compiler.
-%{
-%The typing rules used for
-%SPL
-%.
-%{
-%A brief guide telling what each of the examples tests (see next point).
-%{
-%A concise but precise description of how the work was divided amongst the
-%members of
-%the team.
-
-\newcommand{\unif}{{\scriptscriptstyle\cup}}
-%\subsubsection{Anonymous functions}
-%When the programmers wants to use small functions that consist of one line to
-%for example use the operators in a functional way the programmer has to make a
-%named function for every operator. In some cases it is easy to do this inline
-%using anonymous functions. A small syntactic sugar has been added that will
-%inline anonymous functions to non-anonymous functions during the parsing phase.
-%The following snippet is an example of a filter on even values before and after
-%transformation.
-%
-%\begin{SPLCode}
-%//Before transformation
-%main(){
-% var a = filter(\x->x % 2 == 0, 1 : 2 : 3 : 4 : 5 : []);
-%}
-%
-%//After transformation
-%1lambda_23(x){
-% return x % 2 == 0;
-%}
-%
-%main(){
-% var a = filter(1lambda_23, 1 : 2 : 3 : 4 : 5 : []);
-%}
-%\end{SPLCode}
-
The semantic analysis phase asses if a -grammatically correct- program is also
-semantically correct and decorates the program with extra information it
-finds during this checking. The input of the semantic analysis is an \AST{} of
-the
-parsed program, its output is either a semantically correct and completely typed
-\AST{} plus the environment in which it was typed (for debug purposes), or it is
-an error describing the problem with the \AST{}.
+semantically correct and decorates the program with extra information it finds
+during this checking. The input of the semantic analysis is an \AST{} of the
+parsed program, its output is either a semantically correct and completely
+typed \AST{} plus the environment in which it was typed (for debug purposes),
+or it is an error describing the problem with the \AST{}.
-During this analysis \SPLC{} applies one \AST{} transformation and checks four
+During this analysis \SPLC{} applies one \AST{} transformation and checks four
properties of the program:
\begin{enumerate}
\item That no functions are declared with the same name
\item That the program includes a main function
\item That the program is correctly typed
\end{enumerate}
-The first three steps are simple sanity checks which are coded in slightly over
-two dozen lines of \Clean{} code. The last step is a Hindley-Milner type
-inference
-algorithm, which is a magnitude more complex. This last step also decorates the
-\AST{} with inferred type information.
+The first three steps are simple sanity checks which are coded in slightly over
+two dozen lines of \Clean{} code. The last step is a \emph{Hindley-Milner} type
+inference algorithm, which is a magnitude more complex. This last step also
+decorates the \AST{} with inferred type information.
\subsection{AST transformation for lambdas}
-Before checking the semantic correctness of the program one small \AST{}
-transformations needs to be done. Currently this transformation is build into
-the semantic analysis phase, however, it would be more in place as a separate
+Before checking the semantic correctness of the program one small \AST{}
+transformations needs to be done. Currently this transformation is build into
+the semantic analysis phase, however, it would be more in place as a separate
compiler phase, and if more transformations are needed in the future these will
be placed in a separate phase.
The implementation of lambdas in \SPLC{} is implemented through a clever trick.
All \tt{LambdaExpressions} in the \AST{} are lifted to actual functions. This
-transformation is illustrated in Listing~\ref{lst:lambdaLift}. This
-has the advantage that, with our support for higher order functions, lambda
-functions do not require any special attention after this transformation. One
+transformation is illustrated in Listing~\ref{lst:lambdaLift}. This
+has the advantage that, with our support for higher order functions, lambda
+functions do not require any special attention after this transformation. One
downside is that in the current implementation lambdas can not access variables
-in the scope in which they are declared. This could be solved by adding all
+in the scope in which they are declared. This could be solved by adding all
free variables in the lambda expression as parameters to the lifted function.
\begin{lstlisting}[
label={lst:lambdaLift},
caption={SPLC types},
- language=Clean]
+ language=SPLCode]
main() {
var x = \a b -> a + b;
}
---is transformed to:
+
+// is transformed to:
2lambda_12(a, b) {
return a + b;
}
}
\end{lstlisting}
-% \subsection{Sanity checks}
-% The sanity checks are defined as simple recursive functions over the function
-% declarations in the \AST. We will very shortly describe them here.
-
-% \begin{description}
-% \item [No duplicate functions] This function checks that the program does
-% not contain top level functions sharing the same name. Formally:
-% $ \forall f \in \textrm{functions} \ldotp \neg
-% \exists g \in \textrm{functions} \setminus \{f\} \ldotp
-% f.\textrm{name} = g.\textrm{name}$
-% \item [\tt{main} should have type \tt{->Void}] Our semantics of \SPL{}
-% require the main function to take no arguments and to not return a
-% value. Formally: $\forall f \in \textrm{functions} \ldotp
-% f.\textrm{name} = \textrm{'main'} \Rightarrow
-% \textit{typeof}\textrm{(main)} = \rightarrow\textrm{Void} $
-% \item [The program should have a main function] $\exists f \in
-% \textrm{functions} \ldotp f.\textrm{name} = \textrm{'main'}$
-% \end{description}
-
\subsection{Type inference}
-\SPLC{} features a Hindley-Milner type inference algorithm based on the
-algorithm presented in the slides. It supports full polymorphism and can
-infer omitted types. It does not support multiple recursion and all
-functions and identifiers need to be declared before they can be used. In
-essence the program is typed as nested \tt{let .. in ..} statements.
+\SPLC{} features a \emph{Hindley-Milner} type inference algorithm based on the
+algorithm presented in the slides. It supports full polymorphism and can infer
+omitted types. It does not support multiple recursion and all functions and
+identifiers need to be declared before they can be used. In essence the program
+is typed as nested \tt{let .. in ..} statements.
\subsubsection{Types}
-Listing~\ref{lst:types} shows the ADT representing types in \SPLC{}. Most of
+Listing~\ref{lst:types} shows the \ADT{} representing types in \SPLC{}. Most of
these are self explanatory, however typing of functions requires some special
-attention. \SPLC{}s extension of higher order functions requires that the type
-system can distinguish if an
+attention. \SPLC{}s extension of higher order functions requires that the type
+system can distinguish if an
expression represents a type $a$ or a functions which takes no input and returns
-a value of type $a$ (we write: $\rightarrow a$). The latter is represented by
-the \CI{FuncType}.
+a value of type $a$ (we write: $\rightarrow a$). The latter is represented by
+the \CI{FuncType}.
-Another important difference with standard \SPL{} is that \SPLC{} supports
+Another important difference with standard \SPL{} is that \SPLC{} supports
currying of functions, functions are not typed with a list of argument types
and a return type, but with just a single argument and return type. This return
-type then in itself can be a function.
+type then in itself can be a function.
\begin{lstlisting}[
label={lst:types},
caption={SPLC types},
language=Clean]
-:: Type
+:: Type
= TupleType (Type, Type) -- (t1, t2)
| ListType type -- [t]
| IdType TVar -- type variable
- | IntType
+ | IntType
| BoolType
| CharType
| VoidType
an \AST{} and produces either a fully typed \AST{}, or an error if the provided
\AST{} contains type errors. The Program is typed in an environment $\Gamma :
id \rightarrow \Sigma$ which maps identifiers of functions or variables to type
-schemes.
+schemes.
-In \SPLC{} type inference and unification are done in one pass. They are unified
-in the \tt{Typing} monad, which is
-defined as \CI{Typing a :== StateT (Gamma, [TVar]) (Either SemError) a}. The
-carried list of \tt{TVar} is an infinite list of fresh \tt{TVars}, which are
-just an alias of \tt{String}.
+In \SPLC{} type inference and unification are done in one pass. They are
+unified in the \tt{Typing} monad, which is defined as \CI{Typing a :== StateT
+(Gamma, [TVar]) (Either SemError) a}. The carried list of \tt{TVar} is an
+infinite list of fresh \tt{TVars}, which are just an alias of \tt{String}.
\subsubsection{Substitution and Unification}
In \SPLC{} substitutions are defined as map from type variables to concrete
empty substitution.
Unification ($\unif$) takes two types and either provides a substitution which
-could
-unify them, or an error of the types are not unifiable. The interesting case
-of unification is when trying to unify an type variable with a more
-specialised type. Equation~\ref{eq:unifyId} shows this. All other cases
-are trivially promoting unification to the contained types, or checking directly
-if the types to be unified are equal.
+could unify them, or an error of the types are not unifiable. The interesting
+case of unification is when trying to unify an type variable with a more
+specialised type. Equation~\ref{eq:unifyId} shows this. All other cases are
+trivially promoting unification to the contained types, or checking directly if
+the types to be unified are equal.
\begin{equation} \label{eq:unifyId}
\infer{\alpha \unif \tau = [tv \mapsto \tau]}
\subsubsection{Inferring}
Inferring the type of expression or statement yields a 4-tuple of a new Gamma,
-Substitution,
-a Type and an updated version of that element. In the case of expressions the
-yielded type is the actual type of the expression. In the case of statements the
-yielded type is the type returned by a return statement if one is present, or
-Void otherwise.
+Substitution, a Type and an updated version of that element. In the case of
+expressions the yielded type is the actual type of the expression. In the case
+of statements the yielded type is the type returned by a return statement if
+one is present, or Void otherwise.
In the Clean implementation the new Gamma in the result of an inference is
-carried through the Typing Monad.
+carried through the Typing Monad.
Below we will show the inference rules for expressions and statements. In these
-lowercase Greek letters are always fresh type variables.
+lowercase Greek letters are always fresh type variables.
Section~\ref{sec:infRules} shows all typing inference rules. Below we will show
-a few to express the basics and show some interesting cases.
+a few to express the basics and show some interesting cases.
\paragraph{Variables}
Inferring variable requires a bit of care. First of all the variable has to
be present in $\Gamma$, secondly field selectors need to be respected. To apply
-the field selectors on a type the function \tt{apfs} is used, which maps a
-type and a field selector to a new type: $\texttt{apfs} : \tau \times
-\textit{fieldselector} \rightarrow \tau$.
+the field selectors on a type the function \tt{apfs} is used, which maps a
+type and a field selector to a new type: $\texttt{apfs} : \tau \times
+\textit{fieldselector} \rightarrow \tau$.
Note that the inference rule only checks if a field selector can be applied on
-a certain type, and does not use the field selector to infer the type. This
-means that a program: \tt{f(x)\{ return x.hd; \}} can not be typed, and instead
-\tt{f} must be explicitly typed as \tt{:: [a] -> a} (or something more
-specialised) by the programmer. This could be improved by
-changing the \tt{apfs} function to infer for type variables.
+a certain type, and does not use the field selector to infer the type. This
+means that a program: \tt{f(x)\{ return x.hd; \}} can not be typed, and instead
+\tt{f} must be explicitly typed as \tt{:: [a] -> a} (or something more
+specialised) by the programmer. This could be improved by changing the
+\tt{apfs} function to infer for type variables.
\begin{equation}
\infer[Var]
- {\Gamma \vdash \textit{id}.\textit{fs}^* \Rightarrow
+ {\Gamma \vdash \textit{id}.\textit{fs}^* \Rightarrow
(\Gamma, \star_0, \tau', \textit{id}.\textit{fs}^*)}
{\Gamma(id) = \lfloor \tau \rfloor
&\texttt{fold apfs } \tau \textit{ fs}^* = \tau'
is applied without any arguments (Equation~\ref{eq:inferApp0}) and one for when
a function is applied with arguments (Equation~\ref{eq:inferAppN}). Inferring a
list of expressions is done by folding the infer function over the expressions
-and accumulating the resulting types, substitutions, etc.
+and accumulating the resulting types, substitutions, etc.
\begin{equation} \label{eq:inferApp0}
\infer[App 0]
{\Gamma \vdash \textit{id}().\textit{fs}^* \Rightarrow
- ((\Gamma^\star, \star, \tau^r,
+ ((\Gamma^\star, \star, \tau^r,
\textit{id}().\textit{fs}^*)}
{\Gamma(id) = \lfloor \tau^e \rfloor
&\texttt{fold apfs } \tau \textit{ fs}^* = \tau^r
\begin{equation} \label{eq:inferAppN}
\infer[AppN]
{\Gamma \vdash \textit{id}(e^*).\textit{fs}^* \Rightarrow
- ((\Gamma^\star, \star, \tau^r,
+ ((\Gamma^\star, \star, \tau^r,
\textit{id}({e^*}').\textit{fs}^*)}
{\Gamma(id) = \lfloor \tau^e \rfloor
- &\Gamma \vdash e^* \Rightarrow
+ &\Gamma \vdash e^* \Rightarrow
(\Gamma^{\star_1}, \star_1, \tau^*, {e^*}')
- &(\texttt{fold } (\rightarrow) \texttt{ } \alpha \texttt{ } \tau^*)
+ &(\texttt{fold } (\rightarrow) \texttt{ } \alpha \texttt{ } \tau^*)
\unif \tau^e = \star_2
&\star = \star_2 \cdot \star_1
&\texttt{fold apfs } \tau \textit{ fs}^* = \tau^e
\end{equation}
\paragraph{Return statements}
-Statements are typed with the type they return. The base case for this is
-trivially a return statement. An empty return statement is of type \tt{Void}
+Statements are typed with the type they return. The base case for this is
+trivially a return statement. An empty return statement is of type \tt{Void}
and other return statements are typed with the type or their expression.
When checking combinations of statements, e.g. if-then-else statements, it is
-checked that all branches return the same type, then the
-type of the combination is set to this type.
+checked that all branches return the same type, then the type of the
+combination is set to this type.
\begin{equation}
\infer[Return \emptyset]
\paragraph{Assignment statement}
When assigning a new value to a variable this variable needs to be present in
-Gamma and its type needs to unify with the type of the expression.
+Gamma and its type needs to unify with the type of the expression.
-One thing that needs to be taken care of when assigning are the field selectors.
-\tt{unapfs} is a function which takes a type and a fieldselector and returns the
-type of the variable needed to assign to that fieldselector, i.e. \CI{unapfs
-a FieldHd = ListType a}
+One thing that needs to be taken care of when assigning are the field
+selectors. \tt{unapfs} is a function which takes a type and a fieldselector
+and returns the type of the variable needed to assign to that fieldselector,
+i.e. \CI{unapfs a FieldHd = ListType a}
\begin{equation}
\infer[Ass]
{\Gamma \vdash \textit{id.fs}^* \underline{=} e \Rightarrow
- (\Gamma^\star.k:\tau_r^\star, \star, \textrm{Void},
+ (\Gamma^\star.k:\tau_r^\star, \star, \textrm{Void},
\textit{id.fs}^* \underline{=} e')
}
{\Gamma(\textit{id}) = \lfloor \tau_e \rfloor