\section{Lexing \& parsing}\label{sec:pars}
\subsection{\Yard}
For both lexing and parsing we use a proof-of-concept minimal parser
combinator library called \Yard. \Yard{} is inspired by the well-known
\textsc{Parsec} library. Where \Yard{} has only 7 primitive parsers
(combinators), \textsc{Parsec} has 9 categories of parser combinators, each
containing numerous combinators.
Parsers implemented in \Yard{} parse left to right and are greedy: they always
try to consume as much input as possible, starting from the leftmost token.
\Yard{} parsers also apply backtracking automatically: when the left parser
combined with \CI{<|>} fails, any input it may have consumed is restored and
the right parser is executed.
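The semantics of \CI{<|>} can be sketched in a few lines of Python (an
illustration only, not the actual \Yard{} code): a parser is modelled as a
function from input to either a (result, rest) pair or \tt{None}, and the
alternative simply hands the untouched input to the right parser when the left
one fails.

```python
# Sketch of a backtracking alternative combinator (not the actual Yard code).
# A parser maps an input list to (result, remaining_input), or None on failure.

def token(expected):
    """Parser that consumes one token equal to `expected`."""
    def parse(inp):
        if inp and inp[0] == expected:
            return (inp[0], inp[1:])
        return None
    return parse

def alt(left, right):
    """<|>: try `left`; since failure never mutates the input, `right`
    starts from the original input (automatic backtracking)."""
    def parse(inp):
        res = left(inp)
        if res is not None:
            return res
        return right(inp)   # original `inp`: nothing was consumed
    return parse

def seq(p, q):
    """Run p, then q on the rest; yields a pair of both results."""
    def parse(inp):
        r1 = p(inp)
        if r1 is None:
            return None
        v1, rest1 = r1
        r2 = q(rest1)
        if r2 is None:
            return None
        v2, rest2 = r2
        return ((v1, v2), rest2)
    return parse

# 'a' then 'b' fails on "ac"; the alternative retries 'a' from the start.
p = alt(seq(token('a'), token('b')), token('a'))
print(p(list('ac')))  # ('a', ['c'])
```

The key point is that failure returns \tt{None} without touching the caller's
input, which is exactly what makes the restore-and-retry behaviour free.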
\begin{lstlisting}[language=Clean]
:: Error = PositionalError Int Int String | Error String
\end{lstlisting}
The existing \SPL{} grammar was not suitable for parsing and lexing. The
grammar listed in Listing~\ref{lst:grammar} shows the current and final
grammar. The grammar for \SPLC{} has been extended to accommodate for the
syntactic sugar \SPLC{} introduces, such as string and list literals, and for
anonymous lambda functions. Moreover, we have applied transformations to the
grammar, including eliminating left recursion and fixing operator priority and
associativity.
To make parsing easier we first lex the character input into tokens. In this
way the parser can be simplified considerably and some more complex constructs
can be recognised early. Every token is annotated with a position and the
\emph{ADT} used is shown in Listing~\ref{lst:lextoken}.
Parser combinators make it very easy to account for arbitrary white space,
which is much less elegant to do in a conventional, regular-expression based
lexer.
Listing~\ref{lst:lexerwhite} shows the way we handle white space and
recalculate positions. Choosing to lex with parser combinators decreases the
speed of this phase; however, due to the simplicity of the lexer this is barely
measurable.
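The idea behind the position bookkeeping can be sketched as follows (a Python
illustration, not the \Clean{} lexer): white space is consumed between tokens
while line and column counters are updated, and every token is emitted
together with the position at which it starts.

```python
# Sketch of position-tracking lexing (illustrative, not the Clean lexer).

def lex(src):
    """Split `src` into (line, column, token) triples, skipping white space."""
    tokens, line, col = [], 1, 1
    i = 0
    while i < len(src):
        c = src[i]
        if c == '\n':           # newline: next line, reset column
            line, col = line + 1, 1
            i += 1
        elif c.isspace():       # other white space: advance column only
            col += 1
            i += 1
        else:                   # scan one whitespace-delimited token
            start_col, j = col, i
            while j < len(src) and not src[j].isspace():
                j += 1
            tokens.append((line, start_col, src[i:j]))
            col += j - i
            i = j
    return tokens

print(lex("var x\n  = 3;"))
# [(1, 1, 'var'), (1, 5, 'x'), (2, 3, '='), (2, 5, '3;')]
```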
\subsection{Parsing}
The parsing phase is the second phase of the parser and is yet again a \Yard{}
parser that transforms a list of tokens into an \emph{Abstract Syntax Tree}
(\AST). The full abstract syntax tree is listed in Listing~\ref{lst:ast} and
closely resembles the grammar.
The parser uses the standard \Yard{} combinators. Due to the modularity of
combinators it was, and still is, very easy to add functionality to the
parser. The following listing shows the parser for function declarations.
\begin{lstlisting}[language=Clean]
parseFunDecl :: Parser Token FunDecl
parseFunDecl = liftM6 FunDecl
	(peekPos)
	(parseIdent)
(parseBBraces $ parseSepList CommaToken parseIdent)
(optional (satTok DoubleColonToken *> parseFunType))
(satTok CBraceOpenToken *> many parseVarDecl)
\end{lstlisting}
\subsection{Sugars} \label{ssec:synsugar}
Some syntactic sugars are processed in the lexing or parsing phase. Entering
literal lists and strings is a tedious job because you can only glue them
together with cons operators. Thus we introduced a literal list notation, as
visible in the grammar. Literal strings are lexed as single tokens and expanded
during parsing into normal lists of characters. Literal lists are processed in
the parsing phase and expanded to their equivalent form built with the
\texttt{Cons} constructor.
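The expansion can be sketched as follows (a Python illustration; the
constructor names \tt{Cons}, \tt{Nil} and \tt{CharLit} mirror the \AST{} but
their exact shape here is assumed).

```python
# Sketch of literal-list expansion into cons form (constructor shapes assumed).

def expand_list(elems):
    """[e1, e2, ...]  ~>  Cons e1 (Cons e2 (... Nil))"""
    result = ('Nil',)
    for e in reversed(elems):       # fold right: last element innermost
        result = ('Cons', e, result)
    return result

def expand_string(s):
    """A string literal is just a literal list of characters."""
    return expand_list([('CharLit', c) for c in s])

print(expand_list([1, 2]))  # ('Cons', 1, ('Cons', 2, ('Nil',)))
```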
Another sugar added is variable arguments printing. To enable the user to
pretty print data the number of arguments of the \SI{print} function is not
fixed and it is expanded at compile time. With this sugar the programmer can
print several different types in one line. Listing~\ref{lst:varprint} shows an
example of the sugar. In the semantic analysis phase these statements are
refined even further to the specific print function per type, since the
compiler allows printing of \SI{Int}, \SI{Bool}, \SI{Char} and \SI{[Char]}.
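The rewrite can be sketched like this (a Python illustration; the
\tt{printInt}-style names are assumed stand-ins for the per-type print
functions, not necessarily the names the compiler generates).

```python
# Sketch of expanding a variadic print into per-type prints (names assumed).

def expand_print(args):
    """print(e1, ..., en) ~> one type-specific print call per argument."""
    per_type = {'Int': 'printInt', 'Bool': 'printBool',
                'Char': 'printChar', '[Char]': 'printString'}
    return [(per_type[t], e) for (t, e) in args]

calls = expand_print([('Int', '42'), ('Char', "'c'"), ('Bool', 'True')])
print(calls)  # [('printInt', '42'), ('printChar', "'c'"), ('printBool', 'True')]
```

Because the argument types are only known after inference, this refinement has
to wait until the semantic analysis phase, as the text above notes.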
\section{Semantic analysis}\label{sec:sem}
\newcommand{\unif}{{\scriptscriptstyle\cup}}
The semantic analysis phase assesses whether a grammatically correct program
is also semantically correct and decorates the program with extra information
it finds during this checking. The input of the semantic analysis is an \AST{}
of the parsed program; its output is either a semantically correct and
completely typed \AST{} plus the environment in which it was typed (for
debugging purposes), or an error describing the problem with the \AST{}.
During this analysis \SPLC{} applies one \AST{} transformation and checks four
properties of the program:
\begin{enumerate}
\item That no functions are declared with the same name
\item That the program includes a main function
\item That the program is correctly typed
\end{enumerate}
The first three steps are simple sanity checks which are coded in slightly
over two dozen lines of \Clean{} code. The last step is a
\emph{Hindley-Milner} type inference algorithm, which is an order of magnitude
more complex. This last step also decorates the \AST{} with inferred type
information.
\subsection{AST transformation for lambdas}
Before checking the semantic correctness of the program, one small \AST{}
transformation needs to be done. Currently this transformation is built into
the semantic analysis phase; however, it would be more in place as a separate
compiler phase, and if more transformations are needed in the future these
will be placed in a separate phase.
Lambdas in \SPLC{} are implemented through a clever trick: all
\tt{LambdaExpressions} in the \AST{} are lifted to actual functions. This
transformation is illustrated in Listing~\ref{lst:lambdaLift}. It has the
advantage that, with our support for higher order functions, lambda functions
do not require any special attention after this transformation. One downside
is that in the current implementation lambdas cannot access variables in the
scope in which they are declared. This could be solved by adding all
free variables in the lambda expression as parameters to the lifted function.
\begin{lstlisting}[
label={lst:lambdaLift},
	caption={Lambda lifting},
	language=SPLCode]
main() {
var x = \a b -> a + b;
}

// is transformed to:
2lambda_12(a, b) {
return a + b;
}
main() {
	var x = 2lambda_12;
}
\end{lstlisting}
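The lifting itself, including the suggested fix of passing free variables as
extra parameters, can be sketched as follows (a Python illustration; the name
scheme mirrors the listing above, the data representation is assumed).

```python
# Sketch of lambda lifting (illustrative): each lambda becomes a fresh
# top-level function; free variables can be appended as extra parameters.

lifted = []       # collected top-level functions: (name, params, body)
counter = [0]     # fresh-name supply

def lift(params, body, free_vars=()):
    """Replace a lambda by a reference to a fresh named function."""
    counter[0] += 1
    name = f"{len(params)}lambda_{counter[0]}"    # e.g. 2lambda_1
    lifted.append((name, tuple(params) + tuple(free_vars), body))
    return name                                   # what the call site now uses

ref = lift(['a', 'b'], 'a + b')
print(ref, lifted)  # 2lambda_1 [('2lambda_1', ('a', 'b'), 'a + b')]
```

Passing \tt{free\_vars} explicitly is exactly the improvement suggested above:
the call site must then also supply the captured variables as arguments.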
\subsection{Type inference}
\SPLC{} features a \emph{Hindley-Milner} type inference algorithm based on
the algorithm presented in the slides. It supports full polymorphism and can
infer omitted types. It does not support mutual recursion, and all functions
and identifiers need to be declared before they can be used. In essence the
program is typed as nested \tt{let .. in ..} statements.
\subsubsection{Types}
Listing~\ref{lst:types} shows the \ADT{} representing types in \SPLC{}. Most
of these are self explanatory; however, typing of functions requires some
special attention. \SPLC{}'s extension of higher order functions requires that
the type system can distinguish whether an expression represents a type $a$ or
a function which takes no input and returns a value of type $a$ (we write:
$\rightarrow a$). The latter is represented by the \CI{FuncType}.
Another important difference with standard \SPL{} is that \SPLC{} supports
currying of functions: functions are not typed with a list of argument types
and a return type, but with just a single argument and return type. This
return type can then itself be a function.
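Folding a list of argument types into nested single-argument function types
can be sketched as follows (a Python illustration of the idea, with an assumed
tuple encoding of arrow types).

```python
# Sketch: a function typed (t1, t2) -> r becomes t1 -> (t2 -> r).

def curry_type(arg_types, ret_type):
    """Fold right over the arguments: each one yields a single-argument arrow."""
    t = ret_type
    for a in reversed(arg_types):
        t = ('->', a, t)
    return t

print(curry_type(['Int', 'Bool'], 'Char'))
# ('->', 'Int', ('->', 'Bool', 'Char'))
```

This is the same right fold of $(\rightarrow)$ over the argument types that
appears in the $AppN$ inference rule later on.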
\begin{lstlisting}[
label={lst:types},
caption={SPLC types},
language=Clean]
:: Type
	= TupleType (Type, Type) -- (t1, t2)
	| ListType Type          -- [t]
	| IdType TVar            -- type variable
	| IntType
	| BoolType
	| CharType
	| VoidType
	| FuncType Type          -- -> t
\end{lstlisting}
The type inference algorithm takes an \AST{} and produces either a fully
typed \AST{}, or an error if the provided \AST{} contains type errors. The
program is typed in an environment $\Gamma : id \rightarrow \Sigma$ which maps
identifiers of functions or variables to type schemes.
+schemes.
In \SPLC{} type inference and unification are done in one pass. They are
unified in the \tt{Typing} monad, which is defined as \CI{Typing a :== StateT
(Gamma, [TVar]) (Either SemError) a}. The carried list of \tt{TVar} is an
infinite list of fresh \tt{TVars}; a \tt{TVar} is just an alias of
\tt{String}.
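The shape of this state can be sketched as follows (a Python illustration of
the idea, not the \Clean{} monad): an environment plus an infinite supply of
fresh type-variable names that is consumed as inference proceeds.

```python
# Sketch of the Typing state: an environment plus an infinite fresh-name
# supply (illustrative; the Clean version is a StateT over Either).
import itertools

class Typing:
    def __init__(self):
        self.gamma = {}                                    # id -> type scheme
        self.fresh_names = (f"t{i}" for i in itertools.count())

    def fresh(self):
        """Draw the next unused type variable from the infinite supply."""
        return ('IdType', next(self.fresh_names))

st = Typing()
print(st.fresh(), st.fresh())  # ('IdType', 't0') ('IdType', 't1')
```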
\subsubsection{Substitution and Unification}
In \SPLC{} substitutions are defined as a map from type variables to concrete
types; the identity substitution is the empty substitution.
Unification ($\unif$) takes two types and either provides a substitution which
could unify them, or an error if the types are not unifiable. The interesting
case of unification is when trying to unify a type variable with a more
specialised type; Equation~\ref{eq:unifyId} shows this. All other cases either
trivially promote unification to the contained types, or directly check
whether the types to be unified are equal.
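A minimal sketch of such a unifier, in Python over a simplified type
representation and with a substitution as a plain map (an illustration of the
scheme, not the \Clean{} implementation):

```python
# Sketch of unification over a small type representation (illustrative).
# Types: ('IdType', name) | ('ListType', t) | atoms such as 'IntType'.

def occurs(tv, t):
    """Does type variable `tv` occur inside `t`? (the occurs check)"""
    if isinstance(t, tuple):
        if t[0] == 'IdType':
            return t[1] == tv
        return any(occurs(tv, s) for s in t[1:])
    return False

def apply(subst, t):
    """Apply a substitution to a type."""
    if isinstance(t, tuple):
        if t[0] == 'IdType':
            return subst.get(t[1], t)
        return (t[0],) + tuple(apply(subst, s) for s in t[1:])
    return t

def unify(t1, t2):
    """Return a substitution (dict) unifying t1 and t2, or None on failure."""
    if t1 == t2:
        return {}
    if isinstance(t1, tuple) and t1[0] == 'IdType':
        # the type-variable case of Equation eq:unifyId, with occurs check
        return None if occurs(t1[1], t2) else {t1[1]: t2}
    if isinstance(t2, tuple) and t2[0] == 'IdType':
        return unify(t2, t1)
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        # promote unification to the contained types
        subst = {}
        for a, b in zip(t1[1:], t2[1:]):
            s = unify(apply(subst, a), apply(subst, b))
            if s is None:
                return None
            subst.update(s)
        return subst
    return None

print(unify(('ListType', ('IdType', 'a')), ('ListType', 'IntType')))
# {'a': 'IntType'}
```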
\begin{equation} \label{eq:unifyId}
\infer{\alpha \unif \tau = [\alpha \mapsto \tau]}
	{\alpha \notin \textit{ftv}(\tau)}
\end{equation}
\subsubsection{Inferring}
Inferring the type of an expression or statement yields a 4-tuple of a new
Gamma, a Substitution, a Type and an updated version of that element. In the
case of expressions the yielded type is the actual type of the expression. In
the case of statements the yielded type is the type returned by a return
statement if one is present, or Void otherwise.
In the Clean implementation the new Gamma in the result of an inference is
carried through the \tt{Typing} monad.
Section~\ref{sec:infRules} shows all typing inference rules. Below we show a
few of them to convey the basics and highlight some interesting cases. In
these rules lowercase Greek letters are always fresh type variables.
\paragraph{Variables}
Inferring a variable requires a bit of care. First of all the variable has to
be present in $\Gamma$; secondly, field selectors need to be respected. To
apply the field selectors on a type the function \tt{apfs} is used, which maps
a type and a field selector to a new type: $\texttt{apfs} : \tau \times
\textit{fieldselector} \rightarrow \tau$.
Note that the inference rule only checks whether a field selector can be
applied to a certain type, and does not use the field selector to infer the
type. This means that a program \tt{f(x)\{ return x.hd; \}} cannot be typed;
instead \tt{f} must be explicitly typed as \tt{:: [a] -> a} (or something more
specialised) by the programmer. This could be improved by changing the
\tt{apfs} function to infer for type variables.
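The behaviour of \tt{apfs} can be sketched as follows (a Python illustration;
the selector set is assumed to be \tt{hd}, \tt{tl}, \tt{fst} and \tt{snd}):

```python
# Sketch of `apfs` (illustrative): apply one field selector to a type,
# failing when the selector does not fit the type.

def apfs(t, fs):
    if fs == 'hd' and isinstance(t, tuple) and t[0] == 'ListType':
        return t[1]                 # .hd : [a] -> a
    if fs == 'tl' and isinstance(t, tuple) and t[0] == 'ListType':
        return t                    # .tl : [a] -> [a]
    if fs == 'fst' and isinstance(t, tuple) and t[0] == 'TupleType':
        return t[1]                 # .fst : (a, b) -> a
    if fs == 'snd' and isinstance(t, tuple) and t[0] == 'TupleType':
        return t[2]                 # .snd : (a, b) -> b
    # an IdType falls through here: selectors are checked, not inferred,
    # matching the limitation described above.
    raise TypeError(f"cannot apply .{fs} to {t!r}")

print(apfs(('ListType', 'IntType'), 'hd'))  # IntType
```

The \texttt{fold apfs} in the rules is then just a left fold of this function
over the list of field selectors.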
\begin{equation}
\infer[Var]
	{\Gamma \vdash \textit{id}.\textit{fs}^* \Rightarrow
(\Gamma, \star_0, \tau', \textit{id}.\textit{fs}^*)}
{\Gamma(id) = \lfloor \tau \rfloor
	&\texttt{fold apfs } \tau \textit{ fs}^* = \tau'}
\end{equation}
\paragraph{Function application}
For function application there are two inference rules: one for when a function
is applied without any arguments (Equation~\ref{eq:inferApp0}) and one for when
a function is applied with arguments (Equation~\ref{eq:inferAppN}). Inferring a
list of expressions is done by folding the infer function over the expressions
and accumulating the resulting types, substitutions, etc.
\begin{equation} \label{eq:inferApp0}
\infer[App 0]
	{\Gamma \vdash \textit{id}().\textit{fs}^* \Rightarrow
		(\Gamma^\star, \star, \tau^r,
		\textit{id}().\textit{fs}^*)}
	{\Gamma(id) = \lfloor \tau^e \rfloor
	&(\rightarrow \alpha) \unif \tau^e = \star
	&\texttt{fold apfs } \alpha \textit{ fs}^* = \tau^r}
\end{equation}
\begin{equation} \label{eq:inferAppN}
\infer[AppN]
	{\Gamma \vdash \textit{id}(e^*).\textit{fs}^* \Rightarrow
		(\Gamma^\star, \star, \tau^r,
		\textit{id}({e^*}').\textit{fs}^*)}
	{\Gamma(id) = \lfloor \tau^e \rfloor
	&\Gamma \vdash e^* \Rightarrow
		(\Gamma^{\star_1}, \star_1, \tau^*, {e^*}')
	&(\texttt{fold } (\rightarrow) \texttt{ } \alpha \texttt{ } \tau^*)
		\unif \tau^e = \star_2
	&\star = \star_2 \cdot \star_1
	&\texttt{fold apfs } \alpha \textit{ fs}^* = \tau^r}
\end{equation}
\paragraph{Return statements}
Statements are typed with the type they return. The base case for this is
trivially a return statement. An empty return statement is of type \tt{Void}
and other return statements are typed with the type of their expression.
When checking combinations of statements, e.g.\ if-then-else statements, it
is checked that all branches return the same type; the type of the combination
is then set to this type.
\begin{equation}
\infer[Return \emptyset]
	{\Gamma \vdash \underline{\textrm{return}} \Rightarrow
		(\Gamma, \star_0, \textrm{Void}, \underline{\textrm{return}})}
	{}
\end{equation}
\paragraph{Assignment statement}
When assigning a new value to a variable, this variable needs to be present
in Gamma and its type needs to unify with the type of the expression.
One thing that needs to be taken care of when assigning are the field
selectors. \tt{unapfs} is a function which takes a type and a field selector
and returns the type of the variable needed to assign to that field selector,
e.g.\ \CI{unapfs a FieldHd = ListType a}.
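A sketch of \tt{unapfs} for the list selectors (a Python illustration; only
the selectors named in the text are covered, the others would follow the same
pattern):

```python
# Sketch of `unapfs` (illustrative): given the type assigned through a
# field selector, compute the type the variable itself must have.

def unapfs(t, fs):
    if fs == 'hd':
        return ('ListType', t)      # x.hd = e : t   =>  x : [t]
    if fs == 'tl':
        return t                    # x.tl = e : [t] =>  x : [t]
    raise TypeError(f"unsupported selector .{fs}")

print(unapfs('IntType', 'hd'))  # ('ListType', 'IntType')
```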
\begin{equation}
\infer[Ass]
{\Gamma \vdash \textit{id.fs}^* \underline{=} e \Rightarrow
		(\Gamma^\star.k:\tau_r^\star, \star, \textrm{Void},
\textit{id.fs}^* \underline{=} e')
}
{\Gamma(\textit{id}) = \lfloor \tau_e \rfloor