(0,0), most notably all type errors happen at this position. It would be a great
improvement in usability if errors would always be reported at the correct
position. A second issue is that in \SPLC{} it is not possible to spread a
-program over multiple modules, when developing larger programs this would an
+program over multiple modules, when developing larger programs this would be an
absolute requirement. Thirdly \SPLC{} currently does not allow for multiple
recursion, and all functions need to be declared before they are called. It
would be nice to have multiple recursion and to not be forced to structure your
The heap simply grows with each added value, currently no garbage collection is
done. Lists are represented simply as linked lists, by first storing the pointer
to the next list element and then the current list element. Tuples are simply
-stored by staring the two values in succession.
+stored by storing the two values in succession.
When calling a functions its arguments are placed on the stack, after which
-a \tt{bsr} instruction calls the function. The functions can then load its
+a \tt{bsr} instruction calls the function. The function can then load its
arguments, if and when these are needed in an expression, by copying them
from below its stack frame.
are trivially pass-by-value, whilst lists and tuples can behave as both,
depending on the usage. E.g. assigning a new list/tuple to an argument will
behave as pass-by-value, whilst changing the argument through a field selector
-will behave as pass by reference.
+will behave as pass by reference, thus the pointer is passed by value.
-Listing~\ref{lst:stackFunCall} shows the layout of the stack executing the
+Listing~\ref{lst:stackFunCall} shows the layout of the stack when executing the
two functions. the layout shown is the layout of the stack just before
-the \tt{add} instruction to add x and y. Note that, in compliance with \SSM{}s
+the \tt{add} instruction to add x and y in plus. Note that, in compliance with
+\SSM{}s
calling model, the return address of a function is saved below the stack frame
-of that function. When plus has returned the two arguments are popped of the
+of that function. When plus has returned, the two arguments are popped of the
stack and the value in \tt{RR} (for functions which do not return \tt{Void})
is placed on the stack.
\subsection{Generation}
The code generation phase transforms a well typed \AST{} to either an error,
-or an \tt{SSMProgram}The latter is a list of labels and instructions. The
+or an \tt{SSMProgram}. The latter is a list of labels and instructions. The
generation happens within a
\CI{WriterT SSMProgram (StateT (Addressbook, [Label]) (Either GenError)) a}
environment. Which we have implemented as a \tt{RWST} with \tt{()} for the
introduced a separate type of expression called a \SI{FuncType}, this type
encodes a function with arity $0$.
-As all other data types a function type object occupies one item on the stack
+As all other data types, a function type object occupies one item on the stack
which contains a heap pointer. The heap pointer points to a position on the
heap that stores, in increasing addresses, the function identification number,
the number of arguments already given and finally the argument values.
}
\end{equation}
-\begin{equation} \label{eq:inferAppN}
+\begin{equation} \label{eq:inferStmtAppN}
\infer[Stmt AppN]
{\Gamma \vdash \textit{id}(e^*).\textit{fs}^* \Rightarrow
((\Gamma^\star, \star, \textrm{Void},
\section{Introduction}
\subsection{\SPLC}
-\SPLC{} is a program written in the purely functional lazy programming language
+\SPLC{} is a compiler written in the purely functional lazy programming language
Clean\footnote{\url{http://clean.cs.ru.nl/Clean}}. \SPLC{} has a standardized
UNIX like command line interface and functions as a filter in a way that it
processes standard input to standard output. Default command line argument such
\end{lstlisting}
\subsection{Outline}
-\SPLC{} has several phases that each can produce their own output. Via command
+\SPLC{} has several phases that each produce their own output. Via command
line flags the user can enable or disable the printing of the output of
specific phases. The output for the code generation is enabled by default and
the output for the other phases are disabled by default.
-The first and second step are lexing and parsing, which are both explained in
+The first and second steps are lexing and parsing, which are both explained in
Section~\ref{sec:pars}. When one enables only the parsing the program will show
the pretty printed parsed source which is valid \SPL{} code. A simple selftest
can thus be done by feeding the pretty printed output back into \SPLC{}. This
acts as a filter and the \SSM{} emulator does too one can chain the commands
together with standard pipes as shown in Listing~\ref{lst:splcex}.
-In the Section~\ref{sec:ext} it is explained what extra features \SPLC{}
-contains. Some syntactic sugars have been added in combination with support for
-higher order functions.
-
\begin{lstlisting}[
label={lst:splcex},
caption={Example \SPLC{} run},
\section{Lexing \& parsing}\label{sec:pars}
\subsection{\Yard}
-For both lexing and parsing we use proof of concept state of the art minimal
+For both lexing and parsing we use proof of concept
+%state of the art
+minimal
parser combinators library called \Yard. \Yard{} is inspired by the well-known
\textsc{Parsec} library. Where \Yard{} only has 7 primitive parser (combinators)
\textsc{Parsec} already has 9 categories of parser combinators with in every
Parsers implemented in Yard parse left to right and are greedy, that is they
always try to consume as
-much input as possible, starting from the left most token encountered.
+much input as possible, starting from the left most encountered token.
Yards parsers also automatically apply backtracking,
when the left parser combined with \CI{<|>} fails, then any input it might
have consumed is restored and the right parser is executed.
\subsection{Lexing \& Grammar}
The existing \SPL{} grammar was not suitable for parsing and lexing. The
grammar listed in Listing~\ref{lst:grammar} shows the current and final
-grammar. We have added several clauses that are explained in
-Section~\ref{sec:ext} such as string literals and higher order functions.
+grammar. The grammar for \SPLC{} has been extended to accommodate for the
+syntactic sugar \SPLC{} introduces, such as string and list literals,
+and for anonymous lambda functions.
Moreover we have applied transformation on the grammar including eliminating
left recursion, fixing operator priority and associativity.
As said, the lexer uses a \Yard{} parser. The parser takes a list of characters
and returns a list of \CI{Token}s. A token is a \CI{TokenValue} accompanied
-with a position and the \emph{ADT} used is show in Listing~\ref{lst:lextoken}.
+with a position and the \emph{ADT} used is shown in Listing~\ref{lst:lextoken}.
Parser combinators make it very easy to account for arbitrary white space and
it is much less elegant to do this in a regular way.
Listing~\ref{lst:lexerwhite} shows the way we handle white space and recalculate
and is composed of as complex subparsers. A \CI{FunDecl} parser exactly one
complete function.
-First we do the non consuming \CI{peekPos} to make sure the position of
-the function is guaranteed in the \AST{}, after that the identifier is parsed
+First we do the non consuming \CI{peekPos} to get the position of the first
+token without consuming it, after that the identifier is parsed
which is basically a parser that transforms an \CI{IdToken} into a \CI{String}.
\CI{parseBBraces} is a parser combinator built from the basic combinators that
Since the next element of the \ADT{} is a \CI{Maybe} the next parser is
combined with the optional combinator since the type of a function is optional
-and when not given it will be inferred.
+and when not given it will be inferred later.
A function consists of \emph{many} \CI{VarDecl}s and then \emph{many}
\CI{Stmt}s. The word \emph{many} in \Yard{} means zero or more while the word
(flatten <$> (many parseStmt <* satTok CBraceCloseToken))
\end{lstlisting}
-\subsection{Sugars}
+\subsection{Sugars} \label{ssec:synsugar}
Some syntactic sugars are procesessed in the lexing or parsing phase. Entering
literal lists and strings is a tedious job because you can only glue them
together with cons opererators. Thus we introduced a literal lists notation as
-visible in the grammar. Literal strings are lex as single tokens and expanded
+visible in the grammar. Literal strings are lexed as single tokens and expanded
during parsing in to normal lists of characters. Literal lists are processed in
the parsing phase and expanded to their similar form with the \texttt{Cons}
operator. An example of this is shown in Listing~\ref{lst:litlist}
%}
%\end{SPLCode}
-Every anonymous functions gets a unique name outside the namespace of legal
-names for functions making sure the name is unique. By implementing the
-transformation in the parsing phase the function is automatically checked
-or inferred on type.
-
The semantic analysis phase asses if a -grammatically correct- program is also
semantically correct and decorates the program with extra information it
finds during this checking. The input of the semantic analysis is an \AST{} of
\AST{} plus the environment in which it was typed (for debug purposes), or it is
an error describing the problem with the \AST{}.
-During this analysis \SPLC{} checks four properties of the program:
+During this analysis \SPLC{} applies one \AST{} transformation and checks four
+properties of the program:
\begin{enumerate}
\item That no functions are declared with the same name
\item That \tt{main} is of type \tt{(-> Void)}
\item That the program includes a main function
- \item AST transformation for lambdas
\item That the program is correctly typed
\end{enumerate}
The first three steps are simple sanity checks which are coded in slightly over
\subsubsection{Types}
Listing~\ref{lst:types} shows the ADT representing types in \SPLC{}. Most of
these are self explanatory, however typing of functions requires some special
-attention. \SPLC{}s extension is higher order functions, this is described
-in more detail in Section~\ref{sec:ext}. These require a change to the
-type system, which we will already describe here.
-
-Higher order functions require that the type system can distinguish if an
+attention. \SPLC{}s extension of higher order functions requires that the type
+system can distinguish if an
expression represents a type $a$ or a functions which takes no input and returns
a value of type $a$ (we write: $\rightarrow a$). The latter is represented by
the \CI{FuncType}.
:: Type
= TupleType (Type, Type) -- (t1, t2)
| ListType type -- [t]
- | IdType TVar -- polymorphic type
+ | IdType TVar -- type variable
| IntType
| BoolType
| CharType
}
{\Gamma(\textit{id}) = \lfloor \tau_e \rfloor
&\Gamma \vdash e \Rightarrow (\Gamma^{\star_1}, \star_1, \tau_g, e')
- &\texttt{fold unapfs } \tau_g \textit{reverse fs} = \tau^r
+ &\texttt{fold unapfs } \tau_g \textit{ reverse fs} = \tau^r
&\tau_e \unif \tau_g = \star_2
&\star = \star_2 \cdot \star_1
}