From 814a7125946be9bb5c9d2137040e01ce7122b9da Mon Sep 17 00:00:00 2001 From: pimjager Date: Fri, 17 Jun 2016 13:32:23 +0200 Subject: [PATCH] clean up --- deliverables/report/eval.tex | 2 +- deliverables/report/gen.tex | 17 +++++++++-------- deliverables/report/infRules.tex | 2 +- deliverables/report/intro.tex | 10 +++------- deliverables/report/pars.tex | 23 +++++++++++++---------- deliverables/report/sem.tex | 20 ++++++-------------- 6 files changed, 33 insertions(+), 41 deletions(-) diff --git a/deliverables/report/eval.tex b/deliverables/report/eval.tex index ec111a6..719f19a 100644 --- a/deliverables/report/eval.tex +++ b/deliverables/report/eval.tex @@ -11,7 +11,7 @@ the most interesting and most pressing issues. First of all error handling in (0,0), most notably all type errors happen at this position. It would be a great improvement in usability if errors would always be reported at the correct position. A second issue is that in \SPLC{} it is not possible to spread a -program over multiple modules, when developing larger programs this would an +program over multiple modules, when developing larger programs this would be an absolute requirement. Thirdly \SPLC{} currently does not allow for multiple recursion, and all functions need to be declared before they are called. It would be nice to have multiple recursion and to not be forced to structure your diff --git a/deliverables/report/gen.tex b/deliverables/report/gen.tex index ee377e6..068976e 100644 --- a/deliverables/report/gen.tex +++ b/deliverables/report/gen.tex @@ -9,10 +9,10 @@ directly on the stack, while other values are stored on the heap. The heap simply grows with each added value, currently no garbage collection is done. Lists are represented simply as linked lists, by first storing the pointer to the next list element and then the current list element. Tuples are simply -stored by staring the two values in succession. +stored by storing the two values in succession. When calling a functions its arguments are placed on the stack, after which -a \tt{bsr} instruction calls the function. The functions can then load its +a \tt{bsr} instruction calls the function. The function can then load its arguments, if and when these are needed in an expression, by copying them from below its stack frame. @@ -22,13 +22,14 @@ pass-by-value while in others it behave as pass-by-reference. Simple values are trivially pass-by-value, whilst lists and tuples can behave as both, depending on the usage. E.g. assigning a new list/tuple to an argument will behave as pass-by-value, whilst changing the argument through a field selector -will behave as pass by reference. +will behave as pass by reference, thus the pointer is passed by value. -Listing~\ref{lst:stackFunCall} shows the layout of the stack executing the +Listing~\ref{lst:stackFunCall} shows the layout of the stack when executing the two functions. the layout shown is the layout of the stack just before -the \tt{add} instruction to add x and y. Note that, in compliance with \SSM{}s +the \tt{add} instruction to add x and y in plus. Note that, in compliance with +\SSM{}s calling model, the return address of a function is saved below the stack frame -of that function. When plus has returned the two arguments are popped of the +of that function. When plus has returned, the two arguments are popped of the stack and the value in \tt{RR} (for functions which do not return \tt{Void}) is placed on the stack. @@ -53,7 +54,7 @@ main(){ var a = plus(3, 4); } \subsection{Generation} The code generation phase transforms a well typed \AST{} to either an error, -or an \tt{SSMProgram}The latter is a list of labels and instructions. The +or an \tt{SSMProgram}. The latter is a list of labels and instructions. The generation happens within a \CI{WriterT SSMProgram (StateT (Addressbook, [Label]) (Either GenError)) a} environment. Which we have implemented as a \tt{RWST} with \tt{()} for the @@ -75,7 +76,7 @@ a difference between \SI{read} and \SI{read()}. To make this difference we introduced a separate type of expression called a \SI{FuncType}, this type encodes a function with arity $0$. -As all other data types a function type object occupies one item on the stack +As all other data types, a function type object occupies one item on the stack which contains a heap pointer. The heap pointer points to a position on the heap that stores, in increasing addresses, the function identification number, the number of arguments already given and finally the argument values. diff --git a/deliverables/report/infRules.tex b/deliverables/report/infRules.tex index a6594d9..7c47487 100644 --- a/deliverables/report/infRules.tex +++ b/deliverables/report/infRules.tex @@ -109,7 +109,7 @@ Bool \rightarrow Bool$. } \end{equation} -\begin{equation} \label{eq:inferAppN} +\begin{equation} \label{eq:inferStmtAppN} \infer[Stmt AppN] {\Gamma \vdash \textit{id}(e^*).\textit{fs}^* \Rightarrow ((\Gamma^\star, \star, \textrm{Void}, diff --git a/deliverables/report/intro.tex b/deliverables/report/intro.tex index 7f0e7d7..6cea7bd 100644 --- a/deliverables/report/intro.tex +++ b/deliverables/report/intro.tex @@ -1,6 +1,6 @@ \section{Introduction} \subsection{\SPLC} -\SPLC{} is a program written in the purely functional lazy programming language +\SPLC{} is a compiler written in the purely functional lazy programming language Clean\footnote{\url{http://clean.cs.ru.nl/Clean}}. \SPLC{} has a standardized UNIX like command line interface and functions as a filter in a way that it processes standard input to standard output. Default command line argument such @@ -23,12 +23,12 @@ Options: \end{lstlisting} \subsection{Outline} -\SPLC{} has several phases that each can produce their own output. Via command +\SPLC{} has several phases that each produce their own output. Via command line flags the user can enable or disable the printing of the output of specific phases. The output for the code generation is enabled by default and the output for the other phases are disabled by default. -The first and second step are lexing and parsing, which are both explained in +The first and second steps are lexing and parsing, which are both explained in Section~\ref{sec:pars}. When one enables only the parsing the program will show the pretty printed parsed source which is valid \SPL{} code. A simple selftest can thus be done by feeding the pretty printed output back into \SPLC{}. This @@ -50,10 +50,6 @@ Details on code generation is explained in Section~\ref{sec:gen}. Since \SPLC{} acts as a filter and the \SSM{} emulator does too one can chain the commands together with standard pipes as shown in Listing~\ref{lst:splcex}. -In the Section~\ref{sec:ext} it is explained what extra features \SPLC{} -contains. Some syntactic sugars have been added in combination with support for -higher order functions. - \begin{lstlisting}[ label={lst:splcex}, caption={Example \SPLC{} run}, diff --git a/deliverables/report/pars.tex b/deliverables/report/pars.tex index 152c34a..8f29044 100644 --- a/deliverables/report/pars.tex +++ b/deliverables/report/pars.tex @@ -1,6 +1,8 @@ \section{Lexing \& parsing}\label{sec:pars} \subsection{\Yard} -For both lexing and parsing we use proof of concept state of the art minimal +For both lexing and parsing we use proof of concept +%state of the art +minimal parser combinators library called \Yard. \Yard{} is inspired by the well-known \textsc{Parsec} library. Where \Yard{} only has 7 primitive parser (combinators) \textsc{Parsec} already has 9 categories of parser combinators with in every @@ -8,7 +10,7 @@ category there are numerous combinators. Parsers implemented in Yard parse left to right and are greedy, that is they always try to consume as -much input as possible, starting from the left most token encountered. +much input as possible, starting from the left most encountered token. Yards parsers also automatically apply backtracking, when the left parser combined with \CI{<|>} fails, then any input it might have consumed is restored and the right parser is executed. @@ -36,8 +38,9 @@ list :: [a] -> Parser a [a] | Eq a \subsection{Lexing \& Grammar} The existing \SPL{} grammar was not suitable for parsing and lexing. The grammar listed in Listing~\ref{lst:grammar} shows the current and final -grammar. We have added several clauses that are explained in -Section~\ref{sec:ext} such as string literals and higher order functions. +grammar. The grammar for \SPLC{} has been extended to accommodate for the +syntactic sugar \SPLC{} introduces, such as string and list literals, +and for anonymous lambda functions. Moreover we have applied transformation on the grammar including eliminating left recursion, fixing operator priority and associativity. @@ -47,7 +50,7 @@ can be lexed as one token such as literal characters. As said, the lexer uses a \Yard{} parser. The parser takes a list of characters and returns a list of \CI{Token}s. A token is a \CI{TokenValue} accompanied -with a position and the \emph{ADT} used is show in Listing~\ref{lst:lextoken}. +with a position and the \emph{ADT} used is shown in Listing~\ref{lst:lextoken}. Parser combinators make it very easy to account for arbitrary white space and it is much less elegant to do this in a regular way. Listing~\ref{lst:lexerwhite} shows the way we handle white space and recalculate @@ -114,8 +117,8 @@ Listing~\ref{lst:fundecl}. The \CI{FunDecl} is one of the most complex parsers and is composed of as complex subparsers. A \CI{FunDecl} parser exactly one complete function. -First we do the non consuming \CI{peekPos} to make sure the position of -the function is guaranteed in the \AST{}, after that the identifier is parsed +First we do the non consuming \CI{peekPos} to get the position of the first +token without consuming it, after that the identifier is parsed which is basically a parser that transforms an \CI{IdToken} into a \CI{String}. \CI{parseBBraces} is a parser combinator built from the basic combinators that @@ -124,7 +127,7 @@ a yet another helper function to parse comma separated items. Since the next element of the \ADT{} is a \CI{Maybe} the next parser is combined with the optional combinator since the type of a function is optional -and when not given it will be inferred. +and when not given it will be inferred later. A function consists of \emph{many} \CI{VarDecl}s and then \emph{many} \CI{Stmt}s. The word \emph{many} in \Yard{} means zero or more while the word @@ -151,11 +154,11 @@ parseFunDecl = liftM6 FunDecl (flatten <$> (many parseStmt <* satTok CBraceCloseToken)) \end{lstlisting} -\subsection{Sugars} +\subsection{Sugars} \label{ssec:synsugar} Some syntactic sugars are procesessed in the lexing or parsing phase. Entering literal lists and strings is a tedious job because you can only glue them together with cons opererators. Thus we introduced a literal lists notation as -visible in the grammar. Literal strings are lex as single tokens and expanded +visible in the grammar. Literal strings are lexed as single tokens and expanded during parsing in to normal lists of characters. Literal lists are processed in the parsing phase and expanded to their similar form with the \texttt{Cons} operator. An example of this is shown in Listing~\ref{lst:litlist} diff --git a/deliverables/report/sem.tex b/deliverables/report/sem.tex index 8ff63cc..9897c0a 100644 --- a/deliverables/report/sem.tex +++ b/deliverables/report/sem.tex @@ -41,11 +41,6 @@ %} %\end{SPLCode} -Every anonymous functions gets a unique name outside the namespace of legal -names for functions making sure the name is unique. By implementing the -transformation in the parsing phase the function is automatically checked -or inferred on type. - The semantic analysis phase asses if a -grammatically correct- program is also semantically correct and decorates the program with extra information it finds during this checking. The input of the semantic analysis is an \AST{} of @@ -54,12 +49,12 @@ parsed program, its output is either a semantically correct and completely typed \AST{} plus the environment in which it was typed (for debug purposes), or it is an error describing the problem with the \AST{}. -During this analysis \SPLC{} checks four properties of the program: +During this analysis \SPLC{} applies one \AST{} transformation and checks four +properties of the program: \begin{enumerate} \item That no functions are declared with the same name \item That \tt{main} is of type \tt{(-> Void)} \item That the program includes a main function - \item AST transformation for lambdas \item That the program is correctly typed \end{enumerate} The first three steps are simple sanity checks which are coded in slightly over @@ -129,11 +124,8 @@ essence the program is typed as nested \tt{let .. in ..} statements. \subsubsection{Types} Listing~\ref{lst:types} shows the ADT representing types in \SPLC{}. Most of these are self explanatory, however typing of functions requires some special -attention. \SPLC{}s extension is higher order functions, this is described -in more detail in Section~\ref{sec:ext}. These require a change to the -type system, which we will already describe here. - -Higher order functions require that the type system can distinguish if an +attention. \SPLC{}s extension of higher order functions requires that the type +system can distinguish if an expression represents a type $a$ or a functions which takes no input and returns a value of type $a$ (we write: $\rightarrow a$). The latter is represented by the \CI{FuncType}. @@ -150,7 +142,7 @@ type then in itself can be a function. :: Type = TupleType (Type, Type) -- (t1, t2) | ListType type -- [t] - | IdType TVar -- polymorphic type + | IdType TVar -- type variable | IntType | BoolType | CharType @@ -305,7 +297,7 @@ a FieldHd = ListType a} } {\Gamma(\textit{id}) = \lfloor \tau_e \rfloor &\Gamma \vdash e \Rightarrow (\Gamma^{\star_1}, \star_1, \tau_g, e') - &\texttt{fold unapfs } \tau_g \textit{reverse fs} = \tau^r + &\texttt{fold unapfs } \tau_g \textit{ reverse fs} = \tau^r &\tau_e \unif \tau_g = \star_2 &\star = \star_2 \cdot \star_1 } -- 2.20.1