From: Mart Lubbers <mart@martlubbers.net>
Date: Fri, 10 Jun 2016 07:28:21 +0000 (+0200)
Subject: stukje lexen gedaan
X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=59051986f807a7b711bfefde071fefaf9148927b;p=cc1516.git

stukje lexen gedaan
---

diff --git a/deliverables/report/pars.tex b/deliverables/report/pars.tex
index e666a47..2bb4a35 100644
--- a/deliverables/report/pars.tex
+++ b/deliverables/report/pars.tex
@@ -22,8 +22,51 @@ list :: [a] -> Parser a [a] | Eq a
 eof :: Parser a Void
 \end{lstlisting}
 
-\subsection{Lexing}
-Listing~\ref{lst:grammar}.
+\subsection{Lexing \& Grammar}
+The existing \SPL{} grammar was not suitable for parsing and lexing. The
+grammar listed in Listing~\ref{lst:grammar} shows the current and final
+grammar. We have added several clauses that are explained in
+Section~\ref{sec:ext} such as string literals and higher order functions. The
+transformation of the grammar include eliminating left recursion, fixing
+operator priority and associativity.
+
+To make the parsing more easy we first lex the character input into tokens. In
+this way the parser can be simplified a lot and some more complex constructs
+can be lexed as one token such as literal characters.
+
+As said, the lexer uses a \Yard{} parser. The parser takes a list of characters
+and returns a list of \CI{Token}s. A token is a \CI{TokenValue} accompanied
+with a position and the \emph{ADT} used is show in Listing~\ref{lst:lextoken}.
+Parser combinators make it very easy to account for arbitrary whitespace and it
+is much less elegant to do this in a regular way. By choosing to lex with
+parser combinators the speed of the phase decreases. However, since the parsers
+are very easy this increase is very small.
+
+\begin{lstlisting}[
+	language=Clean,
+	label={lst:lextoken},
+	caption={Lexer token ADT}]
+:: Pos = {line :: Int, col :: Int}
+:: Token :== (Pos, TokenValue)
+:: TokenValue
+	//Value tokens
+	= IdentToken String
+	| NumberToken Int
+	| CharToken Char
+	| StringToken [Char]
+	//Keyword tokens
+	| VarToken
+	| ReturnToken
+	| IfToken
+	| ...
+\end{lstlisting}
+
+A lexer written with parser combinators is pretty straight forward and in
+practice it is just a very long list of \texttt{alternative} monadic
+combinators. However the order is very important since some tokens are very
+greedy. Almost all tokens can be lexed trivially with the \CI{list} combinator
+for matching literal strings. Comments and literals are the only exception that
+require a non trivial parser.
 
 \subsection{Parsing}
 %On March 10 you have to give a very brief presentation.  In this presentation you tell the other