From 59051986f807a7b711bfefde071fefaf9148927b Mon Sep 17 00:00:00 2001 From: Mart Lubbers Date: Fri, 10 Jun 2016 09:28:21 +0200 Subject: [PATCH] stukje lexen gedaan --- deliverables/report/pars.tex | 47 ++++++++++++++++++++++++++++++++++-- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/deliverables/report/pars.tex b/deliverables/report/pars.tex index e666a47..2bb4a35 100644 --- a/deliverables/report/pars.tex +++ b/deliverables/report/pars.tex @@ -22,8 +22,51 @@ list :: [a] -> Parser a [a] | Eq a eof :: Parser a Void \end{lstlisting} -\subsection{Lexing} -Listing~\ref{lst:grammar}. +\subsection{Lexing \& Grammar} +The existing \SPL{} grammar was not suitable for parsing and lexing. The +grammar listed in Listing~\ref{lst:grammar} shows the current and final +grammar. We have added several clauses that are explained in +Section~\ref{sec:ext} such as string literals and higher order functions. The +transformation of the grammar include eliminating left recursion, fixing +operator priority and associativity. + +To make the parsing more easy we first lex the character input into tokens. In +this way the parser can be simplified a lot and some more complex constructs +can be lexed as one token such as literal characters. + +As said, the lexer uses a \Yard{} parser. The parser takes a list of characters +and returns a list of \CI{Token}s. A token is a \CI{TokenValue} accompanied +with a position and the \emph{ADT} used is show in Listing~\ref{lst:lextoken}. +Parser combinators make it very easy to account for arbitrary whitespace and it +is much less elegant to do this in a regular way. By choosing to lex with +parser combinators the speed of the phase decreases. However, since the parsers +are very easy this increase is very small. + +\begin{lstlisting}[ + language=Clean, + label={lst:lextoken}, + caption={Lexer token ADT}] +:: Pos = {line :: Int, col :: Int} +:: Token :== (Pos, TokenValue) +:: TokenValue + //Value tokens + = IdentToken String + | NumberToken Int + | CharToken Char + | StringToken [Char] + //Keyword tokens + | VarToken + | ReturnToken + | IfToken + | ... +\end{lstlisting} + +A lexer written with parser combinators is pretty straight forward and in +practice it is just a very long list of \texttt{alternative} monadic +combinators. However the order is very important since some tokens are very +greedy. Almost all tokens can be lexed trivially with the \CI{list} combinator +for matching literal strings. Comments and literals are the only exception that +require a non trivial parser. \subsection{Parsing} %On March 10 you have to give a very brief presentation. In this presentation you tell the other -- 2.20.1