\section{Lexing \& parsing}\label{sec:pars}
\subsection{\Yard}
For both lexing and parsing we use a minimal, state-of-the-art,
proof-of-concept parser combinator library called \Yard. \Yard{} is inspired
by the well-known \textsc{Parsec} library. Where \Yard{} offers only 9 parser
combinators, \textsc{Parsec} provides 9 categories of combinators, with
numerous combinators in every category.

\begin{lstlisting}[language=Clean]
:: Error = PositionalError Int Int String | Error String
:: Parser a b = Parser ([a] -> (Either Error b, [a]))

(<?>) :: (Parser a b) Error -> Parser a b
fail :: Parser a b
top :: Parser a a
peek :: Parser a a
satisfy :: (a -> Bool) -> Parser a a
check :: (a -> Bool) -> Parser a a
(until) infix 2 :: (Parser a b) (Parser a c) -> Parser a [b]
item :: a -> Parser a a | Eq a
list :: [a] -> Parser a [a] | Eq a
eof :: Parser a Void
\end{lstlisting}

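To give an impression of how these primitives are used, the following sketch
builds two tiny parsers from the combinators above. The names \CI{digit} and
\CI{ifKeyword} are invented here for illustration, \CI{isDigit} comes from the
standard \CI{StdChar} module, and we assume \CI{satisfy} consumes a single
input symbol matching its predicate, as its \textsc{Parsec} namesake does.

\begin{lstlisting}[language=Clean]
// Accept one character for which isDigit holds (illustrative sketch).
digit :: Parser Char Char
digit = satisfy isDigit

// Accept the literal character sequence "if", with a custom error on failure.
ifKeyword :: Parser Char [Char]
ifKeyword = (list ['if']) <?> (Error "expected keyword 'if'")
\end{lstlisting}
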
\subsection{Lexing \& Grammar}
The existing \SPL{} grammar was not directly suitable for lexing and parsing.
Listing~\ref{lst:grammar} shows the final grammar. We have added several
clauses, such as string literals and higher order functions, which are
explained in Section~\ref{sec:ext}. The transformations applied to the grammar
include eliminating left recursion and fixing operator priority and
associativity.

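To illustrate the kind of rewrite this involves, consider a left-recursive
expression rule (the rules below are illustrative, not the actual \SPL{}
productions). A recursive-descent parser would loop on the left-recursive
form, so it is rewritten into an iterating form; priority and associativity
are then fixed by splitting the expression rule into one level per priority
class:

\begin{lstlisting}
Expr ::= Expr '+' Term | Term     // left recursive: recursive descent loops
Expr ::= Term [ '+' Term ]*       // equivalent iterating (EBNF) form
\end{lstlisting}
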
To simplify parsing we first lex the character input into tokens. In this way
the parser itself can be kept much simpler, and some more complex constructs,
such as character literals, can be lexed as a single token.

As said, the lexer uses a \Yard{} parser. The parser takes a list of
characters and returns a list of \CI{Token}s. A token is a \CI{TokenValue}
accompanied by a position; the \emph{ADT} used is shown in
Listing~\ref{lst:lextoken}. Parser combinators make it very easy to account
for arbitrary whitespace, whereas doing so in a regular way (e.g.\ with
regular expressions) is much less elegant. Lexing with parser combinators does
make this phase slower, but since the individual parsers are very simple the
slowdown is small.

\begin{lstlisting}[
    language=Clean,
    label={lst:lextoken},
    caption={Lexer token ADT}]
:: Pos = {line :: Int, col :: Int}
:: Token :== (Pos, TokenValue)
:: TokenValue
    //Value tokens
    = IdentToken String
    | NumberToken Int
    | CharToken Char
    | StringToken [Char]
    //Keyword tokens
    | VarToken
    | ReturnToken
    | IfToken
    | ...
\end{lstlisting}

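As a small illustration of this representation, lexing an input such as
\CI{if (x} would produce a token list along the following lines. The positions
are written out by hand and the name \CI{BraceOpenToken} is invented here; it
merely stands for whichever constructor the lexer uses for \CI{(}.

\begin{lstlisting}[language=Clean]
[ ({line=1, col=1}, IfToken)
, ({line=1, col=4}, BraceOpenToken)   // constructor name assumed
, ({line=1, col=5}, IdentToken "x")
]
\end{lstlisting}
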
A lexer written with parser combinators is fairly straightforward: in practice
it is just a very long list of \texttt{alternative} monadic combinators.
However, the order of the alternatives is important, since some tokens are
very greedy. Almost all tokens can be lexed trivially with the \CI{list}
combinator, which matches a literal sequence of characters. Comments and
literals are the only exceptions that require a non-trivial parser.

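A fragment in this style could look as follows. This is a sketch rather than
the actual lexer code: the name \CI{lexKeyword} is invented, and the monadic
and alternative operators (\CI{>>=}, \CI{pure} and \CI{<|>}) are assumed to be
available for \CI{Parser}, as the description above suggests. The order of the
alternatives matters: trying the identifier parser before the keywords would
greedily turn \CI{var} into an identifier.

\begin{lstlisting}[language=Clean]
// Keyword alternatives, tried in order; an identifier parser would be
// tried only after all keywords have failed.
lexKeyword :: Parser Char TokenValue
lexKeyword =
        (list ['var']    >>= \_ -> pure VarToken)
    <|> (list ['return'] >>= \_ -> pure ReturnToken)
    <|> (list ['if']     >>= \_ -> pure IfToken)
\end{lstlisting}
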
\subsection{Parsing}
%On March 10 you have to give a very brief presentation. In this presentation
%you tell the other students and us how your parser is constructed and
%demonstrate your parser and pretty printer. You should mention things like
%implementation language used, changes to the grammar, and other interesting
%points of your program. For this demonstration you have to prepare at least
%10 test programs in SPL. In your presentation you have to show only the most
%interesting or challenging example. You can use the program 4 above as a
%starting point. Hand in the test programs, and a document containing the
%transformed grammar as used by your parser. Indicate what parts of the
%semantic analysis are handled by your scanner