dsl/dsl_techniques.tex

   1 \documentclass[../thesis.tex]{subfiles}
   2
   3 \include{subfilepreamble}
   4
   5 \begin{document}
   6 \chapter{\texorpdfstring{\Glsxtrshort{DSL}}{DSL} embedding techniques}%
   7 \label{chp:dsl_embedding_techniques}%
   8 \begin{chapterabstract}
   9         An \gls{EDSL} is a language embedded in a host language created for a specific domain\todo{citation needed?}.
  10         Properties such as referential transparency, minimal syntax, powerful type systems and rich data types make \gls{FP} languages excellent candidates for hosting \glspl{EDSL}.
  11         Terms in an \glspl{EDSL} can have multiple interpretations\footnote{Interpretations are also called backends or views}, i.e.\ a term in the \gls{DSL} is just an interface.
  12         Commonly used intepretations are printing, compiling, simulating, optimising, verifying, proving the program\etc.
  13
  14         There are two flavours of \gls{DSL} embedding: deep- and shallow embedding~\citep{boulton_experience_1992}.
  15         Shallow or tagless embedding models language constructs as functions in the host language.
  16         As a result, adding new language constructs---extra functions---is easy.
  17         However, the interpretation of the language is embedded in these functions, making it troublesome to add semantics since it requires updating all existing language constructs.
  18
  19         In contrast to shallow embedding, deep embedding or tagged models terms in the language as data types.
  20         Interpretations are functions over these data types.
  21
  22         Consequently, adding new semantics, i.e.\ novel functions, is straightforward.
  23         It can be stated that the language constructs are embedded in the functions that form a semantics.
  24         If one wants to add a language construct, all semantics functions must be revisited and revised to avoid ending up with partial functions.
  25
  26         This juxtaposition has been known for many years~\citep{reynolds_user-defined_1978} and discussed by many others~\citep{krishnamurthi_synthesizing_1998} but most famously dubbed the \emph{expression problem} by \citet{wadler_expression_1998}:
  27
  28         \begin{quote}
  29                 The \emph{expression problem} is a new name for an old problem.
  30                 The goal is to define a data type by cases, where one can add new cases to the data type and new functions over the data type, without recompiling existing code, and while retaining static type safety (e.g., no casts).
  31         \end{quote}
  32
  33 %       Most importantly, the two flavours differ on two axes: extensibility of language constructs and extensibility of interpretations.
  34 %       \todo{elaborate}
  35
  36         Using a simple language with integers, booleans and some arithmetic operators, \cref{sec:deep_embedding} shows some deep embedding variants and \cref{sec:shallow_embedding} shows the relevant shallow embedding variants.
  37         \Cref{sec:compare_embedding} compares the embedding techniques.
  38 \end{chapterabstract}
  39
  40 \section{Deep embedding}\label{sec:deep_embedding}
  41 In a deeply embedded \gls{DSL}, the language terms are represented as data types in the host language.
  42 Therefore, interpretations of the terms are functions that operate on these data types.
  43 \Cref{lst:exdeep} shows an implementation for the example \gls{DSL}.
  44
  45 \begin{lstHaskell}[label={lst:exdeep},caption={A deeply embedded expression \gls{DSL}.}]
  46 data Value = I Int | B Bool
  47 data Expr
  48         = Lit  Value
  49         | Plus Expr Expr
  50         | Eq   Expr Expr
  51   deriving Show
  52 \end{lstHaskell}
  53
  54 Implementing a printer for the language is straightforward, we just define a function that transforms the term to a string.
  55
  56 \begin{lstHaskell}[caption={A printer for the deeply embedded expression \gls{DSL}.}]
  57 print :: Expr -> String
  58 print (Lit i)    = show i
  59 print (Plus l r) = "(" ++ print l ++ "+" ++ print r ++ ")"
  60 print (Eq l r)   = "(" ++ print l ++ "==" ++ print r ++ ")"
  61 \end{lstHaskell}
  62
  63 Adding a construct---for example subtraction---reveals the Achilles' heel of deep embedding, namely that we need to revisit the original data type \emph{and} all the existing views.
  64 I.e.\ we need to add \haskellinline{\| Sub Expr Expr} to the \haskellinline{Expr} data type.
  65 Furthermore, we need to add \haskellinline{print (Sub l r) = ...} to the \haskellinline{print} view in order to not end up with a partial function.
  66 Using a novel variant of deep embedding, this limitation can be overcome by lifting the views to classes (see \cref{chp:classy_deep_embedding}).
  67
  68 Implementing an extra view, an evaluator as shown in \cref{lst:deep_simple}, for the language is possible without touching any original code, we just add a function operating on the \haskellinline{Expr} data type.
  69 Here another downside of basic deep embedding arises immediately, the expressions are not typed, and therefore there has to be some type checking in the evaluation code.
  70 Luckily this problem can be overcome by switching from regular \glspl{ADT} to \glspl{GADT}, resulting in the following data type and evaluator.
  71
  72 \begin{lstHaskell}[label={lst:deep_simple},caption={An evaluator for the deeply embedded expression \gls{DSL}.}]
  73 eval :: Expr -> Value
  74 eval (Lit i)    = i
  75 eval (Plus l r) = case (eval l, eval r) of
  76         (Lit (I l), Lit (I r)) -> I (l+r))
  77         (l, r) -> error ("Can't add " ++ show l ++ " to " ++ show r)
  78 eval (Eq l r) = case (eval l, eval r) of
  79         (Lit (I l), Lit (I r)) -> B (l==r)
  80         (Lit (B l), Lit (B r)) -> B (l==r)
  81         (l, r) -> error ("Can't compare " ++ show l ++ " to " ++ show r)
  82 \end{lstHaskell}
  83
  84 \subsection{\texorpdfstring{\Glsxtrlongpl{GADT}}{Generalised algebraic data types}}
  85 Deep embedding has the advantage that it is easy to build and views are easy to add.
  86 On the downside, the expressions created with this language are not necessarily type-safe.
  87 In the given language it is possible to create an expression such as \haskellinline{LitI 4 `Plus` LitB True} that adds a boolean to an integer.
  88 This downside of the \gls{EDSL} technique can be overcome by using \glspl{GADT} instead of \glspl{ADT}~\citep{cheney_first-class_2003}.
  89 Even if the host language does not support \glspl{GADT}, it has been shown that they can be simulated using bimaps or projection pairs~\citep[\citesection{2.2}]{cheney_lightweight_2002}.
  90 \Cref{lst:exdeepgadt} shows the same language, but made type-safe with a \gls{GADT}.
  91 The data types are annotated with a type variable representing the type of the expression.
  92 This restriction makes the evaluator's implementation much more concise.
  93 For the printer, the implementation can even remain the same.
  94 Only the type signature needs an update.
  95
  96 \begin{lstHaskell}[label={lst:exdeepgadt},caption={A deeply embedded expression \gls{DSL} using \glspl{GADT}.}]
  97 data Expr a where
  98     Lit  :: Show a => a -> Expr a
  99     Plus :: Num a  => Expr a -> Expr a -> Expr a
 100     Eq   :: Eq a   => Expr a -> Expr a -> Expr Bool
 101
 102 eval :: Expr a -> a
 103 eval (Lit i)    = i
 104 eval (Plus l r) = eval l + eval r
 105 eval (Eq l r)   = eval l == eval r
 106
 107 print :: Expr a -> String
 108 ...
 109 \end{lstHaskell}
 110
 111 \section{Shallow embedding}\label{sec:shallow_embedding}
 112 In a shallowly embedded \gls{DSL} the language constructs are expressed as functions in the host language.
 113 An evaluator view for the example language then can be implemented as the code shown in \cref{lst:exshallow}.
 114
 115 \begin{lstHaskell}[label={lst:exshallow}, caption={A minimal shallow \gls{EDSL}.}]
 116 data Eval a = Eval {runEval :: a}
 117
 118 lit :: a -> Eval a
 119 lit x = Eval x
 120
 121 plus :: Num a => Eval a -> Eval a -> Eval a
 122 plus x y = Eval (runEval x + runEval y)
 123
 124 eq :: Eq a => Eval a -> Eval a -> Eval Bool
 125 eq x y = Eval (runEval x == runEval y)
 126 \end{lstHaskell}
 127
 128 One of the advantages of shallowly embedding a language in a host language is its extendability.
 129 It is very easy to add functionality because the compile time checks of the host language guarantee whether or not the functionality is available when used.
 130 For example, adding a new construct---such as subtraction---is done as follows:
 131
 132 \begin{lstHaskell}[label={lst:exshallowsubst},caption={Adding subtraction to the shallow \gls{EDSL}.}]
 133 sub :: Num a => Eval a -> Eval a -> Eval a
 134 sub x y = Eval (runEval x - runEval y)
 135 \end{lstHaskell}
 136
 137 Moreover, the language is type safe as it is directly typed in the host language, i.e.\ \haskellinline{lit True `plus` lit 4} is rejected.
 138 Another advantage is the intimate link with the host language, allowing for a lot more linguistic reuse such as the support of implicit sharing \citep{krishnamurthi_linguistic_2001}.
 139
 140 The downside of this method is extending the language with views.
 141 Since the views are embedded, adding a view requires embedding combining the two views in some way.
 142 The only way of achieving this is by reimplementing all functions so that they run all backends at the same time or to create a single interpretation that produces a fold function \citep{gibbons_folding_2014}.
 143
 144 \subsection{Tagless-final embedding}\label{ssec:tagless}
 145 By lifting the functions representing the \gls{DSL} terms to type classes, interpretations can be added.
 146 This technique is called tagless-final---or class-based shallow---embedding~\citep{carette_finally_2009}.
 147 The interface for the \gls{DSL} looks as follows:
 148
 149 \begin{lstHaskell}[label={lst:extagless},caption={A minimal tagless-final \gls{EDSL}.}]
 150 class DSL v where
 151         lit  :: Show a => a -> v a
 152         plus :: Num a => v a -> v a -> v a
 153         eq   :: Eq a => v a -> v a -> v Bool
 154 \end{lstHaskell}
 155
 156 An interpretation of this view is a data type that implements the type class.
 157
 158 \begin{lstHaskell}[label={lst:extaglessprint},caption={A pretty printer for the tagless-final \gls{EDSL}.}]
 159 data Print a = P {runPrint :: String}
 160 instance DSL Print where
 161         lit a = P (show a)
 162         plus x y = P ("(" ++ runPrint x ++ "+" ++ runPrint y ++ ")"
 163         eq x y = P ("(" ++ runPrint x ++ "==" ++ runPrint y ++ ")"
 164 \end{lstHaskell}
 165
 166 Adding a language construct---e.g.\ subtraction---is a easy as adding a type class and providing instances for interpretations.
 167
 168 \begin{lstHaskell}[label={lst:extaglesssubt},caption={Adding subtraction to the shallow \gls{EDSL}.}]
 169 class Sub v where
 170         sub :: Num a => v a -> v a -> v a
 171
 172 instance Sub Print where
 173         sub x y = P ("(" ++ runPrint x ++ "-" ++ runPrint y ++ ")"
 174 \end{lstHaskell}
 175
 176 Adding an interpretation means adding a data type and providing instances for the language constructs.
 177
 178 \begin{lstHaskell}[label={lst:extaglesseval},caption={An evaluator interpretation of the minimal tagless-final \gls{EDSL}.}]
 179 data Eval a = Eval {runEval :: a}
 180
 181 instance DSL v where
 182         lit a = Eval a
 183         plus x y = Eval (runEval x + runEval y)
 184         eq x y = Eval (runEval x == runEval y)
 185
 186 instance Sub Eval where
 187         sub x y = Eval (runEval x - runEval y)
 188 \end{lstHaskell}
 189
 190 \input{subfilepostamble}
 191 \end{document}