dsl/dsl_techniques.tex

   1 \documentclass[../thesis.tex]{subfiles}
   2
   3 \begin{document}
   4 \ifSubfilesClassLoaded{
   5         \pagenumbering{arabic}
   6 }{}
   7
   8 \chapter{\texorpdfstring{\Acrshort{DSL}}{DSL} embedding techniques}%
   9 \label{chp:dsl_embedding_techniques}%
  10 An \gls{EDSL} is a language embedded in a host language created for a specific domain\todo{citation needed?}.
  11 Properties such as referential transparency, minimal syntax and \glspl{ADT} make \gls{FP} languages excellent candidates for hosting \glspl{EDSL}.
  12 Terms in an \glspl{EDSL} can have multiple interpretations---also called backends or views---i.e.\ preferably, a term in the \gls{DSL} is just an interface.
  13 Commonly used intepretations are printing, compiling, simulating, optimising, verifying, proving the program\etc.
  14 There are two main flavours of embedding \glspl{DSL}, deep---also called tagged---embedding and shallow---also called tagless---embedding.
  15
  16 Most importantly, the two flavours differ on two axes: extensibility of language constructs and extensibility of interpretations.
  17 \todo{elaborate}
  18
  19 \Cref{sec:deep_embedding} shows the basics of deep embedding.
  20 \Cref{sec:shallow_embedding} shows the basics of shallow embedding including tagless embedding.
  21 \Cref{sec:compare_embedding} compares the embedding technique.
  22
  23 In the following sections the basics of both techniques are explained.
  24 A simple language with integers, booleans and some arithmetic operators is used as a running example.
  25
  26 \section{Deep embedding}\label{sec:deep_embedding}
  27 In a deeply embedded \gls{DSL}, the language terms are represented as data type{(s)} in the host language.
  28 Therefore, interpretations of the terms are functions that operate on these data types.
  29 \Cref{lst:exdeep} shows an implementation for the example \gls{DSL}.
  30
  31 \begin{lstHaskell}[label={lst:exdeep},caption={A deeply embedded expression \gls{DSL}.}]
  32 data Value = I Int | B Bool
  33 data Expr
  34         = Lit  Value
  35         | Plus Expr Expr
  36         | Eq   Expr Expr
  37   deriving Show
  38 \end{lstHaskell}
  39
  40 Implementing a printer for the language is straightforward, we just define a function that transforms the term to a string.
  41
  42 \begin{lstHaskell}[caption={A printer for the deeply embedded expression \gls{DSL}.}]
  43 print :: Expr -> String
  44 print (Lit i)    = show i
  45 print (Plus l r) = "(" ++ print l ++ "+" ++ print r ++ ")"
  46 print (Eq l r)   = "(" ++ print l ++ "==" ++ print r ++ ")"
  47 \end{lstHaskell}
  48
  49 Adding a construct---for example subtraction---reveals the Achilles' heel of deep embedding, namely that we need to revisit the original data type \emph{and} all the existing views.
  50 I.e.\ we need to add \haskellinline{| Sub Expr Expr} to the \haskellinline{Expr} data type.
  51 Furthermore, we need to add \haskellinline{print (Sub l r) = ...} to the \haskellinline{print} view in order to not end up with a partial function.
  52 This limitation can be overcome by lifting the views to classes (See \cref{chp:classy_deep_embedding}).
  53
  54 Implementing an evaluator for the language is possible without touching any original code, we just add a function operating on the \haskellinline{Expr} data type.
  55 To store variables, it has an extra environment argument.
  56 Here another downside of basic deep embedding arises immediately, the expressions are not typed, and therefore there has to be some type checking in the evaluation code.
  57 Luckily this problem can be overcome by switching from regular \glspl{ADT} to \glspl{GADT}, resulting in the following data type and evaluator.
  58
  59 \begin{lstHaskell}[caption={An evaluator for the deeply embedded expression \gls{DSL}.}]
  60 eval :: Expr -> Value
  61 eval (Lit i)    = i
  62 eval (Plus l r) = case (eval l, eval r) of
  63         (Lit (I l), Lit (I r)) -> I (l+r))
  64         (l, r)       -> error ("Can't add " ++ show l ++ " to " ++ show r)
  65 eval (Eq l r) = case (eval l, eval r) of
  66         (Lit (I l), Lit (I r)) -> B (l==r)
  67         (Lit (B l), Lit (B r)) -> B (l==r)
  68         (l, r)       -> error ("Can't compare " ++ show l ++ " to " ++ show r)
  69 \end{lstHaskell}
  70
  71 \subsection{\texorpdfstring{\Acrlongpl{GADT}}{Generalised algebraic data types}}
  72 Deep embedding has the advantage that it is easy to build and views are easy to add.
  73 On the downside, the expressions created with this language are not necessarily type-safe.
  74 In the given language it is possible to create an expression such as \haskellinline{LitI 4 `Plus` LitB True} that adds a boolean to an integer.
  75 Extending the \gls{ADT} is easy and convenient but extending the views accordingly is tedious since it has to be done individually for all views.
  76
  77 The first downside of this type of \gls{EDSL} can be overcome by using \glspl{GADT}~\citep{cheney_first-class_2003}.
  78 \Cref{lst:exdeepgadt} shows the same language, but type-safe with a \gls{GADT}.
  79 \glspl{GADT} are not supported in the current version of \gls{CLEAN} and therefore the syntax is hypothetical (See \todo{insert link to appendix}).
  80 However, it has been shown that \glspl{GADT} can be simulated using bimaps or projection pairs~\citep[Sec.~2.2]{cheney_lightweight_2002}.
  81 Unfortunately the lack of extendability remains a problem.
  82 If a language construct is added, no compile time guarantee can be given that all views support it.
  83
  84 \begin{lstHaskell}[label={lst:exdeepgadt},caption={A deeply embedded expression \gls{DSL} using \glspl{GADT}.}]
  85 data Expr a where
  86     Lit  :: Show a => a -> Expr a
  87     Plus :: Num a  => Expr a -> Expr a -> Expr a
  88     Eq   :: Eq a   => Expr a -> Expr a -> Expr Bool
  89
  90 eval :: Expr a -> a
  91 eval (Lit i)    = i
  92 eval (Plus l r) = eval l + eval r
  93 eval (Eq l r)   = eval l == eval r
  94 \end{lstHaskell}
  95
  96 \section{Shallow embedding}\label{sec:shallow_embedding}
  97 In a shallowly \gls{EDSL} all language constructs are expressed as functions in the host language.
  98 An evaluator view for the example language then can be implemented as the code shown in \cref{lst:exshallow}.
  99 Note that the internals of the language could have been hidden using a reader monad.
 100
 101 \begin{lstHaskell}[label={lst:exshallow}, caption={A minimal shallow \gls{EDSL}.}]
 102 type Env   = String -> Int
 103 type DSL a = Env -> a
 104
 105 lit :: a -> DSL a
 106 lit x = \e->x
 107
 108 var :: String -> DSL Int
 109 var i = \e->retrEnv e i
 110
 111 plus :: DSL Int -> DSL Int -> DSL Int
 112 plus x y = \e->x e + y e
 113
 114 eq :: Eq a => DSL a -> DSL a -> DSL Bool
 115 eq x y = \e->x e == y e
 116 \end{lstHaskell}
 117
 118 One of the advantages of shallowly embedding a language in a host language is its extendability.
 119 It is very easy to add functionality because the compile time checks of the host language guarantee whether or not the functionality is available when used.
 120 For example, adding a new construct---such as subtraction---is done as follows:
 121
 122 \begin{lstHaskell}[label={lst:exshallowsubst},caption={Adding subtraction to the shallow \gls{EDSL}.}]
 123 sub :: DSL Int -> DSL Int -> DSL Int
 124 sub x y = \e->x e - y e
 125 \end{lstHaskell}
 126
 127 Moreover, the language is type safe as it is directly typed in the host language, i.e.\ \haskellinline{lit True `plus` lit 4} is rejected.
 128 Another advantage is the intimate link with the host language, allowing for a lot more linguistic reuse such as the support of implicit sharing~\cite{krishnamurthi_linguistic_2001}.
 129
 130 The downside of this method is extending the language with views.
 131 It is nearly impossible to add views to a shallowly embedded language.
 132 The only way of achieving this is by reimplementing all functions so that they run all backends at the same time or to create a single interpretation that produces a fold function~\citep{gibbons_folding_2014}.
 133
 134 \subsection{Tagless-final embedding}\label{ssec:tagless}
 135 By lifting the functions representing the \gls{DSL} terms to type classes, interpretations can be added.
 136 This technique is called tagless-final---or class-based shallow---embedding.
 137 The interface for the \gls{DSL} looks as follows:
 138
 139 \begin{lstHaskell}[label={lst:extagless},caption={A minimal tagless-final \gls{EDSL}.}]
 140 class DSL v where
 141         lit :: a -> v a
 142         var :: String -> v a
 143         plus :: v Int -> v Int -> v Int
 144         eq :: Eq a => v a -> v a -> v Bool
 145 \end{lstHaskell}
 146
 147 An interpretation of this view is a data type that implements the type class.
 148
 149 \begin{lstHaskell}[label={lst:extagless},caption={A minimal tagless-final \gls{EDSL}.}]
 150 data Print a = P {runPrint :: String}
 151 instance DSL Print where
 152         lit a = P (show a)
 153         var i = P i
 154         plus x y = P ("(" ++ runPrint x ++ "+" ++ runPrint y ++ ")"
 155         eq x y = P ("(" ++ runPrint x ++ "==" ++ runPrint y ++ ")"
 156 \end{lstHaskell}
 157
 158 Adding a language construct---e.g.\ subtraction---is a easy as adding a type class and providing instances for interpretations.
 159
 160 \begin{lstHaskell}[label={lst:extaglesssubt},caption={Adding subtraction to the shallow \gls{EDSL}.}]
 161 class Sub v where
 162         sub :: v Int -> v Int -> v Int
 163
 164 instance Sub Print where
 165         sub x y = P ("(" ++ runPrint x ++ "-" ++ runPrint y ++ ")"
 166 \end{lstHaskell}
 167
 168 Adding an interpretation means adding a data type and providing instances for the language constructs.
 169
 170 \begin{lstHaskell}[label={lst:extagless},caption={An evaluator interpretation of the minimal tagless-final \gls{EDSL}.}]
 171 data Eval a = Eval {runEval :: Env -> a}
 172
 173 instance DSL v where
 174         lit a = Eval (\_->a)
 175         var i = Eval (\e->retrEnv e i)
 176         plus x y = Eval (\e->runEval x e + runEval y e)
 177         eq x y = Eval (\e->runEval x e == runEval y e)
 178
 179 instance Sub Eval where
 180         sub x y = Eval (\e->runEval x e - runEval y e)
 181 \end{lstHaskell}
 182
 183 \section{Comparison}\label{sec:compare_embedding}
 184 Both flavours have their strengths and weaknesses and both flavours can be improved in order to mitigate (some of the) downsides.
 185
 186 \begin{table}[ht]
 187         \begin{threeparttable}[b]
 188                 \caption{Comparison of embedding techniques, adapted from \citet[Sec.~3.6]{sun_compositional_2022}}%
 189                 \label{tbl:dsl_comparison}
 190                 \begin{tabular}{lllllll}
 191                         \toprule
 192                                                                         & Shallow & Deep  & Hybrid          & Poly           & Comp. & Classy\\
 193                         \midrule
 194                         Transcoding free        & yes     & yes   & no              & yes            & yes            & yes\\
 195                         Linguistic reuse        & yes     & no    & partly\tnote{1} & yes            & yes            & no?\\
 196                         Extend constructs       & yes     & no    & partly\tnote{1} & yes            & yes            & yes\\
 197                         Extend interpretations  & no      & yes   & yes             & yes            & yes            & yes\\
 198                         Transformations         & no      & yes   & yes             & maybe\tnote{2} & maybe\tnote{2} & yes\\
 199                         Modular dependencies    & no      & maybe & maybe           & yes            & yes            & yes\\
 200                         Nested pattern matching & no      & yes   & yes             & no             & maybe          & maybe\tnote{3}\\
 201                         Type safe               & yes     & maybe & no              & yes            & yes            & yes\\
 202                         \bottomrule
 203                 \end{tabular}
 204                 \begin{tablenotes}
 205                         \item [1] Only in the shallowly embedded part.
 206                         \item [2] Transformations require some ingenuity and are sometimes awkward to write.
 207                         \item [3] It requires some---safe---form of dynamic typing.
 208                 \end{tablenotes}
 209         \end{threeparttable}
 210 \end{table}
 211
 212 \input{subfilepostamble}
 213 \end{document}