% dsl/dsl_techniques.tex

\documentclass[../thesis.tex]{subfiles}

\begin{document}
\ifSubfilesClassLoaded{
        \pagenumbering{arabic}
}{}

\chapter{\texorpdfstring{\Acrshort{DSL}}{DSL} embedding techniques}%
\label{chp:dsl_embedding_techniques}%
An \gls{EDSL} is a language created for a specific domain and embedded in a host language\todo{citation needed?}.
Properties such as referential transparency, minimal syntax, powerful type systems and rich data types make \gls{FP} languages excellent candidates for hosting \glspl{EDSL}.

There are two flavours of \gls{DSL} embedding: deep and shallow embedding~\citep{boulton_experience_1992}.
Shallow embedding---also called tagless embedding---models language constructs as functions in the host language.
As a result, adding new language constructs---extra functions---is easy.
However, the interpretation of the language is embedded in these functions, making it troublesome to add semantics since it requires updating all existing language constructs.

In contrast to shallow embedding, deep embedding---also called tagged embedding---models terms in the language as data types.
Interpretations are functions over these data types.
Consequently, adding new semantics, i.e.\ novel functions, is straightforward.
It can be stated that the language constructs are embedded in the functions that form a semantics.
If one wants to add a language construct, all semantics functions must be revisited and revised to avoid ending up with partial functions.

This juxtaposition has been known for many years~\citep{reynolds_user-defined_1978}, has been discussed by many others~\citep{krishnamurthi_synthesizing_1998}, and was most famously dubbed the \emph{expression problem} by Wadler~\citep{wadler_expression_1998}:

\begin{quote}
        The \emph{expression problem} is a new name for an old problem.
        The goal is to define a data type by cases, where one can add new cases to the data type and new functions over the data type, without recompiling existing code, and while retaining static type safety (e.g., no casts).
\end{quote}

Terms in an \gls{EDSL} can have multiple interpretations\footnote{Interpretations are also called backends or views.}, i.e.\ a term in the \gls{DSL} is just an interface.
Commonly used interpretations are printing, compiling, simulating, optimising, verifying, proving the program\etc.
As noted above, there are two main flavours of embedding \glspl{DSL}.
Deep embedding---also called tagged embedding---models terms in the language as data types; interpretations are functions over these terms.
Shallow embedding---also called tagless embedding---models terms in the language as functions; interpretations are embedded in these functions.

Most importantly, the two flavours differ on two axes: extensibility of language constructs and extensibility of interpretations.
\todo{elaborate}

\Cref{sec:deep_embedding} shows the basics of deep embedding.
\Cref{sec:shallow_embedding} shows the basics of shallow embedding including tagless embedding.
\Cref{sec:compare_embedding} compares the embedding techniques.

Throughout these sections, a simple language with integers, booleans and some arithmetic operators is used as a running example.

\section{Deep embedding}\label{sec:deep_embedding}
In a deeply embedded \gls{DSL}, the language terms are represented as one or more data types in the host language.
Therefore, interpretations of the terms are functions that operate on these data types.
\Cref{lst:exdeep} shows an implementation for the example \gls{DSL}.

\begin{lstHaskell}[label={lst:exdeep},caption={A deeply embedded expression \gls{DSL}.}]
data Value = I Int | B Bool
  deriving Show -- required by the derived Show instance of Expr
data Expr
        = Lit  Value
        | Plus Expr Expr
        | Eq   Expr Expr
  deriving Show
\end{lstHaskell}

Implementing a printer for the language is straightforward: we just define a function that transforms the term to a string.

\begin{lstHaskell}[caption={A printer for the deeply embedded expression \gls{DSL}.}]
-- Clashes with Prelude's print; assumes the module starts with:
--   import Prelude hiding (print)
print :: Expr -> String
print (Lit v)    = show v
print (Plus l r) = "(" ++ print l ++ "+" ++ print r ++ ")"
print (Eq l r)   = "(" ++ print l ++ "==" ++ print r ++ ")"
\end{lstHaskell}

Adding a construct---for example subtraction---reveals the Achilles' heel of deep embedding, namely that we need to revisit the original data type \emph{and} all the existing views.
That is, we need to add \haskellinline{| Sub Expr Expr} to the \haskellinline{Expr} data type.
Furthermore, we need to add \haskellinline{print (Sub l r) = ...} to the \haskellinline{print} view in order to not end up with a partial function.
This limitation can be overcome by lifting the views to classes (see \cref{chp:classy_deep_embedding}).
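
As a minimal sketch of the change described above, the listing below repeats the data type and the printer with subtraction added.
The printer is called \haskellinline{pretty} here, which is an assumption of this sketch made only to avoid the name clash with the Prelude.

\begin{lstHaskell}[caption={Sketch: adding subtraction to the deeply embedded expression \gls{DSL}.}]
data Value = I Int | B Bool
  deriving Show

data Expr
        = Lit  Value
        | Plus Expr Expr
        | Eq   Expr Expr
        | Sub  Expr Expr -- the new construct
  deriving Show

-- Hypothetical name; plays the role of the print view.
pretty :: Expr -> String
pretty (Lit v)    = show v
pretty (Plus l r) = "(" ++ pretty l ++ "+" ++ pretty r ++ ")"
pretty (Eq l r)   = "(" ++ pretty l ++ "==" ++ pretty r ++ ")"
-- Without this clause, pretty would be a partial function.
pretty (Sub l r)  = "(" ++ pretty l ++ "-" ++ pretty r ++ ")"
\end{lstHaskell}

Note that GHC only reports a forgotten clause when \texttt{-Wincomplete-patterns} (part of \texttt{-Wall}) is enabled; otherwise the omission surfaces as a pattern-match failure at run time.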

Implementing an evaluator for the language is possible without touching any original code: we just add a function operating on the \haskellinline{Expr} data type.
Had the language contained variables, the evaluator would take an extra environment argument to store them.
Here another downside of basic deep embedding arises immediately: the expressions are not typed, and therefore the evaluation code has to perform some type checking at run time.
Luckily, this problem can be overcome by switching from regular \glspl{ADT} to \glspl{GADT} (see \cref{lst:exdeepgadt}); the listing below shows the evaluator for the untyped data type.

\begin{lstHaskell}[caption={An evaluator for the deeply embedded expression \gls{DSL}.}]
eval :: Expr -> Value
eval (Lit v)    = v
eval (Plus l r) = case (eval l, eval r) of
        (I x, I y) -> I (x + y)
        (x, y)     -> error ("Can't add " ++ show x ++ " to " ++ show y)
eval (Eq l r) = case (eval l, eval r) of
        (I x, I y) -> B (x == y)
        (B x, B y) -> B (x == y)
        (x, y)     -> error ("Can't compare " ++ show x ++ " to " ++ show y)
\end{lstHaskell}

\subsection{\texorpdfstring{\Acrlongpl{GADT}}{Generalised algebraic data types}}
Deep embedding has the advantage that it is easy to build and views are easy to add.
On the downside, the expressions created with this language are not necessarily type-safe.
In the given language it is possible to create an expression such as \haskellinline{Lit (I 4) `Plus` Lit (B True)} that adds a boolean to an integer.
Extending the \gls{ADT} is easy and convenient but extending the views accordingly is tedious since it has to be done individually for all views.
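
The following sketch makes the lack of type safety concrete: a term that adds a boolean to an integer is accepted by the type checker and only fails when it is evaluated.
It repeats a reduced version of the untyped data type and evaluator; the names \haskellinline{wellTyped} and \haskellinline{illTyped} are introduced here for illustration only.

\begin{lstHaskell}[caption={Sketch: an ill-typed term in the untyped deep embedding.}]
data Value = I Int | B Bool
  deriving (Show, Eq)

data Expr = Lit Value | Plus Expr Expr
  deriving Show

eval :: Expr -> Value
eval (Lit v)    = v
eval (Plus l r) = case (eval l, eval r) of
        (I x, I y) -> I (x + y)
        (x, y)     -> error ("Can't add " ++ show x ++ " to " ++ show y)

-- Both terms have type Expr, so both are accepted by the compiler.
wellTyped, illTyped :: Expr
wellTyped = Lit (I 4) `Plus` Lit (I 2)    -- evaluates to I 6
illTyped  = Lit (I 4) `Plus` Lit (B True) -- calls error when evaluated
\end{lstHaskell}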

The first downside of this type of \gls{EDSL} can be overcome by using \glspl{GADT}~\citep{cheney_first-class_2003}.
\Cref{lst:exdeepgadt} shows the same language, but type-safe with a \gls{GADT}.
\glspl{GADT} are not supported in the current version of \gls{CLEAN} and therefore the syntax is hypothetical (see \todo{insert link to appendix}).
However, it has been shown that \glspl{GADT} can be simulated using bimaps or projection pairs~\citep[Sec.~2.2]{cheney_lightweight_2002}.
Unfortunately, the lack of extensibility remains a problem.
If a language construct is added, no compile-time guarantee can be given that all views support it.

\begin{lstHaskell}[label={lst:exdeepgadt},caption={A deeply embedded expression \gls{DSL} using \glspl{GADT}.}]
{-# LANGUAGE GADTs #-}
data Expr a where
    Lit  :: Show a => a -> Expr a
    Plus :: Num a  => Expr a -> Expr a -> Expr a
    Eq   :: Eq a   => Expr a -> Expr a -> Expr Bool

eval :: Expr a -> a
eval (Lit i)    = i
eval (Plus l r) = eval l + eval r
eval (Eq l r)   = eval l == eval r
\end{lstHaskell}
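
To illustrate that views remain easy to add in the \gls{GADT} version, the sketch below repeats the data type and adds a second view next to the evaluator.
The function \haskellinline{pretty} and the term \haskellinline{example} are assumptions made for this illustration.

\begin{lstHaskell}[caption={Sketch: a second view and a well-typed term for the \gls{GADT} embedding.}]
{-# LANGUAGE GADTs #-}

data Expr a where
    Lit  :: Show a => a -> Expr a
    Plus :: Num a  => Expr a -> Expr a -> Expr a
    Eq   :: Eq a   => Expr a -> Expr a -> Expr Bool

eval :: Expr a -> a
eval (Lit i)    = i
eval (Plus l r) = eval l + eval r
eval (Eq l r)   = eval l == eval r

-- A second view; the Show constraint stored in Lit makes show available.
pretty :: Expr a -> String
pretty (Lit i)    = show i
pretty (Plus l r) = "(" ++ pretty l ++ "+" ++ pretty r ++ ")"
pretty (Eq l r)   = "(" ++ pretty l ++ "==" ++ pretty r ++ ")"

-- Hypothetical example term.
example :: Expr Bool
example = (Lit 1 `Plus` Lit 2) `Eq` Lit (3 :: Int)
-- Lit True `Plus` Lit (4 :: Int) is now rejected by the type checker.
\end{lstHaskell}

Here \haskellinline{eval example} yields \haskellinline{True} and \haskellinline{pretty example} yields \haskellinline{"((1+2)==3)"}.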

\section{Shallow embedding}\label{sec:shallow_embedding}
In a shallowly embedded \gls{DSL}, all language constructs are expressed as functions in the host language.
An evaluator view for the example language can then be implemented as shown in \cref{lst:exshallow}.
Note that the internals of the language could have been hidden using a reader monad.

\begin{lstHaskell}[label={lst:exshallow}, caption={A minimal shallow \gls{EDSL}.}]
type Env   = String -> Int
type DSL a = Env -> a

-- Assumed definition of the lookup helper, which the text leaves open.
retrEnv :: Env -> String -> Int
retrEnv e i = e i

lit :: a -> DSL a
lit x = \_ -> x

var :: String -> DSL Int
var i = \e -> retrEnv e i

plus :: DSL Int -> DSL Int -> DSL Int
plus x y = \e -> x e + y e

eq :: Eq a => DSL a -> DSL a -> DSL Bool
eq x y = \e -> x e == y e
\end{lstHaskell}
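
The remark about the reader monad can be sketched without extra libraries, because the function type constructor applied to \haskellinline{Env} already behaves as a reader in \haskellinline{base}: its \haskellinline{Applicative} instance passes the environment to every subterm.
The listing below is one possible formulation under that assumption, not necessarily the one the remark has in mind; \haskellinline{exampleEnv} is a made-up environment.

\begin{lstHaskell}[caption={Sketch: hiding the environment using the function applicative.}]
import Control.Applicative (liftA2)

type Env   = String -> Int
type DSL a = Env -> a

lit :: a -> DSL a
lit = pure          -- ignores the environment

var :: String -> DSL Int
var i = \e -> e i   -- the only construct that inspects the environment

plus :: DSL Int -> DSL Int -> DSL Int
plus = liftA2 (+)   -- the environment is passed on implicitly

eq :: Eq a => DSL a -> DSL a -> DSL Bool
eq = liftA2 (==)

-- Hypothetical environment: x is 39, every other variable is 0.
exampleEnv :: Env
exampleEnv "x" = 39
exampleEnv _   = 0
\end{lstHaskell}

With these definitions, \haskellinline{plus (var "x") (lit 3) exampleEnv} evaluates to \haskellinline{42}.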

One of the advantages of shallowly embedding a language in a host language is its extensibility.
It is very easy to add functionality because the compile-time checks of the host language guarantee whether or not the functionality is available when used.
For example, adding a new construct---such as subtraction---is done as follows:

\begin{lstHaskell}[label={lst:exshallowsubst},caption={Adding subtraction to the shallow \gls{EDSL}.}]
sub :: DSL Int -> DSL Int -> DSL Int
sub x y = \e -> x e - y e
\end{lstHaskell}

Moreover, the language is type-safe as it is directly typed in the host language, i.e.\ \haskellinline{lit True `plus` lit 4} is rejected.
Another advantage is the intimate link with the host language, allowing for a lot more linguistic reuse such as the support of implicit sharing~\citep{krishnamurthi_linguistic_2001}.

The downside of this method is extending the language with views.
It is nearly impossible to add views to a shallowly embedded language.
The only ways of achieving this are reimplementing all functions so that they run all backends at the same time, or creating a single interpretation that produces a fold function~\citep{gibbons_folding_2014}.
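
To illustrate the first of these workarounds, the sketch below pairs an evaluator with a printer so that every construct computes both interpretations at once.
The record \haskellinline{Both} and its field names are assumptions made for this example; adding a third interpretation would again require touching every construct.

\begin{lstHaskell}[caption={Sketch: a shallow embedding that runs two backends at the same time.}]
type Env = String -> Int

-- Every term carries all interpretations at once.
data Both a = Both
        { evalB  :: Env -> a
        , printB :: String
        }

lit :: Show a => a -> Both a
lit x = Both (\_ -> x) (show x)

var :: String -> Both Int
var i = Both (\e -> e i) i

plus :: Both Int -> Both Int -> Both Int
plus x y = Both
        (\e -> evalB x e + evalB y e)
        ("(" ++ printB x ++ "+" ++ printB y ++ ")")

eq :: Eq a => Both a -> Both a -> Both Bool
eq x y = Both
        (\e -> evalB x e == evalB y e)
        ("(" ++ printB x ++ "==" ++ printB y ++ ")")
\end{lstHaskell}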

\subsection{Tagless-final embedding}\label{ssec:tagless}
By lifting the functions representing the \gls{DSL} terms to type classes, interpretations can be added.
This technique is called tagless-final---or class-based shallow---embedding.
The interface for the \gls{DSL} looks as follows:

\begin{lstHaskell}[label={lst:extagless},caption={A minimal tagless-final \gls{EDSL}.}]
class DSL v where
        lit  :: Show a => a -> v a -- Show is needed by the printer
        var  :: String -> v Int    -- variables hold integers, as in Env
        plus :: v Int -> v Int -> v Int
        eq   :: Eq a => v a -> v a -> v Bool
\end{lstHaskell}

An interpretation of this \gls{DSL} is a data type that implements the type class.

\begin{lstHaskell}[label={lst:extaglessprint},caption={A printer interpretation of the minimal tagless-final \gls{EDSL}.}]
data Print a = P {runPrint :: String}
instance DSL Print where
        lit a    = P (show a)
        var i    = P i
        plus x y = P ("(" ++ runPrint x ++ "+" ++ runPrint y ++ ")")
        eq x y   = P ("(" ++ runPrint x ++ "==" ++ runPrint y ++ ")")
\end{lstHaskell}

Adding a language construct---e.g.\ subtraction---is as easy as adding a type class and providing instances for the interpretations.

\begin{lstHaskell}[label={lst:extaglesssubt},caption={Adding subtraction to the tagless-final \gls{EDSL}.}]
class Sub v where
        sub :: v Int -> v Int -> v Int

instance Sub Print where
        sub x y = P ("(" ++ runPrint x ++ "-" ++ runPrint y ++ ")")
\end{lstHaskell}

Adding an interpretation means adding a data type and providing instances for the language constructs.

\begin{lstHaskell}[label={lst:extaglesseval},caption={An evaluator interpretation of the minimal tagless-final \gls{EDSL}.}]
-- Env and retrEnv are those of the shallow embedding.
data Eval a = Eval {runEval :: Env -> a}

instance DSL Eval where
        lit a    = Eval (\_ -> a)
        var i    = Eval (\e -> retrEnv e i)
        plus x y = Eval (\e -> runEval x e + runEval y e)
        eq x y   = Eval (\e -> runEval x e == runEval y e)

instance Sub Eval where
        sub x y = Eval (\e -> runEval x e - runEval y e)
\end{lstHaskell}
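
The benefit of this setup can be sketched with a single term that is interpreted in two ways.
The listing is self-contained and therefore repeats the class and both interpretations; the \haskellinline{Show} constraint on \haskellinline{lit}, the integer type of \haskellinline{var}, the direct environment lookup and the term \haskellinline{term} are assumptions of this sketch, chosen so that both instances type check.

\begin{lstHaskell}[caption={Sketch: one tagless-final term with two interpretations.}]
type Env = String -> Int

class DSL v where
        lit  :: Show a => a -> v a
        var  :: String -> v Int
        plus :: v Int -> v Int -> v Int
        eq   :: Eq a => v a -> v a -> v Bool

data Print a = P {runPrint :: String}
data Eval a  = Eval {runEval :: Env -> a}

instance DSL Print where
        lit a    = P (show a)
        var i    = P i
        plus x y = P ("(" ++ runPrint x ++ "+" ++ runPrint y ++ ")")
        eq x y   = P ("(" ++ runPrint x ++ "==" ++ runPrint y ++ ")")

instance DSL Eval where
        lit a    = Eval (\_ -> a)
        var i    = Eval (\e -> e i)
        plus x y = Eval (\e -> runEval x e + runEval y e)
        eq x y   = Eval (\e -> runEval x e == runEval y e)

-- Hypothetical term; it is polymorphic in the interpretation v.
term :: DSL v => v Bool
term = (var "x" `plus` lit 1) `eq` lit 42
\end{lstHaskell}

The interpretation is selected by the type at which the term is used: \haskellinline{runPrint term} yields \haskellinline{"((x+1)==42)"}, whereas \haskellinline{runEval term (const 41)} yields \haskellinline{True}.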

\section{Comparison}\label{sec:compare_embedding}
Both flavours have their strengths and weaknesses, and both can be improved in order to mitigate some of the downsides.
\Cref{tbl:dsl_comparison} compares the two basic flavours with several such improved variants.

\begin{table}[ht]
        \begin{threeparttable}[b]
                \caption{Comparison of embedding techniques, adapted from \citet[Sec.~3.6]{sun_compositional_2022}.}%
                \label{tbl:dsl_comparison}
                \begin{tabular}{lllllll}
                        \toprule
                                                & Shallow & Deep  & Hybrid          & Poly           & Comp.          & Classy\\
                        \midrule
                        Transcoding free        & yes     & yes   & no              & yes            & yes            & yes\\
                        Linguistic reuse        & yes     & no    & partly\tnote{1} & yes            & yes            & no?\\
                        Extend constructs       & yes     & no    & partly\tnote{1} & yes            & yes            & yes\\
                        Extend interpretations  & no      & yes   & yes             & yes            & yes            & yes\\
                        Transformations         & no      & yes   & yes             & maybe\tnote{2} & maybe\tnote{2} & yes\\
                        Modular dependencies    & no      & maybe & maybe           & yes            & yes            & yes\\
                        Nested pattern matching & no      & yes   & yes             & no             & maybe          & maybe\tnote{3}\\
                        Type safe               & yes     & maybe & no              & yes            & yes            & yes\\
                        \bottomrule
                \end{tabular}
                \begin{tablenotes}
                        \item [1] Only in the shallowly embedded part.
                        \item [2] Transformations require some ingenuity and are sometimes awkward to write.
                        \item [3] It requires some---safe---form of dynamic typing.
                \end{tablenotes}
        \end{threeparttable}
\end{table}

\input{subfilepostamble}
\end{document}