stab at some chapters
[phd-thesis.git] / domain_specific_languages / dsl_techniques.tex
1 An \gls{EDSL} is a language embedded in a host language created for a specific domain\todo{citation needed?}.
2 \glspl{EDSL} can have one or more backends or views.
3 Commonly used views are pretty printing, compiling, simulating, verifying and proving the program.
4 Embedding techniques can be characterised by at least the following properties:
5 \begin{enumerate}
6 \item Extendibility of constructs
7 \item Extendibility of views
8 \item Type safety, the type of a term in the language is typed in the host language
9 \item Support for well typed variables
10 \item Intensional analyses
11 \item Support for partial views, not all views need to support all constructs
12 \end{enumerate}
13
14 There are two main techniques for creating embedding \glspl{DSL}, deep---also called tagged---embedding and shallow---also called tagless---embedding.
15 In the following sections both techniques are explained.
16 As an example, a simple language with integers, booleans and some arithmetic operators is used as a running example.
17
18 \section{Deep embedding}
19 A deep \gls{EDSL} is a language represented as data in the host language.
20 Views are functions that transform \emph{something} to the datatype or the other way around.
21 Definition~\ref{lst:exdeep} shows an implementation for the example \gls{DSL}.
22
23 \begin{lstHaskell}[label={lst:exdeep},caption={A deeply embedded expression \gls{DSL}.}]
24 data Value = I Int | B Bool
25 data Expr
26 = Lit Value
27 | Plus Expr Expr
28 | Eq Expr Expr
29 deriving Show
30 \end{lstHaskell}
31
32 Implementing a printer for the language is straightforward:
33
34 \begin{lstHaskell}[caption={A printer for the deeply embedded expression \gls{DSL}.}]
35 print :: Expr -> String
36 print (Lit i) = show i
37 print (Plus l r) = "(" ++ print l ++ "+" ++ print r ++ ")"
38 print (Eq l r) = "(" ++ print l ++ "==" ++ print r ++ ")"
39 \end{lstHaskell}
40
41 Adding a construct, for example subtraction reveals the Achilles' heel of deep embedding, namely that we need to revisit the original data type \emph{and} all the existing views.
42 I.e.\ we need to add \haskellinline{| Sub Expr Expr} to the \haskellinline{Expr} data type and \haskellinline{print (Sub l r) = ...} to the \haskellinline{print} view.
43 This limitation can be overcome by lifting the views to classes (See \cref{chp:classy deep embedding}).
44
45 Implementing an evaluator for the language is possible without touching any original code, we just add a function operating on the \haskellinline{Expr} data type.
46 To store variables, it has an extra environment argument\todo{cite while}.
47 Here another downside of basic deep embedding arises immediately, the expressions are not typed, and therefore there has to be some type checking in the evaluation code.
48
49 \begin{lstHaskell}[]
50 eval :: Expr -> Value
51 eval (Lit i) = i
52 eval (Plus l r) = case (eval l, eval r) of
53 (Lit (I l), Lit (I r)) -> I (l+r))
54 (l, r) -> error $ "Cannot add " ++ show l ++ " to " ++ show r
55 eval (Eq l r) = case (eval l, eval r) of
56 (Lit (I l), Lit (I r)) -> B (l==r)
57 (Lit (B l), Lit (B r)) -> B (l==r)
58 (l, r) -> error $ "Cannot compare " ++ show l ++ " to " ++ show r
59 \end{lstHaskell}
60
61 Luckily this problem can be overcome by switching from regular \glspl{ADT} to \glspl{GADT}, resulting in the following data type and evaluator.
62
63 \begin{lstHaskell}[]
64 data Expr a where
65 Lit :: Show a => a -> Expr a
66 Plus :: Num a => Expr a -> Expr a -> Expr a
67 Eq :: Eq a => Expr a -> Expr a -> Expr Bool
68
69 eval :: Expr a -> a
70 eval (Lit i) = i
71 eval (Plus l r) = eval l + eval r
72 eval (Eq l r) = eval l == eval r
73 \end{lstHaskell}
74
75 Deep embedding has the advantage that it is easy to build and views are easy to add.
76 On the downside, the expressions created with this language are not necessarily type-safe.
77 In the given language it is possible to create an expression such as \cleaninline{Plus (LitI 4) (LitB True)} that adds a boolean to an integer.
78 Extending the \gls{ADT} is easy and convenient but extending the views accordingly is tedious since it has to be done individually for all views.
79
80 The first downside of this type of \gls{EDSL} can be overcome by using \glspl{GADT}~\cite{cheney_first-class_2003}.
81 Example~\ref{lst:exdeepgadt} shows the same language, but type-safe with a \gls{GADT}.
82 \glspl{GADT} are not supported in the current version of \gls{CLEAN} and therefore the syntax is hypothetical.
83 However, it has been shown that \glspl{GADT} can be simulated using bimaps or projection pairs~\cite{cheney_first-class_2003}.
84 Unfortunately the lack of extendability remains a problem.
85 If a language construct is added, no compile time guarantee can be given that all views support it.
86
87 \begin{lstClean}[label={lst:exdeepgadt},caption={A deeply embedded expression \gls{DSL} using \glspl{GADT}.}]
88 :: Expr a
89 = Lit a -> Expr a
90 | E.e: Var String -> Expr e
91 | Plus (Expr Int) (Expr Int) -> Expr Int
92 | E.e: Eq (Expr e) (Expr e) -> Expr Bool & == e
93 \end{lstClean}
94
95 \subsection{Shallow embedding}
96 In a shallowly \gls{EDSL} all language constructs are expressed as functions in the host language.
97 An evaluator view for the example language then can be implemented as the code shown in Definition~\ref{lst:exshallow}.
98 Note that much of the internals of the language can be hidden using monads.
99
100 \begin{lstClean}[label={lst:exshallow}, caption={A minimal shallow \gls{EDSL}.}]
101 :: Env :== (String -> Int)
102 :: DSL a :== (Env -> a)
103
104 Lit :: a -> DSL a
105 Lit x = \e->x
106
107 Var :: String -> DSL Int
108 Var i = \e->retrEnv e i
109
110 Plus :: (DSL Int) (DSL Int) -> DSL Int
111 Plus x y = \e->x e + y e
112
113 Eq :: (DSL a) (DSL a) -> DSL Bool | == a
114 Eq x y = \e->x e == y e
115 \end{lstClean}
116
117 The advantage of shallowly embedding a language in a host language is its extendability.
118 It is very easy to add functionality because the compile time checks of the host language guarantee whether or not the functionality is available when used.
119 Moreover, the language is type safe as it is directly typed in the host language, i.e.\ \cleaninline{Lit True +. Lit 4} is rejected.
120
121 The downside of this method is extending the language with views.
122 It is nearly impossible to add views to a shallowly embedded language.
123 The only way of achieving this is by reimplementing all functions so that they run all backends at the same time.
124 This will mean that every component will have to implement all views rendering it slow for multiple views and complex to implement.
125
126 \subsection{\Acrlongpl{GADT}}
127
128 \section{Shallow embedding}
129
130 \subsection{Tagless-final embedding}
131
132 \section{Comparison}