directory structure
[phd-thesis.git] / dsl / dsl_techniques.tex
1 \documentclass[../thesis.tex]{subfiles}
2
3 \begin{document}
4 \ifSubfilesClassLoaded{
5 \pagenumbering{arabic}
6 }{}
7
8 \chapter{DSL embedding techniques}%
9 \label{chp:dsl_embedding_techniques}%
10 An \gls{EDSL} is a language embedded in a host language created for a specific domain\todo{citation needed?}.
11 \glspl{EDSL} can have one or more backends or views.
12 Commonly used views are pretty printing, compiling, simulating, verifying and proving the program.
13 Embedding techniques can be characterised by at least the following properties:
14 \begin{enumerate}
15 \item Extendibility of constructs
16 \item Extendibility of views
17 \item Type safety, the type of a term in the language is typed in the host language
18 \item Support for well typed variables
19 \item Intensional analyses
20 \item Support for partial views, not all views need to support all constructs
21 \end{enumerate}
22
23 There are two main techniques for creating embedding \glspl{DSL}, deep---also called tagged---embedding and shallow---also called tagless---embedding.
24 In the following sections both techniques are explained.
25 As an example, a simple language with integers, booleans and some arithmetic operators is used as a running example.
26
27 \section{Deep embedding}
28 A deep \gls{EDSL} is a language represented as data in the host language.
29 Views are functions that transform \emph{something} to the datatype or the other way around.
30 Definition~\ref{lst:exdeep} shows an implementation for the example \gls{DSL}.
31
32 \begin{lstHaskell}[label={lst:exdeep},caption={A deeply embedded expression \gls{DSL}.}]
33 data Value = I Int | B Bool
34 data Expr
35 = Lit Value
36 | Plus Expr Expr
37 | Eq Expr Expr
38 deriving Show
39 \end{lstHaskell}
40
41 Implementing a printer for the language is straightforward:
42
43 \begin{lstHaskell}[caption={A printer for the deeply embedded expression \gls{DSL}.}]
44 print :: Expr -> String
45 print (Lit i) = show i
46 print (Plus l r) = "(" ++ print l ++ "+" ++ print r ++ ")"
47 print (Eq l r) = "(" ++ print l ++ "==" ++ print r ++ ")"
48 \end{lstHaskell}
49
50 Adding a construct, for example subtraction reveals the Achilles' heel of deep embedding, namely that we need to revisit the original data type \emph{and} all the existing views.
51 I.e.\ we need to add \haskellinline{| Sub Expr Expr} to the \haskellinline{Expr} data type and \haskellinline{print (Sub l r) = ...} to the \haskellinline{print} view.
52 This limitation can be overcome by lifting the views to classes (See \cref{chp:classy_deep_embedding}).
53
54 Implementing an evaluator for the language is possible without touching any original code, we just add a function operating on the \haskellinline{Expr} data type.
55 To store variables, it has an extra environment argument\todo{cite while}.
56 Here another downside of basic deep embedding arises immediately, the expressions are not typed, and therefore there has to be some type checking in the evaluation code.
57
58 \begin{lstHaskell}[]
59 eval :: Expr -> Value
60 eval (Lit i) = i
61 eval (Plus l r) = case (eval l, eval r) of
62 (Lit (I l), Lit (I r)) -> I (l+r))
63 (l, r) -> error ("Can't add " ++ show l ++ " to " ++ show r)
64 eval (Eq l r) = case (eval l, eval r) of
65 (Lit (I l), Lit (I r)) -> B (l==r)
66 (Lit (B l), Lit (B r)) -> B (l==r)
67 (l, r) -> error ("Can't compare " ++ show l ++ " to " ++ show r)
68 \end{lstHaskell}
69
70 Luckily this problem can be overcome by switching from regular \glspl{ADT} to \glspl{GADT}, resulting in the following data type and evaluator.
71
72 \begin{lstHaskell}[]
73 data Expr a where
74 Lit :: Show a => a -> Expr a
75 Plus :: Num a => Expr a -> Expr a -> Expr a
76 Eq :: Eq a => Expr a -> Expr a -> Expr Bool
77
78 eval :: Expr a -> a
79 eval (Lit i) = i
80 eval (Plus l r) = eval l + eval r
81 eval (Eq l r) = eval l == eval r
82 \end{lstHaskell}
83
84 Deep embedding has the advantage that it is easy to build and views are easy to add.
85 On the downside, the expressions created with this language are not necessarily type-safe.
86 In the given language it is possible to create an expression such as \cleaninline{Plus (LitI 4) (LitB True)} that adds a boolean to an integer.
87 Extending the \gls{ADT} is easy and convenient but extending the views accordingly is tedious since it has to be done individually for all views.
88
89 The first downside of this type of \gls{EDSL} can be overcome by using \glspl{GADT}~\citep{cheney_first-class_2003}.
90 Example~\ref{lst:exdeepgadt} shows the same language, but type-safe with a \gls{GADT}.
91 \glspl{GADT} are not supported in the current version of \gls{CLEAN} and therefore the syntax is hypothetical (See \todo{insert link to appendix}).
92 However, it has been shown that \glspl{GADT} can be simulated using bimaps or projection pairs~\citep{cheney_first-class_2003}.
93 Unfortunately the lack of extendability remains a problem.
94 If a language construct is added, no compile time guarantee can be given that all views support it.
95
96 \begin{lstClean}[label={lst:exdeepgadt},caption={A deeply embedded expression \gls{DSL} using \glspl{GADT}.}]
97 :: Expr a
98 = Lit a -> Expr a
99 | E.e: Var String -> Expr e
100 | Plus (Expr Int) (Expr Int) -> Expr Int
101 | E.e: Eq (Expr e) (Expr e) -> Expr Bool & == e
102 \end{lstClean}
103
104 \section{Shallow embedding}
105 In a shallowly \gls{EDSL} all language constructs are expressed as functions in the host language.
106 An evaluator view for the example language then can be implemented as the code shown in Definition~\ref{lst:exshallow}.
107 Note that much of the internals of the language can be hidden using monads.
108
109 \begin{lstClean}[label={lst:exshallow}, caption={A minimal shallow \gls{EDSL}.}]
110 :: Env :== (String -> Int)
111 :: DSL a :== (Env -> a)
112
113 Lit :: a -> DSL a
114 Lit x = \e->x
115
116 Var :: String -> DSL Int
117 Var i = \e->retrEnv e i
118
119 Plus :: (DSL Int) (DSL Int) -> DSL Int
120 Plus x y = \e->x e + y e
121
122 Eq :: (DSL a) (DSL a) -> DSL Bool | == a
123 Eq x y = \e->x e == y e
124 \end{lstClean}
125
126 The advantage of shallowly embedding a language in a host language is its extendability.
127 It is very easy to add functionality because the compile time checks of the host language guarantee whether or not the functionality is available when used.
128 Moreover, the language is type safe as it is directly typed in the host language, i.e.\ \cleaninline{Lit True +. Lit 4} is rejected.
129
130 The downside of this method is extending the language with views.
131 It is nearly impossible to add views to a shallowly embedded language.
132 The only way of achieving this is by reimplementing all functions so that they run all backends at the same time.
133 This will mean that every component will have to implement all views rendering it slow for multiple views and complex to implement.
134
135 \subsection{Tagless-final embedding}\label{ssec:tagless}
136
137 \section{Comparison}
138 \todo{cite compositional DSLs}
139
140 \input{subfilepostamble}
141 \end{document}