a843ccf08b4a18c65f825609312b8fb96afbcbb6
[phd-thesis.git] / domain-specific_languages / dsl_techniques.tex
1 \documentclass[../thesis.tex]{subfiles}
2
3 \begin{document}
4 \ifSubfilesClassLoaded{
5 \pagenumbering{arabic}
6 }{}
7
8 \mychapter{chp:dsl_embedding_techniques}{DSL embedding techniques}%
9 An \gls{EDSL} is a language embedded in a host language created for a specific domain\todo{citation needed?}.
10 \glspl{EDSL} can have one or more backends or views.
11 Commonly used views are pretty printing, compiling, simulating, verifying and proving the program.
12 Embedding techniques can be characterised by at least the following properties:
13 \begin{enumerate}
14 \item Extendibility of constructs
15 \item Extendibility of views
16 \item Type safety, the type of a term in the language is typed in the host language
17 \item Support for well typed variables
18 \item Intensional analyses
19 \item Support for partial views, not all views need to support all constructs
20 \end{enumerate}
21
22 There are two main techniques for creating embedding \glspl{DSL}, deep---also called tagged---embedding and shallow---also called tagless---embedding.
23 In the following sections both techniques are explained.
24 As an example, a simple language with integers, booleans and some arithmetic operators is used as a running example.
25
26 \section{Deep embedding}
27 A deep \gls{EDSL} is a language represented as data in the host language.
28 Views are functions that transform \emph{something} to the datatype or the other way around.
29 Definition~\ref{lst:exdeep} shows an implementation for the example \gls{DSL}.
30
31 \begin{lstHaskell}[label={lst:exdeep},caption={A deeply embedded expression \gls{DSL}.}]
32 data Value = I Int | B Bool
33 data Expr
34 = Lit Value
35 | Plus Expr Expr
36 | Eq Expr Expr
37 deriving Show
38 \end{lstHaskell}
39
40 Implementing a printer for the language is straightforward:
41
42 \begin{lstHaskell}[caption={A printer for the deeply embedded expression \gls{DSL}.}]
43 print :: Expr -> String
44 print (Lit i) = show i
45 print (Plus l r) = "(" ++ print l ++ "+" ++ print r ++ ")"
46 print (Eq l r) = "(" ++ print l ++ "==" ++ print r ++ ")"
47 \end{lstHaskell}
48
49 Adding a construct, for example subtraction reveals the Achilles' heel of deep embedding, namely that we need to revisit the original data type \emph{and} all the existing views.
50 I.e.\ we need to add \haskellinline{| Sub Expr Expr} to the \haskellinline{Expr} data type and \haskellinline{print (Sub l r) = ...} to the \haskellinline{print} view.
51 This limitation can be overcome by lifting the views to classes (See \cref{chp:classy_deep_embedding}).
52
53 Implementing an evaluator for the language is possible without touching any original code, we just add a function operating on the \haskellinline{Expr} data type.
54 To store variables, it has an extra environment argument\todo{cite while}.
55 Here another downside of basic deep embedding arises immediately, the expressions are not typed, and therefore there has to be some type checking in the evaluation code.
56
57 \begin{lstHaskell}[]
58 eval :: Expr -> Value
59 eval (Lit i) = i
60 eval (Plus l r) = case (eval l, eval r) of
61 (Lit (I l), Lit (I r)) -> I (l+r))
62 (l, r) -> error ("Can't add " ++ show l ++ " to " ++ show r)
63 eval (Eq l r) = case (eval l, eval r) of
64 (Lit (I l), Lit (I r)) -> B (l==r)
65 (Lit (B l), Lit (B r)) -> B (l==r)
66 (l, r) -> error ("Can't compare " ++ show l ++ " to " ++ show r)
67 \end{lstHaskell}
68
69 Luckily this problem can be overcome by switching from regular \glspl{ADT} to \glspl{GADT}, resulting in the following data type and evaluator.
70
71 \begin{lstHaskell}[]
72 data Expr a where
73 Lit :: Show a => a -> Expr a
74 Plus :: Num a => Expr a -> Expr a -> Expr a
75 Eq :: Eq a => Expr a -> Expr a -> Expr Bool
76
77 eval :: Expr a -> a
78 eval (Lit i) = i
79 eval (Plus l r) = eval l + eval r
80 eval (Eq l r) = eval l == eval r
81 \end{lstHaskell}
82
83 Deep embedding has the advantage that it is easy to build and views are easy to add.
84 On the downside, the expressions created with this language are not necessarily type-safe.
85 In the given language it is possible to create an expression such as \cleaninline{Plus (LitI 4) (LitB True)} that adds a boolean to an integer.
86 Extending the \gls{ADT} is easy and convenient but extending the views accordingly is tedious since it has to be done individually for all views.
87
88 The first downside of this type of \gls{EDSL} can be overcome by using \glspl{GADT}~\cite{cheney_first-class_2003}.
89 Example~\ref{lst:exdeepgadt} shows the same language, but type-safe with a \gls{GADT}.
90 \glspl{GADT} are not supported in the current version of \gls{CLEAN} and therefore the syntax is hypothetical (See \todo{insert link to appendix}).
91 However, it has been shown that \glspl{GADT} can be simulated using bimaps or projection pairs~\cite{cheney_first-class_2003}.
92 Unfortunately the lack of extendability remains a problem.
93 If a language construct is added, no compile time guarantee can be given that all views support it.
94
95 \begin{lstClean}[label={lst:exdeepgadt},caption={A deeply embedded expression \gls{DSL} using \glspl{GADT}.}]
96 :: Expr a
97 = Lit a -> Expr a
98 | E.e: Var String -> Expr e
99 | Plus (Expr Int) (Expr Int) -> Expr Int
100 | E.e: Eq (Expr e) (Expr e) -> Expr Bool & == e
101 \end{lstClean}
102
103 \subsection{Shallow embedding}
104 In a shallowly \gls{EDSL} all language constructs are expressed as functions in the host language.
105 An evaluator view for the example language then can be implemented as the code shown in Definition~\ref{lst:exshallow}.
106 Note that much of the internals of the language can be hidden using monads.
107
108 \begin{lstClean}[label={lst:exshallow}, caption={A minimal shallow \gls{EDSL}.}]
109 :: Env :== (String -> Int)
110 :: DSL a :== (Env -> a)
111
112 Lit :: a -> DSL a
113 Lit x = \e->x
114
115 Var :: String -> DSL Int
116 Var i = \e->retrEnv e i
117
118 Plus :: (DSL Int) (DSL Int) -> DSL Int
119 Plus x y = \e->x e + y e
120
121 Eq :: (DSL a) (DSL a) -> DSL Bool | == a
122 Eq x y = \e->x e == y e
123 \end{lstClean}
124
125 The advantage of shallowly embedding a language in a host language is its extendability.
126 It is very easy to add functionality because the compile time checks of the host language guarantee whether or not the functionality is available when used.
127 Moreover, the language is type safe as it is directly typed in the host language, i.e.\ \cleaninline{Lit True +. Lit 4} is rejected.
128
129 The downside of this method is extending the language with views.
130 It is nearly impossible to add views to a shallowly embedded language.
131 The only way of achieving this is by reimplementing all functions so that they run all backends at the same time.
132 This will mean that every component will have to implement all views rendering it slow for multiple views and complex to implement.
133
134 \subsection{Generalised algebraic data types}
135
136 \section{Shallow embedding}
137
138 \subsection{Tagless-final embedding}\label{ssec:tagless}
139
140 \section{Comparison}
141
142 \input{subfilepostamble}
143 \end{document}