From: Mart Lubbers Date: Tue, 11 Oct 2022 13:56:26 +0000 (+0200) Subject: updates X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=1544c8074a9b894c6e78beb0911aff25a295352c;p=phd-thesis.git updates --- diff --git a/appx/clean_for_haskell_programmers.tex b/appx/clean_for_haskell_programmers.tex index cc7873d..36d2653 100644 --- a/appx/clean_for_haskell_programmers.tex +++ b/appx/clean_for_haskell_programmers.tex @@ -119,6 +119,7 @@ To illustrate this, \cref{lst:gadt_clean} shows an example \gls{GADT} that would \clearpage \section{Syntax} +\lstset{basicstyle=\tt\footnotesize} \begin{longtable}{p{.45\linewidth}p{.5\linewidth}} \caption[]{Syntactical differences between \gls{CLEAN} and \gls{HASKELL}.}% \label{tbl:syn_clean_haskell}\\ @@ -226,9 +227,9 @@ To illustrate this, \cref{lst:gadt_clean} shows an example \gls{GADT} that would \multicolumn{2}{c}{Record expressions}\\ \midrule \cleaninline{:: R = \{ f :: t \}} & \haskellinline{data R = R \{ f :: t \}}\\ - \cleaninline{r = \{ f = e \}} & \haskellinline{r = R \{e\}}\\ + \cleaninline{r = \{ f = e \}} & \haskellinline{r = R \{ f = e \}}\\ \cleaninline{r.f} & \haskellinline{f r}\\ - & \haskellinline{r.f}\requiresGHCmod{Requires \gls{GHC} version 9.2.0 or higher}{OverloadedRecordDot}\\ + & \quad or \haskellinline{r.f}\requiresGHCmod[Requires \gls{GHC} version 9.2.0 or higher]{OverloadedRecordDot}\\ \cleaninline{r!f}\footnote{This operator allows for field selection from unique records.} & \haskellinline{(\\v->(f v, v)) r}\\ \cleaninline{\{r \& f = e \}} & \haskellinline{r \{ f = e \}}\\ diff --git a/dsl/class_deep_embedding.tex b/dsl/class_deep_embedding.tex index bd3b59a..d91f0b4 100644 --- a/dsl/class_deep_embedding.tex +++ b/dsl/class_deep_embedding.tex @@ -61,8 +61,9 @@ In deep embedding, terms in the language are represented by data in the host lan Hence, defining the constructs is as simple as creating the following algebraic data type\footnote{All data types and functions are subscripted to indicate the evolution.}. \begin{lstHaskellLhstex} -data Expr_0 = Lit_0 Int - | Add_0 Expr_0 Expr_0 +data Expr_0 + = Lit_0 Int + | Add_0 Expr_0 Expr_0 \end{lstHaskellLhstex} Semantics are defined as functions on the \haskelllhstexinline{Expr_0} data type. @@ -89,9 +90,10 @@ Traditionally, extending the language is achieved by adding a case to the \haske So, adding subtraction to the language results in the following revised data type. \begin{lstHaskellLhstex} -data Expr_0 = Lit_0 Int - | Add_0 Expr_0 Expr_0 - | Sub_0 Expr_0 Expr_0 +data Expr_0 + = Lit_0 Int + | Add_0 Expr_0 Expr_0 + | Sub_0 Expr_0 Expr_0 \end{lstHaskellLhstex} Extending the DSL with language constructs exposes the Achilles' heel of deep embedding. @@ -133,8 +135,8 @@ For our language this results in the following type class definition. \begin{lstHaskellLhstex} class Expr_t s where - lit_t :: Int -> s - add_t :: s -> s -> s + lit_t :: Int -> s + add_t :: s -> s -> s \end{lstHaskellLhstex} Semantics become data types\footnotemark{} implementing these type classes, resulting in the following instance for the evaluator. @@ -149,18 +151,18 @@ Semantics become data types\footnotemark{} implementing these type classes, resu newtype Eval_t = E_t Int instance Expr_t Eval_t where - lit_t v = E_t v - add_t (E_t e1) (E_t e2) = E_t (e1 + e2) + lit_t v = E_t v + add_t (E_t e1) (E_t e2) = E_t (e1 + e2) \end{lstHaskellLhstex} Adding constructs---e.g.\ subtraction---just results in an extra type class and corresponding instances. \begin{lstHaskellLhstex} class Sub_t s where - sub_t :: s -> s -> s + sub_t :: s -> s -> s instance Sub_t Eval_t where - sub_t (E_t e1) (E_t e2) = E_t (e1 - e2) + sub_t (E_t e1) (E_t e2) = E_t (e1 - e2) \end{lstHaskellLhstex} Finally, adding semantics such as a printer over the language is achieved by providing a data type representing the semantics accompanied by instances for the language constructs. @@ -169,11 +171,11 @@ Finally, adding semantics such as a printer over the language is achieved by pro newtype Printer_t = P_t String instance Expr_t Printer_t where - lit_t i = P_t (show i) - add_t (P_t e1) (P_t e2) = P_t ("(" ++ e1 ++ "+" ++ e2 ++ ")") + lit_t i = P_t (show i) + add_t (P_t e1) (P_t e2) = P_t ("(" ++ e1 ++ "+" ++ e2 ++ ")") instance Sub_t Printer_t where - sub_t (P_t e1) (P_t e2) = P_t ("(" ++ e1 ++ "-" ++ e2 ++ ")") + sub_t (P_t e1) (P_t e2) = P_t ("(" ++ e1 ++ "-" ++ e2 ++ ")") \end{lstHaskellLhstex} \section{Lifting the backends}% @@ -187,10 +189,11 @@ In our language this results in the following type class. \begin{lstHaskellLhstex} class Eval_1 v where - eval_1 :: v -> Int + eval_1 :: v -> Int -data Expr_1 = Lit_1 Int - | Add_1 Expr_1 Expr_1 +data Expr_1 + = Lit_1 Int + | Add_1 Expr_1 Expr_1 \end{lstHaskellLhstex} Implementing the semantics type class instances for the \haskelllhstexinline{Expr_1} data type is an elementary exercise. @@ -198,8 +201,8 @@ By a copy-paste and some modifications, we come to the following implementation. \begin{lstHaskellLhstex} instance Eval_1 Expr_1 where - eval_1 (Lit_1 v) = v - eval_1 (Add_1 e1 e2) = eval_1 e1 + eval_1 e2 + eval_1 (Lit_1 v) = v + eval_1 (Add_1 e1 e2) = eval_1 e1 + eval_1 e2 \end{lstHaskellLhstex} Subtraction can now be defined in a separate data type, leaving the original data type intact. @@ -209,7 +212,7 @@ Instances for the additional semantics can now be implemented separately as inst data Sub_1 = Sub_1 Expr_1 Expr_1 instance Eval_1 Sub_1 where - eval_1 (Sub_1 e1 e2) = eval_1 e1 - eval_1 e2 + eval_1 (Sub_1 e1 e2) = eval_1 e1 - eval_1 e2 \end{lstHaskellLhstex} \section{Existential data types}% @@ -222,18 +225,19 @@ Luckily, we can reconnect them by adding a special constructor to the \haskelllh It contains an existentially quantified~\citep{mitchell_abstract_1988} type with type class constraints~\citep{laufer_combining_1994,laufer_type_1996} for all semantics type classes~\citep[Chp.~6.4.6]{ghc_team_ghc_2021} to allow it to house not just subtraction but any future extension. \begin{lstHaskellLhstex} -data Expr_2 = Lit_2 Int - | Add_2 Expr_2 Expr_2 - | forall x. Eval_2 x => Ext_2 x +data Expr_2 + = Lit_2 Int + | Add_2 Expr_2 Expr_2 + | forall x. Eval_2 x => Ext_2 x \end{lstHaskellLhstex} The implementation of the extension case in the semantics type classes is in most cases just a matter of calling the function for the argument as can be seen in the semantics instances shown below. \begin{lstHaskellLhstex} instance Eval_2 Expr_2 where - eval_2 (Lit_2 v) = v - eval_2 (Add_2 e1 e2) = eval_2 e1 + eval_2 e2 - eval_2 (Ext_2 x) = eval_2 x + eval_2 (Lit_2 v) = v + eval_2 (Add_2 e1 e2) = eval_2 e1 + eval_2 e2 + eval_2 (Ext_2 x) = eval_2 x \end{lstHaskellLhstex} Adding language construct extensions in different data types does mean that an extra \haskelllhstexinline{Ext_2} tag is introduced when using the extension. @@ -268,9 +272,10 @@ To avoid this, the type classes can be bundled in a type class alias or type cla \begin{lstHaskellLhstex} class (Eval_2 x, Print_2 x) => Semantics_2 x -data Expr_2 = Lit_2 Int - | Add_2 Expr_2 Expr_2 - | forall x. Semantics_2 x => Ext_2 x +data Expr_2 + = Lit_2 Int + | Add_2 Expr_2 Expr_2 + | forall x. Semantics_2 x => Ext_2 x \end{lstHaskellLhstex} The class alias removes the need for the programmer to visit the main data type when adding additional semantics. @@ -281,11 +286,12 @@ However, by adding some boilerplate, even without this extension, the language c First the data types for the language constructs are parametrised by the type variable \haskelllhstexinline{d} as follows. \begin{lstHaskellLhstex} -data Expr_3 d = Lit_3 Int - | Add_3 (Expr_3 d) (Expr_3 d) - | forall x. Ext_3 (d x) x +data Expr_3 d + = Lit_3 Int + | Add_3 (Expr_3 d) (Expr_3 d) + | forall x. Ext_3 (d x) x -data Sub_3 d = Sub_3 (Expr_3 d) (Expr_3 d) +data Sub_3 d = Sub_3 (Expr_3 d) (Expr_3 d) \end{lstHaskellLhstex} The \haskelllhstexinline{d} type variable is inhabited by an explicit dictionary for the semantics, i.e.\ a witness to the class instance. @@ -296,10 +302,10 @@ This means that for \haskelllhstexinline{Eval_3}, a dictionary with the function newtype EvalDict_3 v = EvalDict_3 (v -> Int) class HasEval_3 d where - getEval_3 :: d v -> v -> Int + getEval_3 :: d v -> v -> Int instance HasEval_3 EvalDict_3 where - getEval_3 (EvalDict_3 e) = e + getEval_3 (EvalDict_3 e) = e \end{lstHaskellLhstex} The instances for the type classes change as well according to the change in the datatype. @@ -307,12 +313,12 @@ Given that there is a \haskelllhstexinline{HasEval_3} instance for the witness t \begin{lstHaskellLhstex} instance HasEval_3 d => Eval_3 (Expr_3 d) where - eval_3 (Lit_3 v) = v - eval_3 (Add_3 e1 e2) = eval_3 e1 + eval_3 e2 - eval_3 (Ext_3 d x) = getEval_3 d x + eval_3 (Lit_3 v) = v + eval_3 (Add_3 e1 e2) = eval_3 e1 + eval_3 e2 + eval_3 (Ext_3 d x) = getEval_3 d x instance HasEval_3 d => Eval_3 (Sub_3 d) where - eval_3 (Sub_3 e1 e2) = eval_3 e1 - eval_3 e2 + eval_3 (Sub_3 e1 e2) = eval_3 e1 - eval_3 e2 \end{lstHaskellLhstex} Because the \haskelllhstexinline{Ext_3} constructor from \haskelllhstexinline{Expr_3} now contains a value of type \haskelllhstexinline{d}, the smart constructor for \haskelllhstexinline{Sub_3} must somehow come up with this value. @@ -320,7 +326,7 @@ To achieve this, a type class is introduced that allows the generation of such a \begin{lstHaskellLhstex} class GDict a where - gdict :: a + gdict :: a \end{lstHaskellLhstex} This type class has individual instances for all semantics dictionaries, linking the class instance to the witness value. @@ -328,7 +334,7 @@ I.e.\ if there is a type class instance known, a witness value can be conjured u \begin{lstHaskellLhstex} instance Eval_3 v => GDict (EvalDict_3 v) where - gdict = EvalDict_3 eval_3 + gdict = EvalDict_3 eval_3 \end{lstHaskellLhstex} With these instances, the semantics function can be retrieved from the \haskelllhstexinline{Ext_3} constructor and in the smart constructors they can be generated as follows: @@ -344,29 +350,29 @@ First the printer type class, dictionaries and instances for \haskelllhstexinlin \begin{lstHaskellLhstex} class Print_3 v where - print_3 :: v -> String + print_3 :: v -> String newtype PrintDict_3 v = PrintDict_3 (v -> String) class HasPrint_3 d where - getPrint_3 :: d v -> v -> String + getPrint_3 :: d v -> v -> String instance HasPrint_3 PrintDict_3 where - getPrint_3 (PrintDict_3 e) = e + getPrint_3 (PrintDict_3 e) = e instance Print_3 v => GDict (PrintDict_3 v) where - gdict = PrintDict_3 print_3 + gdict = PrintDict_3 print_3 \end{lstHaskellLhstex} Then the instances for \haskelllhstexinline{Print_3} of all the language constructs can be defined. \begin{lstHaskellLhstex} instance HasPrint_3 d => Print_3 (Expr_3 d) where - print_3 (Lit_3 v) = show v - print_3 (Add_3 e1 e2) = "(" ++ print_3 e1 ++ "+" ++ print_3 e2 ++ ")" - print_3 (Ext_3 d x) = getPrint_3 d x + print_3 (Lit_3 v) = show v + print_3 (Add_3 e1 e2) = "(" ++ print_3 e1 ++ "+" ++ print_3 e2 ++ ")" + print_3 (Ext_3 d x) = getPrint_3 d x instance HasPrint_3 d => Print_3 (Sub_3 d) where - print_3 (Sub_3 e1 e2) = "(" ++ print_3 e1 ++ "-" ++ print_3 e2 ++ ")" + print_3 (Sub_3 e1 e2) = "(" ++ print_3 e1 ++ "-" ++ print_3 e2 ++ ")" \end{lstHaskellLhstex} \section{Transformation semantics} @@ -380,18 +386,18 @@ In classy deep embedding, adding new semantics means first adding a new type cla \begin{lstHaskellLhstex} class Opt_3 v where - opt_3 :: v -> v + opt_3 :: v -> v newtype OptDict_3 v = OptDict_3 (v -> v) class HasOpt_3 d where - getOpt_3 :: d v -> v -> v + getOpt_3 :: d v -> v -> v instance HasOpt_3 OptDict_3 where - getOpt_3 (OptDict_3 e) = e + getOpt_3 (OptDict_3 e) = e instance Opt_3 v => GDict (OptDict_3 v) where - gdict = OptDict_3 opt_3 + gdict = OptDict_3 opt_3 \end{lstHaskellLhstex} The implementation of the optimizer for the \haskelllhstexinline{Expr_3} data type is no complicated task. @@ -400,21 +406,21 @@ If this is the case, the addition is removed. \begin{lstHaskellLhstex} instance HasOpt_3 d => Opt_3 (Expr_3 d) where - opt_3 (Lit_3 v) = Lit_3 v - opt_3 (Add_3 e1 e2) = case (opt_3 e1, opt_3 e2) of - (Lit_3 0, e2p ) -> e2p - (e1p, Lit_3 0) -> e1p - (e1p, e2p ) -> Add_3 e1p e2p - opt_3 (Ext_3 d x) = Ext_3 d (getOpt_3 d x) + opt_3 (Lit_3 v) = Lit_3 v + opt_3 (Add_3 e1 e2) = case (opt_3 e1, opt_3 e2) of + (Lit_3 0, e2p ) -> e2p + (e1p, Lit_3 0) -> e1p + (e1p, e2p ) -> Add_3 e1p e2p + opt_3 (Ext_3 d x) = Ext_3 d (getOpt_3 d x) \end{lstHaskellLhstex} Replicating this for the \haskelllhstexinline{Opt_3} instance of \haskelllhstexinline{Sub_3} seems a clear-cut task at first glance. \begin{lstHaskellLhstex} instance HasOpt_3 d => Opt_3 (Sub_3 d) where - opt_3 (Sub_3 e1 e2) = case (opt_3 e1, opt_3 e2) of - (e1p, Lit_3 0) -> e1p - (e1p, e2p ) -> Sub_3 e1p e2p + opt_3 (Sub_3 e1 e2) = case (opt_3 e1, opt_3 e2) of + (e1p, Lit_3 0) -> e1p + (e1p, e2p ) -> Sub_3 e1p e2p \end{lstHaskellLhstex} Unsurprisingly, this code is rejected by the compiler. @@ -428,22 +434,23 @@ It should be noted that a loopback case is \emph{only} required if the transform This changes the \haskelllhstexinline{Sub_3} data type as follows. \begin{lstHaskellLhstex} -data Sub_4 d = Sub_4 (Expr_4 d) (Expr_4 d) - | SubLoop_4 (Expr_4 d) +data Sub_4 d + = Sub_4 (Expr_4 d) (Expr_4 d) + | SubLoop_4 (Expr_4 d) instance HasEval_4 d => Eval_4 (Sub_4 d) where - eval_4 (Sub_4 e1 e2) = eval_4 e1 - eval_4 e2 - eval_4 (SubLoop_4 e1) = eval_4 e1 + eval_4 (Sub_4 e1 e2) = eval_4 e1 - eval_4 e2 + eval_4 (SubLoop_4 e1) = eval_4 e1 \end{lstHaskellLhstex} With this loopback case in the toolbox, the following \haskelllhstexinline{Sub} instance optimises away subtraction with zero literals. \begin{lstHaskellLhstex} instance HasOpt_4 d => Opt_4 (Sub_4 d) where - opt_4 (Sub_4 e1 e2) = case (opt_4 e1, opt_4 e2) of - (e1p, Lit_4 0) -> SubLoop_4 e1p - (e1p, e2p ) -> Sub_4 e1p e2p - opt_4 (SubLoop_4 e) = SubLoop_4 (opt_4 e) + opt_4 (Sub_4 e1 e2) = case (opt_4 e1, opt_4 e2) of + (e1p, Lit_4 0) -> SubLoop_4 e1p + (e1p, e2p ) -> Sub_4 e1p e2p + opt_4 (SubLoop_4 e) = SubLoop_4 (opt_4 e) \end{lstHaskellLhstex} \subsection{Pattern matching} @@ -455,17 +462,19 @@ To enable dynamic typing support, the \haskelllhstexinline{Typeable} type class As a result, the \haskelllhstexinline{Typeable} type class constraints are added to the quantified type variable \haskelllhstexinline{x} of the \haskelllhstexinline{Ext_4} constructor and to \haskelllhstexinline{d}s in the smart constructors. \begin{lstHaskellLhstex} -data Expr_4 d = Lit_4 Int - | Add_4 (Expr_4 d) (Expr_4 d) - | forall x. Typeable x => Ext_4 (d x) x +data Expr_4 d + = Lit_4 Int + | Add_4 (Expr_4 d) (Expr_4 d) + | forall x. Typeable x => Ext_4 (d x) x \end{lstHaskellLhstex} First let us add negation to the language by defining a datatype representing it. Negation elimination requires the removal of negation constructors, so a convolution constructor is defined as well. \begin{lstHaskellLhstex} -data Neg_4 d = Neg_4 (Expr_4 d) - | NegLoop_4 (Expr_4 d) +data Neg_4 d + = Neg_4 (Expr_4 d) + | NegLoop_4 (Expr_4 d) neg_4 :: (Typeable d, GDict (d (Neg_4 d))) => Expr_4 d -> Expr_4 d neg_4 e = Ext_4 gdict (Neg_4 e) @@ -475,12 +484,12 @@ The evaluation and printer instances for the \haskelllhstexinline{Neg_4} datatyp \begin{lstHaskellLhstex} instance HasEval_4 d => Eval_4 (Neg_4 d) where - eval_4 (Neg_4 e) = negate (eval_4 e) - eval_4 (NegLoop_4 e) = eval_4 e + eval_4 (Neg_4 e) = negate (eval_4 e) + eval_4 (NegLoop_4 e) = eval_4 e instance HasPrint_4 d => Print_4 (Neg_4 d) where - print_4 (Neg_4 e) = "(~" ++ print_4 e ++ ")" - print_4 (NegLoop_4 e) = print_4 e + print_4 (Neg_4 e) = "(~" ++ print_4 e ++ ")" + print_4 (NegLoop_4 e) = print_4 e \end{lstHaskellLhstex} The \haskelllhstexinline{Opt_4} instance contains the interesting bit. @@ -489,14 +498,14 @@ If the sub expression is again a negation, something that can only be found out \begin{lstHaskellLhstex} instance (Typeable d, GDict (d (Neg_4 d)), HasOpt_4 d) => Opt_4 (Neg_4 d) where - opt_4 (Neg_4 (Add_4 e1 e2)) - = NegLoop_4 (Add_4 (opt_4 (neg_4 e1)) (opt_4 (neg_4 e2))) - opt_4 (Neg_4 (Ext_4 d x)) - = case fromDynamic (toDyn (getOpt_4 d x)) of - Just (Neg_4 e) -> NegLoop_4 e - _ -> Neg_4 (Ext_4 d (getOpt_4 d x)) - opt_4 (Neg_4 e) = Neg_4 (opt_4 e) - opt_4 (NegLoop_4 e) = NegLoop_4 (opt_4 e) + opt_4 (Neg_4 (Add_4 e1 e2)) + = NegLoop_4 (Add_4 (opt_4 (neg_4 e1)) (opt_4 (neg_4 e2))) + opt_4 (Neg_4 (Ext_4 d x)) + = case fromDynamic (toDyn (getOpt_4 d x)) of + Just (Neg_4 e) -> NegLoop_4 e + _ -> Neg_4 (Ext_4 d (getOpt_4 d x)) + opt_4 (Neg_4 e) = Neg_4 (opt_4 e) + opt_4 (NegLoop_4 e) = NegLoop_4 (opt_4 e) \end{lstHaskellLhstex} Loopback cases do make cross-extensional pattern matching less modular in general. @@ -514,12 +523,12 @@ Luckily, a solution is readily at hand: introduce an ad-hoc combination semantic data OptPrintDict_4 v = OPD_4 (OptDict_4 v) (PrintDict_4 v) instance HasOpt_4 OptPrintDict_4 where - getOpt_4 (OPD_4 v _) = getOpt_4 v + getOpt_4 (OPD_4 v _) = getOpt_4 v instance HasPrint_4 OptPrintDict_4 where - getPrint_4 (OPD_4 _ v) = getPrint_4 v + getPrint_4 (OPD_4 _ v) = getPrint_4 v instance (Opt_4 v, Print_4 v) => GDict (OptPrintDict_4 v) where - gdict = OPD_4 gdict gdict + gdict = OPD_4 gdict gdict \end{lstHaskellLhstex} And this allows us to write \haskelllhstexinline{print_4 (opt_4 e1)} resulting in \verb|"((~42)+(~38))"| when \haskelllhstexinline{e1} represents $(\sim(42+38))-0$ and is thus defined as follows. @@ -539,9 +548,6 @@ e3 :: (Typeable d, GDict (d (Neg_4 d)), GDict (d (Sub_4 d))) => Expr_4 d e3 = neg_4 (Lit_4 42 `sub_4` Lit_4 38) `Add_4` Lit_4 1 \end{lstHaskellLhstex} -\section{Chaining (reprise)}\label{sec:classy_reprise} -\todo{\ldots} - \section{Generalised algebraic data types}% Generalised algebraic data types (GADTs) are enriched data types that allow the type instantiation of the constructor to be explicitly defined~\citep{cheney_first-class_2003,hinze_fun_2003}. Leveraging GADTs, deeply embedded DSLs can be made statically type safe even when different value types are supported. @@ -558,26 +564,26 @@ The omitted definitions and implementation can be found in \cref{sec:cde:appendi \begin{lstHaskellLhstex} data Expr_g d a where - Lit_g :: Show a => a -> Expr_g d a - Add_g :: (Eq a, Num a) => Expr_g d a -> Expr_g d a -> Expr_g d a - Ext_g :: Typeable x => d x -> x a -> Expr_g d a + Lit_g :: Show a => a -> Expr_g d a + Add_g :: (Eq a, Num a) => Expr_g d a -> Expr_g d a -> Expr_g d a + Ext_g :: Typeable x => d x -> x a -> Expr_g d a data Neg_g d a where - Neg_g :: (Typeable a, Num a) => Expr_g d a -> Neg_g d a - NegLoop_g :: Expr_g d a -> Neg_g d a + Neg_g :: (Typeable a, Num a) => Expr_g d a -> Neg_g d a + NegLoop_g :: Expr_g d a -> Neg_g d a data Not_g d a where - Not_g :: Expr_g d Bool -> Not_g d Bool - NotLoop_g :: Expr_g d a -> Not_g d a + Not_g :: Expr_g d Bool -> Not_g d Bool + NotLoop_g :: Expr_g d a -> Not_g d a \end{lstHaskellLhstex} The smart constructors for the language extensions inherit the class constraints of their data types and include a \haskelllhstexinline{Typeable} constraint on the \haskelllhstexinline{d} type variable for it to be usable in the \haskelllhstexinline{Ext_g} constructor as can be seen in the smart constructor for \haskelllhstexinline{Neg_g}: \begin{lstHaskellLhstex} neg_g :: (Typeable d, GDict (d (Neg_g d)), Typeable a, Num a) => - Expr_g d a -> Expr_g d a + Expr_g d a -> Expr_g d a neg_g e = Ext_g gdict (Neg_g e) not_g :: (Typeable d, GDict (d (Not_g d))) => - Expr_g d Bool -> Expr_g d Bool + Expr_g d Bool -> Expr_g d Bool not_g e = Ext_g gdict (Not_g e) \end{lstHaskellLhstex} @@ -586,11 +592,11 @@ All occurrences of \haskelllhstexinline{v} are now parametrised by type variable \begin{lstHaskellLhstex} class Eval_g v where - eval_g :: v a -> a + eval_g :: v a -> a class Print_g v where - print_g :: v a -> String + print_g :: v a -> String class Opt_g v where - opt_g :: v a -> v a + opt_g :: v a -> v a \end{lstHaskellLhstex} Now that the shape of the type classes has changed, the dictionary data types and the type classes need to be adapted as well. @@ -601,9 +607,9 @@ Concretely, for the evaluatior this results in the following definitions: \begin{lstHaskellLhstex} newtype EvalDict_g v = EvalDict_g (forall a. v a -> a) class HasEval_g d where - getEval_g :: d v -> v a -> a + getEval_g :: d v -> v a -> a instance HasEval_g EvalDict_g where - getEval_g (EvalDict_g e) = e + getEval_g (EvalDict_g e) = e \end{lstHaskellLhstex} The \haskelllhstexinline{GDict} type class is general enough, so the instances can remain the same. @@ -611,19 +617,19 @@ The \haskelllhstexinline{Eval_g} instance of \haskelllhstexinline{GDict} looks a \begin{lstHaskellLhstex} instance Eval_g v => GDict (EvalDict_g v) where - gdict = EvalDict_g eval_g + gdict = EvalDict_g eval_g \end{lstHaskellLhstex} Finally, the implementations for the instances can be ported without complication show using the optimisation instance of \haskelllhstexinline{Not_g}: \begin{lstHaskellLhstex} instance (Typeable d, GDict (d (Not_g d)), HasOpt_g d) => Opt_g (Not_g d) where - opt_g (Not_g (Ext_g d x)) - = case fromDynamic (toDyn (getOpt_g d x)) :: Maybe (Not_g d Bool) of - Just (Not_g e) -> NotLoop_g e - _ -> Not_g (Ext_g d (getOpt_g d x)) - opt_g (Not_g e) = Not_g (opt_g e) - opt_g (NotLoop_g e) = NotLoop_g (opt_g e) + opt_g (Not_g (Ext_g d x)) + = case fromDynamic (toDyn (getOpt_g d x)) :: Maybe (Not_g d Bool) of + Just (Not_g e) -> NotLoop_g e + _ -> Not_g (Ext_g d (getOpt_g d x)) + opt_g (Not_g e) = Not_g (opt_g e) + opt_g (NotLoop_g e) = NotLoop_g (opt_g e) \end{lstHaskellLhstex} \section{Conclusion}% @@ -636,7 +642,7 @@ These extensions are brought together in a special extension constructor residin This extension case is overloaded by the language construct using a data type containing the class dictionary. As a result, orthogonal extension is possible for language constructs and semantics using only little syntactic overhead or type annotations. The basic technique only requires---well established through history and relatively standard---existential data types. -However, if needed, the technique generalises to GADTs as well, adding rank-2 types to the list of type system requirements as well. +However, if needed, the technique generalises to \glspl{GADT} as well, adding rank-2 types to the list of type system requirements as well. Finally, the abstract syntax tree remains observable which makes it suitable for intensional analyses, albeit using occasional dynamic typing for truly cross-extensional transformations. Defining reusable expressions overloaded in semantics or using multiple semantics on a single expression requires some boilerplate still, getting around this remains future work. @@ -685,21 +691,21 @@ Furthermore, I would like to thank Pieter and Rinus for the fruitful discussions \label{sec:cde:appendix} \begin{lstHaskellLhstex}[caption={Data type definitions.}] data Sub_g d a where - Sub_g :: (Eq a, Num a) => Expr_g d a -> Expr_g d a -> Sub_g d a - SubLoop_g :: Expr_g d a -> Sub_g d a + Sub_g :: (Eq a, Num a) => Expr_g d a -> Expr_g d a -> Sub_g d a + SubLoop_g :: Expr_g d a -> Sub_g d a data Eq_g d a where - Eq_g :: (Typeable a, Eq a) => Expr_g d a -> Expr_g d a -> Eq_g d Bool - EqLoop_g :: Expr_g d a -> Eq_g d a + Eq_g :: (Typeable a, Eq a) => Expr_g d a -> Expr_g d a -> Eq_g d Bool + EqLoop_g :: Expr_g d a -> Eq_g d a \end{lstHaskellLhstex} \begin{lstHaskellLhstex}[caption={Smart constructions.}] sub_g :: (Typeable d, GDict (d (Sub_g d)), Eq a, Num a) => - Expr_g d a -> Expr_g d a -> Expr_g d a + Expr_g d a -> Expr_g d a -> Expr_g d a sub_g e1 e2 = Ext_g gdict (Sub_g e1 e2) eq_g :: (Typeable d, GDict (d (Eq_g d)), Eq a, Typeable a) => - Expr_g d a -> Expr_g d a -> Expr_g d Bool + Expr_g d a -> Expr_g d a -> Expr_g d Bool eq_g e1 e2 = Ext_g gdict (Eq_g e1 e2) \end{lstHaskellLhstex} @@ -707,101 +713,196 @@ eq_g e1 e2 = Ext_g gdict (Eq_g e1 e2) newtype PrintDict_g v = PrintDict_g (forall a.v a -> String) class HasPrint_g d where - getPrint_g :: d v -> v a -> String + getPrint_g :: d v -> v a -> String instance HasPrint_g PrintDict_g where - getPrint_g (PrintDict_g e) = e + getPrint_g (PrintDict_g e) = e newtype OptDict_g v = OptDict_g (forall a.v a -> v a) class HasOpt_g d where - getOpt_g :: d v -> v a -> v a + getOpt_g :: d v -> v a -> v a instance HasOpt_g OptDict_g where - getOpt_g (OptDict_g e) = e + getOpt_g (OptDict_g e) = e \end{lstHaskellLhstex} \begin{lstHaskellLhstex}[caption={\texorpdfstring{\haskelllhstexinline{GDict}}{GDict} instances}] instance Print_g v => GDict (PrintDict_g v) where - gdict = PrintDict_g print_g + gdict = PrintDict_g print_g instance Opt_g v => GDict (OptDict_g v) where - gdict = OptDict_g opt_g + gdict = OptDict_g opt_g \end{lstHaskellLhstex} \begin{lstHaskellLhstex}[caption={Evaluator instances}] instance HasEval_g d => Eval_g (Expr_g d) where - eval_g (Lit_g v) = v - eval_g (Add_g e1 e2) = eval_g e1 + eval_g e2 - eval_g (Ext_g d x) = getEval_g d x + eval_g (Lit_g v) = v + eval_g (Add_g e1 e2) = eval_g e1 + eval_g e2 + eval_g (Ext_g d x) = getEval_g d x instance HasEval_g d => Eval_g (Sub_g d) where - eval_g (Sub_g e1 e2) = eval_g e1 - eval_g e2 - eval_g (SubLoop_g e) = eval_g e + eval_g (Sub_g e1 e2) = eval_g e1 - eval_g e2 + eval_g (SubLoop_g e) = eval_g e instance HasEval_g d => Eval_g (Neg_g d) where - eval_g (Neg_g e) = negate (eval_g e) - eval_g (NegLoop_g e) = eval_g e + eval_g (Neg_g e) = negate (eval_g e) + eval_g (NegLoop_g e) = eval_g e instance HasEval_g d => Eval_g (Eq_g d) where - eval_g (Eq_g e1 e2) = eval_g e1 == eval_g e2 - eval_g (EqLoop_g e) = eval_g e + eval_g (Eq_g e1 e2) = eval_g e1 == eval_g e2 + eval_g (EqLoop_g e) = eval_g e instance HasEval_g d => Eval_g (Not_g d) where - eval_g (Not_g e) = not (eval_g e) - eval_g (NotLoop_g e) = eval_g e + eval_g (Not_g e) = not (eval_g e) + eval_g (NotLoop_g e) = eval_g e \end{lstHaskellLhstex} \begin{lstHaskellLhstex}[caption={Printer instances}] instance HasPrint_g d => Print_g (Expr_g d) where - print_g (Lit_g v) = show v - print_g (Add_g e1 e2) = "(" ++ print_g e1 ++ "+" ++ print_g e2 ++ ")" - print_g (Ext_g d x) = getPrint_g d x + print_g (Lit_g v) = show v + print_g (Add_g e1 e2) = "(" ++ print_g e1 ++ "+" ++ print_g e2 ++ ")" + print_g (Ext_g d x) = getPrint_g d x instance HasPrint_g d => Print_g (Sub_g d) where - print_g (Sub_g e1 e2) = "(" ++ print_g e1 ++ "-" ++ print_g e2 ++ ")" - print_g (SubLoop_g e) = print_g e + print_g (Sub_g e1 e2) = "(" ++ print_g e1 ++ "-" ++ print_g e2 ++ ")" + print_g (SubLoop_g e) = print_g e instance HasPrint_g d => Print_g (Neg_g d) where - print_g (Neg_g e) = "(negate " ++ print_g e ++ ")" - print_g (NegLoop_g e) = print_g e + print_g (Neg_g e) = "(negate " ++ print_g e ++ ")" + print_g (NegLoop_g e) = print_g e instance HasPrint_g d => Print_g (Eq_g d) where - print_g (Eq_g e1 e2) = "(" ++ print_g e1 ++ "==" ++ print_g e2 ++ ")" - print_g (EqLoop_g e) = print_g e + print_g (Eq_g e1 e2) = "(" ++ print_g e1 ++ "==" ++ print_g e2 ++ ")" + print_g (EqLoop_g e) = print_g e instance HasPrint_g d => Print_g (Not_g d) where - print_g (Not_g e) = "(not " ++ print_g e ++ ")" - print_g (NotLoop_g e) = print_g e + print_g (Not_g e) = "(not " ++ print_g e ++ ")" + print_g (NotLoop_g e) = print_g e \end{lstHaskellLhstex} \begin{lstHaskellLhstex}[caption={Optimisation instances}] instance HasOpt_g d => Opt_g (Expr_g d) where - opt_g (Lit_g v) = Lit_g v - opt_g (Add_g e1 e2) = case (opt_g e1, opt_g e2) of - (Lit_g 0, e2p ) -> e2p - (e1p, Lit_g 0) -> e1p - (e1p, e2p ) -> Add_g e1p e2p - opt_g (Ext_g d x) = Ext_g d (getOpt_g d x) + opt_g (Lit_g v) = Lit_g v + opt_g (Add_g e1 e2) = case (opt_g e1, opt_g e2) of + (Lit_g 0, e2p ) -> e2p + (e1p, Lit_g 0) -> e1p + (e1p, e2p ) -> Add_g e1p e2p + opt_g (Ext_g d x) = Ext_g d (getOpt_g d x) instance HasOpt_g d => Opt_g (Sub_g d) where - opt_g (Sub_g e1 e2) = case (opt_g e1, opt_g e2) of - (e1p, Lit_g 0) -> SubLoop_g e1p - (e1p, e2p ) -> Sub_g e1p e2p - opt_g (SubLoop_g e) = SubLoop_g (opt_g e) + opt_g (Sub_g e1 e2) = case (opt_g e1, opt_g e2) of + (e1p, Lit_g 0) -> SubLoop_g e1p + (e1p, e2p ) -> Sub_g e1p e2p + opt_g (SubLoop_g e) = SubLoop_g (opt_g e) instance (Typeable d, GDict (d (Neg_g d)), HasOpt_g d) => Opt_g (Neg_g d) where - opt_g (Neg_g (Add_g e1 e2)) - = NegLoop_g (Add_g (opt_g (neg_g e1)) (opt_g (neg_g e2))) - opt_g (Neg_g (Ext_g d x)) - = case fromDynamic (toDyn (getOpt_g d x)) of - Just (Neg_g e) -> NegLoop_g e - _ -> Neg_g (Ext_g d (getOpt_g d x)) - opt_g (Neg_g e) = Neg_g (opt_g e) - opt_g (NegLoop_g e) = NegLoop_g (opt_g e) + opt_g (Neg_g (Add_g e1 e2)) + = NegLoop_g (Add_g (opt_g (neg_g e1)) (opt_g (neg_g e2))) + opt_g (Neg_g (Ext_g d x)) + = case fromDynamic (toDyn (getOpt_g d x)) of + Just (Neg_g e) -> NegLoop_g e + _ -> Neg_g (Ext_g d (getOpt_g d x)) + opt_g (Neg_g e) = Neg_g (opt_g e) + opt_g (NegLoop_g e) = NegLoop_g (opt_g e) instance HasOpt_g d => Opt_g (Eq_g d) where - opt_g (Eq_g e1 e2) = Eq_g (opt_g e1) (opt_g e2) - opt_g (EqLoop_g e) = EqLoop_g (opt_g e) + opt_g (Eq_g e1 e2) = Eq_g (opt_g e1) (opt_g e2) + opt_g (EqLoop_g e) = EqLoop_g (opt_g e) +\end{lstHaskellLhstex} + +\section{Chaining semantics (reprise)}\label{sec:classy_reprise} +\todo{Verbeteren} +One of the unique selling points of this novel \gls{DSL} embedding technique is that it, in its basic form, does not require advanced type system extensions. +However, while generalising the technique to \glspl{GADT} arguably unleashes a cesspool of \emph{unsafe} compiler extensions. +If we are willing to work with extensions, much of the boilerplate can be either generated or omitted entirely. + +The \gls{DSL} datatype is parametrised by a type variable providing a witness to the view on the language. +Using constraint kinds, we can make these witnesses explicit, tying into \gls{HASKELL}'s type system immediately. +Furthermore, when resorting to the \GHCmod{DataKinds} and \GHCmod{PolyKinds} extensions, this constraint does not necessarily has to be a single constraint, using a \haskelllhstexinline{Record} auxiliary type, we can encode list of witnesses. +The data type for this list of witnesses is \haskelllhstexinline{Record}. +Record is parametrised by two type variables, the first type variable (\haskelllhstexinline{dt}) is the data type on which the constraints can be applied. +The second type variable (\haskelllhstexinline{clist}) is the list of constraints itself. +It is not just a list of \haskelllhstexinline{Constraint} but it is a list containing constraint constructors that will, when given a type of kind \haskelllhstexinline{k}, produce a constraint. +This means that when \haskelllhstexinline{Cons} is pattern matched, the type class constraint for \haskelllhstexinline{c dt} can be solved by the compiler. + +\begin{lstHaskellLhstex}[caption={Data type for a list of constraints}] +data Record (dt :: k) (clist :: [k -> Constraint]) where + Nil :: Record dt '[] + Cons :: c dt => Record dt cs -> Record dt (c ': cs) +\end{lstHaskellLhstex} + +To incorporate this type in the \haskelllhstexinline{Expr} type, the \haskelllhstexinline{Ext} constructor changes as follows: + +\begin{lstHaskellLhstex}[caption={Data type for a list of constraints}] +data Expr c + = Lit Int + | Add (Expr c) (Expr c) + | Ext (Record x c) x +\end{lstHaskellLhstex} + +Furthermore, we define a type class that allows us to extract explicit dictionaries \haskelllhstexinline{Dict} from these records if the constraint can is present in the list. + +\begin{lstHaskellLhstex}[caption={Membership functions for constraints}] +class c `In` cs where + project :: Record dt cs -> Dict (c dt) +instance {-# OVERLAPPING #-} c `In` (c ': cs) where + project (Cons _) = Dict +instance {-# OVERLAPPING #-} c `In` cs => c `In` (b ': cs) where + project (Cons xs) = project xs +\end{lstHaskellLhstex} + +Finally, creating these \haskelllhstexinline{Record} witnesses is a chore so this can be automated as well using a \haskelllhstexinline{CreateRecord} type class that will create a record structure cell by cell if and only if all type class constraints are available. + +\begin{lstHaskellLhstex}[caption={Membership functions for constraints}] +class CreateRecord dt c where + createRecord :: Record dt c +instance CreateRecord d '[] where + createRecord = Nil +instance (c (d c0), CreateRecord (d c0) cs) => + CreateRecord (d c0) (c ': cs) where + createRecord = Cons createRecord +\end{lstHaskellLhstex} + +The class constraints for the interpretation instances can now be greatly simplified, as shown in the evaluation instance for \haskelllhstexinline{Expr}. +The implementation remains the same, only that for the extension case, a trick needs to be applied to convince the compiler of the correct instances. +Using \haskelllhstexinline{`In`}'s \haskelllhstexinline{project} function, a dictionary can be brought into scope. +This dictionary can then subsequently be used to apply the type class function on the extension using the \haskelllhstexinline{withDict} function from the \haskelllhstexinline{Data.Constraint} library\footnote{\haskelllhstexinline{withDict :: Dict c -> (c => r) -> r}}. +The \GHCmod{ScopedTypeVariables} extension is used to make sure the existentially quantified type variable for the extension is matched to the type of the dictionary. + +\begin{lstHaskellLhstex}[caption={Evaluation instance for the main data type}] +class Eval v where eval :: v -> Int + +instance Eval `In` s => Eval (Expr s) where + eval (Lit i) = i + eval (Add l r) = eval l + eval r + eval (Ext r (e :: x)) = withDict (project r :: Dict (Eval x)) eval e +\end{lstHaskellLhstex} + +Smart constructors need to be adapted as well, as can be seen from the smart constructor \haskelllhstexinline{subst}. + +\begin{lstHaskellLhstex}[caption={Substitution smart constructor}] +subst :: (Typeable c, CreateRecord (Subt c) c) => Expr c -> Expr c -> Expr c +subst l r = Ext createRecord (l `Subt` r) +\end{lstHaskellLhstex} + +Finally, defining terms in the language is can be done immediately if the interpretations are known. +For example, if we want to print the term $~(~(42+(38-4)))$, we can define it as follows. + +\begin{lstHaskellLhstex}[caption={Substitution smart constructor}] +e0 :: Expr '[Print,Opt] +e0 = neg (neg (Lit 42 `Add` (Lit 38 `subt` Lit 4))) +\end{lstHaskellLhstex} + +It is also possible to define terms in the \gls{DSL} as being overloaded in the interpretation. +This does require enumerating all the \haskelllhstexinline{CreateRecord} type classes for every extension. +At the call site, the concrete list of constraints must be known. + +\begin{lstHaskellLhstex}[caption={Substitution smart constructor}] +e1 :: (Typeable c + , CreateRecord (Neg c) c + , CreateRecord (Subst c) c) + => Expr c +e1 = neg (neg (Lit 42 `Add` (Lit 38 `subt` Lit 4))) \end{lstHaskellLhstex} \end{subappendices} diff --git a/dsl/dsl_techniques.tex b/dsl/dsl_techniques.tex index 7791c82..29e49c8 100644 --- a/dsl/dsl_techniques.tex +++ b/dsl/dsl_techniques.tex @@ -5,40 +5,64 @@ \pagenumbering{arabic} }{} -\chapter{DSL embedding techniques}% +\chapter{\texorpdfstring{\Acrshort{DSL}}{DSL} embedding techniques}% \label{chp:dsl_embedding_techniques}% An \gls{EDSL} is a language embedded in a host language created for a specific domain\todo{citation needed?}. -\glspl{EDSL} can have one or more backends or views. -Commonly used views are pretty printing, compiling, simulating, verifying and proving the program. -Embedding techniques can be characterised by at least the following properties: -\begin{enumerate} - \item Extendibility of constructs - \item Extendibility of views - \item Type safety, the type of a term in the language is typed in the host language - \item Support for well typed variables - \item Intensional analyses - \item Support for partial views, not all views need to support all constructs -\end{enumerate} - -There are two main techniques for creating embedding \glspl{DSL}, deep---also called tagged---embedding and shallow---also called tagless---embedding. +Properties such as referential transparency, minimal syntax and \glspl{ADT} make \gls{FP} languages excellent candidates for hosting \glspl{EDSL}. +Terms in an \glspl{EDSL} can have multiple interpretations---also called backends or views---i.e.\ preferably, a term in the \gls{DSL} is just an interface. +Commonly used intepretations are printing, compiling, simulating, optimising, verifying, proving the program\etc. +There are two main flavours of embedding \glspl{DSL}, deep---also called tagged---embedding and shallow---also called tagless---embedding. + +\Cref{sec:deep_embedding} shows the basics of deep embedding. +\Cref{sec:shallow_embedding} shows the basics of shallow embedding including tagless embedding. +\Cref{sec:compare_embedding} compares the embedding technique. + +Both flavours have their strengths and weaknesses and both flavours can be improved in order to mitigate (some of the) downsides. + +\begin{table}[ht] + \begin{threeparttable}[b] + \caption{Comparison of embedding techniques, adapted from \citet[Sec.~3.6]{sun_compositional_2022}}% + \label{tbl:dsl_comparison} + \begin{tabular}{lllllll} + \toprule + & Shallow & Deep & Hybrid & Poly & Comp. & Classy\\ + \midrule + Transcoding free & yes & yes & no & yes & yes & yes\\ + Linguistic reuse & yes & no & partly\tnote{1} & yes & yes & no?\\ + Extend constructs & yes & no & partly\tnote{1} & yes & yes & yes\\ + Extend interpretations & no & yes & yes & yes & yes & yes\\ + Transformations & no & yes & yes & maybe\tnote{2} & maybe\tnote{2} & yes\\ + Modular dependencies & no & maybe & maybe & yes & yes & yes\\ + Nested pattern matching & no & yes & yes & no & maybe & maybe\tnote{3}\\ + Type safe & yes & maybe & no & yes & yes & yes\\ + \bottomrule + \end{tabular} + \begin{tablenotes} + \item [1] Only in the shallowly embedded part. + \item [2] Transformations require some ingenuity and are sometimes awkward to write. + \item [3] It requires some---safe---form of dynamic typing. + \end{tablenotes} + \end{threeparttable} +\end{table} + In the following sections both techniques are explained. As an example, a simple language with integers, booleans and some arithmetic operators is used as a running example. -\section{Deep embedding} -A deep \gls{EDSL} is a language represented as data in the host language. -Views are functions that transform \emph{something} to the datatype or the other way around. +\section{Deep embedding}\label{sec:deep_embedding} +In a deeply embedded \gls{DSL}, the language terms are represented as data type{(s)} in the host language. +Therefore, interpretations of the terms are functions that operate on these data types. Definition~\ref{lst:exdeep} shows an implementation for the example \gls{DSL}. \begin{lstHaskell}[label={lst:exdeep},caption={A deeply embedded expression \gls{DSL}.}] data Value = I Int | B Bool data Expr - = Lit Value - | Plus Expr Expr - | Eq Expr Expr + = Lit Value + | Plus Expr Expr + | Eq Expr Expr deriving Show \end{lstHaskell} -Implementing a printer for the language is straightforward: +Implementing a printer for the language is straightforward, we just define a function that transforms the term to a string. \begin{lstHaskell}[caption={A printer for the deeply embedded expression \gls{DSL}.}] print :: Expr -> String @@ -47,15 +71,17 @@ print (Plus l r) = "(" ++ print l ++ "+" ++ print r ++ ")" print (Eq l r) = "(" ++ print l ++ "==" ++ print r ++ ")" \end{lstHaskell} -Adding a construct, for example subtraction reveals the Achilles' heel of deep embedding, namely that we need to revisit the original data type \emph{and} all the existing views. -I.e.\ we need to add \haskellinline{| Sub Expr Expr} to the \haskellinline{Expr} data type and \haskellinline{print (Sub l r) = ...} to the \haskellinline{print} view. +Adding a construct---for example subtraction---reveals the Achilles' heel of deep embedding, namely that we need to revisit the original data type \emph{and} all the existing views. +I.e.\ we need to add \haskellinline{| Sub Expr Expr} to the \haskellinline{Expr} data type. +Furthermore, we need to add \haskellinline{print (Sub l r) = ...} to the \haskellinline{print} view in order to not end up with a partial function. This limitation can be overcome by lifting the views to classes (See \cref{chp:classy_deep_embedding}). Implementing an evaluator for the language is possible without touching any original code, we just add a function operating on the \haskellinline{Expr} data type. -To store variables, it has an extra environment argument\todo{cite while}. +To store variables, it has an extra environment argument. Here another downside of basic deep embedding arises immediately, the expressions are not typed, and therefore there has to be some type checking in the evaluation code. +Luckily this problem can be overcome by switching from regular \glspl{ADT} to \glspl{GADT}, resulting in the following data type and evaluator. -\begin{lstHaskell}[] +\begin{lstHaskell}[caption={An evaluator for the deeply embedded expression \gls{DSL}.}] eval :: Expr -> Value eval (Lit i) = i eval (Plus l r) = case (eval l, eval r) of @@ -67,9 +93,20 @@ eval (Eq l r) = case (eval l, eval r) of (l, r) -> error ("Can't compare " ++ show l ++ " to " ++ show r) \end{lstHaskell} -Luckily this problem can be overcome by switching from regular \glspl{ADT} to \glspl{GADT}, resulting in the following data type and evaluator. +\subsection{Deep embedding with \texorpdfstring{\acrshortpl{GADT}}{GADTs}} +Deep embedding has the advantage that it is easy to build and views are easy to add. +On the downside, the expressions created with this language are not necessarily type-safe. +In the given language it is possible to create an expression such as \haskellinline{Plus (LitI 4) (LitB True)} that adds a boolean to an integer. +Extending the \gls{ADT} is easy and convenient but extending the views accordingly is tedious since it has to be done individually for all views. -\begin{lstHaskell}[] +The first downside of this type of \gls{EDSL} can be overcome by using \glspl{GADT}~\citep{cheney_first-class_2003}. +\Cref{lst:exdeepgadt} shows the same language, but type-safe with a \gls{GADT}. +\glspl{GADT} are not supported in the current version of \gls{CLEAN} and therefore the syntax is hypothetical (See \todo{insert link to appendix}). +However, it has been shown that \glspl{GADT} can be simulated using bimaps or projection pairs~\citep{cheney_first-class_2003}. +Unfortunately the lack of extendability remains a problem. +If a language construct is added, no compile time guarantee can be given that all views support it. + +\begin{lstHaskell}[label={lst:exdeepgadt},caption={A deeply embedded expression \gls{DSL} using \glspl{GADT}.}] data Expr a where Lit :: Show a => a -> Expr a Plus :: Num a => Expr a -> Expr a -> Expr a @@ -81,34 +118,14 @@ eval (Plus l r) = eval l + eval r eval (Eq l r) = eval l == eval r \end{lstHaskell} -Deep embedding has the advantage that it is easy to build and views are easy to add. -On the downside, the expressions created with this language are not necessarily type-safe. -In the given language it is possible to create an expression such as \cleaninline{Plus (LitI 4) (LitB True)} that adds a boolean to an integer. -Extending the \gls{ADT} is easy and convenient but extending the views accordingly is tedious since it has to be done individually for all views. - -The first downside of this type of \gls{EDSL} can be overcome by using \glspl{GADT}~\citep{cheney_first-class_2003}. -Example~\ref{lst:exdeepgadt} shows the same language, but type-safe with a \gls{GADT}. -\glspl{GADT} are not supported in the current version of \gls{CLEAN} and therefore the syntax is hypothetical (See \todo{insert link to appendix}). -However, it has been shown that \glspl{GADT} can be simulated using bimaps or projection pairs~\citep{cheney_first-class_2003}. -Unfortunately the lack of extendability remains a problem. -If a language construct is added, no compile time guarantee can be given that all views support it. - -\begin{lstClean}[label={lst:exdeepgadt},caption={A deeply embedded expression \gls{DSL} using \glspl{GADT}.}] -:: Expr a - = Lit a -> Expr a - | E.e: Var String -> Expr e - | Plus (Expr Int) (Expr Int) -> Expr Int - | E.e: Eq (Expr e) (Expr e) -> Expr Bool & == e -\end{lstClean} - \section{Shallow embedding} In a shallowly \gls{EDSL} all language constructs are expressed as functions in the host language. -An evaluator view for the example language then can be implemented as the code shown in Definition~\ref{lst:exshallow}. +An evaluator view for the example language then can be implemented as the code shown in \cref{lst:exshallow}. Note that much of the internals of the language can be hidden using monads. -\begin{lstClean}[label={lst:exshallow}, caption={A minimal shallow \gls{EDSL}.}] -:: Env :== (String -> Int) -:: DSL a :== (Env -> a) +\begin{lstHaskell}[label={lst:exshallow}, caption={A minimal shallow \gls{EDSL}.}] +type Env = String -> Int +type DSL a = Env -> a Lit :: a -> DSL a Lit x = \e->x @@ -116,16 +133,17 @@ Lit x = \e->x Var :: String -> DSL Int Var i = \e->retrEnv e i -Plus :: (DSL Int) (DSL Int) -> DSL Int +Plus :: DSL Int -> DSL Int -> DSL Int Plus x y = \e->x e + y e -Eq :: (DSL a) (DSL a) -> DSL Bool | == a +Eq :: Eq a => DSL a -> DSL a -> DSL Bool Eq x y = \e->x e == y e -\end{lstClean} +\end{lstHaskell} -The advantage of shallowly embedding a language in a host language is its extendability. +One of the advantages of shallowly embedding a language in a host language is its extendability. It is very easy to add functionality because the compile time checks of the host language guarantee whether or not the functionality is available when used. Moreover, the language is type safe as it is directly typed in the host language, i.e.\ \cleaninline{Lit True +. Lit 4} is rejected. +Another advantage is the intimate link with the host language, allowing for a lot more linguistic reuse such as the support of implicit sharing~\cite{krishnamurthi_linguistic_2001}. The downside of this method is extending the language with views. It is nearly impossible to add views to a shallowly embedded language. @@ -134,6 +152,7 @@ This will mean that every component will have to implement all views rendering i \subsection{Tagless-final embedding}\label{ssec:tagless} + \section{Comparison} \todo{cite compositional DSLs} diff --git a/dsl/first-class_datatypes.tex b/dsl/first-class_datatypes.tex index 2f3464c..4caef3b 100644 --- a/dsl/first-class_datatypes.tex +++ b/dsl/first-class_datatypes.tex @@ -343,7 +343,7 @@ This information can then be used to generate code according to the structure of Reification is done using the \haskellinline{reify :: Name -> Q Info} function. The \haskellinline{Info} type is an \gls{ADT} containing all the---known to the compiler---information about the matching type: constructors, instances, etc. -\section{Metaprogramming for generating \gls{DSL} functions} +\section{Metaprogramming for generating \texorpdfstring{\acrshort{DSL}}{DSL} functions} With the power of metaprogramming, we can generate the boilerplate code for our user-defined data types automatically at compile time. To generate the code required for the \gls{DSL}, we define the \haskellinline{genDSL} function. The type belonging to the name passed as an argument to this function is made available for the \gls{DSL} by generating the \haskellinline{typeDSL} class and view instances. diff --git a/intro/introduction.tex b/intro/introduction.tex index 87aeee6..93e541a 100644 --- a/intro/introduction.tex +++ b/intro/introduction.tex @@ -81,7 +81,7 @@ Writing idiomatic domain-specific code in an \gls{DSL} is easy but this may come \begin{figure}[ht] \centering \includestandalone{hyponymy_of_dsls} - \caption{Hyponymy of \glspl{DSL} (adapted from \cite[pg.\ 2]{mernik_extensible_2013})}% + \caption{Hyponymy of \glspl{DSL} (adapted from \citet[pg.\ 2]{mernik_extensible_2013})}% \label{fig:hyponymy_of_dsls} \end{figure} diff --git a/lstlanghaskell.sty b/lstlanghaskell.sty index 0abc301..88d13a3 100644 --- a/lstlanghaskell.sty +++ b/lstlanghaskell.sty @@ -1,10 +1,11 @@ \lstdefinelanguage[Regular]{Haskell}[]{Haskell}{% keywords={abstype,if,then,else,case,class,data,default,deriving,hiding,if,in,infix,infixl,infixr,import,instance,let,module,newtype,of,qualified,type,where,do,Q}, + morekeywords={forall}, literate=% - {forall}{{$\forall$}}1 +% {forall}{{$\forall$}}1 {\_}{{\raisebox{.15ex}{\_}}}1 {~}{{\raisebox{-.6ex}{\textasciitilde}}}1 - {\\}{{$\lambda\:$}}1 +% {\\}{{$\lambda\:$}}1 {->}{{$\shortrightarrow$}}2 {<-}{{$\shortleftarrow$}}2 {=>}{{$\Rightarrow$}}2 diff --git a/other.bib b/other.bib index 188e95c..e44a014 100644 --- a/other.bib +++ b/other.bib @@ -1803,3 +1803,16 @@ Publisher: Association for Computing Machinery}, author = {Lijnse, Bas}, year = {2022}, } + +@article{sun_compositional_2022, + title = {Compositional {Embeddings} of {Domain}-{Specific} {Languages}}, + volume = {6}, + doi = {10.1145/3563294}, + language = {en}, + number = {OOPSLA2}, + journal = {Proc. ACM Program. Lang.}, + author = {Sun, Yaozhu and Dhandhania, Utkarsh and Oliveira, Bruno C. d. S.}, + year = {2022}, + pages = {34}, + file = {Sun and Dhandhania - Compositional Embeddings of Domain-Specific Langua.pdf:/home/mrl/.local/share/zotero/storage/Y4GADQFP/Sun and Dhandhania - Compositional Embeddings of Domain-Specific Langua.pdf:application/pdf}, +} diff --git a/preamble.tex b/preamble.tex index b285f89..de54386 100644 --- a/preamble.tex +++ b/preamble.tex @@ -143,6 +143,7 @@ \usepackage{multirow} % Multirow cells \usepackage{tabularx} % Automatically wrapping tables \usepackage{longtable} % Tables spanning pages +\usepackage{threeparttable} % Tables with footnotes % Code % Pseudocode @@ -187,12 +188,12 @@ {Clean},% {[Regular]Haskell}, {[Lhs2Tex]Haskell}} -\newcommand{\cinline}[1]{\lstinline[language=c,basicstyle=\tt,postbreak=]|#1|} -\newcommand{\arduinoinline}[1]{\lstinline[language={[Arduino]C++},basicstyle=\tt,postbreak=]|#1|} -\newcommand{\pythoninline}[1]{\lstinline[language=Python,basicstyle=\tt,postbreak=]|#1|} -\newcommand{\cleaninline}[1]{\lstinline[language=Clean,basicstyle=\tt,postbreak=]|#1|} -\newcommand{\haskellinline}[1]{\lstinline[language={[Regular]Haskell},basicstyle=\tt,postbreak=]|#1|} -\newcommand{\haskelllhstexinline}[1]{\lstinline[language={[Lhs2Tex]Haskell},basicstyle=\tt,postbreak=]|#1|} +\newcommand{\cinline}[1]{\lstinline[language=c,postbreak=]|#1|} +\newcommand{\arduinoinline}[1]{\lstinline[language={[Arduino]C++},postbreak=]|#1|} +\newcommand{\pythoninline}[1]{\lstinline[language=Python,postbreak=]|#1|} +\newcommand{\cleaninline}[1]{\lstinline[language=Clean,postbreak=]|#1|} +\newcommand{\haskellinline}[1]{\lstinline[language={[Regular]Haskell},postbreak=]|#1|} +\newcommand{\haskelllhstexinline}[1]{\lstinline[language={[Lhs2Tex]Haskell},postbreak=]|#1|} %For storing listings in footnotes \newsavebox{\LstBox} % Fix list of listings title @@ -309,7 +310,7 @@ % Custom commands \newcommand{\GHCmod}[1]{\texttt{#1}} -\newcommand{\requiresGHCmod}[2][]{\footnote{Requires \GHCmod{#1} to be enabled.#2}} +\newcommand{\requiresGHCmod}[2][]{\footnote{Requires \GHCmod{#2} to be enabled. #1}} \newcommand{\etc}{{\fontfamily{cmr}\selectfont{\itshape\/\&c}}} \newcommand{\rdmentry}[6]{#1: #2 (#3): #4. #5.\ \href{https://doi.org/#6}{#6}} \newcommand{\mlubbers}{Lubbers, M.\ (Radboud University)} diff --git a/top/mtask.tex b/top/mtask.tex index 013e84a..17ee2b0 100644 --- a/top/mtask.tex +++ b/top/mtask.tex @@ -56,7 +56,8 @@ blink = >>|. delay (lit 500) >>|. writeD d2 false >>|. delay (lit 500) - )}\end{lstClean} + ) +}\end{lstClean} \end{subfigure} \end{figure} @@ -628,7 +629,6 @@ This task sets up the communication, exchanges specifications, handles errors an class channelSync a :: a (sds () Channels Channels) -> Task () | RWShared sds withDevice :: (a (MTDevice -> Task b) -> Task b) | iTask b & channelSync, iTask a - \end{lstClean} \section{Lifting tasks} diff --git a/top/mtask_integration.tex b/top/mtask_integration.tex index f0f32d8..efd0659 100644 --- a/top/mtask_integration.tex +++ b/top/mtask_integration.tex @@ -1,15 +1,70 @@ \documentclass[tikz]{standalone} -\usetikzlibrary{positioning} +\usetikzlibrary{positioning,shapes.geometric,shapes.symbols,shadows,fit} \begin{document} -\begin{tikzpicture}[nodes={draw,minimum width=7.5em}] - \node (dsl) {domain-specific language}; - \node (sta) [below=of dsl,xshift=-3.75em] {standalone}; - \node (emb) [right=of sta] {embedded}; - \node (het) [below=of emb,xshift=-3.75em] {heterogeneous}; - \node (hom) [right=of het] {homogeneous}; - \draw [->] (dsl) -- (sta); - \draw [->] (dsl) -- (emb); - \draw [->] (emb) -- (het); - \draw [->] (emb) -- (hom); +\begin{tikzpicture}[% + node distance=1em, + task/.style={regular polygon,regular polygon sides=3,text width=2.5em,draw}, + share/.style={cylinder,shape border rotate=90,aspect=.3,text width=2.5em,text height=1em,draw}, + client/.style={cloud,aspect=2} + ] + \node (C1) [client,inner sep=0pt,draw] {Client\textsubscript{1}}; + \node (C2) [client,inner sep=0pt,draw,right=of C1] {Client\textsubscript{2}}; + \node (CD) [right=of C2] {\ldots}; + \node (CN) [client,inner sep=0pt,draw,right=of CD] {Client\textsubscript{n}}; + + % line between server and browser + \draw [dotted] ([xshift=-5em,yshift=-.8em]C1.south west) node[left,above]{browser} node[left,below]{server(s)} -- ([xshift=3em,yshift=-.8em]CN.south east); + + \node[task,inner sep=-4.5pt,below=of C1,fill=white,xshift=4.5em,yshift=-4.5em] (Tn) {Task\textsubscript{n}}; + \node[task,inner sep=-4.5pt,below=of C1,fill=white,xshift=3em,yshift=-3.0em] (Td) {\ldots}; + \node[task,inner sep=-4.5pt,below=of C1,fill=white,xshift=1.5em,yshift=-1.5em] (T2) {Task\textsubscript{2}}; + \node[task,inner sep=-4.5pt,below=of C1,fill=white] (T1) {Task\textsubscript{1}}; + + \node[share,node distance=6em,right=of T1,fill=white,xshift=4.5em,yshift=-4.5em] (Sn) {SDS\textsubscript{n}}; + \node[share,node distance=6em,right=of T1,fill=white,xshift=3em,yshift=-3.0em] (Sd) {\ldots}; + \node[share,node distance=6em,right=of T1,fill=white,xshift=1.5em,yshift=-1.5em] (S2) {SDS\textsubscript{2}}; + \node[share,node distance=6em,right=of T1,fill=white] (S1) {SDS\textsubscript{1}}; + + % line between client and server + \draw [dotted] ([xshift=-5em,yshift=-11em]C1.south west) node[left,above]{server(s)} node[left,below]{device(s)} -- ([xshift=3em,yshift=-11em]CN.south east); + + % device 1 + \node[task,text width=1.5em,inner sep=-4.5pt,node distance=12em,below=of C1,double copy shadow={shadow xshift=.4em,shadow yshift=-.4em},fill=white] (D1T1) {}; + \node[share,text width=.3em,text height=.8ex,right=of D1T1,double copy shadow={shadow xshift=.4em,shadow yshift=-.4em},fill=white] (D1S1) {}; + + \node[draw,fit={(D1S1)(D1T1)([shift={(1.2em,-1.2em)}]D1S1.south east)},label=below:{\scriptsize mTask device\textsubscript{1}}] {}; + + % device 2 + \node[task,text width=1.5em,inner sep=-4.5pt,node distance=12em,below=of C2,double copy shadow={shadow xshift=.4em,shadow yshift=-.4em},fill=white] (D2T1) {}; + \node[share,text width=.3em,text height=.8ex,right=of D2T1,double copy shadow={shadow xshift=.4em,shadow yshift=-.4em},fill=white] (D2S1) {}; + + \node[draw,fit={(D2S1)(D2T1)([shift={(1.2em,-1.2em)}]D2S1.south east)},label=below:{\scriptsize mTask device\textsubscript{2}}] (D2) {}; + + % device ... + \node[right=of D2] {\ldots}; + + % device n + \node[task,text width=1.5em,inner sep=-4.5pt,node distance=12em,below=of CN,double copy shadow={shadow xshift=.4em,shadow yshift=-.4em},fill=white] (DNT1) {}; + \node[share,text width=.3em,text height=.8ex,right=of DNT1,double copy shadow={shadow xshift=.4em,shadow yshift=-.4em},fill=white] (DNS1) {}; + + \node[draw,fit={(DNS1)(DNT1)([shift={(1.2em,-1.2em)}]DNS1.south east)},label=below:{\scriptsize mTask device\textsubscript{n}}] (DN) {}; + + % iTask server + \node[fit={(C1)([shift={(1em,-1.2em)}]DN.south east)},draw,label={iTask server}] {}; + + % Arrows + \draw [<->] (C1) -- (T1); + \draw [<->] (C2) -- (T1); + \draw [<->] (CN) -- (T1); + + \draw [<->] (T1) -- (S1); + \draw [<->] (T1) -- (S2); + \draw [<->] (T1) -- (Sn); + + \draw [<->] (Tn) -- node [midway,above,sloped] {\scriptsize\texttt{liftmTask}} (D1T1); + \draw [<->] (Sn) -- node [midway,above,sloped] {\scriptsize\texttt{liftsds}} (D1S1); + + \draw [<->] (D1T1) -- (D1S1); + \end{tikzpicture} \end{document}