dsl/class_deep_embedding.tex

   1 \documentclass[../thesis.tex]{subfiles}
   2
   3 \begin{document}
   4 \ifSubfilesClassLoaded{
   5         \pagenumbering{arabic}
   6 }{}
   7
   8 \chapter{Deep embedding with class}%
   9 \label{chp:classy_deep_embedding}
  10
  11 \begin{chapterabstract}
  12         The two flavours of DSL embedding are shallow and deep embedding.
  13         In functional languages, shallow embedding models the language constructs as functions in which the semantics are embedded.
  14         Adding semantics is therefore cumbersome while adding constructs is a breeze.
  15         Upgrading the functions to type classes lifts this limitation to a certain extent.
  16
  17         Deeply embedded languages represent their language constructs as data and the semantics are functions on it.
  18         As a result, the language constructs are embedded in the semantics, hence adding new language constructs is laborious where adding semantics is trouble free.
  19
  20         This paper shows that by abstracting the semantics functions in deep embedding to type classes, it is possible to easily add language constructs as well.
  21         So-called classy deep embedding results in DSLs that are extensible both in language constructs and in semantics while maintaining a concrete abstract syntax tree.
  22         Additionally, little type-level trickery or complicated boilerplate code is required to achieve this.
  23 \end{chapterabstract}
  24
  25 \section{Introduction}
  26 The two flavours of DSL embedding are deep and shallow embedding~\citep{boulton_experience_1992}.
  27 In functional programming languages, shallow embedding models language constructs as functions in the host language.
  28 As a result, adding new language constructs---extra functions---is easy.
  29 However, the semantics of the language is embedded in these functions, making it troublesome to add semantics since it requires updating all existing language constructs.
  30
  31 On the other hand, deep embedding models language constructs as data in the host language.
  32 The semantics of the language are represented by functions over the data.
  33 Consequently, adding new semantics, i.e.\ novel functions, is straightforward.
  34 It can be stated that the language constructs are embedded in the functions that form a semantics.
  35 If one wants to add a language construct, all semantics functions must be revisited and revised to avoid ending up with partial functions.
  36
  37 This juxtaposition has been known for many years~\citep{reynolds_user-defined_1978} and discussed by many others~\citep{krishnamurthi_synthesizing_1998} but most famously dubbed the \emph{expression problem} by Wadler~\citep{wadler_expression_1998}:
  38
  39 \begin{quote}
  40         The \emph{expression problem} is a new name for an old problem.
  41         The goal is to define a data type by cases, where one can add new cases to the data type and new functions over the data type, without recompiling existing code, and while retaining static type safety (e.g., no casts).
  42 \end{quote}
  43
  44 In shallow embedding, abstracting the functions to type classes disentangles the language constructs from the semantics, allowing extension both ways.
  45 This technique is dubbed tagless-final embedding~\citep{carette_finally_2009}, nonetheless it is no silver bullet.
  46 Some semantics that require an intensional analysis of the syntax tree, such as transformation and optimisations, are difficult to implement in shallow embedding due to the lack of an explicit data structure representing the abstract syntax tree.
  47 The semantics of the DSL have to be combined and must hold some kind of state or context, so that structural information is not lost~\citep{kiselyov_typed_2012}.
  48
  49 \subsection{Research contribution}
  50 This paper shows how to apply the technique observed in tagless-final embedding to deep embedding.
  51 The presented basic technique, christened \emph{classy deep embedding}, does not require advanced type system extensions to be used.
  52 However, it is suitable for type system extensions such as generalised algebraic data types.
  53 While this paper is written as a literate
  54 Haskell~\citep{peyton_jones_haskell_2003} program using some minor extensions provided by GHC~\citep{ghc_team_ghc_2021}, the idea is applicable to other languages as well\footnotemark{}.
  55 \footnotetext{Lubbers, M. (2021): Literate Haskell/lhs2\TeX{} source code of the paper ``Deep Embedding
  56 with Class'': TFP 2022.\ DANS.\ \url{https://doi.org/10.5281/zenodo.5081386}.}
  57
  58 \section{Deep embedding}
  59 Pick a DSL, any DSL, pick the language of literal integers and addition.
  60 In deep embedding, terms in the language are represented by data in the host language.
  61 Hence, defining the constructs is as simple as creating the following algebraic data type\footnote{All data types and functions are subscripted to indicate the evolution.}.
  62
  63 \begin{lstHaskellLhstex}
  64 data Expr_0
  65         = Lit_0 Int
  66         | Add_0 Expr_0 Expr_0
  67 \end{lstHaskellLhstex}
  68
  69 Semantics are defined as functions on the \haskelllhstexinline{Expr_0} data type.
  70 For example, a function transforming the term to an integer---an evaluator---is implemented as follows.
  71
  72 \begin{lstHaskellLhstex}
  73 eval_0 :: Expr_0 -> Int
  74 eval_0 (Lit_0 e)     = e
  75 eval_0 (Add_0 e1 e2) = eval_0 e1 + eval_0 e2
  76 \end{lstHaskellLhstex}
  77
  78 Adding semantics---e.g.\ a printer---just means adding another function while the existing functions remain untouched.
  79 I.e.\ the key property of deep embedding.
  80 The following function, transforming the \haskelllhstexinline{Expr_0} data type to a string, defines a simple printer for our language.
  81
  82 \begin{lstHaskellLhstex}
  83 print_0 :: Expr_0 -> String
  84 print_0 (Lit_0 v)     = show v
  85 print_0 (Add_0 e1 e2) = "(" ++ print_0 e1 ++ "-" ++ print_0 e2 ++ ")"
  86 \end{lstHaskellLhstex}
  87
  88 While the language is concise and elegant, it is not very expressive.
  89 Traditionally, extending the language is achieved by adding a case to the \haskelllhstexinline{Expr_0} data type.
  90 So, adding subtraction to the language results in the following revised data type.
  91
  92 \begin{lstHaskellLhstex}
  93 data Expr_0
  94         = Lit_0 Int
  95         | Add_0 Expr_0 Expr_0
  96         | Sub_0 Expr_0 Expr_0
  97 \end{lstHaskellLhstex}
  98
  99 Extending the DSL with language constructs exposes the Achilles' heel of deep embedding.
 100 Adding a case to the data type means that all semantics functions have become partial and need to be updated to be able to handle this new case.
 101 This does not seem like an insurmountable problem, but it does pose a problem if either the functions or the data type itself are written by others or are contained in a closed library.
 102
 103 \section{Shallow embedding}
 104 Conversely, let us see how this would be done in shallow embedding.
 105 First, the data type is represented by functions in the host language with embedded semantics.
 106 Therefore, the evaluators for literals and addition both become a function in the host language as follows.
 107
 108 \begin{lstHaskellLhstex}
 109 type Sem_s = Int
 110
 111 lit_s :: Int -> Sem_s
 112 lit_s i = i
 113
 114 add_s :: Sem_s -> Sem_s -> Sem_s
 115 add_s e1 e2 = e1 + e2
 116 \end{lstHaskellLhstex}
 117
 118 Adding constructions to the language is done by adding functions.
 119 Hence, the following function adds subtraction to our language.
 120
 121 \begin{lstHaskellLhstex}
 122 sub_s :: Sem_s -> Sem_s -> Sem_s
 123 sub_s e1 e2 = e1 - e2
 124 \end{lstHaskellLhstex}
 125
 126 Adding semantics on the other hand---e.g.\ a printer---is not that simple because the semantics are part of the functions representing the language constructs.
 127 One way to add semantics is to change all functions to execute both semantics at the same time.
 128 In our case this means changing the type of \haskelllhstexinline{Sem_s} to be \haskelllhstexinline{(Int, String)} so that all functions operate on a tuple containing the result of the evaluator and the printed representation at the same time. %chktex 36
 129 Alternatively, a single semantics can be defined that represents a fold over the language constructs~\citep{gibbons_folding_2014}, delaying the selection of semantics to the moment the fold is applied.
 130
 131 \subsection{Tagless-final embedding}
 132 Tagless-final embedding overcomes the limitations of standard shallow embedding.
 133 To upgrade to this embedding technique, the language constructs are changed from functions to type classes.
 134 For our language this results in the following type class definition.
 135
 136 \begin{lstHaskellLhstex}
 137 class Expr_t s where
 138         lit_t :: Int -> s
 139         add_t :: s -> s -> s
 140 \end{lstHaskellLhstex}
 141
 142 Semantics become data types\footnotemark{} implementing these type classes, resulting in the following instance for the evaluator.
 143 \footnotetext{%
 144         In this case \haskelllhstexinline{newtype}s are used instead of regular \haskelllhstexinline{data} declarations.
 145         A \haskelllhstexinline{newtype} is a special data type with a single constructor containing a single value only to which it is isomorphic.
 146         It allows the programmer to define separate class instances that the instances of the isomorphic type without any overhead.
 147         During compilation the constructor is completely removed~\citep[Sec.~4.2.3]{peyton_jones_haskell_2003}.
 148 }
 149
 150 \begin{lstHaskellLhstex}
 151 newtype Eval_t = E_t Int
 152
 153 instance Expr_t Eval_t where
 154         lit_t v               = E_t v
 155         add_t (E_t e1) (E_t e2) = E_t (e1 + e2)
 156 \end{lstHaskellLhstex}
 157
 158 Adding constructs---e.g.\ subtraction---just results in an extra type class and corresponding instances.
 159
 160 \begin{lstHaskellLhstex}
 161 class Sub_t s where
 162         sub_t :: s -> s -> s
 163
 164 instance Sub_t Eval_t where
 165         sub_t (E_t e1) (E_t e2) = E_t (e1 - e2)
 166 \end{lstHaskellLhstex}
 167
 168 Finally, adding semantics such as a printer over the language is achieved by providing a data type representing the semantics accompanied by instances for the language constructs.
 169
 170 \begin{lstHaskellLhstex}
 171 newtype Printer_t = P_t String
 172
 173 instance Expr_t Printer_t where
 174         lit_t i               = P_t (show i)
 175         add_t (P_t e1) (P_t e2) = P_t ("(" ++ e1 ++ "+" ++ e2 ++ ")")
 176
 177 instance Sub_t Printer_t where
 178         sub_t (P_t e1) (P_t e2) = P_t ("(" ++ e1 ++ "-" ++ e2 ++ ")")
 179 \end{lstHaskellLhstex}
 180
 181 \section{Lifting the backends}%
 182 Let us rethink the deeply embedded DSL design.
 183 Remember that in shallow embedding, the semantics are embedded in the language construct functions.
 184 Obtaining extensibility both in constructs and semantics was accomplished by abstracting the semantics functions to type classes, making the constructs overloaded in the semantics.
 185 In deep embedding, the constructs are embedded in the semantics functions instead.
 186 So, let us apply the same technique, i.e.\ make the semantics overloaded in the language constructs by abstracting the semantics functions to type classes.
 187 The same effect may be achieved when using similar techniques such as explicit dictionary passing or ML style modules.
 188 In our language this results in the following type class.
 189
 190 \begin{lstHaskellLhstex}
 191 class Eval_1 v where
 192         eval_1 :: v -> Int
 193
 194 data Expr_1
 195         = Lit_1 Int
 196         | Add_1 Expr_1 Expr_1
 197 \end{lstHaskellLhstex}
 198
 199 Implementing the semantics type class instances for the \haskelllhstexinline{Expr_1} data type is an elementary exercise.
 200 By a copy-paste and some modifications, we come to the following implementation.
 201
 202 \begin{lstHaskellLhstex}
 203 instance Eval_1 Expr_1 where
 204         eval_1 (Lit_1 v)     = v
 205         eval_1 (Add_1 e1 e2) = eval_1 e1 + eval_1 e2
 206 \end{lstHaskellLhstex}
 207
 208 Subtraction can now be defined in a separate data type, leaving the original data type intact.
 209 Instances for the additional semantics can now be implemented separately as instances of the type classes.
 210
 211 \begin{lstHaskellLhstex}
 212 data Sub_1 = Sub_1 Expr_1 Expr_1
 213
 214 instance Eval_1 Sub_1 where
 215         eval_1 (Sub_1 e1 e2) = eval_1 e1 - eval_1 e2
 216 \end{lstHaskellLhstex}
 217
 218 \section{Existential data types}%
 219
 220 The astute reader might have noticed that we have dissociated ourselves from the original data type.
 221 It is only possible to create an expression with a subtraction on the top level.
 222 The recursive knot is left untied and as a result, \haskelllhstexinline{Sub_1} can never be reached from an \haskelllhstexinline{Expr_1}.
 223
 224 Luckily, we can reconnect them by adding a special constructor to the \haskelllhstexinline{Expr_1} data type for housing extensions.
 225 It contains an existentially quantified~\citep{mitchell_abstract_1988} type with type class constraints~\citep{laufer_combining_1994,laufer_type_1996} for all semantics type classes~\citep[Chp.~6.4.6]{ghc_team_ghc_2021} to allow it to house not just subtraction but any future extension.
 226
 227 \begin{lstHaskellLhstex}
 228 data Expr_2
 229         =                      Lit_2 Int
 230         |                      Add_2 Expr_2 Expr_2
 231         | forall x. Eval_2 x => Ext_2 x
 232 \end{lstHaskellLhstex}
 233
 234 The implementation of the extension case in the semantics type classes is in most cases just a matter of calling the function for the argument as can be seen in the semantics instances shown below.
 235
 236 \begin{lstHaskellLhstex}
 237 instance Eval_2 Expr_2 where
 238         eval_2 (Lit_2  v)      = v
 239         eval_2 (Add_2  e1 e2)  = eval_2 e1 + eval_2 e2
 240         eval_2 (Ext_2  x)      = eval_2 x
 241 \end{lstHaskellLhstex}
 242
 243 Adding language construct extensions in different data types does mean that an extra \haskelllhstexinline{Ext_2} tag is introduced when using the extension.
 244 This burden can be relieved by creating a smart constructor for it that automatically wraps the extension with the \haskelllhstexinline{Ext_2} constructor so that it is of the type of the main data type.
 245
 246 \begin{lstHaskellLhstex}
 247 sub_2 :: Expr_2 -> Expr_2 -> Expr_2
 248 sub_2 e1 e2 = Ext_2 (Sub_2 e1 e2)
 249 \end{lstHaskellLhstex}
 250
 251 In our example this means that the programmer can write\footnotemark{}:
 252 \footnotetext{%
 253         Backticks are used to use functions or constructors in an infix fashion~\citep[Sec.~4.3.3]{peyton_jones_haskell_2003}.
 254 }
 255 \begin{lstHaskellLhstex}
 256 e2 :: Expr_2
 257 e2 = Lit_2 42 `sub_2` Lit_2 1
 258 \end{lstHaskellLhstex}
 259 instead of having to write
 260 \begin{lstHaskellLhstex}
 261 e2p :: Expr_2
 262 e2p = Ext_2 (Lit_2 42 `Sub_2` Lit_2 1)
 263 \end{lstHaskellLhstex}
 264
 265 \subsection{Unbraiding the semantics from the data}
 266 This approach does reveal a minor problem.
 267 Namely, that all semantics type classes are braided into our datatypes via the \haskelllhstexinline{Ext_2} constructor.
 268 Say if we add the printer again, the \haskelllhstexinline{Ext_2} constructor has to be modified to contain the printer type class constraint as well\footnote{Resulting in the following constructor: \haskelllhstexinline{forall x.(Eval_2 x, Print_2 x) => Ext_2 x}.}. %chktex 36
 269 Thus, if we add semantics, the main data type's type class constraints in the \haskelllhstexinline{Ext_2} constructor need to be updated.
 270 To avoid this, the type classes can be bundled in a type class alias or type class collection as follows.
 271
 272 \begin{lstHaskellLhstex}
 273 class (Eval_2 x, Print_2 x) => Semantics_2 x
 274
 275 data Expr_2
 276         =                           Lit_2 Int
 277         |                           Add_2 Expr_2 Expr_2
 278         | forall x. Semantics_2 x => Ext_2 x
 279 \end{lstHaskellLhstex}
 280
 281 The class alias removes the need for the programmer to visit the main data type when adding additional semantics.
 282 Unfortunately, the compiler does need to visit the main data type again.
 283 Some may argue that adding semantics happens less frequently than adding language constructs but in reality it means that we have to concede that the language is not as easily extensible in semantics as in language constructs.
 284 More exotic type system extensions such as constraint kinds~\citep{bolingbroke_constraint_2011,yorgey_giving_2012} can untangle the semantics from the data types by making the data types parametrised by the particular semantics.
 285 However, by adding some boilerplate, even without this extension, the language constructs can be parametrised by the semantics by putting the semantics functions in a data type.
 286 First the data types for the language constructs are parametrised by the type variable \haskelllhstexinline{d} as follows.
 287
 288 \begin{lstHaskellLhstex}
 289 data Expr_3 d
 290         =           Lit_3 Int
 291         |           Add_3 (Expr_3 d) (Expr_3 d)
 292         | forall x. Ext_3 (d x) x
 293
 294 data Sub_3 d = Sub_3 (Expr_3 d) (Expr_3 d)
 295 \end{lstHaskellLhstex}
 296
 297 The \haskelllhstexinline{d} type variable is inhabited by an explicit dictionary for the semantics, i.e.\ a witness to the class instance.
 298 Therefore, for all semantics type classes, a data type is made that contains the semantics function for the given semantics.
 299 This means that for \haskelllhstexinline{Eval_3}, a dictionary with the function \haskelllhstexinline{EvalDict_3} is defined, a type class \haskelllhstexinline{HasEval_3} for retrieving the function from the dictionary and an instance for \haskelllhstexinline{HasEval_3} for \haskelllhstexinline{EvalDict_3}.
 300
 301 \begin{lstHaskellLhstex}
 302 newtype EvalDict_3 v = EvalDict_3 (v -> Int)
 303
 304 class HasEval_3 d where
 305         getEval_3 :: d v -> v -> Int
 306
 307 instance HasEval_3 EvalDict_3 where
 308         getEval_3 (EvalDict_3 e) = e
 309 \end{lstHaskellLhstex}
 310
 311 The instances for the type classes change as well according to the change in the datatype.
 312 Given that there is a \haskelllhstexinline{HasEval_3} instance for the witness type \haskelllhstexinline{d}, we can provide an implementation of \haskelllhstexinline{Eval_3} for \haskelllhstexinline{Expr_3 d}.
 313
 314 \begin{lstHaskellLhstex}
 315 instance HasEval_3 d => Eval_3 (Expr_3 d) where
 316         eval_3 (Lit_3 v)     = v
 317         eval_3 (Add_3 e1 e2) = eval_3 e1 + eval_3 e2
 318         eval_3 (Ext_3 d x)   = getEval_3 d x
 319
 320 instance HasEval_3 d => Eval_3 (Sub_3 d) where
 321         eval_3 (Sub_3 e1 e2) = eval_3 e1 - eval_3 e2
 322 \end{lstHaskellLhstex}
 323
 324 Because the \haskelllhstexinline{Ext_3} constructor from \haskelllhstexinline{Expr_3} now contains a value of type \haskelllhstexinline{d}, the smart constructor for \haskelllhstexinline{Sub_3} must somehow come up with this value.
 325 To achieve this, a type class is introduced that allows the generation of such a dictionary.
 326
 327 \begin{lstHaskellLhstex}
 328 class GDict a where
 329         gdict :: a
 330 \end{lstHaskellLhstex}
 331
 332 This type class has individual instances for all semantics dictionaries, linking the class instance to the witness value.
 333 I.e.\ if there is a type class instance known, a witness value can be conjured using the \haskelllhstexinline{gdict} function.
 334
 335 \begin{lstHaskellLhstex}
 336 instance Eval_3 v => GDict (EvalDict_3 v) where
 337         gdict = EvalDict_3 eval_3
 338 \end{lstHaskellLhstex}
 339
 340 With these instances, the semantics function can be retrieved from the \haskelllhstexinline{Ext_3} constructor and in the smart constructors they can be generated as follows:
 341
 342 \begin{lstHaskellLhstex}
 343 sub_3 :: GDict (d (Sub_3 d)) => Expr_3 d -> Expr_3 d -> Expr_3 d
 344 sub_3 e1 e2 = Ext_3 gdict (Sub_3 e1 e2)
 345 \end{lstHaskellLhstex}
 346
 347 Finally, we reached the end goal, orthogonal extension of both language constructs as shown by adding subtraction to the language and in language semantics.
 348 Adding the printer can now be done without touching the original code as follows.
 349 First the printer type class, dictionaries and instances for \haskelllhstexinline{GDict} are defined.
 350
 351 \begin{lstHaskellLhstex}
 352 class Print_3 v where
 353         print_3 :: v -> String
 354
 355 newtype PrintDict_3 v = PrintDict_3 (v -> String)
 356
 357 class HasPrint_3 d where
 358         getPrint_3 :: d v -> v -> String
 359
 360 instance HasPrint_3 PrintDict_3 where
 361         getPrint_3 (PrintDict_3 e) = e
 362
 363 instance Print_3 v => GDict (PrintDict_3 v) where
 364         gdict = PrintDict_3 print_3
 365 \end{lstHaskellLhstex}
 366
 367 Then the instances for \haskelllhstexinline{Print_3} of all the language constructs can be defined.
 368
 369 \begin{lstHaskellLhstex}
 370 instance HasPrint_3 d => Print_3 (Expr_3 d) where
 371         print_3 (Lit_3 v)     = show v
 372         print_3 (Add_3 e1 e2) = "(" ++ print_3 e1 ++ "+" ++ print_3 e2 ++ ")"
 373         print_3 (Ext_3 d x)   = getPrint_3 d x
 374 instance HasPrint_3 d => Print_3 (Sub_3 d) where
 375         print_3 (Sub_3 e1 e2) = "(" ++ print_3 e1 ++ "-" ++ print_3 e2 ++ ")"
 376 \end{lstHaskellLhstex}
 377
 378 \section{Transformation semantics}
 379 Most semantics convert a term to some final representation and can be expressed just by functions on the cases.
 380 However, the implementation of semantics such as transformation or optimisation may benefit from a so-called intentional analysis of the abstract syntax tree.
 381 In shallow embedding, the implementation for these types of semantics is difficult because there is no tangible abstract syntax tree.
 382 In off-the-shelf deep embedding this is effortless since the function can pattern match on the constructor or structures of constructors.
 383
 384 To demonstrate intensional analyses in classy deep embedding we write an optimizer that removes addition and subtraction by zero.
 385 In classy deep embedding, adding new semantics means first adding a new type class housing the function including the machinery for the extension constructor.
 386
 387 \begin{lstHaskellLhstex}
 388 class Opt_3 v where
 389         opt_3 :: v -> v
 390
 391 newtype OptDict_3 v = OptDict_3 (v -> v)
 392
 393 class HasOpt_3 d where
 394         getOpt_3 :: d v -> v -> v
 395
 396 instance HasOpt_3 OptDict_3 where
 397         getOpt_3 (OptDict_3 e) = e
 398
 399 instance Opt_3 v => GDict (OptDict_3 v) where
 400         gdict = OptDict_3 opt_3
 401 \end{lstHaskellLhstex}
 402
 403 The implementation of the optimizer for the \haskelllhstexinline{Expr_3} data type is no complicated task.
 404 The only interesting bit occurs in the \haskelllhstexinline{Add_3} constructor, where we pattern match on the optimised children to determine whether an addition with zero is performed.
 405 If this is the case, the addition is removed.
 406
 407 \begin{lstHaskellLhstex}
 408 instance HasOpt_3 d => Opt_3 (Expr_3 d) where
 409         opt_3 (Lit_3 v)     = Lit_3 v
 410         opt_3 (Add_3 e1 e2) = case (opt_3 e1, opt_3 e2) of
 411                 (Lit_3 0, e2p    ) -> e2p
 412                 (e1p,     Lit_3 0) -> e1p
 413                 (e1p,     e2p    ) -> Add_3 e1p e2p
 414         opt_3 (Ext_3 d x)   = Ext_3 d (getOpt_3 d x)
 415 \end{lstHaskellLhstex}
 416
 417 Replicating this for the \haskelllhstexinline{Opt_3} instance of \haskelllhstexinline{Sub_3} seems a clear-cut task at first glance.
 418
 419 \begin{lstHaskellLhstex}
 420 instance HasOpt_3 d => Opt_3 (Sub_3 d) where
 421         opt_3 (Sub_3 e1 e2) = case (opt_3 e1, opt_3 e2) of
 422                 (e1p, Lit_3 0) -> e1p
 423                 (e1p, e2p    ) -> Sub_3 e1p e2p
 424 \end{lstHaskellLhstex}
 425
 426 Unsurprisingly, this code is rejected by the compiler.
 427 When a literal zero is matched as the right-hand side of a subtraction, the left-hand side of type \haskelllhstexinline{Expr_3} is returned.
 428 However, the type signature of the function dictates that it should be of type \haskelllhstexinline{Sub_3}.
 429 To overcome this problem we add a convolution constructor.
 430
 431 \subsection{Convolution}
 432 Adding a loopback case or convolution constructor to \haskelllhstexinline{Sub_3} allows the removal of the \haskelllhstexinline{Sub_3} constructor while remaining the \haskelllhstexinline{Sub_3} type.
 433 It should be noted that a loopback case is \emph{only} required if the transformation actually removes tags.
 434 This changes the \haskelllhstexinline{Sub_3} data type as follows.
 435
 436 \begin{lstHaskellLhstex}
 437 data Sub_4 d
 438         = Sub_4     (Expr_4 d) (Expr_4 d)
 439         | SubLoop_4 (Expr_4 d)
 440
 441 instance HasEval_4 d => Eval_4 (Sub_4 d) where
 442         eval_4 (Sub_4     e1 e2) = eval_4 e1 - eval_4 e2
 443         eval_4 (SubLoop_4 e1)    = eval_4 e1
 444 \end{lstHaskellLhstex}
 445
 446 With this loopback case in the toolbox, the following \haskelllhstexinline{Sub} instance optimises away subtraction with zero literals.
 447
 448 \begin{lstHaskellLhstex}
 449 instance HasOpt_4 d => Opt_4 (Sub_4 d) where
 450         opt_4 (Sub_4 e1 e2) = case (opt_4 e1, opt_4 e2) of
 451                 (e1p, Lit_4 0) -> SubLoop_4 e1p
 452                 (e1p, e2p    ) -> Sub_4 e1p e2p
 453         opt_4 (SubLoop_4 e) =  SubLoop_4 (opt_4 e)
 454 \end{lstHaskellLhstex}
 455
 456 \subsection{Pattern matching}
 457 Pattern matching within datatypes and from an extension to the main data type works out of the box.
 458 Cross-extensional pattern matching on the other hand---matching on a particular extension---is something that requires a bit of extra care.
 459 Take for example negation propagation and double negation elimination.
 460 Pattern matching on values with an existential type is not possible without leveraging dynamic typing~\citep{abadi_dynamic_1991,baars_typing_2002}.
 461 To enable dynamic typing support, the \haskelllhstexinline{Typeable} type class as provided by \haskelllhstexinline{Data.Dynamic}~\citep{ghc_team_datadynamic_2021} is added to the list of constraints in all places where we need to pattern match across extensions.
 462 As a result, the \haskelllhstexinline{Typeable} type class constraints are added to the quantified type variable \haskelllhstexinline{x} of the \haskelllhstexinline{Ext_4} constructor and to \haskelllhstexinline{d}s in the smart constructors.
 463
 464 \begin{lstHaskellLhstex}
 465 data Expr_4 d
 466         =                         Lit_4 Int
 467         |                         Add_4 (Expr_4 d) (Expr_4 d)
 468         | forall x. Typeable x => Ext_4 (d x) x
 469 \end{lstHaskellLhstex}
 470
 471 First let us add negation to the language by defining a datatype representing it.
 472 Negation elimination requires the removal of negation constructors, so a convolution constructor is defined as well.
 473
 474 \begin{lstHaskellLhstex}
 475 data Neg_4 d
 476         = Neg_4     (Expr_4 d)
 477         | NegLoop_4 (Expr_4 d)
 478
 479 neg_4 :: (Typeable d, GDict (d (Neg_4 d))) => Expr_4 d -> Expr_4 d
 480 neg_4 e = Ext_4 gdict (Neg_4 e)
 481 \end{lstHaskellLhstex}
 482
 483 The evaluation and printer instances for the \haskelllhstexinline{Neg_4} datatype are defined as follows.
 484
 485 \begin{lstHaskellLhstex}
 486 instance HasEval_4 d => Eval_4 (Neg_4 d) where
 487         eval_4 (Neg_4     e) = negate (eval_4 e)
 488         eval_4 (NegLoop_4 e) = eval_4 e
 489
 490 instance HasPrint_4 d => Print_4 (Neg_4 d) where
 491         print_4 (Neg_4     e) = "(~" ++ print_4 e ++ ")"
 492         print_4 (NegLoop_4 e) = print_4 e
 493 \end{lstHaskellLhstex}
 494
 495 The \haskelllhstexinline{Opt_4} instance contains the interesting bit.
 496 If the sub expression of a negation is an addition, negation is propagated downwards.
 497 If the sub expression is again a negation, something that can only be found out by a dynamic pattern match, it is replaced by a \haskelllhstexinline{NegLoop_4} constructor.
 498
 499 \begin{lstHaskellLhstex}
 500 instance (Typeable d, GDict (d (Neg_4 d)), HasOpt_4 d) => Opt_4 (Neg_4 d) where
 501         opt_4 (Neg_4 (Add_4 e1 e2))
 502                 = NegLoop_4 (Add_4 (opt_4 (neg_4 e1)) (opt_4 (neg_4 e2)))
 503         opt_4 (Neg_4 (Ext_4 d x))
 504                 = case fromDynamic (toDyn (getOpt_4 d x)) of
 505                         Just (Neg_4 e) -> NegLoop_4 e
 506                         _             -> Neg_4 (Ext_4 d (getOpt_4 d x))
 507         opt_4 (Neg_4     e) = Neg_4 (opt_4 e)
 508         opt_4 (NegLoop_4 e) = NegLoop_4 (opt_4 e)
 509 \end{lstHaskellLhstex}
 510
 511 Loopback cases do make cross-extensional pattern matching less modular in general.
 512 For example, \haskelllhstexinline{Ext_4 d (SubLoop_4 (Lit_4 0))} is equivalent to \haskelllhstexinline{Lit_4 0} in the optimisation semantics and would require an extra pattern match.
 513 Fortunately, this problem can be mitigated---if required---by just introducing an additional optimisation semantics that removes loopback cases.
 514 Luckily, one does not need to resort to these arguably blunt matters often.
 515 Dependent language functionality often does not need to span extensions, i.e.\ it is possible to group them in the same data type.
 516
 517 \subsection{Chaining semantics}
 518 Now that the data types are parametrised by the semantics a final problem needs to be overcome.
 519 The data type is parametrised by the semantics, thus, using multiple semantics, such as evaluation after optimising is not straightforwardly possible.
 520 Luckily, a solution is readily at hand: introduce an ad-hoc combination semantics.
 521
 522 \begin{lstHaskellLhstex}
 523 data OptPrintDict_4 v = OPD_4 (OptDict_4 v) (PrintDict_4 v)
 524
 525 instance HasOpt_4   OptPrintDict_4 where
 526         getOpt_4   (OPD_4 v _) = getOpt_4 v
 527 instance HasPrint_4 OptPrintDict_4 where
 528         getPrint_4 (OPD_4 _ v) = getPrint_4 v
 529
 530 instance (Opt_4 v, Print_4 v) => GDict (OptPrintDict_4 v) where
 531         gdict = OPD_4 gdict gdict
 532 \end{lstHaskellLhstex}
 533
 534 And this allows us to write \haskelllhstexinline{print_4 (opt_4 e1)} resulting in \verb|"((~42)+(~38))"| when \haskelllhstexinline{e1} represents $(\sim(42+38))-0$ and is thus defined as follows.
 535
 536 \begin{lstHaskellLhstex}
 537 e1 :: Expr_4  OptPrintDict_4
 538 e1 = neg_4 (Lit_4 42 `Add_4` Lit_4 38) `sub_4` Lit_4 0
 539 \end{lstHaskellLhstex}
 540
 541 When using classy deep embedding to the fullest, the ability of the compiler to infer very general types expires.
 542 As a consequence, defining reusable expressions that are overloaded in their semantics requires quite some type class constraints that cannot be inferred by the compiler (yet) if they use many extensions.
 543 Solving this remains future work.
 544 For example, the expression $\sim(42-38)+1$ has to be defined as:
 545
 546 \begin{lstHaskellLhstex}
 547 e3 :: (Typeable d, GDict (d (Neg_4 d)), GDict (d (Sub_4 d))) => Expr_4 d
 548 e3 = neg_4 (Lit_4 42 `sub_4` Lit_4 38) `Add_4` Lit_4 1
 549 \end{lstHaskellLhstex}
 550
 551 \section{Generalised algebraic data types}%
 552 Generalised algebraic data types (GADTs) are enriched data types that allow the type instantiation of the constructor to be explicitly defined~\citep{cheney_first-class_2003,hinze_fun_2003}.
 553 Leveraging GADTs, deeply embedded DSLs can be made statically type safe even when different value types are supported.
 554 Even when GADTs are not supported natively in the language, they can be simulated using embedding-projection pairs or equivalence types~\citep[Sec.~2.2]{cheney_lightweight_2002}.
 555 Where some solutions to the expression problem do not easily generalise to GADTs (see \cref{sec:cde:related}), classy deep embedding does.
 556 Generalising the data structure of our DSL is fairly straightforward and to spice things up a bit, we add an equality and boolean not language construct.
 557 To make the existing DSL constructs more general, we relax the types of those constructors.
 558 For example, operations on integers now work on all numerals instead.
 559 Moreover, the \haskelllhstexinline{Lit_g} constructor can be used to lift values of any type to the DSL domain as long as they have a \haskelllhstexinline{Show} instance, required for the printer.
 560 Since some optimisations on \haskelllhstexinline{Not_g} remove constructors and therefore use cross-extensional pattern matches, \haskelllhstexinline{Typeable} constraints are added to \haskelllhstexinline{a}.
 561 Furthermore, because the optimisations for \haskelllhstexinline{Add_g} and \haskelllhstexinline{Sub_g} are now more general, they do not only work for \haskelllhstexinline{Int}s but for any type with a \haskelllhstexinline{Num} instance, the \haskelllhstexinline{Eq} constraint is added to these constructors as well.
 562 Finally, not to repeat ourselves too much, we only show the parts that substantially changed.
 563 The omitted definitions and implementation can be found in \cref{sec:cde:appendix}.
 564
 565 \begin{lstHaskellLhstex}
 566 data Expr_g d a where
 567         Lit_g     :: Show a        => a -> Expr_g d a
 568         Add_g     :: (Eq a, Num a) => Expr_g d a -> Expr_g d a -> Expr_g d a
 569         Ext_g     :: Typeable x    => d x -> x a -> Expr_g d a
 570 data Neg_g  d a where
 571         Neg_g     :: (Typeable a, Num a) => Expr_g d a -> Neg_g d a
 572         NegLoop_g ::                        Expr_g d a -> Neg_g d a
 573 data Not_g  d a where
 574         Not_g     :: Expr_g d Bool -> Not_g d Bool
 575         NotLoop_g :: Expr_g d a    -> Not_g d a
 576 \end{lstHaskellLhstex}
 577
 578 The smart constructors for the language extensions inherit the class constraints of their data types and include a \haskelllhstexinline{Typeable} constraint on the \haskelllhstexinline{d} type variable for it to be usable in the \haskelllhstexinline{Ext_g} constructor as can be seen in the smart constructor for \haskelllhstexinline{Neg_g}:
 579
 580 \begin{lstHaskellLhstex}
 581 neg_g  :: (Typeable d, GDict (d (Neg_g d)), Typeable a, Num a) =>
 582         Expr_g d a -> Expr_g d a
 583 neg_g e = Ext_g gdict (Neg_g e)
 584
 585 not_g  :: (Typeable d,  GDict (d (Not_g d))) =>
 586         Expr_g d Bool -> Expr_g d Bool
 587 not_g e = Ext_g gdict (Not_g e)
 588 \end{lstHaskellLhstex}
 589
 590 Upgrading the semantics type classes to support GADTs is done by an easy textual search and replace.
 591 All occurrences of \haskelllhstexinline{v} are now parametrised by type variable \haskelllhstexinline{a}:
 592
 593 \begin{lstHaskellLhstex}
 594 class Eval_g  v where
 595         eval_g  :: v a -> a
 596 class Print_g v where
 597         print_g :: v a -> String
 598 class Opt_g   v where
 599         opt_g   :: v a -> v a
 600 \end{lstHaskellLhstex}
 601
 602 Now that the shape of the type classes has changed, the dictionary data types and the type classes need to be adapted as well.
 603 The introduced type variable \haskelllhstexinline{a} is not an argument to the type class, so it should not be an argument to the dictionary data type.
 604 To represent this type class function, a rank-2 polymorphic function is needed~\citep[Chp.~6.4.15]{ghc_team_ghc_2021}\citep{odersky_putting_1996}.
 605 Concretely, for the evaluatior this results in the following definitions:
 606
 607 \begin{lstHaskellLhstex}
 608 newtype EvalDict_g v = EvalDict_g (forall a. v a -> a)
 609 class HasEval_g d where
 610         getEval_g :: d v -> v a -> a
 611 instance HasEval_g EvalDict_g where
 612         getEval_g (EvalDict_g e) = e
 613 \end{lstHaskellLhstex}
 614
 615 The \haskelllhstexinline{GDict} type class is general enough, so the instances can remain the same.
 616 The \haskelllhstexinline{Eval_g} instance of \haskelllhstexinline{GDict} looks as follows:
 617
 618 \begin{lstHaskellLhstex}
 619 instance Eval_g v => GDict (EvalDict_g v) where
 620         gdict = EvalDict_g eval_g
 621 \end{lstHaskellLhstex}
 622
 623 Finally, the implementations for the instances can be ported without complication show using the optimisation instance of \haskelllhstexinline{Not_g}:
 624
 625 \begin{lstHaskellLhstex}
 626 instance (Typeable d, GDict (d (Not_g d)), HasOpt_g d) => Opt_g (Not_g d) where
 627         opt_g (Not_g (Ext_g d x))
 628                 = case fromDynamic (toDyn (getOpt_g d x)) :: Maybe (Not_g d Bool) of
 629                         Just (Not_g e) -> NotLoop_g e
 630                         _             -> Not_g (Ext_g d (getOpt_g d x))
 631         opt_g (Not_g e)     = Not_g (opt_g e)
 632         opt_g (NotLoop_g e) = NotLoop_g (opt_g e)
 633 \end{lstHaskellLhstex}
 634
 635 \section{Conclusion}%
 636
 637 Classy deep embedding is a novel organically grown embedding technique that alleviates deep embedding from the extensibility problem in most cases.
 638
 639 By abstracting the semantics functions to type classes they become overloaded in the language constructs.
 640 Thus, making it possible to add new language constructs in a separate type.
 641 These extensions are brought together in a special extension constructor residing in the main data type.
 642 This extension case is overloaded by the language construct using a data type containing the class dictionary.
 643 As a result, orthogonal extension is possible for language constructs and semantics using only little syntactic overhead or type annotations.
 644 The basic technique only requires---well established through history and relatively standard---existential data types.
 645 However, if needed, the technique generalises to \glspl{GADT} as well, adding rank-2 types to the list of type system requirements as well.
 646 Finally, the abstract syntax tree remains observable which makes it suitable for intensional analyses, albeit using occasional dynamic typing for truly cross-extensional transformations.
 647
 648 Defining reusable expressions overloaded in semantics or using multiple semantics on a single expression requires some boilerplate still, getting around this remains future work.
 649
 650 \section{Related work}%
 651 \label{sec:cde:related}
 652
 653 Embedded DSL techniques in functional languages have been a topic of research for many years, thus we do not claim a complete overview of related work.
 654
 655 Clearly, classy deep embedding bears most similarity to the \emph{Datatypes \`a la Carte}~\citep{swierstra_data_2008}.
 656 In Swierstra's approach, semantics are lifted to type classes similarly to classy deep embedding.
 657 Each language construct is their own datatype parametrised by a type parameter.
 658 This parameter contains some type level representation of language constructs that are in use.
 659 In classy deep embedding, extensions do not have to be enumerated at the type level but are captured in the extension case.
 660 Because all the constructs are expressed in the type system, nifty type system tricks need to be employed to convince the compiler that everything is type safe and the class constraints can be solved.
 661 Furthermore, it requires some boilerplate code such as functor instances for the data types.
 662 In return, pattern matching is easier and does not require dynamic typing.
 663 Classy deep embedding only strains the programmer with writing the extension case for the main data type and the occasional loopback constructor.
 664
 665 L\"oh and Hinze proposed a language extension that allows open data types and open functions, i.e.\ functions and data types that can be extended with more cases later on~\citep{loh_open_2006}.
 666 They hinted at the possibility of using type classes for open functions but had serious concerns that pattern matching would be crippled because constructors are becoming types, thus ultimately becoming impossible to type.
 667 In contrast, this paper shows that pattern matching is easily attainable---albeit using dynamic types---and that the terms can be typed without complicated type system extensions.
 668
 669 A technique similar to classy deep embedding was proposed by Najd and Peyton~Jones to tackle a slightly different problem, namely that of reusing a data type for multiple purposes in a slightly different form~\citep{najd_trees_2017}.
 670 For example to decorate the abstract syntax tree of a compiler differently for each phase of the compiler.
 671 They propose to add an extension descriptor as a type variable to a data type and a type family that can be used to decorate constructors with extra information and add additional constructors to the data type using an extension constructor.
 672 Classy deep embedding works similarly but uses existentially quantified type variables to describe possible extensions instead of type variables and type families.
 673 In classy deep embedding, the extensions do not need to be encoded in the type system and less boilerplate is required.
 674 Furthermore, pattern matching on extensions becomes a bit more complicated but in return it allows for multiple extensions to be added orthogonally and avoids the necessity for type system extensions.
 675
 676 Tagless-final embedding is the shallowly embedded counterpart of classy deep embedding and was invented for the same purpose; overcoming the issues with standard shallow embedding~\citep{carette_finally_2009}.
 677 Classy deep embedding was organically grown from observing the evolution of tagless-final embedding.
 678 The main difference between tagless-final embedding and classy deep embedding---and in general between shallow and deep embedding---is that intensional analyses of the abstract syntax tree is more difficult because there is no tangible abstract syntax tree data structure.
 679 In classy deep embedding, it is possible to define transformations even across extensions.
 680
 681 Hybrid approaches between deep and shallow embedding exist as well.
 682 For example, Svenningson et al.\ show that by expressing the deeply embedded language in a shallowly embedded core language, extensions can be made orthogonally as well~\citep{svenningsson_combining_2013}.
 683 This paper differs from those approaches in the sense that it does not require a core language in which all extensions need to be expressible.
 684
 685 \section*{Acknowledgements}
 686 This research is partly funded by the Royal Netherlands Navy.
 687 Furthermore, I would like to thank Pieter and Rinus for the fruitful discussions, Ralf for inspiring me to write a functional pearl, and the anonymous reviewers for their valuable and honest comments.
 688
 689 \begin{subappendices}
 690 \section{Data types and definitions}%
 691 \label{sec:cde:appendix}
 692 \begin{lstHaskellLhstex}[caption={Data type definitions.}]
 693 data Sub_g d a where
 694         Sub_g     :: (Eq a, Num a)      => Expr_g d a -> Expr_g d a -> Sub_g d a
 695         SubLoop_g ::                       Expr_g d a -> Sub_g d a
 696
 697 data Eq_g d a  where
 698         Eq_g      :: (Typeable a, Eq a) => Expr_g d a -> Expr_g d a -> Eq_g d Bool
 699         EqLoop_g  ::                       Expr_g d a -> Eq_g d a
 700 \end{lstHaskellLhstex}
 701
 702 \begin{lstHaskellLhstex}[caption={Smart constructions.}]
 703 sub_g :: (Typeable d, GDict (d (Sub_g d)), Eq a, Num a) =>
 704         Expr_g d a -> Expr_g d a -> Expr_g d a
 705 sub_g e1 e2 = Ext_g gdict (Sub_g e1 e2)
 706
 707 eq_g :: (Typeable d, GDict (d (Eq_g d)), Eq a, Typeable a) =>
 708         Expr_g d a -> Expr_g d a -> Expr_g d Bool
 709 eq_g e1 e2 = Ext_g gdict (Eq_g e1 e2)
 710 \end{lstHaskellLhstex}
 711
 712 \begin{lstHaskellLhstex}[caption={Semantics classes and data types.}]
 713 newtype PrintDict_g v = PrintDict_g (forall a.v a -> String)
 714
 715 class HasPrint_g d where
 716         getPrint_g :: d v -> v a -> String
 717
 718 instance HasPrint_g PrintDict_g where
 719         getPrint_g (PrintDict_g e) = e
 720
 721 newtype OptDict_g v = OptDict_g (forall a.v a -> v a)
 722
 723 class HasOpt_g d where
 724         getOpt_g :: d v -> v a -> v a
 725
 726 instance HasOpt_g OptDict_g where
 727         getOpt_g (OptDict_g e) = e
 728 \end{lstHaskellLhstex}
 729
 730 \begin{lstHaskellLhstex}[caption={\texorpdfstring{\haskelllhstexinline{GDict}}{GDict} instances}]
 731 instance Print_g v => GDict (PrintDict_g v) where
 732         gdict = PrintDict_g print_g
 733 instance Opt_g v   => GDict (OptDict_g v)   where
 734         gdict = OptDict_g opt_g
 735 \end{lstHaskellLhstex}
 736
 737 \begin{lstHaskellLhstex}[caption={Evaluator instances}]
 738 instance HasEval_g d => Eval_g (Expr_g d) where
 739         eval_g (Lit_g v)     = v
 740         eval_g (Add_g e1 e2) = eval_g e1 + eval_g e2
 741         eval_g (Ext_g d x)   = getEval_g d x
 742
 743 instance HasEval_g d => Eval_g (Sub_g d) where
 744         eval_g (Sub_g     e1 e2) = eval_g e1 - eval_g e2
 745         eval_g (SubLoop_g e)     = eval_g e
 746
 747 instance HasEval_g d => Eval_g (Neg_g d) where
 748         eval_g (Neg_g     e) = negate (eval_g e)
 749         eval_g (NegLoop_g e) = eval_g e
 750
 751 instance HasEval_g d => Eval_g (Eq_g d) where
 752         eval_g (Eq_g     e1 e2) = eval_g e1 == eval_g e2
 753         eval_g (EqLoop_g e)     = eval_g e
 754
 755 instance HasEval_g d => Eval_g (Not_g d) where
 756         eval_g (Not_g     e) = not (eval_g e)
 757         eval_g (NotLoop_g e) = eval_g e
 758 \end{lstHaskellLhstex}
 759
 760 \begin{lstHaskellLhstex}[caption={Printer instances}]
 761 instance HasPrint_g d => Print_g (Expr_g d) where
 762         print_g (Lit_g v)     = show v
 763         print_g (Add_g e1 e2) = "(" ++ print_g e1 ++ "+" ++ print_g e2 ++ ")"
 764         print_g (Ext_g d x)   = getPrint_g d x
 765
 766 instance HasPrint_g d => Print_g (Sub_g d) where
 767         print_g (Sub_g     e1 e2) = "(" ++ print_g e1 ++ "-" ++ print_g e2 ++ ")"
 768         print_g (SubLoop_g e)     = print_g e
 769
 770 instance HasPrint_g d => Print_g (Neg_g d) where
 771         print_g (Neg_g     e) = "(negate " ++ print_g e ++ ")"
 772         print_g (NegLoop_g e) = print_g e
 773
 774 instance HasPrint_g d => Print_g (Eq_g d) where
 775         print_g (Eq_g     e1 e2) = "(" ++ print_g e1 ++ "==" ++ print_g e2 ++ ")"
 776         print_g (EqLoop_g e)     = print_g e
 777
 778 instance HasPrint_g d => Print_g (Not_g d) where
 779         print_g (Not_g     e) = "(not " ++ print_g e ++ ")"
 780         print_g (NotLoop_g e) = print_g e
 781 \end{lstHaskellLhstex}
 782
 783 \begin{lstHaskellLhstex}[caption={Optimisation instances}]
 784 instance HasOpt_g d => Opt_g (Expr_g d) where
 785         opt_g (Lit_g v)     = Lit_g v
 786         opt_g (Add_g e1 e2) = case (opt_g e1, opt_g e2) of
 787                 (Lit_g 0, e2p    ) -> e2p
 788                 (e1p,     Lit_g 0) -> e1p
 789                 (e1p,     e2p    ) -> Add_g e1p e2p
 790         opt_g (Ext_g d x)   = Ext_g d (getOpt_g d x)
 791
 792 instance HasOpt_g d => Opt_g (Sub_g d) where
 793         opt_g (Sub_g     e1 e2) = case (opt_g e1, opt_g e2) of
 794                 (e1p, Lit_g 0) -> SubLoop_g e1p
 795                 (e1p, e2p    ) -> Sub_g e1p e2p
 796         opt_g (SubLoop_g e) = SubLoop_g (opt_g e)
 797
 798 instance (Typeable d, GDict (d (Neg_g d)), HasOpt_g d) => Opt_g (Neg_g d) where
 799         opt_g (Neg_g (Add_g e1 e2))
 800                 = NegLoop_g (Add_g (opt_g (neg_g e1)) (opt_g (neg_g e2)))
 801         opt_g (Neg_g (Ext_g d x))
 802                 = case fromDynamic (toDyn (getOpt_g d x)) of
 803                         Just (Neg_g e) -> NegLoop_g e
 804                         _             -> Neg_g (Ext_g d (getOpt_g d x))
 805         opt_g (Neg_g     e) = Neg_g (opt_g e)
 806         opt_g (NegLoop_g e) = NegLoop_g (opt_g e)
 807
 808 instance HasOpt_g d => Opt_g (Eq_g d) where
 809         opt_g (Eq_g     e1 e2) = Eq_g (opt_g e1) (opt_g e2)
 810         opt_g (EqLoop_g e)     = EqLoop_g (opt_g e)
 811 \end{lstHaskellLhstex}
 812
 813 \section{Chaining semantics (reprise)}\label{sec:classy_reprise}
 814 \todo{Verbeteren}
 815 One of the unique selling points of this novel \gls{DSL} embedding technique is that it, in its basic form, does not require advanced type system extensions.
 816 However, while generalising the technique to \glspl{GADT} arguably unleashes a cesspool of \emph{unsafe} compiler extensions.
 817 If we are willing to work with extensions, much of the boilerplate can be either generated or omitted entirely.
 818
 819 The \gls{DSL} datatype is parametrised by a type variable providing a witness to the view on the language.
 820 Using constraint kinds, we can make these witnesses explicit, tying into \gls{HASKELL}'s type system immediately.
 821 Furthermore, when resorting to the \GHCmod{DataKinds} and \GHCmod{PolyKinds} extensions, this constraint does not necessarily has to be a single constraint, using a \haskelllhstexinline{Record} auxiliary type, we can encode list of witnesses.
 822 The data type for this list of witnesses is \haskelllhstexinline{Record}.
 823 Record is parametrised by two type variables, the first type variable (\haskelllhstexinline{dt}) is the data type on which the constraints can be applied.
 824 The second type variable (\haskelllhstexinline{clist}) is the list of constraints itself.
 825 It is not just a list of \haskelllhstexinline{Constraint} but it is a list containing constraint constructors that will, when given a type of kind \haskelllhstexinline{k}, produce a constraint.
 826 This means that when \haskelllhstexinline{Cons} is pattern matched, the type class constraint for \haskelllhstexinline{c dt} can be solved by the compiler.
 827
 828 \begin{lstHaskellLhstex}[caption={Data type for a list of constraints}]
 829 data Record (dt :: k) (clist :: [k -> Constraint]) where
 830         Nil  :: Record dt '[]
 831     Cons :: c dt => Record dt cs -> Record dt (c ': cs)
 832 \end{lstHaskellLhstex}
 833
 834 To incorporate this type in the \haskelllhstexinline{Expr} type, the \haskelllhstexinline{Ext} constructor changes as follows:
 835
 836 \begin{lstHaskellLhstex}[caption={Data type for a list of constraints}]
 837 data Expr c
 838     = Lit Int
 839     | Add (Expr c) (Expr c)
 840     | Ext (Record x c) x
 841 \end{lstHaskellLhstex}
 842
 843 Furthermore, we define a type class that allows us to extract explicit dictionaries \haskelllhstexinline{Dict} from these records if the constraint can is present in the list.
 844
 845 \begin{lstHaskellLhstex}[caption={Membership functions for constraints}]
 846 class c `In` cs where
 847     project :: Record dt cs -> Dict (c dt)
 848 instance {-# OVERLAPPING #-} c `In` (c ': cs) where
 849     project (Cons _) = Dict
 850 instance {-# OVERLAPPING #-} c `In` cs => c `In` (b ': cs) where
 851     project (Cons xs) = project xs
 852 \end{lstHaskellLhstex}
 853
 854 Finally, creating these \haskelllhstexinline{Record} witnesses is a chore so this can be automated as well using a \haskelllhstexinline{CreateRecord} type class that will create a record structure cell by cell if and only if all type class constraints are available.
 855
 856 \begin{lstHaskellLhstex}[caption={Membership functions for constraints}]
 857 class CreateRecord dt c where
 858     createRecord :: Record dt c
 859 instance CreateRecord d '[] where
 860     createRecord = Nil
 861 instance (c (d c0), CreateRecord (d c0) cs) =>
 862                 CreateRecord (d c0) (c ': cs) where
 863     createRecord = Cons createRecord
 864 \end{lstHaskellLhstex}
 865
 866 The class constraints for the interpretation instances can now be greatly simplified, as shown in the evaluation instance for \haskelllhstexinline{Expr}.
 867 The implementation remains the same, only that for the extension case, a trick needs to be applied to convince the compiler of the correct instances.
 868 Using \haskelllhstexinline{`In`}'s \haskelllhstexinline{project} function, a dictionary can be brought into scope.
 869 This dictionary can then subsequently be used to apply the type class function on the extension using the \haskelllhstexinline{withDict} function from the \haskelllhstexinline{Data.Constraint} library\footnote{\haskelllhstexinline{withDict :: Dict c -> (c => r) -> r}}.
 870 The \GHCmod{ScopedTypeVariables} extension is used to make sure the existentially quantified type variable for the extension is matched to the type of the dictionary.
 871
 872 \begin{lstHaskellLhstex}[caption={Evaluation instance for the main data type}]
 873 class Eval v where eval :: v -> Int
 874
 875 instance Eval `In` s => Eval (Expr s) where
 876     eval (Lit i) = i
 877     eval (Add l r) = eval l + eval r
 878     eval (Ext r (e :: x)) = withDict (project r :: Dict (Eval x)) eval e
 879 \end{lstHaskellLhstex}
 880
 881 Smart constructors need to be adapted as well, as can be seen from the smart constructor \haskelllhstexinline{subst}.
 882
 883 \begin{lstHaskellLhstex}[caption={Substitution smart constructor}]
 884 subst :: (Typeable c, CreateRecord (Subt c) c) => Expr c -> Expr c -> Expr c
 885 subst l r = Ext createRecord (l `Subt` r)
 886 \end{lstHaskellLhstex}
 887
 888 Finally, defining terms in the language is can be done immediately if the interpretations are known.
 889 For example, if we want to print the term $~(~(42+(38-4)))$, we can define it as follows.
 890
 891 \begin{lstHaskellLhstex}[caption={Substitution smart constructor}]
 892 e0 :: Expr '[Print,Opt]
 893 e0 = neg (neg (Lit 42 `Add` (Lit 38 `subt` Lit 4)))
 894 \end{lstHaskellLhstex}
 895
 896 It is also possible to define terms in the \gls{DSL} as being overloaded in the interpretation.
 897 This does require enumerating all the \haskelllhstexinline{CreateRecord} type classes for every extension.
 898 At the call site, the concrete list of constraints must be known.
 899
 900 \begin{lstHaskellLhstex}[caption={Substitution smart constructor}]
 901 e1 :: (Typeable c
 902         , CreateRecord (Neg c) c
 903         , CreateRecord (Subst c) c)
 904         => Expr c
 905 e1 = neg (neg (Lit 42 `Add` (Lit 38 `subt` Lit 4)))
 906 \end{lstHaskellLhstex}
 907
 908 \end{subappendices}
 909
 910 \input{subfilepostamble}
 911 \end{document}