dsl/class.tex

   1 \documentclass[../thesis.tex]{subfiles}
   2
   3 \input{subfilepreamble}
   4
   5 \setcounter{chapter}{1}
   6
   7 \begin{document}
   8 \input{subfileprefix}
   9 \chapter{Deep embedding with class}%
  10 \label{chp:classy_deep_embedding}
  11 \begin{chapterabstract}
  12         The two flavours of \gls{DSL} embedding are shallow and deep embedding.
  13         In functional languages, shallow embedding models the language constructs as functions in which the semantics are embedded.
  14         Adding semantics is therefore cumbersome while adding constructs is a breeze.
  15         Upgrading the functions to type classes lifts this limitation to a certain extent.
  16
  17         Deeply embedded languages represent their language constructs as data and the semantics are functions on it.
  18         As a result, the language constructs are embedded in the semantics, hence adding new language constructs is laborious where adding semantics is trouble free.
  19
  20         This chapter shows that by abstracting the semantics functions in deep embedding to type classes, it is possible to easily add language constructs as well.
  21         So-called classy deep embedding results in \glspl{DSL} that are extensible both in language constructs and in semantics while maintaining a concrete abstract syntax tree.
  22         Additionally, little type-level trickery or complicated boilerplate code is required to achieve this.
  23 \end{chapterabstract}
  24
  25 \section{Introduction}
  26 The two flavours of \gls{DSL} embedding are deep and shallow embedding \citep{boulton_experience_1992}.
  27 In \gls{FP} languages, shallow embedding models language constructs as functions in the host language.
  28 As a result, adding new language constructs---extra functions---is easy.
  29 However, the semantics of the language is embedded in these functions, making it troublesome to add semantics since it requires updating all existing language constructs.
  30
  31 On the other hand, deep embedding models language constructs as data in the host language.
  32 The semantics of the language are represented by functions over the data.
  33 Consequently, adding new semantics, i.e.\ novel functions, is straightforward.
  34 It can be stated that the language constructs are embedded in the functions that form a semantics.
  35 If one wants to add a language construct, all semantics functions must be revisited and revised to avoid ending up with partial functions.
  36
  37 This juxtaposition has been known for many years \citep{reynolds_user-defined_1978} and discussed by many others \citep{krishnamurthi_synthesizing_1998} but most famously dubbed the \emph{expression problem} by \citet{wadler_expression_1998}:
  38
  39 \begin{quote}
  40         The \emph{expression problem} is a new name for an old problem.
  41         The goal is to define a data type by cases, where one can add new cases to the data type and new functions over the data type, without recompiling existing code, and while retaining static type safety (e.g., no casts).
  42 \end{quote}
  43
  44 In shallow embedding, abstracting the functions to type classes disentangles the language constructs from the semantics, allowing extension both ways.
  45 This technique is dubbed tagless-final embedding \citep{carette_finally_2009}, nonetheless it is no silver bullet.
  46 Some semantics that require an intensional analysis of the syntax tree, such as transformation and optimisations, are difficult to implement in shallow embedding due to the lack of an explicit data structure representing the abstract syntax tree.
  47 The semantics of the \gls{DSL} have to be combined and must hold some kind of state or context, so that structural information is not lost \citep{kiselyov_typed_2012}.
  48
  49 \subsection{Research contribution}
  50 This chapter shows how to apply the technique observed in tagless-final embedding to deep embedding.
  51 The presented basic technique, christened \emph{classy deep embedding}, does not require advanced type system extensions to be used.
  52 However, it is suitable for type system extensions such as \glspl{GADT}.
  53 While this chapter is written as a literate
  54 \Gls{HASKELL} \citep{peyton_jones_haskell_2003} program using some minor extensions provided by \gls{GHC} \citep{ghc_team_ghc_2021}, the idea is applicable to other languages as well\footnotemark.
  55 \footnotetext{Lubbers, M. (2021): Literate Haskell/lhs2\TeX{} source code of the paper ``Deep Embedding
  56 with Class'': TFP 2022.\ Zenodo.\ \url{https://doi.org/10.5281/zenodo.6650880}.}
  57
  58 \section{Deep embedding}
  59 Pick the language of literal integers and addition \citep{bender_benderrule_2019}\todo{nodig? grappig?}.
  60 %Pick a \gls{DSL}, any \gls{DSL}, pick the language of literal integers and addition.
  61 In deep embedding, terms in the language are represented by data in the host language.
  62 Hence, defining the constructs is as simple as creating the following algebraic data type\footnote{All data types and functions are subscripted to indicate the evolution. When definitions are omitted for version $n$, version $n-1$ is assumed.}.
  63
  64 \begin{lstHaskellLhstex}
  65 data Expr_0
  66         = Lit_0 Int
  67         | Add_0 Expr_0 Expr_0
  68 \end{lstHaskellLhstex}
  69
  70 Semantics are defined as functions on the \haskelllhstexinline{Expr_0} data type.
  71 For example, a function transforming the term to an integer---an evaluator---is implemented as follows.
  72
  73 \begin{lstHaskellLhstex}
  74 eval_0 :: Expr_0 -> Int
  75 eval_0 (Lit_0 e)     = e
  76 eval_0 (Add_0 e1 e2) = eval_0 e1 + eval_0 e2
  77 \end{lstHaskellLhstex}
  78
  79 Adding semantics---e.g.\ a printer---just means adding another function while the existing functions remain untouched.
  80 I.e.\ the key property of deep embedding.
  81 The following function, transforming the \haskelllhstexinline{Expr_0} data type to a string, defines a simple printer for our language.
  82
  83 \begin{lstHaskellLhstex}
  84 print_0 :: Expr_0 -> String
  85 print_0 (Lit_0 v)     = show v
  86 print_0 (Add_0 e1 e2) = "(" ++ print_0 e1 ++ "-" ++ print_0 e2 ++ ")"
  87 \end{lstHaskellLhstex}
  88
  89 While the language is concise and elegant, it is not very expressive.
  90 Traditionally, extending the language is achieved by adding a case to the \haskelllhstexinline{Expr_0} data type.
  91 So, adding subtraction to the language results in the following revised data type.
  92
  93 \begin{lstHaskellLhstex}
  94 data Expr_0
  95         = Lit_0 Int
  96         | Add_0 Expr_0 Expr_0
  97         | Sub_0 Expr_0 Expr_0
  98 \end{lstHaskellLhstex}
  99
 100 Extending the \gls{DSL} with language constructs exposes the Achilles' heel of deep embedding.
 101 Adding a case to the data type means that all semantics functions have become partial and need to be updated to be able to handle this new case.
 102 This does not seem like an insurmountable problem, but it does pose a problem if either the functions or the data type itself are written by others or are contained in a closed library.
 103
 104 \section{Shallow embedding}
 105 Conversely, let us see how this would be done in shallow embedding.
 106 First, the data type is represented by functions in the host language with embedded semantics.
 107 Therefore, the evaluators for literals and addition both become a function in the host language as follows.
 108
 109 \begin{lstHaskellLhstex}
 110 type Sem_s = Int
 111
 112 lit_s :: Int -> Sem_s
 113 lit_s i = i
 114
 115 add_s :: Sem_s -> Sem_s -> Sem_s
 116 add_s e1 e2 = e1 + e2
 117 \end{lstHaskellLhstex}
 118
 119 Adding constructions to the language is done by adding functions.
 120 Hence, the following function adds subtraction to our language.
 121
 122 \begin{lstHaskellLhstex}
 123 sub_s :: Sem_s -> Sem_s -> Sem_s
 124 sub_s e1 e2 = e1 - e2
 125 \end{lstHaskellLhstex}
 126
 127 Adding semantics on the other hand---e.g.\ a printer---is not that simple because the semantics are part of the functions representing the language constructs.
 128 One way to add semantics is to change all functions to execute both semantics at the same time.
 129 In our case this means changing the type of \haskelllhstexinline{Sem_s} to be \haskelllhstexinline{(Int, String)} so that all functions operate on a tuple containing the result of the evaluator and the printed representation at the same time. %chktex 36
 130 Alternatively, a single semantics can be defined that represents a fold over the language constructs \citep{gibbons_folding_2014}, delaying the selection of semantics to the moment the fold is applied.
 131
 132 \subsection{Tagless-final embedding}\label{sec:tagless-final_embedding}
 133 Tagless-final embedding overcomes the limitations of standard shallow embedding.
 134 To upgrade to this embedding technique, the language constructs are changed from functions to type classes.
 135 For our language this results in the following type class definition.
 136
 137 \begin{lstHaskellLhstex}
 138 class Expr_t s where
 139         lit_t :: Int -> s
 140         add_t :: s -> s -> s
 141 \end{lstHaskellLhstex}
 142
 143 Semantics become data types implementing these type classes, resulting in the following instance for the evaluator\footnotemark.
 144 \footnotetext{%
 145         In this case \haskelllhstexinline{newtype}s are used instead of regular \haskelllhstexinline{data} declarations.
 146         A \haskelllhstexinline{newtype} is a special data type with a single constructor containing a single value only to which it is isomorphic.
 147         It allows the programmer to define separate class instances that the instances of the isomorphic type without any overhead.
 148         During compilation the constructor is completely removed \citep[\citesection{4.2.3}]{peyton_jones_haskell_2003}.
 149 }
 150
 151 \begin{lstHaskellLhstex}
 152 newtype Eval_t = E_t Int
 153
 154 instance Expr_t Eval_t where
 155         lit_t v               = E_t v
 156         add_t (E_t e1) (E_t e2) = E_t (e1 + e2)
 157 \end{lstHaskellLhstex}
 158
 159 Adding constructs---e.g.\ subtraction---just results in an extra type class and corresponding instances.
 160
 161 \begin{lstHaskellLhstex}
 162 class Sub_t s where
 163         sub_t :: s -> s -> s
 164
 165 instance Sub_t Eval_t where
 166         sub_t (E_t e1) (E_t e2) = E_t (e1 - e2)
 167 \end{lstHaskellLhstex}
 168
 169 Finally, adding semantics such as a printer for the language is achieved by providing a data type representing the semantics accompanied by instances for the language constructs.
 170
 171 \begin{lstHaskellLhstex}
 172 newtype Printer_t = P_t String
 173
 174 instance Expr_t Printer_t where
 175         lit_t i               = P_t (show i)
 176         add_t (P_t e1) (P_t e2) = P_t ("(" ++ e1 ++ "+" ++ e2 ++ ")")
 177
 178 instance Sub_t Printer_t where
 179         sub_t (P_t e1) (P_t e2) = P_t ("(" ++ e1 ++ "-" ++ e2 ++ ")")
 180 \end{lstHaskellLhstex}
 181
 182 \section{Lifting the interpretations}%
 183 Let us rethink the deeply embedded \gls{DSL} design.
 184 Remember that in shallow embedding, the semantics are embedded in the language construct functions.
 185 Obtaining extensibility both in constructs and semantics was accomplished by abstracting the semantics functions to type classes, making the constructs overloaded in the semantics.
 186 In deep embedding, the constructs are embedded in the semantics functions instead.
 187 So, let us apply the same technique, i.e.\ make the semantics overloaded in the language constructs by abstracting the semantics functions to type classes.
 188 The same effect may be achieved when using similar techniques such as explicit dictionary passing or ML style modules.
 189 In our language this results in the following type class.
 190
 191 \begin{lstHaskellLhstex}
 192 class Eval_1 v where
 193         eval_1 :: v -> Int
 194
 195 data Expr_1
 196         = Lit_1 Int
 197         | Add_1 Expr_1 Expr_1
 198 \end{lstHaskellLhstex}
 199
 200 Implementing the semantics type class instances for the \haskelllhstexinline{Expr_1} data type is an elementary exercise.
 201 By a copy-paste and some modifications, we come to the following implementation.
 202
 203 \begin{lstHaskellLhstex}
 204 instance Eval_1 Expr_1 where
 205         eval_1 (Lit_1 v)     = v
 206         eval_1 (Add_1 e1 e2) = eval_1 e1 + eval_1 e2
 207 \end{lstHaskellLhstex}
 208
 209 Subtraction can now be defined in a separate data type, leaving the original data type intact.
 210 Instances for the additional semantics can now be implemented separately as instances of the type classes.
 211
 212 \begin{lstHaskellLhstex}
 213 data Sub_1 = Sub_1 Expr_1 Expr_1
 214
 215 instance Eval_1 Sub_1 where
 216         eval_1 (Sub_1 e1 e2) = eval_1 e1 - eval_1 e2
 217 \end{lstHaskellLhstex}
 218
 219 \section{Existential data types}%
 220
 221 The astute reader might have noticed that we have dissociated ourselves from the original data type.
 222 It is only possible to create an expression with a subtraction on the top level.
 223 The recursive knot is left untied and as a result, \haskelllhstexinline{Sub_1} can never be reached from an \haskelllhstexinline{Expr_1}.
 224
 225 Luckily, we can reconnect them by adding a special constructor to the \haskelllhstexinline{Expr_1} data type for housing extensions.
 226 It contains an existentially quantified \citep{mitchell_abstract_1988} type with type class constraints \citep{laufer_combining_1994,laufer_type_1996} for all semantics type classes \citep[\citesection{6.4.6}]{ghc_team_ghc_2021} to allow it to house not just subtraction but any future extension.
 227
 228 \begin{lstHaskellLhstex}
 229 data Expr_2
 230         =                      Lit_2 Int
 231         |                      Add_2 Expr_2 Expr_2
 232         | forall x. Eval_2 x => Ext_2 x
 233 \end{lstHaskellLhstex}
 234
 235 The implementation of the extension case in the semantics type classes is in most cases just a matter of calling the function for the argument as can be seen in the semantics instances shown below.
 236
 237 \begin{lstHaskellLhstex}
 238 instance Eval_2 Expr_2 where
 239         eval_2 (Lit_2  v)      = v
 240         eval_2 (Add_2  e1 e2)  = eval_2 e1 + eval_2 e2
 241         eval_2 (Ext_2  x)      = eval_2 x
 242 \end{lstHaskellLhstex}
 243
 244 Adding language construct extensions in different data types does mean that an extra \haskelllhstexinline{Ext_2} tag is introduced when using the extension.
 245 This burden can be relieved by creating a smart constructor for it that automatically wraps the extension with the \haskelllhstexinline{Ext_2} constructor so that it is of the type of the main data type.
 246
 247 \begin{lstHaskellLhstex}
 248 sub_2 :: Expr_2 -> Expr_2 -> Expr_2
 249 sub_2 e1 e2 = Ext_2 (Sub_2 e1 e2)
 250 \end{lstHaskellLhstex}
 251
 252 In our example this means that the programmer can write\footnotemark:
 253 \footnotetext{%
 254         Backticks are used to use functions or constructors in an infix fashion \citep[\citesection{4.3.3}]{peyton_jones_haskell_2003}.
 255 }
 256 \begin{lstHaskellLhstex}
 257 e2 :: Expr_2
 258 e2 = Lit_2 42 `sub_2` Lit_2 1
 259 \end{lstHaskellLhstex}
 260 instead of having to write
 261 \begin{lstHaskellLhstex}
 262 e2p :: Expr_2
 263 e2p = Ext_2 (Lit_2 42 `Sub_2` Lit_2 1)
 264 \end{lstHaskellLhstex}
 265
 266 \subsection{Unbraiding the semantics from the data}
 267 This approach does reveal a minor problem.
 268 Namely, that all semantics type classes are braided into our datatypes via the \haskelllhstexinline{Ext_2} constructor.
 269 Say if we add the printer again, the \haskelllhstexinline{Ext_2} constructor has to be modified to contain the printer type class constraint as well.\footnote{Resulting in the following constructor: \haskelllhstexinline{forall x.(Eval_2 x, Print_2 x) => Ext_2 x}.} %chktex 36
 270 Thus, if we add semantics, the main data type's type class constraints in the \haskelllhstexinline{Ext_2} constructor need to be updated.
 271 To avoid this, the type classes can be bundled in a type class alias or type class collection as follows.
 272
 273 \begin{lstHaskellLhstex}
 274 class (Eval_2 x, Print_2 x) => Semantics_2 x
 275
 276 data Expr_2
 277         =                           Lit_2 Int
 278         |                           Add_2 Expr_2 Expr_2
 279         | forall x. Semantics_2 x => Ext_2 x
 280 \end{lstHaskellLhstex}
 281
 282 The class alias removes the need for the programmer to visit the main data type when adding additional semantics.
 283 Unfortunately, the compiler does need to visit the main data type again.
 284 Some may argue that adding semantics happens less frequently than adding language constructs, but in reality it means that we have to concede that the language is not as easily extensible in semantics as in language constructs.
 285 More exotic type system extensions such as constraint kinds \citep{bolingbroke_constraint_2011,yorgey_giving_2012} can untangle the semantics from the data types by making the data types parametrised by the particular semantics.
 286 However, by adding some boilerplate, even without this extension, the language constructs can be parametrised by the semantics by putting the semantics functions in a data type.
 287 First the data types for the language constructs are parametrised by the type variable \haskelllhstexinline{d} as follows.
 288
 289 \begin{lstHaskellLhstex}
 290 data Expr_3 d
 291         =           Lit_3 Int
 292         |           Add_3 (Expr_3 d) (Expr_3 d)
 293         | forall x. Ext_3 (d x) x
 294
 295 data Sub_3 d = Sub_3 (Expr_3 d) (Expr_3 d)
 296 \end{lstHaskellLhstex}
 297
 298 The \haskelllhstexinline{d} type variable is inhabited by an explicit dictionary for the semantics, i.e.\ a witness to the class instance.
 299 Therefore, for all semantics type classes, a data type is defined which contains the semantics function for the given semantics.
 300 This means that for \haskelllhstexinline{Eval_3}, a dictionary with the function \haskelllhstexinline{EvalDict_3} is defined, a type class \haskelllhstexinline{HasEval_3} for retrieving the function from the dictionary and an instance for \haskelllhstexinline{HasEval_3} for \haskelllhstexinline{EvalDict_3}.
 301
 302 \begin{lstHaskellLhstex}
 303 newtype EvalDict_3 v = EvalDict_3 (v -> Int)
 304
 305 class HasEval_3 d where
 306         getEval_3 :: d v -> v -> Int
 307
 308 instance HasEval_3 EvalDict_3 where
 309         getEval_3 (EvalDict_3 e) = e
 310 \end{lstHaskellLhstex}
 311
 312 The instances for the type classes change as well according to the change in the datatype.
 313 Given that there is a \haskelllhstexinline{HasEval_3} instance for the witness type \haskelllhstexinline{d}, we can provide an implementation of \haskelllhstexinline{Eval_3} for \haskelllhstexinline{Expr_3 d}.
 314
 315 \begin{lstHaskellLhstex}
 316 instance HasEval_3 d => Eval_3 (Expr_3 d) where
 317         eval_3 (Lit_3 v)     = v
 318         eval_3 (Add_3 e1 e2) = eval_3 e1 + eval_3 e2
 319         eval_3 (Ext_3 d x)   = getEval_3 d x
 320
 321 instance HasEval_3 d => Eval_3 (Sub_3 d) where
 322         eval_3 (Sub_3 e1 e2) = eval_3 e1 - eval_3 e2
 323 \end{lstHaskellLhstex}
 324
 325 Because the \haskelllhstexinline{Ext_3} constructor from \haskelllhstexinline{Expr_3} now contains a value of type \haskelllhstexinline{d}, the smart constructor for \haskelllhstexinline{Sub_3} must somehow come up with this value.
 326 To achieve this, a type class is introduced that allows the generation of such a dictionary.
 327
 328 \begin{lstHaskellLhstex}
 329 class GDict a where
 330         gdict :: a
 331 \end{lstHaskellLhstex}
 332
 333 This type class has individual instances for all semantics dictionaries, linking the class instance to the witness value.
 334 I.e.\ if there is a type class instance known, a witness value can be conjured using the \haskelllhstexinline{gdict} function.
 335
 336 \begin{lstHaskellLhstex}
 337 instance Eval_3 v => GDict (EvalDict_3 v) where
 338         gdict = EvalDict_3 eval_3
 339 \end{lstHaskellLhstex}
 340
 341 With these instances, the semantics function can be retrieved from the \haskelllhstexinline{Ext_3} constructor and in the smart constructors they can be generated as follows:
 342
 343 \begin{lstHaskellLhstex}
 344 sub_3 :: GDict (d (Sub_3 d)) => Expr_3 d -> Expr_3 d -> Expr_3 d
 345 sub_3 e1 e2 = Ext_3 gdict (Sub_3 e1 e2)
 346 \end{lstHaskellLhstex}
 347
 348 Finally, we reached the end goal, orthogonal extension of both language constructs as shown by adding subtraction to the language, and in language semantics.
 349 Adding the printer can now be done without touching the original code as follows.
 350 First the printer type class, dictionaries and instances for \haskelllhstexinline{GDict} are defined.
 351
 352 \begin{lstHaskellLhstex}
 353 class Print_3 v where
 354         print_3 :: v -> String
 355
 356 newtype PrintDict_3 v = PrintDict_3 (v -> String)
 357
 358 class HasPrint_3 d where
 359         getPrint_3 :: d v -> v -> String
 360
 361 instance HasPrint_3 PrintDict_3 where
 362         getPrint_3 (PrintDict_3 e) = e
 363
 364 instance Print_3 v => GDict (PrintDict_3 v) where
 365         gdict = PrintDict_3 print_3
 366 \end{lstHaskellLhstex}
 367
 368 Then the instances for \haskelllhstexinline{Print_3} of all the language constructs can be defined.
 369
 370 \begin{lstHaskellLhstex}
 371 instance HasPrint_3 d => Print_3 (Expr_3 d) where
 372         print_3 (Lit_3 v)     = show v
 373         print_3 (Add_3 e1 e2) = "(" ++ print_3 e1 ++ "+" ++ print_3 e2 ++ ")"
 374         print_3 (Ext_3 d x)   = getPrint_3 d x
 375 instance HasPrint_3 d => Print_3 (Sub_3 d) where
 376         print_3 (Sub_3 e1 e2) = "(" ++ print_3 e1 ++ "-" ++ print_3 e2 ++ ")"
 377 \end{lstHaskellLhstex}
 378
 379 \section{Transformation semantics}
 380 Most semantics convert a term to some final representation and can be expressed just by functions on the cases.
 381 However, the implementation of semantics such as transformation or optimisation may benefit from a so-called intentional analysis of the abstract syntax tree.
 382 In shallow embedding, the implementation for these types of semantics is difficult because there is no tangible abstract syntax tree.
 383 In off-the-shelf deep embedding this is effortless since the function can pattern match on the constructor or structures of constructors.
 384
 385 To demonstrate intensional analyses in classy deep embedding we write an optimiser that removes addition and subtraction by zero.
 386 In classy deep embedding, adding new semantics means first adding a new type class housing the function including the machinery for the extension constructor.
 387
 388 \begin{lstHaskellLhstex}
 389 class Opt_3 v where
 390         opt_3 :: v -> v
 391
 392 newtype OptDict_3 v = OptDict_3 (v -> v)
 393
 394 class HasOpt_3 d where
 395         getOpt_3 :: d v -> v -> v
 396
 397 instance HasOpt_3 OptDict_3 where
 398         getOpt_3 (OptDict_3 e) = e
 399
 400 instance Opt_3 v => GDict (OptDict_3 v) where
 401         gdict = OptDict_3 opt_3
 402 \end{lstHaskellLhstex}
 403
 404 The implementation of the optimiser for the \haskelllhstexinline{Expr_3} data type is no complicated task.
 405 The only interesting bit occurs in the \haskelllhstexinline{Add_3} constructor, where we pattern match on the optimised children to determine whether an addition with zero is performed.
 406 If this is the case, the addition is removed.
 407
 408 \begin{lstHaskellLhstex}
 409 instance HasOpt_3 d => Opt_3 (Expr_3 d) where
 410         opt_3 (Lit_3 v)     = Lit_3 v
 411         opt_3 (Add_3 e1 e2) = case (opt_3 e1, opt_3 e2) of
 412                 (Lit_3 0, e2p    ) -> e2p
 413                 (e1p,     Lit_3 0) -> e1p
 414                 (e1p,     e2p    ) -> Add_3 e1p e2p
 415         opt_3 (Ext_3 d x)   = Ext_3 d (getOpt_3 d x)
 416 \end{lstHaskellLhstex}
 417
 418 Replicating this for the \haskelllhstexinline{Opt_3} instance of \haskelllhstexinline{Sub_3} seems a clear-cut task at first glance.
 419
 420 \begin{lstHaskellLhstex}
 421 instance HasOpt_3 d => Opt_3 (Sub_3 d) where
 422         opt_3 (Sub_3 e1 e2) = case (opt_3 e1, opt_3 e2) of
 423                 (e1p, Lit_3 0) -> e1p
 424                 (e1p, e2p    ) -> Sub_3 e1p e2p
 425 \end{lstHaskellLhstex}
 426
 427 Unsurprisingly, this code is rejected by the compiler.
 428 When a literal zero is matched as the right-hand side of a subtraction, the left-hand side of type \haskelllhstexinline{Expr_3} is returned.
 429 However, the type signature of the function dictates that it should be of type \haskelllhstexinline{Sub_3}.
 430 To overcome this problem we add a convolution constructor.
 431
 432 \subsection{Convolution}
 433 Adding a loopback case or convolution constructor to \haskelllhstexinline{Sub_3} allows the removal of the \haskelllhstexinline{Sub_3} constructor while remaining the \haskelllhstexinline{Sub_3} type.
 434 It should be noted that a loopback case is \emph{only} required if the transformation actually removes tags.
 435 This changes the \haskelllhstexinline{Sub_3} data type as follows.
 436
 437 \begin{lstHaskellLhstex}
 438 data Sub_4 d
 439         = Sub_4     (Expr_4 d) (Expr_4 d)
 440         | SubLoop_4 (Expr_4 d)
 441
 442 instance HasEval_4 d => Eval_4 (Sub_4 d) where
 443         eval_4 (Sub_4     e1 e2) = eval_4 e1 - eval_4 e2
 444         eval_4 (SubLoop_4 e1)    = eval_4 e1
 445 \end{lstHaskellLhstex}
 446
 447 With this loopback case in the toolbox, the following \haskelllhstexinline{Sub} instance optimises away subtraction with zero literals.
 448
 449 \begin{lstHaskellLhstex}
 450 instance HasOpt_4 d => Opt_4 (Sub_4 d) where
 451         opt_4 (Sub_4 e1 e2) = case (opt_4 e1, opt_4 e2) of
 452                 (e1p, Lit_4 0) -> SubLoop_4 e1p
 453                 (e1p, e2p    ) -> Sub_4 e1p e2p
 454         opt_4 (SubLoop_4 e) =  SubLoop_4 (opt_4 e)
 455 \end{lstHaskellLhstex}
 456
 457 \subsection{Pattern matching}
 458 Pattern matching within datatypes and from an extension to the main data type works out of the box.
 459 Cross-extensional pattern matching on the other hand---matching on a particular extension---is something that requires a bit of extra care.
 460 Take for example negation propagation and double negation elimination.
 461 Pattern matching on values with an existential type is not possible without leveraging dynamic typing \citep{abadi_dynamic_1991,baars_typing_2002}.
 462 To enable dynamic typing support, the \haskelllhstexinline{Typeable} type class as provided by \haskelllhstexinline{Data.Dynamic} \citep{ghc_team_datadynamic_2021} is added to the list of constraints in all places where we need to pattern match across extensions.
 463 As a result, the \haskelllhstexinline{Typeable} type class constraints are added to the quantified type variable \haskelllhstexinline{x} of the \haskelllhstexinline{Ext_4} constructor and to \haskelllhstexinline{d}s in the smart constructors.
 464
 465 \begin{lstHaskellLhstex}
 466 data Expr_4 d
 467         =                         Lit_4 Int
 468         |                         Add_4 (Expr_4 d) (Expr_4 d)
 469         | forall x. Typeable x => Ext_4 (d x) x
 470 \end{lstHaskellLhstex}
 471
 472 First let us add negation to the language by defining a datatype representing it.
 473 Negation elimination requires the removal of negation constructors, so a convolution constructor is defined as well.
 474
 475 \begin{lstHaskellLhstex}
 476 data Neg_4 d
 477         = Neg_4     (Expr_4 d)
 478         | NegLoop_4 (Expr_4 d)
 479
 480 neg_4 :: (Typeable d, GDict (d (Neg_4 d))) => Expr_4 d -> Expr_4 d
 481 neg_4 e = Ext_4 gdict (Neg_4 e)
 482 \end{lstHaskellLhstex}
 483
 484 The evaluation and printer instances for the \haskelllhstexinline{Neg_4} datatype are defined as follows.
 485
 486 \begin{lstHaskellLhstex}
 487 instance HasEval_4 d => Eval_4 (Neg_4 d) where
 488         eval_4 (Neg_4     e) = negate (eval_4 e)
 489         eval_4 (NegLoop_4 e) = eval_4 e
 490
 491 instance HasPrint_4 d => Print_4 (Neg_4 d) where
 492         print_4 (Neg_4     e) = "(~" ++ print_4 e ++ ")"
 493         print_4 (NegLoop_4 e) = print_4 e
 494 \end{lstHaskellLhstex}
 495
 496 The \haskelllhstexinline{Opt_4} instance contains the interesting bit.
 497 If the sub expression of a negation is an addition, negation is propagated downwards.
 498 If the sub expression is again a negation, something that can only be found out by a dynamic pattern match, it is replaced by a \haskelllhstexinline{NegLoop_4} constructor.
 499
 500 \begin{lstHaskellLhstex}
 501 instance (Typeable d, GDict (d (Neg_4 d)), HasOpt_4 d) => Opt_4 (Neg_4 d) where
 502         opt_4 (Neg_4 (Add_4 e1 e2))
 503                 = NegLoop_4 (Add_4 (opt_4 (neg_4 e1)) (opt_4 (neg_4 e2)))
 504         opt_4 (Neg_4 (Ext_4 d x))
 505                 = case fromDynamic (toDyn (getOpt_4 d x)) of
 506                         Just (Neg_4 e) -> NegLoop_4 e
 507                         _             -> Neg_4 (Ext_4 d (getOpt_4 d x))
 508         opt_4 (Neg_4     e) = Neg_4 (opt_4 e)
 509         opt_4 (NegLoop_4 e) = NegLoop_4 (opt_4 e)
 510 \end{lstHaskellLhstex}
 511
 512 Loopback cases do make cross-extensional pattern matching less modular in general.
 513 For example, \haskelllhstexinline{Ext_4 d (SubLoop_4 (Lit_4 0))} is equivalent to \haskelllhstexinline{Lit_4 0} in the optimisation semantics and would require an extra pattern match.
 514 Fortunately, this problem can be mitigated---if required---by just introducing an additional optimisation semantics that removes loopback cases.
 515 Luckily, one does not need to resort to these arguably blunt matters often.
 516 Dependent language functionality often does not need to span extensions, i.e.\ it is possible to group them in the same data type.
 517
 518 \subsection{Chaining semantics}
 519 Now that the data types are parametrised by the semantics, a final problem needs to be overcome.
 520 The data type is parametrised by the semantics, thus, using multiple semantics, such as evaluation after optimising is not straightforwardly possible.
 521 Luckily, a solution is readily at hand: introduce an ad-hoc combination semantics.
 522
 523 \begin{lstHaskellLhstex}
 524 data OptPrintDict_4 v = OPD_4 (OptDict_4 v) (PrintDict_4 v)
 525
 526 instance HasOpt_4   OptPrintDict_4 where
 527         getOpt_4   (OPD_4 v _) = getOpt_4 v
 528 instance HasPrint_4 OptPrintDict_4 where
 529         getPrint_4 (OPD_4 _ v) = getPrint_4 v
 530
 531 instance (Opt_4 v, Print_4 v) => GDict (OptPrintDict_4 v) where
 532         gdict = OPD_4 gdict gdict
 533 \end{lstHaskellLhstex}
 534
 535 And this allows us to write \haskelllhstexinline{print_4 (opt_4 e1)} resulting in \verb|"((~42)+(~38))"| when \haskelllhstexinline{e1} represents $(\sim(42+38))-0$ and is thus defined as follows.
 536
 537 \begin{lstHaskellLhstex}
 538 e1 :: Expr_4  OptPrintDict_4
 539 e1 = neg_4 (Lit_4 42 `Add_4` Lit_4 38) `sub_4` Lit_4 0
 540 \end{lstHaskellLhstex}
 541
 542 When using classy deep embedding to the fullest, the ability of the compiler to infer very general types expires.
 543 As a consequence, defining reuseable expressions that are overloaded in their semantics requires quite some type class constraints that cannot be inferred by the compiler (yet) if they use many extensions.
 544 Solving this remains future work.
 545 For example, the expression $\sim(42-38)+1$ has to be defined as:
 546
 547 \begin{lstHaskellLhstex}
 548 e3 :: (Typeable d, GDict (d (Neg_4 d)), GDict (d (Sub_4 d))) => Expr_4 d
 549 e3 = neg_4 (Lit_4 42 `sub_4` Lit_4 38) `Add_4` Lit_4 1
 550 \end{lstHaskellLhstex}
 551
 552 \section{Generalised algebraic data types}%
 553 \Glspl{GADT} are enriched data types that allow the type instantiation of the constructor to be explicitly defined \citep{cheney_first-class_2003,hinze_fun_2003}.
 554 Leveraging \glspl{GADT}, deeply embedded \glspl{DSL} can be made statically type safe even when different value types are supported.
 555 Still when \glspl{GADT} are not supported natively in the language, they can be simulated using embedding-projection pairs or equivalence types \citep[\citesection{2.2}]{cheney_lightweight_2002}.
 556 Where some solutions to the expression problem do not easily generalise to \glspl{GADT} (see \cref{sec:cde:related}), classy deep embedding does.
 557 Generalising the data structure of our \gls{DSL} is fairly straightforward and to spice things up a bit, we add an equality and boolean negation language construct.
 558 To make the existing \gls{DSL} constructs more general, we relax the types of those constructors.
 559 For example, operations on integers now work on all numerals instead.
 560 Moreover, the \haskelllhstexinline{Lit_g} constructor can be used to lift values of any type to the \gls{DSL} domain as long as they have a \haskelllhstexinline{Show} instance, required for the printer.
 561 Since some optimisations on \haskelllhstexinline{Not_g} remove constructors and therefore use cross-extensional pattern matches, \haskelllhstexinline{Typeable} constraints are added to \haskelllhstexinline{a}.
 562 Furthermore, because the optimisations for \haskelllhstexinline{Add_g} and \haskelllhstexinline{Sub_g} are now more general, they do not only work for \haskelllhstexinline{Int}s but for any type with a \haskelllhstexinline{Num} instance, the \haskelllhstexinline{Eq} constraint is added to these constructors as well.
 563 Finally, not to repeat ourselves too much, we only show the parts that substantially changed.
 564 The omitted definitions and implementation can be found in \cref{sec:cde:appendix}.
 565
 566 \begin{lstHaskellLhstex}
 567 data Expr_g d a where
 568         Lit_g     :: Show a        => a -> Expr_g d a
 569         Add_g     :: (Eq a, Num a) => Expr_g d a -> Expr_g d a -> Expr_g d a
 570         Ext_g     :: Typeable x    => d x -> x a -> Expr_g d a
 571 data Neg_g  d a where
 572         Neg_g     :: (Typeable a, Num a) => Expr_g d a -> Neg_g d a
 573         NegLoop_g ::                        Expr_g d a -> Neg_g d a
 574 data Not_g  d a where
 575         Not_g     :: Expr_g d Bool -> Not_g d Bool
 576         NotLoop_g :: Expr_g d a    -> Not_g d a
 577 \end{lstHaskellLhstex}
 578
 579 The smart constructors for the language extensions inherit the class constraints of their data types and include a \haskelllhstexinline{Typeable} constraint on the \haskelllhstexinline{d} type variable for it to be useable in the \haskelllhstexinline{Ext_g} constructor as can be seen in the smart constructor for \haskelllhstexinline{Neg_g}:
 580
 581 \begin{lstHaskellLhstex}
 582 neg_g  :: (Typeable d, GDict (d (Neg_g d)), Typeable a, Num a) =>
 583         Expr_g d a -> Expr_g d a
 584 neg_g e = Ext_g gdict (Neg_g e)
 585
 586 not_g  :: (Typeable d,  GDict (d (Not_g d))) =>
 587         Expr_g d Bool -> Expr_g d Bool
 588 not_g e = Ext_g gdict (Not_g e)
 589 \end{lstHaskellLhstex}
 590
 591 Upgrading the semantics type classes to support \glspl{GADT} is done by an easy textual search and replace.
 592 All occurrences of \haskelllhstexinline{v} are now parametrised by type variable \haskelllhstexinline{a}:
 593
 594 \begin{lstHaskellLhstex}
 595 class Eval_g  v where
 596         eval_g  :: v a -> a
 597 class Print_g v where
 598         print_g :: v a -> String
 599 class Opt_g   v where
 600         opt_g   :: v a -> v a
 601 \end{lstHaskellLhstex}
 602
 603 Now that the shape of the type classes has changed, the dictionary data types and the type classes need to be adapted as well.
 604 The introduced type variable \haskelllhstexinline{a} is not an argument to the type class, so it should not be an argument to the dictionary data type.
 605 To represent this type class function, a rank-2 polymorphic function is needed \citep[\citesection{6.4.15}]{ghc_team_ghc_2021}\citep{odersky_putting_1996}.
 606 Concretely, for the evaluator this results in the following definitions:
 607
 608 \begin{lstHaskellLhstex}
 609 newtype EvalDict_g v = EvalDict_g (forall a. v a -> a)
 610 class HasEval_g d where
 611         getEval_g :: d v -> v a -> a
 612 instance HasEval_g EvalDict_g where
 613         getEval_g (EvalDict_g e) = e
 614 \end{lstHaskellLhstex}
 615
 616 The \haskelllhstexinline{GDict} type class is general enough, so the instances can remain the same.
 617 The \haskelllhstexinline{Eval_g} instance of \haskelllhstexinline{GDict} looks as follows:
 618
 619 \begin{lstHaskellLhstex}
 620 instance Eval_g v => GDict (EvalDict_g v) where
 621         gdict = EvalDict_g eval_g
 622 \end{lstHaskellLhstex}
 623
 624 Finally, the implementations for the instances can be ported without complication show using the optimisation instance of \haskelllhstexinline{Not_g}:
 625
 626 \begin{lstHaskellLhstex}
 627 instance (Typeable d, GDict (d (Not_g d)), HasOpt_g d) => Opt_g (Not_g d) where
 628         opt_g (Not_g (Ext_g d x))
 629                 = case fromDynamic (toDyn (getOpt_g d x)) :: Maybe (Not_g d Bool) of
 630                         Just (Not_g e) -> NotLoop_g e
 631                         _             -> Not_g (Ext_g d (getOpt_g d x))
 632         opt_g (Not_g e)     = Not_g (opt_g e)
 633         opt_g (NotLoop_g e) = NotLoop_g (opt_g e)
 634 \end{lstHaskellLhstex}
 635
 636 \section{Related work}%
 637 \label{sec:cde:related}
 638
 639 Embedded \gls{DSL} techniques in functional languages have been a topic of research for many years, thus we do not claim a complete overview of related work.
 640
 641 Clearly, classy deep embedding bears most similarity to the \emph{Datatypes \`a la Carte} \citep{swierstra_data_2008}.
 642 In \citeauthor{swierstra_data_2008}'s approach, semantics are lifted to type classes similarly to classy deep embedding.
 643 Each language construct is their own datatype parametrised by a type parameter.
 644 This parameter contains some type level representation of language constructs that are in use.
 645 In classy deep embedding, extensions only have to be enumerated at the type level when the term is required to be overloaded, in all other cases they are captured in the extension case.
 646 Because all the constructs are expressed in the type system, nifty type system tricks need to be employed to convince the compiler that everything is type safe and the class constraints can be solved.
 647 Furthermore, it requires some boilerplate code such as functor instances for the data types.
 648 In return, pattern matching is easier and does not require dynamic typing.
 649 Classy deep embedding only strains the programmer with writing the extension case for the main data type and the occasional loopback constructor.
 650
 651 \Citet{loh_open_2006} proposed a language extension that allows open data types and open functions, i.e.\ functions and data types that can be extended with more cases later on.
 652 They hinted at the possibility of using type classes for open functions but had serious concerns that pattern matching would be crippled because constructors are becoming types, thus ultimately becoming impossible to type.
 653 In contrast, this chapter shows that pattern matching is easily attainable---albeit using dynamic types---and that the terms can be typed without complicated type system extensions.
 654
 655 A technique similar to classy deep embedding was proposed by \citet{najd_trees_2017} to tackle a slightly different problem, namely that of reusing a data type for multiple purposes in a slightly different form.
 656 For example to decorate the abstract syntax tree of a compiler differently for each phase of the compiler.
 657 They propose to add an extension descriptor as a type variable to a data type and a type family that can be used to decorate constructors with extra information and add additional constructors to the data type using an extension constructor.
 658 Classy deep embedding works similarly but uses existentially quantified type variables to describe possible extensions instead of type variables and type families.
 659 In classy deep embedding, the extensions do not need to be encoded in the type system and less boilerplate is required.
 660 Furthermore, pattern matching on extensions becomes a bit more complicated but in return it allows for multiple extensions to be added orthogonally and avoids the necessity for type system extensions.
 661
 662 Tagless-final embedding is the shallowly embedded counterpart of classy deep embedding and was invented for the same purpose; overcoming the issues with standard shallow embedding \citep{carette_finally_2009}.
 663 Classy deep embedding was organically grown from observing the evolution of tagless-final embedding.
 664 The main difference between tagless-final embedding and classy deep embedding---and in general between shallow and deep embedding---is that intensional analyses of the abstract syntax tree is more difficult because there is no tangible abstract syntax tree data structure.
 665 In classy deep embedding, it is possible to define transformations even across extensions.
 666 Furthermore, in classy deep embedding, defining (mutual) dependent interpretations is automatically supported whereas in tagless-final embedding this requires some amount of code duplication \citep{sun_compositional_2022}.
 667
 668 Hybrid approaches between deep and shallow embedding exist as well.
 669 For example, \citet{svenningsson_combining_2013} show that by expressing the deeply embedded language in a shallowly embedded core language, extensions can be made orthogonally as well.
 670 Classy deep embedding differs from the hybrid approaches in the sense that it does not require the language extensions to be expressible in the core language.
 671
 672 \subsection{Comparison}
 673 No single \gls{DSL} embedding technique is the silver bullet, there is no way of perfectly satisfying all requirements programmers have.
 674 \Citet{sun_compositional_2022} provided a thorough comparison of embedding techniques including more axes than just the two stated in the expression problem.
 675
 676 \Cref{tbl:dsl_comparison_brief} shows a variant of their comparison table.
 677 The first two rows describe the two axes of the original expression problem and the third row describes the added axis of modular dependency handling as stated by \citeauthor{sun_compositional_2022}.
 678 The \emph{poly.} style of embedding---including tagless-final---falls short of this requirement.
 679
 680 Intensional analysis is an umbrella term for pattern matching and transformations.
 681 In shallow embedding, intensional analysis is more complex and requires stateful views describing context, but it is possible to implement though.
 682
 683 Simple type system describes the whether it is possible to encode this embedding technique without many type system extensions.
 684 In classy deep embedding, there is either a bit more scaffolding and boilerplate required or advanced type system extensions need to be used.
 685
 686 Minimal boilerplate denotes the amount of scaffolding and boilerplate required.
 687 For example, hybrid embedding requires a transcoding step between the deep syntax and the shallow core language.
 688
 689 \begin{table}
 690         \begin{threeparttable}[b]
 691                 \small
 692                 \caption{Comparison of embedding techniques, extended from \citet[\citesection{3.6}]{sun_compositional_2022}.}%
 693                 \label{tbl:dsl_comparison_brief}
 694                 \begin{tabular}{llllllll}
 695                         \toprule
 696                                                                  & Shallow                & Deep                   & Hybrid
 697                                                                  & Poly.                  & Comp.                  & \`a la
 698                                                                  & Classy\\
 699                         \midrule
 700                         Extend constructs    & \CIRCLE{}              & \Circle{}              & \LEFTcircle{}\tnote{1}
 701                                                                  & \CIRCLE{}              & \CIRCLE{}              & \CIRCLE{}
 702                                                                  & \CIRCLE{}\\
 703                         Extend views         & \Circle{}              & \CIRCLE{}              & \CIRCLE{}
 704                                                                  & \CIRCLE{}              & \CIRCLE{}              & \CIRCLE{}
 705                                                                  & \CIRCLE{}\\
 706                         Modular dependencies & \Circle{}              & \CIRCLE{}              & \CIRCLE{}
 707                                                                  & \Circle{}              & \CIRCLE{}              & \CIRCLE{}
 708                                                                  & \CIRCLE{}\\
 709                         Intensional analysis & \LEFTcircle{}\tnote{2} & \CIRCLE{}              & \CIRCLE{}
 710                                                                  & \LEFTcircle{}\tnote{2} & \LEFTcircle{}\tnote{2} & \CIRCLE{}
 711                                                                  & \CIRCLE{}\tnote{3}\\
 712                         Simple type system   & \CIRCLE{}              & \CIRCLE{}              & \Circle{}
 713                                                                  & \CIRCLE{}              & \CIRCLE{}              & \Circle{}
 714                                                                  & \textcolor{gray}{\CIRCLE{}}\tnote{4}\\
 715                         Minimal boilerplate  & \CIRCLE{}              & \CIRCLE{}              & \Circle{}
 716                                                                  & \CIRCLE{}              & \CIRCLE{}              & \Circle{}
 717                                                                  & \textcolor{gray}{\CIRCLE{}}\tnote{4}\\
 718                         \bottomrule
 719                 \end{tabular}
 720                 \begin{tablenotes}
 721                         \item [1] Only if the extension is expressible in the core language.
 722                         \item [2] Requires ingenuity and are sometimes awkward to write.
 723                         \item [3] Cross-extensional pattern matching requires \emph{safe} dynamic typing.
 724                         \item [4] Either a simple type system or little boilerplate.
 725                 \end{tablenotes}
 726         \end{threeparttable}
 727 \end{table}
 728
 729 \section{Conclusion}%
 730 Classy deep embedding is a novel organically grown embedding technique that alleviates deep embedding from the extensibility problem in most cases.
 731
 732 By abstracting the semantics functions to type classes they become overloaded in the language constructs.
 733 Thus, making it possible to add new language constructs in a separate type.
 734 These extensions are brought together in a special extension constructor residing in the main data type.
 735 This extension case is overloaded by the language construct using a data type containing the class dictionary.
 736 As a result, orthogonal extension is possible for language constructs and semantics using only little syntactic overhead or type annotations.
 737 The basic technique only requires---well established through history and relatively standard---existential data types.
 738 However, if needed, the technique generalises to \glspl{GADT} as well, adding rank-2 types to the list of type system requirements as well.
 739 Finally, the abstract syntax tree remains observable which makes it suitable for intensional analyses, albeit using occasional dynamic typing for truly cross-extensional transformations.
 740
 741 Defining reuseable expressions overloaded in semantics or using multiple semantics on a single expression requires some boilerplate still, getting around these issues remains future work.
 742 \Cref{sec:classy_reprise} shows how the boilerplate can be minimised using advanced type system extensions.
 743
 744 \section*{Acknowledgements}
 745 This research is partly funded by the Royal Netherlands Navy.
 746 Furthermore, I would like to thank Pieter and Rinus for the fruitful discussions, Ralf for inspiring me to write a functional pearl, and the anonymous reviewers for their valuable and honest comments.
 747
 748 \begin{subappendices}
 749 \section{Reprise: reducing boilerplate}%
 750 \label{sec:classy_reprise}
 751 One of the unique selling points of classy deep embedding is that it, in its basic form, does not require advanced type system extensions nor a lot of boilerplate.
 752 However, generalising the technique to \glspl{GADT} arguably unleashes a cesspool of \emph{unsafe} compiler extensions.
 753 If we are willing to work with extensions, almost all the boilerplate can be inferred or generated.
 754
 755 In classy deep embedding, the \gls{DSL} datatype is parametrised by a type variable providing a witness to the interpretation on the language.
 756 When using multiple interpretations, these need to be bundled in a data type.
 757 Using the \gls{GHC}'s \GHCmod{ConstraintKind} extension, we can make these witnesses explicit, tying into \gls{HASKELL}'s type system immediately.
 758 Furthermore, this constraint does not necessarily have to be a single constraint, after enabling \GHCmod{DataKinds} and \GHCmod{TypeOperators}, we can encode lists of witnesses instead \citep{yorgey_giving_2012}.
 759 The data type for this list of witnesses is \haskelllhstexinline{Record} as shown in \cref{lst_cbde:record_type}.
 760 This \gls{GADT} is parametrised by two type variables.
 761 The first type variable (\haskelllhstexinline{dt}) is the type or type constructor on which the constraints can be applied and the second type variable (\haskelllhstexinline{clist}) is the list of constraints constructors itself.
 762 This means that when \haskelllhstexinline{Cons} is pattern matched, the overloading of the type class constraint for \haskelllhstexinline{c dt} can be solved by the compiler.
 763 \GHCmod{KindSignatures} is used to force the kinds of the type parameters and the kind of \haskelllhstexinline{dt} is polymorphic (\GHCmod{PolyKinds}) so that the \haskelllhstexinline{Record} data type can be used for \glspl{DSL} using type classes but also type constructor classes (e.g.\ when using \glspl{GADT}).
 764
 765 \begin{lstHaskellLhstex}[label={lst_cbde:record_type},caption={Data type for a list of constraints.}]
 766 data Record (dt :: k) (clist :: [k -> Constraint]) where
 767         Nil  :: Record dt '[]
 768         Cons :: c dt => Record dt cs -> Record dt (c ': cs)
 769 \end{lstHaskellLhstex}
 770
 771 To incorporate this type in the \haskelllhstexinline{Expr} type, the \haskelllhstexinline{Ext} constructor changes from containing a single witness dictionary to a \haskelllhstexinline{Record} type containing all the required dictionaries.
 772
 773 \begin{lstHaskellLhstex}[caption={Main data type with extension constructor.}]
 774 data Expr c
 775         = Lit Int
 776         | Add (Expr c) (Expr c)
 777         | Ext (Record x c) x
 778 \end{lstHaskellLhstex}
 779
 780 Furthermore, we define a type class (\haskelllhstexinline{In}) that allows us to extract explicit dictionaries \haskelllhstexinline{Dict} from these records if the constraint can is present in the list.
 781 Since the constraints become available as soon as the \haskelllhstexinline{Cons} constructor is matched, the implementation is a type-level list traversal.
 782
 783 \begin{lstHaskellLhstex}[caption={Membership functions for constraints.}]
 784 class c `In` cs where
 785         project :: Record dt cs -> Dict (c dt)
 786 instance {-# OVERLAPPING #-} c `In` (c ': cs) where
 787         project (Cons _) = Dict
 788 instance {-# OVERLAPPING #-} c `In` cs => c `In` (b ': cs) where
 789         project (Cons xs) = project xs
 790 \end{lstHaskellLhstex}
 791
 792 The final scaffolding is a multi-parameter type class \haskelllhstexinline{CreateRecord} (requiring the \GHCmod{MultiParamTypeclasses} and \GHCmod{FlexibleInstances} extension) to create these \haskelllhstexinline{Record} witnesses automatically.
 793 This type class creates a record structure cons by cons if and only if all type class constraints are available in the list of constraints.
 794 It is not required to provide instances for this for specific records or type classes, the two instances describe all the required constraints.
 795
 796 \begin{lstHaskellLhstex}[caption={Functions for creating the witness records.}]
 797 class CreateRecord dt c where
 798         createRecord :: Record dt c
 799 instance CreateRecord d '[] where
 800         createRecord = Nil
 801 instance (c (d c0), CreateRecord (d c0) cs) =>
 802                 CreateRecord (d c0) (c ': cs) where
 803         createRecord = Cons createRecord
 804 \end{lstHaskellLhstex}
 805
 806 The class constraints for the interpretation instances can now be greatly simplified, as shown in the evaluation instance for \haskelllhstexinline{Expr}.
 807 The implementation remains the same, only that for the extension case, a trick needs to be applied to convince the compiler of the correct instances.
 808 Using \haskelllhstexinline{`In`}'s \haskelllhstexinline{project} function, a dictionary can be brought into scope.
 809 This dictionary can then subsequently be used to apply the type class function on the extension using the \haskelllhstexinline{withDict} function from the \haskelllhstexinline{Data.Constraint} library.\footnote{\haskelllhstexinline{withDict :: Dict c -> (c => r) -> r}}
 810 The \GHCmod{ScopedTypeVariables} extension is used to make sure the existentially quantified type variable for the extension is matched to the type of the dictionary.
 811 Furthermore, because the class constraint is not smaller than the instance head, \GHCmod{UndecidableInstances} should be enabled.
 812
 813 \begin{lstHaskellLhstex}
 814 class Eval v where
 815         eval :: v -> Int
 816
 817 instance Eval `In` s => Eval (Expr s) where
 818         eval (Lit i) = i
 819         eval (Add l r) = eval l + eval r
 820         eval (Ext r (e :: x)) = withDict (project r :: Dict (Eval x)) eval e
 821 \end{lstHaskellLhstex}
 822
 823 Smart constructors need to be adapted as well, as can be seen from the smart constructor \haskelllhstexinline{subst}.
 824 Instead of a \haskelllhstexinline{GDict} class constraint, a \haskelllhstexinline{CreateRecord} class constraint needs to be added.
 825
 826 \begin{lstHaskellLhstex}
 827 subst :: (Typeable c, CreateRecord (Subt c) c)
 828         => Expr c -> Expr c -> Expr c
 829 subst l r = Ext createRecord (l `Subt` r)
 830 \end{lstHaskellLhstex}
 831
 832 Finally, defining terms in the language can be done immediately if the interpretations are known.
 833 For example, if we want to print and/or optimise the term $\displaystyle ~(~(42+(38-4)))$, we can define it as follows:
 834
 835 \begin{lstHaskellLhstex}
 836 e0 :: Expr '[Print,Opt]
 837 e0 = neg (neg (Lit 42 `Add` (Lit 38 `subt` Lit 4)))
 838 \end{lstHaskellLhstex}
 839
 840 It is also possible to define terms in the \gls{DSL} as being overloaded in the interpretation.
 841 This does require enumerating all the \haskelllhstexinline{CreateRecord} type classes for every extension similarly as was required for \haskelllhstexinline{GDict}.
 842 At the call site, the concrete list of constraints must be known.
 843
 844 \begin{lstHaskellLhstex}
 845 e1 :: (Typeable c, CreateRecord (Neg c) c, CreateRecord (Subst c) c)
 846         => Expr c
 847 e1 = neg (neg (Lit 42 `Add` (Lit 38 `subt` Lit 4)))
 848 \end{lstHaskellLhstex}
 849
 850 Finally, using the \GHCmod{TypeFamilies} extension, type families can be created for bundling \haskelllhstexinline{`In`} constraints (\haskelllhstexinline{UsingExt}) and \haskelllhstexinline{CreateRecord} constraints \mbox{(\haskelllhstexinline{DependsOn})}, making the syntax even more descriptive.
 851 E.g.\ \mbox{\haskelllhstexinline{UsingExt '[A, B, C] c}} expands to\newline\haskelllhstexinline{(CreateRecord (A c) c, CreateRecord (B c) c, CreateRecord (C c) c)}.
 852 \hspace{-1.42284pt} Similarly, \haskelllhstexinline{DependsOn '[A, B, C] s} expands to \haskelllhstexinline{(A `In` s, B `In` s, C `In` s)}.
 853
 854 \begin{lstHaskellLhstex}
 855 type family UsingExt cs c :: Constraint where
 856         UsingExt '[] c = ()
 857         UsingExt (d ': cs) c = (CreateRecord (d c) c, UsingExt cs c)
 858
 859 type family DependsOn cs c :: Constraint where
 860         DependsOn '[] c = ()
 861         DependsOn (d ': cs) c = (d `In` c, DependsOn cs c)
 862 \end{lstHaskellLhstex}
 863
 864 Defining the previous expression can now be done with the following shortened type that describes the semantics better:
 865
 866 \begin{lstHaskellLhstex}
 867 e1 :: (Typeable c, UsingExt '[Neg, Subst]) => Expr c
 868 \end{lstHaskellLhstex}
 869
 870 Giving an instance for \haskelllhstexinline{Interp} for \haskelllhstexinline{DataType} that uses the extensions \haskelllhstexinline{e_1, e2, ...} and depends on interpretations \haskelllhstexinline{i_1,i_2, ...} is done as follows:
 871
 872 \begin{lstHaskellLhstex}
 873 instance ( UsingExt  '[e_1, e_2, ...] s, DependsOn '[i_1, i_2, ...] s)
 874                 => Interp (DataType s) where
 875         ...
 876 \end{lstHaskellLhstex}
 877
 878 With these enhancements, there is hardly any boilerplate required to use classy deep embedding.
 879 The \haskelllhstexinline{Record} data type; the \haskelllhstexinline{CreateRecord} type class; and the \haskelllhstexinline{UsingExt} and \haskelllhstexinline{DependsOn} type families can be provided as a library only requiring the programmer to create the extension constructors with their respective implementations and smart constructors for language construct extensions.
 880 The source code for this extension can be found here: \url{https://gitlab.com/mlubbers/classydeepembedding}.\footnote{Lubbers, M. (2022): Library and examples for enhanced classy deep embedding.\ Zenodo.\ \href{https://doi.org/10.5281/zenodo.7277498}{10.5281/zenodo.7277498}.}
 881 It contains examples for expressions, expressions using \glspl{GADT}, detection of sharing in expressions (modelled after \citet{kiselyov_implementing_2011}), a \gls{GADT} version of sharing detection, and a region \gls{DSL} (modelled after \citet{sun_compositional_2022}).
 882
 883 \section{Data types and definitions}%
 884 \label{sec:cde:appendix}
 885 This appendix contains all definitions omitted for brevity.
 886
 887 \lstset{basicstyle=\tt\footnotesize}
 888 \begin{lstHaskellLhstex}[caption={Data type definitions.}]
 889 data Sub_g d a where
 890         Sub_g     :: (Eq a, Num a)      => Expr_g d a -> Expr_g d a -> Sub_g d a
 891         SubLoop_g ::                       Expr_g d a -> Sub_g d a
 892
 893 data Eq_g d a  where
 894         Eq_g      :: (Typeable a, Eq a) => Expr_g d a -> Expr_g d a -> Eq_g d Bool
 895         EqLoop_g  ::                       Expr_g d a -> Eq_g d a
 896 \end{lstHaskellLhstex}
 897
 898 \begin{lstHaskellLhstex}[caption={Smart constructors.}]
 899 sub_g :: (Typeable d, GDict (d (Sub_g d)), Eq a, Num a) =>
 900         Expr_g d a -> Expr_g d a -> Expr_g d a
 901 sub_g e1 e2 = Ext_g gdict (Sub_g e1 e2)
 902
 903 eq_g :: (Typeable d, GDict (d (Eq_g d)), Eq a, Typeable a) =>
 904         Expr_g d a -> Expr_g d a -> Expr_g d Bool
 905 eq_g e1 e2 = Ext_g gdict (Eq_g e1 e2)
 906 \end{lstHaskellLhstex}
 907
 908 \begin{lstHaskellLhstex}[caption={Semantics classes and data types.}]
 909 newtype PrintDict_g v = PrintDict_g (forall a.v a -> String)
 910
 911 class HasPrint_g d where
 912         getPrint_g :: d v -> v a -> String
 913
 914 instance HasPrint_g PrintDict_g where
 915         getPrint_g (PrintDict_g e) = e
 916
 917 newtype OptDict_g v = OptDict_g (forall a.v a -> v a)
 918
 919 class HasOpt_g d where
 920         getOpt_g :: d v -> v a -> v a
 921
 922 instance HasOpt_g OptDict_g where
 923         getOpt_g (OptDict_g e) = e
 924 \end{lstHaskellLhstex}
 925
 926 \begin{lstHaskellLhstex}[caption={{\tt GDict} instances.}]
 927 instance Print_g v => GDict (PrintDict_g v) where
 928         gdict = PrintDict_g print_g
 929 instance Opt_g v   => GDict (OptDict_g v)   where
 930         gdict = OptDict_g opt_g
 931 \end{lstHaskellLhstex}
 932
 933 \begin{lstHaskellLhstex}[caption={Evaluator instances.}]
 934 instance HasEval_g d => Eval_g (Expr_g d) where
 935         eval_g (Lit_g v)     = v
 936         eval_g (Add_g e1 e2) = eval_g e1 + eval_g e2
 937         eval_g (Ext_g d x)   = getEval_g d x
 938
 939 instance HasEval_g d => Eval_g (Sub_g d) where
 940         eval_g (Sub_g     e1 e2) = eval_g e1 - eval_g e2
 941         eval_g (SubLoop_g e)     = eval_g e
 942
 943 instance HasEval_g d => Eval_g (Neg_g d) where
 944         eval_g (Neg_g     e) = negate (eval_g e)
 945         eval_g (NegLoop_g e) = eval_g e
 946
 947 instance HasEval_g d => Eval_g (Eq_g d) where
 948         eval_g (Eq_g     e1 e2) = eval_g e1 == eval_g e2
 949         eval_g (EqLoop_g e)     = eval_g e
 950
 951 instance HasEval_g d => Eval_g (Not_g d) where
 952         eval_g (Not_g     e) = not (eval_g e)
 953         eval_g (NotLoop_g e) = eval_g e
 954 \end{lstHaskellLhstex}
 955
 956 \begin{lstHaskellLhstex}[caption={Printer instances.}]
 957 instance HasPrint_g d => Print_g (Expr_g d) where
 958         print_g (Lit_g v)     = show v
 959         print_g (Add_g e1 e2) = "(" ++ print_g e1 ++ "+" ++ print_g e2 ++ ")"
 960         print_g (Ext_g d x)   = getPrint_g d x
 961
 962 instance HasPrint_g d => Print_g (Sub_g d) where
 963         print_g (Sub_g     e1 e2) = "(" ++ print_g e1 ++ "-" ++ print_g e2 ++ ")"
 964         print_g (SubLoop_g e)     = print_g e
 965
 966 instance HasPrint_g d => Print_g (Neg_g d) where
 967         print_g (Neg_g     e) = "(negate " ++ print_g e ++ ")"
 968         print_g (NegLoop_g e) = print_g e
 969
 970 instance HasPrint_g d => Print_g (Eq_g d) where
 971         print_g (Eq_g     e1 e2) = "(" ++ print_g e1 ++ "==" ++ print_g e2 ++ ")"
 972         print_g (EqLoop_g e)     = print_g e
 973
 974 instance HasPrint_g d => Print_g (Not_g d) where
 975         print_g (Not_g     e) = "(not " ++ print_g e ++ ")"
 976         print_g (NotLoop_g e) = print_g e
 977 \end{lstHaskellLhstex}
 978
 979 \begin{lstHaskellLhstex}[caption={Optimisation instances.}]
 980 instance HasOpt_g d => Opt_g (Expr_g d) where
 981         opt_g (Lit_g v)     = Lit_g v
 982         opt_g (Add_g e1 e2) = case (opt_g e1, opt_g e2) of
 983                 (Lit_g 0, e2p    ) -> e2p
 984                 (e1p,     Lit_g 0) -> e1p
 985                 (e1p,     e2p    ) -> Add_g e1p e2p
 986         opt_g (Ext_g d x)   = Ext_g d (getOpt_g d x)
 987
 988 instance HasOpt_g d => Opt_g (Sub_g d) where
 989         opt_g (Sub_g     e1 e2) = case (opt_g e1, opt_g e2) of
 990                 (e1p, Lit_g 0) -> SubLoop_g e1p
 991                 (e1p, e2p    ) -> Sub_g e1p e2p
 992         opt_g (SubLoop_g e) = SubLoop_g (opt_g e)
 993
 994 instance (Typeable d, GDict (d (Neg_g d)), HasOpt_g d) => Opt_g (Neg_g d) where
 995         opt_g (Neg_g (Add_g e1 e2))
 996                 = NegLoop_g (Add_g (opt_g (neg_g e1)) (opt_g (neg_g e2)))
 997         opt_g (Neg_g (Ext_g d x))
 998                 = case fromDynamic (toDyn (getOpt_g d x)) of
 999                         Just (Neg_g e) -> NegLoop_g e
1000                         _             -> Neg_g (Ext_g d (getOpt_g d x))
1001         opt_g (Neg_g     e) = Neg_g (opt_g e)
1002         opt_g (NegLoop_g e) = NegLoop_g (opt_g e)
1003
1004 instance HasOpt_g d => Opt_g (Eq_g d) where
1005         opt_g (Eq_g     e1 e2) = Eq_g (opt_g e1) (opt_g e2)
1006         opt_g (EqLoop_g e)     = EqLoop_g (opt_g e)
1007 \end{lstHaskellLhstex}
1008 \lstset{basicstyle=\tt\small}
1009
1010 \end{subappendices}
1011
1012 \input{subfilepostamble}
1013 \end{document}