Deeply embedded languages represent their language constructs as data and the semantics are functions on it.
As a result, the language constructs are embedded in the semantics, hence adding new language constructs is laborious where adding semantics is trouble free.
Deeply embedded languages represent their language constructs as data and the semantics are functions on it.
As a result, the language constructs are embedded in the semantics, hence adding new language constructs is laborious where adding semantics is trouble free.
So-called classy deep embedding results in \glspl{DSL} that are extensible both in language constructs and in semantics while maintaining a concrete abstract syntax tree.
Additionally, little type-level trickery or complicated boilerplate code is required to achieve this.
\end{chapterabstract}
So-called classy deep embedding results in \glspl{DSL} that are extensible both in language constructs and in semantics while maintaining a concrete abstract syntax tree.
Additionally, little type-level trickery or complicated boilerplate code is required to achieve this.
\end{chapterabstract}
The semantics of the \gls{DSL} have to be combined and must hold some kind of state or context, so that structural information is not lost \citep{kiselyov_typed_2012}.
\subsection{Research contribution}
The semantics of the \gls{DSL} have to be combined and must hold some kind of state or context, so that structural information is not lost \citep{kiselyov_typed_2012}.
\subsection{Research contribution}
The presented basic technique, christened \emph{classy deep embedding}, does not require advanced type system extensions to be used.
However, it is suitable for type system extensions such as \glspl{GADT}.
The presented basic technique, christened \emph{classy deep embedding}, does not require advanced type system extensions to be used.
However, it is suitable for type system extensions such as \glspl{GADT}.
\Gls{HASKELL} \citep{peyton_jones_haskell_2003} program using some minor extensions provided by \gls{GHC} \citep{ghc_team_ghc_2021}, the idea is applicable to other languages as well\footnotemark.
\footnotetext{Lubbers, M. (2021): Literate Haskell/lhs2\TeX{} source code of the paper ``Deep Embedding
with Class'': TFP 2022.\ DANS.\ \url{https://doi.org/10.5281/zenodo.5081386}.}
\Gls{HASKELL} \citep{peyton_jones_haskell_2003} program using some minor extensions provided by \gls{GHC} \citep{ghc_team_ghc_2021}, the idea is applicable to other languages as well\footnotemark.
\footnotetext{Lubbers, M. (2021): Literate Haskell/lhs2\TeX{} source code of the paper ``Deep Embedding
with Class'': TFP 2022.\ DANS.\ \url{https://doi.org/10.5281/zenodo.5081386}.}
\section{Deep embedding}
Pick a \gls{DSL}, any \gls{DSL}, pick the language of literal integers and addition.
In deep embedding, terms in the language are represented by data in the host language.
\section{Deep embedding}
Pick a \gls{DSL}, any \gls{DSL}, pick the language of literal integers and addition.
In deep embedding, terms in the language are represented by data in the host language.
-Hence, defining the constructs is as simple as creating the following algebraic data type\footnote{All data types and functions are subscripted to indicate the evolution.}.
+Hence, defining the constructs is as simple as creating the following algebraic data type\footnote{All data types and functions are subscripted to indicate the evolution. When definitions are omitted for version $n$, version $n-1$ is assumed.}.
Let us rethink the deeply embedded \gls{DSL} design.
Remember that in shallow embedding, the semantics are embedded in the language construct functions.
Obtaining extensibility both in constructs and semantics was accomplished by abstracting the semantics functions to type classes, making the constructs overloaded in the semantics.
Let us rethink the deeply embedded \gls{DSL} design.
Remember that in shallow embedding, the semantics are embedded in the language construct functions.
Obtaining extensibility both in constructs and semantics was accomplished by abstracting the semantics functions to type classes, making the constructs overloaded in the semantics.
In shallow embedding, the implementation for these types of semantics is difficult because there is no tangible abstract syntax tree.
In off-the-shelf deep embedding this is effortless since the function can pattern match on the constructor or structures of constructors.
In shallow embedding, the implementation for these types of semantics is difficult because there is no tangible abstract syntax tree.
In off-the-shelf deep embedding this is effortless since the function can pattern match on the constructor or structures of constructors.
The only interesting bit occurs in the \haskelllhstexinline{Add_3} constructor, where we pattern match on the optimised children to determine whether an addition with zero is performed.
If this is the case, the addition is removed.
The only interesting bit occurs in the \haskelllhstexinline{Add_3} constructor, where we pattern match on the optimised children to determine whether an addition with zero is performed.
If this is the case, the addition is removed.
-As a consequence, defining reusable expressions that are overloaded in their semantics requires quite some type class constraints that cannot be inferred by the compiler (yet) if they use many extensions.
+As a consequence, defining reuseable expressions that are overloaded in their semantics requires quite some type class constraints that cannot be inferred by the compiler (yet) if they use many extensions.
Solving this remains future work.
For example, the expression $\sim(42-38)+1$ has to be defined as:
Solving this remains future work.
For example, the expression $\sim(42-38)+1$ has to be defined as:
\Glspl{GADT} are enriched data types that allow the type instantiation of the constructor to be explicitly defined \citep{cheney_first-class_2003,hinze_fun_2003}.
Leveraging \glspl{GADT}, deeply embedded \glspl{DSL} can be made statically type safe even when different value types are supported.
Even when \glspl{GADT} are not supported natively in the language, they can be simulated using embedding-projection pairs or equivalence types \citep[\citesection{2.2}]{cheney_lightweight_2002}.
Where some solutions to the expression problem do not easily generalise to \glspl{GADT} (see \cref{sec:cde:related}), classy deep embedding does.
\Glspl{GADT} are enriched data types that allow the type instantiation of the constructor to be explicitly defined \citep{cheney_first-class_2003,hinze_fun_2003}.
Leveraging \glspl{GADT}, deeply embedded \glspl{DSL} can be made statically type safe even when different value types are supported.
Even when \glspl{GADT} are not supported natively in the language, they can be simulated using embedding-projection pairs or equivalence types \citep[\citesection{2.2}]{cheney_lightweight_2002}.
Where some solutions to the expression problem do not easily generalise to \glspl{GADT} (see \cref{sec:cde:related}), classy deep embedding does.
To make the existing \gls{DSL} constructs more general, we relax the types of those constructors.
For example, operations on integers now work on all numerals instead.
Moreover, the \haskelllhstexinline{Lit_g} constructor can be used to lift values of any type to the \gls{DSL} domain as long as they have a \haskelllhstexinline{Show} instance, required for the printer.
To make the existing \gls{DSL} constructs more general, we relax the types of those constructors.
For example, operations on integers now work on all numerals instead.
Moreover, the \haskelllhstexinline{Lit_g} constructor can be used to lift values of any type to the \gls{DSL} domain as long as they have a \haskelllhstexinline{Show} instance, required for the printer.
-The smart constructors for the language extensions inherit the class constraints of their data types and include a \haskelllhstexinline{Typeable} constraint on the \haskelllhstexinline{d} type variable for it to be usable in the \haskelllhstexinline{Ext_g} constructor as can be seen in the smart constructor for \haskelllhstexinline{Neg_g}:
+The smart constructors for the language extensions inherit the class constraints of their data types and include a \haskelllhstexinline{Typeable} constraint on the \haskelllhstexinline{d} type variable for it to be useable in the \haskelllhstexinline{Ext_g} constructor as can be seen in the smart constructor for \haskelllhstexinline{Neg_g}:
\begin{lstHaskellLhstex}
neg_g :: (Typeable d, GDict (d (Neg_g d)), Typeable a, Num a) =>
\begin{lstHaskellLhstex}
neg_g :: (Typeable d, GDict (d (Neg_g d)), Typeable a, Num a) =>
Now that the shape of the type classes has changed, the dictionary data types and the type classes need to be adapted as well.
The introduced type variable \haskelllhstexinline{a} is not an argument to the type class, so it should not be an argument to the dictionary data type.
To represent this type class function, a rank-2 polymorphic function is needed \citep[\citesection{6.4.15}]{ghc_team_ghc_2021}\citep{odersky_putting_1996}.
Now that the shape of the type classes has changed, the dictionary data types and the type classes need to be adapted as well.
The introduced type variable \haskelllhstexinline{a} is not an argument to the type class, so it should not be an argument to the dictionary data type.
To represent this type class function, a rank-2 polymorphic function is needed \citep[\citesection{6.4.15}]{ghc_team_ghc_2021}\citep{odersky_putting_1996}.
\Citet{loh_open_2006} proposed a language extension that allows open data types and open functions, i.e.\ functions and data types that can be extended with more cases later on.
They hinted at the possibility of using type classes for open functions but had serious concerns that pattern matching would be crippled because constructors are becoming types, thus ultimately becoming impossible to type.
\Citet{loh_open_2006} proposed a language extension that allows open data types and open functions, i.e.\ functions and data types that can be extended with more cases later on.
They hinted at the possibility of using type classes for open functions but had serious concerns that pattern matching would be crippled because constructors are becoming types, thus ultimately becoming impossible to type.
A technique similar to classy deep embedding was proposed by \citet{najd_trees_2017} to tackle a slightly different problem, namely that of reusing a data type for multiple purposes in a slightly different form.
For example to decorate the abstract syntax tree of a compiler differently for each phase of the compiler.
A technique similar to classy deep embedding was proposed by \citet{najd_trees_2017} to tackle a slightly different problem, namely that of reusing a data type for multiple purposes in a slightly different form.
For example to decorate the abstract syntax tree of a compiler differently for each phase of the compiler.
\Citet{sun_compositional_2022} provided a thorough comparison of embedding techniques including more axes than just the two stated in the expression problem.
\Cref{tbl:dsl_comparison_brief} shows a variant of their comparison table.
\Citet{sun_compositional_2022} provided a thorough comparison of embedding techniques including more axes than just the two stated in the expression problem.
\Cref{tbl:dsl_comparison_brief} shows a variant of their comparison table.
The \emph{poly.} style of embedding---including tagless-final---falls short of this requirement.
Intensional analysis is an umbrella term for pattern matching and transformations.
The \emph{poly.} style of embedding---including tagless-final---falls short of this requirement.
Intensional analysis is an umbrella term for pattern matching and transformations.
Simple type system describes the whether it is possible to encode this embedding technique without many type system extensions.
In classy deep embedding, there is either a bit more scaffolding and boilerplate required or advanced type system extensions need to be used.
Simple type system describes the whether it is possible to encode this embedding technique without many type system extensions.
In classy deep embedding, there is either a bit more scaffolding and boilerplate required or advanced type system extensions need to be used.
Little boilerplate denotes the amount of scaffolding and boilerplate required.
For example, hybrid embedding requires a transcoding step between the deep syntax and the shallow core language.
Little boilerplate denotes the amount of scaffolding and boilerplate required.
For example, hybrid embedding requires a transcoding step between the deep syntax and the shallow core language.
\begin{threeparttable}[b]
\small
\caption{Comparison of embedding techniques, extended from \citet[\citesection{3.6}]{sun_compositional_2022}.}%
\begin{threeparttable}[b]
\small
\caption{Comparison of embedding techniques, extended from \citet[\citesection{3.6}]{sun_compositional_2022}.}%
However, if needed, the technique generalises to \glspl{GADT} as well, adding rank-2 types to the list of type system requirements as well.
Finally, the abstract syntax tree remains observable which makes it suitable for intensional analyses, albeit using occasional dynamic typing for truly cross-extensional transformations.
However, if needed, the technique generalises to \glspl{GADT} as well, adding rank-2 types to the list of type system requirements as well.
Finally, the abstract syntax tree remains observable which makes it suitable for intensional analyses, albeit using occasional dynamic typing for truly cross-extensional transformations.
\Cref{sec:classy_reprise} shows how the boilerplate can be minimised using advanced type system extensions.
\section*{Acknowledgements}
\Cref{sec:classy_reprise} shows how the boilerplate can be minimised using advanced type system extensions.
\section*{Acknowledgements}
\label{sec:classy_reprise}
One of the unique selling points of this novel \gls{DSL} embedding technique is that it, in its basic form, does not require advanced type system extensions nor a lot of boilerplate.
However, generalising the technique to \glspl{GADT} arguably unleashes a cesspool of \emph{unsafe} compiler extensions.
\label{sec:classy_reprise}
One of the unique selling points of this novel \gls{DSL} embedding technique is that it, in its basic form, does not require advanced type system extensions nor a lot of boilerplate.
However, generalising the technique to \glspl{GADT} arguably unleashes a cesspool of \emph{unsafe} compiler extensions.
In classy deep embedding, the \gls{DSL} datatype is parametrised by a type variable providing a witness to the interpretation on the language.
When using multiple interpretations, these need to be bundled in a data type.
Using the \gls{GHC}'s \GHCmod{ConstraintKind} extension, we can make these witnesses explicit, tying into \gls{HASKELL}'s type system immediately.
In classy deep embedding, the \gls{DSL} datatype is parametrised by a type variable providing a witness to the interpretation on the language.
When using multiple interpretations, these need to be bundled in a data type.
Using the \gls{GHC}'s \GHCmod{ConstraintKind} extension, we can make these witnesses explicit, tying into \gls{HASKELL}'s type system immediately.
-Furthermore, this constraint does not necessarily has to be a single constraint, after enabling \GHCmod{DataKinds} and \GHCmod{TypeOperators}, we can encode lists of witnesses instead \citep{yorgey_giving_2012}.
+Furthermore, this constraint does not necessarily have to be a single constraint, after enabling \GHCmod{DataKinds} and \GHCmod{TypeOperators}, we can encode lists of witnesses instead \citep{yorgey_giving_2012}.
The data type for this list of witnesses is \haskelllhstexinline{Record} as shown in \cref{lst_cbde:record_type}.
This \gls{GADT} is parametrised by two type variables.
The first type variable (\haskelllhstexinline{dt}) is the type or type constructor on which the constraints can be applied and the second type variable (\haskelllhstexinline{clist}) is the list of constraints constructors itself.
This means that when \haskelllhstexinline{Cons} is pattern matched, the overloading of the type class constraint for \haskelllhstexinline{c dt} can be solved by the compiler.
\GHCmod{KindSignatures} is used to force the kinds of the type parameters and the kind of \haskelllhstexinline{dt} is polymorphic (\GHCmod{PolyKinds}) so that the \haskelllhstexinline{Record} data type can be used for \glspl{DSL} using type classes but also type constructor classes (e.g.\ when using \glspl{GADT}).
The data type for this list of witnesses is \haskelllhstexinline{Record} as shown in \cref{lst_cbde:record_type}.
This \gls{GADT} is parametrised by two type variables.
The first type variable (\haskelllhstexinline{dt}) is the type or type constructor on which the constraints can be applied and the second type variable (\haskelllhstexinline{clist}) is the list of constraints constructors itself.
This means that when \haskelllhstexinline{Cons} is pattern matched, the overloading of the type class constraint for \haskelllhstexinline{c dt} can be solved by the compiler.
\GHCmod{KindSignatures} is used to force the kinds of the type parameters and the kind of \haskelllhstexinline{dt} is polymorphic (\GHCmod{PolyKinds}) so that the \haskelllhstexinline{Record} data type can be used for \glspl{DSL} using type classes but also type constructor classes (e.g.\ when using \glspl{GADT}).
data Record (dt :: k) (clist :: [k -> Constraint]) where
Nil :: Record dt '[]
Cons :: c dt => Record dt cs -> Record dt (c ': cs)
data Record (dt :: k) (clist :: [k -> Constraint]) where
Nil :: Record dt '[]
Cons :: c dt => Record dt cs -> Record dt (c ': cs)
To incorporate this type in the \haskelllhstexinline{Expr} type, the \haskelllhstexinline{Ext} constructor changes from containing a single witness dictionary to a \haskelllhstexinline{Record} type containing all the required dictionaries.
To incorporate this type in the \haskelllhstexinline{Expr} type, the \haskelllhstexinline{Ext} constructor changes from containing a single witness dictionary to a \haskelllhstexinline{Record} type containing all the required dictionaries.
Furthermore, we define a type class (\haskelllhstexinline{In}) that allows us to extract explicit dictionaries \haskelllhstexinline{Dict} from these records if the constraint can is present in the list.
Since the constraints become available as soon as the \haskelllhstexinline{Cons} constructor is matched, the implementation is a type-level list traversal.
Furthermore, we define a type class (\haskelllhstexinline{In}) that allows us to extract explicit dictionaries \haskelllhstexinline{Dict} from these records if the constraint can is present in the list.
Since the constraints become available as soon as the \haskelllhstexinline{Cons} constructor is matched, the implementation is a type-level list traversal.
This type class creates a record structure cons by cons if and only if all type class constraints are available in the list of constraints.
It is not required to provide instances for this for specific records or type classes, the two instances describe all the required constraints.
This type class creates a record structure cons by cons if and only if all type class constraints are available in the list of constraints.
It is not required to provide instances for this for specific records or type classes, the two instances describe all the required constraints.
class CreateRecord dt c where
createRecord :: Record dt c
instance CreateRecord d '[] where
class CreateRecord dt c where
createRecord :: Record dt c
instance CreateRecord d '[] where
Finally, using the \GHCmod{TypeFamilies} extension, type families can be created for bundling \haskelllhstexinline{`In`} constraints (\haskelllhstexinline{UsingExt}) and \haskelllhstexinline{CreateRecord} constraints \mbox{(\haskelllhstexinline{DependsOn})}, making the syntax even more descriptive.
E.g.\ \mbox{\haskelllhstexinline{UsingExt '[A, B, C] c}} expands to\newline\haskelllhstexinline{(CreateRecord (A c) c, CreateRecord (B c) c, CreateRecord (C c) c)}.
Finally, using the \GHCmod{TypeFamilies} extension, type families can be created for bundling \haskelllhstexinline{`In`} constraints (\haskelllhstexinline{UsingExt}) and \haskelllhstexinline{CreateRecord} constraints \mbox{(\haskelllhstexinline{DependsOn})}, making the syntax even more descriptive.
E.g.\ \mbox{\haskelllhstexinline{UsingExt '[A, B, C] c}} expands to\newline\haskelllhstexinline{(CreateRecord (A c) c, CreateRecord (B c) c, CreateRecord (C c) c)}.
-Similarly, \haskelllhstexinline{DependsOn '[A, B, C] s} expands to \haskelllhstexinline{(A `In` s, B `In` s, C `In` s)}.
+\hspace{-1.42284pt} Similarly, \haskelllhstexinline{DependsOn '[A, B, C] s} expands to \haskelllhstexinline{(A `In` s, B `In` s, C `In` s)}.
sub_g :: (Typeable d, GDict (d (Sub_g d)), Eq a, Num a) =>
Expr_g d a -> Expr_g d a -> Expr_g d a
sub_g e1 e2 = Ext_g gdict (Sub_g e1 e2)
sub_g :: (Typeable d, GDict (d (Sub_g d)), Eq a, Num a) =>
Expr_g d a -> Expr_g d a -> Expr_g d a
sub_g e1 e2 = Ext_g gdict (Sub_g e1 e2)
instance Print_g v => GDict (PrintDict_g v) where
gdict = PrintDict_g print_g
instance Opt_g v => GDict (OptDict_g v) where
gdict = OptDict_g opt_g
\end{lstHaskellLhstex}
instance Print_g v => GDict (PrintDict_g v) where
gdict = PrintDict_g print_g
instance Opt_g v => GDict (OptDict_g v) where
gdict = OptDict_g opt_g
\end{lstHaskellLhstex}
instance HasPrint_g d => Print_g (Expr_g d) where
print_g (Lit_g v) = show v
print_g (Add_g e1 e2) = "(" ++ print_g e1 ++ "+" ++ print_g e2 ++ ")"
instance HasPrint_g d => Print_g (Expr_g d) where
print_g (Lit_g v) = show v
print_g (Add_g e1 e2) = "(" ++ print_g e1 ++ "+" ++ print_g e2 ++ ")"
instance HasOpt_g d => Opt_g (Expr_g d) where
opt_g (Lit_g v) = Lit_g v
opt_g (Add_g e1 e2) = case (opt_g e1, opt_g e2) of
instance HasOpt_g d => Opt_g (Expr_g d) where
opt_g (Lit_g v) = Lit_g v
opt_g (Add_g e1 e2) = case (opt_g e1, opt_g e2) of