\label{chp:first-class_datatypes}%
\begin{chapterabstract}
\Gls{FP} languages are excellent for hosting \glspl{EDSL} because of their rich type systems, minimal syntax, and referential transparency.
\label{chp:first-class_datatypes}%
\begin{chapterabstract}
\Gls{FP} languages are excellent for hosting \glspl{EDSL} because of their rich type systems, minimal syntax, and referential transparency.
\end{enumerate*}
The functions for this are simply not available automatically in the embedded language.
For some semantics---such as an interpreter---it is possible to directly lift the functions from the host language to the \gls{DSL}.
\end{enumerate*}
The functions for this are simply not available automatically in the embedded language.
For some semantics---such as an interpreter---it is possible to directly lift the functions from the host language to the \gls{DSL}.
-In other cases---e.g.\ \emph{compiling} \glspl{DSL} such as a compiler or a printer---this is not possible \citep{elliott_compiling_2003}. %the torget this is not possible. cannot just be lifted from the host language to the \gls{DSL} so it requires a lot of boilerplate to define and implement them.
-Thus, all of the operations on the data type have to be defined by hand requiring a lot of plumbing and resulting in a lot of boilerplate code.
+In other cases---e.g.\ \emph{compiling} \glspl{DSL} such as a compiler or a printer---this is not possible \citep{elliott_compiling_2003}. %the torget this is not possible. Cannot just be lifted from the host language to the \gls{DSL} so it requires a lot of boilerplate to define and implement them.
+Thus, all the operations on the data type have to be defined by hand requiring a lot of plumbing and resulting in a lot of boilerplate code.
To relieve the burden of adding all these functions, metaprogramming\nobreak---\nobreak\hskip0pt and custom quasiquoters---can be used.
Metaprogramming entails that some parts of the program are generated by a program itself, i.e.\ the program is data.
To relieve the burden of adding all these functions, metaprogramming\nobreak---\nobreak\hskip0pt and custom quasiquoters---can be used.
Metaprogramming entails that some parts of the program are generated by a program itself, i.e.\ the program is data.
\haskellinline{newtype}s are special data types only consisting a single constructor with one field to which the type is isomorphic.
During compilation the constructor is completely removed resulting in no overhead \citep[\citesection{4.2.3}]{peyton_jones_haskell_2003}.
}
\haskellinline{newtype}s are special data types only consisting a single constructor with one field to which the type is isomorphic.
During compilation the constructor is completely removed resulting in no overhead \citep[\citesection{4.2.3}]{peyton_jones_haskell_2003}.
}
-Since the language is typed, the printer data type has to have a type variable but it is only used during typing---i.e.\ a phantom type \citep{leijen_domain_2000}:
+Since the language is typed, the printer data type has to have a type variable, but it is only used during typing---i.e.\ a phantom type \citep{leijen_domain_2000}:
\subsection{Functions}
Adding functions to the language is achieved by adding a multi-parameter class to the \gls{DSL}.
The type of the class function allows for the implementation to only allow first-order functions by supplying the arguments in a tuple.
\subsection{Functions}
Adding functions to the language is achieved by adding a multi-parameter class to the \gls{DSL}.
The type of the class function allows for the implementation to only allow first-order functions by supplying the arguments in a tuple.
Finally, by defining the functions as a \gls{HOAS} type safety is achieved \citep{pfenning_higher-order_1988,chlipala_parametric_2008}.
The complete definition looks as follows:
Finally, by defining the functions as a \gls{HOAS} type safety is achieved \citep{pfenning_higher-order_1988,chlipala_parametric_2008}.
The complete definition looks as follows:
Lifting values from the host language to the \gls{DSL} is possible using the \haskellinline{lit} function as long as the type of the value has instances for all the class constraints.
Unfortunately, once lifted, it is not possible to do anything with values of the user-defined data type other than passing them around.
It is not possible to construct new values from expressions in the \gls{DSL}, to deconstruct a value into the fields, nor to test of which constructor the value is.
Lifting values from the host language to the \gls{DSL} is possible using the \haskellinline{lit} function as long as the type of the value has instances for all the class constraints.
Unfortunately, once lifted, it is not possible to do anything with values of the user-defined data type other than passing them around.
It is not possible to construct new values from expressions in the \gls{DSL}, to deconstruct a value into the fields, nor to test of which constructor the value is.
-Furthermore, while in the our language the only constraint is the automatically derivable \haskellinline{Show}, in real-world languages the class constraints may be very difficult to satisfy for complex types, for example serialisation to a single stack cell in the case of a compiler.
+Furthermore, while in our language the only constraint is the automatically derivable \haskellinline{Show}, in real-world languages the class constraints may be very difficult to satisfy for complex types, for example serialisation to a single stack cell in the case of a compiler.
As a consequence, for user-defined data types---such as a pro\-gram\-mer-defined list type\footnotemark---to become first-class citizens in the \gls{DSL}, language constructs for constructors, deconstructors and constructor predicates must be defined.
Field selectors are also useful functions for working with user-defined data types, they are not considered for the sake of brevity but can be implemented using the deconstructor functions.
As a consequence, for user-defined data types---such as a pro\-gram\-mer-defined list type\footnotemark---to become first-class citizens in the \gls{DSL}, language constructs for constructors, deconstructors and constructor predicates must be defined.
Field selectors are also useful functions for working with user-defined data types, they are not considered for the sake of brevity but can be implemented using the deconstructor functions.
Furthermore, instances for the \gls{DSL}'s views need to be created.
For example, to use the interpreter, the following instance must be available.
Note that at first glance, it would feel natural to have \haskellinline{isNil} and \haskellinline{isCons} return \haskellinline{Nothing} since we are in the \haskellinline{Maybe} monad.
Furthermore, instances for the \gls{DSL}'s views need to be created.
For example, to use the interpreter, the following instance must be available.
Note that at first glance, it would feel natural to have \haskellinline{isNil} and \haskellinline{isCons} return \haskellinline{Nothing} since we are in the \haskellinline{Maybe} monad.
The \haskellinline{Q} monad facilitates failure, reification and fresh identifier generation for hygienic macros \citep{kohlbecker_hygienic_1986}.
Within the \haskellinline{Q} monad, capturable and non-capturable identifiers can be generated using the \haskellinline{mkName} and \haskellinline{newName} functions respectively.
The \emph{Peter Parker principle}\footnote{With great power comes great responsibility.} holds for the \haskellinline{Q} monad as well because it executes at compile time and is very powerful.
The \haskellinline{Q} monad facilitates failure, reification and fresh identifier generation for hygienic macros \citep{kohlbecker_hygienic_1986}.
Within the \haskellinline{Q} monad, capturable and non-capturable identifiers can be generated using the \haskellinline{mkName} and \haskellinline{newName} functions respectively.
The \emph{Peter Parker principle}\footnote{With great power comes great responsibility.} holds for the \haskellinline{Q} monad as well because it executes at compile time and is very powerful.
-For example it can subvert module boundaries, thus accessing constructors that were hidden; access the structure of abstract types; and it may cause side effects during compilation because it is possible to call \haskellinline{IO} operations \citep{terei_safe_2012}.
+For example, it can subvert module boundaries, thus accessing constructors that were hidden; access the structure of abstract types; and it may cause side effects during compilation because it is possible to call \haskellinline{IO} operations \citep{terei_safe_2012}.
Reification is done using the \haskellinline{reify :: Name -> Q Info} function.
The \haskellinline{Info} type is an \gls{ADT} containing all the---known to the compiler---information about the matching type: constructors, instances, \etc.
Reification is done using the \haskellinline{reify :: Name -> Q Info} function.
The \haskellinline{Info} type is an \gls{ADT} containing all the---known to the compiler---information about the matching type: constructors, instances, \etc.
With the power of metaprogramming, we can generate the boilerplate code for our user-defined data types automatically at compile time.
To generate the code required for the \gls{DSL}, we define the \haskellinline{genDSL} function.
The type belonging to the name passed as an argument to this function is made available for the \gls{DSL} by generating the \haskellinline{typeDSL} class and view instances.
With the power of metaprogramming, we can generate the boilerplate code for our user-defined data types automatically at compile time.
To generate the code required for the \gls{DSL}, we define the \haskellinline{genDSL} function.
The type belonging to the name passed as an argument to this function is made available for the \gls{DSL} by generating the \haskellinline{typeDSL} class and view instances.
\end{lstHaskell}
To be able to define a printer that is somewhat more powerful, we provide instances for \haskellinline{MonadWriter}; add a state for fresh variables and a context; and define some helper functions the \haskellinline{Printer} datatype.
\end{lstHaskell}
To be able to define a printer that is somewhat more powerful, we provide instances for \haskellinline{MonadWriter}; add a state for fresh variables and a context; and define some helper functions the \haskellinline{Printer} datatype.
\haskellinline{printCons} prints a constructor name followed by an expression, it inserts parenthesis only when required depending on the state.
\haskellinline{paren} always prints parenthesis around the given printer.
\haskellinline{>->} is a variant of the sequence operator \haskellinline{>>} from the \haskellinline{Monad} class, it prints whitespace in between the arguments.
\haskellinline{printCons} prints a constructor name followed by an expression, it inserts parenthesis only when required depending on the state.
\haskellinline{paren} always prints parenthesis around the given printer.
\haskellinline{>->} is a variant of the sequence operator \haskellinline{>>} from the \haskellinline{Monad} class, it prints whitespace in between the arguments.
Custom quasiquoters allow the \gls{DSL} user to enter fragments verbatim, bypassing the syntax of the host language.
Pattern matching in general is not suitable for a custom quasiquoter because it does not really fit in one of the four syntactic categories for which custom quasiquoter support is available.
However, a concrete use of pattern matching, interesting enough to be beneficial, but simple enough for a demonstration is the \emph{simple case expression}, a case expression that does not contain nested patterns and is always exhaustive.
Custom quasiquoters allow the \gls{DSL} user to enter fragments verbatim, bypassing the syntax of the host language.
Pattern matching in general is not suitable for a custom quasiquoter because it does not really fit in one of the four syntactic categories for which custom quasiquoter support is available.
However, a concrete use of pattern matching, interesting enough to be beneficial, but simple enough for a demonstration is the \emph{simple case expression}, a case expression that does not contain nested patterns and is always exhaustive.
In contrast to the binary literal quasiquoter example, we do not create the parser by hand.
The parser combinator library \emph{parsec} is used instead to ease the creation of the parser \citep{leijen_parsec_2001}.
In contrast to the binary literal quasiquoter example, we do not create the parser by hand.
The parser combinator library \emph{parsec} is used instead to ease the creation of the parser \citep{leijen_parsec_2001}.
\Citet{mcdonell_embedded_2022} extends on this idea, resulting in a very similar but different solution to ours.
They used the technique that \citeauthor{atkey_unembedding_2009} showed and applied it to deep embedding using the concrete syntax of the host language.
The scaffolding---e.g.\ generating the pairs, sums and injections---for embedding is automated using generics but the required pattern synonyms are generated using \gls{TH}.
\Citet{mcdonell_embedded_2022} extends on this idea, resulting in a very similar but different solution to ours.
They used the technique that \citeauthor{atkey_unembedding_2009} showed and applied it to deep embedding using the concrete syntax of the host language.
The scaffolding---e.g.\ generating the pairs, sums and injections---for embedding is automated using generics but the required pattern synonyms are generated using \gls{TH}.
Furthermore, our implementation does not require a generic function to trace all constructors, resulting in problems with (mutual) recursion.
\Citet{young_adding_2021} added pattern matching to a deeply \gls{EDSL} using a compiler plugin.
Furthermore, our implementation does not require a generic function to trace all constructors, resulting in problems with (mutual) recursion.
\Citet{young_adding_2021} added pattern matching to a deeply \gls{EDSL} using a compiler plugin.
Under the hood, this function translates the pattern match to constructors, deconstructors, and constructor predicates.
The main difference with this work is that it requires a compiler plugin while our metaprogramming approach works on any compiler supporting a metaprogramming system similar to \gls{TH}.
Under the hood, this function translates the pattern match to constructors, deconstructors, and constructor predicates.
The main difference with this work is that it requires a compiler plugin while our metaprogramming approach works on any compiler supporting a metaprogramming system similar to \gls{TH}.
Metaprogramming in general is a very broad research topic and has been around for years already.
We therefore do not claim an exhaustive overview of related work on all aspects of metaprogramming.
Metaprogramming in general is a very broad research topic and has been around for years already.
We therefore do not claim an exhaustive overview of related work on all aspects of metaprogramming.
\Citet{czarnecki_dsl_2004} provide a more detailed comparison of different metaprogramming techniques.
They compare staged interpreters, metaprogramming and templating by comparing MetaOCaml, \gls{TH} and \gls{CPP} templates.
\gls{TH} has been used to implement related work.
\Citet{czarnecki_dsl_2004} provide a more detailed comparison of different metaprogramming techniques.
They compare staged interpreters, metaprogramming and templating by comparing MetaOCaml, \gls{TH} and \gls{CPP} templates.
\gls{TH} has been used to implement related work.
\Citet{duregard_embedded_2011} wrote a parser generator using \gls{TH} and the custom quasiquoting facilities.
From a specification of the grammar, given in verbatim using a custom quasiquoter, a parser is generated at compile time.
\Citet{shioda_libdsl_2014} used metaprogramming in the D programming language to create a \gls{DSL} toolkit.
\Citet{duregard_embedded_2011} wrote a parser generator using \gls{TH} and the custom quasiquoting facilities.
From a specification of the grammar, given in verbatim using a custom quasiquoter, a parser is generated at compile time.
\Citet{shioda_libdsl_2014} used metaprogramming in the D programming language to create a \gls{DSL} toolkit.
\Citet{blanchette_liquid_2022} use \gls{TH} to simplify the development of Liquid \gls{HASKELL} proofs.
\Citet{folmer_high-level_2022} used \gls{TH} to synthesize C$\lambda$aSH \citep{baaij_digital_2015} \glspl{AST} to be processed.
In similar fashion, \citet{materzok_generating_2022} used \gls{TH} to translate YieldFSM programs to {C$\lambda$aSH}.
\subsubsection{Optimisation}
Besides generating code, it is also possible to analyse existing code and perform optimisations.
\Citet{blanchette_liquid_2022} use \gls{TH} to simplify the development of Liquid \gls{HASKELL} proofs.
\Citet{folmer_high-level_2022} used \gls{TH} to synthesize C$\lambda$aSH \citep{baaij_digital_2015} \glspl{AST} to be processed.
In similar fashion, \citet{materzok_generating_2022} used \gls{TH} to translate YieldFSM programs to {C$\lambda$aSH}.
\subsubsection{Optimisation}
Besides generating code, it is also possible to analyse existing code and perform optimisations.
For example, \citet{lynagh_unrolling_2003} implemented various optimisations in \gls{TH} such as automatic loop unrolling.
The compile-time executed functions analyse the recursive function and unroll the recursion to a fixed depth to trade execution speed for program space.
Also, \citet{odonnell_embedding_2004} embedded Hydra, a hardware description language, in \gls{HASKELL} utilising \gls{TH}.
For example, \citet{lynagh_unrolling_2003} implemented various optimisations in \gls{TH} such as automatic loop unrolling.
The compile-time executed functions analyse the recursive function and unroll the recursion to a fixed depth to trade execution speed for program space.
Also, \citet{odonnell_embedding_2004} embedded Hydra, a hardware description language, in \gls{HASKELL} utilising \gls{TH}.
\Citet{egi_embedding_2022} extended \gls{HASKELL} to support non-free data type pattern matching---i.e.\ data type with no standard form, e.g.\ sets, graphs---using \gls{TH}.
Using quasiquotation, they make a complicated embedding of non-linear pattern matching available through a simple lens.
\Citet{egi_embedding_2022} extended \gls{HASKELL} to support non-free data type pattern matching---i.e.\ data type with no standard form, e.g.\ sets, graphs---using \gls{TH}.
Using quasiquotation, they make a complicated embedding of non-linear pattern matching available through a simple lens.
\Gls{TTH} is a very recent extension/alternative to normal \gls{TH} \citep{pickering_multi-stage_2019,xie_staging_2022}.
Where in \gls{TH} you can manipulate arbitrary parts of the syntax tree, add top-level splices of data types, definitions and functions, in \gls{TTH} the programmer can only splice expressions but the \gls{AST} fragments representing the expressions are well-typed by construction instead of untyped.
\Gls{TTH} is a very recent extension/alternative to normal \gls{TH} \citep{pickering_multi-stage_2019,xie_staging_2022}.
Where in \gls{TH} you can manipulate arbitrary parts of the syntax tree, add top-level splices of data types, definitions and functions, in \gls{TTH} the programmer can only splice expressions but the \gls{AST} fragments representing the expressions are well-typed by construction instead of untyped.