\begin{chapterabstract}
\Gls{FP} languages are excellent for hosting \glspl{EDSL} because of their rich type systems, minimal syntax, and referential transparency.
However, data types defined in the host language are not automatically available in the embedded language.
To do so, all the operations on the data type must be ported to the \gls{EDSL} resulting in a lot of boilerplate.
\begin{chapterabstract}
\Gls{FP} languages are excellent for hosting \glspl{EDSL} because of their rich type systems, minimal syntax, and referential transparency.
However, data types defined in the host language are not automatically available in the embedded language.
To do so, all the operations on the data type must be ported to the \gls{EDSL} resulting in a lot of boilerplate.
We show this by providing an implementation in \gls{TH} for a typical \gls{DSL} with two different semantics.
Furthermore, we show that by utilising quasiquotation, there is hardly any burden on the syntax.
Finally, the paper also serves as a gentle introduction to \gls{TH}.
We show this by providing an implementation in \gls{TH} for a typical \gls{DSL} with two different semantics.
Furthermore, we show that by utilising quasiquotation, there is hardly any burden on the syntax.
Finally, the paper also serves as a gentle introduction to \gls{TH}.
Contrarily, adding semantics requires adapting all language constructs.
Lifting the functions to type classes, i.e.\ parametrising the constructs over the semantics, allows extension of the language both in constructs and in semantics orthogonally. This advanced style of embedding is called tagless-final or class-based shallow embedding \citep{kiselyov_typed_2012}.
Contrarily, adding semantics requires adapting all language constructs.
Lifting the functions to type classes, i.e.\ parametrising the constructs over the semantics, allows extension of the language both in constructs and in semantics orthogonally. This advanced style of embedding is called tagless-final or class-based shallow embedding \citep{kiselyov_typed_2012}.
-While it is often possible to lift values of a user-defined data type to a value in the \gls{DSL}, it is not possible to interact with it using \gls{DSL} constructs, they are not first-class citizens.
+While it is often possible to lift values of a user-defined data type to a value in the \gls{DSL}, it is not possible to interact with it using \gls{DSL} constructs, since they are not first-class citizens.
To relieve the burden of adding all these functions, metaprogramming\nobreak---\nobreak\hskip0pt and custom quasiquoters---can be used.
Metaprogramming entails that some parts of the program are generated by a program itself, i.e.\ the program is data.
To relieve the burden of adding all these functions, metaprogramming\nobreak---\nobreak\hskip0pt and custom quasiquoters---can be used.
Metaprogramming entails that some parts of the program are generated by a program itself, i.e.\ the program is data.
Metaprogramming allows functions to be added to the program at compile time based on the structure of user-defined data types.
\subsection{Contributions of the paper}
Metaprogramming allows functions to be added to the program at compile time based on the structure of user-defined data types.
\subsection{Contributions of the paper}
Furthermore, with the \haskellinline{:-} operator the syntax becomes usable.
Finally, by defining the functions as a \gls{HOAS} type safety is achieved \citep{pfenning_higher-order_1988,chlipala_parametric_2008}.
The complete definition looks as follows:
Furthermore, with the \haskellinline{:-} operator the syntax becomes usable.
Finally, by defining the functions as a \gls{HOAS} type safety is achieved \citep{pfenning_higher-order_1988,chlipala_parametric_2008}.
The complete definition looks as follows:
Unfortunately, once lifted, it is not possible to do anything with values of the user-defined data type other than passing them around.
It is not possible to construct new values from expressions in the \gls{DSL}, to deconstruct a value into the fields, nor to test of which constructor the value is.
Furthermore, while in the our language the only constraint is the automatically derivable \haskellinline{Show}, in real-world languages the class constraints may be very difficult to satisfy for complex types, for example serialisation to a single stack cell in the case of a compiler.
Unfortunately, once lifted, it is not possible to do anything with values of the user-defined data type other than passing them around.
It is not possible to construct new values from expressions in the \gls{DSL}, to deconstruct a value into the fields, nor to test of which constructor the value is.
Furthermore, while in the our language the only constraint is the automatically derivable \haskellinline{Show}, in real-world languages the class constraints may be very difficult to satisfy for complex types, for example serialisation to a single stack cell in the case of a compiler.
\gls{TH} is GHC's de facto metaprogramming system, implemented as a compiler extension together with a library \citep{sheard_template_2002}\citep[\citesection{6.13.1}]{ghc_team_ghc_2021}.
Readers already familiar with \gls{TH} can safely skip this section.
\gls{TH} is GHC's de facto metaprogramming system, implemented as a compiler extension together with a library \citep{sheard_template_2002}\citep[\citesection{6.13.1}]{ghc_team_ghc_2021}.
Readers already familiar with \gls{TH} can safely skip this section.
-\gls{TH} adds four main concepts to the language, na\-me\-ly AST data types, splicing, quasiquotation and reification.
-With this machinery, regular \gls{HASKELL} functions can be defined that are called at compile time, inserting generated code into the {AST}.
+\gls{TH} adds four main concepts to the language, na\-me\-ly \gls{AST} data types, splicing, quasiquotation and reification.
+With this machinery, regular \gls{HASKELL} functions can be defined that are called at compile time, inserting generated code into the \gls{AST}.
These functions are monadic functions operating in the \haskellinline{Q} monad.
The \haskellinline{Q} monad facilitates failure, reification and fresh identifier generation for hygienic macros \citep{kohlbecker_hygienic_1986}.
Within the \haskellinline{Q} monad, capturable and non-capturable identifiers can be generated using the \haskellinline{mkName} and \haskellinline{newName} functions respectively.
These functions are monadic functions operating in the \haskellinline{Q} monad.
The \haskellinline{Q} monad facilitates failure, reification and fresh identifier generation for hygienic macros \citep{kohlbecker_hygienic_1986}.
Within the \haskellinline{Q} monad, capturable and non-capturable identifiers can be generated using the \haskellinline{mkName} and \haskellinline{newName} functions respectively.
-Firstly, for all of \gls{HASKELL}'s AST elements, data types are provided that are mostly isomorphic to the actual data types used in the compiler.
+Firstly, for all of \gls{HASKELL}'s \gls{AST} elements, data types are provided that are mostly isomorphic to the actual data types used in the compiler.
With these data types, the entire syntax of a \gls{HASKELL} program can be specified.
Often, a data type is suffixed with the context, e.g.\ there is a \haskellinline{VarE} and a \haskellinline{VarP} for a variable in an expression or in a pattern respectively.
To give an impression of these data types, a selection of data types available in \gls{TH} is given below:
With these data types, the entire syntax of a \gls{HASKELL} program can be specified.
Often, a data type is suffixed with the context, e.g.\ there is a \haskellinline{VarE} and a \haskellinline{VarP} for a variable in an expression or in a pattern respectively.
To give an impression of these data types, a selection of data types available in \gls{TH} is given below:
For example, for the \haskellinline{LamE} constructor, the following \haskellinline{lamE} function is available.
\begin{lstHaskell}
For example, for the \haskellinline{LamE} constructor, the following \haskellinline{lamE} function is available.
\begin{lstHaskell}
Depending on the context and location of the splice, the result type is either a list of declarations, a type, an expression or a pattern.
The result of this function, when successful, is then spliced into the code and treated as regular code by the compiler.
Consequently, the code that is generated may not be type safe, in which case the compiler provides a type error on the generated code.
Depending on the context and location of the splice, the result type is either a list of declarations, a type, an expression or a pattern.
The result of this function, when successful, is then spliced into the code and treated as regular code by the compiler.
Consequently, the code that is generated may not be type safe, in which case the compiler provides a type error on the generated code.
\subsection{Quasiquotation}
Another key concept of \gls{TH} is Quasiquotation, the dual of splicing \citep{bawden_quasiquotation_1999}.
While it is possible to construct entire programs using the provided data types, it is a little cumbersome.
\subsection{Quasiquotation}
Another key concept of \gls{TH} is Quasiquotation, the dual of splicing \citep{bawden_quasiquotation_1999}.
While it is possible to construct entire programs using the provided data types, it is a little cumbersome.
-Using \emph{Oxford brackets} (\verb#[|# \ldots\verb#|]#) or single or double apostrophes, verbatim \gls{HASKELL} code can be entered that is converted automatically to the corresponding AST nodes easing the creation of language constructs.
+Using \emph{Oxford brackets} (\verb#[|# \ldots\verb#|]#) or single or double apostrophes, verbatim \gls{HASKELL} code can be entered that is converted automatically to the corresponding \gls{AST} nodes easing the creation of language constructs.
Depending on the context, different quasiquotes are used:
\begin{itemize*}
\item \haskellinline{[\|...\|]} or \haskellinline{[e\|...\|]} for expressions
Depending on the context, different quasiquotes are used:
\begin{itemize*}
\item \haskellinline{[\|...\|]} or \haskellinline{[e\|...\|]} for expressions
This information can then be used to generate code according to the structure of data types.
Reification is done using the \haskellinline{reify :: Name -> Q Info} function.
The \haskellinline{Info} type is an \gls{ADT} containing all the---known to the compiler---information about the matching type: constructors, instances, \etc.
This information can then be used to generate code according to the structure of data types.
Reification is done using the \haskellinline{reify :: Name -> Q Info} function.
The \haskellinline{Info} type is an \gls{ADT} containing all the---known to the compiler---information about the matching type: constructors, instances, \etc.
With the power of metaprogramming, we can generate the boilerplate code for our user-defined data types automatically at compile time.
To generate the code required for the \gls{DSL}, we define the \haskellinline{genDSL} function.
The type belonging to the name passed as an argument to this function is made available for the \gls{DSL} by generating the \haskellinline{typeDSL} class and view instances.
With the power of metaprogramming, we can generate the boilerplate code for our user-defined data types automatically at compile time.
To generate the code required for the \gls{DSL}, we define the \haskellinline{genDSL} function.
The type belonging to the name passed as an argument to this function is made available for the \gls{DSL} by generating the \haskellinline{typeDSL} class and view instances.
It is possible to construct and deconstruct values from other \gls{DSL} expressions, and to perform tests on the constructor but with a clunky and unwieldy syntax.
They have become first-class citizens in a grotesque way.
For example, given that we have some language constructs to denote failure and conditionals\footnotemark, writing a list summation function in our \gls{DSL} would be done as follows.
It is possible to construct and deconstruct values from other \gls{DSL} expressions, and to perform tests on the constructor but with a clunky and unwieldy syntax.
They have become first-class citizens in a grotesque way.
For example, given that we have some language constructs to denote failure and conditionals\footnotemark, writing a list summation function in our \gls{DSL} would be done as follows.
-For the sake of the argument we take a little shortcut here and assume that the interpretation of the \gls{DSL} supports lazy evaluation by using the host language as a metaprogramming language as well, allowing us to use functions in the host language to contstruct expressions in the \gls{DSL}.
+For the sake of the argument we take a little shortcut here and assume that the interpretation of the \gls{DSL} supports lazy evaluation by using the host language as a metaprogramming language as well, allowing us to use functions in the host language to construct expressions in the \gls{DSL}.
The code between \emph{dsl} brackets (\haskellinline{[dsl\|...\|]}) is preprocessed by the \haskellinline{dsl} quasiquoter.
Because the functions are executed at compile time, errors---thrown using the \haskellinline{MonadFail} instance of the \haskellinline{Q} monad---in these functions result in compile time errors.
The code between \emph{dsl} brackets (\haskellinline{[dsl\|...\|]}) is preprocessed by the \haskellinline{dsl} quasiquoter.
Because the functions are executed at compile time, errors---thrown using the \haskellinline{MonadFail} instance of the \haskellinline{Q} monad---in these functions result in compile time errors.
To illustrate writing a custom quasiquoter, we show an implementation of a quasiquoter for binary literals.
The \haskellinline{bin} quasiquoter is only defined for expressions and parses subsequent zeros and ones as a binary number and splices it back in the code as a regular integer.
To illustrate writing a custom quasiquoter, we show an implementation of a quasiquoter for binary literals.
The \haskellinline{bin} quasiquoter is only defined for expressions and parses subsequent zeros and ones as a binary number and splices it back in the code as a regular integer.
This location is inserted in the parsec parser so that errors are localised in the source code.
Then, the \haskellinline{expr} parser is called that returns an \haskellinline{Exp} in the \haskellinline{Q} monad.
The \haskellinline{expr} parser uses parsec's commodity expression parser primitive \haskellinline{buildExpressionParser}.
This location is inserted in the parsec parser so that errors are localised in the source code.
Then, the \haskellinline{expr} parser is called that returns an \haskellinline{Exp} in the \haskellinline{Q} monad.
The \haskellinline{expr} parser uses parsec's commodity expression parser primitive \haskellinline{buildExpressionParser}.
The most interesting parser is the parser for the case expression that is an alternative in the basic expression parser \haskellinline{basic}.
A case expression is parsed when a keyword \haskellinline{case} is followed by an expression that is in turn followed by a non-empty list of matches.
A match is parsed when a pattern (\haskellinline{pat}) is followed by an arrow and an expression.
The most interesting parser is the parser for the case expression that is an alternative in the basic expression parser \haskellinline{basic}.
A case expression is parsed when a keyword \haskellinline{case} is followed by an expression that is in turn followed by a non-empty list of matches.
A match is parsed when a pattern (\haskellinline{pat}) is followed by an arrow and an expression.
\Citet{mcdonell_embedded_2022} extends on this idea, resulting in a very similar but different solution to ours.
They used the technique that \citeauthor{atkey_unembedding_2009} showed and applied it to deep embedding using the concrete syntax of the host language.
The scaffolding---e.g.\ generating the pairs, sums and injections---for embedding is automated using generics but the required pattern synonyms are generated using \gls{TH}.
\Citet{mcdonell_embedded_2022} extends on this idea, resulting in a very similar but different solution to ours.
They used the technique that \citeauthor{atkey_unembedding_2009} showed and applied it to deep embedding using the concrete syntax of the host language.
The scaffolding---e.g.\ generating the pairs, sums and injections---for embedding is automated using generics but the required pattern synonyms are generated using \gls{TH}.
Furthermore, our implementation does not require a generic function to trace all constructors, resulting in problems with (mutual) recursion.
\Citet{young_adding_2021} added pattern matching to a deeply \gls{EDSL} using a compiler plugin.
Furthermore, our implementation does not require a generic function to trace all constructors, resulting in problems with (mutual) recursion.
\Citet{young_adding_2021} added pattern matching to a deeply \gls{EDSL} using a compiler plugin.
Under the hood, this function translates the pattern match to constructors, deconstructors, and constructor predicates.
The main difference with this work is that it requires a compiler plugin while our metaprogramming approach works on any compiler supporting a metaprogramming system similar to \gls{TH}.
Under the hood, this function translates the pattern match to constructors, deconstructors, and constructor predicates.
The main difference with this work is that it requires a compiler plugin while our metaprogramming approach works on any compiler supporting a metaprogramming system similar to \gls{TH}.
Metaprogramming in general is a very broad research topic and has been around for years already.
We therefore do not claim an exhaustive overview of related work on all aspects of metaprogramming.
However, we have have tried to present most research on metaprogramming in \gls{TH}.
Metaprogramming in general is a very broad research topic and has been around for years already.
We therefore do not claim an exhaustive overview of related work on all aspects of metaprogramming.
However, we have have tried to present most research on metaprogramming in \gls{TH}.
\Citet{duregard_embedded_2011} wrote a parser generator using \gls{TH} and the custom quasiquoting facilities.
From a specification of the grammar, given in verbatim using a custom quasiquoter, a parser is generated at compile time.
\Citet{shioda_libdsl_2014} used metaprogramming in the D programming language to create a \gls{DSL} toolkit.
\Citet{duregard_embedded_2011} wrote a parser generator using \gls{TH} and the custom quasiquoting facilities.
From a specification of the grammar, given in verbatim using a custom quasiquoter, a parser is generated at compile time.
\Citet{shioda_libdsl_2014} used metaprogramming in the D programming language to create a \gls{DSL} toolkit.
In similar fashion, \citet{materzok_generating_2022} used \gls{TH} to translate YieldFSM programs to {C$\lambda$aSH}.
\subsubsection{Optimisation}
In similar fashion, \citet{materzok_generating_2022} used \gls{TH} to translate YieldFSM programs to {C$\lambda$aSH}.
\subsubsection{Optimisation}
For example, \citet{lynagh_unrolling_2003} implemented various optimisations in \gls{TH} such as automatic loop unrolling.
The compile-time executed functions analyse the recursive function and unroll the recursion to a fixed depth to trade execution speed for program space.
Also, \citet{odonnell_embedding_2004} embedded Hydra, a hardware description language, in \gls{HASKELL} utilising \gls{TH}.
For example, \citet{lynagh_unrolling_2003} implemented various optimisations in \gls{TH} such as automatic loop unrolling.
The compile-time executed functions analyse the recursive function and unroll the recursion to a fixed depth to trade execution speed for program space.
Also, \citet{odonnell_embedding_2004} embedded Hydra, a hardware description language, in \gls{HASKELL} utilising \gls{TH}.
The authors mention that alternatively this could have be done using a monad but this hampers equational reasoning greatly, which is a key property of Hydra.
Finally, \citet{viera_staged_2018} present a way of embedding attribute grammars in \gls{HASKELL} in a staged fashion.
Checking several aspects of the grammar is done at compile time using \gls{TH} while other safety checks are performed at runtime.
The authors mention that alternatively this could have be done using a monad but this hampers equational reasoning greatly, which is a key property of Hydra.
Finally, \citet{viera_staged_2018} present a way of embedding attribute grammars in \gls{HASKELL} in a staged fashion.
Checking several aspects of the grammar is done at compile time using \gls{TH} while other safety checks are performed at runtime.
\Citet{egi_embedding_2022} extended \gls{HASKELL} to support non-free data type pattern matching---i.e.\ data type with no standard form, e.g.\ sets, graphs---using \gls{TH}.
Using quasiquotation, they make a complicated embedding of non-linear pattern matching available through a simple lens.
\Citet{egi_embedding_2022} extended \gls{HASKELL} to support non-free data type pattern matching---i.e.\ data type with no standard form, e.g.\ sets, graphs---using \gls{TH}.
Using quasiquotation, they make a complicated embedding of non-linear pattern matching available through a simple lens.
-Where in \gls{TH} you can manipulate arbitrary parts of the syntax tree, add top-level splices of data types, definitions and functions, in \gls{TTH} the programmer can only splice expressions but the abstract syntax tree fragments representing the expressions are well-typed by construction instead of untyped.
+Where in \gls{TH} you can manipulate arbitrary parts of the syntax tree, add top-level splices of data types, definitions and functions, in \gls{TTH} the programmer can only splice expressions but the \gls{AST} fragments representing the expressions are well-typed by construction instead of untyped.
\Citet{pickering_staged_2020} implemented staged compilation for the \emph{generics-sop} \citep{de_vries_true_2014} generics library to improve the efficiency of the code using \gls{TTH}.
\Citet{willis_staged_2020} used \gls{TTH} to remove the overhead of parsing combinators.
\Citet{pickering_staged_2020} implemented staged compilation for the \emph{generics-sop} \citep{de_vries_true_2014} generics library to improve the efficiency of the code using \gls{TTH}.
\Citet{willis_staged_2020} used \gls{TTH} to remove the overhead of parsing combinators.