diff --git a/top/imp.tex b/top/imp.tex
index 0955b46..e70bc36 100644
--- a/top/imp.tex
+++ b/top/imp.tex
@@ -2,19 +2,777 @@
 \input{subfilepreamble}
 
+\setcounter{chapter}{6}
+
 \begin{document}
 \input{subfileprefix}
-
-\chapter{Implementation}%
+\chapter{The implementation of mTask}%
 \label{chp:implementation}
 \begin{chapterabstract}
-	This chapter shows the implementation of the \gls{MTASK} system.
-	It is threefold: first it shows the implementation of the byte code compiler for \gls{MTASK}'s \gls{TOP} language, then is details of the implementation of \gls{MTASK}'s \gls{TOP} engine that executes the \gls{MTASK} tasks on the microcontroller, and finally it shows how the integration of \gls{MTASK} tasks and \glspl{SDS} is implemented both on the server and on the device.
+	This chapter shows the implementation of the \gls{MTASK} system by:
+	\begin{itemize}
+		\item showing the compilation and execution toolchain;
+		\item showing the implementation of the byte code compiler for the \gls{MTASK} language;
+		\item elaborating on the implementation and architecture of the \gls{RTS} of \gls{MTASK};
+		\item and explaining the machinery used to automatically serialise and deserialise data to and from the device.
+	\end{itemize}
 \end{chapterabstract}
 
-IFL19 paper, bytecode instructieset~\cref{chp:bytecode_instruction_set}
-\section{Integration with \texorpdfstring{\gls{ITASK}}{iTask}}
-IFL18 paper stukken
+The \gls{MTASK} system targets resource-constrained edge devices that have little memory, processing power, and communication bandwidth.
+Such edge devices are often powered by microcontrollers, tiny computers specifically designed for embedded applications.
+The microcontrollers usually have flash-based program memory, which wears out fairly quickly.
+For example, the flash memory of the popular atmega328p powering the \gls{ARDUINO} UNO is rated for \num{10000} write cycles.
+While this sounds like a lot, if new tasks are sent to the device every minute or so, the rated lifetime is reached after only about seven days.
+Hence, for dynamic applications, storing the program in the \gls{RAM} of the device and thus interpreting this code is necessary in order to save precious write cycles of the program memory.
+In the \gls{MTASK} system, the \gls{MTASK} \gls{RTS}, a domain-specific \gls{OS}, is responsible for interpreting the programs.
+
+Programs in \gls{MTASK} are \gls{DSL} terms constructed at run time in an \gls{ITASK} system.
+\Cref{fig:toolchain} shows the compilation and execution toolchain of such programs.
+First, the source code is compiled to a byte code specification that contains the compiled main expression, the functions, and the \gls{SDS} and peripheral configuration.
+How an \gls{MTASK} task is compiled to this specification is shown in \cref{sec:compiler_imp}.
+This package is then sent to the \gls{RTS} of the device for execution.
+In order to execute a task, first the main expression is evaluated in the interpreter, resulting in a task tree.
+Using small-step reduction, this task tree is continuously rewritten by the rewrite engine of the \gls{RTS}.
+At times, the reduction requires the evaluation of expressions, using the interpreter.
+During every rewrite step, a task value is produced.
+On the device, the \gls{RTS} may have multiple tasks active at the same time.
+By interleaving the rewrite steps, parallel operation is achieved.
+The design, architecture, and implementation of the \gls{RTS} are shown in \cref{sec:compiler_rts}.
+
+\begin{figure}
+	\centering
+	\centerline{\includestandalone{toolchain}}
+	\caption{Compilation and execution toolchain of \gls{MTASK} programs.}%
+	\label{fig:toolchain}
+\end{figure}
+
+\section{Compiler}\label{sec:compiler_imp}
+\subsection{Compiler infrastructure}
+The byte code compiler is an interpretation of the \gls{MTASK} language implemented as a monad stack containing a writer monad and a state monad.
+The writer monad is used to generate code snippets locally without having to store them in the monadic values.
+The state monad accumulates the code and stores the state the compiler requires.
+\Cref{lst:compiler_state} shows the data type for the state, storing:
+the function the compiler is currently in;
+the code of the main expression;
+the context (see \cref{ssec:step});
+the code for the functions;
+the next fresh label;
+a list of all the used \glspl{SDS}, either local \glspl{SDS} containing the initial value (\cleaninline{Left}) or lowered \glspl{SDS} (see \cref{sec:liftsds}) containing a reference to the associated \gls{ITASK} \gls{SDS};
+and finally a list of the used peripherals.
+
+\begin{lstClean}[label={lst:compiler_state},caption={The type for the \gls{MTASK} byte code compiler.}]
+:: BCInterpret a :== StateT BCState (WriterT [BCInstr] Identity) a
+:: BCState =
+	{ bcs_infun      :: JumpLabel
+	, bcs_mainexpr   :: [BCInstr]
+	, bcs_context    :: [BCInstr]
+	, bcs_functions  :: Map JumpLabel BCFunction
+	, bcs_freshlabel :: JumpLabel
+	, bcs_sdses      :: [Either String255 MTLens]
+	, bcs_hardware   :: [BCPeripheral]
+	}
+:: BCFunction =
+	{ bcf_instructions :: [BCInstr]
+	, bcf_argwidth     :: UInt8
+	, bcf_returnwidth  :: UInt8
+	}
+\end{lstClean}
+
+Executing the compiler is done by providing an initial state and running the monad.
+After compilation, several post-processing steps are applied to make the code suitable for the microprocessor.
+First, all \cleaninline{BCReturn} instructions in tail call positions are replaced by \cleaninline{BCTailcall} instructions to optimise tail calls.
+Furthermore, all byte code is concatenated, resulting in one big program.
+Many instructions have commonly used arguments, so shorthands are introduced to reduce the program size.
+For example, the \cleaninline{BCArg} instruction is often called with arguments \numrange{0}{2} and can be replaced by the \numrange[parse-numbers=false]{\cleaninline{BCArg0}}{\cleaninline{BCArg2}} shorthands.
+Furthermore, redundant instructions such as a pop directly after a push are removed as well in order not to burden the code generation with these intricacies.
+Finally, the labels are resolved to represent actual program addresses instead of the freshly generated identifiers.
+After the byte code is ready, the lowered \glspl{SDS} are resolved to provide an initial value for them.
+The byte code, the \gls{SDS} specification, and the peripheral specifications are the result of this process, ready to be sent to the device.
+
+\subsection{Instruction set}
+The instruction set is a fairly standard stack machine instruction set extended with special \gls{TOP} instructions for creating task tree nodes.
+All instructions are housed in a \gls{CLEAN} \gls{ADT} and serialised to the byte representation using generic functions (see \cref{sec:ccodegen}).
+Type synonyms and newtypes are used to provide insight into the arguments of the instructions (\cref{lst:type_synonyms}).
+Labels are always two bytes long; all other arguments are one byte long.
+
+\begin{lstClean}[caption={Type synonyms for instruction arguments.},label={lst:type_synonyms}]
+:: ArgWidth :== UInt8    :: ReturnWidth :== UInt8
+:: Depth    :== UInt8    :: Num         :== UInt8
+:: SdsId    :== UInt8    :: JumpLabel   =: JL UInt16
+\end{lstClean}
+
+\Cref{lst:instruction_type} shows an excerpt of the \gls{CLEAN} type that represents the instruction set.
+Shorthand instructions such as instructions with inlined arguments are omitted for brevity.
+Detailed semantics for the instructions are given in \cref{chp:bytecode_instruction_set}.
+One notable instruction is the \cleaninline{BCMkTask} instruction, which allocates and initialises a task tree node and pushes a pointer to it on the stack.
+
+\begin{lstClean}[caption={The type housing the instruction set in \gls{MTASK}.},label={lst:instruction_type}]
+:: BCInstr
+	//Jumps
+	= BCJumpF JumpLabel | BCLabel JumpLabel | BCJumpSR ArgWidth JumpLabel
+	| BCReturn ReturnWidth ArgWidth
+	| BCTailcall ArgWidth ArgWidth JumpLabel
+	//Arguments
+	| BCArgs ArgWidth ArgWidth
+	//Task node creation and refinement
+	| BCMkTask BCTaskType | BCTuneRateMs | BCTuneRateSec
+	//Stack ops
+	| BCPush String255 | BCPop Num | BCRot Depth Num | BCDup | BCPushPtrs
+	//Casting
+	| BCItoR | BCItoL | BCRtoI | ...
+	// arith
+	| BCAddI | BCSubI | ...
+	...
+
+:: BCTaskType
+	= BCStableNode ArgWidth | BCUnstableNode ArgWidth
+	// Pin io
+	| BCReadD | BCWriteD | BCReadA | BCWriteA | BCPinMode
+	// Interrupts
+	| BCInterrupt
+	// Repeat
+	| BCRepeat
+	// Delay
+	| BCDelay | BCDelayUntil
+	// Parallel
+	| BCTAnd | BCTOr
+	//Step
+	| BCStep ArgWidth JumpLabel
+	//Sds ops
+	| BCSdsGet SdsId | BCSdsSet SdsId | BCSdsUpd SdsId JumpLabel
+	// Rate limiter
+	| BCRateLimit
+	////Peripherals
+	//DHT
+	| BCDHTTemp UInt8 | BCDHTHumid UInt8
+	...
+\end{lstClean}
+
+\section{Compilation rules}
+This section describes the compilation rules: the translation from the \gls{AST} to byte code.
+The compilation scheme consists of three schemes\slash{}functions.
+Double vertical bars, e.g.\ $\stacksize{a_i}$, denote the number of stack cells required to store the argument.
+
+Some schemes have a context $r$ as an argument which contains information about the location of the arguments in scope.
+More information is given in the schemes requiring such arguments.
+
+\begin{table}
+	\centering
+	\caption{An overview of the compilation schemes.}
+	\begin{tabularx}{\linewidth}{l X}
+		\toprule
+		Scheme & Description\\
+		\midrule
+		$\cschemeE{e}{r}$ & Produces the value of expression $e$ given the context $r$ and pushes it on the stack.
+			The result can be a basic value or a pointer to a task.\\
+		$\cschemeF{e}$ & Generates the byte code for functions.\\
+		$\cschemeS{e}{r}{w}$ & Generates the function for the step continuation given the context $r$ and the width $w$ of the left-hand side task value.\\
+		\bottomrule
+	\end{tabularx}
+\end{table}
+
+\subsection{Expressions}
+Almost all expression constructions are compiled using $\mathcal{E}$.
+The additional argument of $\mathcal{E}$ is the context (see \cref{ssec:functions}).
+Values are always placed on the stack; tuples and other compound data types are unpacked.
+Function calls, function arguments, and tasks are also compiled using $\mathcal{E}$, but their compilation is explained later.
+
+\begin{align*}
+	\cschemeE{\text{\cleaninline{lit}}~e}{r} & = \text{\cleaninline{BCPush (bytecode e)}};\\
+	\cschemeE{e_1\mathbin{\text{\cleaninline{+.}}}e_2}{r} & = \cschemeE{e_1}{r};
+		\cschemeE{e_2}{r};
+		\text{\cleaninline{BCAdd}};\\
+	{} & \text{\emph{Similar for other binary operators}}\\
+	\cschemeE{\text{\cleaninline{Not}}~e}{r} & =
+		\cschemeE{e}{r};
+		\text{\cleaninline{BCNot}};\\
+	{} & \text{\emph{Similar for other unary operators}}\\
+	\cschemeE{\text{\cleaninline{If}}~e_1~e_2~e_3}{r} & =
+		\cschemeE{e_1}{r};
+		\text{\cleaninline{BCJumpF}}\enskip l_{else}; \mathbin{\phantom{=}} \cschemeE{e_2}{r}; \text{\cleaninline{BCJump}}\enskip l_{endif};\\
+	{} & \mathbin{\phantom{=}} \text{\cleaninline{BCLabel}}\enskip l_{else}; \cschemeE{e_3}{r}; \mathbin{\phantom{=}} \text{\cleaninline{BCLabel}}\enskip l_{endif};\\
+	{} & \text{\emph{Where $l_{else}$ and $l_{endif}$ are fresh labels}}\\
+	\cschemeE{\text{\cleaninline{tupl}}~e_1~e_2}{r} & =
+		\cschemeE{e_1}{r};
+		\cschemeE{e_2}{r};\\
+	{} &
\text{\emph{Similar for other unboxed compound data types}}\\
+	\cschemeE{\text{\cleaninline{first}}~e}{r} & =
+		\cschemeE{e}{r};
+		\text{\cleaninline{BCPop}}\enskip w;\\
+	{} & \text{\emph{Where $w$ is the width of the right value and}}\\
+	{} & \text{\emph{similar for other unboxed compound data types}}\\
+	\cschemeE{\text{\cleaninline{second}}\enskip e}{r} & =
+		\cschemeE{e}{r};
+		\text{\cleaninline{BCRot}}\enskip (w_l+w_r)\enskip w_r;
+		\text{\cleaninline{BCPop}}\enskip w_l;\\
+	{} & \text{\emph{Where $w_l$ is the width of the left value and $w_r$ of the right value}}\\
+	{} & \text{\emph{similar for other unboxed compound data types}}\\
+\end{align*}
+
+Translating $\mathcal{E}$ to \gls{CLEAN} code is very straightforward: it basically means writing the instructions to the writer monad.
+Almost always, the type of the interpretation is not used, i.e.\ it is a phantom type.
+To still have the functions return the correct type, the \cleaninline{tell`}\footnote{\cleaninline{tell` :: [BCInstr] -> BCInterpret a}} helper is used.
+This function is similar to the writer monad's \cleaninline{tell} function but is cast to the correct type.
+\Cref{lst:imp_arith} shows the implementation for the arithmetic and conditional expressions.
+Note that $r$, the context, is not an explicit argument here but stored in the state.
+
+\begin{lstClean}[caption={Interpretation implementation for the arithmetic and conditional functions.},label={lst:imp_arith}]
+instance expr BCInterpret where
+	lit t = tell` [BCPush (toByteCode{|*|} t)]
+	(+.) a b = a >>| b >>| tell` [BCAdd]
+	...
+	If c t e = freshlabel >>= \elselabel->freshlabel >>= \endiflabel->
+		c >>| tell` [BCJumpF elselabel] >>|
+		t >>| tell` [BCJump endiflabel,BCLabel elselabel] >>|
+		e >>| tell` [BCLabel endiflabel]
+\end{lstClean}
+
+\subsection{Functions}\label{ssec:functions}
+Compiling functions and other top-level definitions is done using $\mathcal{F}$, which generates byte code for the complete program by iterating over the functions and ending with the main expression.
+When compiling the body of the function, the arguments of the function are added to the context so that the addresses can be determined when referencing arguments.
+The main expression is a special case of $\mathcal{F}$ since it has neither arguments nor a continuation.
+Therefore, it is just compiled using $\mathcal{E}$ with an empty context.
+
+\begin{align*}
+	\cschemeF{main=m} & =
+		\cschemeE{m}{[]};\\
+	\cschemeF{f~a_0 \ldots a_n = b~\text{\cleaninline{In}}~m} & =
+		\text{\cleaninline{BCLabel}}~f; \cschemeE{b}{[\langle f, i\rangle, i\in \{(\Sigma^n_{i=0}\stacksize{a_i})..0\}]};\\
+	{} & \mathbin{\phantom{=}} \text{\cleaninline{BCReturn}}~\stacksize{b}~n; \cschemeF{m};\\
+\end{align*}
+
+A function call starts by pushing the stack and frame pointer, and making space for the program counter (\cref{lst:funcall_pushptrs}), followed by evaluating the arguments in reverse order (\cref{lst:funcall_args}).
+On executing \cleaninline{BCJumpSR}, the program counter is set and the interpreter jumps to the function (\cref{lst:funcall_jumpsr}).
+When the function returns, the return value overwrites the old pointers and the arguments.
+This occurs right after a \cleaninline{BCReturn} (\cref{lst:funcall_ret}).
+Putting the arguments on top of the pointers and not reserving space for the return value uses little space and facilitates tail call optimisation.
+
+\begin{figure}
+	\begin{subfigure}{.24\linewidth}
+		\centering
+		\includestandalone{memory1}
+		\caption{\cleaninline{BCPushPtrs}.}\label{lst:funcall_pushptrs}
+	\end{subfigure}
+	\begin{subfigure}{.24\linewidth}
+		\centering
+		\includestandalone{memory2}
+		\caption{Arguments.}\label{lst:funcall_args}
+	\end{subfigure}
+	\begin{subfigure}{.24\linewidth}
+		\centering
+		\includestandalone{memory3}
+		\caption{\cleaninline{BCJumpSR}.}\label{lst:funcall_jumpsr}
+	\end{subfigure}
+	\begin{subfigure}{.24\linewidth}
+		\centering
+		\includestandalone{memory4}
+		\caption{\cleaninline{BCReturn}.}\label{lst:funcall_ret}
+	\end{subfigure}
+	\caption{The stack layout during function calls.}%
+\end{figure}
+
+Calling a function and referencing function arguments are extensions to $\mathcal{E}$, as shown below.
+Arguments may be at different places on the stack at different times (see \cref{ssec:step}), and therefore the exact location must always be determined from the context using \cleaninline{findarg}\footnote{\cleaninline{findarg [l`:r] l = if (l == l`) 0 (1 + findarg r l)}}.
+Compiling argument $a_{f^i}$, the $i$th argument in function $f$, consists of traversing all positions in the current context.
+Arguments wider than one stack cell are fetched in reverse to reconstruct the original order.
+
+\begin{align*}
+	\cschemeE{f(a_0, \ldots, a_n)}{r} & =
+		\text{\cleaninline{BCPushPtrs}}; \cschemeE{a_i}{r}~\text{for all}~i\in\{n\ldots 0\}; \text{\cleaninline{BCJumpSR}}~n~f;\\
+	\cschemeE{a_{f^i}}{r} & =
+		\text{\cleaninline{BCArg}~findarg}(r, f, i)~\text{for all}~i\in\{w\ldots v\};\\
+	{} & v = \Sigma^{i-1}_{j=0}\stacksize{a_{f^j}}~\text{ and }~ w = v + \stacksize{a_{f^i}}\\
+\end{align*}
+
+Translating the compilation schemes for functions to \gls{CLEAN} is not as straightforward as for the other schemes due to the nature of shallow embedding in combination with the use of state.
+The \cleaninline{fun} class has a single function with a single argument.
+This argument is a \gls{CLEAN} function that---when given a callable \gls{CLEAN} function representing the \gls{MTASK} function---produces the \cleaninline{main} expression and a callable function.
+To compile this, the argument must be called with a function representing a function call in \gls{MTASK}.
+\Cref{lst:fun_imp} shows the implementation for this as \gls{CLEAN} code.
+To uniquely identify the function, a fresh label is generated.
+The function is then called with the \cleaninline{callFunction} helper function that generates the instructions that correspond to calling the function.
+That is, it pushes the pointers, compiles the arguments, and writes the \cleaninline{BCJumpSR} instruction.
+The resulting structure (\cleaninline{g In m}) contains a function representing the \gls{MTASK} function (\cleaninline{g}) and the \cleaninline{main} structure to continue with.
+To get the actual function, \cleaninline{g} must be called with representations for the arguments, i.e.\ using \cleaninline{findarg} for all arguments.
+The arguments are added to the context using \cleaninline{infun}, and \cleaninline{liftFunction} is called with the label, the argument width, and the compiler.
+This function executes the compiler, decorates the instructions with a label, and places them in the function dictionary together with metadata such as the argument width.
+After lifting the function, the context is cleared again and compilation continues with the rest of the program.
+ +\begin{lstClean}[label={lst:fun_imp},caption={The interpretation implementation for functions.}] +instance fun (BCInterpret a) BCInterpret | type a where + fun def = {main=freshlabel >>= \funlabel-> + let (g In m) = def \a->callFunction funlabel (toByteWidth a) [a] + argwidth = toByteWidth (argOf g) + in addToCtx funlabel zero argwidth + >>| infun funlabel + (liftFunction funlabel argwidth + (g (retrieveArgs funlabel zero argwidth) + ) ?None) + >>| clearCtx >>| m.main + } + +argOf :: ((m a) -> b) a -> UInt8 | toByteWidth a +callFunction :: JumpLabel UInt8 [BCInterpret b] -> BCInterpret c | ... +liftFunction :: JumpLabel UInt8 (BCInterpret a) (?UInt8) -> BCInterpret () +infun :: JumpLabel (BCInterpret a) -> BCInterpret a +\end{lstClean} + +\subsection{Tasks}\label{ssec:scheme_tasks} +Task trees are created with the \cleaninline{BCMkTask} instruction that allocates a node and pushes a pointer to it on the stack. +It pops arguments from the stack according to the given task type. +The following extension of $\mathcal{E}$ shows this compilation scheme (except for the step combinator, explained in \cref{ssec:step}). 
+
+\begin{align*}
+	\cschemeE{\text{\cleaninline{rtrn}}~e}{r} & =
+		\cschemeE{e}{r};
+		\text{\cleaninline{BCMkTask BCStable}}_{\stacksize{e}};\\
+	\cschemeE{\text{\cleaninline{unstable}}~e}{r} & =
+		\cschemeE{e}{r};
+		\text{\cleaninline{BCMkTask BCUnstable}}_{\stacksize{e}};\\
+	\cschemeE{\text{\cleaninline{readA}}~e}{r} & =
+		\cschemeE{e}{r};
+		\text{\cleaninline{BCMkTask BCReadA}};\\
+	\cschemeE{\text{\cleaninline{writeA}}~e_1~e_2}{r} & =
+		\cschemeE{e_1}{r};
+		\cschemeE{e_2}{r};
+		\text{\cleaninline{BCMkTask BCWriteA}};\\
+	\cschemeE{\text{\cleaninline{readD}}~e}{r} & =
+		\cschemeE{e}{r};
+		\text{\cleaninline{BCMkTask BCReadD}};\\
+	\cschemeE{\text{\cleaninline{writeD}}~e_1~e_2}{r} & =
+		\cschemeE{e_1}{r};
+		\cschemeE{e_2}{r};
+		\text{\cleaninline{BCMkTask BCWriteD}};\\
+	\cschemeE{\text{\cleaninline{delay}}~e}{r} & =
+		\cschemeE{e}{r};
+		\text{\cleaninline{BCMkTask BCDelay}};\\
+	\cschemeE{\text{\cleaninline{rpeat}}~e}{r} & =
+		\cschemeE{e}{r};
+		\text{\cleaninline{BCMkTask BCRepeat}};\\
+	\cschemeE{e_1\text{\cleaninline{.\|\|.}}e_2}{r} & =
+		\cschemeE{e_1}{r};
+		\cschemeE{e_2}{r};
+		\text{\cleaninline{BCMkTask BCTOr}};\\
+	\cschemeE{e_1\text{\cleaninline{.&&.}}e_2}{r} & =
+		\cschemeE{e_1}{r};
+		\cschemeE{e_2}{r};
+		\text{\cleaninline{BCMkTask BCTAnd}};\\
+\end{align*}
+
+This translates to \gls{CLEAN} code by writing the correct \cleaninline{BCMkTask} instruction, as exemplified in \cref{lst:imp_ret}.
+
+\begin{lstClean}[caption={The byte code interpretation implementation for \cleaninline{rtrn}.},label={lst:imp_ret}]
+instance rtrn BCInterpret
+where
+	rtrn m = m >>| tell` [BCMkTask (bcstable m)]
+\end{lstClean}
+
+\subsection{Sequential combinator}\label{ssec:step}
+The \cleaninline{step} construct is a special type of task because the task value of the left-hand side changes over time.
+Therefore, the task continuations on the right-hand side are \emph{observing} this task value and acting upon it.
+In the compilation scheme, all continuations are first converted to a single function that has two arguments: the stability of the task and its value.
+This function either returns a pointer to a task tree or fails (denoted by $\bot$).
+It is special because in the generated function, the task value of a task is inspected.
+Furthermore, it is a lazy node in the task tree: the right-hand side may yield a new task tree after several rewrite steps, i.e.\ it is allowed to create infinite task trees using step combinators.
+The function is generated using the $\mathcal{S}$ scheme, which requires two arguments: the context $r$ and the width of the left-hand side, so that it can determine the position of the stability that is added as an argument to the function.
+The resulting function is basically a list of if-then-else constructions to check all predicates one by one.
+Some optimisation is possible here but has currently not been implemented.
+
+\begin{align*}
+	\cschemeE{t_1\text{\cleaninline{>>*.}}t_2}{r} & =
+		\cschemeE{a_{f^i}}{r}, \langle f, i\rangle\in r;
+		\text{\cleaninline{BCMkTask}}~\text{\cleaninline{BCStable}}_{\stacksize{r}}; \cschemeE{t_1}{r};\\
+	{} & \mathbin{\phantom{=}} \text{\cleaninline{BCMkTask}}~\text{\cleaninline{BCTAnd}}; \text{\cleaninline{BCMkTask}}~(\text{\cleaninline{BCStep}}~(\cschemeS{t_2}{(r + [\langle l_s, i\rangle])}{\stacksize{t_1}}));\\
+\end{align*}
+
+\begin{align*}
+	\cschemeS{[]}{r}{w} & =
+		\text{\cleaninline{BCPush}}~\bot;\\
+	\cschemeS{\text{\cleaninline{IfValue}}~f~t:cs}{r}{w} & =
+		\text{\cleaninline{BCArg}} (\stacksize{r} + w);
+		\text{\cleaninline{BCIsNoValue}};\\
+	{} & \mathbin{\phantom{=}} \cschemeE{f}{r};
+		\text{\cleaninline{BCAnd}};\\
+	{} & \mathbin{\phantom{=}} \text{\cleaninline{BCJumpF}}~l_1;\\
+	{} & \mathbin{\phantom{=}} \cschemeE{t}{r};
+		\text{\cleaninline{BCJump}}~l_2;\\
+	{} & \mathbin{\phantom{=}} \text{\cleaninline{BCLabel}}~l_1;
+		\cschemeS{cs}{r}{w};\\
+	{} & \mathbin{\phantom{=}}
\text{\cleaninline{BCLabel}}~l_2;\\
+	{} & \text{\emph{Where $l_1$ and $l_2$ are fresh labels}}\\
+	{} & \text{\emph{Similar for \cleaninline{IfStable} and \cleaninline{IfUnstable}}}\\
+\end{align*}
+
+First, the context is evaluated.
+The context contains arguments from functions and steps that need to be preserved after rewriting.
+The evaluated context is combined with the left-hand side task value by means of a \cleaninline{.&&.} combinator to store it in the task tree so that it is available after a rewrite.
+This means that the task tree is transformed as seen in \cref{lst:context_tree}.
+
+\begin{figure}
+	\begin{subfigure}{.5\textwidth}
+		\includestandalone{contexttree1}
+		\caption{Without the embedded context.}
+	\end{subfigure}%
+	\begin{subfigure}{.5\textwidth}
+		\includestandalone{contexttree2}
+		\caption{With the embedded context.}
+	\end{subfigure}
+	\caption{Context embedded in a task tree.}%
+	\label{lst:context_tree}
+\end{figure}
+
+The translation to \gls{CLEAN} is given in \cref{lst:imp_seq}.
+
+\begin{lstClean}[caption={Byte code compilation interpretation implementation for the step class.},label={lst:imp_seq}]
+instance step BCInterpret where
+	(>>*.) lhs cont
+		//Fetch a fresh label and fetch the context
+		= freshlabel >>= \funlab->gets (\s->s.bcs_context)
+		//Generate code for lhs
+		>>= \ctx->lhs
+		//Possibly add the context
+		>>| tell` (if (ctx =: []) []
+			//The context is just the arguments up till now in reverse
+			(  [BCArg (UInt8 i)\\i<-reverse (indexList ctx)]
+			++ map BCMkTask (bcstable (UInt8 (length ctx)))
+			++ [BCMkTask BCTAnd]
+			))
+		//Increase the context
+		>>| addToCtx funlab zero lhswidth
+		//Lift the step function
+		>>| liftFunction funlab
+			//Width of the arguments is the width of the lhs plus the
+			//stability plus the context
+			(one + lhswidth + (UInt8 (length ctx)))
+			//Body     label   ctx width
+			(toContFun funlab (UInt8 (length ctx)))
+			//Return width (always 1, a task pointer)
+			(?Just one)
+		>>| modify (\s->{s & bcs_context=ctx})
+		>>| tell` [BCMkTask (instr rhswidth funlab)]
+
+toContFun :: JumpLabel UInt8 -> BCInterpret a
+toContFun steplabel contextwidth
+	= foldr tcf (tell` [BCPush fail]) cont
+where
+	tcf (IfStable f t)
+		= If ((stability >>| tell` [BCIsStable]) &. f val)
+			(t val >>| tell` [])
+			...
+	stability = tell` [BCArg (lhswidth + contextwidth)]
+	val = retrieveArgs steplabel zero lhswidth
+\end{lstClean}
+
+\subsection{Shared data sources}\label{lst:imp_sds}
+The compilation scheme for \gls{SDS} definitions is a trivial extension to $\mathcal{F}$ since no code is generated, as seen below.
+
+\begin{align*}
+	\cschemeF{\text{\cleaninline{sds}}~x=i~\text{\cleaninline{In}}~m} & =
+		\cschemeF{m};\\
+\end{align*}
+
+The \gls{SDS} access tasks have a compilation scheme similar to other tasks (see \cref{ssec:scheme_tasks}).
+The \cleaninline{getSds} task just pushes a task tree node with the \gls{SDS} identifier embedded.
+The \cleaninline{setSds} task evaluates the value, lifts that value to a task tree node, and creates \pgls{SDS} set node.
+
+\begin{align*}
+	\cschemeE{\text{\cleaninline{getSds}}~s}{r} & =
+		\text{\cleaninline{BCMkTask}} (\text{\cleaninline{BCSdsGet}} s);\\
+	\cschemeE{\text{\cleaninline{setSds}}~s~e}{r} & =
+		\cschemeE{e}{r};
+		\text{\cleaninline{BCMkTask BCStable}}_{\stacksize{e}};\\
+	{} & \mathbin{\phantom{=}} \text{\cleaninline{BCMkTask}} (\text{\cleaninline{BCSdsSet}} s);\\
+\end{align*}
+
+While there is no code generated in the definition, the byte code compiler stores all \gls{SDS} data in the \cleaninline{bcs_sdses} field of the compilation state.
+Regular \glspl{SDS} are stored as \cleaninline{Left String255} values.
+The \glspl{SDS} are typed as functions in the host language, so an argument representing the \gls{SDS} on evaluation must be created for this function.
+For this, an \cleaninline{BCInterpret} is created that emits this identifier.
+When it is passed to the function, the initial value of the \gls{SDS} is returned.
+In the case of a local \gls{SDS}, this initial value is stored as a byte code encoded value in the state and the compiler continues with the rest of the program.
+
+\Cref{lst:comp_sds} shows the implementation of the \cleaninline{sds} type class.
+First, the initial \gls{SDS} value is extracted from the expression by bootstrapping the fixed point with a dummy value.
+This is safe because the expression on the right-hand side of the \cleaninline{In} is never evaluated.
+Then, using \cleaninline{addSdsIfNotExist}, the identifier for this particular \gls{SDS} is either retrieved from the compiler state or freshly generated.
+This identifier is then used to provide a reference to the \cleaninline{def} definition to evaluate the main expression.
+Compiling \cleaninline{getSds} is a matter of executing the \cleaninline{BCInterpret} representing the \gls{SDS}, which yields the identifier that can be embedded in the instruction.
+Setting the \gls{SDS} is similar: the identifier is retrieved and the value is put in a task tree so that the resulting task can remember the value it has written.
+
+% VimTeX: SynIgnore on
+\begin{lstClean}[caption={Backend implementation for the SDS classes.},label={lst:comp_sds}]
+:: Sds a = Sds Int
+instance sds BCInterpret where
+	sds def = {main =
+		let (t In e) = def (abort "sds: expression too strict")
+		in  addSdsIfNotExist (Left $ String255 (toByteCode{|*|} t))
+		>>= \sdsi-> let (t In e) = def (pure (Sds sdsi))
+		in  e.main
+	}
+	getSds f = f >>= \(Sds i)-> tell` [BCMkTask (BCSdsGet (fromInt i))]
+	setSds f v = f >>= \(Sds i)->v >>| tell`
+		(  map BCMkTask (bcstable (byteWidth v))
+		++ [BCMkTask (BCSdsSet (fromInt i))])
+\end{lstClean}
+% VimTeX: SynIgnore off
+
+Lowered \glspl{SDS} are stored in the compiler state as \cleaninline{Right MTLens} values.
+The compilation of the code and the serialisation of the data throw away all typing information.
+The \cleaninline{MTLens} is a type synonym for \pgls{SDS} that represents the typeless serialised value of the underlying \gls{SDS}.
+This is done so that the \cleaninline{withDevice} task can write the received \gls{SDS} updates to the corresponding \gls{SDS} while the \gls{SDS} is not in scope.
+The \gls{ITASK} notification mechanism then takes care of the rest.
+Such \pgls{SDS} is created using \cleaninline{mapReadWriteError} which, given a pair of read and write functions with error handling, produces \pgls{SDS} with the lens embedded.
+The read function converts the typed value to a typeless serialised value.
+The write function will, given a new serialised value and the old typed value, produce a new typed value.
+It tries to decode the serialised value; if that succeeds, the result is written to the underlying \gls{SDS}, otherwise an error is thrown.
+\Cref{lst:mtask_itasksds_lens} shows the implementation for this.
+
+% VimTeX: SynIgnore on
+\begin{lstClean}[label={lst:mtask_itasksds_lens},caption={Lens applied to lowered \gls{ITASK} \glspl{SDS} in \gls{MTASK}.}]
+lens :: (Shared sds a) -> MTLens | type a & RWShared sds
+lens sds = mapReadWriteError
+	( \r-> Ok (fromString (toByteCode{|*|} r))
+	, \w r-> ?Just <$> iTasksDecode (toString w)
+	) ?None sds
+\end{lstClean}
+% VimTeX: SynIgnore off
+
+\Cref{lst:mtask_itasksds_lift} shows the code for the implementation of \cleaninline{lowerSds} that uses the \cleaninline{lens} function shown earlier.
+It is very similar to the \cleaninline{sds} constructor in \cref{lst:comp_sds}, only now a \cleaninline{Right} value is inserted in the \gls{SDS} administration.
+
+% VimTeX: SynIgnore on
+\begin{lstClean}[label={lst:mtask_itasksds_lift},caption={The implementation for lowering \glspl{SDS} in \gls{MTASK}.}]
+instance lowerSds BCInterpret where
+	lowerSds def = {main =
+		let (t In _) = def (abort "lowerSds: expression too strict")
+		in  addSdsIfNotExist (Right $ lens t)
+		>>= \sdsi->let (_ In e) = def (pure (Sds sdsi)) in e.main
+	}\end{lstClean}
+% VimTeX: SynIgnore off
+
+\section{Run-time system}\label{sec:compiler_rts}
+The \gls{RTS} is a customisable domain-specific \gls{OS} that takes care of the execution of tasks.
+Furthermore, it handles low-level mechanisms such as communication, multitasking, and memory management.
+Once a device is programmed with the \gls{MTASK} \gls{RTS}, it can continuously receive new tasks without the need for reprogramming.
+The \gls{OS} is written in portable \ccpp{} and only contains a small device-specific portion.
+In order to keep the abstraction level high and the hardware requirements low, much of the high-level functionality of the \gls{MTASK} language is implemented not in terms of lower-level constructs from the \gls{MTASK} language but in terms of \ccpp{} code.
+
+Most microcontroller software consists of a cyclic executive instead of an \gls{OS}: a single loop function is continuously executed and all work is performed there.
+In the \gls{RTS} of the \gls{MTASK} system, there is also such an event loop function.
+It is a function with a relatively short execution time that gets called repeatedly.
+The event loop consists of three distinct phases.
+After completing the three phases, the device goes to sleep for as long as possible (see \cref{chp:green_computing_mtask} for more details on task scheduling).
+
+\subsection{Communication phase}
+In the first phase, the communication channels are processed.
+The exact communication method is a customisable device-specific option baked into the \gls{RTS}.
+The interface is kept deliberately simple and consists of two layers: a link interface and a communication interface.
+Besides opening, closing and cleaning up, the link interface has three functions that are shown in \cref{lst:link_interface}.
+Consequently, implementing this link interface is very simple, but it is still possible to implement more advanced link features such as buffering.
+There are implementations for this interface for serial and \gls{WIFI} connections using \gls{ARDUINO}, and \gls{TCP} connections for Linux.
+
+\begin{lstArduino}[caption={Link interface of the \gls{MTASK} \gls{RTS}.},label={lst:link_interface}]
+bool link_input_available(void);
+uint8_t link_read_byte(void);
+void link_write_byte(uint8_t b);
+\end{lstArduino}
+
+The communication interface abstracts away from this link interface and is typed instead.
+It contains only two functions, as seen in \cref{lst:comm_interface}.
+There are implementations for direct communication and for communication using an \gls{MQTT} broker.
+Both use the automatic serialisation and deserialisation shown in \cref{sec:ccodegen}.
+
+\begin{lstArduino}[caption={Communication interface of the \gls{MTASK} \gls{RTS}.},label={lst:comm_interface}]
+struct MTMessageTo receive_message(void);
+void send_message(struct MTMessageFro msg);
+\end{lstArduino}
+
+Processing the received messages from the communication channels happens synchronously and the channels are exhausted completely before moving on to the next phase.
+There are several possible messages that can be received from the server:
+
+\begin{description}
+    \item[SpecRequest]
+        is a message instructing the device to send its specification and it is received immediately after connecting.
+        The \gls{RTS} responds with a \texttt{Spec} answer containing the specification.
+    \item[TaskPrep]
+        tells the device a task is on its way.
+        Especially on faster connections, it may be the case that the communication buffers overflow because a big message is sent while the \gls{RTS} is busy executing tasks.
+        This message allows the \gls{RTS} to postpone execution for a while, until the larger task has been received.
+        The server sends the task only after the device has acknowledged the preparation by sending a \texttt{TaskPrepAck} message.
+    \item[Task]
+        contains a new task, its peripheral configuration, the \glspl{SDS}, and the byte code.
+        The new task is immediately copied to the task storage but is only initialised during the next phase.
+        The device acknowledges the task by sending a \texttt{TaskAck} message.
+    \item[SdsUpdate]
+        notifies the device of the new value for a lowered \gls{SDS}.
+        The old value of the lowered \gls{SDS} is immediately replaced with the new one.
+        There is no acknowledgement required.
+    \item[TaskDel]
+        instructs the device to delete a running task.
+        Tasks are automatically deleted when they become stable.
+        However, a task may also be deleted when the surrounding task on the server is deleted, for example when the task is on the left-hand side of a step combinator and the condition to step holds.
+        The device acknowledges the deletion by sending a \texttt{TaskDelAck}.
+    \item[Shutdown]
+        tells the device to reset.
+\end{description}
+
+\subsection{Execution phase}
+The second phase performs one execution step for all tasks that request it.
+Tasks are stored in a priority queue ordered by the time they need to execute; the \gls{RTS} selects all tasks that can be scheduled (see \cref{sec:scheduling} for more details).
+Execution of a task is always an interplay between the interpreter and the rewriter.
+
+When a new task is received, the main expression is evaluated to produce a task tree.
+A task tree is a tree structure in which each node represents a task combinator and the leaves are basic tasks.
+If a task is not initialised yet, i.e.\ the pointer to the current task tree is still null, the byte code of the main function is interpreted.
+The main expression always produces a task tree.
+Execution of a task consists of continuously rewriting the task until its value is stable.
+
+Rewriting is a destructive process, i.e.\ the rewriting is done in place.
+The rewriting engine uses the interpreter when needed, e.g.\ to calculate the step continuations.
+The rewriter and the interpreter use the same stack to store intermediate values.
+Rewriting steps are small so that interleaving results in seemingly parallel execution.
+In this phase, new task tree nodes may be allocated.
+Both rewriting and initialisation are atomic operations in the sense that no processing on \glspl{SDS} is done other than \gls{SDS} operations from the task itself.
+The host is notified by a \texttt{TaskReturn} message if a task value has changed after a rewrite step.
+
+Take for example a blink task for which the code is shown in \cref{lst:blink_code}.
+
+\begin{lstClean}[caption={Code for a blink program.},label={lst:blink_code}]
+fun \blink=(\st->delay (lit 500) >>|. writeD d3 st >>=.
blink o Not)
+In {main = blink true}
+\end{lstClean}
+
+On receiving this task, the task tree is still null and the initial expression \cleaninline{blink true} is evaluated by the interpreter.
+This results in the task tree shown in \cref{fig:blink_tree}.
+Rewriting always starts at the top of the tree and traverses to the leaves, the basic tasks that do the actual work.
+The first basic task encountered is the \cleaninline{delay} task, which yields no value until the given time, \qty{500}{\ms} in this case, has passed.
+When the \cleaninline{delay} task has yielded a stable value after a number of rewrites, the task continues with the right-hand side of the \cleaninline{>>\|.} combinator.
+This right-hand side is a \cleaninline{>>=.} combinator with a \cleaninline{writeD} task at the left-hand side, which becomes stable after one rewrite step in which it writes the value to the given pin.
+When \cleaninline{writeD} becomes stable, the written value is the task value that is observed by the right-hand side of the \cleaninline{>>=.} combinator.
+This will call the interpreter to evaluate the expression, now that the argument of the function is known.
+The result of the function is again a task tree, but now with different arguments to the tasks, e.g.\ the state in \cleaninline{writeD} is inverted.
+
+\begin{figure}
+    \centering
+    \includestandalone{blinktree}
+    \caption{The task tree for a blink task in \cref{lst:blink_code} in \gls{MTASK}.}%
+    \label{fig:blink_tree}
+\end{figure}
+
+\subsection{Memory management}
+The third and final phase is memory management.
+The \gls{MTASK} \gls{RTS} is designed to run on systems with as little as \qty{2}{\kibi\byte} of \gls{RAM}.
+Aggressive memory management is therefore vital.
+Not all firmware for microprocessors supports heaps and---when it does---allocation often leaves holes when memory is not freed in a \emph{last in, first out} order.
+The \gls{RTS} uses a chunk of memory in the global data segment with its own memory manager tailored to the needs of \gls{MTASK}.
+The size of this block can be changed in the configuration of the \gls{RTS} if necessary.
+On an \gls{ARDUINO} UNO---equipped with \qty{2}{\kibi\byte} of \gls{RAM}---the maximum viable size is about \qty{1500}{\byte}.
+The self-managed memory uses a similar layout to that of \gls{C} programs, only with the heap and the stack swapped (see \cref{fig:memory_layout}).
+
+\begin{figure}
+    \centering
+    \includestandalone{memorylayout}
+    \caption{Memory layout in the \gls{MTASK} \gls{RTS}.}\label{fig:memory_layout}
+\end{figure}
+
+A task is stored below the stack and its complete state is a \gls{CLEAN} record containing, most importantly, the task id, a pointer to the task tree in the heap (null if not initialised yet), the current task value, the configuration of \glspl{SDS}, the configuration of peripherals, the byte code, and some scheduling information.
+
+In memory, task data grows from the bottom up and the interpreter stack is located directly on top of it, growing in the same direction.
+As a consequence, the stack moves when a new task is received.
+This never happens during execution because communication is always processed before execution.
+Values in the interpreter are always stored on the stack.
+Compound data types are stored unboxed and flattened.
+Task trees grow from the top down as in a heap.
+This approach allows for flexible ratios, i.e.\ many tasks and small trees or few tasks and big trees.
+
+Stable tasks and unreachable task tree nodes are removed.
+If a task is to be removed, tasks with higher memory addresses are moved down.
+For task trees---stored in the heap---the \gls{RTS} already marks tasks and task trees as trash during rewriting so the heap can be compacted in a single pass.
+This is possible because there is no sharing or cycles in task trees and nodes contain pointers to their parent.
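The single-pass compaction described above can be sketched in C. This is a simplified stand-in, not the actual RTS code: real task-tree nodes are variable-sized and carry parent pointers, whereas the sketch uses a fixed-size node array with parent indices and assumes parents are stored before their children, so one left-to-right pass suffices to move live nodes down and patch their parent links.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified task-tree node; `trash` is the mark set during rewriting. */
struct node {
    bool trash;
    int parent;  /* index of parent node, -1 for the root */
};

/* Compact nodes[0..n) in place, dropping trash nodes and patching
 * parent indices.  Assumes n <= 64 and parent-before-child order. */
size_t compact(struct node *nodes, size_t n)
{
    size_t dst = 0;
    int new_index[64];  /* new_index[i] = where live node i ended up */
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].trash)
            continue;          /* skip trash: no copying, hole closed */
        nodes[dst] = nodes[i]; /* slide the live node down */
        if (nodes[dst].parent >= 0)
            nodes[dst].parent = new_index[nodes[dst].parent];
        new_index[i] = (int)dst;
        dst++;
    }
    return dst;  /* new number of live nodes */
}
```

Because task trees have no sharing or cycles, a node's parent is always live and already relocated when the node itself is visited, which is what makes the single pass sound.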
+
+
+\section{C code generation}\label{sec:ccodegen}
+All communication between the \gls{ITASK} server and the \gls{MTASK} client is type parametrised.
+From the structural representation of the type, a \gls{CLEAN} parser and printer is constructed using generic programming.
+Furthermore, a \ccpp{} parser and printer is generated for use on the \gls{MTASK} device.
+The technique for generating the \ccpp{} parser and printer is very similar to template metaprogramming and requires a rich generic programming library or compiler support that includes a lot of metadata in the record and constructor nodes.
+Using generic programming in the \gls{MTASK} system, both serialisation and deserialisation on the microcontroller and the server are automatically generated.
+
+\subsection{Server}
+On the server, off-the-shelf generic programming techniques are used to create the serialisation and deserialisation functions (see \cref{lst:ser_deser_server}).
+Serialisation is a simple conversion from a value of the type to a string.
+Deserialisation is a little different in order to support streaming\footnotemark.
+\footnotetext{%
+    Here the \cleaninline{*!} variant of the generic interface is chosen that has fewer uniqueness constraints for the compiler-generated adaptors \citep{alimarine_generic_2005,hinze_derivable_2001}.%
+}
+Given a list of available characters, a tuple is always returned.
+The right-hand side of the tuple contains the remaining characters, the unparsed input.
+The left-hand side contains either an error or a maybe value.
+If the value is a \cleaninline{?None}, there was no full value to parse.
+If the value is a \cleaninline{?Just}, the data field contains a value of the requested type.
+
+\begin{lstClean}[caption={Serialisation and deserialisation functions in \gls{CLEAN}.},label={lst:ser_deser_server}]
+generic toByteCode a :: a -> String
+generic fromByteCode a *! :: [Char] -> (Either String (?
a), [Char])
+\end{lstClean}
+
+\subsection{Client}
+The \gls{RTS} of the \gls{MTASK} system runs on resource-constrained microcontrollers and is implemented in portable \ccpp{}.
+In order to achieve more interoperation safety, the communication between the server and the client is automated, i.e.\ the serialisation and deserialisation code in the \gls{RTS} is generated.
+The technique used for this is very similar to the technique shown in \cref{chp:first-class_datatypes}.
+However, instead of using template metaprogramming, a feature \gls{CLEAN} lacks, generic programming is again used as a two-stage rocket.
+In contrast to many other generic programming systems, \gls{CLEAN} allows for access to much of the metadata of the compiler.
+For example, the \cleaninline{Cons}, \cleaninline{Object}, \cleaninline{Field}, and \cleaninline{Record} generic constructors are enriched with their arity, names, types, \etc.
+Furthermore, constructors can access the metadata of the objects and fields of their parent records.
+Using this metadata, generic functions are created that generate \ccpp{} type definitions, parsers, and printers for any first-order \gls{CLEAN} type.
+The exact details of this technique will appear in a paper that is currently in preparation.
+
+\Glspl{ADT} are converted to tagged unions, newtypes to typedefs, records to structs, and arrays to dynamically allocated, size-parametrised arrays.
+For example, the \gls{CLEAN} types in \cref{lst:ser_clean} are translated to the \ccpp{} types seen in \cref{lst:ser_c}.
+
+\begin{lstClean}[caption={Simple \glspl{ADT} in \gls{CLEAN}.},label={lst:ser_clean}]
+:: T a = A a | B NT {#Char}
+:: NT =: NT Real
+\end{lstClean}
+
+\begin{lstArduino}[caption={Generated \ccpp{} type definitions for the simple \glspl{ADT}.},label={lst:ser_c}]
+typedef double Real;
+typedef char Char;
+
+typedef Real NT;
+enum T_c {A_c, B_c};
+
+struct Char_HshArray { uint32_t size; Char *elements; };
+struct T {
+    enum T_c cons;
+    struct { void *A;
+        struct { NT f0; struct Char_HshArray f1; } B;
+    } data;
+};
+\end{lstArduino}
+
+For each of these generated types, two functions are created: a typed printer and a typed parser (see \cref{lst:ser_pp}).
+The parser functions are parametrised by a read function, an allocation function, and parse functions for all type variables.
+This allows for the use of these functions in environments where the communication is parametrised and the memory management is self-managed, such as in the \gls{MTASK} \gls{RTS}.
+
+\begin{lstArduino}[caption={Printer and parser for the \glspl{ADT} in \ccpp{}.},label={lst:ser_pp}]
+struct T parse_T(uint8_t (*get)(), void *(*alloc)(size_t),
+    void *(*parse_0)(uint8_t (*)(), void *(*)(size_t)));
+
+void print_T(void (*put)(uint8_t), struct T r,
+    void (*print_0)(void (*)(uint8_t), void *));
+\end{lstArduino}
+
+\section{Conclusion}
+It is not straightforward to execute \gls{MTASK} tasks on resource-constrained \gls{IOT} edge devices.
+To achieve this, the terms in the \gls{DSL} are compiled to compact domain-specific byte code.
+This byte code is sent for interpretation to the lightweight \gls{RTS} of the edge device.
+The \gls{RTS} first evaluates the main expression in the interpreter.
+The result of this evaluation, a run-time representation of the task, is a task tree.
+This task tree is rewritten according to small-step reduction rules until a stable value is observed.
+Rewriting multiple tasks at the same time is achieved by interleaving the rewrite steps, resulting in seemingly parallel execution of the tasks.
+All communication between the server and the \gls{RTS}, including the serialisation and deserialisation, is automated.
+From the structural representation of the types, printers and parsers are generated for the server and the client.

\input{subfilepostamble}
\end{document}