\documentclass[../thesis.tex]{subfiles}

\input{subfilepreamble}

\setcounter{chapter}{6}

\begin{document}
\input{subfileprefix}
\chapter{The implementation of mTask}%
\label{chp:implementation}
\begin{chapterabstract}
	This chapter shows the implementation of the \gls{MTASK} system by:
	\begin{itemize}
		\item showing the compilation and execution toolchain;
		\item showing the implementation of the byte code compiler for the \gls{MTASK} language;
		\item elaborating on the implementation and architecture of the \gls{RTS} of \gls{MTASK};
		\item and explaining the machinery used to automatically serialise and deserialise data to and from the device.
	\end{itemize}
\end{chapterabstract}

The \gls{MTASK} system targets resource-constrained edge devices that have little memory, limited processor speed, and limited communication capabilities.
Such edge devices are often powered by microcontrollers, tiny computers specifically designed for embedded applications.
The microcontrollers usually have flash-based program memory which wears out fairly quickly.
For example, the flash memory of the popular ATmega328P powering the \gls{ARDUINO} UNO is rated for \num{10000} write cycles.
While this sounds like a lot, if new tasks are sent to the device every minute or so, a lifetime of only seven days is guaranteed.
Hence, for dynamic applications, storing the program in the \gls{RAM} of the device and interpreting it there is necessary in order to save precious write cycles of the program memory.
In the \gls{MTASK} system, the \gls{MTASK} \gls{RTS}, a domain-specific \gls{OS}, is responsible for interpreting the programs.
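
As a back-of-the-envelope check of the lifetime mentioned above, assume exactly one write to program memory per minute and take the rated number of write cycles at face value:
\[
	\frac{10000~\text{writes}}{1~\text{write per minute}} = 10000~\text{minutes} \approx 166.7~\text{hours} \approx 6.9~\text{days}.
\]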

Programs in \gls{MTASK} are \gls{DSL} terms constructed at run time in an \gls{ITASK} system.
\Cref{fig:toolchain} shows the compilation and execution toolchain of such programs.
First, the source code is compiled to a byte code specification; this specification contains the compiled main expression, the functions, and the \gls{SDS} and peripheral configuration.
How an \gls{MTASK} task is compiled to this specification is shown in \cref{sec:compiler_imp}.
This specification is then sent to the \gls{RTS} of the device for execution.
In order to execute a task, first the main expression is evaluated in the interpreter, resulting in a task tree.
Using small-step reduction, this task tree is continuously rewritten by the rewrite engine of the \gls{RTS}.
At times, the reduction requires the evaluation of expressions, using the interpreter.
During every rewrite step, a task value is produced.
On the device, the \gls{RTS} may have multiple tasks active at the same time.
By interleaving the rewrite steps, parallel operation is achieved.
The design, architecture and implementation of the \gls{RTS} are shown in \cref{sec:compiler_rts}.

\begin{figure}
	\centering
	\centerline{\includestandalone{toolchain}}
	\caption{Compilation and execution toolchain of \gls{MTASK} programs.}%
	\label{fig:toolchain}
\end{figure}

\section{Compiler}\label{sec:compiler_imp}
\subsection{Compiler infrastructure}
The byte code compiler interpretation for the \gls{MTASK} language is implemented as a monad stack containing a writer monad and a state monad.
The writer monad is used to generate code snippets locally without having to store them in the monadic values.
The state monad accumulates the code, and stores the state the compiler requires.
\Cref{lst:compiler_state} shows the data type for the state, storing:
\begin{itemize}
	\item the function the compiler currently is in;
	\item the code of the main expression;
	\item the context (see \cref{ssec:step});
	\item the code for the functions;
	\item the next fresh label;
	\item a list of all the used \glspl{SDS}, either local \glspl{SDS} containing the initial value (\cleaninline{Left}) or lowered \glspl{SDS} (see \cref{sec:liftsds}) containing a reference to the associated \gls{ITASK} \gls{SDS} (\cleaninline{Right});
	\item and finally a list of the peripherals used.
\end{itemize}

\begin{lstClean}[label={lst:compiler_state},caption={The type for the \gls{MTASK} byte code compiler.}]
:: BCInterpret a :== StateT BCState (WriterT [BCInstr] Identity) a
:: BCState =
	{ bcs_infun      :: JumpLabel
	, bcs_mainexpr   :: [BCInstr]
	, bcs_context    :: [BCInstr]
	, bcs_functions  :: Map JumpLabel BCFunction
	, bcs_freshlabel :: JumpLabel
	, bcs_sdses      :: [Either String255 MTLens]
	, bcs_hardware   :: [BCPeripheral]
	}
:: BCFunction =
	{ bcf_instructions :: [BCInstr]
	, bcf_argwidth     :: UInt8
	, bcf_returnwidth  :: UInt8
	}
\end{lstClean}

Executing the compiler is done by providing an initial state and running the monad.
After compilation, several post-processing steps are applied to make the code suitable for the microprocessor.
First, in all tail calls, the \cleaninline{BCReturn} instructions are replaced by \cleaninline{BCTailcall} instructions to optimise the tail calls.
Furthermore, all byte code is concatenated, resulting in one big program.
Many instructions have commonly used arguments so shorthands are introduced to reduce the program size.
For example, the \cleaninline{BCArg} instruction is often called with argument \numrange{0}{2} and can be replaced by the \numrange[parse-numbers=false]{\cleaninline{BCArg0}}{\cleaninline{BCArg2}} shorthands.
Furthermore, redundant instructions such as pop directly after push are removed as well in order not to burden the code generation with these intricacies.
Finally, the labels are resolved to represent actual program addresses instead of the freshly generated identifiers.
After the byte code is ready, the lowered \glspl{SDS} are resolved to provide an initial value for them.
The byte code, \gls{SDS} specification and peripheral specifications are the result of the process, ready to be sent to the device.
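
The label resolution step can be illustrated with a minimal two-pass sketch.
It is written in C rather than \gls{CLEAN} purely for illustration and is not the thesis implementation: the \texttt{struct instr} representation, the opcode names, the instruction sizes and the \texttt{MAX\_LABELS} bound are all assumptions, as is the choice to treat labels as pseudo-instructions that occupy no space in the final program.

\begin{lstArduino}[caption={A hypothetical two-pass label resolution sketch.}]
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified instruction representation. */
enum op { OP_PUSH, OP_ADD, OP_JUMP, OP_JUMPF, OP_LABEL };

struct instr {
    enum op op;
    uint16_t arg; /* literal for OP_PUSH, label id or address for jumps */
};

#define MAX_LABELS 16 /* assumed bound: label ids must be below this */

/* Encoded size in bytes; labels are assumed to take no space. */
static size_t instr_size(const struct instr *i)
{
    switch (i->op) {
    case OP_LABEL: return 0;
    case OP_ADD:   return 1;
    case OP_PUSH:  return 2; /* opcode + one-byte literal */
    default:       return 3; /* opcode + two-byte address */
    }
}

/* Pass 1 records the address of every label, pass 2 patches the jumps. */
static void resolve_labels(struct instr *prog, size_t n)
{
    uint16_t addr_of[MAX_LABELS] = {0};
    size_t pc = 0;

    for (size_t i = 0; i < n; i++) {
        if (prog[i].op == OP_LABEL)
            addr_of[prog[i].arg] = (uint16_t)pc;
        pc += instr_size(&prog[i]);
    }
    for (size_t i = 0; i < n; i++)
        if (prog[i].op == OP_JUMP || prog[i].op == OP_JUMPF)
            prog[i].arg = addr_of[prog[i].arg];
}
\end{lstArduino}

Two passes are needed because a jump may refer to a label that is only defined later in the program, as is the case for the else branch of a conditional.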

\subsection{Instruction set}
The instruction set is a fairly standard stack machine instruction set extended with special \gls{TOP} instructions for creating task tree nodes.
All instructions are housed in a \gls{CLEAN} \gls{ADT} and serialised to the byte representation using generic functions (see \cref{sec:ccodegen}).
Type synonyms and newtypes are used to provide insight into the arguments of the instructions (\cref{lst:type_synonyms}).
Labels are always two bytes long; all other arguments are one byte long.

\begin{lstClean}[caption={Type synonyms for instruction arguments.},label={lst:type_synonyms}]
:: ArgWidth :== UInt8         :: ReturnWidth :== UInt8
:: Depth    :== UInt8         :: Num         :== UInt8
:: SdsId    :== UInt8         :: JumpLabel   =: JL UInt16
\end{lstClean}

\Cref{lst:instruction_type} shows an excerpt of the \gls{CLEAN} type that represents the instruction set.
Shorthand instructions such as instructions with inlined arguments are omitted for brevity.
Detailed semantics for the instructions are given in \cref{chp:bytecode_instruction_set}.
One notable instruction is the \cleaninline{BCMkTask} instruction; it allocates and initialises a task tree node and pushes a pointer to it on the stack.

\begin{lstClean}[caption={The type housing the instruction set in \gls{MTASK}.},label={lst:instruction_type}]
:: BCInstr
	//Jumps
	= BCJumpF JumpLabel | BCLabel JumpLabel | BCJumpSR ArgWidth JumpLabel
	| BCReturn ReturnWidth ArgWidth
	| BCTailcall ArgWidth ArgWidth JumpLabel
	//Arguments
	| BCArgs ArgWidth ArgWidth
	//Task node creation and refinement
	| BCMkTask BCTaskType | BCTuneRateMs | BCTuneRateSec
	//Stack ops
	| BCPush String255 | BCPop Num | BCRot Depth Num | BCDup | BCPushPtrs
	//Casting
	| BCItoR | BCItoL | BCRtoI | ...
	// arith
	| BCAddI | BCSubI | ...
	...

:: BCTaskType
	= BCStableNode ArgWidth | BCUnstableNode ArgWidth
	// Pin io
	| BCReadD | BCWriteD | BCReadA | BCWriteA | BCPinMode
	// Interrupts
	| BCInterrupt
	// Repeat
	| BCRepeat
	// Delay
	| BCDelay | BCDelayUntil
	// Parallel
	| BCTAnd | BCTOr
	//Step
	| BCStep ArgWidth JumpLabel
	//Sds ops
	| BCSdsGet SdsId | BCSdsSet SdsId | BCSdsUpd SdsId JumpLabel
	// Rate limiter
	| BCRateLimit
	////Peripherals
	//DHT
	| BCDHTTemp UInt8 | BCDHTHumid UInt8
	...
\end{lstClean}
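
To give an impression of how a stack machine executes instructions of this kind, the following is a minimal interpreter sketch for a handful of them.
It is not the \gls{MTASK} interpreter: the numeric opcodes, the \texttt{OP\_HALT} instruction, the fixed stack size, the one-byte stack cells and the big-endian encoding of the two-byte address are assumptions made for the sake of the example.

\begin{lstArduino}[caption={A hypothetical interpreter loop for a few stack machine instructions.}]
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; the real encoding is derived from the Clean ADT. */
enum { OP_PUSH = 0, OP_ADD, OP_JUMP, OP_JUMPF, OP_HALT };

/* Runs a byte code program and returns the value on top of the stack. */
static uint8_t run(const uint8_t *code)
{
    uint8_t stack[32];
    size_t sp = 0; /* number of stack cells in use */
    size_t pc = 0; /* index of the next byte to decode */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH: /* one-byte argument */
            stack[sp++] = code[pc++];
            break;
        case OP_ADD: {
            uint8_t b = stack[--sp];
            uint8_t a = stack[--sp];
            stack[sp++] = (uint8_t)(a + b);
            break;
        }
        case OP_JUMP: /* two-byte address, assumed big endian */
            pc = (size_t)code[pc] << 8 | code[pc + 1];
            break;
        case OP_JUMPF: { /* jump when the popped value is false */
            size_t target = (size_t)code[pc] << 8 | code[pc + 1];
            pc += 2;
            if (stack[--sp] == 0)
                pc = target;
            break;
        }
        default: /* OP_HALT */
            return stack[sp - 1];
        }
    }
}
\end{lstArduino}

Because jump targets are plain addresses at this point, the interpreter never needs to know about the labels that the compiler generated.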

\section{Compilation rules}
This section describes the compilation rules, the translation from \gls{AST} to byte code.
The compilation scheme consists of three schemes\slash{}functions.
Double vertical bars, e.g.\ $\stacksize{a_i}$, denote the number of stack cells required to store the argument.

Some schemes have a context $r$ as an argument which contains information about the location of the arguments in scope.
More information is given in the schemes requiring such arguments.

\begin{table}
	\centering
	\caption{An overview of the compilation schemes.}
	\begin{tabularx}{\linewidth}{l X}
		\toprule
		Scheme & Description\\
		\midrule
		$\cschemeE{e}{r}$ & Produces the value of expression $e$ given the context $r$ and pushes it on the stack.
			The result can be a basic value or a pointer to a task.\\
		$\cschemeF{e}$ & Generates the bytecode for functions.\\
		$\cschemeS{e}{r}{w} $ & Generates the function for the step continuation given the context $r$ and the width $w$ of the left-hand side task value.\\
		\bottomrule
	\end{tabularx}
\end{table}

\subsection{Expressions}
Almost all expression constructions are compiled using $\mathcal{E}$.
The argument of $\mathcal{E}$ is the context (see \cref{ssec:functions}).
Values are always placed on the stack; tuples and other compound data types are unpacked.
Function calls, function arguments and tasks are also compiled using $\mathcal{E}$ but their compilation is explained later.

\begin{align*}
	\cschemeE{\text{\cleaninline{lit}}~e}{r} & = \text{\cleaninline{BCPush (bytecode e)}};\\
	\cschemeE{e_1\mathbin{\text{\cleaninline{+.}}}e_2}{r} & = \cschemeE{e_1}{r};
		\cschemeE{e_2}{r};
		\text{\cleaninline{BCAdd}};\\
	{} & \text{\emph{Similar for other binary operators}}\\
	\cschemeE{\text{\cleaninline{Not}}~e}{r} & =
		\cschemeE{e}{r};
		\text{\cleaninline{BCNot}};\\
	{} & \text{\emph{Similar for other unary operators}}\\
	\cschemeE{\text{\cleaninline{If}}~e_1~e_2~e_3}{r} & =
		\cschemeE{e_1}{r};
		\text{\cleaninline{BCJumpF}}\enskip l_{else}; \mathbin{\phantom{=}} \cschemeE{e_2}{r}; \text{\cleaninline{BCJump}}\enskip l_{endif};\\
	{} & \mathbin{\phantom{=}} \text{\cleaninline{BCLabel}}\enskip l_{else}; \cschemeE{e_3}{r}; \mathbin{\phantom{=}} \text{\cleaninline{BCLabel}}\enskip l_{endif};\\
	{} & \text{\emph{Where $l_{else}$ and $l_{endif}$ are fresh labels}}\\
	\cschemeE{\text{\cleaninline{tupl}}~e_1~e_2}{r} & =
		\cschemeE{e_1}{r};
		\cschemeE{e_2}{r};\\
	{} & \text{\emph{Similar for other unboxed compound data types}}\\
	\cschemeE{\text{\cleaninline{first}}~e}{r} & =
		\cschemeE{e}{r};
		\text{\cleaninline{BCPop}}\enskip w;\\
	{} & \text{\emph{Where $w$ is the width of the right value and}}\\
	{} & \text{\emph{similar for other unboxed compound data types}}\\
	\cschemeE{\text{\cleaninline{second}}\enskip e}{r} & =
		\cschemeE{e}{r};
		\text{\cleaninline{BCRot}}\enskip (w_l+w_r)\enskip w_r;
		\text{\cleaninline{BCPop}}\enskip w_l;\\
	{} & \text{\emph{Where $w_l$ is the width of the left value and $w_r$ that of the right value;}}\\
	{} & \text{\emph{similar for other unboxed compound data types}}\\
\end{align*}
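
As a worked instance of these rules, unfolding $\mathcal{E}$ on a small conditional expression gives the instruction sequence below.
The expression is chosen for illustration only, the jump instructions are written with the constructor names used in the instruction type, and $l_{else}$ and $l_{endif}$ are fresh labels as before.

\begin{align*}
	\cschemeE{\text{\cleaninline{If (lit True) (lit 1 +. lit 2) (lit 3)}}}{r} & =
		\text{\cleaninline{BCPush (bytecode True)}}; \text{\cleaninline{BCJumpF}}\enskip l_{else};\\
	{} & \mathbin{\phantom{=}} \text{\cleaninline{BCPush (bytecode 1)}}; \text{\cleaninline{BCPush (bytecode 2)}}; \text{\cleaninline{BCAdd}}; \text{\cleaninline{BCJump}}\enskip l_{endif};\\
	{} & \mathbin{\phantom{=}} \text{\cleaninline{BCLabel}}\enskip l_{else}; \text{\cleaninline{BCPush (bytecode 3)}}; \text{\cleaninline{BCLabel}}\enskip l_{endif};\\
\end{align*}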

Translating $\mathcal{E}$ to \gls{CLEAN} code is very straightforward; it basically means writing the instructions to the writer monad.
Almost always, the type of the interpretation is not used, i.e.\ it is a phantom type.
To still have the functions return the correct type, the \cleaninline{tell`}\footnote{\cleaninline{tell` :: [BCInstr] -> BCInterpret a}} helper is used.
This function is similar to the writer monad's \cleaninline{tell} function but is cast to the correct type.
\Cref{lst:imp_arith} shows the implementation for the arithmetic and conditional expressions.
Note that $r$, the context, is not an explicit argument here but stored in the state.

\begin{lstClean}[caption={Interpretation implementation for the arithmetic and conditional functions.},label={lst:imp_arith}]
instance expr BCInterpret where
	lit t = tell` [BCPush (toByteCode{|*|} t)]
	(+.) a b = a >>| b >>| tell` [BCAdd]
	...
	If c t e = freshlabel >>= \elselabel->freshlabel >>= \endiflabel->
		c >>| tell` [BCJumpF elselabel] >>|
		t >>| tell` [BCJump endiflabel,BCLabel elselabel] >>|
		e >>| tell` [BCLabel endiflabel]
\end{lstClean}

\subsection{Functions}\label{ssec:functions}
Compiling functions and other top-level definitions is done using $\mathcal{F}$, which generates bytecode for the complete program by iterating over the functions and ending with the main expression.
When compiling the body of the function, the arguments of the function are added to the context so that the addresses can be determined when referencing arguments.
The main expression is a special case of $\mathcal{F}$ since it neither has arguments nor something to continue.
Therefore, it is just compiled using $\mathcal{E}$ with an empty context.

\begin{align*}
	\cschemeF{main=m} & =
		\cschemeE{m}{[]};\\
	\cschemeF{f~a_0 \ldots a_n = b~\text{\cleaninline{In}}~m} & =
		\text{\cleaninline{BCLabel}}~f; \cschemeE{b}{[\langle f, i\rangle, i\in \{(\Sigma^n_{i=0}\stacksize{a_i})..0\}]};\\
	{} & \mathbin{\phantom{=}} \text{\cleaninline{BCReturn}}~\stacksize{b}~n; \cschemeF{m};\\
\end{align*}

A function call starts by pushing the stack and frame pointer, and making space for the program counter (\cref{lst:funcall_pushptrs}) followed by evaluating the arguments in reverse order (\cref{lst:funcall_args}).
On executing \cleaninline{BCJumpSR}, the program counter is set and the interpreter jumps to the function (\cref{lst:funcall_jumpsr}).
When the function returns, the return value overwrites the old pointers and the arguments.
This occurs right after a \cleaninline{BCReturn} (\cref{lst:funcall_ret}).
Putting the arguments on top of pointers and not reserving space for the return value uses little space and facilitates tail call optimisation.

\begin{figure}
	\begin{subfigure}{.24\linewidth}
		\centering
		\includestandalone{memory1}
		\caption{\cleaninline{BCPushPtrs}.}\label{lst:funcall_pushptrs}
	\end{subfigure}
	\begin{subfigure}{.24\linewidth}
		\centering
		\includestandalone{memory2}
		\caption{Arguments.}\label{lst:funcall_args}
	\end{subfigure}
	\begin{subfigure}{.24\linewidth}
		\centering
		\includestandalone{memory3}
		\caption{\cleaninline{BCJumpSR}.}\label{lst:funcall_jumpsr}
	\end{subfigure}
	\begin{subfigure}{.24\linewidth}
		\centering
		\includestandalone{memory4}
		\caption{\cleaninline{BCReturn}.}\label{lst:funcall_ret}
	\end{subfigure}
	\caption{The stack layout during function calls.}%
\end{figure}

Calling a function and referencing function arguments are an extension to $\mathcal{E}$ as shown below.
Arguments may be at different places on the stack at different times (see \cref{ssec:step}) and therefore the exact location must always be determined from the context using \cleaninline{findarg}\footnote{\cleaninline{findarg [l`:r] l = if (l == l`) 0 (1 + findarg r l)}}.
Compiling argument $a_{f^i}$, the $i$th argument in function $f$, consists of traversing all positions in the current context.
Arguments wider than one stack cell are fetched in reverse to reconstruct the original order.

\begin{align*}
	\cschemeE{f(a_0, \ldots, a_n)}{r} & =
		\text{\cleaninline{BCPushPtrs}}; \cschemeE{a_i}{r}~\text{for all}~i\in\{n\ldots 0\}; \text{\cleaninline{BCJumpSR}}~n~f;\\
	\cschemeE{a_{f^i}}{r} & =
		\text{\cleaninline{BCArg}~findarg}(r, f, i)~\text{for all}~i\in\{w\ldots v\};\\
	{} & v = \Sigma^{i-1}_{j=0}\stacksize{a_{f^j}}~\text{ and }~ w = v + \stacksize{a_{f^i}}\\
\end{align*}

Translating the compilation schemes for functions to \gls{CLEAN} is not as straightforward as other schemes due to the nature of shallow embedding in combination with the use of state.
The \cleaninline{fun} class has a single function with a single argument.
This argument is a \gls{CLEAN} function that, when given a callable \gls{CLEAN} function representing the \gls{MTASK} function, produces the \cleaninline{main} expression and a callable function.
To compile this, the argument must be called with a function representing a function call in \gls{MTASK}.
\Cref{lst:fun_imp} shows the implementation for this as \gls{CLEAN} code.
To uniquely identify the function, a fresh label is generated.
The function is then called with the \cleaninline{callFunction} helper function that generates the instructions that correspond to calling the function.
That is, it pushes the pointers, compiles the arguments, and writes the \cleaninline{BCJumpSR} instruction.
The resulting structure (\cleaninline{g In m}) contains a function representing the \gls{MTASK} function (\cleaninline{g}) and the \cleaninline{main} structure to continue with.
To get the actual function, \cleaninline{g} must be called with representations for the argument, i.e.\ using \cleaninline{findarg} for all arguments.
The arguments are added to the context using \cleaninline{infun} and \cleaninline{liftFunction} is called with the label, the argument width and the compiler.
This function executes the compiler, decorates the instructions with a label and places them in the function dictionary together with the metadata such as the argument width.
After lifting the function, the context is cleared again and compilation continues with the rest of the program.

\begin{lstClean}[label={lst:fun_imp},caption={The interpretation implementation for functions.}]
instance fun (BCInterpret a) BCInterpret | type a where
	fun def = {main=freshlabel >>= \funlabel->
		let (g In m) = def \a->callFunction funlabel (toByteWidth a) [a]
		    argwidth = toByteWidth (argOf g)
		in  addToCtx funlabel zero argwidth
		>>| infun funlabel
			(liftFunction funlabel argwidth
				(g (retrieveArgs funlabel zero argwidth)
				) ?None)
		>>| clearCtx >>| m.main
		}

argOf :: ((m a) -> b) a -> UInt8 | toByteWidth a
callFunction :: JumpLabel UInt8 [BCInterpret b] -> BCInterpret c | ...
liftFunction :: JumpLabel UInt8 (BCInterpret a) (?UInt8) -> BCInterpret ()
infun :: JumpLabel (BCInterpret a) -> BCInterpret a
\end{lstClean}

\subsection{Tasks}\label{ssec:scheme_tasks}
Task trees are created with the \cleaninline{BCMkTask} instruction that allocates a node and pushes a pointer to it on the stack.
It pops arguments from the stack according to the given task type.
The following extension of $\mathcal{E}$ shows this compilation scheme (except for the step combinator, explained in \cref{ssec:step}).

\begin{align*}
	\cschemeE{\text{\cleaninline{rtrn}}~e}{r} & =
		\cschemeE{e}{r};
		\text{\cleaninline{BCMkTask BCStable}}_{\stacksize{e}};\\
	\cschemeE{\text{\cleaninline{unstable}}~e}{r} & =
		\cschemeE{e}{r};
		\text{\cleaninline{BCMkTask BCUnstable}}_{\stacksize{e}};\\
	\cschemeE{\text{\cleaninline{readA}}~e}{r} & =
		\cschemeE{e}{r};
		\text{\cleaninline{BCMkTask BCReadA}};\\
	\cschemeE{\text{\cleaninline{writeA}}~e_1~e_2}{r} & =
		\cschemeE{e_1}{r};
		\cschemeE{e_2}{r};
		\text{\cleaninline{BCMkTask BCWriteA}};\\
	\cschemeE{\text{\cleaninline{readD}}~e}{r} & =
		\cschemeE{e}{r};
		\text{\cleaninline{BCMkTask BCReadD}};\\
	\cschemeE{\text{\cleaninline{writeD}}~e_1~e_2}{r} & =
		\cschemeE{e_1}{r};
		\cschemeE{e_2}{r};
		\text{\cleaninline{BCMkTask BCWriteD}};\\
	\cschemeE{\text{\cleaninline{delay}}~e}{r} & =
		\cschemeE{e}{r};
		\text{\cleaninline{BCMkTask BCDelay}};\\
	\cschemeE{\text{\cleaninline{rpeat}}~e}{r} & =
		\cschemeE{e}{r};
		\text{\cleaninline{BCMkTask BCRepeat}};\\
	\cschemeE{e_1\text{\cleaninline{.\|\|.}}e_2}{r} & =
		\cschemeE{e_1}{r};
		\cschemeE{e_2}{r};
		\text{\cleaninline{BCMkTask BCTOr}};\\
	\cschemeE{e_1\text{\cleaninline{.&&.}}e_2}{r} & =
		\cschemeE{e_1}{r};
		\cschemeE{e_2}{r};
		\text{\cleaninline{BCMkTask BCTAnd}};\\
\end{align*}

This translates to \gls{CLEAN} code by writing the correct \cleaninline{BCMkTask} instruction as exemplified in \cref{lst:imp_ret}.

\begin{lstClean}[caption={The byte code interpretation implementation for \cleaninline{rtrn}.},label={lst:imp_ret}]
instance rtrn BCInterpret
where
	rtrn m = m >>| tell` [BCMkTask (bcstable m)]
\end{lstClean}

\subsection{Sequential combinator}\label{ssec:step}
The \cleaninline{step} construct is a special type of task because the task value of the left-hand side changes over time.
Therefore, the task continuations on the right-hand side are \emph{observing} this task value and acting upon it.
In the compilation scheme, all continuations are first converted to a single function that has two arguments: the stability of the task and its value.
This function either returns a pointer to a task tree or fails (denoted by $\bot$).
It is special because in the generated function, the task value of a task is inspected.
Furthermore, it is a lazy node in the task tree: the right-hand side may yield a new task tree after several rewrite steps, i.e.\ it is allowed to create infinite task trees using step combinators.
The function is generated using the $\mathcal{S}$ scheme that requires two arguments: the context $r$ and the width of the left-hand side so that it can determine the position of the stability which is added as an argument to the function.
The resulting function is basically a list of if-then-else constructions to check all predicates one by one.
Some optimisation is possible here but has currently not been implemented.

\begin{align*}
	\cschemeE{t_1\text{\cleaninline{>>*.}}t_2}{r} & =
		\cschemeE{a_{f^i}}{r}, \langle f, i\rangle\in r;
		\text{\cleaninline{BCMkTask}}~\text{\cleaninline{BCStable}}_{\stacksize{r}}; \cschemeE{t_1}{r};\\
	{} & \mathbin{\phantom{=}} \text{\cleaninline{BCMkTask}}~\text{\cleaninline{BCTAnd}}; \text{\cleaninline{BCMkTask}}~(\text{\cleaninline{BCStep}}~(\cschemeS{t_2}{(r + [\langle l_s, i\rangle])}{\stacksize{t_1}}));\\
\end{align*}

\begin{align*}
	\cschemeS{[]}{r}{w} & =
		\text{\cleaninline{BCPush}}~\bot;\\
	\cschemeS{\text{\cleaninline{IfValue}}~f~t:cs}{r}{w} & =
		\text{\cleaninline{BCArg}} (\stacksize{r} + w);
		\text{\cleaninline{BCIsNoValue}};\\
	{} & \mathbin{\phantom{=}} \cschemeE{f}{r};
		\text{\cleaninline{BCAnd}};\\
	{} & \mathbin{\phantom{=}} \text{\cleaninline{BCJumpF}}~l_1;\\
	{} & \mathbin{\phantom{=}} \cschemeE{t}{r};
		\text{\cleaninline{BCJump}}~l_2;\\
	{} & \mathbin{\phantom{=}} \text{\cleaninline{BCLabel}}~l_1;
		\cschemeS{cs}{r}{w};\\
	{} & \mathbin{\phantom{=}} \text{\cleaninline{BCLabel}}~l_2;\\
	{} & \text{\emph{Where $l_1$ and $l_2$ are fresh labels}}\\
	{} & \text{\emph{Similar for \cleaninline{IfStable} and \cleaninline{IfUnstable}}}\\
\end{align*}

First, the context is evaluated.
The context contains arguments from functions and steps that need to be preserved after rewriting.
The evaluated context is combined with the left-hand side task value by means of a \cleaninline{.&&.} combinator to store it in the task tree so that it is available after a rewrite.
This means that the task tree is transformed as seen in \cref{lst:context_tree}.

\begin{figure}
	\begin{subfigure}{.5\textwidth}
		\includestandalone{contexttree1}
		\caption{Without the embedded context.}
	\end{subfigure}%
	\begin{subfigure}{.5\textwidth}
		\includestandalone{contexttree2}
		\caption{With the embedded context.}
	\end{subfigure}
	\caption{Context embedded in a task tree.}%
	\label{lst:context_tree}
\end{figure}

The translation to \gls{CLEAN} is given in \cref{lst:imp_seq}.

\begin{lstClean}[caption={Byte code compilation interpretation implementation for the step class.},label={lst:imp_seq}]
instance step BCInterpret where
	(>>*.) lhs cont
		//Fetch a fresh label and fetch the context
		= freshlabel >>= \funlab->gets (\s->s.bcs_context)
		//Generate code for lhs
		>>= \ctx->lhs
		//Possibly add the context
		>>| tell` (if (ctx =: []) []
			//The context is just the arguments up till now in reverse
			(  [BCArg (UInt8 i)\\i<-reverse (indexList ctx)]
			++ map BCMkTask (bcstable (UInt8 (length ctx)))
			++ [BCMkTask BCTAnd]
			))
		//Increase the context
		>>| addToCtx funlab zero lhswidth
		//Lift the step function
		>>| liftFunction funlab
			//Width of the arguments is the width of the lhs plus the
			//stability plus the context
			(one + lhswidth + (UInt8 (length ctx)))
			//Body label ctx width continuations
			(toContFun funlab (UInt8 (length ctx)))
			//Return width (always 1, a task pointer)
			(?Just one)
		>>| modify (\s->{s & bcs_context=ctx})
		>>| tell` [BCMkTask (instr rhswidth funlab)]

toContFun :: JumpLabel UInt8 -> BCInterpret a
toContFun steplabel contextwidth
	= foldr tcf (tell` [BCPush fail]) cont
where
	tcf (IfStable f t)
		= If ((stability >>| tell` [BCIsStable]) &. f val)
			(t val >>| tell` [])
	...
	stability = tell` [BCArg (lhswidth + contextwidth)]
	val       = retrieveArgs steplabel zero lhswidth
\end{lstClean}

\subsection{Shared data sources}\label{lst:imp_sds}
The compilation scheme for \gls{SDS} definitions is a trivial extension to $\mathcal{F}$ since there is no code generated as seen below.

\begin{align*}
	\cschemeF{\text{\cleaninline{sds}}~x=i~\text{\cleaninline{In}}~m} & =
		\cschemeF{m};\\
\end{align*}

The \gls{SDS} access tasks have a compilation scheme similar to other tasks (see \cref{ssec:scheme_tasks}).
The \cleaninline{getSds} task just pushes a task tree node with the \gls{SDS} identifier embedded.
The \cleaninline{setSds} task evaluates the value, lifts that value to a task tree node and creates \pgls{SDS} set node.

\begin{align*}
	\cschemeE{\text{\cleaninline{getSds}}~s}{r} & =
		\text{\cleaninline{BCMkTask}} (\text{\cleaninline{BCSdsGet}} s);\\
	\cschemeE{\text{\cleaninline{setSds}}~s~e}{r} & =
		\cschemeE{e}{r};
		\text{\cleaninline{BCMkTask BCStable}}_{\stacksize{e}};\\
	{} & \mathbin{\phantom{=}} \text{\cleaninline{BCMkTask}} (\text{\cleaninline{BCSdsSet}} s);\\
\end{align*}

While there is no code generated in the definition, the byte code compiler stores all \gls{SDS} data in the \cleaninline{bcs_sdses} field in the compilation state.
Regular \glspl{SDS} are stored as \cleaninline{Left String255} values.
The \glspl{SDS} are typed as functions in the host language so an argument for this function must be created that represents the \gls{SDS} on evaluation.
For this, a \cleaninline{BCInterpret} is created that emits this identifier.
When passing it to the function, the initial value of the \gls{SDS} is returned.
In the case of a local \gls{SDS}, this initial value is stored as a byte code encoded value in the state and the compiler continues with the rest of the program.

\Cref{lst:comp_sds} shows the implementation of the \cleaninline{sds} type class.
First, the initial \gls{SDS} value is extracted from the expression by bootstrapping the fixed point with a dummy value.
This is safe because the expression on the right-hand side of the \cleaninline{In} is never evaluated.
Then, using \cleaninline{addSdsIfNotExist}, the identifier for this particular \gls{SDS} is either retrieved from the compiler state or generated freshly.
This identifier is then used to provide a reference to the \cleaninline{def} definition to evaluate the main expression.
Compiling \cleaninline{getSds} is a matter of executing the \cleaninline{BCInterpret} representing the \gls{SDS}, which yields the identifier that can be embedded in the instruction.
Setting the \gls{SDS} is similar: the identifier is retrieved and the value is put in a task tree so that the resulting task can remember the value it has written.

% VimTeX: SynIgnore on
\begin{lstClean}[caption={Backend implementation for the SDS classes.},label={lst:comp_sds}]
:: Sds a = Sds Int
instance sds BCInterpret where
	sds def = {main =
			let (t In e) = def (abort "sds: expression too strict")
			in addSdsIfNotExist (Left $ String255 (toByteCode{|*|} t))
				>>= \sdsi-> let (t In e) = def (pure (Sds sdsi))
					in e.main
		}
	getSds f   = f >>= \(Sds i)-> tell` [BCMkTask (BCSdsGet (fromInt i))]
	setSds f v = f >>= \(Sds i)->v >>| tell`
		(  map BCMkTask (bcstable (byteWidth v))
		++ [BCMkTask (BCSdsSet (fromInt i))])
\end{lstClean}
% VimTeX: SynIgnore off

Lowered \glspl{SDS} are stored in the compiler state as \cleaninline{Right MTLens} values.
The compilation of the code and the serialisation of the data throw away all typing information.
The \cleaninline{MTLens} is a type synonym for \pgls{SDS} that represents the typeless serialised value of the underlying \gls{SDS}.
This is done so that the \cleaninline{withDevice} task can write the received \gls{SDS} updates to the corresponding \gls{SDS} while the \gls{SDS} is not in scope.
The \gls{ITASK} notification mechanism then takes care of the rest.
Such \pgls{SDS} is created by using the \cleaninline{mapReadWriteError} function which, given a pair of read and write functions with error handling, produces \pgls{SDS} with the lens embedded.
The read function converts the typed value to a typeless serialised value.
The write function will, given a new serialised value and the old typed value, produce a new typed value.
It tries to decode the serialised value; if that succeeds, it is written to the underlying \gls{SDS}, otherwise an error is thrown.
\Cref{lst:mtask_itasksds_lens} shows the implementation for this.

% VimTeX: SynIgnore on
\begin{lstClean}[label={lst:mtask_itasksds_lens},caption={Lens applied to lowered \gls{ITASK} \glspl{SDS} in \gls{MTASK}.}]
lens :: (Shared sds a) -> MTLens | type a & RWShared sds
lens sds = mapReadWriteError
	( \r->   Ok (fromString (toByteCode{|*|} r))
	, \w r-> ?Just <$> iTasksDecode (toString w)
	) ?None sds
\end{lstClean}
% VimTeX: SynIgnore off

\Cref{lst:mtask_itasksds_lift} shows the code for the implementation of \cleaninline{lowerSds} that uses the \cleaninline{lens} function shown earlier.
It is very similar to the \cleaninline{sds} constructor in \cref{lst:comp_sds}, only now a \cleaninline{Right} value is inserted in the \gls{SDS} administration.

% VimTeX: SynIgnore on
\begin{lstClean}[label={lst:mtask_itasksds_lift},caption={The implementation for lowering \glspl{SDS} in \gls{MTASK}.}]
instance lowerSds BCInterpret where
	lowerSds def = {main =
			let (t In _) = def (abort "lowerSds: expression too strict")
			in addSdsIfNotExist (Right $ lens t)
				>>= \sdsi->let (_ In e) = def (pure (Sds sdsi)) in e.main
		}\end{lstClean}
% VimTeX: SynIgnore off

553 \section{Run-time system}\label{sec:compiler_rts}
554 The \gls{RTS} is a customisable domain-specific \gls{OS} that takes care of the execution of tasks.
555 Furthermore, it also takes care of low-level mechanisms such as the communication, multitasking, and memory management.
556 Once a device is programmed with the \gls{MTASK} \gls{RTS}, it can continuously receive new tasks without the need for reprogramming.
557 The \gls{OS} is written in portable \ccpp{} and only contains a small device-specific portion.
558 In order to keep the abstraction level high and the hardware requirements low, much of the high-level functionality of the \gls{MTASK} language is implemented not in terms of lower-level constructs from the \gls{MTASK} language but in terms of \ccpp{} code.
559
560 Most microcontroller software consists of a cyclic executive instead of an \gls{OS}: a single loop function is continuously executed and all work is performed there.
561 In the \gls{RTS} of the \gls{MTASK} system, there is also such an event loop function.
562 It is a function with a relatively short execution time that gets called repeatedly.
563 The event loop consists of three distinct phases.
564 After doing the three phases, the device goes to sleep for as long as possible (see \cref{chp:green_computing_mtask} for more details on task scheduling).
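
As a rough illustration of this structure, one iteration of such an event loop could be sketched as follows.
This is not the actual \gls{RTS} source: the names \texttt{communication\_phase}, \texttt{execution\_phase}, \texttt{memory\_phase}, and \texttt{sleep\_until\_next\_task} are hypothetical, and the phases are stubs that only record that they ran.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical trace of the phases, only used to make the sketch observable. */
static char trace[8];
static uint8_t trace_len = 0;

static void record(char c)
{
	assert(trace_len < sizeof(trace) - 1);
	trace[trace_len++] = c;
	trace[trace_len] = '\0';
}

/* Stubs standing in for the three phases described in the text. */
static void communication_phase(void) { record('C'); }
static void execution_phase(void)     { record('E'); }
static void memory_phase(void)        { record('M'); }

/* Stub standing in for sleeping until the next task is due. */
static void sleep_until_next_task(void) { record('S'); }

/* One iteration of the cyclic executive: three phases, then sleep. */
void event_loop_iteration(void)
{
	communication_phase();
	execution_phase();
	memory_phase();
	sleep_until_next_task();
}
```

On a real device, this function would be called repeatedly from the main loop of the firmware.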
565
566 \subsection{Communication phase}
567 In the first phase, the communication channels are processed.
568 The exact communication method is a customisable device-specific option baked into the \gls{RTS}.
569 The interface is kept deliberately simple and consists of two layers: a link interface and a communication interface.
570 Besides opening, closing and cleaning up, the link interface has three functions that are shown in \cref{lst:link_interface}.
571 Consequently, implementing this link interface is very simple but it is still possible to implement more advanced link features such as buffering.
572 There are implementations for this interface for serial or \gls{WIFI} connections using \gls{ARDUINO}, and \gls{TCP} connections for Linux.
573
574 \begin{lstArduino}[caption={Link interface of the \gls{MTASK} \gls{RTS}.},label={lst:link_interface}]
575 bool link_input_available(void);
576 uint8_t link_read_byte(void);
577 void link_write_byte(uint8_t b);
578 \end{lstArduino}
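
To illustrate how buffering can be hidden behind this interface, the sketch below implements the three functions on top of a small ring buffer.
It is an illustration under assumptions and not the \gls{RTS} code: \texttt{rx\_push}, \texttt{RX\_SIZE}, and \texttt{tx\_log} are hypothetical and stand in for a receive interrupt handler and a real transport.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_SIZE 8u /* hypothetical buffer size; one slot always stays empty */

static uint8_t rx_buf[RX_SIZE];
static uint8_t rx_head = 0; /* next position to write */
static uint8_t rx_tail = 0; /* next position to read */

/* Hypothetical hook, e.g. called from a receive interrupt.
 * Returns false when the buffer is full and the byte is dropped. */
bool rx_push(uint8_t b)
{
	uint8_t next = (uint8_t)((rx_head + 1u) % RX_SIZE);
	if (next == rx_tail)
		return false;
	rx_buf[rx_head] = b;
	rx_head = next;
	return true;
}

bool link_input_available(void)
{
	return rx_head != rx_tail;
}

uint8_t link_read_byte(void)
{
	/* The caller is expected to check link_input_available() first. */
	assert(link_input_available());
	uint8_t b = rx_buf[rx_tail];
	rx_tail = (uint8_t)((rx_tail + 1u) % RX_SIZE);
	return b;
}

/* Stand-in for the transport: written bytes are only logged. */
static uint8_t tx_log[16];
static uint8_t tx_len = 0;

void link_write_byte(uint8_t b)
{
	if (tx_len < sizeof(tx_log))
		tx_log[tx_len++] = b;
}
```

The layers above the link interface remain unaware of the buffer; they only observe whether a byte is available.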
579
580 The communication interface abstracts away from this link interface and is typed instead.
581 It contains only two functions as seen in \cref{lst:comm_interface}.
582 There are implementations for direct communication, or communication using an \gls{MQTT} broker.
583 Both use the automatic serialisation and deserialisation shown in \cref{sec:ccodegen}.
584
585 \begin{lstArduino}[caption={Communication interface of the \gls{MTASK} \gls{RTS}.},label={lst:comm_interface}]
586 struct MTMessageTo receive_message(void);
587 void send_message(struct MTMessageFro msg);
588 \end{lstArduino}
589
590 Processing the received messages from the communication channels happens synchronously and the channels are exhausted completely before moving on to the next phase.
591 There are several possible messages that can be received from the server:
592
593 \begin{description}
594 \item[SpecRequest]
595 is a message instructing the device to send its specification and it is received immediately after connecting.
596 The \gls{RTS} responds with a \texttt{Spec} answer containing the specification.
597 \item[TaskPrep]
598 tells the device a task is on its way.
599 Especially on faster connections, it may be the case that the communication buffers overflow because a big message is sent while the \gls{RTS} is busy executing tasks.
600 This message allows the \gls{RTS} to postpone execution for a while, until the larger task has been received.
601 The server sends the task only after the device has acknowledged the preparation by sending a \texttt{TaskPrepAck} message.
602 \item[Task]
603 contains a new task, its peripheral configuration, the \glspl{SDS}, and the byte code.
604 The new task is immediately copied to the task storage but is only initialised during the next phase.
605 The device acknowledges the task by sending a \texttt{TaskAck} message.
606 \item[SdsUpdate]
607 notifies the device of the new value for a lowered \gls{SDS}.
608 The old value of the lowered \gls{SDS} is immediately replaced with the new one.
609 There is no acknowledgement required.
610 \item[TaskDel]
611 instructs the device to delete a running task.
612 Tasks are automatically deleted when they become stable.
613 However, a task may also be deleted when the surrounding task on the server is deleted, for example when the task is on the left-hand side of a step combinator and the condition to step holds.
614 The device acknowledges the deletion by sending a \texttt{TaskDelAck}.
615 \item[Shutdown]
616 tells the device to reset.
617 \end{description}
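
The request and reply pairs listed above can be summarised in a small dispatch sketch.
The enumerations and function names are hypothetical, since the real message types are generated from the \gls{CLEAN} types, and the sketch assumes that \texttt{Shutdown} requires no reply, which the description does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for messages to and from the device. */
enum msg_to  { SPEC_REQUEST, TASK_PREP, TASK, SDS_UPDATE, TASK_DEL, SHUTDOWN };
enum msg_fro { SPEC, TASK_PREP_ACK, TASK_ACK, TASK_DEL_ACK, NO_REPLY };

/* Reply required for each incoming message, following the description. */
enum msg_fro reply_for(enum msg_to m)
{
	switch (m) {
	case SPEC_REQUEST: return SPEC;
	case TASK_PREP:    return TASK_PREP_ACK;
	case TASK:         return TASK_ACK;
	case TASK_DEL:     return TASK_DEL_ACK;
	case SDS_UPDATE:   /* no acknowledgement required */
	case SHUTDOWN:     /* assumption: the device resets without replying */
		return NO_REPLY;
	}
	return NO_REPLY;
}

/* Process all pending messages before returning, as the communication
 * phase does. Writes the replies to out and returns their number. */
size_t process_all(const enum msg_to *in, size_t n, enum msg_fro *out)
{
	size_t replies = 0;
	for (size_t i = 0; i < n; i++) {
		enum msg_fro r = reply_for(in[i]);
		if (r != NO_REPLY)
			out[replies++] = r;
	}
	return replies;
}
```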
618
619 \subsection{Execution phase}
620 The second phase performs one execution step for all tasks that wish for it.
621 Tasks are kept in a priority queue ordered by the time at which a task needs to execute, and the \gls{RTS} selects all tasks that can be scheduled (see \cref{sec:scheduling} for more details).
622 Execution of a task is always an interplay between the interpreter and the rewriter.
623
624 When a new task is received, the main expression is evaluated to produce a task tree.
625 A task tree is a tree structure in which each node represents a task combinator and the leaves are basic tasks.
626 If a task is not initialised yet, i.e.\ the pointer to the current task tree is still null, the byte code of the main function is interpreted.
627 The main expression always produces a task tree.
628 Execution of a task consists of continuously rewriting the task until its value is stable.
629
630 Rewriting is a destructive process, i.e.\ the rewriting is done in place.
631 The rewriting engine uses the interpreter when needed, e.g.\ to calculate the step continuations.
632 The rewriter and the interpreter use the same stack to store intermediate values.
633 Rewriting steps are small so that interleaving results in seemingly parallel execution.
634 In this phase new task tree nodes may be allocated.
635 Both rewriting and initialisation are atomic operations in the sense that no processing on \glspl{SDS} is done other than \gls{SDS} operations from the task itself.
636 The host is notified if a task value is changed after a rewrite step by sending a \texttt{TaskReturn} message.
637
638 Take for example a blink task for which the code is shown in \cref{lst:blink_code}.
639
640 \begin{lstClean}[caption={Code for a blink program.},label={lst:blink_code}]
641 fun \blink=(\st->delay (lit 500) >>|. writeD d3 st >>=. blink o Not)
642 In {main = blink true}
643 \end{lstClean}
644
645 On receiving this task, the task tree is still null and the initial expression \cleaninline{blink true} is evaluated by the interpreter.
646 This results in the task tree shown in \cref{fig:blink_tree}.
647 Rewriting always starts at the top of the tree and traverses to the leaves, the basic tasks that do the actual work.
648 The first basic task encountered is the \cleaninline{delay} task, that yields no value until the time, \qty{500}{\ms} in this case, has passed.
649 When the \cleaninline{delay} task has yielded a stable value after a number of rewrites, the task continues with the right-hand side of the \cleaninline{>>\|.} combinator.
650 This combinator has a \cleaninline{writeD} task at the left-hand side that becomes stable after one rewrite step in which it writes the value to the given pin.
651 When \cleaninline{writeD} becomes stable, the written value is the task value that is observed by the right-hand side of the \cleaninline{>>=.} combinator.
652 This will call the interpreter to evaluate the expression, now that the argument of the function is known.
653 The result of the function is again a task tree, but now with different arguments to the tasks, e.g.\ the state in \cleaninline{writeD} is inverted.
654
655 \begin{figure}
656 \centering
657 \includestandalone{blinktree}
658 \caption{The task tree for a blink task in \cref{lst:blink_code} in \gls{MTASK}.}%
659 \label{fig:blink_tree}
660 \end{figure}
661
662 \subsection{Memory management}
663 The third and final phase is memory management.
664 The \gls{MTASK} \gls{RTS} is designed to run on systems with as little as \qty{2}{\kibi\byte} of \gls{RAM}.
665 Aggressive memory management is therefore vital.
666 Not all firmware for microcontrollers supports heaps and, when it does, allocation often leaves holes when memory is not freed in a \emph{last in first out} order.
667 The \gls{RTS} uses a chunk of memory in the global data segment with its own memory manager tailored to the needs of \gls{MTASK}.
668 The size of this block can be changed in the configuration of the \gls{RTS} if necessary.
669 On an \gls{ARDUINO} UNO---equipped with \qty{2}{\kibi\byte} of \gls{RAM}---the maximum viable size is about \qty{1500}{\byte}.
670 The self-managed memory uses a layout similar to the memory layout of \gls{C} programs, only the heap and the stack are switched (see \cref{fig:memory_layout}).
671
672 \begin{figure}
673 \centering
674 \includestandalone{memorylayout}
675 \caption{Memory layout in the \gls{MTASK} \gls{RTS}.}\label{fig:memory_layout}
676 \end{figure}
677
678 A task is stored below the stack and its complete state is a \gls{CLEAN} record containing, most importantly, the task id, a pointer to the task tree in the heap (null if not initialised yet), the current task value, the configuration of \glspl{SDS}, the configuration of peripherals, the byte code, and some scheduling information.
679
680 In memory, task data grows from the bottom up and the interpreter stack is located directly on top of it growing in the same direction.
681 As a consequence, the stack moves when a new task is received.
682 This never happens within execution because communication is always processed before execution.
683 Values in the interpreter are always stored on the stack.
684 Compound data types are stored unboxed and flattened.
685 Task trees grow from the top down as in a heap.
686 This approach allows for flexible ratios, i.e.\ many tasks and small trees or few tasks and big trees.
687
688 Stable tasks and unreachable task tree nodes are removed.
689 If a task is to be removed, tasks with higher memory addresses are moved down.
690 For task trees---stored in the heap---the \gls{RTS} already marks tasks and task trees as trash during rewriting so the heap can be compacted in a single pass.
691 This is possible because there is no sharing or cycles in task trees and nodes contain pointers to their parent.
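
The two-ended layout can be sketched with a single block of memory and two offsets.
This is a simplification under assumptions: the names and the block size are hypothetical, the interpreter stack on top of the task data is left out, and alignment and compaction are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_SIZE 64u /* hypothetical; the text mentions about 1500 bytes on an UNO */

static uint8_t mem[MEM_SIZE];
static size_t task_top = 0;           /* end of the task data, growing upwards */
static size_t heap_bottom = MEM_SIZE; /* start of the task trees, growing downwards */

/* Number of bytes left between the task data and the task trees. */
size_t free_bytes(void)
{
	return heap_bottom - task_top;
}

/* Reserve size bytes for task data at the bottom; NULL when out of memory. */
void *alloc_task(size_t size)
{
	if (size > free_bytes())
		return NULL;
	void *p = &mem[task_top];
	task_top += size;
	return p;
}

/* Reserve size bytes for a task tree node at the top; NULL when out of memory. */
void *alloc_node(size_t size)
{
	if (size > free_bytes())
		return NULL;
	heap_bottom -= size;
	return &mem[heap_bottom];
}
```

Because both regions draw from the same free space, the ratio between task data and task trees is not fixed in advance.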
692
693
694 \section{C code generation}\label{sec:ccodegen}
695 All communication between the \gls{ITASK} server and the \gls{MTASK} server is type parametrised.
696 From the structural representation of the type, a \gls{CLEAN} parser and printer are constructed using generic programming.
697 Furthermore, a \ccpp{} parser and printer are generated for use on the \gls{MTASK} device.
698 The technique for generating the \ccpp{} parser and printer is very similar to template metaprogramming and requires a rich generic programming library or compiler support that includes a lot of metadata in the record and constructor nodes.
699 Using generic programming in the \gls{MTASK} system, both serialisation and deserialisation on the microcontroller and the server are automatically generated.
700
701 \subsection{Server}
702 On the server, off-the-shelf generic programming techniques are used to make the serialisation and deserialisation functions (see \cref{lst:ser_deser_server}).
703 Serialisation is a simple conversion from a value of the type to a string.
704 Deserialisation is a little bit different in order to support streaming\footnotemark.
705 \footnotetext{%
706 Here the \cleaninline{*!} variant of the generic interface is chosen that has fewer uniqueness constraints for the compiler-generated adaptors \citep{alimarine_generic_2005,hinze_derivable_2001}.%
707 }
708 Given a list of available characters, a tuple is always returned.
709 The right-hand side of the tuple contains the remaining characters, the unparsed input.
710 The left-hand side contains either an error or a maybe value.
711 If the value is a \cleaninline{?None}, there was no full value to parse.
712 If the value is a \cleaninline{?Just}, the data field contains a value of the requested type.
713
714 \begin{lstClean}[caption={Serialisation and deserialisation functions in \gls{CLEAN}.},label={lst:ser_deser_server}]
715 generic toByteCode a :: a -> String
716 generic fromByteCode a *! :: [Char] -> (Either String (? a), [Char])
717 \end{lstClean}
718
719 \subsection{Client}
720 The \gls{RTS} of the \gls{MTASK} system runs on resource-constrained microcontrollers and is implemented in portable \ccpp{}.
721 In order to achieve more interoperation safety, the communication between the server and the client is automated, i.e.\ the serialisation and deserialisation code in the \gls{RTS} is generated.
722 The technique used for this is very similar to the technique shown in \cref{chp:first-class_datatypes}.
723 However, instead of using template metaprogramming, a feature \gls{CLEAN} lacks, generic programming is also used as a two-stage rocket.
724 In contrast to many other generic programming systems, \gls{CLEAN} allows for access to much of the metadata of the compiler.
725 For example, \cleaninline{Cons}, \cleaninline{Object}, \cleaninline{Field}, and \cleaninline{Record} generic constructors are enriched with their arity, names, types, \etc.
726 Furthermore, constructors can access the metadata of the objects and fields of their parent records.
727 Using this metadata, generic functions are created that generate \ccpp{} type definitions, parsers and printers for any first-order \gls{CLEAN} type.
728 The exact details of this technique will be described in a paper that is in preparation.
729
730 \Glspl{ADT} are converted to tagged unions, newtypes to typedefs, records to structs, and arrays to dynamically allocated size-parametrised arrays.
731 For example, the \gls{CLEAN} types in \cref{lst:ser_clean} are translated to the \ccpp{} types seen in \cref{lst:ser_c}.
732
733 \begin{lstClean}[caption={Simple \glspl{ADT} in \gls{CLEAN}.},label={lst:ser_clean}]
734 :: T a = A a | B NT {#Char}
735 :: NT =: NT Real
736 \end{lstClean}
737
738 \begin{lstArduino}[caption={Generated \ccpp{} type definitions for the simple \glspl{ADT}.},label={lst:ser_c}]
739 typedef double Real;
740 typedef char Char;
741
742 typedef Real NT;
743 enum T_c {A_c, B_c};
744
745 struct Char_HshArray { uint32_t size; Char *elements; };
746 struct T {
747 enum T_c cons;
748 struct { void *A;
749 struct { NT f0; struct Char_HshArray f1; } B;
750 } data;
751 };
752 \end{lstArduino}
753
754 For each of these generated types, two functions are created: a typed printer and a typed parser (see \cref{lst:ser_pp}).
755 The parser functions are parametrised by a read function, an allocation function and parse functions for all type variables.
756 This allows for the use of these functions in environments where the communication is parametrised and the memory management is self-managed such as in the \gls{MTASK} \gls{RTS}.
757
758 \begin{lstArduino}[caption={Printer and parser for the \glspl{ADT} in \ccpp{}.},label={lst:ser_pp}]
759 struct T parse_T(uint8_t (*get)(), void *(*alloc)(size_t),
760 void *(*parse_0)(uint8_t (*)(), void *(*)(size_t)));
761
762 void print_T(void (*put)(uint8_t), struct T r,
763 void (*print_0)(void (*)(uint8_t), void *));
764 \end{lstArduino}
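
The sketch below shows how such a parser could be written and used for the \texttt{Char\_HshArray} type, with the read and allocation functions passed as arguments.
It is hand-written for illustration and not generated code: the wire format (a four-byte big-endian size followed by the elements), the input buffer, and the function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef char Char;
struct Char_HshArray { uint32_t size; Char *elements; };

/* Hypothetical input source: a fixed buffer that is read byte by byte. */
static const uint8_t input[] = { 0, 0, 0, 2, 'h', 'i' };
static size_t input_pos = 0;

static uint8_t get_from_buffer(void)
{
	assert(input_pos < sizeof(input));
	return input[input_pos++];
}

/* Parser in the style of the generated ones, parametrised by a read
 * function and an allocation function. The wire format is assumed. */
struct Char_HshArray parse_Char_HshArray(uint8_t (*get)(void),
	void *(*alloc)(size_t))
{
	struct Char_HshArray r;
	r.size = 0;
	for (int i = 0; i < 4; i++)
		r.size = (r.size << 8) | get();
	r.elements = alloc(r.size);
	assert(r.elements != NULL || r.size == 0);
	for (uint32_t i = 0; i < r.size; i++)
		r.elements[i] = (Char)get();
	return r;
}
```

On a desktop system, \texttt{malloc} can serve as the allocation function, whereas an embedded \gls{RTS} would pass an allocator for its self-managed memory.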
765
766 \section{Conclusion}
767 It is not straightforward to execute \gls{MTASK} tasks on resource-constrained \gls{IOT} edge devices.
768 To achieve this, the terms in the \gls{DSL} are compiled to compact domain-specific byte code.
769 This byte code is sent for interpretation to the light-weight \gls{RTS} of the edge device.
770 The \gls{RTS} first evaluates the main expression in the interpreter.
771 The result of this evaluation, a run time representation of the task, is a task tree.
772 This task tree is rewritten according to small-step reduction rules until a stable value is observed.
773 Rewriting multiple tasks at the same time is achieved by interleaving the rewrite steps, resulting in seemingly parallel execution of the tasks.
774 All communication, including the serialisation and deserialisation, between the server and the \gls{RTS} is automated.
775 From the structural representation of the types, printers and parsers are generated for the server and the client.
776
777 \input{subfilepostamble}
778 \end{document}