updates

[phd-thesis.git] / top / imp.tex
diff --git a/top/imp.tex b/top/imp.tex

index 88c67ca..cacceec 100644 (file)
--- a/top/imp.tex
+++ b/top/imp.tex
@@ -2,33 +2,41 @@
  
  \input{subfilepreamble}
  
-\setcounter{chapter}{5}
+\setcounter{chapter}{4}
  
  \begin{document}
  \input{subfileprefix}
-\chapter{Implementation}%
+\chapter{The implementation of \texorpdfstring{\gls{MTASK}}{mTask}}%
  \label{chp:implementation}
  \begin{chapterabstract}
-       \noindent This chapter shows the implementation of the \gls{MTASK} system by:
+       This chapter shows the implementation of the \gls{MTASK} system by:
         \begin{itemize}
-               \item gives details of the implementation of \gls{MTASK}'s \gls{TOP} engine that executes the \gls{MTASK} tasks on the microcontroller;
-               \item shows the implementation of the byte code compiler for \gls{MTASK}'s \gls{TOP} language;
-               \item explains the machinery used to automatically serialise and deserialise data to-and-fro the device.
+               \item elaborating on the implementation and architecture of the \gls{RTS} of \gls{MTASK};
+               \item giving details of the implementation of \gls{MTASK}'s \gls{TOP} engine that executes the \gls{MTASK} tasks on the microcontroller;
+               \item showing the implementation of the byte code compiler for \gls{MTASK}'s \gls{TOP} language;
+               \item explaining the machinery used to automatically serialise and deserialise data to-and-fro the device.
         \end{itemize}
  \end{chapterabstract}
  
-Microcontrollers usually have flash-based program memory which wears out fairly quick.
-For example, the atmega328p in the \gls{ARDUINO} UNO is rated for 10000 write cycles.
+\todo[inline]{Dit hoofdstuk is het ruwst van allen}
+The \gls{MTASK} system targets resource-constrained edge devices that have little memory, processor speed and communication.
+Such edge devices are often powered by microcontrollers.
+They usually have flash-based program memory which wears out fairly quick.
+For example, the flash memory of the popular atmega328p powering the \gls{ARDUINO} UNO is just rated for 10000 write cycles.
  While this sounds like a lot, if new tasks are sent to the device every minute or so, a lifetime of only seven days is guaranteed.
-Hence, for dynamic applications, interpretation on the device is necessary, saving precious write cycles of the program memory.
+Hence, for dynamic applications, storing the program in the \gls{RAM} of the device and interpreting this code is necessary, saving precious write cycles of the program memory.
+In the \gls{MTASK} system, this is done by the \gls{MTASK} \gls{RTS}.
  
-In order to provide the device with the tools to interpret the byte code, it is programmed with a \gls{RTS}, a customisable domain-specific \gls{OS} that takes care of the execution of tasks but also low-level mechanisms such as the communication, multi tasking, and memory management.
-Once the device is programmed with the \gls{MTASK} \gls{RTS}, it can continuously receive new tasks without the need for reprogramming.
+\section{\texorpdfstring{\Glsxtrlong{RTS}}{Run time system}}
+The \gls{RTS} is a customisable domain-specific \gls{OS} that takes care of the execution of tasks, but also low-level mechanisms such as the communication, multitasking, and memory management.
+Once a device is programmed with the \gls{MTASK} \gls{RTS}, it can continuously receive new tasks without the need for reprogramming.
  The \gls{OS} is written in portable \ccpp{} and only contains a small device-specific portion.
+In order to keep the abstraction level high and the hardware requirements low, much of the high-level functionality of the \gls{MTASK} language is implemented not in terms of lower-level constructs from \gls{MTASK} language but in terms of \ccpp{} code.
  
-\section{\texorpdfstring{\Glsxtrlong{RTS}}{Run time system}}
-The event loop of the \gls{RTS} is executed repeatedly and consists of three distinct phases.
-After doing the three phases, the devices goes to sleep for as long as possible (see \cref{chp:green_computing_mtask}).
+As most microcontrollers software is run solely by a cyclic executive instead of an \gls{OS}, the \gls{RTS} of the \gls{MTASK} system is implemented as such also.
+It consists of a loop function with a relatively short execution time, similar to the one in \gls{ARDUINO}, that gets called repeatedly.
+The event loop consists of three distinct phases.
+After doing the three phases, the devices goes to sleep for as long as possible (see \cref{chp:green_computing_mtask} for more details on task scheduling).
  
  \subsection{Communication}
  In the first phase, the communication channels are processed.
@@ -36,7 +44,7 @@ The exact communication method is a customisable device-specific option baked in
  The interface is deliberately kept simple and consists of a two layer interface: a link interface and a communication interface.
  Besides opening, closing and cleaning up, the link interface has only three functions that are shown in \cref{lst:link_interface}.
  Consequently, implementing this link interface is very simple but allows for many more advanced link settings such as buffering.
-There are implementations for this interface for serial or Wi-Fi connections using \gls{ARDUINO} and \gls{TCP} connections for Linux.
+There are implementations for this interface for serial or \gls{WIFI} connections using \gls{ARDUINO} and \gls{TCP} connections for Linux.
  
  \begin{lstArduino}[caption={Link interface of the \gls{MTASK} \gls{RTS}.},label={lst:link_interface}]
  bool link_input_available(void);
@@ -84,19 +92,48 @@ There are several possible messages that can be received from the server:
  \end{description}
  
  \subsection{Execution}
-The second phase consists of performing one execution step for all tasks that wish for it.
+The second phase performs one execution step for all tasks that wish for it.
  Tasks are ordered in a priority queue ordered by the time a task needs to be executed, the \gls{RTS} selects all tasks that can be scheduled, see \cref{sec:scheduling} for more details.
-A task is stored below the stack and its complete state is a \gls{CLEAN} record contain most importantly the task id, a pointer to the task tree in the heap (null if not initialised yet), the current task value, the configuration of \glspl{SDS}, the configuration of peripherals, the byte code and some scheduling information.
+Execution of a task is always an interplay between the interpreter and the \emph{rewriter}.
  
+When a new task is received, the main expression is evaluated to produce a task tree.
+A task tree is a tree in which each node represents a task combinator and the leaves are basic tasks.
  If a task is not initialized yet, i.e.\ the pointer to the current task tree is still null, the byte code of the main function is interpreted.
  The main expression always produces a task tree.
+Execution of a task consists of continuously rewriting the task until its value is stable.
+
+Rewriting is a destructive process, i.e.\ the rewriting is done in place.
  The rewriting engine uses the interpreter when needed, e.g.\ to calculate the step continuations.
  The rewriter and the interpreter use the same stack to store intermediate values.
  Rewriting steps are small so that interleaving results in seemingly parallel execution.
  In this phase new task tree nodes may be allocated.
-Both rewriting and initialization are atomic operations in the sense that no processing on SDSs is done other than SDS operations from the task itself.
+Both rewriting and initialization are atomic operations in the sense that no processing on \glspl{SDS} is done other than \gls{SDS} operations from the task itself.
  The host is notified if a task value is changed after a rewrite step.
  
+Take for example the blink task for which the code is shown in \cref{lst:blink_code}.
+
+\begin{lstClean}[caption={Code for a blink program.},label={lst:blink_code}]
+fun \blink=(\st->delay (lit 500) >>|. writeD d3 st >>=. blink o Not)
+In {main = blink true}
+\end{lstClean}
+
+On receiving this task, the task tree is still null and the initial expression \cleaninline{blink true} is evaluated by the interpreter.
+This results in the task tree shown in \cref{fig:blink_tree}.
+Rewriting always starts at the top of the tree and traverses to the leaves, the basic tasks that do the actual work.
+The first basic task encountered is the \cleaninline{delay} task, that yields no value until the time, \qty{500}{\ms} in this case, has passed.
+When the \cleaninline{delay} task yielded a stable value, the task continues with the right-hand side of the \cleaninline{>>\|.} combinator.
+This combinator has a \cleaninline{writeD} task at the left-hand side that becomes stable after one rewrite step in which it writes the value to the given pin.
+When \cleaninline{writeD} becomes stable, the written value is the task value that is observed by the right-hand side of the \cleaninline{>>=.} combinator.
+This will call the interpreter to evaluate the expression, now that the argument of the function is known.
+The result of the function is basically the original task tree again, but now with the state inversed.
+
+\begin{figure}
+       \centering
+       \includestandalone{blinktree}
+       \caption{The task tree for a blink task in \cref{lst:blink_code} in \gls{MTASK}.}%
+       \label{fig:blink_tree}
+\end{figure}
+
  \subsection{Memory management}
  The third and final phase is memory management.
  The \gls{MTASK} \gls{RTS} is designed to run on systems with as little as \qty{2}{\kibi\byte} of \gls{RAM}.
@@ -110,13 +147,16 @@ The self-managed memory uses a similar layout as the memory layout for \gls{C} p
  \begin{figure}
         \centering
         \includestandalone{memorylayout}
-       \caption{Memory layout}\label{fig:memory_layout}
+       \caption{Memory layout in the \gls{MTASK} \gls{RTS}.}\label{fig:memory_layout}
  \end{figure}
  
+A task is stored below the stack and its complete state is a \gls{CLEAN} record contain most importantly the task id, a pointer to the task tree in the heap (null if not initialised yet), the current task value, the configuration of \glspl{SDS}, the configuration of peripherals, the byte code and some scheduling information.
+
  In memory, task data grows from the bottom up and the interpreter stack is located directly on top of it growing in the same direction.
  As a consequence, the stack moves when a new task is received.
  This never happens within execution because communication is always processed before execution.
-Values in the interpreter are always stored on the stack, even tuples.
+Values in the interpreter are always stored on the stack.
+Compound data types are stored unboxed and flattened.
  Task trees grow from the top down as in a heap.
  This approach allows for flexible ratios, i.e.\ many tasks and small trees or few tasks and big trees.
  
@@ -125,32 +165,35 @@ If a task is to be removed, tasks with higher memory addresses are moved down.
  For task trees---stored in the heap---the \gls{RTS} already marks tasks and task trees as trash during rewriting so the heap can be compacted in a single pass.
  This is possible because there is no sharing or cycles in task trees and nodes contain pointers pointers to their parent.
  
-\todo{plaa\-tje van me\-mo\-ry hier}
+\todo[inline]{plaa\-tje van me\-mo\-ry hier uitbreiden?}
  
  \section{Compiler}
  \subsection{Instruction set}
-The instruction set is a fairly standard stack machine instruction set extended with special \gls{TOP} instructions.
-\Cref{lst:instruction_type} shows the \gls{CLEAN} type representing the instruction set of which \cref{tbl:instr_task} gives detailed semantics.
-Type synonyms are used to provide insight on the arguments of the instructions.
-One notable instruction is the \cleaninline{MkTask} instruction, it allocates and initialises a task tree node and pushes a pointer to it on the stack.
+The instruction set is a fairly standard stack machine instruction set extended with special \gls{TOP} instructions for creating task tree nodes.
+All instructions are housed in a \gls{CLEAN} \gls{ADT} and serialised to the byte representation using a generic function.
+Type synonyms (\cref{lst:type_synonyms}) are used to provide insight on the arguments of the instructions.
+Labels are always two bytes long, all other arguments are one byte long.
  
-\begin{lstClean}[caption={The type housing the instruction set.},label={lst:instruction_type}]
+\begin{lstClean}[caption={Type synonyms for instructions arguments.},label={lst:type_synonyms}]
  :: ArgWidth    :== UInt8         :: ReturnWidth :== UInt8
  :: Depth       :== UInt8         :: Num         :== UInt8
  :: SdsId       :== UInt8         :: JumpLabel   =: JL UInt16
+\end{lstClean}
+
+\Cref{lst:instruction_type} shows an excerpt of the \gls{CLEAN} type that represents the instruction set.
+For example, shorthand instructions are omitted for brevity.
+Detailed semantics for the instructions are given in \cref{chp:bytecode_instruction_set}.
+One notable instruction is the \cleaninline{MkTask} instruction, it allocates and initialises a task tree node and pushes a pointer to it on the stack.
  
-//** Datatype housing all instructions
+\begin{lstClean}[caption={The type housing the instruction set in \gls{MTASK}.},label={lst:instruction_type}]
  :: BCInstr
-       //Return instructions
         //Jumps
-       = BCJumpF JumpLabel | BCJump JumpLabel | BCLabel JumpLabel | BCJumpSR ArgWidth JumpLabel
+       = BCJumpF JumpLabel | BCLabel JumpLabel | BCJumpSR ArgWidth JumpLabel
         | BCReturn ReturnWidth ArgWidth | BCTailcall ArgWidth ArgWidth JumpLabel
         //Arguments
         | BCArgs ArgWidth ArgWidth
         //Task node creation and refinement
         | BCMkTask BCTaskType | BCTuneRateMs | BCTuneRateSec
-       //Task value ops
-       | BCIsStable | BCIsUnstable | BCIsNoValue | BCIsValue
         //Stack ops
         | BCPush String255 | BCPop Num | BCRot Depth Num | BCDup | BCPushPtrs
         //Casting
@@ -159,9 +202,8 @@ One notable instruction is the \cleaninline{MkTask} instruction, it allocates an
         | BCAddI | BCSubI | ...
         ...
  
-//** Datatype housing all task types
  :: BCTaskType
-       = BCStableNode ArgWidth | ArgWidth
+       = BCStableNode ArgWidth | BCUnstableNode ArgWidth
         // Pin io
         | BCReadD | BCWriteD | BCReadA | BCWriteA | BCPinMode
         // Interrupts
@@ -169,7 +211,7 @@ One notable instruction is the \cleaninline{MkTask} instruction, it allocates an
         // Repeat
         | BCRepeat
         // Delay
-       | BCDelay | BCDelayUntil //* Only for internal use
+       | BCDelay | BCDelayUntil
         // Parallel
         | BCTAnd | BCTOr
         //Step
@@ -184,20 +226,20 @@ One notable instruction is the \cleaninline{MkTask} instruction, it allocates an
         ...
  \end{lstClean}
  
-\subsection{Compiler}
+\subsection{Compiler infrastructure}
  The bytecode compiler interpretation for the \gls{MTASK} language is implemented as a monad stack containing a writer monad and a state monad.
  The writer monad is used to generate code snippets locally without having to store them in the monadic values.
  The state monad accumulates the code, and stores the stateful data the compiler requires.
  \Cref{lst:compiler_state} shows the data type for the state, storing:
  function the compiler currently is in;
  code of the main expression;
-context (see \todo{insert ref to compilation rules step here});
+context (see \cref{ssec:step});
  code for the functions;
  next fresh label;
  a list of all the used \glspl{SDS}, either local \glspl{SDS} containing the initial value (\cleaninline{Left}) or lifted \glspl{SDS} (see \cref{sec:liftsds}) containing a reference to the associated \gls{ITASK} \gls{SDS};
  and finally there is a list of peripherals used.
  
-\begin{lstClean}[label={lst:compiler_state},caption={\Gls{MTASK}'s byte code compiler type}]
+\begin{lstClean}[label={lst:compiler_state},caption={The type for the \gls{MTASK} byte code compiler}]
  :: BCInterpret a :== StateT BCState (WriterT [BCInstr] Identity) a
  :: BCState =
         { bcs_infun        :: JumpLabel
@@ -224,7 +266,7 @@ For example, the \cleaninline{BCArg} instruction is often called with argument \
  Furthermore, redundant instructions (e.g.\ pop directly after push) are removed as well in order not to burden the code generation with these intricacies.
  Finally the labels are resolved to represent actual program addresses instead of freshly generated identifiers.
  After the byte code is ready, the lifted \glspl{SDS} are resolved to provide an initial value for them.
-The result---byte code, \gls{SDS} specification and perpipheral specifications---are the result of the process, ready to be sent to the device. 
+The result---byte code, \gls{SDS} specification and perpipheral specifications---are the result of the process, ready to be sent to the device.
  
  \section{Compilation rules}
  This section describes the compilation rules, the translation from abstract syntax to byte code.
@@ -364,7 +406,7 @@ Arguments wider than one stack cell are fetched in reverse to preserve the order
                 {} & v = \Sigma^{i-1}_{j=0}\stacksize{a_{f^j}}~\text{ and }~ w = v + \stacksize{a_{f^i}}\\
  \end{align*}
  
-Translating the compilation schemes for functions to Clean is not as straightforward as other schemes due to the nature of shallow embedding.\todo{deze \P{} moet ge\-\"up\-da\-ted worden}
+Translating the compilation schemes for functions to Clean is not as straightforward as other schemes due to the nature of shallow embedding.
  The \cleaninline{fun} class has a single function with a single argument.
  This argument is a Clean function that---when given a callable Clean function representing the mTask function---will produce \cleaninline{main} and a callable function.
  To compile this, the argument must be called with a function representing a function call in mTask.
@@ -446,7 +488,7 @@ where
         rtrn m = m >>| tell` [BCMkTask (bcstable m)]
  \end{lstClean}
  
-\subsection{Step combinator}\label{ssec:step}
+\subsection{Sequential combinator}\label{ssec:step}
  The \cleaninline{step} construct is a special type of task because the task value of the left-hand side may change over time.
  Therefore, the continuation tasks on the right-hand side are \emph{observing} this task value and acting upon it.
  In the compilation scheme, all continuations are first converted to a single function that has two arguments: the stability of the task and its value.
@@ -564,11 +606,7 @@ This initial value is stored as a byte code encoded value in the state and the c
  
  Compiling \cleaninline{getSds} is a matter of executing the \cleaninline{BCInterpret} representing the \gls{SDS}, which yields the identifier that can be embedded in the instruction.
  Setting the \gls{SDS} is similar: the identifier is retrieved and the value is written to put in a task tree so that the resulting task can remember the value it has written.
-Lifted SDSs are compiled in a very similar way.\todo{deze \P{} moet naar integration?}
-The only difference is that there is no initial value but an iTasks SDS when executing the Clean function.
-A lens on this SDS converting \cleaninline{a} from the \cleaninline{Shared a} to a \cleaninline{String255}---a bytecode encoded version---is stored in the state.
-The encoding and decoding itself is unsafe when used directly but the type system of the language and the abstractions make it safe.
-Upon sending the mTask task to the device, the initial values of the lifted SDSs are fetched to complete the SDS specification.
+Lifted SDSs are compiled in a very similar way \cref{sec:liftsds}.
  
  % VimTeX: SynIgnore on
  \begin{lstClean}[caption={Backend implementation for the SDS classes.},label={lst:comp_sds}]
@@ -587,14 +625,88 @@ instance sds BCInterpret where
  \end{lstClean}
  % VimTeX: SynIgnore off
  
-\section{C code generation}\label{sec:ccodegen}
+\section{\texorpdfstring{\Gls{C}}{C} code generation}\label{sec:ccodegen}
+\todo[inline]{Dit is nog zeer ruw}
  All communication between the \gls{ITASK} server and the \gls{MTASK} server is type-parametrised.
  From the structural representation of the type, a \gls{CLEAN} parser and printer is constructed using generic programming.
  Furthermore, a \ccpp{} parser and printer is generated for use on the \gls{MTASK} device.
  The technique for generating the \ccpp{} parser and printer is very similar to template metaprogramming and requires a generic programming library or compiler support that includes a lot of metadata in the record and constructor nodes.
+Using generic programming in the \gls{MTASK} system, both serialisation and deserialisation on the microcontroller and and the server is automatically generated.
+
+\subsection{Server}
+On the server, off-the-shelve generic programming techniques are used to make the serialisation and deserialisation functions (see \cref{lst:ser_deser_server}).
+Serialisation is a simple conversion from a value of the type to a string.
+Deserialisation is a little bit different in order to support streaming\footnotemark{}.
+\footnotetext{%
+       Here the \cleaninline{*!} variant of the generic interface is chosen that has less uniqueness constraints for the compiler-generated adaptors\citep{alimarine_generic_2005,hinze_derivable_2001}.%
+}
+Given a list of available characters, a tuple is always returned.
+The right-hand side of the tuple contains the remaining characters, the unparsed input.
+The left-hand side contains either an error or a maybe value.
+If the value is a \cleaninline{?None}, there was no full value to parse.
+If the value is a \cleaninline{?Just}, the data field contains a value of the requested type.
+
+\begin{lstClean}[caption={Serialisation and deserialisation functions in \gls{CLEAN}.},label={lst:ser_deser_server}]
+generic toByteCode   a    :: a -> String
+generic fromByteCode a *! :: [Char] -> (Either String (? a), [Char])
+\end{lstClean}
+
+\subsection{Client}
+The \gls{RTS} of the \gls{MTASK} system runs on resource-constrained microcontrollers and is implemented in portable \ccpp{}.
+In order to still achieve some type safety, the communication between the server and the client is automated, i.e.\ the serialisation and deserialisation code in the \gls{RTS} is generated.
+The technique used for this is very similar to the technique shown in \cref{chp:first-class_datatypes}.
+However, instead of using template metaprogramming, a feature \gls{CLEAN} lacks, generic programming is used also as a two-stage rocket.
+In contrast to many other generic programming systems, \gls{CLEAN} allows for access to much of the metadata of the compiler.
+For example, \cleaninline{Cons}, \cleaninline{Object}, \cleaninline{Field}, and \cleaninline{Record} generic constructors are enriched with their arity, names, types, \etc.
+Furthermore, constructors can access the metadata of the objects and fields of their parent records.
+Using this metadata, generic functions can be created that generate \ccpp{} type definitions, parsers and printers for any first-order \gls{CLEAN} type.
+The exact details of this technique can be found later in a paper that is in preparation\todo{noe\-men?}.
+
+\Glspl{ADT} are converted to tagged unions, newtypes to typedefs, records to structs, and arrays to dynamically size-parametrised allocated arrays.
+For example, the \gls{CLEAN} types in \cref{lst:ser_clean} are translated to the \ccpp{} types seen in \cref{lst:ser_c}
+
+\begin{lstClean}[caption={Simple \glspl{ADT} in \gls{CLEAN}.},label={lst:ser_clean}]
+:: T a = A a | B NT {#Char}
+:: NT =: NT Real
+\end{lstClean}
+
+\begin{lstArduino}[caption={Simple \glspl{ADT} in \ccpp{}.},label={lst:ser_c}]
+typedef double Real;
+typedef char Char;
  
+typedef Real NT;
+enum T_c {A_c, B_c};
+
+struct Char_HshArray { uint32_t size; Char *elements; };
+struct T {
+       enum T_c cons;
+       struct { void *A;
+                struct { NT f0; struct Char_HshArray f1; } B;
+       } data;
+};
+\end{lstArduino}
+
+For each of these generated types, two functions are created, a typed printer, and a typed parser (see \cref{lst:ser_pp}).
+The parser functions are parametrised by a read function, an allocation function and parse functions for all type variables.
+This allows for the use of these functions in environments where the communication is parametrised and the memory management is self-managed such as in the \gls{MTASK} \gls{RTS}.
+
+\begin{lstArduino}[caption={Printer and parser for the \glspl{ADT} in \ccpp{}.},label={lst:ser_pp}]
+struct T parse_T(uint8_t (*get)(), void *(*alloc)(size_t), void *(*parse_0)(uint8_t (*)(), void *(*)(size_t)));
+
+void print_T(void (*put)(uint8_t), struct T r, void (*print_0)(void (*)(uint8_t), void *));
+\end{lstArduino}
+\todo[inline]{uitbreiden, maar niet te veel}
  
  \section{Conclusion}
+It is not straightforward to execute \gls{MTASK} tasks on resources-constrained \gls{IOT} edge devices.
+To achieve this, the terms in the \gls{DSL} are compiled to domain-specific byte code.
+This byte code is sent for interpretation to the light-weight \gls{RTS} of the edge device.
+First the expression is evaluated.
+The result of this evaluation, a run time representation of the task, is a task tree.
+This task tree is rewritten according to rewrite rules until a stable value is observed.
+Rewriting multiple tasks at the same time is achieved by interleaving the rewrite steps, resulting in seamingly parallel execution of the tasks.
+All communication is automated.
+
  \todo[inline]{conclusion}
  
  \input{subfilepostamble}