A device is suitable for the system if it can run the engine.
The engine is compiled from a single codebase and each device implements (part
of) the device-specific interface. The shared codebase uses only standard
\gls{C}, without special libraries or tricks, so the code compiles for almost
any device or system. A device does not need to implement the full interface.
The full interface (excluding the device-specific settings) is listed in
Appendix~\ref{app:device-interface}. The interface works in a similar fashion
to the \gls{EDSL}: devices do not have to implement all functionality, just as
views do not have to implement all type classes in the \gls{EDSL}. When the
device connects to the server for the first time, it communicates which parts
of the interface it implements.

At the time of writing, the following device families are supported and can
run the device software:
\begin{itemize}
	\item \texttt{POSIX} compatible systems connected via \gls{TCP}.

		This includes systems running \emph{Linux} and \emph{MacOS}.
	\item The \texttt{STM32} microcontroller family supported by
		\texttt{ChibiOS} connected via serial communication.

		This has been tested in particular on the \texttt{STM32f7x} series
		\gls{ARM} development board.
	\item Microcontrollers which are programmable in the \gls{Arduino} \gls{IDE}
		connected via serial communication or via \gls{TCP} over WiFi or
		Ethernet.

		This includes not only \gls{Arduino} compatible boards but also
		other boards capable of running \gls{Arduino} code. A port of the
		client has been made for the \texttt{ESP8266} powered \emph{NodeMCU},
		which is connected via \gls{TCP} over WiFi. A port has also been made
		for the regular \gls{Arduino} \emph{UNO} board, which has only
		\emph{2K} of \emph{RAM}. On devices with this little \emph{RAM}, the
		stack size and the available storage have to be smaller than the
		defaults, but they are still sufficient to hold a handful of
		\glspl{Task}.
\end{itemize}

\subsection{Client}
\subsubsection{Engine}
The client runs in a continuous loop, listening for input and executing
\glspl{Task} when they are due. The pseudocode for this is shown in
Algorithm~\ref{alg:client}. The \CI{input\_available} function waits for input
with a timeout, and this wait can be interrupted. The timeout determines the
number of loop iterations per time interval and is a parameter that can be set
at compile time for each device.
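
As an illustration, on a \texttt{POSIX} system such a wait with a timeout could be built on \CI{select}. The following is a minimal sketch under assumptions and not the actual interface code: the parameters \CI{fd} and \CI{timeout\_ms} are hypothetical, and the real function takes no arguments in the pseudocode.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical POSIX variant of input_available: returns 1 if fd becomes
 * readable within timeout_ms milliseconds, and 0 otherwise (timeout,
 * interruption by a signal, or error). */
int input_available(int fd, int timeout_ms)
{
	fd_set rfds;
	struct timeval tv;

	assert(fd >= 0 && fd < FD_SETSIZE);
	assert(timeout_ms >= 0);

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;

	/* select returns -1 early when a signal arrives, so the wait
	 * can be interrupted before the timeout expires. */
	return select(fd + 1, &rfds, NULL, NULL, &tv) > 0;
}
```

In such a sketch, a shorter timeout gives more loop iterations per second at the cost of more wake-ups.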

\begin{algorithm}
	\KwData{
		\textbf{list} $tasks$,
		\textbf{time} $tm$
	}

	\Begin{
		\While{true}{
			\If{input\_available$()$}{
				receive\_data()\;
			}

			$tm\leftarrow \text{now}()$\;
			\ForEach{$t\leftarrow tasks$}{
				\uIf{is\_interrupt$(t)$ \textbf{and} had\_interrupt$(t)$}{
					run\_task$(t)$\;
				}
				\ElseIf{$tm-t.\text{lastrun} > t.\text{interval}$}{
					run\_task$(t)$\;
					\uIf{$t.\text{interval}==0$}{
						delete\_task$(t)$\;
					}\Else{
						$t.\text{lastrun}\leftarrow tm$\;
					}
				}
			}
		}
	}
	\caption{Engine pseudocode}\label{alg:client}
\end{algorithm}
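
One iteration of the scheduling part of Algorithm~\ref{alg:client} can be sketched in \gls{C} as follows. This is a minimal sketch under assumptions: the \CI{demo\_task} record, its flags and the array representation are hypothetical and only serve the example, and clearing the interrupt flag after a run is likewise an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified task record used only in this sketch. */
struct demo_task {
	uint16_t interval;      /* 0 means: run once, then delete */
	unsigned long lastrun;  /* time of the last run */
	bool is_interrupt;      /* task is triggered by an interrupt */
	bool had_interrupt;     /* interrupt occurred since the last run */
	bool deleted;           /* stands in for delete_task */
	unsigned runs;          /* stands in for run_task */
};

/* One pass over all tasks at time tm, following the pseudocode. */
void engine_step(struct demo_task *tasks, size_t n, unsigned long tm)
{
	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct demo_task *t = &tasks[i];
		if (t->deleted)
			continue;
		if (t->is_interrupt && t->had_interrupt) {
			t->runs++;
			t->had_interrupt = false; /* assumption: flag is cleared */
		} else if (tm - t->lastrun > t->interval) {
			t->runs++;
			if (t->interval == 0)
				t->deleted = true;  /* one-shot task */
			else
				t->lastrun = tm;    /* remember the time of this run */
		}
	}
}
```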

\subsubsection{Storage}
\glspl{Task} and \glspl{SDS} are stored on the client in memory. Some devices
have very little memory, so memory space is expensive and needs to be used
optimally. Almost all microcontrollers support heaps nowadays; however, the
functions for allocating and freeing memory on the heap are not very space
efficient and often leave holes in the heap if allocations are not freed in
reverse order. To overcome this problem, the client allocates a big memory
segment in the global data block. This block of memory resides below the stack
and its size can be set in the interface implementation. The block is managed
in a way similar to how the entire memory space of the device is managed:
\glspl{Task} grow from the bottom up and \glspl{SDS} grow from the top down.

When a \gls{Task} is received, the program traverses the memory space from
the bottom up, jumping over all \glspl{Task}. A \gls{Task} is stored as the
structure followed directly by its bytecode. Therefore, it only takes two jumps
to determine the size of the \gls{Task}. When the program arrives at the end of
the last \gls{Task}, this location is returned and the newly received
\gls{Task} is copied there. The same method is applied analogously to
\glspl{SDS}, except that the \glspl{SDS} grow from the top down.
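
The two-ended layout described above can be illustrated with the following sketch. It is a simplification under assumptions: instead of traversing the stored objects it keeps two offsets, and the names \CI{mem}, \CI{MEMSIZE}, \CI{task\_alloc}, \CI{sds\_alloc} and \CI{mem\_free} are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEMSIZE 64 /* hypothetical size, kept small for illustration */

static uint8_t mem[MEMSIZE];        /* block in the global data segment */
static size_t task_end = 0;         /* first free byte above the Tasks  */
static size_t sds_start = MEMSIZE;  /* lowest byte occupied by an SDS   */

/* Number of free bytes between the Task area and the SDS area. */
size_t mem_free(void)
{
	assert(task_end <= sds_start);
	return sds_start - task_end;
}

/* Reserve n bytes for a Task; Tasks grow from the bottom up. */
void *task_alloc(size_t n)
{
	if (n > mem_free())
		return NULL;
	void *p = &mem[task_end];
	task_end += n;
	return p;
}

/* Reserve n bytes for an SDS; SDSs grow from the top down. */
void *sds_alloc(size_t n)
{
	if (n > mem_free())
		return NULL;
	sds_start -= n;
	return &mem[sds_start];
}
```

Because both areas share one free gap in the middle, neither kind of object has a fixed quota; an allocation only fails when the two areas would meet.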

When a \gls{Task} or \gls{SDS} is removed, all remaining objects are compacted
again. This means that if the first received \gls{Task} is removed, all
\glspl{Task} received later have to move back. Obviously, this is quite time
intensive, but holes in the memory cannot be permitted because the memory space
is so limited. This technique allows even the smallest tested microcontrollers,
with only $2K$ of \emph{RAM}, to hold several \glspl{Task} and \glspl{SDS}. If
this technique were not used, the usable memory space would shrink over time
and the client could not run for very long, since holes would inevitably be
created at some point.
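
The compaction step can be sketched with \CI{memmove}, which permits overlapping source and destination regions. The record layout used here (one length byte followed by the payload) and the function names are hypothetical and much simpler than the real \gls{Task} structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout: one length byte, then that many payload bytes. */
size_t record_size(const uint8_t *rec)
{
	return 1u + rec[0];
}

/* Remove the record starting at offset off from a buffer holding used
 * bytes of records. All later records are moved back so that no hole
 * remains. Returns the new number of used bytes. */
size_t remove_record(uint8_t *buf, size_t used, size_t off)
{
	size_t sz = record_size(buf + off);

	assert(off + sz <= used);
	memmove(buf + off, buf + off + sz, used - off - sz);
	return used - sz;
}
```

The cost is linear in the number of bytes stored after the removed record, which matches the observation that removing the oldest object is the most expensive case.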

The structure definitions and the helper functions for traversing them in
memory for \glspl{Task} and \glspl{SDS} are shown in Listing~\ref{lst:structs}.

\begin{lstlisting}[language=C,label={lst:structs},%
	caption={The data types storing the \glspl{Task} and \glspl{SDS}},float]
struct task {
	uint16_t tasklength;
	uint16_t interval;
	unsigned long lastrun;
	uint8_t taskid;
	uint8_t *bc;
};

struct task *task_head(void);
struct task *task_next(struct task *t);

struct sds {
	int id;
	int value;
	char type;
};

struct sds *sds_head(void);
struct sds *sds_next(struct sds *s);
\end{lstlisting}

\subsubsection{Interpretation}
The execution of a \gls{Task} is started by calling the \CI{run\_task} function,
which always begins by setting the program counter to zero and the stack
pointer to the bottom of the stack. After that, the interpreter executes one
step at a time while the program counter is smaller than the program length.
This code is shown in Listing~\ref{lst:interpr}. One execution step is
essentially a big switch statement over all possible bytecode instructions. The
implementations of some instructions are shown in the listing. The \CI{BCPush}
instruction is a little more complicated in the real code: \CI{BCValue}s are
encoded and are not all of the same length, so some decoding has to take place.

\begin{lstlisting}[language=C,label={lst:interpr},%
	caption={Rough code outline for interpretation}]
#define f16(p) (program[p]*256+program[(p)+1])

void run_task(struct task *t){
	uint8_t *program = t->bc;
	int plen = t->tasklength;
	int pc = 0;
	int sp = 0;
	while(pc < plen){
		switch(program[pc++]){
		case BCNOP:
			break;
		case BCPUSH:
			stack[sp++] = program[pc++]; // Simplified
			break;
		case BCPOP:
			sp--;
			break;
		case BCSDSSTORE:
			sds_store(f16(pc), stack[--sp]);
			pc+=2;
			break;
		// ...
		case BCADD: trace("add");
			stack[sp-2] = stack[sp-2] + stack[sp-1];
			sp -= 1;
			break;
		// ...
		case BCJMPT: trace("jmpt to %d", program[pc]);
			pc = stack[--sp] ? program[pc]-1 : pc+1;
			break;
		}
	}
}
\end{lstlisting}
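
To make the outline concrete, the following self-contained sketch interprets a tiny instruction set in the same style. It is an illustration under assumptions and not the real interpreter: the opcode names and values, the fixed stack size, the 16-bit push and the error handling are all hypothetical. The 16-bit operand is read high byte first, as in the \CI{f16} macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; the real bytecode uses its own numbering. */
enum { OP_NOP, OP_PUSH, OP_PUSH16, OP_POP, OP_ADD };

#define STACKSIZE 16

/* Read a 16-bit operand stored high byte first. */
static uint16_t read_u16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Run a program; on success store the top of the stack in *result and
 * return 0. Return -1 on malformed bytecode or stack over/underflow. */
int run_program(const uint8_t *program, size_t plen, int *result)
{
	int stack[STACKSIZE];
	size_t pc = 0, sp = 0;

	assert(program != NULL && result != NULL);
	while (pc < plen) {
		switch (program[pc++]) {
		case OP_NOP:
			break;
		case OP_PUSH:   /* one-byte immediate */
			if (pc >= plen || sp >= STACKSIZE) return -1;
			stack[sp++] = program[pc++];
			break;
		case OP_PUSH16: /* two-byte immediate */
			if (plen - pc < 2 || sp >= STACKSIZE) return -1;
			stack[sp++] = read_u16(&program[pc]);
			pc += 2;
			break;
		case OP_POP:
			if (sp == 0) return -1;
			sp--;
			break;
		case OP_ADD:
			if (sp < 2) return -1;
			stack[sp - 2] += stack[sp - 1];
			sp--;
			break;
		default:
			return -1;
		}
	}
	if (sp == 0) return -1;
	*result = stack[sp - 1];
	return 0;
}
```

The bounds checks are an addition for the sketch; on a constrained device they might be left out when the server is trusted to send well-formed bytecode.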

\subsection{Specification}
The server stores a description of every available device in a record type.
From the macro settings in the interface file, a profile is created that
describes the specification of the device. When the connection between the
server and a client is established, the server sends a request for the
specification. The client serializes its specification and sends it to the
server so that the server knows what the client is capable of. The exact
specification is shown in Listing~\ref{lst:devicespec} and stores the
peripheral availability, the memory available for storing \glspl{Task} and
\glspl{SDS}, and the size of the stack.

\begin{lstlisting}[label={lst:devicespec},
	caption={Device specification for \gls{mTask}-\glspl{Task}}]
:: MTaskDeviceSpec =
	{ haveLed     :: Bool
	, haveLCD     :: Bool
	, have...
	, bytesMemory :: Int
	, stackSize   :: Int
	, aPins       :: Int
	, dPins       :: Int
	}
\end{lstlisting}
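
On the client side, serializing such a specification could look like the sketch below. The wire format is an assumption made for illustration only (one flag byte, two 16-bit values stored high byte first, two pin counts); the actual message format is not described here, and the \CI{device\_spec} structure and \CI{spec\_serialize} function are hypothetical. Only the fields that are named explicitly in Listing~\ref{lst:devicespec} are included.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-side mirror of the named specification fields. */
struct device_spec {
	bool have_led;
	bool have_lcd;
	uint16_t bytes_memory;
	uint16_t stack_size;
	uint8_t a_pins;
	uint8_t d_pins;
};

#define SPEC_WIRE_SIZE 7

/* Write the specification to out in an assumed 7-byte format:
 * [flags][mem hi][mem lo][stack hi][stack lo][aPins][dPins].
 * Returns the number of bytes written. */
size_t spec_serialize(const struct device_spec *s, uint8_t out[SPEC_WIRE_SIZE])
{
	assert(s != NULL && out != NULL);
	out[0] = (uint8_t)((s->have_led ? 1u : 0u) | (s->have_lcd ? 2u : 0u));
	out[1] = (uint8_t)(s->bytes_memory >> 8);
	out[2] = (uint8_t)(s->bytes_memory & 0xFF);
	out[3] = (uint8_t)(s->stack_size >> 8);
	out[4] = (uint8_t)(s->stack_size & 0xFF);
	out[5] = s->a_pins;
	out[6] = s->d_pins;
	return SPEC_WIRE_SIZE;
}
```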