-\section{Application overview and workflow}
-The program can be divided into two main components namely the \textit{Crawler
-application} and the \textit{Input application}. The components are strictly
-separated by task and by application. The crawler is an application dedicated
-to the sole task of periodically crawling the sources asynchronously. The input
-is a web interface to a set of tools that can create, edit, remove and test
-crawlers via simple point and click user interfaces that can be worked with by
-someone without a computer science background.
+\section{Internals of the crawler generation module}
+The data marked by the user is in essence raw HTML containing a table with
+one table row per RSS feed entry. Several markers are placed within this HTML
+to simplify parsing and to remove the need for a full HTML parser. When the
+user presses submit, an HTTP POST request is prepared that sends all the
+gathered data to the backend for processing. The request is sent
+synchronously, so the user has to wait for the data to be processed; because
+of this synchronous transmission the user is notified immediately when the
+crawler has been added successfully. All preparation of the data into a form
+the backend can process is done on the client side using JavaScript.
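To make the submitted data concrete, the payload might look roughly like the following sketch; the field names and structure here are assumptions for illustration, not the application's actual schema.

```python
# Hypothetical sketch of the POST payload the backend might receive.
# Field names ('name', 'url', 'rows') are assumptions, not the real schema.
payload = {
    'name': 'Example agenda',            # information fields filled in by the user
    'url': 'http://example.org/agenda',
    'rows': [                            # raw table rows with user markers embedded
        '<tr><td><time>, <date> - <title></td></tr>',
        '<tr><td>Lorem ipsum</td></tr>',  # row without markings: stored, not processed
    ],
}
```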
+
+When the backend receives the HTTP POST data it saves both the raw data and
+the data from the information fields. The raw data is processed line by line
+to extract the entries that contain user markings. These entries are stripped
+of all HTML while a data structure is built that records the locations of the
+markings together with the original text. All data is stored, including the
+entries without user markings, but not all of it is processed. This
+seemingly useless data is kept because when the crawler later needs editing,
+the old data can be used to update it.
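A minimal sketch of this stripping step, assuming the markers are embedded as literal tokens such as `<time>` (the application's real marker syntax may differ):

```python
import re

# HTML tags, excluding the (assumed) user-marker tokens <time>, <date>, <title>.
TAG_RE = re.compile(r'<(?!/?(?:time|date|title)>)[^>]+>')
MARKER_RE = re.compile(r'<(?:time|date|title)>')

def strip_entry(raw):
    """Remove HTML tags from a marked table row and record where the
    user markers sit in the resulting plain text."""
    text = TAG_RE.sub('', raw)
    locations = [(m.group(0), m.start()) for m in MARKER_RE.finditer(text)]
    return text, locations

raw = '<tr><td><time>, <date> - <title></td></tr>'
text, locs = strip_entry(raw)
# text -> '<time>, <date> - <title>'
# locs -> [('<time>', 0), ('<date>', 8), ('<title>', 17)]
```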
+
+When the entries are isolated and processed they are converted to node-lists.
+A node-list is a literal list of words, where a word is interpreted in the
+broadest sense: a word can be a character, a single byte, a string or
+basically anything. In this case the node-lists are the original entry
+strings in which every user marker is replaced by a single node. As an
+example, take the following entry and its corresponding node-list, assuming
+that the user marked the time, title and date correctly. Markers are
+visualized by enclosing the name of the marker in angle brackets.
+\begin{flushleft}
+ Entry: \texttt{19:00, 2014-11-12 - Foobar}\\
+ Node-list: \texttt{['<time>', ',', ' ', '<date>', ' ', '-', ' ', '<title>']}
+\end{flushleft}
+
+The node-list is generated by appending the characters of the entry to the
+list one by one; when a user marking is encountered, the marking is
+translated to its category code and that code is added as a single word. The
+resulting node-lists are then handed to the actual algorithm, which converts
+them to a graph representation.
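The generation procedure described above can be sketched as follows, again assuming literal `<time>`-style marker tokens rather than the application's actual category codes:

```python
import re

# Assumed marker syntax; the real application may encode categories differently.
MARKER_RE = re.compile(r'<(?:time|date|title)>')

def to_node_list(entry):
    """Split a marked entry into a node-list: each marker becomes a
    single node, all remaining text is split into one-character nodes."""
    nodes = []
    pos = 0
    for m in MARKER_RE.finditer(entry):
        nodes.extend(entry[pos:m.start()])  # one node per plain character
        nodes.append(m.group(0))            # marker as a single node
        pos = m.end()
    nodes.extend(entry[pos:])               # trailing plain characters
    return nodes

print(to_node_list('<time>, <date> - <title>'))
# ['<time>', ',', ' ', '<date>', ' ', '-', ' ', '<title>']
```

This reproduces the node-list shown for the example entry above.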