From: Mart Lubbers <mart@martlubbers.net>
Date: Wed, 29 Oct 2014 15:29:20 +0000 (+0100)
Subject: part of algo explained
X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=5cc599b4cbfcaef87ebdb72bd0f352ea616beda4;p=bsc-thesis1415.git

part of algo explained
---

diff --git a/thesis2/2.methods.tex b/thesis2/2.methods.tex
index 477190d..3f67cb3 100644
--- a/thesis2/2.methods.tex
+++ b/thesis2/2.methods.tex
@@ -23,3 +23,25 @@ Generate xml
 \subsection{Interface}
 
 \subsection{Algorithm}
+\subsection{Preprocessing}
+The crawler receives the data as POST data embedded in an HTTP request. The
+POST data consists of several fields with information about the feed and a
+container holding the table in which the user markers are embedded. The
+entries are then extracted from this table and processed line by line.
+
+Line processing converts the raw HTML of a single table row into a plain
+string: all HTML tags are stripped, and the string is accompanied by a list
+of the marker items found in that row.
+
+Entries that do not contain any markers are left out of the next processing
+step. All data, including the entries without user markers, is nevertheless
+stored in the object for later reference, for example when editing patterns.
+
+In the last step the entries with markers are processed to build node-lists.
+A node-list is essentially the entry string in which every user marker is
+replaced by a pattern, so that the variable data, i.e.\ the data isolated by
+the markers, does not itself appear in the node-list.
+
+\subsection{Directed acyclic graphs}
+
+
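The Preprocessing subsection in the patch above describes how the crawler unpacks the POST data and splits the embedded table into entries. What follows is a minimal Python sketch of that step, not the thesis implementation. It assumes the following:

- The request body is form-encoded (`application/x-www-form-urlencoded`).
- The table sits in a single field. The field name `content` and the function `extract_rows` are hypothetical.
- The table is flat and well-formed, without nested rows. The regular expression relies on this.

```python
import re
from urllib.parse import parse_qs, quote

# Hypothetical name of the POST field that holds the table HTML.
CONTENT_FIELD = "content"

# Matches the inner HTML of each <tr>...</tr>; assumes a flat, well-formed table.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.IGNORECASE | re.DOTALL)


def extract_rows(post_body: str) -> tuple[dict[str, str], list[str]]:
    """Split a form-encoded POST body into feed fields and raw table rows."""
    # parse_qs maps every field name to a list of values; keep the first one.
    fields = {name: values[0] for name, values in parse_qs(post_body).items()}
    # The remaining fields describe the feed; the container holds the table.
    table_html = fields.pop(CONTENT_FIELD, "")
    return fields, ROW_RE.findall(table_html)


if __name__ == "__main__":
    table = "<table><tr><td>a</td></tr><tr class='x'><td>b</td></tr></table>"
    body = "feed=concerts&" + CONTENT_FIELD + "=" + quote(table)
    feed_fields, rows = extract_rows(body)
    for row in rows:  # the rows are then processed line by line
        print(row)
```

A real HTML parser would be more robust than the regular expression. It is used here only to keep the sketch short.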
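The line processing and filtering described in the patch could look roughly like the following Python sketch. It rests on several assumptions:

- A user marker is encoded inside a row as `<span class="marker" data-type="...">...</span>`. This encoding is hypothetical.
- The names `Marker`, `Entry`, `process_line` and `split_entries` are illustrative and not taken from the thesis code.
- Table cells are separated by a single space in the stripped string.

The standard-library `html.parser.HTMLParser` strips the tags and records each marker as a character span in the stripped string.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Marker:
    kind: str   # marker type, e.g. "date" (hypothetical)
    start: int  # start offset in the stripped text
    end: int    # end offset (exclusive) in the stripped text


@dataclass
class Entry:
    text: str
    markers: list[Marker] = field(default_factory=list)


class RowStripper(HTMLParser):
    """Strips all tags from one table row and records the marker spans.

    Assumes a marker is written as <span class="marker" data-type="...">.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []    # text fragments of the stripped row
        self.length = 0    # length of the stripped text so far
        self.markers = []
        self._spans = []   # one item per open <span>: (kind, start) or None

    def _append(self, text):
        self.parts.append(text)
        self.length += len(text)

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th") and self.length > 0:
            self._append(" ")  # separate table cells by a single space
        elif tag == "span":
            attributes = dict(attrs)
            classes = (attributes.get("class") or "").split()
            if "marker" in classes:
                kind = attributes.get("data-type") or "unknown"
                self._spans.append((kind, self.length))
            else:
                self._spans.append(None)

    def handle_endtag(self, tag):
        if tag == "span" and self._spans:
            opened = self._spans.pop()
            if opened is not None:
                kind, start = opened
                self.markers.append(Marker(kind, start, self.length))

    def handle_data(self, data):
        self._append(data)


def process_line(row_html: str) -> Entry:
    """Convert the raw HTML of one table row into a stripped string plus markers."""
    parser = RowStripper()
    parser.feed(row_html)
    parser.close()  # flush any buffered trailing text
    return Entry("".join(parser.parts), parser.markers)


def split_entries(rows):
    """Return (all entries, entries with at least one marker)."""
    entries = [process_line(row) for row in rows]
    return entries, [entry for entry in entries if entry.markers]
```

`split_entries` returns both lists, mirroring the text. Only the marked entries go on to the next step, while all entries are kept for later reference.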
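The node-list construction from the last preprocessing step can be sketched in Python under the following assumptions:

- A node-list is modelled as a list with one node per literal character and one placeholder node per marker.
- The placeholder format `{kind}` is hypothetical, since the patch does not define the patterns.
- The small `Marker` and `Entry` classes are illustrative containers for a stripped entry and its marker spans, given as character offsets.

```python
from dataclasses import dataclass, field


@dataclass
class Marker:
    kind: str   # marker type, e.g. "date" (hypothetical)
    start: int  # start offset in the entry text
    end: int    # end offset (exclusive)


@dataclass
class Entry:
    text: str
    markers: list[Marker] = field(default_factory=list)


def pattern_node(kind: str) -> str:
    # Hypothetical placeholder format; the actual patterns are not specified.
    return "{" + kind + "}"


def build_node_list(entry: Entry) -> list[str]:
    """Replace every marked span by a pattern node; keep the rest literally."""
    nodes = []
    pos = 0
    for marker in sorted(entry.markers, key=lambda m: m.start):
        if marker.start < pos:
            raise ValueError("overlapping markers are not supported")
        nodes.extend(entry.text[pos:marker.start])  # one node per literal character
        nodes.append(pattern_node(marker.kind))     # the isolated data itself is dropped
        pos = marker.end
    nodes.extend(entry.text[pos:])
    return nodes


if __name__ == "__main__":
    first = Entry("12 May: Jazz night", [Marker("date", 0, 6), Marker("title", 8, 18)])
    second = Entry("3 June: Rock gig", [Marker("date", 0, 6), Marker("title", 8, 16)])
    print("".join(build_node_list(first)))  # {date}: {title}
    assert build_node_list(first) == build_node_list(second)
```

Consider two entries that differ only in their marked data, such as `12 May: Jazz night` and `3 June: Rock gig` with the date and title marked. Both yield the same node-list, `{date}: {title}`. That is the point of keeping the variable data out of the node-lists. Keeping the nodes in a list rather than a flat string also means a literal `{` in the text can never be confused with a pattern node.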