}
\end{figure}
-\subsection{Minimality of the algorithm}
-
-
\subsection{Application to the extraction of patterns}
The text data in combination with the user markings cannot be converted
automatically to a DAWG using the algorithm we described. This is because the
\label{nddawg}
\centering
\includegraphics[width=\linewidth]{nddawg.eps}
+ \strut\\
\caption{Example of non-determinism}
\end{figure}
graphs accepting the same language there is a single graph with the least
number of states. Mihov~\cite{Mihov1998} has proven that the algorithm in its
original form produces this minimal graph. Our program converts the node-lists to DAWGs that
-can possibly contain non deterministic nodes and therefore one can argue about
-the minimality. Due to the nature of the determinism this is not the case. The
-non determinism is only visible when matching the data and not in the real
-graph since in the real graph we ...
+can contain non-deterministic transitions, and one could therefore question
+whether the Myhill--Nerode theorem and Mihov's proof still hold. They do,
+because the non-determinism is not a property of the stored graph: the graph
+only becomes non-deterministic when the categories are expanded, which happens
+only during matching.
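+
+To make this argument concrete, it can be sketched under a simplifying
+assumption that is ours and not necessarily the exact representation used by
+the program. Assume each category $C \in \mathcal{C}$ is stored as a single
+transition label, $L(C)$ is the set of words the category may match, and
+$\delta$ is the deterministic transition function of the stored graph over
+fixed words $a \in \Sigma$ and category labels. The matcher then effectively
+follows a set-valued function $\delta_{\mathrm{match}}$ obtained by expanding
+every category transition:
+\[
+  \delta(q, a) = q' \;\Rightarrow\; q' \in \delta_{\mathrm{match}}(q, a),
+  \qquad
+  \delta(q, C) = q' \;\Rightarrow\; q' \in \delta_{\mathrm{match}}(q, w)
+  \mbox{ for all } w \in L(C).
+\]
+If some $w \in L(C)$ also satisfies $\delta(q, w) = q''$ with $q'' \neq q'$,
+then $\delta_{\mathrm{match}}(q, w) \supseteq \{q', q''\}$ and the matcher has
+two ways to continue. Under this assumption the non-determinism exists only in
+$\delta_{\mathrm{match}}$, while minimization operates on $\delta$, so the
+Myhill--Nerode argument and Mihov's proof apply to the stored graph unchanged.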
+
+To pick the best path during matching, the program has to choose
+deterministically between possibly multiple paths, each of which may yield a
+different result. Several heuristics can be used to make this choice:
+\begin{itemize}
+ \item Maximum fields heuristic\\
+
+	This heuristic prefers the result in which the largest number
+	of categories is filled with actual text, so the largest number
+	of data fields is always filled. The downside is that data can
+	end up in the wrong field: a suboptimal split may divide text
+	that belongs in a single field over two separate fields.
+ \item Maximum path heuristic\\
+
+	This heuristic looks for the match with the largest number of
+	fixed-path transitions, i.e.\ transitions that do not occur
+	within a category. The rationale is that, because these paths
+	are hard-coded in the graph, they must be important. The
+	downside shows when hard-coded paths overlap with information
+	inside the categories. For example, a band called
+	\texttt{Location} could interfere strongly with a hard-coded
+	path that marks a location using the same word.
+\end{itemize}
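+
+Both heuristics can be read as choosing the candidate match with the highest
+score. In the following sketch the symbols are introduced for illustration
+only: $M$ is the set of complete matches of an input text, $F(m)$ is the
+number of categories that match $m$ fills with text, and $P(m)$ is the number
+of fixed-path transitions on $m$:
+\[
+  m_{\mathrm{fields}} = \arg\max_{m \in M} F(m),
+  \qquad
+  m_{\mathrm{path}} = \arg\max_{m \in M} P(m).
+\]
+The two choices need not coincide. Splitting one piece of text over two
+categories can raise $F(m)$ by one, which is why the maximum fields heuristic
+may prefer such a split. Conversely, reading the band name \texttt{Location}
+as the hard-coded location path raises $P(m)$, even when the word belongs in
+the band category.
+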
+If more were known about the categories, the maximum path heuristic would
+automatically become the best choice. When, as in our implementation, very
+little is known about them, both heuristics perform about the same.