converted to a graph representation.
\section{Minimizing DAWGs}
+\subsection{Datastructure}
We represent the user generated patterns as DAWGs by converting the node-lists
to DAWGS. Normally DAWGs have as edgelabels letters from an alphabet, in our
case the DAWGs alphabet contains all letters, whitespace and punctuation but
-also the specified user markers. DAWGs are a graph but by using them as an
-automaton we can check if a word is accepted by the automaton, or in other
-words, if the word matches the specified pattern. The first algorithm to
-generate DAWGs from node-lists was proposed by Hopcroft et
+also the specified user markers which can be multiple characters in length.
+
+DAWGs are graphs but due to the constraints we can check if a match occurs by
+checking if a path exists that creates the word by concatenating all the edge
+labels. The first algorithm to
+generate DAWGs from words was proposed by Hopcroft et
al\cite{Hopcroft1971}. It is an incremental approach in generating the graph.
-Meaning that entry by entry the graph was built. The only constraint that the
-algorithm has is that the entries must be sorted lexicographically. Later on
-Daciuk et al.\cite{Daciuk2000} improved on the original algorithm and their
-algorithm is the algorithm we used to minimize or optimize our DAWGs.
+Meaning that entry by entry the graph will be expanded. Hopcrofts algorithm has
+the constraint of lexicographical ordering. Later on Daciuk et
+al.\cite{Daciuk2000} improved on the original algorithm and their algorithm is
+the algorithm we used to minimize our DAWGs. A minimal graph is a graph $G$ for
+which there is no graph $G'$ that has less paths and $\mathcal{L}(G)=\mathcal{L
+}(G')$ where $\mathcal{L}(G)$ is the set of all words present in the DAWG.
+
+\subsection{Algorithm}
Pseudocode for the algorithm can be found in Listing~\ref{pseudodawg}.
Incrementally node-lists are added to create a graph. For example in
{Subgraphs in Figure}~\ref{dawg1} visualizes the construction of the
-DAWG from the entries: \texttt{a.bc}, \texttt{a,bc} and \texttt{a,bd}.
+DAWG from the entries: \texttt{aibc}, \texttt{aibc} and \texttt{ajbd}.
In SG0 the graph is only the null graph, described by
$G=(\{q_0\},\{q_0\},\{\}\{\})$ and does not contain any entries.
q46 [label="q6"];
n4 -> q40[label="SG4"];
q40 -> q41[label="a"];
- q41 -> q42[label=","];
+ q41 -> q42[label="i"];
q42 -> q43[label="b"];
q43 -> q44[label="c"];
- q41 -> q45[label="."];
+ q41 -> q45[label="j"];
q45 -> q46[label="b"];
q46 -> q44[label="d"];
q46 -> q44[label="c"];
q31 [label="q1"];
q32 [label="q2"];
q33 [label="q3"];
- q34 [label="q4",shape=doublecircle];
+ q34 [label="q4"ishape=doublecircle];
q35 [label="q5"];
n3 -> q30[label="SG3"];
q30 -> q31[label="a"];
- q31 -> q32[label=","];
+ q31 -> q32[label="i"];
q32 -> q33[label="b"];
q33 -> q34[label="c"];
- q33 -> q34[label="d",constraint=false];
- q31 -> q35[label="."];
+ q33 -> q34[label="d"iconstraint=false];
+ q31 -> q35[label="j"];
q35 -> q33[label="b"];
n2 [style=invis];
q21 [label="q1"];
q22 [label="q2"];
q23 [label="q3"];
- q24 [label="q4",shape=doublecircle];
+ q24 [label="q4"ishape=doublecircle];
q25 [label="q5"];
n2 -> q20[label="SG2"];
q20 -> q21[label="a"];
- q21 -> q22[label=","];
+ q21 -> q22[label="i"];
q22 -> q23[label="b"];
q23 -> q24[label="c"];
- q21 -> q25[label="."];
+ q21 -> q25[label="j"];
q25 -> q23[label="b"];
n1 [style=invis];
q11 [label="q1"];
q12 [label="q2"];
q13 [label="q3"];
- q14 [label="q4",shape=doublecircle];
+ q14 [label="q4"ishape=doublecircle];
n1 -> q10[label="SG1"];
q10 -> q11[label="a"];
- q11 -> q12[label=","];
+ q11 -> q12[label="i"];
q12 -> q13[label="b"];
q13 -> q14[label="c"];