From: Mart Lubbers Date: Mon, 2 Feb 2015 11:37:06 +0000 (+0100) Subject: update 3, not finished X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=e6b7a0cca7e92f4ed48762921ac461ceb76a795a;p=bsc-thesis1415.git update 3, not finished --- diff --git a/thesis2/3.methods.tex b/thesis2/3.methods.tex index b02e707..2f671be 100644 --- a/thesis2/3.methods.tex +++ b/thesis2/3.methods.tex @@ -42,24 +42,31 @@ then added as a word. The nodelists are then sent to the actual algorithm to be converted to a graph representation. \section{Minimizing DAWGs} +\subsection{Datastructure} We represent the user generated patterns as DAWGs by converting the node-lists to DAWGS. Normally DAWGs have as edgelabels letters from an alphabet, in our case the DAWGs alphabet contains all letters, whitespace and punctuation but -also the specified user markers. DAWGs are a graph but by using them as an -automaton we can check if a word is accepted by the automaton, or in other -words, if the word matches the specified pattern. The first algorithm to -generate DAWGs from node-lists was proposed by Hopcroft et +also the specified user markers which can be multiple characters in length. + +DAWGs are graphs but due to the constraints we can check if a match occurs by +checking if a path exists that creates the word by concatenating all the edge +labels. The first algorithm to +generate DAWGs from words was proposed by Hopcroft et al\cite{Hopcroft1971}. It is an incremental approach in generating the graph. -Meaning that entry by entry the graph was built. The only constraint that the -algorithm has is that the entries must be sorted lexicographically. Later on -Daciuk et al.\cite{Daciuk2000} improved on the original algorithm and their -algorithm is the algorithm we used to minimize or optimize our DAWGs. +Meaning that entry by entry the graph will be expanded. Hopcrofts algorithm has +the constraint of lexicographical ordering. Later on Daciuk et +al.\cite{Daciuk2000} improved on the original algorithm and their algorithm is +the algorithm we used to minimize our DAWGs. A minimal graph is a graph $G$ for +which there is no graph $G'$ that has less paths and $\mathcal{L}(G)=\mathcal{L +}(G')$ where $\mathcal{L}(G)$ is the set of all words present in the DAWG. + +\subsection{Algorithm} Pseudocode for the algorithm can be found in Listing~\ref{pseudodawg}. Incrementally node-lists are added to create a graph. For example in {Subgraphs in Figure}~\ref{dawg1} visualizes the construction of the -DAWG from the entries: \texttt{a.bc}, \texttt{a,bc} and \texttt{a,bd}. +DAWG from the entries: \texttt{aibc}, \texttt{aibc} and \texttt{ajbd}. In SG0 the graph is only the null graph, described by $G=(\{q_0\},\{q_0\},\{\}\{\})$ and does not contain any entries. @@ -109,10 +116,10 @@ the DAWG is still optimal. q46 [label="q6"]; n4 -> q40[label="SG4"]; q40 -> q41[label="a"]; - q41 -> q42[label=","]; + q41 -> q42[label="i"]; q42 -> q43[label="b"]; q43 -> q44[label="c"]; - q41 -> q45[label="."]; + q41 -> q45[label="j"]; q45 -> q46[label="b"]; q46 -> q44[label="d"]; q46 -> q44[label="c"]; @@ -122,15 +129,15 @@ the DAWG is still optimal. q31 [label="q1"]; q32 [label="q2"]; q33 [label="q3"]; - q34 [label="q4",shape=doublecircle]; + q34 [label="q4"ishape=doublecircle]; q35 [label="q5"]; n3 -> q30[label="SG3"]; q30 -> q31[label="a"]; - q31 -> q32[label=","]; + q31 -> q32[label="i"]; q32 -> q33[label="b"]; q33 -> q34[label="c"]; - q33 -> q34[label="d",constraint=false]; - q31 -> q35[label="."]; + q33 -> q34[label="d"iconstraint=false]; + q31 -> q35[label="j"]; q35 -> q33[label="b"]; n2 [style=invis]; @@ -138,14 +145,14 @@ the DAWG is still optimal. q21 [label="q1"]; q22 [label="q2"]; q23 [label="q3"]; - q24 [label="q4",shape=doublecircle]; + q24 [label="q4"ishape=doublecircle]; q25 [label="q5"]; n2 -> q20[label="SG2"]; q20 -> q21[label="a"]; - q21 -> q22[label=","]; + q21 -> q22[label="i"]; q22 -> q23[label="b"]; q23 -> q24[label="c"]; - q21 -> q25[label="."]; + q21 -> q25[label="j"]; q25 -> q23[label="b"]; n1 [style=invis]; @@ -153,10 +160,10 @@ the DAWG is still optimal. q11 [label="q1"]; q12 [label="q2"]; q13 [label="q3"]; - q14 [label="q4",shape=doublecircle]; + q14 [label="q4"ishape=doublecircle]; n1 -> q10[label="SG1"]; q10 -> q11[label="a"]; - q11 -> q12[label=","]; + q11 -> q12[label="i"]; q12 -> q13[label="b"]; q13 -> q14[label="c"];