From: Mart Lubbers Date: Mon, 3 Nov 2014 16:45:17 +0000 (+0100) Subject: update send franc X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=1f1621940b2a7e9183a8841c65541546ebbd521a;p=bsc-thesis1415.git update send franc --- diff --git a/thesis2/1.introduction.tex b/thesis2/1.introduction.tex index 2aae8a6..3efa80b 100644 --- a/thesis2/1.introduction.tex +++ b/thesis2/1.introduction.tex @@ -48,7 +48,7 @@ information so that the information will appear structured in the expensive. The program is built so that a programmer can easily add fields or categories -to the data to make it flexible for changes. +to the application, which reduces programming costs and avoids complex modifications. \section{Why RSS/Atom} Information from venues comes in various different format with for each format @@ -72,7 +72,9 @@ data are static and mainly only usable by computer science experts. There is a great need of data mining in non structured data because the data within companies and on the internet is piling up and are usually left to catch dust. -The project is a continuation of the past project done by Roelofs et -al.\cite{Roelofs2009}. The techniques described by Roelofs et al. are more -focussed on extracting data from websites and/or already isolated data so it -can be an addition to the current project. +The project is a side project of the earlier project done by Roelofs et +al.\cite{Roelofs2009}. Roelofs et al. describe a technique to recognize date +strings using an adapted Levenshtein algorithm. This technique can be fitted into +the current project because it works on a low level, whereas the technique we +describe works on a high level. The algorithm only works on already isolated +data. 
diff --git a/thesis2/2.methods.tex b/thesis2/2.methods.tex index 9df1941..db5e56c 100644 --- a/thesis2/2.methods.tex +++ b/thesis2/2.methods.tex @@ -22,7 +22,6 @@ Generate xml \section{Crawler application} \subsection{Interface} -\subsection{Algorithm} \subsection{Preprocessing} When the data is received by the crawler the data is embedded as POST data in a HTTP request. The POST data consists of several fields with information about @@ -56,7 +55,7 @@ and intermediate states are marked with a single circle. \caption{Sample DAG} \label{fig:f21} \centering - \digraph[]{graph2}{ + \digraph[]{graph21}{ rankdir=LR; 1, 2 [shape="circle"]; 3, 4 [shape="doublecircle"]; @@ -66,6 +65,87 @@ and intermediate states are marked with a single circle. } \end{figure} -Using the algorithm described by Hopcroft et al\cite{Hopcroft1971} the -nodelists are converted into minimal directed acyclic graphs with a low -complexity($\mathcal{O}(N\log{N})$). +An early algorithm to generate minimal DAGs was proposed by +Hopcroft\cite{Hopcroft1971}. The algorithm he described was not incremental and had +a complexity of $\mathcal{O}(N\log{N})$. Daciuk et al.\cite{Daciuk2000} later +extended the algorithm and created an incremental one without increasing the +computational complexity. The non-incremental algorithm from Daciuk et al. is +used to convert the nodelists to a graph. 
+ +For example, constructing a graph from the entries \textit{a.bc} and +\textit{a,bc} proceeds in the following steps: + +\begin{figure}[H] + \caption{Sample DAG, first entry} + \label{fig:f22} + \centering + \digraph[]{graph22}{ + rankdir=LR; + 1,2,3,5 [shape="circle"]; + 5 [shape="doublecircle"]; + 1 -> 2 [label="a"]; + 2 -> 3 [label="."]; + 3 -> 4 [label="b"]; + 4 -> 5 [label="c"]; + } +\end{figure} + +\begin{figure}[H] + \caption{Sample DAG, second entry} + \label{fig:f23} + \centering + \digraph[]{graph23}{ + rankdir=LR; + 1,2,3,5,6 [shape="circle"]; + 5 [shape="doublecircle"]; + 1 -> 2 [label="a"]; + 2 -> 3 [label="."]; + 3 -> 4 [label="b"]; + 4 -> 5 [label="c"]; + + 2 -> 6 [label=","]; + 6 -> 4 [label="b"]; + } +\end{figure} + +\subsection{Defining categories} +pass + +\subsection{Process} +Proposal was written + + +First HTML/mail/fax/RSS, worst case RSS + + +After some research and determining the scope of the project we decided to +handle only RSS, because RSS tends to force structure in the data: RSS feeds +are often generated by the website and are thus reliable and consistent. We found a +couple of good RSS feeds. + + +At first the general framework was designed and implemented, no method yet. + + +Started with method for recognizing separators. + + +Found research paper about an algorithm that can create directed acyclic graphs +from strings; although it was designed to compress word lists, it can be +(mis)used to extract information. + + +Implementation of DAG algorithm found and tied to the program. + + +Command line program ready. Conversation with both supervisors, GUI had to be +made. + +Step by step GUI created. Web interface as a control center for the crawlers. + + +GUI optimized. + + +Concluded that the program does not reach a wide audience due to lack of well +structured RSS feeds. 
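Note on the 2.methods.tex hunk above: the graph construction for the two sample entries a.bc and a,bc can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it builds a plain trie first and then merges equivalent states bottom-up, which is a simpler technique than the algorithm of Daciuk et al. and is not claimed to be the implementation used in the thesis. The names Node, build_trie, minimize, count_states and accepts are hypothetical and do not come from the repository.

```python
class Node:
    """One state of the automaton (hypothetical helper, not from the thesis)."""

    def __init__(self):
        self.edges = {}     # edge label (one character) -> target Node
        self.final = False  # True for accepting states (double circle)


def build_trie(words):
    """Build an unminimized trie that accepts exactly the given words."""
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, Node())
        node.final = True
    return root


def minimize(node, register):
    """Merge equivalent states bottom-up and return the canonical state."""
    for label, child in list(node.edges.items()):
        node.edges[label] = minimize(child, register)
    # Two states are equivalent when they agree on finality and on every
    # (label, canonical target) pair.
    key = (node.final,
           tuple(sorted((label, id(child))
                        for label, child in node.edges.items())))
    return register.setdefault(key, node)


def count_states(root):
    """Count the distinct states reachable from root."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.edges.values())
    return len(seen)


def accepts(root, word):
    """Return True if the automaton accepts word."""
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final


trie = build_trie(["a.bc", "a,bc"])
trie_states = count_states(trie)   # 8 states before merging
dag = minimize(trie, {})
print(trie_states, count_states(dag))  # 8 5
```

Full merging also identifies the two states reached over "." and ",", since both continue with "bc". The sketch therefore ends with five states, one fewer than the six nodes drawn in the second sample figure, where only the shared suffix after "b" is merged.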
diff --git a/thesis2/graph21.dot b/thesis2/graph21.dot new file mode 100644 index 0000000..887fc4b --- /dev/null +++ b/thesis2/graph21.dot @@ -0,0 +1,8 @@ +digraph graph21 { +rankdir=LR; +1, 2 [shape="circle"]; +3, 4 [shape="doublecircle"]; +1 -> 2 [label="a"]; +2 -> 3 [label="a"]; +2 -> 4 [label="b"]; +} diff --git a/thesis2/graph21.ps b/thesis2/graph21.ps new file mode 100644 index 0000000..4cd7a19 --- /dev/null +++ b/thesis2/graph21.ps @@ -0,0 +1,309 @@ +%!PS-Adobe-3.0 +%%Creator: graphviz version 2.38.0 (20140413.2041) +%%Title: graph21 +%%Pages: (atend) +%%BoundingBox: (atend) +%%EndComments +save +%%BeginProlog +/DotDict 200 dict def +DotDict begin + +/setupLatin1 { +mark +/EncodingVector 256 array def + EncodingVector 0 + +ISOLatin1Encoding 0 255 getinterval putinterval +EncodingVector 45 /hyphen put + +% Set up ISO Latin 1 character encoding +/starnetISO { + dup dup findfont dup length dict begin + { 1 index /FID ne { def }{ pop pop } ifelse + } forall + /Encoding EncodingVector def + currentdict end definefont +} def +/Times-Roman starnetISO def +/Times-Italic starnetISO def +/Times-Bold starnetISO def +/Times-BoldItalic starnetISO def +/Helvetica starnetISO def +/Helvetica-Oblique starnetISO def +/Helvetica-Bold starnetISO def +/Helvetica-BoldOblique starnetISO def +/Courier starnetISO def +/Courier-Oblique starnetISO def +/Courier-Bold starnetISO def +/Courier-BoldOblique starnetISO def +cleartomark +} bind def + +%%BeginResource: procset graphviz 0 0 +/coord-font-family /Times-Roman def +/default-font-family /Times-Roman def +/coordfont coord-font-family findfont 8 scalefont def + +/InvScaleFactor 1.0 def +/set_scale { + dup 1 exch div /InvScaleFactor exch def + scale +} bind def + +% styles +/solid { [] 0 setdash } bind def +/dashed { [9 InvScaleFactor mul dup ] 0 setdash } bind def +/dotted { [1 InvScaleFactor mul 6 InvScaleFactor mul] 0 setdash } bind def +/invis {/fill {newpath} def /stroke {newpath} def /show {pop newpath} def} bind def +/bold { 2 
setlinewidth } bind def +/filled { } bind def +/unfilled { } bind def +/rounded { } bind def +/diagonals { } bind def +/tapered { } bind def + +% hooks for setting color +/nodecolor { sethsbcolor } bind def +/edgecolor { sethsbcolor } bind def +/graphcolor { sethsbcolor } bind def +/nopcolor {pop pop pop} bind def + +/beginpage { % i j npages + /npages exch def + /j exch def + /i exch def + /str 10 string def + npages 1 gt { + gsave + coordfont setfont + 0 0 moveto + (\() show i str cvs show (,) show j str cvs show (\)) show + grestore + } if +} bind def + +/set_font { + findfont exch + scalefont setfont +} def + +% draw text fitted to its expected width +/alignedtext { % width text + /text exch def + /width exch def + gsave + width 0 gt { + [] 0 setdash + text stringwidth pop width exch sub text length div 0 text ashow + } if + grestore +} def + +/boxprim { % xcorner ycorner xsize ysize + 4 2 roll + moveto + 2 copy + exch 0 rlineto + 0 exch rlineto + pop neg 0 rlineto + closepath +} bind def + +/ellipse_path { + /ry exch def + /rx exch def + /y exch def + /x exch def + matrix currentmatrix + newpath + x y translate + rx ry scale + 0 0 1 0 360 arc + setmatrix +} bind def + +/endpage { showpage } bind def +/showpage { } def + +/layercolorseq + [ % layer color sequence - darkest to lightest + [0 0 0] + [.2 .8 .8] + [.4 .8 .8] + [.6 .8 .8] + [.8 .8 .8] + ] +def + +/layerlen layercolorseq length def + +/setlayer {/maxlayer exch def /curlayer exch def + layercolorseq curlayer 1 sub layerlen mod get + aload pop sethsbcolor + /nodecolor {nopcolor} def + /edgecolor {nopcolor} def + /graphcolor {nopcolor} def +} bind def + +/onlayer { curlayer ne {invis} if } def + +/onlayers { + /myupper exch def + /mylower exch def + curlayer mylower lt + curlayer myupper gt + or + {invis} if +} def + +/curlayer 0 def + +%%EndResource +%%EndProlog +%%BeginSetup +14 default-font-family set_font +1 setmiterlimit +% /arrowlength 10 def +% /arrowwidth 5 def + +% make sure pdfmark is harmless 
for PS-interpreters other than Distiller +/pdfmark where {pop} {userdict /pdfmark /cleartomark load put} ifelse +% make '<<' and '>>' safe on PS Level 1 devices +/languagelevel where {pop languagelevel}{1} ifelse +2 lt { + userdict (<<) cvn ([) cvn load put + userdict (>>) cvn ([) cvn load put +} if + +%%EndSetup +setupLatin1 +%%Page: 1 1 +%%PageBoundingBox: 36 36 246 150 +%%PageOrientation: Portrait +0 0 1 beginpage +gsave +36 36 210 114 boxprim clip newpath +1 1 set_scale 0 rotate 40 40 translate +% 1 +gsave +1 setlinewidth +0 0 0 nodecolor +18 53 18 18 ellipse_path stroke +0 0 0 nodecolor +14 /Times-Roman set_font +14.5 49.3 moveto 7 (1) alignedtext +grestore +% 2 +gsave +1 setlinewidth +0 0 0 nodecolor +97 53 18 18 ellipse_path stroke +0 0 0 nodecolor +14 /Times-Roman set_font +93.5 49.3 moveto 7 (2) alignedtext +grestore +% 1->2 +gsave +1 setlinewidth +0 0 0 edgecolor +newpath 36.09 53 moveto +45.66 53 57.82 53 68.68 53 curveto +stroke +0 0 0 edgecolor +newpath 68.96 56.5 moveto +78.96 53 lineto +68.96 49.5 lineto +closepath fill +1 setlinewidth +solid +0 0 0 edgecolor +newpath 68.96 56.5 moveto +78.96 53 lineto +68.96 49.5 lineto +closepath stroke +0 0 0 edgecolor +14 /Times-Roman set_font +54 56.8 moveto 7 (a) alignedtext +grestore +% 3 +gsave +1 setlinewidth +0 0 0 nodecolor +180 84 18 18 ellipse_path stroke +1 setlinewidth +0 0 0 nodecolor +180 84 22 22 ellipse_path stroke +0 0 0 nodecolor +14 /Times-Roman set_font +176.5 80.3 moveto 7 (3) alignedtext +grestore +% 2->3 +gsave +1 setlinewidth +0 0 0 edgecolor +newpath 114 59.12 moveto +124.16 63.01 137.61 68.16 149.64 72.76 curveto +stroke +0 0 0 edgecolor +newpath 148.72 76.16 moveto +159.31 76.46 lineto +151.22 69.62 lineto +closepath fill +1 setlinewidth +solid +0 0 0 edgecolor +newpath 148.72 76.16 moveto +159.31 76.46 lineto +151.22 69.62 lineto +closepath stroke +0 0 0 edgecolor +14 /Times-Roman set_font +133 71.8 moveto 7 (a) alignedtext +grestore +% 4 +gsave +1 setlinewidth +0 0 0 nodecolor +180 22 
18 18 ellipse_path stroke +1 setlinewidth +0 0 0 nodecolor +180 22 22 22 ellipse_path stroke +0 0 0 nodecolor +14 /Times-Roman set_font +176.5 18.3 moveto 7 (4) alignedtext +grestore +% 2->4 +gsave +1 setlinewidth +0 0 0 edgecolor +newpath 114 46.88 moveto +124.16 42.99 137.61 37.84 149.64 33.24 curveto +stroke +0 0 0 edgecolor +newpath 151.22 36.38 moveto +159.31 29.54 lineto +148.72 29.84 lineto +closepath fill +1 setlinewidth +solid +0 0 0 edgecolor +newpath 151.22 36.38 moveto +159.31 29.54 lineto +148.72 29.84 lineto +closepath stroke +0 0 0 edgecolor +14 /Times-Roman set_font +133 43.8 moveto 7 (b) alignedtext +grestore +endpage +showpage +grestore +%%PageTrailer +%%EndPage: 1 +%%Trailer +%%Pages: 1 +%%BoundingBox: 36 36 246 150 +end +restore +%%EOF diff --git a/thesis2/graph22.dot b/thesis2/graph22.dot new file mode 100644 index 0000000..f797302 --- /dev/null +++ b/thesis2/graph22.dot @@ -0,0 +1,9 @@ +digraph graph22 { +rankdir=LR; +1,2,3,5 [shape="circle"]; +5 [shape="doublecircle"]; +1 -> 2 [label="a"]; +2 -> 3 [label="."]; +3 -> 4 [label="b"]; +4 -> 5 [label="c"]; +} diff --git a/thesis2/graph22.ps b/thesis2/graph22.ps new file mode 100644 index 0000000..7e806e8 --- /dev/null +++ b/thesis2/graph22.ps @@ -0,0 +1,338 @@ +%!PS-Adobe-3.0 +%%Creator: graphviz version 2.38.0 (20140413.2041) +%%Title: graph22 +%%Pages: (atend) +%%BoundingBox: (atend) +%%EndComments +save +%%BeginProlog +/DotDict 200 dict def +DotDict begin + +/setupLatin1 { +mark +/EncodingVector 256 array def + EncodingVector 0 + +ISOLatin1Encoding 0 255 getinterval putinterval +EncodingVector 45 /hyphen put + +% Set up ISO Latin 1 character encoding +/starnetISO { + dup dup findfont dup length dict begin + { 1 index /FID ne { def }{ pop pop } ifelse + } forall + /Encoding EncodingVector def + currentdict end definefont +} def +/Times-Roman starnetISO def +/Times-Italic starnetISO def +/Times-Bold starnetISO def +/Times-BoldItalic starnetISO def +/Helvetica starnetISO def +/Helvetica-Oblique 
starnetISO def +/Helvetica-Bold starnetISO def +/Helvetica-BoldOblique starnetISO def +/Courier starnetISO def +/Courier-Oblique starnetISO def +/Courier-Bold starnetISO def +/Courier-BoldOblique starnetISO def +cleartomark +} bind def + +%%BeginResource: procset graphviz 0 0 +/coord-font-family /Times-Roman def +/default-font-family /Times-Roman def +/coordfont coord-font-family findfont 8 scalefont def + +/InvScaleFactor 1.0 def +/set_scale { + dup 1 exch div /InvScaleFactor exch def + scale +} bind def + +% styles +/solid { [] 0 setdash } bind def +/dashed { [9 InvScaleFactor mul dup ] 0 setdash } bind def +/dotted { [1 InvScaleFactor mul 6 InvScaleFactor mul] 0 setdash } bind def +/invis {/fill {newpath} def /stroke {newpath} def /show {pop newpath} def} bind def +/bold { 2 setlinewidth } bind def +/filled { } bind def +/unfilled { } bind def +/rounded { } bind def +/diagonals { } bind def +/tapered { } bind def + +% hooks for setting color +/nodecolor { sethsbcolor } bind def +/edgecolor { sethsbcolor } bind def +/graphcolor { sethsbcolor } bind def +/nopcolor {pop pop pop} bind def + +/beginpage { % i j npages + /npages exch def + /j exch def + /i exch def + /str 10 string def + npages 1 gt { + gsave + coordfont setfont + 0 0 moveto + (\() show i str cvs show (,) show j str cvs show (\)) show + grestore + } if +} bind def + +/set_font { + findfont exch + scalefont setfont +} def + +% draw text fitted to its expected width +/alignedtext { % width text + /text exch def + /width exch def + gsave + width 0 gt { + [] 0 setdash + text stringwidth pop width exch sub text length div 0 text ashow + } if + grestore +} def + +/boxprim { % xcorner ycorner xsize ysize + 4 2 roll + moveto + 2 copy + exch 0 rlineto + 0 exch rlineto + pop neg 0 rlineto + closepath +} bind def + +/ellipse_path { + /ry exch def + /rx exch def + /y exch def + /x exch def + matrix currentmatrix + newpath + x y translate + rx ry scale + 0 0 1 0 360 arc + setmatrix +} bind def + +/endpage { 
showpage } bind def +/showpage { } def + +/layercolorseq + [ % layer color sequence - darkest to lightest + [0 0 0] + [.2 .8 .8] + [.4 .8 .8] + [.6 .8 .8] + [.8 .8 .8] + ] +def + +/layerlen layercolorseq length def + +/setlayer {/maxlayer exch def /curlayer exch def + layercolorseq curlayer 1 sub layerlen mod get + aload pop sethsbcolor + /nodecolor {nopcolor} def + /edgecolor {nopcolor} def + /graphcolor {nopcolor} def +} bind def + +/onlayer { curlayer ne {invis} if } def + +/onlayers { + /myupper exch def + /mylower exch def + curlayer mylower lt + curlayer myupper gt + or + {invis} if +} def + +/curlayer 0 def + +%%EndResource +%%EndProlog +%%BeginSetup +14 default-font-family set_font +1 setmiterlimit +% /arrowlength 10 def +% /arrowwidth 5 def + +% make sure pdfmark is harmless for PS-interpreters other than Distiller +/pdfmark where {pop} {userdict /pdfmark /cleartomark load put} ifelse +% make '<<' and '>>' safe on PS Level 1 devices +/languagelevel where {pop languagelevel}{1} ifelse +2 lt { + userdict (<<) cvn ([) cvn load put + userdict (>>) cvn ([) cvn load put +} if + +%%EndSetup +setupLatin1 +%%Page: 1 1 +%%PageBoundingBox: 36 36 419 88 +%%PageOrientation: Portrait +0 0 1 beginpage +gsave +36 36 383 52 boxprim clip newpath +1 1 set_scale 0 rotate 40 40 translate +% 1 +gsave +1 setlinewidth +0 0 0 nodecolor +18 22 18 18 ellipse_path stroke +0 0 0 nodecolor +14 /Times-Roman set_font +14.5 18.3 moveto 7 (1) alignedtext +grestore +% 2 +gsave +1 setlinewidth +0 0 0 nodecolor +97 22 18 18 ellipse_path stroke +0 0 0 nodecolor +14 /Times-Roman set_font +93.5 18.3 moveto 7 (2) alignedtext +grestore +% 1->2 +gsave +1 setlinewidth +0 0 0 edgecolor +newpath 36.09 22 moveto +45.66 22 57.82 22 68.68 22 curveto +stroke +0 0 0 edgecolor +newpath 68.96 25.5 moveto +78.96 22 lineto +68.96 18.5 lineto +closepath fill +1 setlinewidth +solid +0 0 0 edgecolor +newpath 68.96 25.5 moveto +78.96 22 lineto +68.96 18.5 lineto +closepath stroke +0 0 0 edgecolor +14 /Times-Roman 
set_font +54 25.8 moveto 7 (a) alignedtext +grestore +% 3 +gsave +1 setlinewidth +0 0 0 nodecolor +173 22 18 18 ellipse_path stroke +0 0 0 nodecolor +14 /Times-Roman set_font +169.5 18.3 moveto 7 (3) alignedtext +grestore +% 2->3 +gsave +1 setlinewidth +0 0 0 edgecolor +newpath 115.16 22 moveto +123.94 22 134.84 22 144.73 22 curveto +stroke +0 0 0 edgecolor +newpath 144.93 25.5 moveto +154.93 22 lineto +144.93 18.5 lineto +closepath fill +1 setlinewidth +solid +0 0 0 edgecolor +newpath 144.93 25.5 moveto +154.93 22 lineto +144.93 18.5 lineto +closepath stroke +0 0 0 edgecolor +14 /Times-Roman set_font +133 25.8 moveto 4 (.) alignedtext +grestore +% 4 +gsave +1 setlinewidth +0 0 0 nodecolor +261 22 27 18 ellipse_path stroke +0 0 0 nodecolor +14 /Times-Roman set_font +257.5 18.3 moveto 7 (4) alignedtext +grestore +% 3->4 +gsave +1 setlinewidth +0 0 0 edgecolor +newpath 191.4 22 moveto +200.72 22 212.56 22 223.77 22 curveto +stroke +0 0 0 edgecolor +newpath 223.8 25.5 moveto +233.8 22 lineto +223.8 18.5 lineto +closepath fill +1 setlinewidth +solid +0 0 0 edgecolor +newpath 223.8 25.5 moveto +233.8 22 lineto +223.8 18.5 lineto +closepath stroke +0 0 0 edgecolor +14 /Times-Roman set_font +209 25.8 moveto 7 (b) alignedtext +grestore +% 5 +gsave +1 setlinewidth +0 0 0 nodecolor +353 22 18 18 ellipse_path stroke +1 setlinewidth +0 0 0 nodecolor +353 22 22 22 ellipse_path stroke +0 0 0 nodecolor +14 /Times-Roman set_font +349.5 18.3 moveto 7 (5) alignedtext +grestore +% 4->5 +gsave +1 setlinewidth +0 0 0 edgecolor +newpath 288.03 22 moveto +298.23 22 310.03 22 320.69 22 curveto +stroke +0 0 0 edgecolor +newpath 320.87 25.5 moveto +330.87 22 lineto +320.87 18.5 lineto +closepath fill +1 setlinewidth +solid +0 0 0 edgecolor +newpath 320.87 25.5 moveto +330.87 22 lineto +320.87 18.5 lineto +closepath stroke +0 0 0 edgecolor +14 /Times-Roman set_font +306 25.8 moveto 7 (c) alignedtext +grestore +endpage +showpage +grestore +%%PageTrailer +%%EndPage: 1 +%%Trailer +%%Pages: 1 
+%%BoundingBox: 36 36 419 88 +end +restore +%%EOF diff --git a/thesis2/graph23.dot b/thesis2/graph23.dot new file mode 100644 index 0000000..8a793b3 --- /dev/null +++ b/thesis2/graph23.dot @@ -0,0 +1,12 @@ +digraph graph23 { +rankdir=LR; +1,2,3,5,6 [shape="circle"]; +5 [shape="doublecircle"]; +1 -> 2 [label="a"]; +2 -> 3 [label="."]; +3 -> 4 [label="b"]; +4 -> 5 [label="c"]; + +2 -> 6 [label=","]; +6 -> 4 [label="b"]; +} diff --git a/thesis2/graph23.ps b/thesis2/graph23.ps new file mode 100644 index 0000000..8335da7 --- /dev/null +++ b/thesis2/graph23.ps @@ -0,0 +1,393 @@ +%!PS-Adobe-3.0 +%%Creator: graphviz version 2.38.0 (20140413.2041) +%%Title: graph23 +%%Pages: (atend) +%%BoundingBox: (atend) +%%EndComments +save +%%BeginProlog +/DotDict 200 dict def +DotDict begin + +/setupLatin1 { +mark +/EncodingVector 256 array def + EncodingVector 0 + +ISOLatin1Encoding 0 255 getinterval putinterval +EncodingVector 45 /hyphen put + +% Set up ISO Latin 1 character encoding +/starnetISO { + dup dup findfont dup length dict begin + { 1 index /FID ne { def }{ pop pop } ifelse + } forall + /Encoding EncodingVector def + currentdict end definefont +} def +/Times-Roman starnetISO def +/Times-Italic starnetISO def +/Times-Bold starnetISO def +/Times-BoldItalic starnetISO def +/Helvetica starnetISO def +/Helvetica-Oblique starnetISO def +/Helvetica-Bold starnetISO def +/Helvetica-BoldOblique starnetISO def +/Courier starnetISO def +/Courier-Oblique starnetISO def +/Courier-Bold starnetISO def +/Courier-BoldOblique starnetISO def +cleartomark +} bind def + +%%BeginResource: procset graphviz 0 0 +/coord-font-family /Times-Roman def +/default-font-family /Times-Roman def +/coordfont coord-font-family findfont 8 scalefont def + +/InvScaleFactor 1.0 def +/set_scale { + dup 1 exch div /InvScaleFactor exch def + scale +} bind def + +% styles +/solid { [] 0 setdash } bind def +/dashed { [9 InvScaleFactor mul dup ] 0 setdash } bind def +/dotted { [1 InvScaleFactor mul 6 InvScaleFactor mul] 
0 setdash } bind def +/invis {/fill {newpath} def /stroke {newpath} def /show {pop newpath} def} bind def +/bold { 2 setlinewidth } bind def +/filled { } bind def +/unfilled { } bind def +/rounded { } bind def +/diagonals { } bind def +/tapered { } bind def + +% hooks for setting color +/nodecolor { sethsbcolor } bind def +/edgecolor { sethsbcolor } bind def +/graphcolor { sethsbcolor } bind def +/nopcolor {pop pop pop} bind def + +/beginpage { % i j npages + /npages exch def + /j exch def + /i exch def + /str 10 string def + npages 1 gt { + gsave + coordfont setfont + 0 0 moveto + (\() show i str cvs show (,) show j str cvs show (\)) show + grestore + } if +} bind def + +/set_font { + findfont exch + scalefont setfont +} def + +% draw text fitted to its expected width +/alignedtext { % width text + /text exch def + /width exch def + gsave + width 0 gt { + [] 0 setdash + text stringwidth pop width exch sub text length div 0 text ashow + } if + grestore +} def + +/boxprim { % xcorner ycorner xsize ysize + 4 2 roll + moveto + 2 copy + exch 0 rlineto + 0 exch rlineto + pop neg 0 rlineto + closepath +} bind def + +/ellipse_path { + /ry exch def + /rx exch def + /y exch def + /x exch def + matrix currentmatrix + newpath + x y translate + rx ry scale + 0 0 1 0 360 arc + setmatrix +} bind def + +/endpage { showpage } bind def +/showpage { } def + +/layercolorseq + [ % layer color sequence - darkest to lightest + [0 0 0] + [.2 .8 .8] + [.4 .8 .8] + [.6 .8 .8] + [.8 .8 .8] + ] +def + +/layerlen layercolorseq length def + +/setlayer {/maxlayer exch def /curlayer exch def + layercolorseq curlayer 1 sub layerlen mod get + aload pop sethsbcolor + /nodecolor {nopcolor} def + /edgecolor {nopcolor} def + /graphcolor {nopcolor} def +} bind def + +/onlayer { curlayer ne {invis} if } def + +/onlayers { + /myupper exch def + /mylower exch def + curlayer mylower lt + curlayer myupper gt + or + {invis} if +} def + +/curlayer 0 def + +%%EndResource +%%EndProlog +%%BeginSetup +14 
default-font-family set_font +1 setmiterlimit +% /arrowlength 10 def +% /arrowwidth 5 def + +% make sure pdfmark is harmless for PS-interpreters other than Distiller +/pdfmark where {pop} {userdict /pdfmark /cleartomark load put} ifelse +% make '<<' and '>>' safe on PS Level 1 devices +/languagelevel where {pop languagelevel}{1} ifelse +2 lt { + userdict (<<) cvn ([) cvn load put + userdict (>>) cvn ([) cvn load put +} if + +%%EndSetup +setupLatin1 +%%Page: 1 1 +%%PageBoundingBox: 36 36 419 134 +%%PageOrientation: Portrait +0 0 1 beginpage +gsave +36 36 383 98 boxprim clip newpath +1 1 set_scale 0 rotate 40 40 translate +% 1 +gsave +1 setlinewidth +0 0 0 nodecolor +18 47 18 18 ellipse_path stroke +0 0 0 nodecolor +14 /Times-Roman set_font +14.5 43.3 moveto 7 (1) alignedtext +grestore +% 2 +gsave +1 setlinewidth +0 0 0 nodecolor +97 47 18 18 ellipse_path stroke +0 0 0 nodecolor +14 /Times-Roman set_font +93.5 43.3 moveto 7 (2) alignedtext +grestore +% 1->2 +gsave +1 setlinewidth +0 0 0 edgecolor +newpath 36.09 47 moveto +45.66 47 57.82 47 68.68 47 curveto +stroke +0 0 0 edgecolor +newpath 68.96 50.5 moveto +78.96 47 lineto +68.96 43.5 lineto +closepath fill +1 setlinewidth +solid +0 0 0 edgecolor +newpath 68.96 50.5 moveto +78.96 47 lineto +68.96 43.5 lineto +closepath stroke +0 0 0 edgecolor +14 /Times-Roman set_font +54 50.8 moveto 7 (a) alignedtext +grestore +% 3 +gsave +1 setlinewidth +0 0 0 nodecolor +173 72 18 18 ellipse_path stroke +0 0 0 nodecolor +14 /Times-Roman set_font +169.5 68.3 moveto 7 (3) alignedtext +grestore +% 2->3 +gsave +1 setlinewidth +0 0 0 edgecolor +newpath 114.42 52.55 moveto +123.72 55.69 135.55 59.69 146.07 63.24 curveto +stroke +0 0 0 edgecolor +newpath 145.03 66.58 moveto +155.62 66.47 lineto +147.27 59.95 lineto +closepath fill +1 setlinewidth +solid +0 0 0 edgecolor +newpath 145.03 66.58 moveto +155.62 66.47 lineto +147.27 59.95 lineto +closepath stroke +0 0 0 edgecolor +14 /Times-Roman set_font +133 62.8 moveto 4 (.) 
alignedtext +grestore +% 6 +gsave +1 setlinewidth +0 0 0 nodecolor +173 18 18 18 ellipse_path stroke +0 0 0 nodecolor +14 /Times-Roman set_font +169.5 14.3 moveto 7 (6) alignedtext +grestore +% 2->6 +gsave +1 setlinewidth +0 0 0 edgecolor +newpath 114.06 40.71 moveto +123.56 36.98 135.81 32.18 146.6 27.95 curveto +stroke +0 0 0 edgecolor +newpath 147.94 31.19 moveto +155.97 24.28 lineto +145.39 24.67 lineto +closepath fill +1 setlinewidth +solid +0 0 0 edgecolor +newpath 147.94 31.19 moveto +155.97 24.28 lineto +145.39 24.67 lineto +closepath stroke +0 0 0 edgecolor +14 /Times-Roman set_font +133 35.8 moveto 4 (,) alignedtext +grestore +% 4 +gsave +1 setlinewidth +0 0 0 nodecolor +261 47 27 18 ellipse_path stroke +0 0 0 nodecolor +14 /Times-Roman set_font +257.5 43.3 moveto 7 (4) alignedtext +grestore +% 3->4 +gsave +1 setlinewidth +0 0 0 edgecolor +newpath 190.58 67.18 moveto +200.65 64.25 213.87 60.41 226.07 56.86 curveto +stroke +0 0 0 edgecolor +newpath 227.36 60.13 moveto +235.99 53.98 lineto +225.41 53.41 lineto +closepath fill +1 setlinewidth +solid +0 0 0 edgecolor +newpath 227.36 60.13 moveto +235.99 53.98 lineto +225.41 53.41 lineto +closepath stroke +0 0 0 edgecolor +14 /Times-Roman set_font +209 64.8 moveto 7 (b) alignedtext +grestore +% 5 +gsave +1 setlinewidth +0 0 0 nodecolor +353 47 18 18 ellipse_path stroke +1 setlinewidth +0 0 0 nodecolor +353 47 22 22 ellipse_path stroke +0 0 0 nodecolor +14 /Times-Roman set_font +349.5 43.3 moveto 7 (5) alignedtext +grestore +% 6->4 +gsave +1 setlinewidth +0 0 0 edgecolor +newpath 190.18 23.45 moveto +200.47 26.92 214.17 31.54 226.71 35.77 curveto +stroke +0 0 0 edgecolor +newpath 225.83 39.17 moveto +236.43 39.05 lineto +228.07 32.54 lineto +closepath fill +1 setlinewidth +solid +0 0 0 edgecolor +newpath 225.83 39.17 moveto +236.43 39.05 lineto +228.07 32.54 lineto +closepath stroke +0 0 0 edgecolor +14 /Times-Roman set_font +209 34.8 moveto 7 (b) alignedtext +grestore +% 4->5 +gsave +1 setlinewidth +0 0 0 
edgecolor +newpath 288.03 47 moveto +298.23 47 310.03 47 320.69 47 curveto +stroke +0 0 0 edgecolor +newpath 320.87 50.5 moveto +330.87 47 lineto +320.87 43.5 lineto +closepath fill +1 setlinewidth +solid +0 0 0 edgecolor +newpath 320.87 50.5 moveto +330.87 47 lineto +320.87 43.5 lineto +closepath stroke +0 0 0 edgecolor +14 /Times-Roman set_font +306 50.8 moveto 7 (c) alignedtext +grestore +endpage +showpage +grestore +%%PageTrailer +%%EndPage: 1 +%%Trailer +%%Pages: 1 +%%BoundingBox: 36 36 419 134 +end +restore +%%EOF diff --git a/thesis2/thesis.bbl b/thesis2/thesis.bbl new file mode 100644 index 0000000..f45b3a8 --- /dev/null +++ b/thesis2/thesis.bbl @@ -0,0 +1,19 @@ +\begin{thebibliography}{1} + +\bibitem{Daciuk2000} +Jan Daciuk, Stoyan Mihov, Bruce~W. Watson, and Richard~E. Watson. +\newblock {Incremental Construction of Minimal Acyclic Finite-State Automata}. +\newblock {\em Computational Linguistics}, 26(1):3--16, March 2000. + +\bibitem{Hopcroft1971} +John Hopcroft. +\newblock {An N log N algorithm for minimizing states in a finite automaton}. +\newblock Technical report, 1971. + +\bibitem{Roelofs2009} +Wouter Roelofs, Alessandro~Tadeo Paula, and Franc Grootjen. +\newblock {Programming by Clicking}. +\newblock In {\em Proceedings of the Dutch Information Retrieval Conference}, + pages 2--3, 2009. 
+ +\end{thebibliography} diff --git a/thesis2/thesis.blg b/thesis2/thesis.blg new file mode 100644 index 0000000..3a83e1f --- /dev/null +++ b/thesis2/thesis.blg @@ -0,0 +1,48 @@ +This is BibTeX, Version 0.99d (TeX Live 2015/dev/Debian) +Capacity: max_strings=35307, hash_size=35307, hash_prime=30011 +The top-level auxiliary file: thesis.aux +The style file: plain.bst +Database file #1: thesis.bib +Warning--empty institution in Hopcroft1971 +You've used 3 entries, + 2118 wiz_defined-function locations, + 516 strings with 4464 characters, +and the built_in function-call counts, 993 in all, are: += -- 95 +> -- 48 +< -- 1 ++ -- 19 +- -- 16 +* -- 70 +:= -- 173 +add.period$ -- 9 +call.type$ -- 3 +change.case$ -- 18 +chr.to.int$ -- 0 +cite$ -- 4 +duplicate$ -- 38 +empty$ -- 75 +format.name$ -- 16 +if$ -- 202 +int.to.chr$ -- 0 +int.to.str$ -- 3 +missing$ -- 2 +newline$ -- 18 +num.names$ -- 6 +pop$ -- 18 +preamble$ -- 1 +purify$ -- 14 +quote$ -- 0 +skip$ -- 26 +stack$ -- 0 +substring$ -- 47 +swap$ -- 8 +text.length$ -- 1 +text.prefix$ -- 0 +top$ -- 0 +type$ -- 12 +warning$ -- 1 +while$ -- 11 +width$ -- 4 +write$ -- 34 +(There was 1 warning) diff --git a/thesis2/thesis.dvi b/thesis2/thesis.dvi new file mode 100644 index 0000000..3edb64b Binary files /dev/null and b/thesis2/thesis.dvi differ diff --git a/thesis2/version/mart_thesis_0.1.tar.gz b/thesis2/version/mart_thesis_0.1.tar.gz new file mode 100644 index 0000000..7a374a2 Binary files /dev/null and b/thesis2/version/mart_thesis_0.1.tar.gz differ diff --git a/thesis2/version/mart_thesis_0.1a.tar.gz b/thesis2/version/mart_thesis_0.1a.tar.gz new file mode 100644 index 0000000..9df6499 Binary files /dev/null and b/thesis2/version/mart_thesis_0.1a.tar.gz differ