From: Mart Lubbers
Date: Wed, 27 May 2015 10:28:09 +0000 (+0200)
Subject: typos in 1 and 2 fixed
X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=a60fd1636d31bc7637e6d38556b7da0af2b596eb;p=bsc-thesis1415.git

typos in 1 and 2 fixed
---

diff --git a/thesis2/1.introduction.tex b/thesis2/1.introduction.tex
index 895de35..e28a1e4 100644
--- a/thesis2/1.introduction.tex
+++ b/thesis2/1.introduction.tex
@@ -202,8 +202,8 @@ more.
 \section{Goal \& Research question}
 Maintaining the automated crawlers and the infrastructure that provides the
-\textit{Temporum} and its matching aid automization are the parts within the
-dataflow that require the most amount of resources. Both of these parts require
+\textit{Temporum} and its matching aid automation are the parts within the
+data flow that require the most resources. Both of these parts require
 a programmer to execute and therefore are costly. In the case of the automated
 crawlers it requires a programmer because the crawlers are website-specific
 scripts or programs. Changing such a script or program requires
@@ -222,7 +222,7 @@ detected and the programmers have to be contacted. Finally the crawler will be
 adapted to the new structure and will produce good data again. This feedback
 loop, shown in Figure~\ref{feedbackloop}, can take days and can be the reason
 for gaps and faulty information in the database. The figure shows information
-flow with arrows. The solid and dotted lines form the current feedbackloop.
+flow with arrows. The solid and dotted lines form the current feedback loop.
 \begin{figure}[H]
 	\label{feedbackloop}
 	\centering
@@ -234,8 +234,8 @@ The specific goal of this project is to relieve the programmer of spending a
 lot of time repairing crawlers and make the task of adapting, editing and
 removing crawlers feasible for someone without programming experience. In
 practice this
-means shortening the feedbackloop. The shorter feedback loop is also shown in
-Figure~\ref{feedbackloop}. The dashed line shows the shorter feedbackloop that
+means shortening the feedback loop. The shorter feedback loop is also shown in
+Figure~\ref{feedbackloop}. The dashed line shows the shorter feedback loop that
 relieves the programmer.
 
 For this project a system has been developed that provides an
@@ -380,8 +380,8 @@ marking. Graph $G$ accepts the words \textit{a,ab} and to simplify the graph
 nodes $n2$ and $n3$ are final. Finally $v_0$ describes the initial node; this
 is visualized in figures as an incoming arrow. Because of the property of
 labeled edges, data can be stored in a DAWG. When traversing a DAWG and saving all the
-edgelabels one can construct words. Using graph minimalization big sets of
-words can be stored using a small amouth of storage because edges can be
+edge labels one can construct words. Using graph minimisation big sets of
+words can be stored using a small amount of storage because edges can be
 re-used to specify transitions. For example the graph in
 Figure~\ref{exampledawg} can describe the language $L$ of all accepted words
 $w\in\{abd, bad, bae\}$. Testing if a word is present in the DAWG
diff --git a/thesis2/2.requirementsanddesign.tex b/thesis2/2.requirementsanddesign.tex
index a69ecdc..680a18b 100644
--- a/thesis2/2.requirementsanddesign.tex
+++ b/thesis2/2.requirementsanddesign.tex
@@ -57,7 +57,7 @@ definitive requirements.
 	\item[F2:] Export the data to a strict XML feed.
 
 		This requirement is an adapted version of requirement I3; this
-		is als done to limit the scope. We chose to no interact
+		is also done to limit the scope. We chose not to interact
 		directly with the database or the \textit{Temporum}. The
 		application however is able to output XML data that is
 		formatted following a strict XSD schema so that it is easy to
@@ -69,7 +69,7 @@ definitive requirements.
 
 		This requirement is formed from I4. Initially the user
 		interface for adding and training crawlers was done via a
-		webinterface that was user friendly and usable by someone
+		web interface that was user friendly and usable by someone
 		without a particular computer science background as the
 		requirement stated. However in the first prototypes the
 		control center that could test, edit and remove crawlers was a command
@@ -86,7 +86,7 @@ definitive requirements.
 		this can be due to any reason; a message is sent to the
 		people using the program so that they can edit or remove the
 		faulty crawler. Updating without the need of a programmer is essential
-		in shortening the feedbackloop explained in
+		in shortening the feedback loop explained in
 		Figure~\ref{feedbackloop}.
 	\end{itemize}
 
@@ -108,7 +108,7 @@ definitive requirements.
 		extensions are discussed in Section~\ref{sec:discuss}.
 	\item[N2:] Operate standalone on a server.
 
-		Non-functional requirement N1 is dropped because we want to
+		Non-functional requirement O1 is dropped because we want to
 		keep the program as modular as possible and via an XML
 		interface we still have a very intimate connection with the
 		database without having to maintain a direct connection. The
diff --git a/thesis2/abstract.tex b/thesis2/abstract.tex
index 2cc2d04..9af70fe 100644
--- a/thesis2/abstract.tex
+++ b/thesis2/abstract.tex
@@ -1,15 +1,14 @@
 When looking for an activity in a bar or trying to find a good movie to watch
-it often seems difficult to find complete information about the event without
-empty or wrong data. Hyperleap tries to solve problem of bad information
-giving by bundling the information from various sources and invest in good
-quality checking. Currently information retrievel is performed using
-site-specific crawlers, when a crawler breaks the feedback loop for fixing the
-it contains of different steps and requires someone with a computer science
-background. A crawler generation system has been created that uses directed
-acyclic word graphs that assist solving the feedback loop problem. The system
-allows users with no particular computer science background to create, edit and
-test crawlers for \textit{RSS} feeds. In this way the feedback loop for broken
-crawlers is shortened, new sources can be incorporated in the database quicker
-and, most importantly, the information about the latest movie show, theater
-production or conference will reach the people looking for it as fast as
-possible.
+it often seems difficult to find complete and correct information about the
+event. Hyperleap tries to solve the problem of bad information by bundling
+information from various sources and investing in good quality checking.
+Currently information retrieval is performed using site-specific crawlers;
+when a crawler breaks, the feedback loop for fixing it consists of several
+steps and requires someone with a computer science background. A crawler
+generation system has been created that uses directed acyclic word graphs to
+assist in solving the feedback loop problem. The system allows users with no
+particular computer science background to create, edit and test crawlers for
+\textit{RSS} feeds. In this way the feedback loop for broken crawlers is
+shortened, new sources can be incorporated into the database more quickly
+and, most importantly, the information about the latest movie show, theater
+production or conference will reach the people looking for it as fast as possible.
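
A minimal Python sketch of the DAWG idea edited in thesis2/1.introduction.tex above (illustrative only, not the thesis implementation; all names are made up for this sketch): it stores the example words abd, bad and bae in a trie, merges equivalent subtrees so edges are re-used as graph minimisation does, and tests membership by walking edge labels.

    # Illustrative sketch, not the thesis code. Words lie along labeled
    # edges; equivalent subtrees are merged so edges are re-used; membership
    # testing is a walk over edge labels from the initial node.

    class Node:
        def __init__(self):
            self.final = False   # True if a word may end at this node
            self.edges = {}      # edge label -> successor Node

        def signature(self):
            # Two nodes are interchangeable iff they agree on finality and
            # on their outgoing (label, target) pairs.
            return (self.final,
                    tuple(sorted((c, id(n)) for c, n in self.edges.items())))

    def build_trie(words):
        root = Node()
        for word in words:
            node = root
            for c in word:
                node = node.edges.setdefault(c, Node())
            node.final = True
        return root

    def minimise(node, registry):
        # Post-order: canonicalise children first, then replace this node
        # by an already-registered equivalent node if one exists.
        for c, child in node.edges.items():
            node.edges[c] = minimise(child, registry)
        return registry.setdefault(node.signature(), node)

    def accepts(root, word):
        node = root
        for c in word:
            if c not in node.edges:
                return False
            node = node.edges[c]
        return node.final

    dawg = minimise(build_trie(["abd", "bad", "bae"]), {})
    print(accepts(dawg, "bae"))   # True
    print(accepts(dawg, "ab"))    # False: no final marking after 'ab'

After minimisation the final states of abd, bad and bae collapse into a single node, which is the storage saving that the introduction attributes to edge re-use.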