final

diff --git a/thesis2/2.requirementsanddesign.tex b/thesis2/2.requirementsanddesign.tex
index 680a18b..9f31b8d 100644
--- a/thesis2/2.requirementsanddesign.tex
+++ b/thesis2/2.requirementsanddesign.tex
@@ -31,11 +31,11 @@ explanation is also provided.
                 \end{itemize}
         \item[I2:] Apply low level matching techniques on isolated data.
         \item[I3:] Insert data in the database.
-       \item[I4:] The system should have an user interface to train crawlers that is
-               usable someone without a particular computer science background.
-       \item[I5:] The system should be able to report to the user or
-               maintainer when a source has been changed too much for
-               successful crawling.
+       \item[I4:] The system should have a user interface to train crawlers
+               that is usable by someone without a particular computer science
+               background.
+       \item[I5:] The system should be able to report to the employee when a
+               source has changed too much to be crawled successfully.
  \end{itemize}
  
  \subsubsection{Definitive functional requirements}
@@ -51,21 +51,21 @@ definitive requirements.
                 requirements I1a-I1d. We limited the source types to crawl to
                 strict RSS because of the time constraints of the project. Most
                 sources require an entirely different strategy and therefore we
-               could not easily combine them. an explanation why we chose RSS
+               could not easily combine them. An explanation of why we chose RSS
                 feeds can be found in Section~\ref{sec:whyrss}.
  
         \item[F2:] Export the data to a strict XML feed.
  
-               This requirement is an adapted version of requirement I3, this
-               is also done to limit the scope. We chose to no interact
+               This requirement is an adapted version of requirement I3, again
+               to limit the scope. We chose not to interact
                 directly with the database or the \textit{Temporum}. The
                 application however is able to output XML data that is
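-               formatted following a string XSD scheme so that it is easy to
-               import the data in the database or \textit{Temporum} in a
-               indirect way.
+               formatted following a strict XSD schema so that it is easy to
+               import the data into the database or \textit{Temporum} in an
+               indirect way.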
         \item[F3:] The system should have a user interface to create crawlers
-               that is usable someone without a particular computer science
-               background.  science people.
+               that is usable by someone without a particular computer science
+               background.
  
                 This requirement is formed from I4. Initially the user
                 interface for adding and training crawlers was done via a
@@ -124,10 +124,9 @@ steps. The overview of the application is visible in Figure~\ref{appoverview}.
  The nodes are applications or processing steps and the arrows denote
  information flow or movement between nodes.
  \begin{figure}[H]
-       \label{appoverview}
         \centering
         \includegraphics[width=\linewidth]{appoverview.pdf}
-       \caption{Overview of the application}
+       \caption{Overview of the application\label{appoverview}}
  \end{figure}
  
  \subsection{Frontend}
@@ -136,20 +135,19 @@ The frontend is a web interface that is connected to the backend system which
  allows the user to interact with the backend. The frontend consists of a basic
  graphical user interface which is shown in Figure~\ref{frontendfront}. As the
  interface shows, there are three main components that the user can use.  There
-is also an button for downloading the XML. The \textit{Get xml} button is a
-quick shortcut to make the backend to generate XML. The button for grabbing the
-XML data is only for diagnostic purposes located there. In the standard
+is also a button for downloading the XML\@. The \textit{Get xml} button is a
+quick shortcut that makes the backend generate XML\@. The button for grabbing the
+XML data is only there for diagnostic purposes. In the standard
  workflow the XML button is not used. In the standard workflow the server
  periodically calls the XML output option from the command line interface of the
  backend to process it.
  
  \begin{figure}[H]
-       \label{frontendfront}
         \includegraphics[width=\linewidth]{frontendfront.pdf}
-       \caption{The landing page of the frontend}
+       \caption{The landing page of the frontend\label{frontendfront}}
  \end{figure}
  
-\subsubsection{Edit/Remove crawler}
+\subsubsection{Repair/Remove crawler}
  This component lets the user view the crawlers and remove the crawlers from the
  crawler database. Doing one of these things with a crawler is as simple as
  selecting the crawler from the dropdown menu and selecting the operation from
@@ -169,7 +167,7 @@ The addition or generation of crawlers is the key feature of the program and it
  is the intelligent part of the system since it includes the graph optimization
  algorithm to recognize user specified patterns in the new data. First, the user
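-must fill in the static form that is visible on top of the page. This for
-contains general information about the venue together with some crawler
+must fill in the static form that is visible at the top of the page. This form
+contains general information about the venue together with some crawler
  specific values such as crawling frequency. After that the user can mark
-certain points in the table as being of a category. Marking text is as easy as
-selecting the text and pressing the according button. The text visible in the
+certain points in the table as belonging to a category. Marking text is as easy as
+selecting the text and pressing the corresponding button. The text visible in the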
@@ -184,11 +182,10 @@ processed. The internals of what happens after submitting is explained in
  detail in Figure~\ref{appinternals} together with the text.
  
  \begin{figure}[H]
-       \label{frontendfront}
         \centering
         \includegraphics[width=\linewidth]{crawlerpattern.pdf}
         \caption{A view of the interface for specifying the pattern. Two %
-entries are already marked.}
+entries are already marked.\label{crawlerpattern}}
  \end{figure}
  
  \subsubsection{Test crawler}
@@ -201,15 +198,15 @@ and most importantly the results itself. In this way the user can see in a few
  gazes if the crawler functions properly. Humans are very fast in detecting
  patterns and therefore the error checking goes very fast. Because the log of
-the crawl operation is shown this page can also be used for diagnostic
-information about the backends crawling system. The logging is pretty in depth
-and also shows possible exceptions and is therefore also usable for the
-developers to diagnose problems.
+the crawl operation is shown, this page can also be used for diagnostic
+information about the backend's crawling system. The logging is detailed and
+also shows possible exceptions, which makes it useful for developers
+diagnosing problems.
  
  \subsection{Backend}
  \subsubsection{Program description}
  The backend consists of a main module and a set of libraries all written in
-\textit{Python}\cite{Python}. The main module can, and is, be embedded in an
-apache HTTP-server\cite{apache} via the \textit{mod\_python} apache
-module\cite{Modpython}. The module \textit{mod\_python} allows handling for
-python code via HTTP and this allows us to integrate neatly with the
+\textit{Python}\cite{Python}. The main module is embedded in an Apache
+HTTP server\cite{apache} via the \textit{mod\_python} Apache
+module\cite{Modpython}. The module \textit{mod\_python} lets \textit{Python}
+code handle HTTP requests, which allows us to integrate neatly with the
  \textit{Python} libraries. We chose \textit{Python} because of the rich set of
@@ -264,8 +261,8 @@ languages have an XML parser built in and therefore it is a very versatile
  format that makes the eventual import to the database very easy. The most used
  languages also include XSD validation to detect XML errors, validity and
  completeness of XML files. This makes interfacing with the database and
-possible future programs even more easily. The XSD scheme used for this
-programs output can be found in the appendices in Listing~\ref{scheme.xsd}. The
+possible future programs even easier. The XSD schema used for this
+program's output can be found in the appendices in Algorithm~\ref{scheme.xsd}. The
  XML output can be queried via the HTTP interface that calls the crawler backend
-to crunch the latest crawled data into XML. It can also be acquired directly
-from the crawlers command line interface.
+to convert the latest crawled data into XML\@. It can also be acquired directly
+from the crawler's command line interface.
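The following is a minimal Python sketch of the RSS-in, XML-out path described by requirements F1 and F2. It is written in Python because the backend is. It reads the items of an RSS 2.0 feed and writes them to a flat XML document using only the standard library's `xml.etree.ElementTree`.

Several parts of the sketch are hypothetical and do not reproduce the thesis's actual crawler or its `scheme.xsd`:

- the function names `parse_rss_items` and `export_xml`;
- the element names `crawl`, `event`, `title`, `link` and `date`;
- the sample feed.

Validation against the XSD is left out because `xml.etree` cannot validate schemas. A third-party library such as lxml (its `etree.XMLSchema` class) would be needed for that step.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 input; a real crawler would fetch this over HTTP.
RSS_SAMPLE = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example venue</title>
    <item>
      <title>Concert A</title>
      <link>http://example.org/a</link>
      <pubDate>Mon, 01 Jun 2015 20:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Concert B</title>
      <link>http://example.org/b</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(rss_text):
    """Return one dict per <item> with its title, link and optional pubDate."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "date": item.findtext("pubDate"),  # None when the feed omits it
        })
    return items


def export_xml(venue, items):
    """Serialise crawled items to a flat XML string (hypothetical element names)."""
    root = ET.Element("crawl", attrib={"venue": venue})
    for entry in items:
        event = ET.SubElement(root, "event")
        ET.SubElement(event, "title").text = entry["title"]
        ET.SubElement(event, "link").text = entry["link"]
        # Only emit <date> when the source item actually carried one.
        if entry["date"] is not None:
            ET.SubElement(event, "date").text = entry["date"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(export_xml("Example venue", parse_rss_items(RSS_SAMPLE)))
```

Keeping `date` optional mirrors feeds that omit `pubDate`. A strict XSD would then declare that element with `minOccurs="0"` so that both events still validate.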