almost all information required is available directly from the entry.
\begin{flushleft}
- \texttt{2015-05-20, 18:00-23:00 - \textit{Foobar} presenting their new%
-CD in combination with a show. Location: small salon.}
+ \texttt{2015-05-20, 18:00-23:00 - \textit{Foobar} presenting their %
+new CD in combination with a show. Location: small salon.}
\end{flushleft}
An example of a terrible item could be the following text that
When the source has been determined and classified, the next step is
periodically crawling the source. At the moment the crawling happens using two
main methods.\\
-\textbf{Manual crawling:} Manual crawling is basically letting an employee
-access the source and put the information directly in the database. This often
-happens with non digital sources and with very sudden events or event changes
-such as surprise concerts or event cancellation.\\
-\textbf{Automatic crawling:} Some sites are very structured and a programmer
-can create a program that can visit the website systematically and
-automatically to extract all the new information. Not all digital sources are
-suitable to be crawled automatically and will still need manual crawling. The
-programmed crawlers are always specifically created for one or a couple sources
-and when the source changes for example structure the programmer has to adapt
-the crawler which is costly. Information from the all the crawlers goes first
-to the \textit{Temporum}.
+\textbf{Manual crawling:}\\
+Manual crawling is basically letting an employee access the source and put the
+information directly in the database. This often happens with non-digital
+sources and with very sudden events or event changes such as surprise concerts
+or event cancellations.\\
+\textbf{Automatic crawling:}\\
+Some sites are very structured and a programmer can create a program that can
+visit the website systematically and automatically to extract all the new
+information. Not all digital sources are suitable to be crawled automatically
+and will still need manual crawling. The programmed crawlers are always
+specifically created for one or a couple of sources, and when a source changes,
+for example in structure, the programmer has to adapt the crawler, which is
+costly. Information from all the crawlers goes first to the \textit{Temporum}.
\subsection*{Temporum}
The \textit{Temporum} is a big bin that contains raw data extracted from
\begin{figure}[H]
\centering
- \includegraphics[scale=0.5]{feedbackloop.eps}
+ \includegraphics[width=0.8\linewidth]{feedbackloop.eps}
\strut\\\strut\\
\caption{Feedback loop for malfunctioning crawlers}
\label{feedbackloop}
\end{figure}
-
+\strut\\
The goal of this project is specifically to relieve the programmer of repairing
crawlers all the time and make the task of adapting, editing and removing
crawlers doable for someone without programming experience. In practice this
Every entry obtained from the previous step is processed into so-called
node-lists. A node-list can be seen as a path graph where every
character and marking has a node. A path graph $G$ is defined as
-$G=(V,n_1,E,n_i)$ where $V=\{n_1, n_2, \cdots, n_{i-1}, n_i\}$ and $E=\{(n_1,
-n_2), (n_2, n_3), ... (n_{i-1}, n_{i})\}$. A path graph is basically a graph
-that is a single linear path of nodes where every node is connected to the next
-node except for the last one. The last node is the only final node. The
-transitions between two nodes is either a character or a marking. As an example
-we take the entry \texttt{19:00, 2014-11-12 - Foobar} and create the
+$G=(V,n_1,E,n_i)$ where $V=\{n_1, n_2, \cdots, n_{i-1}, n_i\}$ and
+$E=\{(n_1, n_2), (n_2, n_3), \ldots,\\ (n_{i-1}, n_{i})\}$. A path graph is basically
+a graph that is a single linear path of nodes where every node is connected to
+the next node except for the last one. The last node is the only final node.
+The transition between two nodes is either a character or a marking. As an
+example we take the entry \texttt{19:00, 2014-11-12 - Foobar} and create the
corresponding node-list, which is shown in Figure~\ref{nodelistexample}.
Characters are denoted with single quotes, spaces with an underscore and
markers with angle brackets. Node-lists are the basic elements from which the
\caption{Example of non-determinism}
\end{figure}
-\subsection{Minimality and non-determinism}
+\subsection{Minimality \& non-determinism}
The Myhill-Nerode theorem~\cite{Hopcroft1979} states that, among all graphs
accepting the same language, there is a single graph with the smallest number
of states. Mihov~\cite{Mihov1998} has proven that the algorithm for
-\section{scheme.xsd}
+\section{XSD schema}
\lstinputlisting[language=XML,label={scheme.xsd},caption={XSD schema for XML %
output}]{scheme.xsd}
\documentclass[twoside,titlepage]{book}
-\usepackage{algorithm2e} % Pseudocode
-\usepackage{a4wide} % Paper size
-\usepackage{graphicx} % Eps inclusion
-\usepackage{float} % Floating placement of figures
-\usepackage{listings} % Source code formatting
-\usepackage{setspace} % Line spacing abstract
-\usepackage[dvipdfmx,hidelinks]{hyperref} % Hyperlinks
-\usepackage{amssymb} % nexists and much more
-\usepackage{amsmath} % Rightarrow and much more
-\usepackage{marvosym} % For euro sign
+\usepackage{algorithm2e} % Pseudocode
+\usepackage{a4wide} % Paper size
+\usepackage{graphicx} % Eps inclusion
+\usepackage{float} % Floating placement of figures
+\usepackage{listings} % Source code formatting
+\usepackage{setspace} % Line spacing abstract
+\usepackage[dvipdfmx]{hyperref} % Hyperlinks
+\usepackage{amssymb} % nexists and much more
+\usepackage{amsmath} % Rightarrow and much more
+\usepackage{marvosym} % For euro sign
\lstset{%
basicstyle=\footnotesize,
\hypersetup{
pdftitle={\cvartitle},
pdfauthor={Mart Lubbers},
- pdfsubject={Artificial Intelligence}
+ pdfsubject={Artificial Intelligence},
+ hidelinks
}
% Describe the frontpage
\author{
Mart Lubbers\\
s4109053\\
- Artificial Intelligence\\
Radboud University Nijmegen\\
\strut\\
- External supervisor: Alessandro Paula\\
- Internal supervisor: Franc Grootjen
+ Alessandro Paula\footnote{External supervisor}\\
+ Hyperleap, Nijmegen\\
+ \strut\\
+ Franc Grootjen\footnote{Internal supervisor}\\
+ Artificial Intelligence, Nijmegen\\
+ Radboud University Nijmegen
}
\title{\cvartitle}
\date{\today}
\chapter{Introduction}
\input{1.introduction.tex}
-\chapter{Requirements and design}
+\chapter{Requirements \& Application design}
\input{2.requirementsanddesign.tex}
\chapter{Algorithm}
\chapter{Appendices}
\input{5.appendices.tex}
-\bibliographystyle{ieeetr}
+\bibliographystyle{plain}
\bibliography{thesis}
\end{document}