-\section{Hyperleap and their methods}
-Hyperleap\footnote{\url{http://hyperleap.nl}} is a small company founded in the
-early years of the internet and is located in Nijmegen. Hyperleap is a company
-that presents \textit{infotainment}. \textit{Infotainment} is a concatenation
-of the words \textit{entertainment} and \textit{information} and is a
-specialized form of information, namely about the entertainment industry.
-Hyperleap manages the largest database containing \textit{infotainment}
-containing over $10.000$ events per week on average. It also manages a database
-containing over $54.000$ venues delivering the entertainment. Next to the
-factual information, Hyperleap also provides reviews, previews, background
-information and more via several popular websites specialized on genres or
-categories.
-
-Hyperleap stands compared to other \textit{infotainment} providers because of
-the quality and completeness of the data is comparatively high. This is because
-all information is checked and matched to existing information before it enters
-the database. To ensure the quality of the databases all information enters the
-database in roughly two steps.
-
-In the first step the information is extracted from the raw data sources using
-crawlers or via venue channels. Crawlers are specialized applications that are
-programmed to extract information from one single source. Venue channels are
- specially made XML feeds that contain already very structured information.
-The extracted information is put in the so called \textit{Temporum}, the
-\textit{Temporum} is a stopping place for the gathered information before it is
-entered in the real database.
-
-The second step in the path of the information is the matching of the data.
-This step is the actual quality checking and matching. Using several
-techniques, employees have to match the incoming information to existing events
-or create new events. This is also a safety net for malfunctioning crawlers,
-when a crawler provides wrong information in the \textit{Temporum} the
-programmer of the crawler has to be informed then. A large amount of the time
-the programmers are busy with repairing crawlers because it is a specialized
-task only doable by people with a computer science background. Because of this
-it is expensive to repair the crawlers.
+\section{Introduction}
+What do people do when they want to see a movie? Attend a concert? Find out
+which shows are playing in their local theater?
+
+In the early days of the internet, information about entertainment was
+gathered from flyers, books, posters and radio/TV advertisements. People had
+to search hard for this information, and one could easily miss a show simply
+because the announcement never crossed their path. Now that the internet has
+grown to what it is today, one would expect that missing an event is
+impossible given the flood of information received every day. The opposite is
+true.
+
+Nowadays, information about entertainment is offered via two main channels on
+the internet: individual venues and combined websites.
+
+Individual venues put a lot of effort and resources into building beautiful,
+fast and, above all, modern websites that bundle their information with nice
+graphics, animations and gimmicks. There are also companies that bundle the
+information from different websites. Because this bundled information often
+comes from the individual websites, it is usually incomplete. Individual
+organisations tend to assume that the address of their venue is obvious, that
+everyone knows their ticket price is always fixed at \EURdig$5.-$, and that a
+membership is required to attend the events; they usually put such details in
+a disclaimer or on a separate page.
+
+Combined websites want to bundle this information: for every event they want
+all the details. This turns out to be a hard task, because these websites
+lack the resources and time to combine the different sources into a good and
+complete overview of an event. Because of this, there are not many websites
+that bundle entertainment information in such a way that the entire database
+is complete and consistent.
+Hyperleap\footnote{\url{http://hyperleap.nl}} tries to achieve this goal.
+
+\section{Hyperleap \& Infotainment}
+Hyperleap is an internet company founded before the internet was widespread.
+Active since 1993, Hyperleap is specialized in producing, publishing and
+maintaining \textit{infotainment}. \textit{Infotainment} is a combination of
+the words \textit{information} and \textit{entertainment}: complete
+information about entertainment in the broadest sense. Entertainment can be a
+theater show, a movie showing in the cinema, the weekly bridge night in the
+local town center, a music concert, and so on. Hyperleap manages the largest
+database containing \textit{infotainment}. The database contains over
+$10.000$ events per week on average, and their venue database contains over
+$54.000$ venues delivering the entertainment. All the information is quality
+checked and therefore very reliable; Hyperleap is the only one of its kind
+with information of such high quality. The \textit{infotainment} is presented
+via several websites, each specialized per genre or category, and is bundled
+with other kinds of non-factual information such as reviews, previews,
+background information and interviews.
+
+As said before, Hyperleap is the only one of its kind with such high-quality
+data. This is because a lot of time and resources are spent on
+cross-comparing, matching and checking the data before it enters the
+database. To achieve this, the data is inserted into the database in several
+steps, described in Figure~\ref{fig:1.1.1}.
+
+\begin{figure}[H]
+ \caption{Information flow Hyperleap database}
+ \label{fig:1.1.1}
+ \centering
+ \scalebox{0.8}{
+ \digraph[]{graph111}{
+ rankdir=TB;
+ node [shape="rectangle",fontsize=10,nodesep=0.5,ranksep=0.75,width=1]
+ edge [weight=5.]
+ i0 [label="Website"]
+ i1 [label="Email"]
+ i2 [label="Fax"]
+ i3 [label="RSS/Atom"]
+	p1 [label="Crawler: Preprocessing"]
+	p2 [label="Temporum: Postprocessing"]
+ o1 [label="Database: Insertion"]
+ o2 [label="TheAgenda"]
+ o3 [label="BiosAgenda"]
+ o4 [label="..."]
+	p1 [width=5]; p2 [width=5]; o1 [width=5];
+ i0 -> p1
+ i1 -> p1
+ i2 -> p1
+ i3 -> p1
+ p1 -> p2
+ p2 -> o1
+ o1 -> o2
+ o1 -> o3
+ o1 -> o4
+ }
+ }
+\end{figure}
+
+The first step in the information flow is the source. Hyperleap processes
+input from several different sources, which vary both in type, for example a
+website or a fax, and in origin. The sources also vary greatly in
+reliability: private information streams from venues are very reliable,
+whereas other combined websites are not reliable at all. Sources vary in
+structural consistency as well: venue websites often look very consistent,
+but their entries are usually typed by hand and key information often appears
+in arbitrary places, surrounded by a lot of text. Ticket vendors, on the
+other hand, usually present their information in a structured and consistent
+way. Depending on the degree of consistency and structure, preprocessing
+happens in step two. All the preprocessed data is then sent to the
+\textit{Temporum}.
+
+The \textit{Temporum} is a large bin that contains the raw data extracted
+from the different sources; this data has to be post-processed before it is
+suitable for the actual database. The post-processing encompasses several
+tasks. The first task is checking the validity of an entry. The second is
+matching: entries have to be matched to a venue or organisation in the events
+database, and can also be matched to existing events that belong to the same
+tour or series. Many aspects of these steps are, or can be, done
+automatically, but a lot of user input is still required to match and check
+the data. The \textit{Temporum} thus functions as a safety net for the data.
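+The validity check and matching step can be sketched as a small pipeline. The
+sketch below is illustrative only: the entry fields, the exact-name matching
+rule and all function names are assumptions standing in for Hyperleap's
+actual, partly manual process.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One raw event entry sitting in the Temporum (fields are assumed)."""
    title: str
    venue: str
    date: str  # ISO date, e.g. "2024-05-01"


def is_valid(entry: Entry) -> bool:
    # Minimal validity check: every field must be present and non-empty.
    return bool(entry.title and entry.venue and entry.date)


def match_venue(entry: Entry, known_venues: set[str]) -> bool:
    # Exact-name matching stands in for the fuzzier, partly manual
    # matching against the venue database described in the text.
    return entry.venue in known_venues


def postprocess(temporum: list[Entry], known_venues: set[str]):
    """Split Temporum entries into database inserts and a manual-review queue."""
    database, review_queue = [], []
    for entry in temporum:
        if is_valid(entry) and match_venue(entry, known_venues):
            database.append(entry)       # validated and matched: insert
        else:
            review_queue.append(entry)   # safety net: needs human attention
    return database, review_queue
```

+Entries that fail either step are not discarded but queued for an employee,
+which is how the \textit{Temporum} acts as a safety net rather than a filter.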
+
+When the data has been post-processed it is entered into the final database.
+The database contains all the events that happened in the past and all the
+events that are going to happen. The database is linked to several
+categorical websites that offer the information to users and accompany it
+with the non-factual information discussed earlier.