From: Mart Lubbers
Date: Thu, 1 May 2014 08:57:09 +0000 (+0200)
Subject: intro started
X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=a0e3ccdc3e6bb65740c1c79d5f5899f14f5780e2;p=bsc-thesis1415.git

intro started
---
diff --git a/thesis/abstract.tex b/thesis/abstract.tex
index 5cdb1a4..7d34d4a 100644
--- a/thesis/abstract.tex
+++ b/thesis/abstract.tex
@@ -1,4 +1,3 @@
 \begin{center}
 \textbf{Abstract}\\
 \end{center}
-\lipsum[1]
diff --git a/thesis/introduction.tex b/thesis/introduction.tex
index f6276be..c6c3419 100644
--- a/thesis/introduction.tex
+++ b/thesis/introduction.tex
@@ -1 +1,39 @@
-\lipsum[1]
+\section{Introduction}
+Within the entertainment business there is no consistent way of informing
+people about events. Different venues display their, often incomplete,
+information in entirely different ways. Because of this, converting raw
+information from venues into consistent, structured data is a relevant
+problem.
+
+\section{HyperLeap}
+Hyperleap is a small company that specializes in infotainment
+(information + entertainment) and administrates several websites which
+bundle information about entertainment in an ordered and complete way.
+Right now, most of the data input is done by hand, which takes a lot of time.
+
+\section{Research question}
+The main research question is: \textit{How can we build an adaptive,
+autonomous and programmable data mining program that can be set up by a
+non-IT professional and that is able to transform raw data into structured
+data?}\\
+
+The practical aim of the project is to build a crawler (for websites or
+other document types) that can autonomously gather information after it
+has been set up, via an intuitive interface, by an employee who is not
+necessarily IT trained.
+Optionally, the crawler should not be susceptible to small structural
+changes in a website, should be able to handle advanced display techniques
+such as JavaScript, and should be able to notify the administrator when a
+site has become uncrawlable and the crawler needs to be reprogrammed for
+that particular site. The main purpose, however, remains the translation
+from raw data to structured data.
+The project is in principle a continuation of a past project by Wouter
+Roelofs~\cite{Roelofs2009}, which was also supervised by Franc Grootjen
+and Alessandro Paula; however, it never left the experimental phase and is
+therefore in need of continuation.
+
+\section{Scientific relevance}
+Currently, the techniques for converting unstructured data into structured
+data are static and mainly usable only by IT specialists. There is a great
+need for data mining of unstructured data, because the data within
+companies and on the internet is piling up and usually left to gather dust.