thesis/introduction.tex

\section{Introduction}
Within the entertainment business there is no consistent style for informing
people about events. Different venues display their information, which is
often incomplete, in entirely different ways. Because of this, converting raw
information from venues into structured, consistent data is a relevant problem.

\section{Hyperleap}
Hyperleap is a small company that specializes in infotainment
(information and entertainment) and administers several websites that bundle
information about entertainment in an ordered and complete way. Right now, most
of the data is entered by hand, which takes a lot of time.

\section{Research question}
The main research question is: \textit{How can we make an adaptive, autonomous
and programmable data mining program that can be set up by a non-IT
professional and that is able to transform raw data into structured data?}\\

The practical goal of the project is to make a crawler (for web pages or other
document types) that can autonomously gather information after it has been
set up by an employee, not necessarily IT-trained, via an intuitive interface.
Optionally, the crawler should not be susceptible to small structural changes
in the website, should be able to handle advanced website display techniques
such as JavaScript, and should be able to notify the administrator when a site
has become uncrawlable and the crawler needs to be reprogrammed for that
particular site. The main purpose, however, is the translation from raw data
to structured data. The project is in principle a continuation of a past
project by Wouter Roelofs~\cite{Roelofs2009}, which was also supervised by
Franc Grootjen and Alessandro Paula; however, it was never taken out of the
experimental phase and is therefore in need of continuation.

\section{Scientific relevance}
Currently, the techniques for converting unstructured data into structured
data are static and mostly usable only by IT specialists. There is a great need
for data mining in unstructured data, because the data within companies and on
the internet is piling up and is usually left to gather dust.