From bdbd62f6b23f5eb7e81948f1bd955c60f4cb9a0d Mon Sep 17 00:00:00 2001
From: Mart Lubbers
Date: Sun, 8 Mar 2015 12:37:38 +0100
Subject: [PATCH] up

---
 thesis2/1.introduction.tex | 89 +++++++++++++++++++-------------------
 thesis2/Makefile           | 42 +++++++++---------
 thesis2/abstract.tex       | 18 ++++----
 3 files changed, 75 insertions(+), 74 deletions(-)

diff --git a/thesis2/1.introduction.tex b/thesis2/1.introduction.tex
index 74efdbc..c9b99f6 100644
--- a/thesis2/1.introduction.tex
+++ b/thesis2/1.introduction.tex
@@ -1,69 +1,70 @@
 \section{Introduction}
 What do people do when they want to grab a movie? Attend a concert? Find out
-which theater shows play in their town theater?
+which theater shows play in local theater?
 
 When the internet was in its early days and it started to be accessible to most
 of the people information about entertainment was still obtained almost
-exclusively from flyers, books, posters, radio/tv advertisements. People had to
-look pretty hard for the information and you could easily miss a show just
-because it didn't cross paths with you.
-Today the internet is used by almost everyone in the westen society on a daily
+exclusively from flyers, books, posters, radio/TV advertisements. People had to
+look hard for information and you could easily miss a show just because you did
+not cross paths with it.
+Today the internet is used by almost everyone in the western society on a daily
 basis and we would think that missing an event would be impossible because of
-the loads of information you receive every day. The opposite is true.
+the enormous loads of information you can receive every day using the internet.
+The opposite is true for information about leisure activities.
 
-Nowadays information about entertainment is offered via two main channels on
-the internet namely individual venues and combined websites.
+Nowadays information on the internet about entertainment is offered via two
+main channels: individual venues websites and information bundling websites.
 Individual venues put a lot of effort and resources in building a beautiful,
 fast and most of all modern website that bundles their information with nice
 graphics, animations and gimmicks. There also exist companies that bundle the
-information from different websites. Information bundling websites often have
-the individual venue website as the source for their information and therefore
-the information is most of the time not complete.
-Individual organisations tend to think, for example, that it is obvious what
-the address of their venue is, that their ticket price is always fixed to
-\EURdig$5.-$ and that you need a membership to attend the events. Individual
-organizations usually put this non specific information in a disclaimer or a
-separate page and information bundling website miss out on these types of
-information a lot.
+information from different websites to provide an overview. Information
+bundling websites often have multiple individual venue websites as the source
+for their information and therefore the information is most of the time not
+complete. This is because individual venues tend to think, for example, that it
+is obvious what the address of their venue is, that their ticket price is
+always fixed to \EURdig$5.-$ and that you need a membership to attend the
+events. Individual organizations usually put this non specific information in a
+disclaimer or a separate page and information bundling website miss out on
+these types of information a lot. They miss out because they can crawl these
+individual events but gathering the miscellaneous information is usually done
+by hand.
 
 Combining the information from the different data source turns out to be a hard
 task for such information bundling websites. It is a hard task because
 information bundling websites do not have the resources and time reserved for
-these tasks and therefore often also serve incomplete information. Because of
-the complexity of complete information there are not many websites trying to
-bundle entertainment information into a complete and consistent databese.
-Hyperleap\footnote{\url{http://hyperleap.nl}} tries to achieve goal of serving
-complete and consistent information.
+these tasks and therefore often serve incomplete information. Because of
+the complexity of getting complete information there are not many websites
+trying to bundle entertainment information into a complete and consistent
+database. Hyperleap\footnote{\url{http://hyperleap.nl}} tries to achieve goal
+of serving complete and consistent information.
 
 
 \section{Hyperleap \& Infotainment}
-Hyperleap is a internet company that existed in the time that internet was not
+Hyperleap is an internet company that existed in the time that internet was not
 widespread. Hyperleap, active since 1995, is specialized in producing,
 publishing and maintaining \textit{infotainment}. \textit{Infotainment} is a
 combination of the words \textit{information} and \textit{entertainment}. It
-means a combination of factual information and subjectual information
-(entertainment) within a certain category. In the case of Hyperleap the
-category is the entertainment industry, entertainment industry encompasses all
-facets of entertainment going from cinemas, theaters, concerts to swimming
-pools, bridge matches and conferences. Within the entertainment industry
-factual information includes, but is not limited to, information such as
-starting time, location, host or venue and location. Subjectual information
-includes, but is not limited to, things such as reviews, previews, photos and
-background information or trivia.
+represents a combination of factual information and subjectual information
+(entertainment) within a certain category or field. In the case of Hyperleap
+the category is the leisure industry, leisur industry encompasses all facets of
+entertainment going from cinemas, theaters, concerts to swimming pools, bridge
+competitions and conferences. Within the entertainment industry factual
+information includes, but is not limited to, information such as starting time,
+location, host or venue and duration. Subjectual information includes, but is
+not limited to, information such as reviews, previews, photos, background
+information and trivia.
 
 Hyperleap manages the largest database containing \textit{infotainment} about
-the entertainment industry. The database contains over $10.000$ categorized
-events per week on average and their venue database contains over $54.000$
-venues delivering the entertainment ranging from theaters and music venues to
-petting zoos and fastfood restaurants. All the subjectual information is
-obtained or created by Hyperleap and all factual information is gathered from
-different sources and quality checked and therefore very reliable. Hyperleap is
-the only company in its kind that has such high quality information. The
-\textit{infotainment} is presented via several websites specialized per genre
-or category and some sites attract over $500.000$ visitors per month.
-
-\section{Extracting data from plain text}
-
+the leisure industry in the Netherlands and surroundings. The database contains
+over $10.000$ categorized events on average per week and their venue database
+contains over $54.000$ venues delivering the leisure activities ranging from
+theaters and music venues to petting zoos and fastfood restaurants. All the
+subjectual information is obtained or created by Hyperleap and all factual
+information is gathered from different sources, quality checked and therefore
+very reliable. Hyperleap is the only company in its kind that has such high
+quality information. The \textit{infotainment} is presented via several
+websites specialized per genre or category and some sites attract over
+$500.000$ visitors per month.
 
 \section{Information flow}
 The reason why Hyperleap is the only in its kind with the high quality data is
diff --git a/thesis2/Makefile b/thesis2/Makefile
index 3d00019..be514f1 100644
--- a/thesis2/Makefile
+++ b/thesis2/Makefile
@@ -1,32 +1,32 @@
 SHELL:=/bin/bash
 VERSION:=1.0RC1
+SOURCES:=1.introduction.tex 2.requirementsanddesign.tex 3.methods.tex\
+4.discussion.tex 5.appendices.tex abstract.tex thesis.tex scheme1.xsd\
+scheme2.xsd appoverview.dot backend.dot dagexample.dot exrss.xml\
+graphexample.dot inccons.dot nddawg.dot nodelistexample.dot scheme.xsd
 
-all: thesis
+all: thesis.pdf
 
-pre:
-	head -50 scheme.xsd > scheme1.xsd
-	tail -n +49 scheme.xsd > scheme2.xsd
-	dot -Teps appoverview.dot > appoverview.eps
-	dot -Teps backend.dot > backend.eps
-	dot -Teps nodelistexample.dot > nodelistexample.eps
-	dot -Teps nddawg.dot > nddawg.eps
+scheme1.xsd: scheme.xsd
+	head -50 $< > $@
 
-thesis: pre
-	latex -shell-escape thesis.tex > log.txt
-	bibtex thesis.aux >> log.txt
-	latex -shell-escape thesis.tex >> log.txt
-	latex -shell-escape thesis.tex >> log.txt
-	dvipdfm thesis.dvi >> log.txt 2>&1
-	mv -v {thesis,mart_thesis_$(VERSION)}.pdf
+scheme2.xsd: scheme.xsd
+	tail -n +49 $< > $@
 
-pack: clean
-	rm -fv version/mart_thesis_$(VERSION).tar{,.gz}
-	tar -cvf version/mart_thesis_$(VERSION).tar *.{tex,xml,png,bib,xsd}
-	gzip -9 version/mart_thesis_$(VERSION).tar
+%.eps: %.dot
+	dot -Teps $@ > $<
 
+%.pdf: %.dvi
+	dvipdfm $<
+
+%.dvi: $(SOURCES)
+	latex -shell-escape thesis.tex
+	bibtex thesis.aux
+	latex -shell-escape thesis.tex
+	latex -shell-escape thesis.tex
 
 clean:
-	rm -vf *.{aux,bbl,blg,dvi,log,out,toc,ps,pyg} log.txt scheme[12].xsd
+	@$(RM) -v *.{aux,bbl,blg,dvi,log,out,toc,ps,pyg} scheme[12].xsd
 
 clobber: clean
-	rm -vf *.pdf
+	@$(RM) -v *.pdf
diff --git a/thesis2/abstract.tex b/thesis2/abstract.tex
index 9f5e594..05f5f49 100644
--- a/thesis2/abstract.tex
+++ b/thesis2/abstract.tex
@@ -1,12 +1,12 @@
 Within the leisure activity field, information is often bundled badly and
 contains empty or wrong data. Hyperleap tries to solve this problem by bundling
 the information from various sources including RSS feeds. Currently the
-feedback loop for fixing site-specific crawlers requires multiple steps which
-demand someone with a computer science background to perform. We introduce a
-new adaptable crawler generation system using subword matching via an adapted
-form of directed acyclic word graphs. The application allows users with no
-particular computer science background to create, edit and test crawlers for
-RSS feeds. In this way the feedback loop for broken crawlers is shortened, new
-sources can be incorporated in the database quicker and, most importantly, the
-information about the latest movie show, theater production or conference will
-reach the people looking for it as fast as possible.
+feedback loop for fixing site-specific crawlers requires multiple steps of which
+multiple steps demand someone with a computer science background to perform. We
+introduce a new adaptable crawler generation system using substring matching via
+an adapted form of directed acyclic word graphs. The application allows users
+with no particular computer science background to create, edit and test
+crawlers for RSS feeds. In this way the feedback loop for broken crawlers is
+shortened, new sources can be incorporated in the database quicker and, most
+importantly, the information about the latest movie show, theater production or
+conference will reach the people looking for it as fast as possible.
-- 
2.20.1
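
Review note on the thesis2/Makefile hunk: in GNU Make, `$@` expands to the target and `$<` to the first prerequisite, so the added `%.eps: %.dot` recipe (`dot -Teps $@ > $<`) appears to have the two swapped and would redirect output into the `.dot` source. In addition, `$(SOURCES)` lists the `.dot` files rather than the `.eps` files, so nothing in the patch seems to trigger the `%.eps` rule when building `thesis.pdf`. The fragment below is a possible follow-up, not part of this commit. `FIGURES` is a hypothetical variable introduced here for illustration, and its contents are an assumption taken from the four figures the removed `pre` target used to build.

```make
# Hypothetical variable (not in the patch): the figures the removed `pre`
# target generated. Adjust the list if the thesis includes other figures.
FIGURES:=appoverview.eps backend.eps nodelistexample.eps nddawg.eps

# $< is the first prerequisite (the .dot source), $@ is the target (the .eps).
%.eps: %.dot
	dot -Teps $< > $@

# Listing the generated figures as prerequisites lets make build them
# before latex runs; $(SOURCES) is the variable the patch already defines.
thesis.dvi: $(SOURCES) $(FIGURES)
	latex -shell-escape thesis.tex
	bibtex thesis.aux
	latex -shell-escape thesis.tex
	latex -shell-escape thesis.tex
```

With these rules in place of the patch's `%.eps` and `%.dvi` rules, `make thesis.pdf` would still go through the existing `%.pdf: %.dvi` rule, and editing a `.dot` file would cause only the matching `.eps` and the document to be rebuilt.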