From: Mart Lubbers
Date: Mon, 24 Nov 2014 21:00:22 +0000 (+0100)
Subject: v0.3
X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=fe78905b735a51fdf31d3badb6bc92b87f8d8182;p=bsc-thesis1415.git

v0.3
---

diff --git a/thesis2/2.requirementsanddesign.tex b/thesis2/2.requirementsanddesign.tex
index c6ee8f7..db8c35a 100644
--- a/thesis2/2.requirementsanddesign.tex
+++ b/thesis2/2.requirementsanddesign.tex
@@ -102,3 +102,53 @@ Together they make the following definitive requirements:
 \end{itemize}

 \section{Design}
+\subsection{Frontend}
+We explain the design of the frontend application through examples and use
+cases. This lets us explain certain design choices visually and more
+specifically.
+
+\subsection{Backend}
+\subsubsection{Program description}
+The backend consists of a main module and a set of libraries, all written in
+\textit{Python}\footnote{\url{https://www.python.org/}}. The main module is
+embedded in an Apache
+webserver\footnote{\url{https://httpd.apache.org/}} via the
+\textit{mod\_python} Apache module\footnote{\url{http://modpython.org/}}. The
+module \textit{mod\_python} allows the webserver to execute \textit{Python}
+code directly. We chose Python because of its rich set of standard
+libraries and solid cross-platform capabilities. We chose Python 2 because it
+is still the default Python version on most major operating systems and stays
+supported until at least the year 2020, meaning that the program can function
+safely for at least 5 full years. Besides the main module embedded in the
+webserver, the application contains a set of libraries and a standalone
+program that does the periodic crawling.
+
+\subsubsection{Main module}
+The main module is the program that handles the requests, controls the
+frontend, converts the data to patterns and sends them to the crawler. The
+module serves the frontend in a modular fashion.
+For example, the buttons and
+colors can easily be edited by a non-programmer by changing a few values in
+a text file. In this way, even when conventions change, the program can keep
+functioning without a programmer having to adapt the source.
+
+\subsubsection{Libraries}
+The libraries are called by the main program and take care of all the hard
+work. They are a group of Python scripts that, for example,
+minimize the graphs, transform the user data into machine readable data and
+export the crawled data to XML.
+
+\subsubsection{Standalone crawler}
+The crawler is a program that is used by the main module and technically is
+part of the libraries. What sets the crawler apart is that it can also run
+on its own. A server has to run the crawler periodically to actually crawl
+the websites. The main module communicates with the crawler
+when it needs XML data, when a new crawler is added or when data is edited. The
+crawler also offers a command line interface that has the same functionality as
+the web interface of the control center.
+
+The crawler saves all the data in a database. The database is a simple
+dictionary in which all the entries are hashed, so that the crawler knows which
+ones are already present in the database and which ones are new. It therefore
+does not have to process old entries again when they reappear in the feed. The
+crawler also has a function to export the database to XML format. The XML
+format is specified in an XSD file for minimal ambiguity.
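The hashed dictionary and the XML export described in the last paragraph can be sketched as follows. This is a minimal illustration and not the thesis code: the class name \texttt{EntryDatabase}, the use of SHA-1 over title and summary as the key, and the XML element names are all assumptions, since the excerpt shows neither the real entry fields nor the XSD. The sketch is written for Python 3 so that it runs as is, whereas the thesis targets Python 2.

```python
import hashlib
import xml.etree.ElementTree as ET


class EntryDatabase:
    """Hypothetical stand-in for the crawler's dictionary database."""

    def __init__(self):
        # Maps a content hash to the stored entry.
        self.entries = {}

    @staticmethod
    def entry_key(title, summary):
        # Assumption: an entry is identified by the hash of its text.
        raw = (title + "\n" + summary).encode("utf-8")
        return hashlib.sha1(raw).hexdigest()

    def add_new(self, feed_entries):
        """Store unseen entries and return only those, in feed order."""
        new_entries = []
        for title, summary in feed_entries:
            key = self.entry_key(title, summary)
            if key not in self.entries:
                self.entries[key] = {"title": title, "summary": summary}
                new_entries.append((title, summary))
        return new_entries

    def to_xml(self):
        """Export all entries; element names are illustrative only."""
        root = ET.Element("entries")
        for key in sorted(self.entries):
            node = ET.SubElement(root, "entry", id=key)
            ET.SubElement(node, "title").text = self.entries[key]["title"]
            ET.SubElement(node, "summary").text = self.entries[key]["summary"]
        return ET.tostring(root, encoding="unicode")
```

Calling \texttt{add\_new} twice with overlapping feed contents returns only the unseen entries the second time, which is the property that lets the crawler skip old entries when they reappear in the feed.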
diff --git a/thesis2/thesis.tex b/thesis2/thesis.tex
index 87c0adb..e58c326 100644
--- a/thesis2/thesis.tex
+++ b/thesis2/thesis.tex
@@ -1,4 +1,4 @@
-\documentclass{book}
+\documentclass[a4paper]{book}

 \usepackage[british]{babel}

diff --git a/thesis2/version/mart_thesis_0.3.tar.gz b/thesis2/version/mart_thesis_0.3.tar.gz
new file mode 100644
index 0000000..772c085
Binary files /dev/null and b/thesis2/version/mart_thesis_0.3.tar.gz differ