From: Mart Lubbers
Date: Mon, 5 Jan 2015 14:13:05 +0000 (+0100)
Subject: update
X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=94d3f1b5c84722f4858998ca5de460e534ae1c3f;p=bsc-thesis1415.git

update
---

diff --git a/thesis2/4.discussion.tex b/thesis2/4.discussion.tex
index 36fb0e8..689bc30 100644
--- a/thesis2/4.discussion.tex
+++ b/thesis2/4.discussion.tex
@@ -1,10 +1,24 @@
 \section{Conclusion}
+
 \section{Discussion}
 \label{sec:discuss}
+
 \begin{itemize}
     \item No low level stuff, future research
     \item RSS not that great of a source,
     \item Expand technique to HTML, reuse interface, defining patterns
-    \item Combine RSS and HTML
+        The interface for managing the crawlers is very intuitive, so this
+        system could be extended with a dedicated HTML crawler generation
+        module. The current extraction method is not well suited to HTML,
+        but because the program is modular, a module implementing another
+        technique can easily be added to the application.
+    \item \textbf{Combine RSS and HTML}\\
+        One way to bridge the gap between HTML and RSS is a tool that
+        converts HTML pages to RSS feeds, which can then be fed to the
+        existing application. HTML sites with a certain structure, namely
+        news articles generated by a CMS, can be converted to RSS by
+        flattening the structure and filling in the fields of the RSS
+        entries. In this way the current application can also process
+        possibly complicated HTML sources.
 \end{itemize}
diff --git a/thesis2/version/mart_thesis_0.5.tar.gz b/thesis2/version/mart_thesis_0.5.tar.gz
new file mode 100644
index 0000000..9c65fbc
Binary files /dev/null and b/thesis2/version/mart_thesis_0.5.tar.gz differ