From 85a338d03f2310e725431a0879d50a1dad64f125 Mon Sep 17 00:00:00 2001
From: Mart Lubbers
Date: Wed, 27 May 2015 12:33:04 +0200
Subject: [PATCH] spell check done

---
 thesis2/3.methods.tex    | 8 ++++----
 thesis2/4.discussion.tex | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/thesis2/3.methods.tex b/thesis2/3.methods.tex
index a30dcbe..9a458b0 100644
--- a/thesis2/3.methods.tex
+++ b/thesis2/3.methods.tex
@@ -2,7 +2,7 @@ The backend consists of several processing steps that the input has go
 through before it is converted to a crawler specification. These steps are
 visualized in Figure~\ref{appinternals}. All the nodes are important
 milestones in the
-process of processing the user data. Arrows indicate informatio transfer
+process of processing the user data. Arrows indicate information transfer
 between these steps. The Figure is a detailed explanation of the
 \textit{Backend} node in Figure~\ref{appoverview}.
 
@@ -14,12 +14,12 @@ between these steps. The Figure is a detailed explanation of the
 \end{figure}
 
 \section{HTML data}
-The raw data from the Frontend with the user markings enter the backend as a
+The raw data from the frontend with the user markings enter the backend as a
 HTTP \textit{POST} request. This \textit{POST} request consists of several
 information data fields. These data fields are either fields from the static
 description boxes in the frontend or raw \textit{HTML} data from the table
 showing the processed RSS feed entries which contain the markings made by the
-user. The table is sent in whole precicely at the time the user presses the
+user. The table is sent in whole precisely at the time the user presses the
 submit button. Within the \textit{HTML} data of the table markers are placed
 before sending. These markers make the parsing of the tables more easy and
 remove the need for an advanced \textit{HTML} parser to extract the markers.
@@ -109,7 +109,7 @@ be found in Listing~\ref{pseudodawg} named as the function in
 Listing~\ref{dawg.py}.
 \begin{enumerate}
 	\item
-		Say we add word $w$ to the grahp. Step one is finding the
+		Say we add word $w$ to the graph. Step one is finding the
 		common prefix of the word already in the graph. The common
 		prefix is defined as the longest subword $w'$ for which there
 		is a $\delta^*(q_0, w')$. When the common prefix is found we
diff --git a/thesis2/4.discussion.tex b/thesis2/4.discussion.tex
index 5e3d10d..6bce1a0 100644
--- a/thesis2/4.discussion.tex
+++ b/thesis2/4.discussion.tex
@@ -49,7 +49,7 @@ feeds text fields. The algorithm is designed to detect and extract information
 via patterns in plain text and the performance on HTML is very bad compared
 to plain text. A text field with HTML is almost useless to gather information
 from because they usually include all kinds of information in other modalities
-then text. Via a small study on a selecteion of RSS feeds($N=10$) we found that
+then text. Via a small study on a selection of RSS feeds($N=10$) we found that
 about $50\%$ of the RSS feeds misuse the protocol in such a way that extraction
 of data is almost impossible. This reduces the domain of good RSS feeds to less
 then $5\%$ of the venues.
-- 
2.20.1
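
The grahp/graph hunk above describes the first step of adding a word to the DAWG:
finding the longest prefix $w'$ of the new word for which $\delta^*(q_0, w')$ is
defined. A minimal Python sketch of just that step, assuming the graph is stored
as a plain transition table (a dict mapping each state to its outgoing edges);
the names common_prefix, delta and q0 are illustrative and are not taken from the
thesis's Listing dawg.py:

# Sketch only, not the thesis code: the "common prefix" step of incremental
# DAWG construction. delta maps state -> {character: next_state}; q0 is the
# start state.
def common_prefix(delta, q0, word):
    """Return (prefix, state): the longest prefix w' of `word` for which
    delta*(q0, w') is defined, together with the state it ends in."""
    state = q0
    prefix = []
    for ch in word:
        if ch not in delta.get(state, {}):
            break                    # delta*(q0, prefix + ch) is undefined
        state = delta[state][ch]     # follow the existing transition
        prefix.append(ch)
    return "".join(prefix), state

# Toy graph that already contains the word "abd"; 0 is the start state.
delta = {0: {"a": 1}, 1: {"b": 2}, 2: {"d": 3}, 3: {}}
print(common_prefix(delta, 0, "abc"))   # ('ab', 2): only "ab" is shared

The remainder of the construction (splitting at the returned state and adding the
suffix) is what the rest of the enumerate in 3.methods.tex goes on to describe.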