[bsc-thesis1415.git] / log / 2014-04-08.txt
TIME: 3

Met with Alessandro and discussed the project scope with Jan.

Worst case: an RSS feed crawler that non-IT users can train. Best case: websites
are parseable as well.

PLANS
=====
Literature research; compare programming languages: Python, PHP/JavaScript.
The HL server has Python. The crawler is going to be Python for sure.

So basically there are three components:
- Frontend
  The frontend is the user interface for the non-IT user and is probably a
  plugin for Chrome or Firefox. It generates a scheme which is parseable by
  the crawler.
- Crawler
  The crawler periodically crawls the sites/feeds using the generated schemes
  and notifies the admins if there is a change in layout. The crawler
  generates XML that is later parsed by the backend.
- Backend
  The backend is not within the scope of this project, but it will parse the
  XML given by the crawler.
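
To make the crawler's role in the three components above more concrete, here is a
rough Python sketch. Everything in it is an assumption for illustration, not a
decided design: the scheme format (a dict with "item_path" and "fields"), the
names SCHEME, crawl and notify_admins, and the output element names are all
hypothetical. It takes the feed as text, so fetching and scheduling are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical scheme, as the frontend might generate it: where the items
# live in the feed, and which sub-element holds each output field.
SCHEME = {
    "item_path": "./channel/item",
    "fields": {"title": "title", "url": "link"},
}


def crawl(feed_xml, scheme):
    """Apply a scheme to feed text; return (output_xml, layout_problems)."""
    root = ET.fromstring(feed_xml)
    out = ET.Element("items")
    problems = []
    items = root.findall(scheme["item_path"])
    if not items:
        problems.append("no items found at " + scheme["item_path"])
    for index, item in enumerate(items):
        record = ET.SubElement(out, "item")
        for name, path in scheme["fields"].items():
            value = item.findtext(path)
            if value is None:
                # A field the scheme expects is gone: possibly a layout change.
                problems.append("item %d: missing field %r" % (index, name))
                continue
            ET.SubElement(record, name).text = value.strip()
    return ET.tostring(out, encoding="unicode"), problems


def notify_admins(problems):
    # Placeholder only: a real crawler would mail or message the admins.
    for problem in problems:
        print("LAYOUT CHANGE?", problem)
```

A periodic job could call crawl() for each stored scheme, hand the returned XML
to the backend, and pass any non-empty problem list to notify_admins(). Treating
a missing field as a possible layout change is only one heuristic; the real
detection may need to be smarter.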