Intro:
Hyperleap + infotainment
Relieve programmers from fixing crawlers
System to generate crawler specifications
Frontend usable by non-programmers

Frontend:
Runs in the browser
Served by Apache and Python

Backend:
Converts the user patterns from the frontend to node lists.
Node lists are merged through DAWG minimization to generate patterns (graphs).
The crawler reads the patterns and crawls the site.
Crawler results are sent via an XML/XSD stream to the original backend.

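The merge step in the backend can be sketched as follows. This is a minimal illustration and not the thesis implementation: it assumes each node list is a plain list of string labels, builds a trie from them, and then merges equivalent subtrees bottom-up so that shared suffixes collapse into one graph. The names `Node`, `build_trie`, `minimize`, `count_nodes`, `accepts` and the example labels are all hypothetical.

```python
class Node:
    """One state in the graph; edges map a label to the next state."""
    def __init__(self):
        self.edges = {}
        self.final = False


def build_trie(node_lists):
    """Insert every node list into a trie, one edge per label."""
    root = Node()
    for labels in node_lists:
        node = root
        for label in labels:
            node = node.edges.setdefault(label, Node())
        node.final = True
    return root


def minimize(node, registry):
    """Merge equivalent subtrees bottom-up, turning the trie into a DAWG.

    Two nodes are equivalent when they agree on finality and their
    outgoing edges lead to the same (already merged) children.
    """
    for label, child in list(node.edges.items()):
        node.edges[label] = minimize(child, registry)
    signature = (
        node.final,
        tuple(sorted((label, id(child)) for label, child in node.edges.items())),
    )
    # Reuse the first node seen with this signature, otherwise register this one.
    return registry.setdefault(signature, node)


def count_nodes(root):
    """Count distinct states reachable from root."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.edges.values())
    return len(seen)


def accepts(root, labels):
    """Return True if the label sequence ends in a final state."""
    node = root
    for label in labels:
        node = node.edges.get(label)
        if node is None:
            return False
    return node.final


# Hypothetical node lists, standing in for converted user patterns.
EXAMPLE_NODE_LISTS = [
    ["date", "-", "title"],
    ["date", "-", "title", "location"],
    ["time", "-", "title"],
    ["time", "-", "title", "location"],
]

if __name__ == "__main__":
    trie = build_trie(EXAMPLE_NODE_LISTS)
    print("states before minimization:", count_nodes(trie))
    dawg = minimize(trie, {})
    print("states after minimization:", count_nodes(dawg))
```

Merging after the trie is complete keeps the sketch short; an incremental construction would avoid building the full trie first.
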
Results:
Few RSS feeds
A lot of RSS misuse

Future:
Extend to HTML (program to convert HTML to RSS)
Reuse the interface
Low-level matching can be increased

Questions:
- Why is the user interface easy to use?
Direct feedback
Familiar interface with buttons and text boxes

- Why did you choose RSS?
We had to limit the scope
RSS is very consistent in its underlying structure
RSS itself does not impose any structure on the content, but the feeds are
consistent underneath because they are generated

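The answer about RSS can be backed by a small example. The sketch below assumes a standard RSS 2.0 layout (`rss/channel/item` with `title` and `link`); the feed content and the function name `read_items` are made up for illustration. It shows that the items can always be reached in the same way, while the text inside the fields is free-form.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the structure is fixed, the title text is free-form.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example venue agenda</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed used only for illustration</description>
    <item>
      <title>Jazz night - 20:00</title>
      <link>http://example.com/events/1</link>
    </item>
    <item>
      <title>Open podium, Friday 21:30</title>
      <link>http://example.com/events/2</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return title and link for every item in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in read_items(SAMPLE_FEED):
        print(entry["title"], "|", entry["link"])
```

Both items are found through the same path, but the time appears in a different form in each title, which is the part a pattern has to capture.
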