TODO
General
-------
XML parser
----------
- Convert namespaces to existing atoms?
- Add BOM handling
- Continuation must be handled character by character. For example, we need
to be careful when <<"\r">> and <<"\n">> end up in different chunks.
- Check character references for valid range?
- Add callback for ignorable whitespace?
- Parse entity declarations?
- Parse/ignore <!DOCTYPE
- Allow the start_element callback to request that tag content be parsed as
characters? Need to be careful with CDATA sections inside the element.
- Add an independent callback interface (or just a function) to request a decoder?
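Sketch for the continuation item above, in Python for illustration only (the
function name and the pending_cr flag are hypothetical, not part of the parser).
It assumes the usual XML end-of-line rule, where "\r\n" and a lone "\r" both
become "\n", and shows one way to keep a "\r" at the end of a chunk from being
translated before the next chunk has been seen:

```python
from typing import Tuple


def normalize_chunk(chunk: bytes, pending_cr: bool = False) -> Tuple[bytes, bool]:
    """Normalize line ends in one chunk of input.

    pending_cr is True when the previous chunk ended with CR that was
    held back. Returns (normalized bytes, new pending_cr flag).
    """
    # Put the held-back CR in front so that CR LF split across two
    # chunks is seen as one sequence.
    data = b"\r" + chunk if pending_cr else chunk

    # A trailing CR cannot be resolved yet: the next chunk may start
    # with LF. Hold it back instead of emitting a newline now.
    pending = data.endswith(b"\r")
    if pending:
        data = data[:-1]

    out = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return out, pending


def finish(pending_cr: bool) -> bytes:
    """Flush a held-back CR at end of input as a single newline."""
    return b"\n" if pending_cr else b""
```

The caller threads the flag through successive chunks and calls finish() once
at end of input.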
FeedParser
----------
- Store unknown tags as list?
HTTP client
-----------
- Handle Transfer-Encoding headers (see also HTTP/1.1 support below). Also
ignore Content-Length if Transfer-Encoding is present.
- Add support for methods other than GET and HEAD. The behaviour interface may
also need to be updated, for example to transfer big files with the POST method.
- Add support for different content codings
- HTTP/1.1 support: chunked transfer coding, persistent and pipelined
connections.
- HTTP version switch
- HTTPS support
- TCP6 support?
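Sketch for the Transfer-Encoding and chunked-coding items above, in Python for
illustration only. All names are hypothetical. It assumes the whole body is
already in memory and that header names have been lowercased into a dict; a
real client would decode incrementally and would also read trailers, which
this sketch ignores:

```python
def body_framing(headers: dict) -> str:
    """Decide how the message body is delimited.

    Transfer-Encoding takes precedence: when it is present,
    Content-Length is ignored.
    """
    if "transfer-encoding" in headers:
        return "chunked"
    if "content-length" in headers:
        return "length"
    return "until-close"


def decode_chunked(body: bytes) -> bytes:
    """Decode a complete chunked body into the plain payload."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)  # raises ValueError if truncated
        size_field = body[pos:eol].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field.strip().decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            break  # last chunk; trailers (if any) are not parsed here
        out += body[pos:pos + size]
        if body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data not terminated by CRLF")
        pos += size + 2
    return bytes(out)
```

For example, decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n") yields
b"Wikipedia".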
Crawler
-------
- Store URLs in a database (we can store hosts in inverse form, like
com.google.www)
- Add a timeout for every crawl step and restart the timed-out process if needed
- Add limit on concurrent processes
- Pause creation of new subprocess after abnormal subprocess termination???
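Sketch for the inverse host form mentioned above, in Python for illustration
only (function names are hypothetical). One likely benefit, stated here as an
assumption, is that sorted keys keep all hosts of one domain next to each
other:

```python
def reverse_host(host: str) -> str:
    """Turn 'www.google.com' into 'com.google.www'."""
    return ".".join(reversed(host.lower().split(".")))


def restore_host(key: str) -> str:
    """Inverse of reverse_host (reversing the labels twice is a no-op)."""
    return ".".join(reversed(key.split(".")))


hosts = ["www.google.com", "example.org", "mail.google.com"]
keys = sorted(reverse_host(h) for h in hosts)
# keys == ['com.google.mail', 'com.google.www', 'org.example']
```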