arxiv-crawler

Crawl arXiv papers and organize them into a SQLite database.

Modifying crawling range

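# crawler.py
fields = ['CV']  # arXiv cs subcategories to crawl (cs.CV in the sample output)
months = ['{:0>2d}'.format(i+1) for i in range(12)]  # '01' through '12'
years = ['{:0>2d}'.format(i) for i in range(6, 17)]  # '06' through '16'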
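
To preview which listing pages a given range would cover, the three lists can be combined into URLs of the form shown in the crawler output. This is only a sketch: `listing_urls` is a hypothetical helper, not a function in crawler.py, and the loop order and URL template are assumptions inferred from the sample log line.

```python
# Same range settings as in crawler.py.
fields = ['CV']
months = ['{:0>2d}'.format(i + 1) for i in range(12)]
years = ['{:0>2d}'.format(i) for i in range(6, 17)]


def listing_urls(fields, years, months):
    """Yield one arXiv listing URL per (field, year, month) combination.

    Hypothetical helper: the URL template is assumed from the sample
    output 'http://arxiv.org/list/cs.CV/0601?show=1000'.
    """
    for field in fields:
        for year in years:
            for month in months:
                yield 'http://arxiv.org/list/cs.{}/{}{}?show=1000'.format(
                    field, year, month)


urls = list(listing_urls(fields, years, months))
print(len(urls))  # 1 field * 11 years * 12 months = 132
print(urls[0])    # http://arxiv.org/list/cs.CV/0601?show=1000
```

Narrowing `years` or `months` shrinks the list proportionally, which is a quick way to check a new range before starting a long crawl.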

Launch the crawler

$ python crawler.py
Retrieving http://arxiv.org/list/cs.CV/0601?show=1000
...

Check the results

$ python
>>> import sqlite3
>>> conn = sqlite3.connect('arxiv_raw.sqlite')
>>> cur = conn.cursor()
>>> cur.execute('SELECT * FROM sqlite_master')
>>> print(cur.fetchall())  # print the schema entry for every table
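
For a quicker sanity check than reading the raw schema rows, the table names can be pulled from `sqlite_master` and each table counted. This is a minimal sketch: `table_row_counts` is a hypothetical helper, and it makes no assumption about which tables the crawler actually creates.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every table in the database."""
    cur = conn.cursor()
    cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    names = [row[0] for row in cur.fetchall()]
    counts = {}
    for name in names:
        # Table names cannot be bound as SQL parameters, so quote the
        # identifier and escape any embedded double quotes.
        quoted = '"{}"'.format(name.replace('"', '""'))
        cur.execute('SELECT COUNT(*) FROM {}'.format(quoted))
        counts[name] = cur.fetchone()[0]
    return counts
```

Calling `table_row_counts(sqlite3.connect('arxiv_raw.sqlite'))` after a crawl should return a non-empty dictionary. Note that `sqlite3.connect` creates an empty database file if the path does not exist, so an empty result may simply mean the script was run from the wrong directory.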

Future work

Still figuring out the best way to visualize the papers.

joelthchao/arxiv-crawler
