Selenium & BeautifulSoup-powered single-page application scraper

A basic working model for scraping a page that doesn't lend itself to scraping: the single-page application. In this case it's a Firebase- and ReactJS-powered site, http://trialsresults.usatf.org, from USA Track & Field for the 2016 U.S. Olympic Track & Field Trials, held in Eugene, Ore., June 30 - July 10.
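A small illustration of why this kind of page resists plain scraping: the HTML a server first sends for a single-page application is usually just an empty mount point plus a script tag, and the content only appears after a browser runs the JavaScript. The two markup strings below are invented for illustration (they are not taken from the USATF site), and the standard library's `html.parser` stands in for BeautifulSoup only so the sketch runs without dependencies.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that sits outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the list of visible text fragments found in an HTML string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical markup, made up for this example.
SHELL = '<html><body><div id="root"></div><script src="bundle.js"></script></body></html>'
RENDERED = '<html><body><div id="root"><table><tr><td>Event</td><td>Result</td></tr></table></div></body></html>'

print(visible_text(SHELL))     # [] : nothing to scrape before JavaScript runs
print(visible_text(RENDERED))  # ['Event', 'Result']
```

Getting from the first string to the second is the job a real browser does, which is why Selenium sits in front of BeautifulSoup here.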

`usatf.p` is a Python-pickled copy of the page as generated by the Selenium webdriver. Use it with `bs.py` to tweak the BeautifulSoup munging separately, so you don't have to request the live page and wait for it every time.
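The caching idea can be sketched roughly like this. It assumes the pickle holds the page's HTML as a string; how `usatf.p` was actually produced isn't shown here, and `load_or_fetch` and its `fetch` argument are hypothetical names for illustration. In practice `fetch` might be a function that returns Selenium's `driver.page_source`.

```python
import pickle
from pathlib import Path


def load_or_fetch(cache_path, fetch):
    """Return page HTML from a pickle cache, calling fetch() only on a miss."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cache hit: reuse the saved page instead of driving a browser again.
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    # Cache miss: get the page once, then save it for later runs.
    html = fetch()
    with cache_path.open("wb") as fh:
        pickle.dump(html, fh)
    return html
```

Only unpickle files you created yourself, since `pickle.load` can execute arbitrary code from an untrusted file.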

Installation

A rather bare-bones how-to for making it work on OS X (10.11.6):

1. `$ git clone git@github.com:registerguard/spa_scraper.git`
2. `$ cd spa_scraper`  
3. `$ pip install -r requirements.txt`  
4. Install geckodriver:  
    * Download [geckodriver](https://github.com/mozilla/geckodriver/releases/tag/v0.13.0)  
    * Unzip the `tar.gz` file  
    * `$ mv /location/of/unzipped/geckodriver /usr/local/bin/` or somewhere else on your `$PATH`  
5. `$ python script.py`
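
If step 5 fails because the driver can't be found, a quick check that `geckodriver` really is on your `$PATH` may help. This is a minimal sketch using only the standard library; `find_driver` is a hypothetical helper name and is not part of this repo.

```python
import shutil


def find_driver(name="geckodriver", path=None):
    """Return the full path to the driver executable, or None if not found."""
    # shutil.which searches the given path string, or $PATH when path is None.
    return shutil.which(name, path=path)


if __name__ == "__main__":
    location = find_driver()
    if location is None:
        print("geckodriver not found on $PATH; see step 4")
    else:
        print("geckodriver found at", location)
```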

