[eeoc] Update for new site #300

Merged: 8 commits, Jan 31, 2017


@shanecav84 (Contributor) commented Jan 29, 2017

Closes #247.

  • Seems to run fine with --archive
  • As mentioned in "EEOC OIG has a new website" (#247), the new site does not contain all of the reports from the old site. I don't know what the process is for adding a new site when the old site contains more data, so I left the original eeoc.py in place for now.

TODO:

@divergentdave
Contributor

Awesome, thanks for doing this! I'll take a look at it. I'm going to add a commit to rename eeoc-new to eeoc, as we don't need to keep the old scraper around.

Regarding the old reports that are no longer online, we still have copies of the PDFs and associated metadata. I'll find which reports are now missing, clean them up, and add them over at the unitedstates/reports repository.
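Finding the reports that dropped off the new site amounts to a set difference over report identifiers. A minimal sketch, assuming each saved report's metadata is a dict with a `report_id` key and that IDs are comparable across the old and new sites (both are assumptions for illustration; `find_missing_reports` is a hypothetical helper, not part of the repository):

```python
def find_missing_reports(old_reports, new_reports):
    """Return the old-site reports whose ID does not appear on the new site.

    Both arguments are lists of metadata dicts. The "report_id" key is an
    assumed field name used here for illustration.
    """
    new_ids = {report["report_id"] for report in new_reports}
    # Preserve the old site's ordering so the result is easy to review by hand.
    return [r for r in old_reports if r["report_id"] not in new_ids]
```

If the new site assigns different IDs to the same documents, the comparison would need another key, such as the publication date plus a normalized title.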

Reports with multiple files are a longstanding hairy issue (see #112). In cases such as that example, we usually pick the most important or substantive file and drop transmittal letters and memoranda. The EPA OIG scraper is a good example of that approach.
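The "pick the most substantive file" rule can be sketched roughly as follows. This is not the EPA OIG scraper's actual code: `pick_primary_file`, the `(title, url)` tuple shape, and the keyword pattern are all hypothetical, chosen only to illustrate the idea.

```python
import re

# Hypothetical keyword list for secondary documents; a real scraper would
# tune this to the wording each site actually uses.
SECONDARY_PATTERN = re.compile(
    r"transmittal|memorand(?:um|a)|cover letter", re.IGNORECASE
)


def pick_primary_file(files):
    """Choose one file to represent a report that links to several.

    `files` is a list of (title, url) tuples in page order. The first file
    whose title does not look like a transmittal letter or memorandum wins;
    if every title looks secondary, fall back to the first file.
    """
    if not files:
        raise ValueError("report has no files")
    for title, url in files:
        if not SECONDARY_PATTERN.search(title):
            return title, url
    return files[0]
```

For example, given `[("Transmittal Letter", "a.pdf"), ("Final Audit Report", "b.pdf")]`, the sketch returns the audit report entry. Falling back to the first file keeps a report from being dropped entirely when the only document published is a memorandum.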

Thanks again for tackling this!

@shanecav84
Contributor Author

Thanks, @divergentdave, for rounding out the PR!

@divergentdave
Contributor

divergentdave commented Jan 31, 2017

Looks good, merging and deploying!

Edit: Up at https://oversight.garden/reports?inspector=eeoc

@divergentdave divergentdave merged commit c70b725 into unitedstates:master Jan 31, 2017