Added additional data collection capabilities and fixed bugs in scraper.py #27
Additional data elements that are now collected per post:
-- Previously collected inconsistently as a string. Now collected reliably as an integer.
-- If an FB user shared a post and added text while sharing it, that text was lost. This is now fixed.
Fixed a bug in the collection of comment threads. In the previous implementation, comment text was saved in dictionaries keyed by the comment author. As a result, content was dropped whenever the same FB user posted more than once in a comment thread, because each later comment overwrote the earlier one.
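To illustrate the comment-thread bug described above, here is a minimal sketch. It is not the actual code in scraper.py: the function names `collect_by_author` and `collect_as_list` and the sample data are hypothetical, and the list-of-records layout is just one way to avoid the overwrite.

```python
# Hypothetical sample data: (author, text) pairs from one comment thread.
comments = [
    ("alice", "First comment"),
    ("bob", "A reply"),
    ("alice", "Second comment from the same user"),
]


def collect_by_author(pairs):
    """Lossy approach: one slot per author, so a later comment
    from the same author overwrites the earlier one."""
    by_author = {}
    for author, text in pairs:
        by_author[author] = text
    return by_author


def collect_as_list(pairs):
    """One record per comment, so repeated authors are preserved
    and the thread order is kept."""
    return [{"author": author, "text": text} for author, text in pairs]


lossy = collect_by_author(comments)
complete = collect_as_list(comments)

print(len(lossy))     # number of comments that survive when keyed by author
print(len(complete))  # number of comments kept as a list of records
```

Keeping one record per comment also preserves the order of the thread, which a dictionary keyed by author cannot represent.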
The code has also been refactored so that scraped content can be read from disk and parsed. The scraped content is saved to disk before parsing, so if an error occurs downstream the raw data is still available for debugging.
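The save-then-parse flow described above might look roughly like the sketch below. The helper names `save_raw` and `parse_saved`, the file name, and the placeholder parsing step are all assumptions for illustration and do not reflect the actual structure of scraper.py.

```python
import tempfile
from pathlib import Path


def save_raw(html: str, path: Path) -> Path:
    """Write the scraped page to disk before any parsing happens."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return path


def parse_saved(path: Path) -> dict:
    """Read a previously saved page back from disk and parse it.
    The parsing here is a stand-in for the real post parser."""
    html = path.read_text(encoding="utf-8")
    return {"length": len(html), "has_body": "<body" in html.lower()}


# Hypothetical usage: save first, then parse from the saved copy.
with tempfile.TemporaryDirectory() as tmp:
    raw_path = save_raw(
        "<html><body>post</body></html>", Path(tmp) / "raw" / "post_1.html"
    )
    try:
        result = parse_saved(raw_path)
    except Exception:
        # The raw file is still on disk at this point, so the failure
        # can be reproduced by calling parse_saved(raw_path) again.
        raise
```

Because parsing reads only from the saved file, a failed run can be retried against the same input without scraping the page again.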