Updating a series of Python scripts for SP Reputation analysis #2
spReputationAnalysis.py
Hits MongoDB and gives you a high-level, cursory analysis: similarities between collections, the list of collections along with the ones unique to each table, and samples from each so you can visually recognize what's in these DBs. Run this first and you'll have a good starting point for what you're looking at. It also reports the most recent date a record was added, which gives you a sense of when each table was last updated.
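A minimal sketch of the kind of survey described above, assuming pymongo and a MongoDB URI of your own. The names `compare_collections` and `summarize` are hypothetical, not what spReputationAnalysis.py contains, and "unique" is read here as collection names that appear in only one database. The last-added date assumes records use MongoDB's default ObjectId `_id`, whose embedded timestamp approximates insert time.

```python
def compare_collections(collections_by_db):
    """Given {db_name: iterable of collection names}, return the names found
    in every database and, per database, the names found only there."""
    name_sets = {db: set(names) for db, names in collections_by_db.items()}
    common = set.intersection(*name_sets.values()) if name_sets else set()
    unique = {}
    for db, names in name_sets.items():
        # Union of every *other* database's collection names.
        others = set().union(*(s for d, s in name_sets.items() if d != db))
        unique[db] = names - others
    return common, unique


def summarize(uri, db_names, sample_size=3):
    """Print shared/unique collections, a last-added date, and sample records."""
    # Imported here so compare_collections() works without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    collections_by_db = {name: client[name].list_collection_names() for name in db_names}
    common, unique = compare_collections(collections_by_db)
    print("In every database:", sorted(common))
    for db_name, names in unique.items():
        print(f"Only in {db_name}:", sorted(names))

    for db_name, names in collections_by_db.items():
        for coll_name in names:
            coll = client[db_name][coll_name]
            # Assumption: _id is a default ObjectId, whose embedded timestamp
            # approximates insert time, so the highest _id is the newest record.
            newest = coll.find_one(sort=[("_id", -1)])
            last_added = getattr(newest["_id"], "generation_time", None) if newest else None
            print(f"{db_name}.{coll_name}: last added {last_added}")
            for doc in coll.find().limit(sample_size):
                print("   ", doc)
```

Something like `summarize("mongodb://localhost:27017", ["db_one", "db_two"])` would run it; the URI and database names there are placeholders. Keeping the set comparison in its own function means it can be checked without a live database.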
spReputationDateAnalysis.py
Hits the online Mongo instance and does a more in-depth analysis of the date ranges of records, so you know what's covered in each of the DBs you're looking at.
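A sketch of the date-range check described above. It assumes each record carries a datetime field; `created_at` is a placeholder name, and the helpers are hypothetical. `date_coverage` returns the earliest and latest dates plus a per-month count, and `missing_months` lists calendar months inside that range with no records, which is where coverage gaps show up. `DATE_RANGE_PIPELINE` uses MongoDB's `$group` stage with `$min`/`$max` to get the same range server-side, e.g. via `list(collection.aggregate(DATE_RANGE_PIPELINE))`.

```python
from collections import Counter


def date_coverage(dates):
    """Return (earliest, latest, records per 'YYYY-MM') for an iterable of datetimes."""
    dates = [d for d in dates if d is not None]  # skip records missing the field
    if not dates:
        return None, None, Counter()
    per_month = Counter(d.strftime("%Y-%m") for d in dates)
    return min(dates), max(dates), per_month


def missing_months(per_month, earliest, latest):
    """List 'YYYY-MM' months between earliest and latest that have no records."""
    missing = []
    year, month = earliest.year, earliest.month
    while (year, month) <= (latest.year, latest.month):
        key = f"{year:04d}-{month:02d}"
        if key not in per_month:
            missing.append(key)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return missing


# Server-side equivalent; "created_at" is a hypothetical field name.
DATE_RANGE_PIPELINE = [
    {"$group": {
        "_id": None,
        "earliest": {"$min": "$created_at"},
        "latest": {"$max": "$created_at"},
        "count": {"$sum": 1},
    }}
]
```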
spReputationExport.py
Exports all of the DBs in their entirety to CSVs. They're fairly small (under a gig), but that's still quite a bit to try to load into a Google Sheet.
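A sketch of the CSV export step, assuming the documents come from a pymongo cursor such as `collection.find()`; `docs_to_csv` is a hypothetical helper. Documents in one collection needn't share fields, so the header is the union of every key in first-seen order, missing fields are left blank, and non-string values (ObjectIds, datetimes, nested lists or dicts) are written with `str()`. It holds the whole collection in memory, which seems acceptable for tables this size.

```python
import csv
import io


def docs_to_csv(docs, out):
    """Write dict-like documents to the CSV file object `out`."""
    docs = list(docs)  # two passes needed: one for the header, one for rows
    # Union of all keys across documents, preserving first-seen order.
    fieldnames = list(dict.fromkeys(key for doc in docs for key in doc))
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for doc in docs:
        # str() flattens ObjectIds, datetimes, and nested values for the CSV.
        writer.writerow({key: str(value) for key, value in doc.items()})
```

To export one collection, open the target with `open(path, "w", newline="", encoding="utf-8")` and pass that file plus `collection.find()` to `docs_to_csv`; `newline=""` is what the `csv` module expects for files it writes.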
load_csvs.py
Assumes you've installed pandas. Loads everything into DataFrames so you can clean all of the data as one set and start analyzing dates of occurrences, frequency, etc. I think I can bang out a handful of scripts that do this kind of high-level analysis.
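A minimal sketch of the pandas loading step. The layout (one CSV per collection in a single folder) and the `created_at` date column are assumptions, and `load_csvs`/`combine` are illustrative names rather than what load_csvs.py actually contains. Each row keeps its source table in a `source` column, and unparseable dates become `NaT` so they are easy to spot during cleaning.

```python
from pathlib import Path

import pandas as pd


def load_csvs(folder):
    """Load every *.csv in `folder` into {file stem: DataFrame}."""
    return {path.stem: pd.read_csv(path) for path in sorted(Path(folder).glob("*.csv"))}


def combine(frames, date_column):
    """Stack all frames into one DataFrame tagged with its source table,
    and parse `date_column` (hypothetical name) into datetimes."""
    # Dict keys become the outer index level, which reset_index turns into a column.
    combined = pd.concat(frames, names=["source", "row"]).reset_index(level="source")
    # errors="coerce" turns unparseable values into NaT instead of raising.
    combined[date_column] = pd.to_datetime(combined[date_column], errors="coerce")
    return combined
```

Once combined, `combined.groupby([combined["created_at"].dt.to_period("M"), "source"]).size()` gives a per-month count of occurrences for each table.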