Add margin outlier filter #114
base: develop
@@ -163,24 +196,77 @@ def _get_unexpected_units(self, aggregates):

return unexpected_units

def _get_non_modeled_units(self, percent_reporting_threshold, turnout_factor_lower, turnout_factor_upper):
expected_geographic_units = self._get_expected_geographic_unit_fips().tolist()
def _fit_outlier_detection_model(self, reporting_units, response_variable, outlier_z_threshold):
Do you have a sense of:
- how much better this works compared to say, mean ± (z * standard deviation) or the IQR method?
- how much additional runtime this adds on to the model?
🤔
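For reference, the two baselines named in this comment can be sketched roughly as below. This is only an illustration using the standard library: the function names `zscore_outliers` and `iqr_outliers`, the default thresholds, and the sample values are all assumptions, and none of it is the `_fit_outlier_detection_model` implementation from this PR.

```python
import statistics


def zscore_outliers(values, z_threshold=3.0):
    """Flag values further than z_threshold standard deviations from the mean.

    Hypothetical helper for comparison purposes only.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # All values identical: nothing can be an outlier.
        return [False] * len(values)
    return [abs(v - mean) > z_threshold * std for v in values]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k * IQR, Q3 + k * IQR].

    Hypothetical helper for comparison purposes only.
    """
    # method="inclusive" interpolates linearly between observed data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Illustrative (made-up) margin changes with one extreme unit at the end.
margin_changes = [0.01, 0.02, -0.01, 0.0, 0.015, -0.02, 0.9]
z_flags = zscore_outliers(margin_changes, z_threshold=2.0)
iqr_flags = iqr_outliers(margin_changes)
```

One thing worth keeping in mind when comparing: the mean and standard deviation are themselves pulled by the extreme unit, so with few units the z-score rule can fail to flag it at higher thresholds, while the IQR rule is based on quartiles and is less affected.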
if "margin" in self.estimands:
# reporting_units["normalized_margin_change"] = (
Do you want to keep this commented-out code? 🤔
Also can you remove the extra blank line above `if "margin" in self.estimands`? I am genuinely surprised `pre-commit` didn't remove that lol 🤔
@@ -1024,6 +1024,7 @@ def compute_bootstrap_errors(
self.weighted_z_test_pred = z_test_pred * weights_test
self.ran_bootstrap = True
self.n_contests = aggregate_indicator.shape[1]
# import IPython; IPython.embed()
Can you remove this? (Also I had never heard of `IPython.embed()` before and it's cool 😄 )
assert reporting_data.shape[0] == 133 - over


def test_unit_blacklist(va_governor_county_data, rng):
I'm fine with either "blocklist" or "blacklist", so can you either change this to "blocklist" or change `CombinedDataHandler` to "blacklist"? 🤔
Description
Adds outlier detection for margin outliers, in addition to the turnout outlier detection we already have. Also adds the ability to blacklist units (or even entire states).
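As a rough illustration of the blacklist idea, a minimal sketch follows. It is not the code in this PR: the function name `split_blacklisted`, the record keys `geographic_unit_fips` and `postal_code`, and the sample identifiers are all assumptions made for the example.

```python
def split_blacklisted(units, unit_blacklist=(), state_blacklist=()):
    """Split unit records into (kept, dropped).

    A unit is dropped if its own id is blacklisted or if its state is.
    All names and keys here are hypothetical.
    """
    unit_blacklist = set(unit_blacklist)
    state_blacklist = set(state_blacklist)
    kept, dropped = [], []
    for unit in units:
        is_blocked = (
            unit["geographic_unit_fips"] in unit_blacklist
            or unit["postal_code"] in state_blacklist
        )
        (dropped if is_blocked else kept).append(unit)
    return kept, dropped


# Made-up records: two Virginia units and one Maryland unit.
example_units = [
    {"geographic_unit_fips": "51001", "postal_code": "VA"},
    {"geographic_unit_fips": "51003", "postal_code": "VA"},
    {"geographic_unit_fips": "24001", "postal_code": "MD"},
]
kept, dropped = split_blacklisted(
    example_units, unit_blacklist=["51003"], state_blacklist=["MD"]
)
```

Returning the dropped records alongside the kept ones makes it easy to report which units were excluded.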
Jira Ticket
Test Steps
Running the testbed like this will print `3352100` and `21083` as unexpected units, along with all Maryland units: