Add margin outlier filter #114

lennybronner · 2024-10-12T20:17:03Z

Description

Adds outlier detection for margin outliers, in addition to the turnout outlier detection we already have. Also adds the ability to blacklist units (or even entire states).
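For illustration only, here is a minimal sketch of one way margin outliers can be detected. It is not necessarily what _fit_outlier_detection_model does. It assumes that a unit's current normalized margin should track its baseline normalized margin roughly linearly, fits that line by ordinary least squares, and flags units whose standardized residual exceeds a z threshold. The function name flag_margin_outliers, the single-feature model, and the data are all hypothetical.

```python
from statistics import mean, stdev


def flag_margin_outliers(baseline, current, z_threshold=3.0):
    """Hypothetical sketch: flag units whose current margin departs from a linear
    fit on the baseline margin by more than z_threshold residual standard deviations."""
    # Ordinary least squares fit of current = intercept + slope * baseline
    x_bar, y_bar = mean(baseline), mean(current)
    sxx = sum((x - x_bar) ** 2 for x in baseline)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(baseline, current))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residuals of every unit and their sample standard deviation
    residuals = [y - (intercept + slope * x) for x, y in zip(baseline, current)]
    scale = stdev(residuals)

    # A unit is an outlier if its standardized residual is too large
    return [abs(r) / scale > z_threshold for r in residuals]


# Hypothetical data: 21 units whose margin barely moves, except unit 5
baseline = [0.1 * i for i in range(-10, 11)]
current = [x + 0.01 * (-1) ** i for i, x in enumerate(baseline)]
current[5] += 0.5

flags = flag_margin_outliers(baseline, current)
print([i for i, is_outlier in enumerate(flags) if is_outlier])
```

Because the outlying unit is part of the fit, it also inflates the residual scale; a robust scale such as the median absolute deviation is a common alternative when several outliers are expected.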

Jira Ticket

Test Steps

Running the testbed as follows should print 3352100 and 21083 as unexpected units, along with all Maryland units:

python run.py 2020-11-03_USA_G redo --office_id P --estimands="['margin']" --features "['baseline_normalized_margin', 'ethnicity_likely_african_american', 'percent_bachelor_or_higher']" --current_run_name "rerun_2020_remove_outlier" --agg_model_preds --start_timestamp 2020-11-03T19:54:29-05:00 --end_timestamp 2020-11-04T06:07:13-05:00 --redo_run_model_every_n 100 --model_parameters "{'unit_blacklist': ['3352100', '21083'], 'postal_code_blacklist': ['MD']}"
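The blacklist parameters in the command above can be read as a simple set-membership filter. The sketch below assumes the --model_parameters string is parsed as a Python literal (ast.literal_eval is used here only for illustration) and that each unit carries a unit ID and a postal code; the field names geographic_unit_fips and postal_code, the helper apply_blacklists, and the unit records are hypothetical. Units matching either blacklist are set aside rather than modeled, which mirrors the expected output described above.

```python
import ast

# Hypothetical unit records; field names and extra units are illustrative assumptions
units = [
    {"geographic_unit_fips": "3352100", "postal_code": "NH"},
    {"geographic_unit_fips": "21083", "postal_code": "KY"},
    {"geographic_unit_fips": "24001", "postal_code": "MD"},
    {"geographic_unit_fips": "24003", "postal_code": "MD"},
    {"geographic_unit_fips": "51001", "postal_code": "VA"},
]


def apply_blacklists(units, model_parameters):
    """Hypothetical helper: split units into (modeled, excluded) using both blacklists."""
    unit_blacklist = set(model_parameters.get("unit_blacklist", []))
    postal_code_blacklist = set(model_parameters.get("postal_code_blacklist", []))
    modeled, excluded = [], []
    for unit in units:
        # A unit is excluded if its ID or its whole state is blacklisted
        if unit["geographic_unit_fips"] in unit_blacklist or unit["postal_code"] in postal_code_blacklist:
            excluded.append(unit)
        else:
            modeled.append(unit)
    return modeled, excluded


# Parse the same string that is passed to --model_parameters in the command above
params = ast.literal_eval("{'unit_blacklist': ['3352100', '21083'], 'postal_code_blacklist': ['MD']}")
modeled, excluded = apply_blacklists(units, params)
print([u["geographic_unit_fips"] for u in excluded])
```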

dmnapolitano · 2024-10-16T15:45:31Z

src/elexmodel/handlers/data/CombinedData.py

@@ -163,24 +196,77 @@ def _get_unexpected_units(self, aggregates):

        return unexpected_units

-    def _get_non_modeled_units(self, percent_reporting_threshold, turnout_factor_lower, turnout_factor_upper):
-        expected_geographic_units = self._get_expected_geographic_unit_fips().tolist()
+    def _fit_outlier_detection_model(self, reporting_units, response_variable, outlier_z_threshold):


Do you have a sense of:

how much better this works compared to, say, mean ± (z * standard deviation) or the IQR method?

how much additional runtime this adds to the model?

🤔
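For reference, the two baselines mentioned in the question can be sketched with the standard library alone. This is a minimal, hypothetical comparison point, not code from the PR; the function names and the margin-change values are made up for illustration.

```python
from statistics import mean, quantiles, stdev


def zscore_outliers(values, z=3.0):
    """Flag values outside mean ± z * (sample standard deviation)."""
    mu, sigma = mean(values), stdev(values)
    return [abs(v - mu) > z * sigma for v in values]


def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k * IQR, Q3 + k * IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Hypothetical margin changes: 19 small shifts plus one large one
values = [0.01 * ((i % 5) - 2) for i in range(19)] + [0.5]
print(zscore_outliers(values).index(True), iqr_outliers(values).index(True))
```

One caveat with the z-score rule: with the sample standard deviation, no value can sit more than (n - 1)/sqrt(n) standard deviations from the mean, so with 10 or fewer reporting units a threshold of z = 3 can never fire. The IQR fences do not have that limit.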

dmnapolitano · 2024-10-16T15:47:37Z

src/elexmodel/handlers/data/CombinedData.py

+
+
+        if "margin" in self.estimands:
+            # reporting_units["normalized_margin_change"] = (


Do you want to keep this commented-out code? 🤔

Also, can you remove the extra blank line above the if "margin" in self.estimands: line? I am genuinely surprised pre-commit didn't remove that lol 🤔

dmnapolitano · 2024-10-16T15:49:20Z

src/elexmodel/models/BootstrapElectionModel.py

@@ -1024,6 +1024,7 @@ def compute_bootstrap_errors(
        self.weighted_z_test_pred = z_test_pred * weights_test
        self.ran_bootstrap = True
        self.n_contests = aggregate_indicator.shape[1]
+        # import IPython; IPython.embed()


Can you remove this? (Also I had never heard of IPython.embed() before and it's cool 😄 )

dmnapolitano · 2024-10-16T15:53:14Z

tests/handlers/test_combined_data.py

+    assert reporting_data.shape[0] == 133 - over
+
+
+def test_unit_blacklist(va_governor_county_data, rng):


I'm fine with either "blocklist" or "blacklist", so can you either change this to "blocklist" or change CombinedDataHandler to use "blacklist"? 🤔

lennybronner added 6 commits October 5, 2024 15:47

4a49506 added filter for margin outliers
60d99be linter
1dc4dc7 removing em file
5e5d066 removing em
6b6cd86 Merge branch 'develop' into add-margin-outlier-filter
5269261 added unit blacklister

lennybronner requested a review from a team as a code owner October 12, 2024 20:17

lennybronner added 9 commits October 12, 2024 16:18

6a42513 linter
a1b8ad2 fixed tests
5c7a374 linter
0291912 added additional unit tests
77d4c68 linter
f71b7b6 deal with randomness
cd22a53 renamed + better printing of non-modeled units
8e8b87d linter
c7f37d2 modeled outliers now

dmnapolitano self-assigned this Oct 16, 2024

dmnapolitano reviewed Oct 16, 2024
