Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: threshold of 1.11m on equal_shape_distance_diff_coordinates #1675

Merged
merged 8 commits into from
Mar 8, 2024

Conversation

cka-y
Copy link
Contributor

@cka-y cka-y commented Feb 16, 2024

Summary:
Closes #1638
Closes #1678
Concerns have been raised about the equal_shape_dist_diff_coordinates error's sensitivity to minor coordinate discrepancies. Historical discussions proposed a $1.11 m$ threshold to distinguish between ERROR ($\ge 1.11m$) and WARNING ($\lt 1.11 m$) levels. Given the expansion of the Mobility Database since these initial discussions, reevaluating this threshold with current data offers a chance to refine the rule's sensitivity.

Expected behavior:

  • Introduces a 1.11m threshold for equal_shape_distance_diff_coordinates, triggering an ERROR for distances $\geq 1.11m$.
  • Creates a new notice, equal_shape_distance_diff_coordinates_below_threshold, with WARNING severity for distances $<1.11m$.

Acceptance tests results:
The acceptance tests results show a 6% reduction (93 notices) in ERROR notices. Additionally, 166 new WARNING notices were generated. This is because certain datasets exhibited distances between coordinates that were both greater than and less than $1.11m$, thus qualifying for both WARNING and ERROR notices.

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with gradle test to make sure you didn't break anything
  • Add or update any needed documentation to the repo
  • Format the title like "feat: [new feature short description]". Title must follow the Conventional Commit Specification(https://www.conventionalcommits.org/en/v1.0.0/).
  • Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

@MobilityData MobilityData deleted a comment from github-actions bot Feb 19, 2024
@MobilityData MobilityData deleted a comment from github-actions bot Feb 19, 2024
@cka-y cka-y changed the title feat: Test setting threshold of 1.11m on equal_shape_dist_diff_coordi… feat: threshold of 1.11m on equal_shape_distance_diff_coordinates Feb 19, 2024
@cka-y cka-y marked this pull request as ready for review February 19, 2024 16:36
@cka-y cka-y added the do not merge This PR needs more work/discussion or is not meant to be merged label Feb 19, 2024
Copy link
Contributor

❌ Invalid acceptance test.
New Errors: 0 out of 1467 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 93 out of 1467 datasets (~6%) are invalid due to code change, which is above the provided threshold of 1%.
New Warnings: 166 out of 1467 datasets (~11%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1467 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
12 out of 1479 sources (~1 %) are corrupted.
Corrupted sources:
ca-quebec-societe-de-transport-de-sherbrooke-gtfs-756
ch-unknown-shuttler-gtfs-1970
es-madrid-empresa-municipal-de-transportes-de-madrid-emt-madrid-gtfs-793
fi-pohjois-karjala-pos-ely-joensuu-gtfs-1125
us-california-beaumont-gtfs-106
us-california-corona-cruiser-gtfs-104
us-california-fairfield-and-suisun-transit-fast-transit-gtfs-76
us-california-redding-area-bus-authority-raba-gtfs-1972
us-california-san-diego-international-airport-metropolitan-transit-system-mts-gtfs-13
us-california-san-francisco-bay-area-water-emergency-transportation-authority-gtfs-62
us-colorado-el-dorado-transit-gtfs-75
us-colorado-envida-gtfs-455
Commit: 7826f00
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

@MobilityData MobilityData deleted a comment from github-actions bot Feb 19, 2024
Copy link
Contributor

❌ Invalid acceptance test.
New Errors: 0 out of 1481 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 92 out of 1481 datasets (~6%) are invalid due to code change, which is above the provided threshold of 1%.
New Warnings: 166 out of 1481 datasets (~11%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1481 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
5 out of 1486 sources (~0 %) are corrupted.
Corrupted sources:
be-wallonne-tec-gtfs-1868
ca-alberta-edmonton-transit-system-gtfs-714
ca-quebec-exo-ville-de-sainte-julie-gtfs-753
cl-la-araucania-temuco-gtfs-1864
fr-auvergne-rhone-alpes-tac-gtfs-1883
Commit: 7baf5f3
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

@emmambd
Copy link
Contributor

emmambd commented Feb 20, 2024

I did a deep dive into the data reported during our last release (4.2) and used that as the source of truth to compare the original number of feeds that triggered this error vs. the revised list from these acceptance tests.

This threshold reduces the number of feeds triggered by this error by 50% (174 -> 87).

If there are persistent user concerns after this rule change, we can revisit the threshold by doing analysis on all notice occurrences with the updated rule and the difference in metres for each occurrence, and then doing a distribution graph to help determine a more lenient threshold.

For this release iteration, 1.11m is sufficient.

Copy link
Contributor

@emmambd emmambd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cka-y 1 minor comment re: notice name

@cka-y cka-y removed the do not merge This PR needs more work/discussion or is not meant to be merged label Feb 21, 2024
Copy link
Contributor

❌ Invalid acceptance test.
New Errors: 0 out of 1485 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 93 out of 1485 datasets (~6%) are invalid due to code change, which is above the provided threshold of 1%.
New Warnings: 166 out of 1485 datasets (~11%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1485 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
1 out of 1486 sources (~0 %) are corrupted.
Corrupted sources:
ca-quebec-exo-laurentides-gtfs-744
Commit: ebc8589
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

@cka-y cka-y requested a review from emmambd February 23, 2024 14:22
Copy link
Contributor

❌ Invalid acceptance test.
New Errors: 0 out of 1500 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 94 out of 1500 datasets (~6%) are invalid due to code change, which is above the provided threshold of 1%.
New Warnings: 168 out of 1500 datasets (~11%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1500 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
6 out of 1506 sources (~0 %) are corrupted.
Corrupted sources:
ca-ontario-toronto-transit-commission-gtfs-732
ca-ontario-transit-windsor-gtfs-720
ca-quebec-exo-laurentides-gtfs-744
ca-quebec-societe-de-transport-de-sherbrooke-gtfs-756
cy-unknown-osea-gtfs-1916
de-bayern-erfurter-verkehrsbetriebe-ag-gtfs-780
Commit: 6d97a6d
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

@emmambd emmambd requested a review from jcpitre February 29, 2024 21:14
@MobilityData MobilityData deleted a comment from github-actions bot Mar 6, 2024
Copy link
Contributor

github-actions bot commented Mar 6, 2024

❌ Invalid acceptance test.
New Errors: 0 out of 1520 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 96 out of 1520 datasets (~6%) are invalid due to code change, which is above the provided threshold of 1%.
New Warnings: 169 out of 1520 datasets (~11%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1520 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1520 sources (~0 %) are corrupted.
Commit: 9ae15c9
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

Copy link
Contributor

github-actions bot commented Mar 8, 2024

❌ Invalid acceptance test.
New Errors: 0 out of 1520 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 96 out of 1520 datasets (~6%) are invalid due to code change, which is above the provided threshold of 1%.
New Warnings: 170 out of 1520 datasets (~11%) are invalid due to code change, which is above the provided threshold of 1%.
Dropped Warnings: 0 out of 1520 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1520 sources (~0 %) are corrupted.
Commit: f064b0a
Download the full acceptance test report here (report will disappear after 90 days).
❌ Invalid acceptance test.

@cka-y cka-y merged commit 62c7784 into master Mar 8, 2024
332 of 333 checks passed
@cka-y cka-y deleted the feat/1638 branch March 8, 2024 20:38
@emmambd emmambd linked an issue Mar 8, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants