Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix combine CLI producing empty output with invalid data #56

Merged
merged 9 commits into from
Oct 23, 2024
Merged

Conversation

nebfield
Copy link
Member

@nebfield nebfield commented Oct 18, 2024

When processing a single file, if the combine CLI encountered an invalid variant it would quietly fail to write out any variants (quiet except for a misleading log statement). Some investigation notes:

  • The invalid data wasn't invalid (some variants failed harmonisation and were missing mandatory fields)
    • Update the pydantic models to support this case properly
  • The CLI wasn't re-raising exceptions correctly because I had a return statement inside a finally block 😬
    • Add tests to make sure badly harmonised variants do create an output file and invalid variants do throw a ValidationError exception
    • Add a check to the CLI that an output file actually exists
  • Re-test this branch on the entire Catalog and check for exceptions

Closes #55

Test results

22 (older) scoring files contain invalid rsIDs:

pgscatalog.core.cli.combine_cli: 2024-10-18 14:43:12 CRITICAL PGS000019 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 14:52:23 CRITICAL PGS000042 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 15:09:03 CRITICAL PGS000212 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 15:09:03 CRITICAL PGS000213 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 15:09:03 CRITICAL PGS000214 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 15:09:03 CRITICAL PGS000215 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 15:09:03 CRITICAL PGS000216 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:26:04 CRITICAL PGS000310 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:26:04 CRITICAL PGS000311 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:26:06 CRITICAL PGS000317 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:36:01 CRITICAL PGS000330 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:46:39 CRITICAL PGS000332 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:46:39 CRITICAL PGS000333 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:48:31 CRITICAL PGS000344 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:48:31 CRITICAL PGS000345 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:48:31 CRITICAL PGS000346 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 16:48:31 CRITICAL PGS000347 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 19:14:11 CRITICAL PGS000727 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 19:15:24 CRITICAL PGS000728 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 19:15:24 CRITICAL PGS000729 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 19:22:39 CRITICAL PGS000754 contains invalid data, stopping and exploding
pgscatalog.core.cli.combine_cli: 2024-10-18 19:33:12 CRITICAL PGS000867 contains invalid data, stopping and exploding

Fix is to relax the rsID check when harmonisation goes wrong. No other ValidationErrors get thrown.

Copy link

codecov bot commented Oct 18, 2024

Codecov Report

Attention: Patch coverage is 98.30508% with 1 line in your changes missing coverage. Please review.

Project coverage is 89.61%. Comparing base (15f7095) to head (1566a89).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
pgscatalog.core/src/pgscatalog/core/lib/models.py 94.73% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #56      +/-   ##
==========================================
+ Coverage   89.37%   89.61%   +0.24%     
==========================================
  Files          54       54              
  Lines        3434     3475      +41     
==========================================
+ Hits         3069     3114      +45     
+ Misses        365      361       -4     
Flag Coverage Δ
pgscatalog.core 92.57% <98.30%> (+0.45%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nebfield nebfield marked this pull request as ready for review October 18, 2024 12:07
@nebfield nebfield requested a review from smlmbrt October 21, 2024 07:23
Copy link
Member

@smlmbrt smlmbrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it passes on all existing harmonized files it should be good to go.

@nebfield nebfield merged commit 3b3bf34 into main Oct 23, 2024
10 checks passed
@nebfield nebfield deleted the fix-65 branch October 23, 2024 10:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WARNING None fewer variants in output compared to original file
2 participants