Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: raise exception if CollectDuplexSeqMetrics run on consensus BAM #1003

Merged
merged 13 commits into from
Jul 31, 2024

Conversation

znorgaard
Copy link
Contributor

@znorgaard znorgaard commented Jul 22, 2024

Closes #992.

This PR adds a new isConsensusRead method to Umis.scala which is then used in CollectDuplexSeqMetrics to raise an exception if the first record is a consensus record.

In addition to the unit tests, I ran this on a consensus BAM locally and generated the expected error message. Running on the preceding grouped BAM generated the expected outputs as well.

Copy link

codecov bot commented Jul 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.62%. Comparing base (9a4b01d) to head (c3aa60c).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1003   +/-   ##
=======================================
  Coverage   95.62%   95.62%           
=======================================
  Files         126      126           
  Lines        7382     7388    +6     
  Branches      497      510   +13     
=======================================
+ Hits         7059     7065    +6     
  Misses        323      323           
Flag Coverage Δ
unittests 95.62% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@clintval clintval assigned znorgaard and unassigned clintval Jul 23, 2024
tfenne
tfenne previously requested changes Jul 24, 2024
src/main/scala/com/fulcrumgenomics/bam/api/SamRecord.scala Outdated Show resolved Hide resolved
src/main/scala/com/fulcrumgenomics/umi/Umis.scala Outdated Show resolved Hide resolved
src/main/scala/com/fulcrumgenomics/umi/Umis.scala Outdated Show resolved Hide resolved
Comment on lines 310 to 314
.tapEach {
r =>
if (Umis.isConsensusRead(r)) throw new IllegalArgumentException("Found consensus record. Expected UMI-grouped BAM")
else r
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of thoughts here:

  1. Is it necessary to check each and every read? Or could you add .bufferBetter to the construction of _filteredIterator and then just check the first element?
  2. I think this is a case where a much more comprehensive error message would be useful, something like:
Input BAM file to CollectDuplexSeqMetrics ($input) appears to contain consensus reads.
CollectDuplexSeqMetrics cannot run on consensus BAMs, and instead requires the UMI-grouped BAM prior to consensus calling.  The UMI-grouped BAM is the output of running
GroupReadsByUmi.

First record in $input has consensus tags present:
${_filteredIterator.peek()}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first point, I think it depends on what we're trying to guard against. If we're only guarding against a well formed consensus BAM being input then I think checking the first record should work.

If we're guarding against all the chaos a user may concoct, then this will not be sufficient. But, I think guarding against all those possibilities is probably unnecessary and unrealistic.

Copy link
Member

@clintval clintval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I do like the buffered and peek approach since a SAM/BAM is likely to have all the records with the tags, or none of them (in the non-corrupt example), so checking the first should be sufficient to catch 99%+ of improper inputs.

Copy link
Member

@clintval clintval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works well for me! Thanks Zach! @tfenne could you give a final review?

Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions, but lgtm

src/main/scala/com/fulcrumgenomics/umi/Umis.scala Outdated Show resolved Hide resolved
src/main/scala/com/fulcrumgenomics/umi/Umis.scala Outdated Show resolved Hide resolved
@nh13 nh13 dismissed tfenne’s stale review July 31, 2024 20:50

All comments addressed, Tim is OOO

@nh13 nh13 removed the request for review from tfenne July 31, 2024 20:51
@znorgaard znorgaard merged commit 6ca7733 into main Jul 31, 2024
6 checks passed
@znorgaard znorgaard deleted the zn_is_consensus branch July 31, 2024 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Warn or raise an exception when CollectDuplexSeqMetrics is run on a consensus BAM
4 participants