GroupReadsByUmi duplicate marking may fail when secondary and supplementary alignments are included #961

Open
nh13 opened this issue Jan 29, 2024 · 2 comments

Comments

@nh13
Member

nh13 commented Jan 29, 2024

See: samtools/hts-specs#755

@msto
Contributor

msto commented May 21, 2024

Adding some color:

When I attempt to mark duplicates in a BAM containing supplementary alignments, fgbio raises an exception. The exception appears to occur because the template's primary alignment was removed (or possibly because the primary is not sorted together with its associated supplementary alignment?).

$ fgbio GroupReadsByUmi --strategy=Adjacency --input=input.bam --output=output.bam --mark-duplicates --include-supplementary=true
[2024/05/21 13:38:41 | FgBioMain | Info] Executing GroupReadsByUmi from fgbio version 2.2.1 as msto@Matts-MBP on JRE 22.0.1+8 with snappy, JdkInflater, and JdkDeflater
[2024/05/21 13:38:41 | GroupReadsByUmi | Info] Filtering the input.
[2024/05/21 13:38:41 | GroupReadsByUmi | Info] Sorting the input to TemplateCoordinate order.
[2024/05/21 13:38:41 | GroupReadsByUmi | Info] Seen many non-increasing record positions. Printing Read-names as well.
[2024/05/21 13:38:42 | GroupReadsByUmi | Info] Sorted       432,775 records.  Elapsed time: 00:00:01s.  Time for last 432,775:    1s.  Last read position: chr20:42,368,210.  Last read name: FS10002716:9:BTR99611-1426:1:1116:15810:4090
[2024/05/21 13:38:43 | GroupReadsByUmi | Info] Accepted 432,775 reads for grouping.
[2024/05/21 13:38:43 | GroupReadsByUmi | Info] Filtered out 604 reads due to mapping issues.
[2024/05/21 13:38:43 | GroupReadsByUmi | Info] Filtered out 0 reads that contained one or more Ns in their UMIs.
[2024/05/21 13:38:43 | GroupReadsByUmi | Info] Assigning reads to UMIs and outputting.
[2024/05/21 13:38:43 | FgBioMain | Info] GroupReadsByUmi failed. Elapsed time: 0.09 minutes.
Exception in thread "main" java.lang.IllegalStateException: FS10002716:9:BTR99611-1426:1:1103:7210:2350 did not have a primary R1 record.
        at com.fulcrumgenomics.umi.GroupReadsByUmi$ReadInfo$.$anonfun$apply$3(GroupReadsByUmi.scala:118)
        at scala.Option.getOrElse(Option.scala:201)
        at com.fulcrumgenomics.umi.GroupReadsByUmi$ReadInfo$.apply(GroupReadsByUmi.scala:118)
        at com.fulcrumgenomics.umi.GroupReadsByUmi.takeNextGroup(GroupReadsByUmi.scala:765)
        at com.fulcrumgenomics.umi.GroupReadsByUmi.execute(GroupReadsByUmi.scala:710)
        at com.fulcrumgenomics.cmdline.FgBioMain.makeItSo(FgBioMain.scala:124)
        at com.fulcrumgenomics.cmdline.FgBioMain.makeItSoAndExit(FgBioMain.scala:99)
        at com.fulcrumgenomics.cmdline.FgBioMain$.main(FgBioMain.scala:50)
        at com.fulcrumgenomics.cmdline.FgBioMain.main(FgBioMain.scala)

Setting --include-supplementary=False is sufficient to eliminate the exception, but I haven't examined the contents of the resulting BAM.
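To check the "primary was removed" hypothesis before running GroupReadsByUmi, one could scan the records that survive filtering for templates that have no primary R1. Below is a minimal Python sketch that assumes each record is reduced to its query name and SAM FLAG. `Rec` and `templates_missing_primary_r1` are hypothetical names, not part of fgbio. The sketch mirrors the condition in the error message, not fgbio's actual implementation.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple

# SAM FLAG bits, as defined in the SAM specification.
PAIRED = 0x1
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


class Rec(NamedTuple):
    """Hypothetical minimal stand-in for a SAM record: query name and FLAG."""
    name: str
    flag: int


def is_primary(rec: Rec) -> bool:
    # A record is primary if it is neither secondary nor supplementary.
    return not (rec.flag & (SECONDARY | SUPPLEMENTARY))


def is_r1(rec: Rec) -> bool:
    # Unpaired reads are treated as R1; paired reads need the READ1 bit.
    return not (rec.flag & PAIRED) or bool(rec.flag & READ1)


def templates_missing_primary_r1(records: Iterable[Rec]) -> list[str]:
    """Return sorted names of templates whose records lack a primary R1."""
    by_name: dict[str, list[Rec]] = defaultdict(list)
    for rec in records:
        by_name[rec.name].append(rec)
    return sorted(
        name
        for name, recs in by_name.items()
        if not any(is_primary(r) and is_r1(r) for r in recs)
    )


# Template t2 kept its supplementary R1, but its primary R1 was filtered out.
records = [
    Rec("t1", PAIRED | READ1),                  # primary R1
    Rec("t1", PAIRED | READ2),                  # primary R2
    Rec("t2", PAIRED | READ1 | SUPPLEMENTARY),  # orphaned supplementary R1
    Rec("t2", PAIRED | READ2),                  # primary R2
]
print(templates_missing_primary_r1(records))  # ['t2']
```

Any template this check reports would trip the same "did not have a primary R1 record" condition.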

@nh13
Member Author

nh13 commented May 22, 2024

@msto, want to give #964 a go?
