Memory issues when running BaseRecalibrator #8726

Open
MetteBoge opened this issue Mar 9, 2024 · 1 comment

Comments

@MetteBoge

Hi,
I need to run BaseRecalibrator as part of the preprocessing of my RNA-seq BAM files before variant calling, but I am running into memory problems. Here is the error I get:

  22:30:25.477 INFO  BaseRecalibrator - Start Date/Time: March 8, 2024 at 10:30:25 PM GMT
  22:30:25.477 INFO  BaseRecalibrator - ------------------------------------------------------------
  22:30:25.477 INFO  BaseRecalibrator - ------------------------------------------------------------
  22:30:25.477 INFO  BaseRecalibrator - HTSJDK Version: 4.1.0
  22:30:25.478 INFO  BaseRecalibrator - Picard Version: 3.1.1
  22:30:25.478 INFO  BaseRecalibrator - Built for Spark Version: 3.5.0
  22:30:25.478 INFO  BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
  22:30:25.478 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
  22:30:25.478 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
  22:30:25.478 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
  22:30:25.478 INFO  BaseRecalibrator - Deflater: IntelDeflater
  22:30:25.478 INFO  BaseRecalibrator - Inflater: IntelInflater
  22:30:25.478 INFO  BaseRecalibrator - GCS max retries/reopens: 20
  22:30:25.479 INFO  BaseRecalibrator - Requester pays: disabled
  22:30:25.479 INFO  BaseRecalibrator - Initializing engine
  WARNING       2024-03-08 22:30:25     SamFiles        The index file /mnt/storage/users/dockworker/mpedersen/work/RNAseq_variant_call/work/d6/362957b6215ad2e8193c27c895d42d/VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.bai was found by resolving the canonical path of a symlink: VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.bam -> /mnt/storage/users/dockworker/mpedersen/work/RNAseq_variant_call/work/d6/362957b6215ad2e8193c27c895d42d/VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.bam
  22:30:25.631 INFO  FeatureManager - Using codec VCFCodec to read file file://1000G_phase1.snps.high_confidence.hg38.vcf.gz
  22:30:25.754 INFO  FeatureManager - Using codec VCFCodec to read file file://Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
  23:39:21.541 INFO  BaseRecalibrator - Shutting down engine
  [March 8, 2024 at 11:39:21 PM GMT] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 68.94 minutes.
  Runtime.totalMemory()=214748364800
  java.lang.OutOfMemoryError: Java heap space
        at htsjdk.tribble.readers.TabixReader.readInt(TabixReader.java:189)
        at htsjdk.tribble.readers.TabixReader.readIndex(TabixReader.java:274)
        at htsjdk.tribble.readers.TabixReader.readIndex(TabixReader.java:287)
        at htsjdk.tribble.readers.TabixReader.<init>(TabixReader.java:165)
        at htsjdk.tribble.readers.TabixReader.<init>(TabixReader.java:129)
        at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:80)
        at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:433)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:377)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:319)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:291)
        at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:245)
        at org.broadinstitute.hellbender.engine.FeatureManager.initializeFeatureSources(FeatureManager.java:208)
        at org.broadinstitute.hellbender.engine.FeatureManager.<init>(FeatureManager.java:155)
        at org.broadinstitute.hellbender.engine.ReadWalker.initializeFeatures(ReadWalker.java:72)
        at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:726)
        at org.broadinstitute.hellbender.engine.ReadWalker.onStartup(ReadWalker.java:51)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:147)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
        at org.broadinstitute.hellbender.Main.main(Main.java:306)
  Using GATK jar /gatk/gatk-package-4.5.0.0-local.jar
  Running:
      java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms200G -Xmx200G -XX:ParallelGCThreads=2 -jar /gatk/gatk-package-4.5.0.0-local.jar BaseRecalibrator -I VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.bam -O VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.baseRecal.bam -R GRCh38.primary_assembly.genome.fa --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --tmp-dir /tmp --disable-bam-index-caching true

Work dir:
  /mnt/storage/users/dockworker/mpedersen/work/RNAseq_variant_call/work/71/ac26344f0e095f7fe77cbb45a334db

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

I tried to run it like this:

    gatk --java-options "-Xms200G -Xmx200G -XX:ParallelGCThreads=2" \
    BaseRecalibrator \
    -I $input_bam \
    -O "${file(input_bam).baseName}.baseRecal.bam" \
    -R $reference \
    --known-sites $kg_snp \
    --known-sites $kg_indel \
    --tmp-dir /tmp \
    --disable-bam-index-caching true

but I still get the memory error. I have more memory available, but it seems very inefficient if I need to go up to 1 TB. Why can I not make this run? And is there an alternative for the MarkDuplicates, SplitNCigarReads and BaseRecalibrator steps?

Hope you can help,
BR,
Mette

@broadinstitute broadinstitute deleted a comment Aug 2, 2024
@takutosato
Contributor

Hello, from the stack trace it looks like the tool runs out of memory while reading the tabix index of a known-sites file, before it even starts iterating over the BAM. Is there anything out of place about the two known-sites VCF files? Are the index files present and up to date? Please let us know if you were able to figure it out.
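
The index questions above can be checked with a short script. This is a minimal sketch, not part of GATK: it assumes each known-sites file is bgzipped and has its tabix index next to it as `<file>.tbi`, and the function name `check_tabix_index` is made up for illustration. It only checks that the index exists, is not older than the VCF, and starts with the tabix magic bytes `TBI\x01`; it does not validate the rest of the index.

```python
import gzip
import os
import sys
import zlib

TABIX_MAGIC = b"TBI\x01"  # first four bytes of a decompressed .tbi file


def check_tabix_index(vcf_path):
    """Return a list of problems found with the .tbi index of vcf_path.

    An empty list means the basic checks passed (hypothetical helper,
    assumes the index lives at vcf_path + ".tbi").
    """
    tbi_path = vcf_path + ".tbi"
    if not os.path.exists(tbi_path):
        return [f"missing index: {tbi_path}"]

    problems = []

    # An index older than its VCF may describe a previous version of the file.
    if os.path.getmtime(tbi_path) < os.path.getmtime(vcf_path):
        problems.append(f"index is older than VCF: {tbi_path}")

    # A .tbi file is BGZF (gzip-compatible); peek at the decompressed header.
    try:
        with gzip.open(tbi_path, "rb") as handle:
            magic = handle.read(4)
    except (OSError, EOFError, zlib.error):
        magic = b""
    if magic != TABIX_MAGIC:
        problems.append(f"bad tabix magic bytes in: {tbi_path}")

    return problems


if __name__ == "__main__":
    # Usage: python check_index.py 1000G_phase1.snps.high_confidence.hg38.vcf.gz ...
    for path in sys.argv[1:]:
        issues = check_tabix_index(path)
        print(path, "OK" if not issues else "; ".join(issues))
```

If a problem is reported, regenerating the index (for example with `gatk IndexFeatureFile -I <file>.vcf.gz`) and rerunning BaseRecalibrator with a normal heap size would be a reasonable next step to try.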
