Memory issues when running BaseRecalibrator #8726

MetteBoge · 2024-03-09T00:38:18Z

Hi,
I need to run BaseRecalibrator as part of preprocessing my RNA-seq BAM files before variant calling, but I am running into memory problems. Here is the error I get:

  22:30:25.477 INFO  BaseRecalibrator - Start Date/Time: March 8, 2024 at 10:30:25 PM GMT
  22:30:25.477 INFO  BaseRecalibrator - ------------------------------------------------------------
  22:30:25.477 INFO  BaseRecalibrator - ------------------------------------------------------------
  22:30:25.477 INFO  BaseRecalibrator - HTSJDK Version: 4.1.0
  22:30:25.478 INFO  BaseRecalibrator - Picard Version: 3.1.1
  22:30:25.478 INFO  BaseRecalibrator - Built for Spark Version: 3.5.0
  22:30:25.478 INFO  BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
  22:30:25.478 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
  22:30:25.478 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
  22:30:25.478 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
  22:30:25.478 INFO  BaseRecalibrator - Deflater: IntelDeflater
  22:30:25.478 INFO  BaseRecalibrator - Inflater: IntelInflater
  22:30:25.478 INFO  BaseRecalibrator - GCS max retries/reopens: 20
  22:30:25.479 INFO  BaseRecalibrator - Requester pays: disabled
  22:30:25.479 INFO  BaseRecalibrator - Initializing engine
  WARNING       2024-03-08 22:30:25     SamFiles        The index file /mnt/storage/users/dockworker/mpedersen/work/RNAseq_variant_call/work/d6/362957b6215ad2e8193c27c895d42d/VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.bai was found by resolving the canonical path of a symlink: VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.bam -> /mnt/storage/users/dockworker/mpedersen/work/RNAseq_variant_call/work/d6/362957b6215ad2e8193c27c895d42d/VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.bam
  22:30:25.631 INFO  FeatureManager - Using codec VCFCodec to read file file://1000G_phase1.snps.high_confidence.hg38.vcf.gz
  22:30:25.754 INFO  FeatureManager - Using codec VCFCodec to read file file://Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
  23:39:21.541 INFO  BaseRecalibrator - Shutting down engine
  [March 8, 2024 at 11:39:21 PM GMT] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 68.94 minutes.
  Runtime.totalMemory()=214748364800
  java.lang.OutOfMemoryError: Java heap space
        at htsjdk.tribble.readers.TabixReader.readInt(TabixReader.java:189)
        at htsjdk.tribble.readers.TabixReader.readIndex(TabixReader.java:274)
        at htsjdk.tribble.readers.TabixReader.readIndex(TabixReader.java:287)
        at htsjdk.tribble.readers.TabixReader.<init>(TabixReader.java:165)
        at htsjdk.tribble.readers.TabixReader.<init>(TabixReader.java:129)
        at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:80)
        at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:433)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:377)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:319)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:291)
        at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:245)
        at org.broadinstitute.hellbender.engine.FeatureManager.initializeFeatureSources(FeatureManager.java:208)
        at org.broadinstitute.hellbender.engine.FeatureManager.<init>(FeatureManager.java:155)
        at org.broadinstitute.hellbender.engine.ReadWalker.initializeFeatures(ReadWalker.java:72)
        at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:726)
        at org.broadinstitute.hellbender.engine.ReadWalker.onStartup(ReadWalker.java:51)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:147)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
        at org.broadinstitute.hellbender.Main.main(Main.java:306)
  Using GATK jar /gatk/gatk-package-4.5.0.0-local.jar
  Running:
      java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms200G -Xmx200G -XX:ParallelGCThreads=2 -jar /gatk/gatk-package-4.5.0.0-local.jar BaseRecalibrator -I VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.bam -O VR0024SA.withoutERCCs.withRG.markedDup.splitNcigar.baseRecal.bam -R GRCh38.primary_assembly.genome.fa --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --tmp-dir /tmp --disable-bam-index-caching true

Work dir:
  /mnt/storage/users/dockworker/mpedersen/work/RNAseq_variant_call/work/71/ac26344f0e095f7fe77cbb45a334db

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

I tried to run it like this:

    gatk --java-options "-Xms200G -Xmx200G -XX:ParallelGCThreads=2" \
    BaseRecalibrator \
    -I $input_bam \
    -O "${file(input_bam).baseName}.baseRecal.bam" \
    -R $reference \
    --known-sites $kg_snp \
    --known-sites $kg_indel \
    --tmp-dir /tmp \
    --disable-bam-index-caching true

but I still get the memory error. I have more memory available, but it seems very inefficient to have to go up to 1 TB. Why can't I get this to run? And is there an alternative way to do the MarkDup, SplitCigar, BaseRecal steps?

Hope you can help,
BR,
Mette

takutosato · 2024-11-11T03:22:04Z

Hello, from the stack trace it looks like the tool runs out of memory before it even starts iterating over the BAM. Is there anything unusual about the two known-sites VCF files? Are the index files present and up to date? Please let us know if you were able to figure it out.
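
One way to act on this suggestion is to inspect each known-sites index before rerunning. The sketch below assumes the indexes are tabix `.tbi` files sitting next to the VCFs (the `TabixReader` frames in the stack trace suggest this, but it is an assumption). The function name `check_tabix_index` and its status strings are illustrative and not part of GATK. It only checks that the index exists, that it decompresses and starts with the tabix magic bytes `TBI\1`, and that it is not older than the VCF; it does not validate the rest of the index.

```python
import gzip
import os
import zlib


def check_tabix_index(vcf_path):
    """Return 'missing', 'corrupt', 'stale', or 'ok' for <vcf_path>.tbi.

    Hypothetical helper: assumes the index lives at <vcf_path> + '.tbi'.
    """
    index_path = vcf_path + ".tbi"
    if not os.path.exists(index_path):
        return "missing"

    # A tabix index is BGZF-compressed (readable as gzip) and its
    # decompressed content begins with the 4 magic bytes "TBI\1".
    try:
        with gzip.open(index_path, "rb") as handle:
            magic = handle.read(4)
    except (OSError, EOFError, zlib.error):
        return "corrupt"
    if magic != b"TBI\x01":
        return "corrupt"

    # An index older than the file it describes may no longer match it.
    if os.path.getmtime(index_path) < os.path.getmtime(vcf_path):
        return "stale"
    return "ok"


if __name__ == "__main__":
    # File names taken from the command line in the report above.
    for vcf in (
        "1000G_phase1.snps.high_confidence.hg38.vcf.gz",
        "Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
    ):
        print(vcf, check_tabix_index(vcf))
```

If either file reports anything other than `ok`, regenerating the index (for example with `tabix -p vcf <file>.vcf.gz`) and rerunning would be a reasonable next step before raising the heap size further.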

broadinstitute deleted a comment Aug 2, 2024
