SHOGUN fails silently when bowtie2 runs out of memory #42

ElDeveloper opened this issue May 4, 2022 · 1 comment
Comments

@ElDeveloper

Working through a dataset, I found that most of the resulting alignments included only 100K-200K sequence identifiers from the input even though most of my samples have >1M sequences. Unsure what was going on, I tried running bowtie2 manually (following the command call here), and that's when I noticed my OS was killing bowtie2 with signal 9:

bowtie2 --no-unal -x /[redacted]/shogun-db/bt2/rep82 -S [redacted].sam --np 1 --mp "1,1" --rdg "0,1" --rfg "0,1" --score-min '"L,0,-0.02"' -f [redacted].fna --very-sensitive -k 16 -p 16 --reorder --no-hd
(ERR): bowtie2-align died with signal 9 (KILL)

After this happened, I checked the exit code (using echo $?) and it was 1. As best I can tell, nothing in the SHOGUN code checks bowtie2's exit code, even though it is returned here:

proc, out, err = bowtie2_align(infile, outfile, self.prefix,
                               num_threads=self.threads,
                               alignments_to_report=alignments_to_report,
                               shell=self.shell,
                               percent_id=self.percent_id)
if self.post_align:
    df = self._post_align(outfile)
    self.outfile = os.path.join(outdir, 'taxatable.bowtie2.txt')
    df.to_csv(self.outfile, sep='\t', float_format="%d", na_rep=0, index_label="#OTU ID")
return proc, out, err
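A check right after that call would be enough to surface the failure. A rough sketch on my part, not existing SHOGUN code, assuming proc carries the subprocess return code that bowtie2_align passes back:

proc, out, err = bowtie2_align(infile, outfile, self.prefix,
                               num_threads=self.threads,
                               alignments_to_report=alignments_to_report,
                               shell=self.shell, percent_id=self.percent_id)
# Hypothetical guard: fail loudly before any downstream tables are written,
# assuming proc holds bowtie2's exit status.
if proc != 0:
    raise RuntimeError('bowtie2 exited with code %d:\n%s' % (proc, err))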

There are no checks for it at the align method call sites either:

aligner_cl.align(input, output)

aligner_cl.align(input, output)
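Alternatively, the call sites could inspect the tuple that align already returns. Another rough sketch of mine, assuming the same (proc, out, err) return value as above:

proc, out, err = aligner_cl.align(input, output)
# Hypothetical check at the call site; proc is assumed to be the exit code.
if proc != 0:
    raise RuntimeError('aligner failed with exit code %d' % proc)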

The worst thing about this error is that, since SHOGUN doesn't fail or catch it, you can "successfully" process a dataset and generate incomplete contingency tables. The resulting SAM file is written to disk but is obviously incomplete; unfortunately, shogun assign_taxonomy doesn't know this, so it just processes the dataset as expected.


In my case, running on a 32 GB system, my samples were missing around 60-80% of their reads.
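For anyone who wants to double-check their own runs, here is a rough standalone sketch (mine, not part of SHOGUN; file paths are placeholders) that compares the query IDs present in the SAM output against the input FASTA. Note that with --no-unal a missing ID can also just be an unaligned read; the red flag in my case was that 60-80% of the IDs were gone:

# Rough sketch (not SHOGUN code): spot silently truncated alignments by
# comparing input FASTA IDs against the query names that reached the SAM file.
def fasta_ids(path):
    with open(path) as fh:
        # FASTA headers start with '>'; keep only the ID up to the first space
        return {line[1:].split()[0] for line in fh if line.startswith('>')}

def sam_query_ids(path):
    with open(path) as fh:
        # skip header lines (none with --no-hd, but harmless to check)
        return {line.split('\t', 1)[0] for line in fh
                if line.strip() and not line.startswith('@')}

missing = fasta_ids('sample.fna') - sam_query_ids('alignment.bowtie2.sam')
print('%d input sequences are absent from the SAM file' % len(missing))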

@bhillmann
Collaborator

Good call and a thorough investigation. This is indeed a nightmare situation: a silent bug. We should open a PR and handle exit codes from the aligners.
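One option (just a sketch on my end; the helper name here is hypothetical and may not match SHOGUN's actual shell wrapper) would be to let subprocess raise whenever an aligner exits non-zero, so the failure can't be swallowed:

import subprocess

# Hypothetical wrapper: check=True makes subprocess.run raise
# CalledProcessError on any non-zero exit code from the aligner.
def run_checked(cmd):
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.returncode, result.stdout, result.stderr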
