SHOGUN fails silently when bowtie2 runs out of memory #42

ElDeveloper opened this issue May 4, 2022 · 1 comment
Comments

@ElDeveloper

Working through a dataset, I found that most of the resulting alignments included only 100K-200K sequence identifiers from the input even though most of my samples have >1M sequences. Unsure what was going on, I tried running bowtie2 manually (following the command call here), and that's when I noticed my OS was killing bowtie2 with signal 9:

bowtie2 --no-unal -x /[redacted]/shogun-db/bt2/rep82 -S [redacted].sam --np 1 --mp "1,1" --rdg "0,1" --rfg "0,1" --score-min '"L,0,-0.02"' -f [redacted].fna --very-sensitive -k 16 -p 16 --reorder --no-hd
(ERR): bowtie2-align died with signal 9 (KILL)

After this happened, I checked the exit code (using echo $?) and it was 1. As best I can tell, nothing in the SHOGUN code checks bowtie2's exit code, even though it is returned here:

proc, out, err = bowtie2_align(infile, outfile, self.prefix,
                               num_threads=self.threads,
                               alignments_to_report=alignments_to_report,
                               shell=self.shell,
                               percent_id=self.percent_id)
if self.post_align:
    df = self._post_align(outfile)
    self.outfile = os.path.join(outdir, 'taxatable.bowtie2.txt')
    df.to_csv(self.outfile, sep='\t', float_format="%d", na_rep=0, index_label="#OTU ID")
return proc, out, err
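A check right after that call would be enough to surface the failure. A rough sketch on my part, not existing SHOGUN code, assuming proc carries the subprocess return code that bowtie2_align passes back:

proc, out, err = bowtie2_align(infile, outfile, self.prefix,
                               num_threads=self.threads,
                               alignments_to_report=alignments_to_report,
                               shell=self.shell, percent_id=self.percent_id)
# Hypothetical guard: fail loudly before any downstream tables are written,
# assuming proc holds bowtie2's exit status.
if proc != 0:
    raise RuntimeError('bowtie2 exited with code %d:\n%s' % (proc, err))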

There are no checks for it at the align method call sites either:

aligner_cl.align(input, output)

aligner_cl.align(input, output)
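Alternatively, the call sites could inspect the tuple that align already returns. Another rough sketch of mine, assuming the same (proc, out, err) return value as above:

proc, out, err = aligner_cl.align(input, output)
# Hypothetical check at the call site; proc is assumed to be the exit code.
if proc != 0:
    raise RuntimeError('aligner failed with exit code %d' % proc)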

The worst thing about this error is that, since SHOGUN doesn't fail or catch it, you can "successfully" process a dataset and generate incomplete contingency tables. The resulting SAM file is written to disk but is obviously incomplete; unfortunately, shogun assign_taxonomy doesn't know this, so it just processes the dataset as expected.


In my case, running on a 32 GB system, my samples were missing around 60-80% of their reads.
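For anyone who wants to double-check their own runs, here is a rough standalone sketch (mine, not part of SHOGUN; file paths are placeholders) that compares the query IDs present in the SAM output against the input FASTA. Note that with --no-unal a missing ID can also just be an unaligned read; the red flag in my case was that 60-80% of the IDs were gone:

# Rough sketch (not SHOGUN code): spot silently truncated alignments by
# comparing input FASTA IDs against the query names that reached the SAM file.
def fasta_ids(path):
    with open(path) as fh:
        # FASTA headers start with '>'; keep only the ID up to the first space
        return {line[1:].split()[0] for line in fh if line.startswith('>')}

def sam_query_ids(path):
    with open(path) as fh:
        # skip header lines (none with --no-hd, but harmless to check)
        return {line.split('\t', 1)[0] for line in fh
                if line.strip() and not line.startswith('@')}

missing = fasta_ids('sample.fna') - sam_query_ids('alignment.bowtie2.sam')
print('%d input sequences are absent from the SAM file' % len(missing))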

@bhillmann
Collaborator

Good call and a thorough investigation. This is indeed a nightmare situation: a silent bug. We should open a PR and handle exit codes from the aligners.
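One option (just a sketch on my end; the helper name here is hypothetical and may not match SHOGUN's actual shell wrapper) would be to let subprocess raise whenever an aligner exits non-zero, so the failure can't be swallowed:

import subprocess

# Hypothetical wrapper: check=True makes subprocess.run raise
# CalledProcessError on any non-zero exit code from the aligner.
def run_checked(cmd):
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.returncode, result.stdout, result.stderr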
