SHOGUN silently produces empty output alignment when BURST segfaults #18
Hi, thanks for the report!
Could you tell me a bit more about the target computer(s) this fails on?
What is the memory (RAM) and CPU?
I'll look into this!
Gabe
On Thu, Apr 19, 2018, 8:45 PM Jon Sanders wrote:
Hey guys,
We've been trying to track down a problem while adapting SHOGUN to Qiita,
the symptom of which was finding this message when running integration
tests in Travis:
+ File "/home/travis/build/qiita-spots/qp-shotgun/miniconda3/envs/qp-shotgun/lib/python3.5/site-packages/pandas/core/groupby.py", line 2934, in _get_grouper
+ raise KeyError(gpr)
+ KeyError: 'summary'
@antgonza was also getting the same error on his OS X install, but neither I
(on Barnacle) nor @semarpetrus (on his Linux box) was encountering it.
Running SHOGUN directly using the following commands yielded a good
alignment + downstream files on Barnacle:
aln_out=foo.align
database=/home/jgsanders/git_sw/qp-shotgun/qp_shotgun/shogun/databases/shogun
level=species
aligner=burst
threads=8
profile=profile.tsv
aln_out_fp=foo.align/alignment.burst.b6
redistributed="profile.${level}.tsv"
fun_output=functional
shogun align \
--aligner ${aligner} \
--threads ${threads} \
--database ${database} \
--input combined.fna \
--output ${aln_out}
shogun assign_taxonomy \
--aligner ${aligner} \
--database ${database} \
--input ${aln_out_fp} \
--output ${profile}
shogun redistribute \
--database ${database} \
--level ${level} \
--input ${profile} \
--output ${redistributed}
fun_level=$level
shogun functional \
--database ${database} \
--input ${profile} \
--output ${fun_output} \
--level ${fun_level}
where the test database is here
<https://github.com/antgonza/qp-shotgun/blob/shogun/qp_shotgun/shogun/databases/shogun.tar.bz2>
and the input data are here
<https://www.dropbox.com/s/ocu4c0ft8vhbjwx/combined.fna?dl=0>
Running the same align command on an OS X box (using Gabe's supplied
burst15 binary) ran for a bit and then produced an empty .b6 output file.
Running BURST directly on the OS X box produced the following output:
burst15 --references qp_shotgun/shogun/databases/shogun/burst/5min.edx --queries combined.fna --output test.b6 --accelerator qp_shotgun/shogun/databases/shogun/burst/5min.acx
This is BURST [v0.99.7LL]
--> Using accelerator file qp_shotgun/shogun/databases/shogun/burst/5min.acx
Using up to AVX-128 with 8 threads.
--> [Accel] Accelerator found. Parsing...
--> [Accel] Total accelerants: 805949 [bytes = 2106932]
--> [Accel] Reading 0 ambiguous entries
EDB database provided. Parsing...
--> EDB: Fingerprints are DISABLED
--> EDB: Parsing compressed headers
--> EDB: Sheared database (shear size = 515)
--> EDB: 970 refs [970 orig], 61 clumps, 1030 maxR
Parsed 400000 queries (0.071752). Calculating minMax...
Found min 150, max 150 (0.000109).
Converting queries... Converted (0.007549)
Copying queries... Copied (0.002561)
Sorting queries... Sorted (0.088294)
Copying indices... Copied (0.001531)
Determining uniqueness... Done (0.007544). Number unique: 397338
Collecting unique sequences... Done (0.001721)
Creating data structures... Done (0.004528) [maxED: 4]
Determining query ambiguity... Determined (0.023589)
Creating bins... Created (0.011927); Unambig: 391663, ambig: 5675, super-ambig: 0 [5675,397338,397338]
Re-sorting... Re-sorted (0.194431)
Calculating divergence... Calculated (0.009815) [10.120026 avg div; 150 max]
Fingerprints not enabled
Setting QBUNCH to 16
Using ACCELERATOR to align 397338 unique queries...
Search Progress: [100.00%]
Search complete. Consolidating results...
Segmentation fault: 11
What do you think?
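For anyone driving the same four steps from Python, here is a minimal sketch that rebuilds the shell commands above as argument lists and stops at the first step that exits with a non-zero status. The helper names `shogun_commands` and `run_pipeline` are hypothetical, and the alignment path follows the `foo.align/alignment.burst.b6` naming from the example above, which may differ for other aligners. Note that `check=True` only catches a failure that a step reports through its exit status; if `shogun align` exits cleanly after BURST crashes, as described above, this alone will not catch it.

```python
import subprocess


def shogun_commands(database, level="species", aligner="burst", threads=8,
                    queries="combined.fna", aln_out="foo.align",
                    profile="profile.tsv", fun_output="functional"):
    """Return the align / assign_taxonomy / redistribute / functional steps as argv lists."""
    # Assumed naming, copied from the example above (foo.align/alignment.burst.b6)
    aln_out_fp = f"{aln_out}/alignment.{aligner}.b6"
    redistributed = f"profile.{level}.tsv"
    return [
        ["shogun", "align", "--aligner", aligner, "--threads", str(threads),
         "--database", database, "--input", queries, "--output", aln_out],
        ["shogun", "assign_taxonomy", "--aligner", aligner,
         "--database", database, "--input", aln_out_fp, "--output", profile],
        ["shogun", "redistribute", "--database", database, "--level", level,
         "--input", profile, "--output", redistributed],
        ["shogun", "functional", "--database", database, "--input", profile,
         "--output", fun_output, "--level", level],
    ]


def run_pipeline(commands):
    """Run each step in order; check=True raises CalledProcessError on a non-zero exit."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Usage would be `run_pipeline(shogun_commands("/path/to/databases/shogun"))`; passing argument lists instead of a shell string also avoids quoting problems with paths.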
In Travis, we get between 4 GB and 7.5 GB (note that we are using sudo-enabled builds). Locally, I have a MacBookPro14,3 with 16 GB.
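To report the same numbers from inside a CI job such as Travis, a minimal Python sketch can read total physical memory through POSIX `sysconf` and the logical CPU count through `os.cpu_count()`. The helper name `machine_summary` is hypothetical; it reports total rather than free memory, and it assumes the `SC_PHYS_PAGES` sysconf name is available on the platform (it is on Linux).

```python
import os


def machine_summary():
    """Total physical RAM (GiB) and logical CPU count, via POSIX sysconf."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    phys_pages = os.sysconf("SC_PHYS_PAGES")  # total pages of physical memory
    return {
        "ram_gib": page_size * phys_pages / 1024 ** 3,
        "cpus": os.cpu_count(),
    }
```

Calling `print(machine_summary())` at the start of a test run puts these figures in the build log next to any failure.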
@tanaes, SHOGUN doesn't pick up the failure signal from BURST? Python's subprocess call should log it.
Under default parameters, it gave no output on STDOUT or STDERR and just produced an empty alignment file.
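When BURST is launched with Python's `subprocess`, a segfault is still visible to the caller: on POSIX, `returncode` is negative when the child is killed by a signal (`-11` for SIGSEGV). Below is a minimal sketch of a hypothetical `run_aligner` helper (not SHOGUN's actual code) that turns a signal, a non-zero exit, or a missing or empty output file into an explicit error instead of letting the pipeline continue.

```python
import os
import signal
import subprocess


class AlignerError(RuntimeError):
    """Hypothetical error type for an aligner run that crashed or produced nothing."""


def run_aligner(argv, output_fp):
    """Run an aligner and fail loudly on a signal, a non-zero exit, or an empty output file."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    if proc.returncode < 0:
        # POSIX: a return code of -N means the child was killed by signal N
        sig_name = signal.Signals(-proc.returncode).name
        raise AlignerError(f"{argv[0]} was killed by {sig_name}\n{proc.stderr}")
    if proc.returncode != 0:
        raise AlignerError(f"{argv[0]} exited with status {proc.returncode}\n{proc.stderr}")
    if not os.path.isfile(output_fp) or os.path.getsize(output_fp) == 0:
        raise AlignerError(f"{argv[0]} exited cleanly but {output_fp} is missing or empty")
    return proc
```

If the aligner is started through a shell instead, the shell may report the signal as exit status 128 + N (139 for SIGSEGV), which the non-zero branch still catches. An empty output file can also be legitimate when no query aligns, so a wrapper may prefer a warning over an error for that last check.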
What command was used to build the database? Also, does the attached Linux binary (compiled from the same code used to compile the Mac binary) work on your high-RAM Linux systems? I'm trying to rule out the database creation commands as well as differences in code since the older existing Linux version. I built a database and then aligned against it; according to /usr/bin/time -v, this took 12 GB of RAM. Insufficient RAM might then explain the Travis failure, but it's unclear what's causing the Mac failure (unless over 4 GB was consumed by other programs at runtime, leaving less than 12 GB for burst15). burst15 will always reserve ~8 GB (the size of the index table in "database15" mode, adjusted for the number of threads) plus the size of the database itself (minimum 4 GB), so it will grab about 12 GB to run. burst12 can run in under 128 MB, so that's the one recommended for laptops!
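The arithmetic above can be written down as a quick pre-flight check. This is a rough sketch only: the function names are hypothetical, the ~8 GB index size and 4 GB minimum are taken from the comment above, and the per-thread adjustment to the index is not modelled.

```python
def burst15_ram_estimate_gb(db_size_gb, index_gb=8.0, min_db_gb=4.0):
    """Rough burst15 footprint: ~8 GB index table plus the database, counted as at least 4 GB."""
    # The index size is said to depend on thread count; that adjustment is not modelled here.
    return index_gb + max(db_size_gb, min_db_gb)


def fits_in_ram(db_size_gb, available_gb):
    """True if the rough estimate fits in the memory left over for burst15."""
    return burst15_ram_estimate_gb(db_size_gb) <= available_gb
```

For the small test database, the estimate is 12 GB: more than a 4 to 7.5 GB Travis worker provides, and more than a 16 GB laptop can spare once other programs hold over 4 GB. It only rules memory in or out, though; later in this thread the attached binary also segfaulted on a high-memory Linux machine.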
Thanks! I'll let @tanaes answer those specific questions. Just out of curiosity, will burst15 and burst12 yield the same results? Either way, what are the differences?
@GabeAl The attached binary does indeed segfault on our high-memory Linux machine. Here's the output (./burst15 here is the one attached above):
☕ barnacle:qp-shotgun $ ./burst15 \
> --references qp_shotgun/shogun/databases/shogun/burst/5min.edx \
> --queries combined.fna \
> --output test.b6 \
> --accelerator qp_shotgun/shogun/databases/shogun/burst/5min.acx
This is BURST [v0.99.7LL]
--> Using accelerator file qp_shotgun/shogun/databases/shogun/burst/5min.acx
Using up to AVX-128 with 24 threads.
--> [Accel] Accelerator found. Parsing...
--> [Accel] Total accelerants: 805949 [bytes = 2106932]
--> [Accel] Reading 0 ambiguous entries
EDB database provided. Parsing...
--> EDB: Fingerprints are DISABLED
--> EDB: Parsing compressed headers
--> EDB: Sheared database (shear size = 515)
--> EDB: 970 refs [970 orig], 61 clumps, 1030 maxR
Parsed 400000 queries (0.089528). Calculating minMax...
Found min 150, max 150 (0.000125).
Converting queries... Converted (0.007726)
Copying queries... Copied (0.004054)
Sorting queries... Sorted (0.125254)
Copying indices... Copied (0.000616)
Determining uniqueness... Done (0.004894). Number unique: 397338
Collecting unique sequences... Done (0.001327)
Creating data structures... Done (0.006473) [maxED: 4]
Determining query ambiguity... Determined (0.012322)
Creating bins... Created (0.012095); Unambig: 391663, ambig: 5675, super-ambig: 0 [5675,397338,397338]
Re-sorting... Re-sorted (0.322825)
Calculating divergence... Calculated (0.007467) [10.120026 avg div; 150 max]
Fingerprints not enabled
Setting QBUNCH to 16
Using ACCELERATOR to align 397338 unique queries...
Search Progress: [100.00%]
Search complete. Consolidating results...
Segmentation fault (core dumped)
☕ barnacle:qp-shotgun $ ls
burst15 combined.fna LICENSE qp_shotgun README.rst scripts setup.py support_files test test.b6
☕ barnacle:qp-shotgun $ ~/miniconda/envs/oecophylla-shogun/bin/burst15 \
> --references qp_shotgun/shogun/databases/shogun/burst/5min.edx \
> --queries combined.fna \
> --output test.b6 \
> --accelerator qp_shotgun/shogun/databases/shogun/burst/5min.acx
This is BURST [v0.99.7f]
--> Using accelerator file qp_shotgun/shogun/databases/shogun/burst/5min.acx
Using up to AVX-128 with 24 threads.
--> [Accel] Accelerator found. Parsing...
--> [Accel] Total accelerants: 805949 [bytes = 2106932]
--> [Accel] Reading 0 ambiguous entries
EDB database provided. Parsing...
--> EDB: Fingerprints are DISABLED
--> EDB: Parsing compressed headers
--> EDB: Sheared database (shear size = 515)
--> EDB: 970 refs [970 orig], 61 clumps, 1030 maxR
Parsed 400000 queries (0.085349). Calculating minMax...
Found min 150, max 150 (0.000108).
Converting queries... Converted (0.007505)
Copying queries... Copied (0.004179)
Sorting queries... Sorted (0.131057)
Copying indices... Copied (0.006557)
Determining uniqueness... Done (0.006628). Number unique: 397338
Collecting unique sequences... Done (0.005024)
Creating data structures... Done (0.007195) [maxED: 4]
Determining query ambiguity... Determined (0.018151)
Creating bins... Created (0.016560); Unambig: 391663, ambig: 5675, super-ambig: 0 [5675,397338,397338]
Re-sorting... Re-sorted (0.340644)
Calculating divergence... Calculated (0.007354) [10.120026 avg div; 150 max]
Fingerprints not enabled
Setting QBUNCH to 16
Using ACCELERATOR to align 397338 unique queries...
Search Progress: [100.00%]
Search complete. Consolidating results...
CAPITALIST: Processed 329 investments
Alignment time: 42.566155 seconds
What's the difference, again, between burst12 and burst15? Does the database need to be reindexed for one versus the other?
This is indeed interesting. Could you share the command line that was used to make the BURST database? It seems to differ from what I used. In any case, there may be a combination bug arising from some mix of the DB command line and the most recent changes to CAPITALIST (and/or read tallying in general). A couple of questions to help me home in:
As for the difference between burst12 and burst15: burst12 is primarily intended for amplicon databases. It uses a much more RAM-friendly indexing scheme for small databases. For large (>4 GB) databases, burst15 is recommended for speed. As such, while the "edx" will work fine with either version, the "acx" is specific to whichever version was used to make it.
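A wrapper could enforce this rule before calling BURST. The minimal sketch below assumes the wrapper itself records which variant built each accelerator; it does not inspect BURST's file format, and `check_accelerator` is a hypothetical name.

```python
VARIANTS = ("burst12", "burst15")


def check_accelerator(binary_variant, acx_variant=None):
    """Raise if an .acx built by one BURST variant is paired with the other binary.

    The .edx is not checked because, per the comment above, it works with either variant.
    """
    for name in (binary_variant, acx_variant):
        if name is not None and name not in VARIANTS:
            raise ValueError(f"unknown BURST variant: {name!r}")
    if acx_variant is not None and acx_variant != binary_variant:
        raise ValueError(
            f"accelerator was built with {acx_variant}; rebuild it with "
            f"{binary_variant} or align with {acx_variant} instead")
```

For example, `check_accelerator("burst15", "burst12")` raises, while `check_accelerator("burst15", "burst15")` returns `None`.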
Awesome, thanks for the clarification. I’ll try remaking the database and
see how it goes.
In reply to Gabriel Al-Ghalith (Mon, Apr 23, 2018 at 12:35 PM)
Were you able to solve your problem by rebuilding the database?
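I ran into a similar issue. I wasn't able to get SHOGUN working with BURST, since the latest official release of BURST, v0.99.8, didn't even compile on my Linux machine (the source release contains syntax errors!). So I installed bowtie2 and ran SHOGUN with --aligner bowtie2. It kept crunching for about 18 minutes (htop showed that the bowtie2 process was running), then I got the KeyError: 'summary' exception from Python. I don't know whether bowtie2 segfaulted, though.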
The source likely doesn't contain syntax errors; it just requires the Intel
compiler and architecture-specific optimization flags because of the
assembly instructions it includes.
It is *highly, highly* recommended to grab the prebuilt binary for BURST
from the Releases section of the repo.
Thanks,
Gabe
In reply to Árpád Goretity (Wed, Apr 29, 2020 at 4:22 AM)
[edit] D'oh, I found the test files in the very first post! I'm assuming you're using the same ones. I don't have a Mac, but maybe I can spin up a VM to test this.
What's the memory on the machine you're running it on?
Thanks a bunch,
Gabe
In reply to Gabe A. (Thu, Apr 30, 2020 at 12:46 PM)
Also, what were the commands run to produce the database itself? Databases aren't compatible across major BURST releases.
Yes. DB15 and DB12 have fundamentally different database structures. Major releases of BURST (numbered releases are major; lettered ones are minor) may also be incompatible. I think this should be detected when an older database, or a database made with a different DB version of BURST, is used. I believe later versions of BURST (i.e., newer than the 0.97 series) do this detection automatically, but perhaps SHOGUN should implement this check in the wrapper first, or warn when pointed at a database that it knows shipped with an earlier version. DB12 is for low-RAM alignment; it is slower and primarily intended for amplicons. burst15 is for higher-RAM alignment and intended for shotgun data. This is vaguely similar to the difference between bowtie2-align-s and bowtie2-align-l, which are also non-interchangeable, but the "bowtie2" wrapper script sorts out which binary to call for a given index.
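The version check suggested here could start from the banner BURST prints at startup (`This is BURST [v0.99.7LL]` and `This is BURST [v0.99.7f]` in the logs earlier in this thread). Below is a minimal sketch with hypothetical helper names that splits the banner into its numeric part and letter suffix and, following the rule stated in this comment, treats two builds as the same major release when the numeric parts match. It cannot tell DB12 from DB15, since the banner does not show that.

```python
import re

# Matches e.g. "This is BURST [v0.99.7LL]" -> numeric part "0.99.7", suffix "LL"
BANNER_RE = re.compile(r"This is BURST \[v(\d+(?:\.\d+)*)([A-Za-z]*)\]")


def parse_burst_version(banner):
    """Return (numeric version tuple, letter suffix) from a BURST startup banner."""
    m = BANNER_RE.search(banner)
    if m is None:
        raise ValueError("no BURST version banner found")
    numbers = tuple(int(part) for part in m.group(1).split("."))
    return numbers, m.group(2)


def same_major_release(banner_a, banner_b):
    """Lettered suffixes count as minor; any difference in the numeric part counts as major."""
    return parse_burst_version(banner_a)[0] == parse_burst_version(banner_b)[0]
```

By this rule, the two builds compared earlier (v0.99.7LL, which segfaulted, and v0.99.7f, which completed) count as the same release, so a check like this would not have flagged that particular difference.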
@GabeAl Hey, no, thank you for getting back to this! Just to put this in context, I'm familiar with building C code from source, and it's not an unsupported assembly extension: the specific syntax error I noticed was a missing closing curly brace. After I added the closing brace on the next line, the compiler went on to complain about a type error in an assignment. I have since tried SHOGUN with the Linux binary downloadable from the same release (which identifies itself as burst15), unfortunately with no success. Based on what several others suggested above, it may well be that I simply don't have enough RAM; I'll be able to check this soon, once I have access to a beefier machine. I have 8 GB in my Linux box, which seems to be close but no cigar. I didn't build the databases myself; I simply downloaded the pre-built ones, as suggested by the last paragraph of the relevant part of the README. Cheers,
Thanks, H2CO3! Oh, I see: the current source does look like a WIP version, and updates stopped after that. Later versions (completing the WIP, going into the 0.99.8 series, etc.) must never have been pushed. I will push my local copy up. Done. Let me know. Cheerio,
Awesome, thanks for that!