Kraken2_build step stalling #27

dgolden96 · 2022-07-14T15:35:55Z

Hi there,

I'm continuing to troubleshoot the db-update process for a kraken2 database, and I've hit a wall at the kraken2_build step. The pipeline doesn't throw any errors; it just keeps running indefinitely (12+ hours with neither a failure nor completion). It seems similar to the problem described here: DerrickWood/kraken2#428

So far, I've tried the workaround mentioned in the comments of that linked issue, which is to add the --fast-build flag to the kraken2-build call in the db-update Snakefile, but it doesn't seem to have solved the issue. Any chance you've seen this before and/or have any thoughts on what might be causing it? I definitely have enough RAM: I'm using 28 cores with 16 GB per core.
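For reference, here is a minimal sketch of what the build command looks like with the workaround applied. The actual db-update Snakefile rule isn't shown in this thread, so the database path and the helper function are hypothetical; only `--build`, `--db`, `--threads`, and `--fast-build` are real kraken2-build options.

```python
import shlex
from typing import List


def kraken2_build_cmd(db_dir: str, threads: int, fast_build: bool = True) -> List[str]:
    """Return the argv for a kraken2-build database build step.

    Hypothetical helper: db_dir and threads stand in for whatever the
    db-update Snakefile passes in from its config.
    """
    cmd = ["kraken2-build", "--build", "--db", db_dir, "--threads", str(threads)]
    if fast_build:
        # --fast-build: don't require a deterministic build when multithreaded.
        # Faster, but minimizer/LCA pairs may vary between runs.
        cmd.append("--fast-build")
    return cmd


if __name__ == "__main__":
    # Hypothetical database path; 28 threads matches the setup described above.
    print(shlex.join(kraken2_build_cmd("kraken2_db", 28)))
```

Building the argv as a list (rather than a single string) keeps it safe to hand to `subprocess.run` without shell quoting issues.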

Thanks!

nick-youngblut · 2022-07-14T16:40:46Z

I've (thankfully) never experienced that issue. How many genomes are included in the build?

dgolden96 · 2022-07-14T21:01:42Z

The database to be updated is the full GTDB_release207, and the sample TSV I'm trying to add includes ~4,000 genomes.

zoey-rw · 2022-07-15T03:16:27Z

A related question: if we instead passed the reads that were unclassified by GTDB to a second database (db-create with only the non-GTDB genomes), should that give results similar to a single database built via the db-update workflow? There are methods for combining outputs for the same sample from different databases, though I imagine there could be downstream effects on Bracken estimates.

nick-youngblut · 2022-07-15T08:01:46Z

The downside of a 2-step classification approach versus a 1-step approach is that there is no direct "competition" between the 2 databases during classification. So, some reads could be misclassified in the 1st step when they would instead have been assigned to a taxon from the 2nd database if the 2 reference databases were combined.
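To make the "no competition" point concrete, here is a toy model. It is not Kraken2's algorithm (Kraken2 maps minimizers to LCA taxa on a taxonomy tree and applies a confidence score); it simply assigns a read to whichever genome has the most k-mer hits, subject to a hypothetical minimum-hit cutoff. The genome names and hit counts are made up.

```python
from typing import Dict, Optional

MIN_HITS = 2  # hypothetical minimum k-mer hits needed to classify a read


def classify(hits: Dict[str, int]) -> Optional[str]:
    """Assign a read to the genome with the most k-mer hits, if >= MIN_HITS."""
    if not hits:
        return None
    best = max(hits, key=hits.get)
    return best if hits[best] >= MIN_HITS else None


def two_step(db1_hits: Dict[str, int], db2_hits: Dict[str, int]) -> Optional[str]:
    # Step 1: classify against database 1 only.
    first = classify(db1_hits)
    if first is not None:
        return first
    # Step 2: only reads left unclassified are tried against database 2.
    return classify(db2_hits)


def one_step(db1_hits: Dict[str, int], db2_hits: Dict[str, int]) -> Optional[str]:
    # Combined database: genomes from both references compete for the read.
    return classify({**db1_hits, **db2_hits})


# A read whose k-mers mostly match a genome that exists only in database 2.
read_db1 = {"gtdb_genome_A": 3}
read_db2 = {"new_genome_B": 12}

print("two-step:", two_step(read_db1, read_db2))  # two-step: gtdb_genome_A
print("one-step:", one_step(read_db1, read_db2))  # one-step: new_genome_B
```

In the 2-step run, the weak match in the first database is enough to "claim" the read, so the much better match in the second database is never considered.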

MixalisSn · 2023-12-01T11:36:52Z

Same problem here. I ran the kraken2 database build using 40 cores (7 GB each), and after 24 hours the process had stalled at this point:

Creating sequence ID to taxonomy ID map (step 1)...
Sequence ID to taxonomy ID map already present, skipping map creation.
Estimating required capacity (step 2)...
Estimated hash table requirement: 75566900660 bytes
Capacity estimation complete. [37m21.355s]
Building database files (step 3)...
Taxonomy parsed and converted.
CHT created with 16 bits reserved for taxid.
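For scale, the "Estimated hash table requirement" in the log above can be converted to human-readable units. This is only a rough check: the hash table is the largest structure in a Kraken2 database, but the build needs some memory beyond it, so treat the figure as a lower bound rather than a full accounting.

```python
from typing import Tuple


def bytes_to_units(n_bytes: int) -> Tuple[float, float]:
    """Return (decimal gigabytes, binary gibibytes) for a byte count."""
    gb = n_bytes / 1e9     # 1 GB  = 10**9 bytes
    gib = n_bytes / 2**30  # 1 GiB = 2**30 bytes
    return gb, gib


estimate = 75566900660  # "Estimated hash table requirement" from the log
gb, gib = bytes_to_units(estimate)
print(f"{gb:.1f} GB = {gib:.1f} GiB")  # 75.6 GB = 70.4 GiB
```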

nick-youngblut · 2023-12-03T23:03:12Z

@MixalisSn do you think that the stalling could be due to limited memory?

MixalisSn · 2023-12-10T18:58:55Z

@nick-youngblut I thought the 120 GB would be enough. Anyway, I added the --fast-build flag, using the same resources, and the build completed successfully.
