How does Kraken differentiate masked genomes? #122
That shouldn't happen. Any k-mer containing an "N" is excluded from the database.
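To illustrate the point about "N": a minimal sketch of how a k-mer based database builder might skip masked bases, assuming masking appears either as `N` (hard masking) or as lowercase letters (soft masking). This is not Kraken's code; the function name `unmasked_kmers` and the `lowercase_is_masked` switch are hypothetical. In this sketch, hard-masked regions never contribute k-mers, while soft-masked regions drop out only if lowercase is explicitly treated as masked.

```python
from typing import Iterator, Tuple


def unmasked_kmers(seq: str, k: int, lowercase_is_masked: bool = False) -> Iterator[Tuple[int, str]]:
    """Yield (position, k-mer) for every k-mer that avoids masked bases.

    Hypothetical illustration, not Kraken's implementation:
    - 'N' (hard masking) is never accepted, so any k-mer overlapping an N is skipped.
    - Lowercase (soft masking) is skipped only when lowercase_is_masked=True;
      otherwise lowercase bases are accepted and the k-mer is upper-cased.
    """
    allowed = set("ACGT") if lowercase_is_masked else set("ACGTacgt")
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Keep the k-mer only if every base is in the allowed alphabet.
        if set(kmer) <= allowed:
            yield i, kmer.upper()
```

For example, `list(unmasked_kmers("ACGTNACGTA", 4))` gives `[(0, 'ACGT'), (5, 'ACGT'), (6, 'CGTA')]`: every window touching the `N` is dropped.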
I actually built a KrakenHLL database on the cleaned EuPathDB, and these k-mer mappings show up in my reports. Do you know why this is happening? Thanks!
Can you send me both the read causing this and the Kraken output showing this match? Please email the Kraken line and the read to [email protected]
@zhaoc1, maybe this is a KrakenHLL issue. You made a database with nodes for sequences and genomes, right? Did you restart the building process? Maybe the mapping file is not up to date.
@fbreitwieser, this is how I built the KrakenHLL database after downloading the eupathDBclean files and seqid2taxid.map:

```
DBNAME=testDB
krakenhll-download --db $DBNAME taxonomy
cp my_eupathDB_folder/*. $DBNAME/library
cp seqid2taxid.map $DBNAME/library
krakenhll-build --db $DBNAME --taxids-for-genomes --taxids-for-sequences --kmer-len 25 --threads 8
```

Could you please explain at which step the mapping file could become out of date, or whether there are any other potential problems here? Thank you!
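One quick way to test the "mapping file is not up to date" hypothesis is to check that every sequence ID in the library FASTA files has an entry in seqid2taxid.map. A minimal sketch, assuming the FASTA ID is the first whitespace-delimited token after `>`, the map's first tab-separated column is the sequence ID, and the library files match `*.fa` (adjust `pattern` to your actual file names). `unmapped_ids` and the other helpers are hypothetical, not part of KrakenHLL.

```python
import sys
from pathlib import Path


def fasta_ids(path):
    """Return the set of sequence IDs (first token after '>') in a FASTA file."""
    ids = set()
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    ids.add(tokens[0])
    return ids


def mapped_ids(map_path):
    """Return the sequence IDs listed in the first tab-separated column of the map."""
    ids = set()
    with open(map_path) as fh:
        for line in fh:
            seq_id = line.split("\t")[0].strip()
            if seq_id:
                ids.add(seq_id)
    return ids


def unmapped_ids(library_dir, map_path, pattern="*.fa"):
    """Sorted list of library sequence IDs that have no entry in the map (assumed layout)."""
    in_library = set()
    for fasta in Path(library_dir).glob(pattern):
        in_library |= fasta_ids(fasta)
    return sorted(in_library - mapped_ids(map_path))


if __name__ == "__main__" and len(sys.argv) == 3:
    missing = unmapped_ids(sys.argv[1], sys.argv[2])
    print(f"{len(missing)} sequence IDs have no entry in the map")
    for seq_id in missing[:20]:
        print(seq_id)
```

Run as, e.g., `python check_map.py $DBNAME/library $DBNAME/library/seqid2taxid.map`; an empty result suggests the map covers every sequence in the library.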
Hi,
I downloaded the masked genomes from eupathDBclean and built a Kraken database from them (k-mer length 25). Looking back at the reports, I found some k-mers that map entirely to masked sequence, e.g.
Therefore, I am wondering how Kraken deals with masked (low-complexity) genome regions. How can I filter out the false-positive k-mer hits that map to masked regions? Thank you!
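One generic option, independent of Kraken, is to flag low-complexity reads before classification. The sketch below swaps in a simple Shannon-entropy score over overlapping trinucleotides; it is not DUST and not a Kraken or KrakenHLL feature. The function names and the 3.0-bit threshold are hypothetical and would need tuning against real data.

```python
import math
from collections import Counter


def trinucleotide_entropy(read: str) -> float:
    """Shannon entropy (bits) of the overlapping 3-mer composition of a read.

    Ranges from 0 (e.g. a homopolymer) up to log2(64) = 6 bits; 3-mers
    containing anything other than A/C/G/T (such as N) are ignored.
    """
    seq = read.upper()
    triplets = [seq[i:i + 3] for i in range(len(seq) - 2)]
    triplets = [t for t in triplets if set(t) <= set("ACGT")]
    if not triplets:
        return 0.0
    n = len(triplets)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(triplets).values())


def is_low_complexity(read: str, threshold: float = 3.0) -> bool:
    """Hypothetical filter: flag reads whose 3-mer entropy is below `threshold` bits."""
    return trinucleotide_entropy(read) < threshold
```

A dinucleotide repeat such as `"AC" * 20` scores 1.0 bit and is flagged, while a mixed-composition read scores well above the threshold. Reads flagged this way could be set aside before classification so they cannot produce hits in masked, repetitive regions.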
Best,
Chunyu