Trim GG before clustering #1
Comments
Do you need help on this?
Sure :) Are you sure you have the time? The issue is this: for classification we map against a version of Greengenes that is pre-clustered at 99% identity. To make things comparable, though, the clustering should have happened after primer trimming. So we would need to primer-trim Greengenes (discarding the sequences that don't match the primers?) and then cluster at 99%. Andreas
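To make the trimming step concrete, here is a minimal sketch of what "primer-trim and discard non-matching sequences" could look like. It is an illustration only, not the pipeline's actual code: the function names are hypothetical, matching is exact (no mismatches allowed), degenerate bases are expanded via IUPAC codes, and the primers themselves are cut off, which is an assumption. The 99% clustering would then be run on the trimmed output with a separate clustering tool.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_to_regex(primer):
    """Expand a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer.upper().replace("U", "T"))


def reverse_complement(seq):
    """Reverse-complement a sequence, keeping IUPAC codes consistent."""
    return seq.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def trim_to_amplicon(seq, fwd_primer, rev_primer):
    """Return the region between the two primers (primers excluded).

    Returns None if either primer has no exact match, so the caller
    can discard the sequence.
    """
    seq = seq.upper().replace("U", "T")
    fwd = re.search(primer_to_regex(fwd_primer), seq)
    if fwd is None:
        return None
    downstream = seq[fwd.end():]
    # The reverse primer binds the opposite strand, so search for its
    # reverse complement downstream of the forward primer.
    rev = re.search(primer_to_regex(reverse_complement(rev_primer)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]
```

Applied to every record of the unclustered reference, this keeps only sequences where both primers are found. Allowing a few mismatches, or keeping sequences that lack a primer site, would be a different design choice that changes which references survive.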
I have already done the trimming and clustering for SILVA, in fact. If you want to stick to GG, I can help you with this.
Cool! Yes please. With Andreas
That is tricky then... Previously I kept only sequences with species level Chenhao.
Hey guys, thanks for chasing this. Chenhao, do I understand correctly that you are suggesting we retain only the sequences with a species-level assignment in the gg_13_5.fasta file? The 99_OTU_taxonomy.txt file contains 203,452 entries. Only 16,869 of these can be assigned to a species, and in total there are 639 unique species in there. If your suggestion is to keep only the 16,869, it seems drastic to cut out so many entries. #command to find out how many entries have species level designations. Cheers,
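For anyone who wants to re-check counts of this kind, here is a small sketch. It is not the command referred to above. It assumes the usual Greengenes taxonomy layout (OTU id, a tab, then a lineage such as `k__Bacteria; ...; g__Genus; s__epithet`, with an empty `s__` when no species is assigned), and it counts a unique species as a genus plus epithet pair, which is also an assumption. The function names are hypothetical.

```python
def parse_lineage(lineage):
    """Split a lineage string into a {rank_prefix: name} dict."""
    ranks = {}
    for field in lineage.split(";"):
        field = field.strip()
        prefix, sep, name = field.partition("__")
        if sep:
            ranks[prefix] = name
    return ranks


def summarize_species(lines):
    """Return (total entries, entries with a species, unique species)."""
    total = 0
    with_species = 0
    unique = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        _otu_id, _tab, lineage = line.partition("\t")
        total += 1
        ranks = parse_lineage(lineage)
        epithet = ranks.get("s", "")
        if epithet:
            with_species += 1
            # Pair the epithet with the genus so identical epithets in
            # different genera are not merged.
            unique.add((ranks.get("g", ""), epithet))
    return total, with_species, len(unique)
```

Passing an open file handle for the taxonomy file to `summarize_species` gives the three numbers in one pass. Whether the result matches the figures quoted above depends on how unique species were defined in the original count.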
Let's not do that. This will just introduce a bias. I'm happy to live with
The classification database (99% OTUs) should have been primer-trimmed before clustering; instead, the pre-clustered database was used. Pointed out by Christophe LAY.