exact canonical match of resolver includes NCBI subspecies/strains #48
Comments
It is, unfortunately, because the subspecies epithet is capitalized (which is against the zoological code rules, as far as I know). As a result, "Homo sapiens ssp. Denisova" is parsed as an 'unknown' subspecies of the species Homo sapiens described by Denisova, with the canonical form determined as Homo sapiens.
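To make the effect concrete, here is a toy sketch of the behaviour described above. It is not gnparser's actual algorithm; `toy_canonical` and `RANK_MARKERS` are hypothetical names, and the only rule assumed is that a capitalized word after the species epithet is read as authorship rather than as an epithet.

```python
# Toy illustration only: NOT gnparser's real parsing logic.
# Assumption: a capitalized token after the genus is treated as an author,
# so it never reaches the canonical form.
RANK_MARKERS = {"ssp.", "subsp.", "var.", "f."}


def toy_canonical(name):
    """Return (canonical_form, parsed_cleanly) for a simple name string."""
    tokens = name.split()
    if not tokens:
        return "", False
    canonical = [tokens[0]]  # first token is taken as the genus
    dangling_rank = False    # True while a rank marker has no epithet yet
    for tok in tokens[1:]:
        if tok in RANK_MARKERS:
            dangling_rank = True
            continue
        if tok.isalpha() and tok.islower():
            canonical.append(tok)  # lowercase word: treated as an epithet
            dangling_rank = False
            continue
        break  # anything else (e.g. "Denisova") is read as authorship
    return " ".join(canonical), not dangling_rank


print(toy_canonical("Homo sapiens ssp. Denisova"))
print(toy_canonical("Homo sapiens ssp. denisova"))
```

Under these assumptions the capitalized form collapses to the canonical form "Homo sapiens" and is flagged as not cleanly parsed, while the lowercase form keeps its subspecies epithet.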
Any way we can convince NCBI to update their capitalization and conform to the zoological code? Do you know anyone on their taxonomy team?
Another problem is that I do not think anybody has officially described Homo sapiens denisova, so in a way the result you got is probably correct: at the moment it should be placed in the same species.
I do not think NCBI pretends to be taxonomically correct; they have all kinds of things in their data.
Ok, that would also explain matches like those observed in http://www.globalbioticinteractions.org/?sourceTaxon=Phytophthora%20infestans&interactionType=interactsWith , where potato late blight fungus
Hm, this is actually bad. We need to penalize the matching score in situations like this.
There's a bunch of this happening for pathogen-host interactions. I updated the title of this issue to include "strains". Please let me know if you need more examples or how else I can help resolve this.
I added a ticket to penalize the score of matching results if parsing was not 'clean'. In both cases the name strings would be marked as "parsed with mistakes" by gnparser: GlobalNamesArchitecture/gnparser#291
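A minimal sketch of what such a penalty could look like, for discussion only. The `Match` class, its field names, the quality convention (1 = clean, higher = parsed with problems) and the penalty factor of 0.5 are all assumptions for illustration, not the resolver's actual data model or scoring.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical match record; field names are illustrative assumptions."""
    name_string: str
    parse_quality: int  # assumption: 1 = clean parse, >1 = parsed with problems
    score: float


def penalized_score(match, penalty=0.5):
    """Scale the score down when the name string did not parse cleanly."""
    if match.parse_quality > 1:
        return match.score * penalty
    return match.score


def rank_matches(matches, penalty=0.5):
    """Order matches by penalized score, best first."""
    return sorted(matches, key=lambda m: penalized_score(m, penalty), reverse=True)


matches = [
    Match("Homo sapiens ssp. Denisova", parse_quality=2, score=1.0),
    Match("Homo sapiens", parse_quality=1, score=1.0),
]
for m in rank_matches(matches):
    print(m.name_string, penalized_score(m))
```

With this kind of rule, the cleanly parsed "Homo sapiens" would rank above "Homo sapiens ssp. Denisova" even though both share the same canonical form.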
Great! Closing this issue as a duplicate of GlobalNamesArchitecture/gnparser#291.
Re-opening issue. The problem still appears to persist at https://resolver.globalnames.org , where a search for "Procladius sp1 M_PL_014" returns exact canonical matches to fly strains (NCBI:1981569, NCBI:1981570, NCBI:198157, NCBI:1981572, NCBI:1981573, NCBI:1981574) in addition to the expected canonical match to the fly genus NCBI:191633.
When using resolver.globalnames.org with "Homo sapiens", exact canonical matches include both NCBI:9606 (Homo sapiens) and NCBI:741158 (Homo sapiens ssp. Denisova). Is this expected?