exact canonical match of resolver includes NCBI subspecies/strains #48

Open
jhpoelen opened this issue May 10, 2016 · 10 comments

@jhpoelen

When using resolver.globalnames.org with "Homo sapiens", the exact canonical matches include both NCBI:9606 (Homo sapiens) and NCBI:741158 (Homo sapiens ssp. Denisova). Is this expected?

@dimus
Member

dimus commented May 10, 2016

It is, unfortunately, because the subspecies epithet is capitalized (which is against zoological code rules as far as I know). As a result, "Homo sapiens ssp. Denisova" is parsed as an 'unknown' subspecies of the species Homo sapiens described by Denisova, with the canonical form determined as Homo sapiens.
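
To make the effect concrete, here is a toy sketch of how a capitalized epithet or a strain code can fall out of the canonical form. It is not gnparser's grammar: `toy_canonical`, `RANK_MARKERS`, and the regular expressions are hypothetical names invented for illustration, and real parsing is far more involved.

```python
import re

# Toy illustration only; this is NOT how gnparser is implemented.
RANK_MARKERS = {"ssp.", "subsp.", "var.", "f."}
EPITHET = re.compile(r"[a-z-]+")  # epithets are expected in lowercase
AUTHORSHIP = re.compile(r"\(?[A-Z][A-Za-z.]*,?\)?|\(?\d{4}\)?")  # "Linnaeus," or "1758"


def toy_canonical(name_string):
    """Return (canonical_form, parsed_cleanly) for a simplified name string."""
    tokens = name_string.split()
    if not tokens:
        return "", False
    canonical = [tokens[0]]  # the first token is taken as the genus
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in RANK_MARKERS:
            i += 1  # skip the marker; a lowercase epithet must follow
            if i < len(tokens) and EPITHET.fullmatch(tokens[i]):
                canonical.append(tokens[i])
                i += 1
                continue
            # e.g. "ssp. Denisova": the capitalized word is not an epithet
            return " ".join(canonical), False
        if EPITHET.fullmatch(tok):
            canonical.append(tok)
            i += 1
            continue
        break  # everything after the epithets is treated as a tail
    tail = tokens[i:]
    # The tail is "clean" only if it looks like authorship, not a strain code.
    clean = all(AUTHORSHIP.fullmatch(t) for t in tail)
    return " ".join(canonical), clean


for name in ("Homo sapiens", "Homo sapiens ssp. Denisova",
             "Phytophthora infestans T30-4"):
    print(name, "->", toy_canonical(name))
```

In this sketch both "Homo sapiens" and "Homo sapiens ssp. Denisova" end up with the canonical form "Homo sapiens"; only the second is flagged as not parsed cleanly.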

@jhpoelen
Author

Any way we can convince NCBI to update their capitalization and conform to the zoological code? Do you know anyone on their taxonomy team?

@dimus
Member

dimus commented May 10, 2016

Another problem is that I do not think anybody has officially described Homo sapiens denisova, so in a way the result you got is probably correct: at the moment it should be placed in the same species.

@dimus
Member

dimus commented May 10, 2016

I do not think NCBI pretends to be taxonomically correct; they have all kinds of things in their data.

@jhpoelen
Author

Ok, that would also explain matches like those observed in http://www.globalbioticinteractions.org/?sourceTaxon=Phytophthora%20infestans&interactionType=interactsWith , where potato late blight fungus (Phytophthora infestans) is resolved to both https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=403677 (Phytophthora infestans T30-4, a strain) and https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=4787 (Phytophthora infestans, the actual species)?

@dimus
Member

dimus commented May 10, 2016

Hm, this is actually bad; we need to penalize the matching score in situations like this.

jhpoelen changed the title from "exact canonical match of resolver includes NCBI subspecies" to "exact canonical match of resolver includes NCBI subspecies/strains" on May 10, 2016

@jhpoelen
Author

There's a bunch of this happening for pathogen-host interactions. I updated the title of this issue to include "strains". Please let me know if you need more examples, or how else I can help resolve this.

@dimus
Member

dimus commented May 10, 2016

I added a ticket to penalize the score of matching results if parsing was not 'clean': GlobalNamesArchitecture/gnparser#291. In both cases the name strings would be marked as "parsed with mistakes" by gnparser.
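
A minimal sketch of what such a penalty could look like. This is not the resolver's actual scoring code: the `Match` record, its `parsed_cleanly` flag, the example scores, and the 0.5 penalty factor are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Match:
    # Hypothetical record; field names are assumptions, not the resolver's schema.
    name_string: str
    canonical_form: str
    score: float
    parsed_cleanly: bool


def penalized_score(match, penalty=0.5):
    """Scale the score down when the matched name string did not parse cleanly."""
    return match.score if match.parsed_cleanly else match.score * penalty


def rank_matches(matches, penalty=0.5):
    """Order matches by penalized score, best first."""
    return sorted(matches, key=lambda m: penalized_score(m, penalty), reverse=True)


# Both entries share a canonical form and a made-up raw score.
matches = [
    Match("Phytophthora infestans T30-4", "Phytophthora infestans", 0.9, False),
    Match("Phytophthora infestans", "Phytophthora infestans", 0.9, True),
]
for m in rank_matches(matches):
    print(m.name_string, penalized_score(m))
```

With equal raw scores, the cleanly parsed species name sorts ahead of the strain. A consumer could also apply a score threshold to drop the penalized entries entirely.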

@jhpoelen
Author

Great! Closing this issue as a duplicate of GlobalNamesArchitecture/gnparser#291.

@jhpoelen
Author

jhpoelen commented Jul 6, 2018

Re-opening issue. It appears that the issue still persists at https://resolver.globalnames.org, where a search for "Procladius sp1 M_PL_014" returns exact canonical matches to fly strains (NCBI:1981569, NCBI:1981570, NCBI:198157, NCBI:1981572, NCBI:1981573, NCBI:1981574) in addition to the expected canonical match to the fly genus NCBI:191633.
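
In the meantime, a client-side filter along these lines might work as a stopgap. It is only a sketch: the keys `taxon_id`, `name_string`, and `canonical_form` are assumed field names that I have not checked against the resolver's response format, and the sample records reuse the Phytophthora example from earlier in this thread rather than real API output.

```python
def keep_plain_names(results):
    """Keep results whose full name string is identical to its canonical form.

    Strain or subspecies entries such as "Phytophthora infestans T30-4" carry
    extra tokens beyond the canonical form and are dropped. Caveat: names that
    include authorship would be dropped as well, so this is a blunt filter.
    """
    return [
        r for r in results
        if r["name_string"].strip() == r["canonical_form"].strip()
    ]


# Hand-written sample records (assumed shape, not actual resolver output).
results = [
    {"taxon_id": "4787",
     "name_string": "Phytophthora infestans",
     "canonical_form": "Phytophthora infestans"},
    {"taxon_id": "403677",
     "name_string": "Phytophthora infestans T30-4",
     "canonical_form": "Phytophthora infestans"},
]

for r in keep_plain_names(results):
    print(r["taxon_id"], r["name_string"])
```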
