RFC Prefix normalised PubMed ids with pmid: #25

Open
lnielsen opened this issue Oct 28, 2016 · 0 comments

lnielsen commented Oct 28, 2016

The following PubMed ID is not correctly detected because it is also a valid EAN8 number:
https://www.ncbi.nlm.nih.gov/pubmed/?term=26037202

>>> import idutils
>>> idutils.is_pmid('26037202')
<_sre.SRE_Match at 0x10b774608>
>>> idutils.detect_identifier_schemes('26037202')
['ean8']
>>> idutils.detect_identifier_schemes('pmid:26037202')
['pmid']
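
For reference, the collision can be verified by hand. EAN-8 uses a weighted mod-10 check digit, and `26037202` happens to satisfy it. The helper below is a standalone sketch of that checksum, not the idutils implementation; the name `is_valid_ean8` is made up for illustration.

```python
import re


def is_valid_ean8(value):
    """Return True if value is 8 ASCII digits with a correct EAN-8 check digit."""
    if re.fullmatch(r'[0-9]{8}', value) is None:
        return False
    digits = [int(ch) for ch in value]
    # Weights 3,1,3,1,3,1,3 apply to the first seven digits, left to right.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(digits[:7]))
    # The eighth digit must bring the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10 == digits[7]


print(is_valid_ean8('26037202'))  # True: weighted sum is 38, check digit is 2
```

Any 8-digit PubMed ID has roughly a one in ten chance of passing this check, so the example above is unlikely to be an isolated case.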

I think the main problem arises when scheme detection is used together with normalisation:

>>> idutils.normalize_pmid('pmid:26037202')
'26037202'
>>> idutils.detect_identifier_schemes(idutils.normalize_pmid('pmid:26037202'))
['ean8']
>>> idutils.detect_identifier_schemes('pmid:26037202')
['pmid']

I would propose that we change PubMed normalisation to include the pmid: prefix, so that the following holds true:

idutils.detect_identifier_schemes(idutils.normalize_pmid('pmid:26037202')) == idutils.detect_identifier_schemes('pmid:26037202')

This is not strictly correct, but having just integers as identifiers is a bad idea anyway.
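
A minimal sketch of what the proposed behaviour could look like, written as a standalone function rather than a patch to idutils. The name `normalize_pmid_prefixed` and the accepted input pattern are assumptions for illustration only; the real `normalize_pmid` may accept other input forms.

```python
import re

# Assumed input pattern: optional "pmid:" prefix (any case), then ASCII digits.
_PMID_RE = re.compile(r'^\s*(?:pmid:\s*)?([0-9]+)\s*$', re.IGNORECASE)


def normalize_pmid_prefixed(value):
    """Return the PubMed ID in the form 'pmid:<digits>'."""
    match = _PMID_RE.match(value)
    if match is None:
        raise ValueError('not a PubMed ID: {!r}'.format(value))
    # Always emit the prefix so the result is never a bare integer string.
    return 'pmid:' + match.group(1)


print(normalize_pmid_prefixed('26037202'))       # pmid:26037202
print(normalize_pmid_prefixed('pmid:26037202'))  # pmid:26037202
```

Because the output always carries the prefix, normalising is idempotent, and passing the result to `detect_identifier_schemes` should take the same path as the `'pmid:26037202'` call shown earlier.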
