RFC Prefix normalised PubMed ids with pmid: #25

lnielsen · 2016-10-28T08:54:39Z

The following PubMed ID is not correctly detected because it is also a valid EAN8 number:
https://www.ncbi.nlm.nih.gov/pubmed/?term=26037202

>>> import idutils
>>> idutils.is_pmid('26037202')
<_sre.SRE_Match at 0x10b774608>
>>> idutils.detect_identifier_schemes('26037202')
['ean8']
>>> idutils.detect_identifier_schemes('pmid:26037202')
['pmid']
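
To see why the bare number is ambiguous, here is a minimal sketch of the standard EAN-8 check-digit rule (weights 3, 1, 3, 1, 3, 1, 3 over the first seven digits). The helper name `is_ean8` is hypothetical and this is not necessarily how idutils implements its own check; it only illustrates that `26037202` happens to satisfy the checksum.

```python
import re


def is_ean8(value):
    """Return True if value is 8 digits with a valid EAN-8 check digit.

    Hypothetical helper for illustration; not taken from idutils.
    """
    if not re.fullmatch(r"\d{8}", value):
        return False
    digits = [int(char) for char in value]
    # Weights alternate 3, 1, 3, ... starting from the leftmost digit.
    total = sum(
        digit * (3 if index % 2 == 0 else 1)
        for index, digit in enumerate(digits[:7])
    )
    # The check digit brings the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10 == digits[7]


# 2*3 + 6*1 + 0*3 + 3*1 + 7*3 + 2*1 + 0*3 = 38, so the check digit is 2.
print(is_ean8("26037202"))
```

Any eight-digit PubMed ID has roughly a one-in-ten chance of passing this check, so the collision is not specific to this record.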

I think the main problem arises when scheme detection is used together with normalisation:

>>> idutils.normalize_pmid('pmid:26037202')
'26037202'
>>> idutils.detect_identifier_schemes(idutils.normalize_pmid('pmid:26037202'))
['ean8']
>>> idutils.detect_identifier_schemes('pmid:26037202')
['pmid']

I propose that we change PubMed normalisation to include the pmid: prefix, so that the following holds true:

idutils.detect_identifier_schemes(idutils.normalize_pmid('pmid:26037202')) == idutils.detect_identifier_schemes('pmid:26037202')

A prefixed value is not strictly a correct PubMed ID, but using bare integers as identifiers is a bad idea anyway.
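
As a rough sketch of what the proposed behaviour could look like, assuming a normaliser that accepts either form and always emits the prefix: the function name `normalize_pmid_prefixed` and the regular expression are hypothetical, not the current idutils API.

```python
import re

# Hypothetical pattern: an optional "pmid:" prefix followed by digits only.
_PMID_RE = re.compile(r"(?:pmid:)?(\d+)", re.IGNORECASE)


def normalize_pmid_prefixed(value):
    """Return the PubMed ID in the form 'pmid:<digits>'.

    Hypothetical sketch of the proposal; not the existing
    idutils.normalize_pmid, which strips the prefix.
    """
    match = _PMID_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError("not a PubMed ID: {!r}".format(value))
    return "pmid:" + match.group(1)


# Both spellings normalise to the same prefixed value, and normalising
# twice changes nothing.
print(normalize_pmid_prefixed("26037202"))
print(normalize_pmid_prefixed("pmid:26037202"))
```

Because the output always carries the prefix, passing it back to scheme detection would no longer depend on whether the digits also happen to form a valid EAN8 number.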

lnielsen added Type: bug Type: RFC Size: easy Need: information labels Oct 28, 2016

lnielsen added this to the v0.3.0 milestone Oct 28, 2016

lnielsen self-assigned this Oct 28, 2016
