Policy Proposal: Mediating Conflict Resolution via Identifier Space Owner #1397

Open
cthoyt opened this issue Feb 1, 2025 · 1 comment
@cthoyt
Member

cthoyt commented Feb 1, 2025

I’ve written a draft proposal of a workflow that we could use to resolve conflicts in the curation of Bioregistry records by engaging the relevant nomenclature authority, helping them develop and communicate a first-party policy, and ultimately mirroring that policy in the Bioregistry.

I hope this can help resolve the EC identifier pattern question by providing an actionable to-do list and a clear way to decide when to end discussion and make a decision.

Here’s the draft so far: https://docs.google.com/document/d/10FE9EYYg5f4Dph6zsEg8DJogkb6Njcp7gzoSNezsm-g/edit

Please take a look and feel free to comment directly on the Google Doc.

This is also tangentially related to #755, but with a different purpose and approach.

@cmungall
Contributor

cmungall commented Feb 4, 2025

I think the guidelines are great, but I am not sure they will help with these kinds of issues in the future.

The root of the issue here is that bioregistry needs to recognize that not every authority represented by an entity cares equally, or at all, about the 4 aspects of an identifier (the "quad"):

  1. the prefix
  2. the prefix expansion
  3. the local id
  4. the PURL
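As a minimal sketch, the four aspects could be modelled as a single record. The class name `IdentifierQuad`, its field names, and the `is_consistent` helper are hypothetical and only for illustration; they are not part of any Bioregistry API. The CL values are an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentifierQuad:
    """Hypothetical record holding the four aspects of one identifier."""

    prefix: str            # 1. the prefix, e.g. "CL"
    prefix_expansion: str  # 2. the IRI the prefix expands to
    local_id: str          # 3. the local id, e.g. "0000001"
    purl: str              # 4. the full PURL

    @property
    def curie(self) -> str:
        """The CURIE is the prefix and local id joined by a colon."""
        return f"{self.prefix}:{self.local_id}"

    def is_consistent(self) -> bool:
        """True when the PURL is exactly the expansion plus the local id."""
        return self.purl == self.prefix_expansion + self.local_id


cl_example = IdentifierQuad(
    prefix="CL",
    prefix_expansion="http://purl.obolibrary.org/obo/CL_",
    local_id="0000001",
    purl="http://purl.obolibrary.org/obo/CL_0000001",
)
```

An authority that only cares about some of the four aspects leaves the remaining fields for someone else to decide, which is where the disagreements start.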

For many OBO ontologies of the GO lineage, we care deeply about all 4. We don't like people writing cl:0000001, as it breaks things. We don't like people expanding our PURLs, which are used in our RDF/OWL, to things like http://identifiers.org/cl/0000001; we insist on OBO PURLs. We certainly don't like people writing CL:1: the digits should be zero-padded, even though normalizing this is "trivial".

However, many OBO ontologies, such as the ones used by MODs, don't particularly care about the PURL expansion; they use the CURIE.
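To make the "trivial" normalization concrete, here is a rough sketch. The function names are hypothetical, and the width of 7 digits is an assumption read off the CL:0000001 example above rather than taken from any registry record.

```python
import re

OBO_BASE = "http://purl.obolibrary.org/obo/"
CL_WIDTH = 7  # assumption: inferred from the example CL:0000001


def normalize_cl_curie(curie: str) -> str:
    """Upper-case the prefix and zero-pad the local id of a CL CURIE."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix.upper() != "CL":
        raise ValueError(f"not a CL CURIE: {curie!r}")
    if not re.fullmatch(r"\d{1,7}", local_id):
        raise ValueError(f"unexpected local id: {local_id!r}")
    return f"CL:{local_id.zfill(CL_WIDTH)}"


def cl_obo_purl(curie: str) -> str:
    """Build the OBO PURL for a (possibly sloppy) CL CURIE."""
    return OBO_BASE + normalize_cl_curie(curie).replace(":", "_")
```

Under these assumptions, both `cl:0000001` and `CL:1` normalize to `CL:0000001`. The code being easy is not the point: someone still has to decide that this is the canonical form.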

And many OBO ontologies (as well as semantic web artefacts) have zero use for non-ephemeral CURIEs and will happily write obo:OBI_nnn or OBI_nnn or OBI:nnn or :OBI_nnn in different semantic web documents, provided it is always in the context of a document in an RDF syntax with appropriate prefix expansions in the header.
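A small sketch of why these ephemeral CURIEs are harmless inside a document: each spelling, paired with the prefix map its own header would declare, expands to the same IRI. The `expand` helper and the local id `0000070` are placeholders chosen for illustration, and the bare `OBI_nnn` form (which depends on a base IRI rather than a prefix) is left out.

```python
OBO = "http://purl.obolibrary.org/obo/"


def expand(curie: str, prefix_map: dict[str, str]) -> str:
    """Expand a CURIE using the prefix map declared by its document."""
    prefix, sep, local = curie.partition(":")
    if not sep:
        raise ValueError(f"no prefix separator in {curie!r}")
    return prefix_map[prefix] + local


# Each pair is (CURIE as written, prefix map from that document's header).
documents = [
    ("obo:OBI_0000070", {"obo": OBO}),
    ("OBI:0000070", {"OBI": OBO + "OBI_"}),
    (":OBI_0000070", {"": OBO}),  # empty (default) prefix
]

iris = {expand(curie, prefix_map) for curie, prefix_map in documents}
```

Here `iris` ends up with a single member, so the three spellings are interchangeable as long as the header travels with them. Outside that context, the CURIE strings on their own are not comparable.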

Very few non-semantic databases care about or understand the prefix expansion part, hence we end up with 300 ways to make a "PURL" for EC or HGNC, because the concept of a PURL as a semantic identifier is confused with a web URL for humans again and again and again (hey, you can always redirect from http to https, what's the problem? solid/solid-namespace#21).

Some (few) may not care so much about some aspects of the local id. We see this in 'semantic spaces' like EC, whose primary outputs are standards that might have multiple informatics renderings, but for which there is a semi-formal consensus among major databases on how the system is used. We've also seen similar cases where authorities for more web-centric resources might not be able to decide on things like zero padding. See also cases in bioregistry, e.g. for UCUM, that are still unresolved.

The whole banana issue was also caused by this.

When an authority doesn't care about all 4 aspects of the identifier (and arguably they shouldn't), then the community steps in and consensus sometimes emerges, or if not consensus, then different communities of practice. This includes:
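A quick illustration of the http/https point, using an OBO PURL as the example value. The helper name `same_for_a_browser` is made up for this sketch.

```python
from urllib.parse import urlsplit

http_form = "http://purl.obolibrary.org/obo/CL_0000001"
https_form = "https://purl.obolibrary.org/obo/CL_0000001"


def same_for_a_browser(a: str, b: str) -> bool:
    """Compare host and path only, as a human following a redirect would."""
    a_parts, b_parts = urlsplit(a), urlsplit(b)
    return (a_parts.netloc, a_parts.path) == (b_parts.netloc, b_parts.path)


# As web URLs for humans, the two forms lead to the same place ...
browser_view = same_for_a_browser(http_form, https_form)

# ... but as semantic identifiers they are compared as plain strings,
# so a triple store or a join on IRIs treats them as two different things.
identifier_view = http_form == https_form
```

`browser_view` is true while `identifier_view` is false, which is the confusion described above: a redirect fixes the first and does nothing for the second.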

  • very strong consensus around EC local IDs
  • a previous consensus around prefixes that was broken when identifiers.org, without consulting people, made their own prefixes and their own casing rules, which bioregistry then inherited
  • different communities of practice around prefix expansions for database entities, with

So many of the problems with identifiers have come from interactions between people representing different subsets of the powerset of the identifier quad, each not understanding where the other is coming from.

I think the technical and social aspects of bioregistry need to start by acknowledging this challenge with the quad: not all authorities care to be an authority on all 4 aspects, and until they do, we need to do what we can to enable transparent community governance.

This will be a big lift for sure, and bioregistry has done so much already! But I think we need to face these challenges.


2 participants