Experimental: defining "extension" slot as a list of key-value pairs #263

Closed
wants to merge 2 commits into from
20 changes: 20 additions & 0 deletions examples/embedded/uberon-external.sssom.tsv
@@ -0,0 +1,20 @@
# curie_map:
# FBbt: http://purl.obolibrary.org/obo/FBbt_
# UBERON: http://purl.obolibrary.org/obo/UBERON_
# ZFA: http://purl.obolibrary.org/obo/ZFA_
# MYINTERNALNAMESPACE: http://my.internal.ns/vocab/
# semapv: https://w3id.org/semapv/
# skos: http://www.w3.org/2004/02/skos/core#
# dc: http://purl.org/dc/terms/
# license: https://w3id.org/sssom/license/unspecified
# mapping_set_id: https://w3id.org/sssom/mappings/12345287613876278135
# extension_definition:
# - key: modified
# value: dc:modified
# - key: mapping_id
# value: MYINTERNALNAMESPACE:mapping_id
# - key: funded_by
# value: foaf:fundedBy
subject_id predicate_id object_id mapping_justification modified mapping_id funded_by
Contributor

@gouttegd gouttegd Jul 27, 2023

What about the risk of clashes if a future version of SSSOM defines, say, mapping_id as a standard field?

I’d suggest mandating that columns defined by this extension mechanism should be prefixed, e.g. with ext_ or similar.

That is, if an extension is defined as follows:

#extension_definition:
#  - key: mapping_id
#    value: MYINTERNALNAMESPACE:mapping_id

Then in the TSV file it should appear as:

subject_id   predicate_id   [...] ext_mapping_id
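
A minimal Python sketch of that renaming rule. The `EXT_PREFIX` constant and the `prefixed_columns` helper are hypothetical names used only for illustration; they are not part of SSSOM or of any existing tool, and the definitions are assumed to be already parsed into a list of key/value dictionaries.

```python
EXT_PREFIX = "ext_"  # assumed prefix; the suggestion above only says "ext_ or similar"


def prefixed_columns(extension_definitions):
    """Map each declared extension key to the column name expected in the TSV."""
    return {d["key"]: EXT_PREFIX + d["key"] for d in extension_definitions}


# The extension_definition entry from the example above, as parsed metadata.
definitions = [
    {"key": "mapping_id", "value": "MYINTERNALNAMESPACE:mapping_id"},
]

columns = prefixed_columns(definitions)
print(columns["mapping_id"])
```

With this rule a declared key can never collide with a standard SSSOM field, as long as the spec itself never defines a field starting with the prefix.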

Contributor

Either that or adding a SSSOM version number in the mapping set metadata.

Collaborator Author

This is exactly what I was musing about before I stopped working on it. The downside of using "ext" prefixes for column names is that I already know projects using SSSOM that really don't want to change their column names, which are part of a large metadata model for mappings (with a few organisation-specific columns like, ahem, species, Grant_id, and other shenanigans). I would have much preferred the prefix solution for cleanliness.

My thought was that if a custom column is defined in the mapping set header, it trumps any definition in the SSSOM schema. This is super messy, but it's unclear to me how I should even approach this question.
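
A rough Python sketch of that precedence rule, under stated assumptions: `resolve_column` is a hypothetical helper, `STANDARD_SLOTS` is a tiny illustrative subset of the schema (the two slot URIs are the ones visible in this diff), and the clashing `comment` extension is invented for the example.

```python
# Illustrative subset of standard slots and their slot_uri values.
STANDARD_SLOTS = {"see_also": "rdfs:seeAlso", "comment": "rdfs:comment"}


def resolve_column(name, extension_definitions, standard_slots):
    """Return (origin, property) for a column; header definitions win over the schema."""
    declared = {d["key"]: d["value"] for d in extension_definitions}
    if name in declared:
        return ("extension", declared[name])
    if name in standard_slots:
        return ("standard", standard_slots[name])
    return ("unknown", None)


# Hypothetical header that redefines a name the schema already uses.
header_definitions = [{"key": "comment", "value": "MYINTERNALNAMESPACE:comment"}]

overridden = resolve_column("comment", header_definitions, STANDARD_SLOTS)
untouched = resolve_column("see_also", header_definitions, STANDARD_SLOTS)
print(overridden, untouched)
```

The messiness shows up immediately: a tool that does not read the header would interpret the same `comment` column as the standard slot.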

Contributor

@gouttegd gouttegd Jul 27, 2023

> it's unclear to me how I should even approach this question.

My suggestion:

  1. Make a list of those extra column names that you already know are used in the wild.
  2. Mark those column names in the spec as being reserved, meaning that future SSSOM versions will never reuse those names.
  3. From that moment on, new extension columns MUST use prefixed names. Existing projects that are currently using “reserved” names MAY continue to do so, though they are encouraged to switch to prefixed versions.
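
The three steps above could be checked mechanically along these lines. This is only a sketch: `classify_column`, the `ext_` prefix, and both name sets are assumptions made for illustration (the reserved names are the ones mentioned earlier in this thread), not a list taken from the spec.

```python
EXT_PREFIX = "ext_"  # assumed prefix for new extension columns

# Illustrative sets only; a real validator would take these from the spec.
STANDARD = {"subject_id", "predicate_id", "object_id", "mapping_justification"}
RESERVED = {"species", "Grant_id"}  # legacy names that future versions would never reuse


def classify_column(name, standard=STANDARD, reserved=RESERVED):
    """Classify a TSV column name under the 'reserved names + mandatory prefix' rule."""
    if name in standard:
        return "standard"
    if name in reserved:
        return "reserved-legacy"  # allowed (MAY), but switching is encouraged
    if name.startswith(EXT_PREFIX):
        return "extension"
    return "invalid"  # new unprefixed extension column: violates the MUST


for column in ["subject_id", "Grant_id", "ext_mapping_id", "mapping_id"]:
    print(column, "->", classify_column(column))
```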

Contributor

It may not be the cleanest solution, but it’s a pretty common way of dealing with existing legacy stuff in specifications and standards.

Collaborator Author

This is a very French way to go about it, while the real world clearly works under Mediterranean principles. People will just not care about this constraint in the spec - the only consequence it will have is that we can't use normal processing tools on many mapping sets in the wild. I know I sound negative but man, after 10 years of lobbying with great charm and all, we barely managed to get people to agree to publish ontologies in a common syntax - it has proven nearly impossible to tell these same people to use a common set of annotation properties to describe their metadata.

On the other hand, I can't see a different solution either! I am 51% inclined to go your way. I wouldn't even bother to enumerate existing columns, they are too messy anyway. Just force the ext_ prefix.

Contributor

This is a real-world approach. Look at any IETF RFC, any IANA registry: they are littered with this kind of “reserved” attributes, keywords, identifiers. Most of them exist for only one reason: to deal with this kind of mess and the fact that many people don’t give a shit about standards, and standards have to take that into account.

UBERON:0000004 skos:exactMatch ZFA:0000047 semapv:UnspecifiedMatching 2022-03-01 M6972619762 NIH:RM1-HG010860-01
UBERON:0000005 skos:exactMatch FBbt:00005157 semapv:UnspecifiedMatching 2022-02-01 M98879876121 NIH:RM1-HG010860-01
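
For illustration, a consumer of such a file might separate the declared extension columns from the standard ones as sketched below. Assumptions: the sample is tab-separated, the `#` header is skipped rather than parsed, and the extension keys are passed in by hand; `read_rows` is a hypothetical helper, not an existing SSSOM tool function.

```python
import csv

# One data row from the example file, with tabs restored and a shortened header.
SAMPLE = (
    "# mapping_set_id: https://w3id.org/sssom/mappings/12345287613876278135\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\t"
    "modified\tmapping_id\tfunded_by\n"
    "UBERON:0000004\tskos:exactMatch\tZFA:0000047\tsemapv:UnspecifiedMatching\t"
    "2022-03-01\tM6972619762\tNIH:RM1-HG010860-01\n"
)

# Keys declared under extension_definition in the header above.
EXTENSION_KEYS = {"modified", "mapping_id", "funded_by"}


def read_rows(text, extension_keys):
    """Yield (standard_fields, extension_fields) per row, skipping '#' header lines."""
    body = [line for line in text.splitlines() if not line.startswith("#")]
    for row in csv.DictReader(body, delimiter="\t"):
        ext = {k: v for k, v in row.items() if k in extension_keys}
        std = {k: v for k, v in row.items() if k not in extension_keys}
        yield std, ext


rows = list(read_rows(SAMPLE, EXTENSION_KEYS))
print(rows[0][1])
```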
35 changes: 33 additions & 2 deletions src/sssom_schema/schema/sssom_schema.yaml
@@ -456,15 +456,38 @@ slots:
slot_uri: rdfs:seeAlso
range: string
multivalued: true
extension:
description: A way to extend SSSOM mapping fields with your use case specific information.
Any extension must be declared in the extension_definition element on mapping_set level.
Extensions MUST NOT alter the semantics of the mapping, as they will be disregarded by most standard tools,
i.e. not included as part of API requests and others.
range: key value pair
multivalued: true
extension_definition:
description: A way to extend SSSOM mapping fields with your use case specific information.
# can I make the 'value' slot of type resource for this element, but otherwise re-use the generic key-value pair class?
# Or better keep that element optional and simply treat extensions as a package that is ignored by transformers?
range: key value pair
multivalued: true
other:
description: Pipe separated list of key value pairs for properties not part of
the SSSOM spec. Can be used to encode additional provenance data.
description: An open ended field in which any information can be added that can be captured as a string. It is recommended
to avoid using this field in favour of the 'comment' or 'extension' fields.
range: string
comment:
description: Free text field containing either curator notes or text generated
by tool providing additional informative information.
slot_uri: rdfs:comment
range: string
key:
description: Generic slot for key, which should correspond to an entity reference
range: string
# Any more restrictions, like alphabetical?
examples:
- value: mapping_id
- value: audience
value:
description: Generic slot for a value with no restrictions
range: string
classes:
mapping set:
description: Represents a set of mappings
@@ -497,6 +520,8 @@ classes:
- see_also
- other
- comment
- extension
- extension_definition
mapping:
description: Represents an individual mapping between a pair of entities
slots:
@@ -540,6 +565,7 @@ classes:
- see_also
- other
- comment
- extension
class_uri: owl:Axiom
mapping registry:
description: A registry for managing mapping sets. It holds a set of
@@ -563,3 +589,8 @@ classes:
- mapping_set_group
- last_updated
- local_name
key value pair:
description: A simple structure to hold key-value pairs
slots:
- key
- value
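
As a rough picture of the data model this class implies, the sketch below uses hand-written Python dataclasses. They are an assumption made for illustration, not the code LinkML would generate from the schema, and `Mapping` carries only a few of the real slots.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeyValuePair:
    """Mirrors the 'key value pair' class: two unrestricted string slots."""
    key: str
    value: str


@dataclass
class Mapping:
    """Trimmed-down mapping with the new multivalued 'extension' slot."""
    subject_id: str
    predicate_id: str
    object_id: str
    extension: List[KeyValuePair] = field(default_factory=list)


m = Mapping(
    subject_id="UBERON:0000004",
    predicate_id="skos:exactMatch",
    object_id="ZFA:0000047",
    extension=[KeyValuePair("mapping_id", "M6972619762")],
)

# Flatten the list of pairs back into column-like form.
as_dict = {kv.key: kv.value for kv in m.extension}
print(as_dict)
```

Modelling extensions as a list of pairs keeps the schema closed while still letting a mapping carry arbitrary extra fields.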