Add support for extension slots #375

Merged
merged 10 commits into from
Aug 6, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -3,6 +3,7 @@
## Next

- Add the concept of "propagatable slots".
- Add the concept of "extension slots".

## SSSOM version 0.15.1

23 changes: 23 additions & 0 deletions examples/schema/extension-slots.sssom.tsv
@@ -0,0 +1,23 @@
#curie_map:
#  COMENT: https://example.com/entities/
#  EXPROP: https://example.org/properties/
#  ORGENT: https://example.org/entities/
#mapping_set_id: https://example.org/sets/exo2c-with-extensions
#mapping_set_title: Sample set EXO2C with extension slots
#license: https://creativecommons.org/licenses/by/4.0/
#extension_definitions:
#  - slot_name: ext_bar
#    property: EXPROP:barProperty
#    type_hint: xsd:integer
#  - slot_name: ext_baz
#    property: EXPROP:bazProperty
#    type_hint: linkml:Uriorcurie
#  - slot_name: ext_foo
#    property: EXPROP:fooProperty
#ext_foo: Foo A
#ext_undeclared_foo: Foo B
subject_id subject_label predicate_id object_id object_label mapping_justification ext_bar ext_baz ext_undeclared_baz
ORGENT:0001 alice skos:closeMatch COMENT:0011 alpha semapv:ManualMappingCuration 111 ORGENT:BAZ_0001 BAZ A
ORGENT:0002 bob skos:closeMatch COMENT:0012 beta semapv:ManualMappingCuration 112 ORGENT:BAZ_0002
ORGENT:0004 daphne skos:closeMatch COMENT:0014 delta semapv:ManualMappingCuration 114 Baz C
ORGENT:0005 eve skos:closeMatch COMENT:0015 epsilon semapv:ManualMappingCuration 115 ORGENT:BAZ_0005 Baz E
34 changes: 34 additions & 0 deletions src/docs/spec-formats-tsv.md
@@ -134,6 +134,28 @@ For any given propagatable slot, condensation is only allowed if (1) all mapping
Implementations that support condensation MUST also support propagation.


## Non-standard slots

If an implementation does not support [non-standard slots](spec-model.md#non-standard-slots), then:

* a SSSOM/TSV reader MUST discard any unknown top-level YAML key in the metadata block, and any unknown TSV column in the TSV section;
* a SSSOM/TSV writer MUST NOT write any unknown top-level YAML key in the metadata block, or any unknown TSV column in the TSV section.

### Support for defined extensions

This section applies to implementations that support defined extensions.

A SSSOM/TSV reader MUST check the validity of the extension definitions listed in the `extension_definitions` slot in the YAML metadata block:

* definitions with no `slot_name`, or with a `slot_name` that is not an XML non-colonized name, MUST be ignored;
* definitions with any unexpected content (e.g. other keys than just `slot_name`, `property`, and `type_hint`) MUST be ignored;
* the `property` and `type_hint` values for a given definition, if present, MUST be CURIEs and MUST be resolvable using the mapping set’s `curie_map`, otherwise the definition MUST be ignored.

A SSSOM/TSV reader MUST, upon encountering a non-standard YAML key in the metadata block or an unknown TSV column, check that the name of the key or of the column matches the `slot_name` of one of the extension definitions listed in the mapping set’s `extension_definitions` slot. If there is no match, the non-standard slot MUST be discarded.

Upon encountering a non-standard slot whose corresponding definition has a `type_hint` of `https://w3id.org/linkml/Uriorcurie`, the reader SHOULD check that the value is a CURIE and is resolvable using the mapping set’s `curie_map`.
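
The validity checks above could be sketched as follows. This is a minimal illustration, not part of the specification: the function names `valid_definitions` and `is_resolvable_curie` are hypothetical, the NCName test is a simplified ASCII-only approximation of the XML production, and the `xsd` prefix is assumed to be present in the CURIE map passed in.

```python
import re

# Simplified, ASCII-only approximation of an XML NCName (assumption: the real
# production also allows many non-ASCII characters).
NCNAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")
ALLOWED_KEYS = {"slot_name", "property", "type_hint"}


def is_resolvable_curie(value, curie_map):
    """Return True if value looks like prefix:local and prefix is in curie_map."""
    if not isinstance(value, str) or ":" not in value:
        return False
    prefix, _, local = value.partition(":")
    return prefix in curie_map and local != ""


def valid_definitions(definitions, curie_map):
    """Keep only the extension definitions that pass the three checks."""
    kept = []
    for definition in definitions:
        if not isinstance(definition, dict):
            continue
        name = definition.get("slot_name")
        # Check 1: slot_name must be present and be an NCName.
        if not isinstance(name, str) or NCNAME_RE.fullmatch(name) is None:
            continue
        # Check 2: no keys other than slot_name, property, type_hint.
        if set(definition) - ALLOWED_KEYS:
            continue
        # Check 3: property and type_hint, if present, must be resolvable CURIEs.
        if any(
            key in definition and not is_resolvable_curie(definition[key], curie_map)
            for key in ("property", "type_hint")
        ):
            continue
        kept.append(definition)
    return kept


curie_map = {
    "EXPROP": "https://example.org/properties/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}
definitions = [
    {"slot_name": "ext_bar", "property": "EXPROP:barProperty", "type_hint": "xsd:integer"},
    {"slot_name": "ext:bad"},                          # not an NCName
    {"slot_name": "ext_extra", "comment": "x"},        # unexpected key
    {"slot_name": "ext_unk", "property": "UNKNOWN:p"},  # unresolvable prefix
    {"property": "EXPROP:p"},                          # no slot_name
]
accepted = [d["slot_name"] for d in valid_definitions(definitions, curie_map)]
```

With these inputs only the `ext_bar` definition survives; a reader would then discard any non-standard key or column whose name is not among the accepted slot names.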


## Compatibility with previous versions of the specification

Implementations MUST support the current version of the specification. However, SSSOM/TSV parsers MAY additionally accept files that were compliant with a previous version. This section provides advice for implementations willing to support older versions.
@@ -208,6 +230,13 @@ When writing the metadata block, a canonical SSSOM/TSV writer:
* MUST NOT include in the CURIE map any prefix name that is not used anywhere in the set;
* MUST sort the prefix names in the CURIE map in lexicographical order.

In addition, if [extension slots](spec-model.md#non-standard-slots) are supported, the writer:

* MUST write any extension slot in the mapping set _after_ the standard slots;
* MUST sort the extension slots lexicographically on the `property` of their corresponding extension definitions;
* MUST sort extension definitions on their `property` value;
* MUST NOT include an extension definition if the corresponding extension is not used anywhere in the set.
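
A minimal sketch of the last two rules, for illustration only. The helper names are hypothetical, and two assumptions are made that the text above does not settle: `property` values are compared as plain strings in whatever form they are stored (here, full IRIs), and Python's default string ordering is taken as the lexicographical order.

```python
DEFAULT_PROPERTY_PREFIX = "http://sssom.invalid/"


def property_of(definition):
    """Return the declared property, or the default one built from the slot name."""
    return definition.get("property") or DEFAULT_PROPERTY_PREFIX + definition["slot_name"]


def canonical_definitions(definitions, used_slot_names):
    """Drop definitions of unused extensions, then sort the rest on their property."""
    used = [d for d in definitions if d["slot_name"] in used_slot_names]
    return sorted(used, key=property_of)


definitions = [
    {"slot_name": "ext_foo", "property": "https://example.org/properties/fooProperty"},
    {"slot_name": "ext_bar", "property": "https://example.org/properties/barProperty"},
    {"slot_name": "ext_unused", "property": "https://example.org/properties/aaaProperty"},
]
ordered = canonical_definitions(definitions, {"ext_foo", "ext_bar"})
ordered_names = [d["slot_name"] for d in ordered]
```

Here `ext_unused` is dropped because no mapping or set-level metadata uses it, and `ext_bar` is written before `ext_foo` because `barProperty` sorts before `fooProperty`.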


### Rules for the mappings block

@@ -218,6 +247,11 @@ When writing the mappings block, a canonical SSSOM/TSV writer:
* MUST write the columns in the order the slots appear in the [“Slots” table](Mapping.md#slots), in the documentation for the `Mapping` class;
* MUST sort the mappings in lexicographical order on all their slots, in the order the slots appear in the [“Slots” table](Mapping.md#slots).

In addition, if [extension slots](spec-model.md#non-standard-slots) are supported, the writer:

* MUST write any non-standard column _after_ the standard columns;
* MUST sort the non-standard columns lexicographically on the `property` of their corresponding extension definitions.
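
The column ordering described above might look like the following sketch. It is only an illustration: `order_columns` is a hypothetical helper, `STANDARD_ORDER` is a deliberately shortened stand-in for the real "Slots" table, and the property IRIs are made up for the example.

```python
# Shortened stand-in for the order of the "Slots" table (assumption).
STANDARD_ORDER = ["subject_id", "predicate_id", "object_id", "mapping_justification"]


def order_columns(columns, standard_order, extension_properties):
    """Standard columns first (in table order), then extensions sorted on property."""
    present = set(columns)
    known = set(standard_order)
    standard = [c for c in standard_order if c in present]
    extensions = sorted(
        (c for c in columns if c not in known),
        key=lambda c: extension_properties[c],
    )
    return standard + extensions


extension_properties = {
    "ext_a": "https://example.org/properties/zProperty",
    "ext_b": "https://example.org/properties/aProperty",
}
columns = ["ext_a", "object_id", "ext_b", "subject_id"]
ordered_columns = order_columns(columns, STANDARD_ORDER, extension_properties)
```

Note that the extension columns are ordered by their property, not by their name: `ext_b` comes before `ext_a` because its property sorts first.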


## Examples

2 changes: 2 additions & 0 deletions src/docs/spec-intro.md
@@ -27,3 +27,5 @@ Throughout the specification, the following IRI prefix names are used:
| semapv | https://w3id.org/semapv/vocab/ |
| skos | http://www.w3.org/2004/02/skos/core# |
| sssom | https://w3id.org/sssom/ |
| xsd | http://www.w3.org/2001/XMLSchema# |
| linkml | https://w3id.org/linkml/ |
63 changes: 63 additions & 0 deletions src/docs/spec-model.md
@@ -81,3 +81,66 @@ In addition, predicates from the following sources MAY also be encouraged:

* any relation from the [Relation Ontology (RO)](https://obofoundry.org/ontology/ro.html);
* any relation under [skos:mappingRelation](http://www.w3.org/2004/02/skos/core#mappingRelation) in the [Semantic Mapping Vocabulary](https://mapping-commons.github.io/semantic-mapping-vocabulary/).

## Non-standard slots

<a id="non-standard-slots"></a>

Implementations are only REQUIRED to support the standard metadata slots defined in the LinkML model.

However, implementations MAY support the use of supplementary, non-standard slots (hereafter called _extension slots_ or simply _extensions_). There are two types of extension slots: _defined_ extension slots and _undefined_ extension slots.

### Defined extensions

Defined extensions are non-standard slots that are explicitly declared (or, _defined_) before being used. Implementations SHOULD support the use of defined extensions.

Extensions are defined in the `extension_definitions` slot of the `MappingSet` object. Each definition consists of three elements:

* the name of the slot, as it will appear when used in a mapping set (`slot_name`);
* a property intended to specify the meaning of the slot (`property`);
* the type of values expected by the slot (`type_hint`).

A definition MUST have at least a `slot_name`. The name MUST be an XML “non-colonized name” (“NCName”, see [Namespaces in XML, §2](https://www.w3.org/TR/1999/REC-xml-names-19990114/#NT-NCName)). The name MUST NOT match the name of an existing standard slot.

To avoid any conflict with a future version of the SSSOM specification (which could introduce new standard slot names), implementations are strongly encouraged to craft extension slot names that start with the `ext_` prefix. No new standard slot with a name starting with `ext_` will ever be introduced in any future version of the standard. (This advice is for SSSOM producers only; SSSOM consumers MUST NOT reject an extension slot solely on the basis that its name does not start with `ext_`.)

A definition SHOULD have a `property`. If it does not, implementations MUST automatically construct a default property by catenating the prefix `http://sssom.invalid/` with the name of the extension.

The slot name and the property MUST be unique to each definition. No two definitions can share the same name and/or the same property.

A definition MAY have a `type_hint`. If it does not, a default type of `http://www.w3.org/2001/XMLSchema#string` is assumed.

Once defined, an extension slot may be used as a supplementary slot in either the `Mapping` class or the `MappingSet` class (or both), as if it were a normal, standard slot. How those slots are represented internally and provided to client code is left to the discretion of the implementations.

### Undefined extensions

Undefined extensions are non-standard slots that are not explicitly defined as described in the previous section. Implementations MAY support undefined extensions.

Upon encountering a non-standard slot that is not a defined extension, an implementation that supports undefined extensions MUST behave as if the slot had been defined with:

* a `property` constructed by catenating the prefix `http://sssom.invalid/` to the name of the slot;
* a `type_hint` of `http://www.w3.org/2001/XMLSchema#string`.
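
The defaulting behaviour for both defined and undefined extensions can be sketched as below. The function name `effective_definition` is hypothetical and the dictionary layout is only one possible internal representation; the two default values are the ones stated in the text.

```python
DEFAULT_PROPERTY_PREFIX = "http://sssom.invalid/"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def effective_definition(slot_name, declared=None):
    """Fill in the default property and type hint for an extension slot.

    `declared` is the (possibly partial) definition found in
    `extension_definitions`, or None for an undefined extension.
    """
    declared = declared or {}
    return {
        "slot_name": slot_name,
        "property": declared.get("property") or DEFAULT_PROPERTY_PREFIX + slot_name,
        "type_hint": declared.get("type_hint") or XSD_STRING,
    }


undefined = effective_definition("ext_undeclared_foo")
partial = effective_definition("ext_foo", {"property": "https://example.org/properties/fooProperty"})
```

An undefined slot such as `ext_undeclared_foo` ends up with the property `http://sssom.invalid/ext_undeclared_foo` and a string type, exactly as if it had been declared with only a `slot_name`.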

### Restrictions on the values of extension slots

#### General restrictions

The following restrictions apply to all extension slots, regardless of whether they are defined or undefined.

Each mapping set and each mapping can have at most _one_ value for each extension slot. The expected behaviour upon encountering a repeated extension slot is unspecified.

All extension values MUST be representable as literal strings. Complex data structures (e.g., lists or dictionaries) MUST NOT be used.

#### Further restrictions for typed defined extensions

If a defined extension slot has a `type_hint` other than `http://www.w3.org/2001/XMLSchema#string`, implementations MAY enforce further constraints on extension values based on the type hint, according to the following table:

| Type hint | Constraints |
| --------- | ----------- |
| http://www.w3.org/2001/XMLSchema#integer | Implementations MAY check that the value is an integer |
| http://www.w3.org/2001/XMLSchema#double | Implementations MAY check that the value is a floating-point number |
| http://www.w3.org/2001/XMLSchema#boolean | Implementations MAY check that the value is either `true` or `false` |
| http://www.w3.org/2001/XMLSchema#date | Implementations MAY check that the value is a date in the ISO 8601 format (`yyyy-mm-dd`) |
| http://www.w3.org/2001/XMLSchema#dateTime | Implementations MAY check that the value is a date and time value in the ISO 8601 format (`yyyy-mm-ddThh:mm:ssTZ`) |

Implementations MAY decide to recognise more types and to enforce type-specific constraints. For example, an implementation could recognise the type `http://www.w3.org/2001/XMLSchema#negativeInteger` and check that the value starts with a minus sign.
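
As an illustration of such optional checks, here is a minimal sketch covering four of the types in the table (date and time values are left out for brevity). The names `check_value` and `CHECKS` are hypothetical. The floating-point test relies on Python's `float()`, which is only an approximation of the `xsd:double` lexical space, and any unrecognised type hint is simply accepted.

```python
import re
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"


def _is_integer(value):
    return re.fullmatch(r"[+-]?[0-9]+", value) is not None


def _is_double(value):
    # Approximation: float() accepts some spellings that xsd:double does not.
    try:
        float(value)
    except ValueError:
        return False
    return True


def _is_boolean(value):
    return value in ("true", "false")


def _is_date(value):
    # Require the yyyy-mm-dd shape, then let the standard library check the calendar.
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value) is None:
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


CHECKS = {
    XSD + "integer": _is_integer,
    XSD + "double": _is_double,
    XSD + "boolean": _is_boolean,
    XSD + "date": _is_date,
}


def check_value(value, type_hint):
    """Return False only if a known type hint is violated; True otherwise."""
    check = CHECKS.get(type_hint)
    return True if check is None else check(value)
```

For example, `check_value("111", XSD + "integer")` passes while `check_value("Baz C", XSD + "integer")` does not. What an implementation does with a failing value (warn, discard, reject) is not decided by this sketch.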
23 changes: 23 additions & 0 deletions src/sssom_schema/schema/sssom_schema.yaml
@@ -620,6 +620,13 @@ slots:
      by tool providing additional informative information.
    slot_uri: rdfs:comment
    range: string
  extension_definitions:
    description: A list that defines the extension slots used in the mapping set.
    range: extension definition
    multivalued: true
    see_also:
      - https://github.com/mapping-commons/sssom/issues/328
      - https://github.com/mapping-commons/sssom/blob/master/examples/schema/extension-slots.sssom.tsv
classes:
  mapping set:
    description: Represents a set of mappings
@@ -655,6 +662,7 @@ classes:
      - issue_tracker
      - other
      - comment
      - extension_definitions
  mapping:
    description: Represents an individual mapping between a pair of entities
    slots:
@@ -770,6 +778,21 @@ classes:
      - mapping_set_group
      - last_updated
      - local_name
  extension definition:
    description: A definition of an extension (non-standard) slot.
    attributes:
      slot_name:
        description: The name of the extension slot.
        range: ncname
        required: true
      property:
        description: The property associated with the extension slot. It is
          intended to provide a non-ambiguous meaning to the slot (contrary
          to the name, which for brevity reasons may be ambiguous).
        range: uriorcurie
      type_hint:
        description: Expected type of the values of the extension slot.
        range: uriorcurie
  Propagatable:
    class_uri: sssom:Propagatable
    description: Metamodel extension class to describe slots whose value can be