Specify the use of non-standard slots.
We add a section in the description of the data model to explain what
extension slots are and how they should be used.

We add a section in the specification of the SSSOM/TSV format to explain
how extension slots should be managed in that format.
gouttegd committed Jul 23, 2024
1 parent 5755fa5 commit 184521f
Showing 3 changed files with 98 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -3,6 +3,7 @@
## Next

- Add the concept of "propagatable slots".
- Add the concept of "extension slots".

## SSSOM version 0.15.1

34 changes: 34 additions & 0 deletions src/docs/spec-formats-tsv.md
@@ -134,6 +134,28 @@ For any given propagatable slot, condensation is only allowed if (1) all mapping
Implementations that support condensation MUST also support propagation.


## Non-standard slots

If an implementation does not support [non-standard slots](spec-model.md#non-standard-slots), then:

* a SSSOM/TSV reader MUST discard any unknown top-level YAML key in the metadata block, and any unknown TSV column in the TSV section;
* a SSSOM/TSV writer MUST NOT write any unknown top-level YAML key in the metadata block, or any unknown TSV column in the TSV section.
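
As an illustration of the reader-side rule, here is a minimal sketch of discarding unknown keys and columns. The slot-name sets are a small hypothetical subset chosen for the example, and the function names are not part of any SSSOM library; a real implementation would presumably derive the full lists from the LinkML model.

```python
# Hypothetical subsets of the standard slot names (assumption: a real
# implementation would obtain the complete lists from the SSSOM LinkML model).
STANDARD_SET_SLOTS = {"mapping_set_id", "license", "curie_map"}
STANDARD_MAPPING_SLOTS = {
    "subject_id",
    "predicate_id",
    "object_id",
    "mapping_justification",
}


def drop_unknown_keys(metadata: dict) -> dict:
    """Keep only the top-level YAML keys that are standard slots."""
    return {k: v for k, v in metadata.items() if k in STANDARD_SET_SLOTS}


def drop_unknown_columns(header: list[str], rows: list[list[str]]):
    """Keep only the TSV columns whose names are standard slots."""
    keep = [i for i, name in enumerate(header) if name in STANDARD_MAPPING_SLOTS]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows
```

Because the unknown content is dropped at read time, a writer built on the same in-memory model has nothing non-standard left to emit.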

### Support for defined extensions

This section applies to implementations that support defined extensions.

A SSSOM/TSV reader MUST check the validity of the extension definitions listed in the `extension_definitions` slot in the YAML metadata block:

* definitions with no `slot_name`, or with a `slot_name` that is not an XML non-colonized name, MUST be ignored;
* definitions with any unexpected content (e.g. keys other than `slot_name`, `property`, and `type_hint`) MUST be ignored;
* the `property` and `type_hint` values for a given definition, if present, MUST be CURIEs and MUST be resolvable using the mapping set’s `curie_map`, otherwise the definition MUST be ignored.
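
The three checks above could be sketched as follows. This is only an illustration: the function names are hypothetical, the NCName pattern is a simplified ASCII-only approximation of the XML production, and "resolvable" is taken to mean that the CURIE prefix is declared in the `curie_map`.

```python
import re

# Simplified, ASCII-only approximation of an XML NCName (assumption: the real
# production also allows many non-ASCII letters and combining characters).
NCNAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")
ALLOWED_KEYS = {"slot_name", "property", "type_hint"}


def is_resolvable_curie(value, curie_map):
    """True if value looks like prefix:local and the prefix is declared."""
    if not isinstance(value, str) or ":" not in value:
        return False
    prefix, _local = value.split(":", 1)
    return prefix in curie_map


def valid_definitions(definitions, curie_map):
    """Return only the definitions that pass all checks; others are ignored."""
    kept = []
    for d in definitions:
        if not isinstance(d, dict) or set(d) - ALLOWED_KEYS:
            continue  # unexpected content
        name = d.get("slot_name")
        if not isinstance(name, str) or not NCNAME_RE.fullmatch(name):
            continue  # missing or malformed slot_name
        if any(
            key in d and not is_resolvable_curie(d[key], curie_map)
            for key in ("property", "type_hint")
        ):
            continue  # property or type_hint is not a resolvable CURIE
        kept.append(d)
    return kept
```

Invalid definitions are silently skipped here; an implementation may prefer to log them, which the rules above neither require nor forbid.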

A SSSOM/TSV reader MUST, upon encountering a non-standard YAML key in the metadata block or an unknown TSV column, check that the name of the key or of the column matches the `slot_name` of one of the extension definitions listed in the mapping set’s `extension_definitions` slot. If there is no match, the non-standard slot MUST be discarded.

Upon encountering a non-standard slot whose corresponding definition has a `type_hint` of `https://w3id.org/linkml/uriOrCurie`, the reader SHOULD check that the value is a CURIE and is resolvable using the mapping set’s `curie_map`.
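
The matching step and the optional CURIE check can be sketched together. The function name and return shape are hypothetical, and the sketch assumes the definitions have already been validated and that their `type_hint` has been expanded to a full IRI. Since the text does not say what to do when the SHOULD-level check fails, the sketch only records a warning and keeps the value.

```python
URI_OR_CURIE = "https://w3id.org/linkml/uriOrCurie"


def filter_extension_slots(raw_slots, definitions, curie_map):
    """Split non-standard slots into (kept, warnings).

    raw_slots: non-standard key or column names mapped to their raw values.
    definitions: validated extension definitions, with type_hint expanded to
    a full IRI (assumption of this sketch).
    """
    by_name = {d["slot_name"]: d for d in definitions}
    kept, warnings = {}, []
    for name, value in raw_slots.items():
        definition = by_name.get(name)
        if definition is None:
            continue  # no matching definition: the slot is discarded
        if definition.get("type_hint") == URI_OR_CURIE:
            prefix, sep, _local = value.partition(":")
            if not sep or prefix not in curie_map:
                warnings.append(f"{name}: {value!r} is not a resolvable CURIE")
        kept[name] = value
    return kept, warnings
```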


## Compatibility with previous versions of the specification

Implementations MUST support the current version of the specification. However, SSSOM/TSV parsers MAY additionally parse files that were compliant with a previous version. This section provides advice for implementations willing to support older versions.
@@ -208,6 +230,13 @@ When writing the metadata block, a canonical SSSOM/TSV writer:
* MUST NOT include in the CURIE map any prefix name that is not used anywhere in the set;
* MUST sort the prefix names in the CURIE map in lexicographical order.

In addition, if [extension slots](spec-model.md#non-standard-slots) are supported, the writer:

* MUST write any extension slot in the mapping set _after_ the standard slots;
* MUST sort the extension slots lexicographically on the `property` of their corresponding extension definitions;
* MUST sort extension definitions on their `property` value;
* MUST NOT include an extension definition if the corresponding extension is not used anywhere in the set.
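
A minimal sketch of these writer rules, under the assumption that every definition already carries a `property` (explicit or default). The function name and its arguments are hypothetical; it returns the definitions to write and the order in which to write the set-level extension slots, which the caller would emit after the standard slots.

```python
def canonical_extensions(definitions, set_extensions, mapping_extension_names):
    """Return (sorted_used_definitions, ordered_set_extension_names).

    definitions: dicts with "slot_name" and "property".
    set_extensions: extension slots set on the mapping set itself (name -> value).
    mapping_extension_names: names of extension slots used by any mapping.
    """
    used = set(set_extensions) | set(mapping_extension_names)
    # Drop unused definitions, then sort the remainder on their property.
    kept = sorted(
        (d for d in definitions if d["slot_name"] in used),
        key=lambda d: d["property"],
    )
    prop = {d["slot_name"]: d["property"] for d in kept}
    # Fall back to the default property for slots that have no definition.
    ordered = sorted(
        set_extensions, key=lambda n: prop.get(n, "http://sssom.invalid/" + n)
    )
    return kept, ordered
```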


### Rules for the mappings block

@@ -218,6 +247,11 @@ When writing the mappings block, a canonical SSSOM/TSV writer:
* MUST write the columns in the order the slots appear in the [“Slots” table](Mapping.md#slots), in the documentation for the `Mapping` class;
* MUST sort the mappings in lexicographical order on all their slots, in the order the slots appear in the [“Slots” table](Mapping.md#slots).

In addition, if [extension slots](spec-model.md#non-standard-slots) are supported, the writer:

* MUST write any non-standard column _after_ the standard columns;
* MUST sort the non-standard columns lexicographically on the `property` of their corresponding extension definitions.
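
For the mappings block, the column order could be computed along these lines. The function and parameter names are hypothetical; `standard_order` stands in for the order of the “Slots” table and `property_of` for a lookup from extension slot name to the property of its definition.

```python
def canonical_columns(columns, standard_order, property_of):
    """Order columns: standard ones first, then extensions sorted on property."""
    present = set(columns)
    standard = [c for c in standard_order if c in present]
    extras = sorted(
        (c for c in columns if c not in standard_order),
        key=lambda c: property_of[c],
    )
    return standard + extras
```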


## Examples

63 changes: 63 additions & 0 deletions src/docs/spec-model.md
@@ -81,3 +81,66 @@ In addition, predicates from the following sources MAY also be encouraged:

* any relation from the [Relation Ontology (RO)](https://obofoundry.org/ontology/ro.html);
* any relation under [skos:mappingRelation](http://www.w3.org/2004/02/skos/core#mappingRelation) in the [Semantic Mapping Vocabulary](https://mapping-commons.github.io/semantic-mapping-vocabulary/).

## Non-standard slots

<a id="non-standard-slots"></a>

Implementations are only REQUIRED to support the standard metadata slots defined in the LinkML model.

However, implementations MAY support the use of supplementary, non-standard slots (hereafter called _extension slots_ or simply _extensions_). There are two types of extension slots: _defined_ extension slots and _undefined_ extension slots.

### Defined extensions

Defined extensions are non-standard slots that are explicitly declared (or _defined_) before being used. Implementations SHOULD support the use of defined extensions.

Extensions are defined in the `extension_definitions` slot of the `MappingSet` object. Each definition consists of three elements:

* the name of the slot, as it will appear when used in a mapping set (`slot_name`);
* a property intended to specify the meaning of the slot (`property`);
* the type of values expected by the slot (`type_hint`).

A definition MUST have at least a `slot_name`. The name MUST be an XML “non-colonized name” (“NCName”, see [Namespaces in XML, §2](https://www.w3.org/TR/1999/REC-xml-names-19990114/#NT-NCName)). The name MUST NOT match the name of an existing standard slot.

To avoid any conflict with a future version of the SSSOM specification (which could introduce new standard slot names), implementations are strongly encouraged to craft extension slot names that start with the `ext_` prefix. No new standard slot with a name starting with `ext_` will ever be introduced in any future version of the standard. (This is advice for SSSOM producers only; SSSOM consumers MUST NOT reject an extension slot solely on the basis that its name does not start with `ext_`.)

A definition SHOULD have a `property`. If it does not, implementations MUST automatically construct a default property by concatenating the prefix `http://sssom.invalid/` with the name of the extension.

The slot name and the property MUST be unique to each definition. No two definitions can share the same name and/or the same property.

A definition MAY have a `type_hint`. If it does not, a default type of `http://www.w3.org/2001/XMLSchema#string` is assumed.
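
The default values and the uniqueness rule can be illustrated with a short sketch. The function names are hypothetical, and because the text does not say how to handle a violation of the uniqueness rule, the sketch only reports the offending names and properties.

```python
from collections import Counter

DEFAULT_PROPERTY_PREFIX = "http://sssom.invalid/"
DEFAULT_TYPE_HINT = "http://www.w3.org/2001/XMLSchema#string"


def with_defaults(definition):
    """Return a copy of the definition with default property and type filled in."""
    name = definition["slot_name"]
    return {
        "slot_name": name,
        "property": definition.get("property") or DEFAULT_PROPERTY_PREFIX + name,
        "type_hint": definition.get("type_hint") or DEFAULT_TYPE_HINT,
    }


def duplicated(definitions):
    """Return (names, properties) that are shared by more than one definition."""
    complete = [with_defaults(d) for d in definitions]
    names = Counter(d["slot_name"] for d in complete)
    props = Counter(d["property"] for d in complete)
    return (
        {n for n, count in names.items() if count > 1},
        {p for p, count in props.items() if count > 1},
    )
```

Applying the defaults before checking uniqueness means that an explicit property colliding with another definition's default property is also caught.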

Once defined, an extension slot may be used as a supplementary slot in either the `Mapping` class or the `MappingSet` class (or both), as if it were a normal, standard slot. How those slots are represented internally and provided to client code is left to the discretion of the implementations.

### Undefined extensions

Undefined extensions are non-standard slots that are not explicitly defined as described in the previous section. Implementations MAY support undefined extensions.

Upon encountering a non-standard slot that is not a defined extension, an implementation that supports undefined extensions MUST behave as if the slot had been defined with:

* a `property` constructed by concatenating the prefix `http://sssom.invalid/` with the name of the slot;
* a `type_hint` of `http://www.w3.org/2001/XMLSchema#string`.

### Restrictions on the values of extension slots

#### General restrictions

The following restrictions apply to all extension slots, regardless of whether they are defined or undefined.

Each mapping set and each mapping can have at most _one_ value for each extension slot. The expected behaviour upon encountering a repeated extension slot is unspecified.

All extension values MUST be representable as literal strings. Complex data structures (e.g., lists or dictionaries) MUST NOT be used.

#### Further restrictions for typed defined extensions

If a defined extension slot has a `type_hint` other than `http://www.w3.org/2001/XMLSchema#string`, implementations MAY enforce further constraints on extension values based on the type hint, according to the following table:

| Type hint | Constraints |
| --------- | ----------- |
| http://www.w3.org/2001/XMLSchema#integer | Implementations MAY check that the value is an integer |
| http://www.w3.org/2001/XMLSchema#double | Implementations MAY check that the value is a floating-point number |
| http://www.w3.org/2001/XMLSchema#boolean | Implementations MAY check that the value is either `true` or `false` |
| http://www.w3.org/2001/XMLSchema#date | Implementations MAY check that the value is a date in the ISO 8601 format (`yyyy-mm-dd`) |
| http://www.w3.org/2001/XMLSchema#dateTime | Implementations MAY check that the value is a date and time value in the ISO 8601 format (`yyyy-mm-ddThh:mm:ssTZ`) |

Implementations MAY decide to recognise more types and to enforce type-specific constraints. For example, an implementation could recognise the type `http://www.w3.org/2001/XMLSchema#negativeInteger` and check that the value starts with a minus sign.
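
The optional checks in the table could be sketched as a lookup from type hint to validator. All names here are hypothetical and the patterns are approximations of the XML Schema lexical forms: the time zone part is assumed to be either `Z` or a `+hh:mm`/`-hh:mm` offset, and the date-time key uses the XML Schema spelling `dateTime`. Unrecognised type hints are accepted without any check.

```python
import re
from datetime import datetime

XSD = "http://www.w3.org/2001/XMLSchema#"

_DOUBLE_RE = re.compile(
    r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?|[+-]?INF|NaN"
)
_DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
# Assumption: the time zone is optional and is either "Z" or a numeric offset.
_DATETIME_RE = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(?:Z|[+-][0-9]{2}:[0-9]{2})?"
)


def _parses(value, fmt):
    """True if value is a valid calendar value for the given strptime format."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def _is_date(value):
    return bool(_DATE_RE.fullmatch(value)) and _parses(value, "%Y-%m-%d")


def _is_datetime(value):
    # The first 19 characters hold the date and time without the time zone.
    return bool(_DATETIME_RE.fullmatch(value)) and _parses(
        value[:19], "%Y-%m-%dT%H:%M:%S"
    )


CHECKS = {
    XSD + "integer": lambda v: re.fullmatch(r"[+-]?[0-9]+", v) is not None,
    XSD + "double": lambda v: _DOUBLE_RE.fullmatch(v) is not None,
    XSD + "boolean": lambda v: v in ("true", "false"),
    XSD + "date": _is_date,
    XSD + "dateTime": _is_datetime,
}


def check_value(type_hint, value):
    """Return True if the value satisfies the check for its type hint, if any."""
    check = CHECKS.get(type_hint)
    return True if check is None else check(value)
```

Supporting a further type, such as the `negativeInteger` example above, would amount to adding one more entry to `CHECKS`.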
