Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework the literal profile idea #382

Closed
wants to merge 4 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
76 changes: 9 additions & 67 deletions src/sssom_schema/schema/sssom_schema.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@ enums:
meaning: rdfs:Datatype
rdf property:
meaning: rdf:Property
sssom literal: A special type of literal that is used in SSSOM files to express that the subject_id is not an entity reference.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current path based on your proposal @gouttegd:

  1. We only allow literals in the subject position.
  2. We introduce a sssom literal as a special case in the already existing entity_type field (as you suggested)
  3. If that type is set, "literal mapping" mode is active (with a small number of new fields, see below)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only allow literals in the subject position.

Why? Seems like an needless complication to me.

In particular, this means that literal mappings cannot possibly be inverted. And I don’t know about the use cases of those who would manipulate literal mappings, but personally I invert (non-literal) mappings all the time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, saw your comment below about how this would “go against the whole idea of the format”.

Not convinced, sorry. Once we admit that we can have mapping where one side is a literal rather than an entity, I don’t see how it matters which side is the literal.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooooookeeeeeeee :P I will fix it. Now sleeping time! :D

predicate_modifier_enum:
permissible_values:
Not: Negating the mapping predicate. The meaning of the triple becomes subject_id is not a predicate_id match to object_id.
Expand Down Expand Up @@ -118,15 +119,17 @@ slots:
subject_id:
description: The ID of the subject of the mapping.
range: EntityReference
required: true
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunate side effect, need some help to ensuring that either this or subject_literal is required.. (@gouttegd generally feel free to just commit to this branch, we will recreate it nicely once the design is finished)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think this is feasible in pure LinkML.¹ So it would have to be done on top of LinkML, by adding a paragraph to the spec to explicitly state that subject_id MUST have a value, unless the subject_type is sssom literal, in which case it is subject_label (or subject_literal if I did not push you enough to accept reusing subject_label) that MUST have a value.

Implementations that use the Python LinkML runtime may need to be updated to cope with that change, if they were previously relying on the runtime to check that subject_id was always set (I don’t know if that’s the case of SSSOM-Py). But for non-Python implementations, this will not change much, since the absence of a LinkML runtime for any other language than Python means that those implementations already had to “manually” enforce the required constraint anyway.


¹ Feel free to ask people who know LinkML better than I do (so, approximately everyone) just in case. But as far as I can tell, we cannot express variable constraints in LinkML, such as “this slot is required IFF that other slot has this particular value”.

mappings:
- owl:annotatedSource
slot_uri: owl:annotatedSource
examples:
- value: HP:0009894
description: The CURIE denoting the Human Phenotype Ontology concept of 'Thickened ears'
literal:
subject_literal:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was rather envisioning reusing subject_label rather than creating a new slot.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, but subject_label has a very different purpose than subject_literal. Your proposal effectively changes the purpose of subject_label if the "literal mode" is switched on, so I can probably be convinced to using subject_label as you say. My intention was to seperate the "literal" and "entity" modes a tiny bit more cleanly, but probably, I am just introducing churn... One more push with a ring finger and you will have me convinced!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your proposal effectively changes the purpose of subject_label if the "literal mode" is switched on

Yes it does. And I think it is a better solution (not perfect, but better) than adding a new slot that is only meaningful when the “literal mode is on” (i.e. when subject_type == sssom literal).

If we reuse subject_label, all we have to do is to say that subject_label is normally the label of the entity being mapped, or the literal being mapped if the mapping is a literal mapping. Full stop.

If we create subject_literal, we not only need to explain what it is (the literal being mapped), but also deal with the possible incorrect use of both subject_literal (which must not be used for non-literal mappings) and subject_label (which must not be used for literal mappings). Another needless complication.

Copy link
Contributor

@gouttegd gouttegd Aug 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention was to seperate the "literal" and "entity" modes a tiny bit more cleanly

I am going to be blunt (blunter than usual, I mean :P ): the SSSOM data model has not been designed to separate things cleanly.

A “clean design” would not have put all the metadata slots directly in the mapping class, resulting in all those slots that exist in two versions (one for the subject side, another for the object side) such as subject_label and object_label, subject_preprocessing and object_preprocessing, etc.

Instead, a clean design would have another class specifically to represent an entity being mapped (let’s call it entity), without consideration for whether it is the subject or the object. That class would have had slots such as id, label, preprocessing, source, etc. And the mapping class would have had two slots subject and object, both accepting a value of type entity.

This would have been much cleaner, would have make manipulating the mappings and their entities much easier (inverting a mapping, for example, would simply have involved swapping the subject and the object slot, instead of having to swap all the subject_* and object_* slots). It would also have made extending the model to deal with new types of entities easier as well, since it would just have required adding new subclasses of entity. We could even have dispensed with the subject|object_type slots, since the type of an entity would have been naturally indicated by which subclass of entity is used.

My understanding, based on some old discussions that I have read in this repo, is that such a design has been deliberately avoided because there was a clear preference for a data model that was much closer to the TSV serialisation (i.e., where each column of the TSV maps directly to one slot in the mapping class).

I do not criticize that choice, but now we have to stick to the “flat” model that we have as a result. Trying to introduce some clean object-oriented design on top of it would only make things more confusing. (Hence my refusal to the entire idea of having a separate literal mapping class.)

description: The literal being mapped
see_also:
- https://mapping-commons.github.io/sssom/sssom-profiles/
- https://github.com/mapping-commons/sssom/issues/234
range: string
required: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you insist on having that separate slot (rather than reusing subject_label), you need to remove that required.

mappings:
Expand All @@ -135,7 +138,7 @@ slots:
examples:
- value: "Alzheimer"
description: A string referring to some thing.
literal_datatype:
subject_literal_datatype:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only new field, the rest can be reused.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that really necessary? Do we expect people to map other values than pure strings (like, integers or dates)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This specific one may not be, but other like subject_literal_language may become relevant quickly. But I can be convinced to keep things essential and drop it entirely. I kept it mostly to illustrate that I want to be able to extend the slots for the literal without touching the rest of the standard..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that if we decided to reuse subject_label rather than creating subject_literal, we could add a subject_label_language which could also be meaningful for non-literal mappings…

description: The datatype of the literal being mapped
range: uri
required: false
Expand Down Expand Up @@ -386,20 +389,6 @@ slots:
examples:
- value: http://purl.obolibrary.org/obo/mondo/releases/2021-01-30/mondo.owl
description: (A persistent Version IRI pointing to the Mondo version '2021-01-30')
literal_source:
description: URI of ontology source for the literal.
range: EntityReference
examples:
- value: obo:mondo.owl
description: A persistent OBO CURIE pointing to the latest version of the Mondo ontology.
- value: wikidata:Q7876491
description: A Wikidata identifier for the Uberon ontology resource.
literal_source_version:
description: Version IRI or version string of the source of the literal.
range: string
examples:
- value: http://purl.obolibrary.org/obo/mondo/releases/2021-01-30/mondo.owl
description: (A persistent Version IRI pointing to the Mondo version '2021-01-30')
object_source:
description: URI of vocabulary or identifier source for the object.
range: EntityReference
Expand Down Expand Up @@ -526,13 +515,6 @@ slots:
examples:
- value: semapv:Stemming
- value: semapv:StopWordRemoval
literal_preprocessing:
description: Method of preprocessing applied to the literal.
range: EntityReference
multivalued: true
examples:
- value: semapv:Stemming
- value: semapv:StopWordRemoval
curation_rule:
description: A curation rule is a (potentially) complex condition executed by an agent that led to the establishment of a mapping.
Curation rules often involve complex domain-specific considerations, which are hard to capture in an automated fashion. The curation
Expand Down Expand Up @@ -661,6 +643,8 @@ classes:
- subject_id
- subject_label
- subject_category
- subject_literal
- subject_literal_datatype
- predicate_id
- predicate_label
- predicate_modifier
Expand Down Expand Up @@ -698,52 +682,10 @@ classes:
- object_preprocessing
- semantic_similarity_score
- semantic_similarity_measure
- see_also
- issue_tracker_item
- other
- comment
class_uri: owl:Axiom
literal mapping:
description: Represents an individual mapping between a literal and an entity.
Note that this schema has been created on 01.08.2023 and is subject to change.
see_also:
- https://mapping-commons.github.io/sssom/sssom-profiles/
slots:
- literal
- literal_datatype
- predicate_id
- predicate_label
- predicate_modifier
- object_id
- object_label
- object_category
- mapping_justification
- author_id
- author_label
- reviewer_id
- reviewer_label
- creator_id
- creator_label
- license
- literal_source
- literal_source_version
- object_type
- object_source
- object_source_version
- mapping_provider
- mapping_source
- mapping_cardinality
- mapping_tool
- mapping_tool_version
- mapping_date
- confidence
- object_match_field
- match_string
- literal_preprocessing
- object_preprocessing
- similarity_score
- similarity_measure
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff is conveniently obfuscating the fact that these two slots are added to the mappings class. They should have been there from the start anyways, but just being transparent.

- see_also
- issue_tracker_item
- other
- comment
class_uri: owl:Axiom
Expand Down
Loading