Rework the literal profile idea #382

matentzn · 2024-08-03T19:20:38Z

Resolves #234

docs/ have been added/updated if necessary
make test has been run locally
tests have been added/updated (if applicable)
CHANGELOG.md has been updated.

If you are proposing a change to the SSSOM metadata model, you must

provide a full, working and valid example in examples/
provide a link to the related GitHub issue in the see_also field of the linkml model
provide a link to a valid example in the see_also field of the linkml model
run SSSOM-Py test suite against the updated model

These are redundant if we "shoehorn" literal mappings into the `Mappings` class.

Note that the similarity_score and similarity_measure are generalisations of the semantic_similarity_score and semantic_similarity_measure slots which were part of our previous "literal mappings" proposal, which I would like to keep to encode lexical and other kinds of similarity measures.

matentzn · 2024-08-03T19:22:55Z

src/sssom_schema/schema/sssom_schema.yaml

@@ -49,6 +49,7 @@ enums:
        meaning: rdfs:Datatype
      rdf property:
        meaning: rdf:Property
+      sssom literal: A special type of literal that is used in SSSOM files to express that the subject_id is not an entity reference.


current path based on your proposal @gouttegd:

We only allow literals in the subject position.

We introduce a sssom literal as a special case in the already existing entity_type field (as you suggested)

If that type is set, "literal mapping" mode is active (with a small number of new fields, see below)

We only allow literals in the subject position.

Why? Seems like an needless complication to me.

In particular, this means that literal mappings cannot possibly be inverted. And I don’t know about the use cases of those who would manipulate literal mappings, but personally I invert (non-literal) mappings all the time.

OK, saw your comment below about how this would “go against the whole idea of the format”.

Not convinced, sorry. Once we admit that we can have mapping where one side is a literal rather than an entity, I don’t see how it matters which side is the literal.

ooooookeeeeeeee :P I will fix it. Now sleeping time! :D

matentzn · 2024-08-03T19:23:54Z

src/sssom_schema/schema/sssom_schema.yaml

@@ -118,15 +119,17 @@ slots:
  subject_id:
    description: The ID of the subject of the mapping.
    range: EntityReference
-    required: true


unfortunate side effect, need some help to ensuring that either this or subject_literal is required.. (@gouttegd generally feel free to just commit to this branch, we will recreate it nicely once the design is finished)

I don’t think this is feasible in pure LinkML.¹ So it would have to be done on top of LinkML, by adding a paragraph to the spec to explicitly state that subject_id MUST have a value, unless the subject_type is sssom literal, in which case it is subject_label (or subject_literal if I did not push you enough to accept reusing subject_label) that MUST have a value.

Implementations that use the Python LinkML runtime may need to be updated to cope with that change, if they were previously relying on the runtime to check that subject_id was always set (I don’t know if that’s the case of SSSOM-Py). But for non-Python implementations, this will not change much, since the absence of a LinkML runtime for any other language than Python means that those implementations already had to “manually” enforce the required constraint anyway.

¹ Feel free to ask people who know LinkML better than I do (so, approximately everyone) just in case. But as far as I can tell, we cannot express variable constraints in LinkML, such as “this slot is required IFF that other slot has this particular value”.

matentzn · 2024-08-03T19:24:16Z

src/sssom_schema/schema/sssom_schema.yaml

@@ -135,7 +138,7 @@ slots:
    examples:
      - value: "Alzheimer"
        description: A string referring to some thing.
-  literal_datatype:
+  subject_literal_datatype:


The only new field, the rest can be reused.

Is that really necessary? Do we expect people to map other values than pure strings (like, integers or dates)?

This specific one may not be, but other like subject_literal_language may become relevant quickly. But I can be convinced to keep things essential and drop it entirely. I kept it mostly to illustrate that I want to be able to extend the slots for the literal without touching the rest of the standard..

Just noting that if we decided to reuse subject_label rather than creating subject_literal, we could add a subject_label_language which could also be meaningful for non-literal mappings…

matentzn · 2024-08-03T19:25:28Z

src/sssom_schema/schema/sssom_schema.yaml

-    - object_match_field
-    - match_string
-    - literal_preprocessing
-    - object_preprocessing
    - similarity_score
    - similarity_measure


The diff is conveniently obfuscating the fact that these two slots are added to the mappings class. They should have been there from the start anyways, but just being transparent.

matentzn · 2024-08-03T19:27:17Z

@gouttegd I will need some help to get this to work properly. The only main change I have in here compared to your suggestion is to disallow literals in the object position. That goes against the whole idea of the format. Feel free to commit whatever you see fit to this branch, we will create a nice version of it when we are both satisfied with the design.

gouttegd · 2024-08-03T19:40:12Z

src/sssom_schema/schema/sssom_schema.yaml

    mappings:
    - owl:annotatedSource
    slot_uri: owl:annotatedSource
    examples:
      - value: HP:0009894
        description: The CURIE denoting the Human Phenotype Ontology concept of 'Thickened ears'
-  literal:
+  subject_literal:


I was rather envisioning reusing subject_label rather than creating a new slot.

I know, but subject_label has a very different purpose than subject_literal. Your proposal effectively changes the purpose of subject_label if the "literal mode" is switched on, so I can probably be convinced to using subject_label as you say. My intention was to seperate the "literal" and "entity" modes a tiny bit more cleanly, but probably, I am just introducing churn... One more push with a ring finger and you will have me convinced!

Your proposal effectively changes the purpose of subject_label if the "literal mode" is switched on

Yes it does. And I think it is a better solution (not perfect, but better) than adding a new slot that is only meaningful when the “literal mode is on” (i.e. when subject_type == sssom literal).

If we reuse subject_label, all we have to do is to say that subject_label is normally the label of the entity being mapped, or the literal being mapped if the mapping is a literal mapping. Full stop.

If we create subject_literal, we not only need to explain what it is (the literal being mapped), but also deal with the possible incorrect use of both subject_literal (which must not be used for non-literal mappings) and subject_label (which must not be used for literal mappings). Another needless complication.

My intention was to seperate the "literal" and "entity" modes a tiny bit more cleanly

I am going to be blunt (blunter than usual, I mean :P ): the SSSOM data model has not been designed to separate things cleanly.

A “clean design” would not have put all the metadata slots directly in the mapping class, resulting in all those slots that exist in two versions (one for the subject side, another for the object side) such as subject_label and object_label, subject_preprocessing and object_preprocessing, etc.

Instead, a clean design would have another class specifically to represent an entity being mapped (let’s call it entity), without consideration for whether it is the subject or the object. That class would have had slots such as id, label, preprocessing, source, etc. And the mapping class would have had two slots subject and object, both accepting a value of type entity.

This would have been much cleaner, would have make manipulating the mappings and their entities much easier (inverting a mapping, for example, would simply have involved swapping the subject and the object slot, instead of having to swap all the subject_* and object_* slots). It would also have made extending the model to deal with new types of entities easier as well, since it would just have required adding new subclasses of entity. We could even have dispensed with the subject|object_type slots, since the type of an entity would have been naturally indicated by which subclass of entity is used.

My understanding, based on some old discussions that I have read in this repo, is that such a design has been deliberately avoided because there was a clear preference for a data model that was much closer to the TSV serialisation (i.e., where each column of the TSV maps directly to one slot in the mapping class).

I do not criticize that choice, but now we have to stick to the “flat” model that we have as a result. Trying to introduce some clean object-oriented design on top of it would only make things more confusing. (Hence my refusal to the entire idea of having a separate literal mapping class.)

gouttegd · 2024-08-03T19:40:53Z

src/sssom_schema/schema/sssom_schema.yaml

    description: The literal being mapped
+    see_also:
+    - https://mapping-commons.github.io/sssom/sssom-profiles/
+    - https://github.com/mapping-commons/sssom/issues/234
    range: string
    required: true


If you insist on having that separate slot (rather than reusing subject_label), you need to remove that required.

matentzn · 2024-08-04T03:01:44Z

Closing in favour of #384.

I hope this addresses all your comments @gouttegd! As always, thanks a ton for steering me towards the light!

matentzn added 4 commits August 3, 2024 21:26

Remove literal mapping class

86a2905

Remove literal_source, literal_source_version and literal_preprocessing

b31d851

These are redundant if we "shoehorn" literal mappings into the `Mappings` class.

Make subject_id optional and introduce new subject_type

7755f56

matentzn commented Aug 3, 2024

View reviewed changes

gouttegd reviewed Aug 3, 2024

View reviewed changes

gouttegd marked this pull request as ready for review August 3, 2024 23:58

matentzn closed this Aug 4, 2024

matentzn deleted the literal-mapping branch August 4, 2024 03:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework the literal profile idea #382

Rework the literal profile idea #382

matentzn commented Aug 3, 2024

matentzn Aug 3, 2024

gouttegd Aug 3, 2024

gouttegd Aug 3, 2024

matentzn Aug 3, 2024

matentzn Aug 3, 2024

gouttegd Aug 3, 2024

matentzn Aug 3, 2024

gouttegd Aug 3, 2024

matentzn Aug 3, 2024

gouttegd Aug 3, 2024

matentzn Aug 3, 2024

matentzn commented Aug 3, 2024

gouttegd Aug 3, 2024

matentzn Aug 3, 2024

gouttegd Aug 3, 2024

gouttegd Aug 3, 2024 •

edited

Loading

gouttegd Aug 3, 2024

matentzn commented Aug 4, 2024

Rework the literal profile idea #382

Rework the literal profile idea #382

Conversation

matentzn commented Aug 3, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

matentzn commented Aug 3, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gouttegd Aug 3, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

matentzn commented Aug 4, 2024

gouttegd Aug 3, 2024 •

edited

Loading