Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework the literal profile idea #382

Closed
wants to merge 4 commits into from
Closed

Rework the literal profile idea #382

wants to merge 4 commits into from

Conversation

matentzn
Copy link
Collaborator

@matentzn matentzn commented Aug 3, 2024

Resolves #234

  • docs/ have been added/updated if necessary
  • make test has been run locally
  • tests have been added/updated (if applicable)
  • CHANGELOG.md has been updated.

If you are proposing a change to the SSSOM metadata model, you must

  • provide a full, working and valid example in examples/
  • provide a link to the related GitHub issue in the see_also field of the linkml model
  • provide a link to a valid example in the see_also field of the linkml model
  • run SSSOM-Py test suite against the updated model

These are redundant if we "shoehorn" literal mappings into the `Mappings` class.
Note that the similarity_score and similarity_measure are generalisations of the semantic_similarity_score and semantic_similarity_measure slots which were part of our previous "literal mappings" proposal, which I would like to keep to encode lexical and other kinds of similarity measures.
@@ -49,6 +49,7 @@ enums:
meaning: rdfs:Datatype
rdf property:
meaning: rdf:Property
sssom literal: A special type of literal that is used in SSSOM files to express that the subject_id is not an entity reference.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current path based on your proposal @gouttegd:

  1. We only allow literals in the subject position.
  2. We introduce a sssom literal as a special case in the already existing entity_type field (as you suggested)
  3. If that type is set, "literal mapping" mode is active (with a small number of new fields, see below)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only allow literals in the subject position.

Why? Seems like an needless complication to me.

In particular, this means that literal mappings cannot possibly be inverted. And I don’t know about the use cases of those who would manipulate literal mappings, but personally I invert (non-literal) mappings all the time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, saw your comment below about how this would “go against the whole idea of the format”.

Not convinced, sorry. Once we admit that we can have mapping where one side is a literal rather than an entity, I don’t see how it matters which side is the literal.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooooookeeeeeeee :P I will fix it. Now sleeping time! :D

@@ -118,15 +119,17 @@ slots:
subject_id:
description: The ID of the subject of the mapping.
range: EntityReference
required: true
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunate side effect, need some help to ensuring that either this or subject_literal is required.. (@gouttegd generally feel free to just commit to this branch, we will recreate it nicely once the design is finished)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think this is feasible in pure LinkML.¹ So it would have to be done on top of LinkML, by adding a paragraph to the spec to explicitly state that subject_id MUST have a value, unless the subject_type is sssom literal, in which case it is subject_label (or subject_literal if I did not push you enough to accept reusing subject_label) that MUST have a value.

Implementations that use the Python LinkML runtime may need to be updated to cope with that change, if they were previously relying on the runtime to check that subject_id was always set (I don’t know if that’s the case of SSSOM-Py). But for non-Python implementations, this will not change much, since the absence of a LinkML runtime for any other language than Python means that those implementations already had to “manually” enforce the required constraint anyway.


¹ Feel free to ask people who know LinkML better than I do (so, approximately everyone) just in case. But as far as I can tell, we cannot express variable constraints in LinkML, such as “this slot is required IFF that other slot has this particular value”.

@@ -135,7 +138,7 @@ slots:
examples:
- value: "Alzheimer"
description: A string referring to some thing.
literal_datatype:
subject_literal_datatype:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only new field, the rest can be reused.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that really necessary? Do we expect people to map other values than pure strings (like, integers or dates)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This specific one may not be, but other like subject_literal_language may become relevant quickly. But I can be convinced to keep things essential and drop it entirely. I kept it mostly to illustrate that I want to be able to extend the slots for the literal without touching the rest of the standard..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that if we decided to reuse subject_label rather than creating subject_literal, we could add a subject_label_language which could also be meaningful for non-literal mappings…

- object_match_field
- match_string
- literal_preprocessing
- object_preprocessing
- similarity_score
- similarity_measure
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff is conveniently obfuscating the fact that these two slots are added to the mappings class. They should have been there from the start anyways, but just being transparent.

@matentzn
Copy link
Collaborator Author

matentzn commented Aug 3, 2024

@gouttegd I will need some help to get this to work properly. The only main change I have in here compared to your suggestion is to disallow literals in the object position. That goes against the whole idea of the format. Feel free to commit whatever you see fit to this branch, we will create a nice version of it when we are both satisfied with the design.

mappings:
- owl:annotatedSource
slot_uri: owl:annotatedSource
examples:
- value: HP:0009894
description: The CURIE denoting the Human Phenotype Ontology concept of 'Thickened ears'
literal:
subject_literal:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was rather envisioning reusing subject_label rather than creating a new slot.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, but subject_label has a very different purpose than subject_literal. Your proposal effectively changes the purpose of subject_label if the "literal mode" is switched on, so I can probably be convinced to using subject_label as you say. My intention was to seperate the "literal" and "entity" modes a tiny bit more cleanly, but probably, I am just introducing churn... One more push with a ring finger and you will have me convinced!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your proposal effectively changes the purpose of subject_label if the "literal mode" is switched on

Yes it does. And I think it is a better solution (not perfect, but better) than adding a new slot that is only meaningful when the “literal mode is on” (i.e. when subject_type == sssom literal).

If we reuse subject_label, all we have to do is to say that subject_label is normally the label of the entity being mapped, or the literal being mapped if the mapping is a literal mapping. Full stop.

If we create subject_literal, we not only need to explain what it is (the literal being mapped), but also deal with the possible incorrect use of both subject_literal (which must not be used for non-literal mappings) and subject_label (which must not be used for literal mappings). Another needless complication.

Copy link
Contributor

@gouttegd gouttegd Aug 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention was to seperate the "literal" and "entity" modes a tiny bit more cleanly

I am going to be blunt (blunter than usual, I mean :P ): the SSSOM data model has not been designed to separate things cleanly.

A “clean design” would not have put all the metadata slots directly in the mapping class, resulting in all those slots that exist in two versions (one for the subject side, another for the object side) such as subject_label and object_label, subject_preprocessing and object_preprocessing, etc.

Instead, a clean design would have another class specifically to represent an entity being mapped (let’s call it entity), without consideration for whether it is the subject or the object. That class would have had slots such as id, label, preprocessing, source, etc. And the mapping class would have had two slots subject and object, both accepting a value of type entity.

This would have been much cleaner, would have make manipulating the mappings and their entities much easier (inverting a mapping, for example, would simply have involved swapping the subject and the object slot, instead of having to swap all the subject_* and object_* slots). It would also have made extending the model to deal with new types of entities easier as well, since it would just have required adding new subclasses of entity. We could even have dispensed with the subject|object_type slots, since the type of an entity would have been naturally indicated by which subclass of entity is used.

My understanding, based on some old discussions that I have read in this repo, is that such a design has been deliberately avoided because there was a clear preference for a data model that was much closer to the TSV serialisation (i.e., where each column of the TSV maps directly to one slot in the mapping class).

I do not criticize that choice, but now we have to stick to the “flat” model that we have as a result. Trying to introduce some clean object-oriented design on top of it would only make things more confusing. (Hence my refusal to the entire idea of having a separate literal mapping class.)

description: The literal being mapped
see_also:
- https://mapping-commons.github.io/sssom/sssom-profiles/
- https://github.com/mapping-commons/sssom/issues/234
range: string
required: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you insist on having that separate slot (rather than reusing subject_label), you need to remove that required.

@gouttegd gouttegd marked this pull request as ready for review August 3, 2024 23:58
@matentzn
Copy link
Collaborator Author

matentzn commented Aug 4, 2024

Closing in favour of #384.

I hope this addresses all your comments @gouttegd! As always, thanks a ton for steering me towards the light!

@matentzn matentzn closed this Aug 4, 2024
@matentzn matentzn deleted the literal-mapping branch August 4, 2024 03:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SSSOM: Separate profile for Literal Mappings
2 participants