-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework the literal profile idea #382
Conversation
These are redundant if we "shoehorn" literal mappings into the `Mappings` class.
Note that the similarity_score and similarity_measure are generalisations of the semantic_similarity_score and semantic_similarity_measure slots which were part of our previous "literal mappings" proposal, which I would like to keep to encode lexical and other kinds of similarity measures.
@@ -49,6 +49,7 @@ enums: | |||
meaning: rdfs:Datatype | |||
rdf property: | |||
meaning: rdf:Property | |||
sssom literal: A special type of literal that is used in SSSOM files to express that the subject_id is not an entity reference. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current path based on your proposal @gouttegd:
- We only allow literals in the subject position.
- We introduce a
sssom literal
as a special case in the already existingentity_type
field (as you suggested) - If that type is set, "literal mapping" mode is active (with a small number of new fields, see below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only allow literals in the subject position.
Why? Seems like an needless complication to me.
In particular, this means that literal mappings cannot possibly be inverted. And I don’t know about the use cases of those who would manipulate literal mappings, but personally I invert (non-literal) mappings all the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, saw your comment below about how this would “go against the whole idea of the format”.
Not convinced, sorry. Once we admit that we can have mapping where one side is a literal rather than an entity, I don’t see how it matters which side is the literal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooooookeeeeeeee :P I will fix it. Now sleeping time! :D
@@ -118,15 +119,17 @@ slots: | |||
subject_id: | |||
description: The ID of the subject of the mapping. | |||
range: EntityReference | |||
required: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunate side effect, need some help to ensuring that either this or subject_literal
is required.. (@gouttegd generally feel free to just commit to this branch, we will recreate it nicely once the design is finished)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think this is feasible in pure LinkML.¹ So it would have to be done on top of LinkML, by adding a paragraph to the spec to explicitly state that subject_id
MUST have a value, unless the subject_type
is sssom literal
, in which case it is subject_label
(or subject_literal
if I did not push you enough to accept reusing subject_label
) that MUST have a value.
Implementations that use the Python LinkML runtime may need to be updated to cope with that change, if they were previously relying on the runtime to check that subject_id
was always set (I don’t know if that’s the case of SSSOM-Py). But for non-Python implementations, this will not change much, since the absence of a LinkML runtime for any other language than Python means that those implementations already had to “manually” enforce the required
constraint anyway.
¹ Feel free to ask people who know LinkML better than I do (so, approximately everyone) just in case. But as far as I can tell, we cannot express variable constraints in LinkML, such as “this slot is required IFF that other slot has this particular value”.
@@ -135,7 +138,7 @@ slots: | |||
examples: | |||
- value: "Alzheimer" | |||
description: A string referring to some thing. | |||
literal_datatype: | |||
subject_literal_datatype: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only new field, the rest can be reused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that really necessary? Do we expect people to map other values than pure strings (like, integers or dates)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This specific one may not be, but other like subject_literal_language
may become relevant quickly. But I can be convinced to keep things essential and drop it entirely. I kept it mostly to illustrate that I want to be able to extend the slots for the literal without touching the rest of the standard..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting that if we decided to reuse subject_label
rather than creating subject_literal
, we could add a subject_label_language
which could also be meaningful for non-literal mappings…
- object_match_field | ||
- match_string | ||
- literal_preprocessing | ||
- object_preprocessing | ||
- similarity_score | ||
- similarity_measure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The diff is conveniently obfuscating the fact that these two slots are added to the mappings class. They should have been there from the start anyways, but just being transparent.
@gouttegd I will need some help to get this to work properly. The only main change I have in here compared to your suggestion is to disallow literals in the |
mappings: | ||
- owl:annotatedSource | ||
slot_uri: owl:annotatedSource | ||
examples: | ||
- value: HP:0009894 | ||
description: The CURIE denoting the Human Phenotype Ontology concept of 'Thickened ears' | ||
literal: | ||
subject_literal: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was rather envisioning reusing subject_label
rather than creating a new slot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, but subject_label has a very different purpose than subject_literal. Your proposal effectively changes the purpose of subject_label if the "literal mode" is switched on, so I can probably be convinced to using subject_label as you say. My intention was to seperate the "literal" and "entity" modes a tiny bit more cleanly, but probably, I am just introducing churn... One more push with a ring finger and you will have me convinced!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your proposal effectively changes the purpose of subject_label if the "literal mode" is switched on
Yes it does. And I think it is a better solution (not perfect, but better) than adding a new slot that is only meaningful when the “literal mode is on” (i.e. when subject_type == sssom literal
).
If we reuse subject_label
, all we have to do is to say that subject_label
is normally the label of the entity being mapped, or the literal being mapped if the mapping is a literal mapping. Full stop.
If we create subject_literal
, we not only need to explain what it is (the literal being mapped), but also deal with the possible incorrect use of both subject_literal
(which must not be used for non-literal mappings) and subject_label
(which must not be used for literal mappings). Another needless complication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention was to seperate the "literal" and "entity" modes a tiny bit more cleanly
I am going to be blunt (blunter than usual, I mean :P ): the SSSOM data model has not been designed to separate things cleanly.
A “clean design” would not have put all the metadata slots directly in the mapping
class, resulting in all those slots that exist in two versions (one for the subject side, another for the object side) such as subject_label
and object_label
, subject_preprocessing
and object_preprocessing
, etc.
Instead, a clean design would have another class specifically to represent an entity being mapped (let’s call it entity
), without consideration for whether it is the subject or the object. That class would have had slots such as id
, label
, preprocessing
, source
, etc. And the mapping
class would have had two slots subject
and object
, both accepting a value of type entity
.
This would have been much cleaner, would have make manipulating the mappings and their entities much easier (inverting a mapping, for example, would simply have involved swapping the subject
and the object
slot, instead of having to swap all the subject_*
and object_*
slots). It would also have made extending the model to deal with new types of entities easier as well, since it would just have required adding new subclasses of entity
. We could even have dispensed with the subject|object_type
slots, since the type of an entity would have been naturally indicated by which subclass of entity
is used.
My understanding, based on some old discussions that I have read in this repo, is that such a design has been deliberately avoided because there was a clear preference for a data model that was much closer to the TSV serialisation (i.e., where each column of the TSV maps directly to one slot in the mapping
class).
I do not criticize that choice, but now we have to stick to the “flat” model that we have as a result. Trying to introduce some clean object-oriented design on top of it would only make things more confusing. (Hence my refusal to the entire idea of having a separate literal mapping
class.)
description: The literal being mapped | ||
see_also: | ||
- https://mapping-commons.github.io/sssom/sssom-profiles/ | ||
- https://github.com/mapping-commons/sssom/issues/234 | ||
range: string | ||
required: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you insist on having that separate slot (rather than reusing subject_label
), you need to remove that required
.
Resolves #234
docs/
have been added/updated if necessarymake test
has been run locallyIf you are proposing a change to the SSSOM metadata model, you must
examples/
see_also
field of the linkml modelsee_also
field of the linkml model