SSSOM: Separate profile for Literal Mappings #234
Comments
cc @rsgoncalves, would also like to hear your input on the PR if you don't mind. Happy to answer any questions if you don't understand what exactly it does! |
The PR looks good to me. I think we can swap out our (simple) mapping format with SSSOM straightforwardly. I could use your help clarifying a few things, further below. For context: In the mapping tool we've been developing, the output is a simple table like so: Now to questions:
|
So in your use case, you always have an internal identifier? In this case, you could simply be using normal SSSOM rather than the literal profile?
Seems wrong, @udp.
Great question, can you open a new issue about that? I will try my best to document the difference, but it is true that these two metrics will often coincide. |
No, not always. So far only in a couple of datasets have we had to maintain internal identifiers. I think the literal profile is still the route, with some optional field to specify such identifiers. Could that be |
I think Classic SSSOM:
Literal Profile
|
Fixes #197
Fixes #234

- [x] `docs/` have been added/updated if necessary
- [x] `make test` has been run locally
- [x] tests have been added/updated (if applicable)
- [ ] [CHANGELOG.md](https://github.com/mapping-commons/sssom/blob/0.9.0/CHANGELOG.md) has been updated.

This PR adds a new profile to SSSOM for the representation of literal mappings, leaving the default SSSOM intact.

---------

Co-authored-by: James McLaughlin <[email protected]>
I see I approved #235 but I actually have a number of problems to raise
|
I see many models that do this, like https://github.com/biolink/biolink-model/blob/1698cf997785490304a617123d5e3a242c6b2bc0/biolink-model.yaml#L6128. Where can I find docs about this?
Is there something to read about modular schema development best practices?
That was an honest mistake, now fixed. Technically literal mappings are not yet connected to the spec, we just wanted to have the docs out there to be able to use it, even if there is no tool support. |
But how is the “literal profile” even supposed to be used? All we have is a I second @cmungall ’s questions:
Those questions should get answered before we make a SSSOM 1.0, or the “literal profile” should be removed from the 1.0 version in my opinion. Right now, the “literal profile” is in effect impossible to implement in code. |
I think a separate literal mapping set would be fine? It was never the intention that they would be in the same file. The use case for this is to publish all of the manually asserted string to term mappings we have collected in ZOOMA, see https://github.com/EBISPOT/zooma2sssom/tree/master/mappings |
I don’t see why we even need a separate “profile” or a separate class for literal mappings for such a use case. Why not simply put the literal in the
|
Yes, I think this approach would work if subject_id is made optional. |
I think it's helpful to have an SSSOM-like approach for literals and I agree it fits well and doesn't necessarily need a separate "profile" but I wonder if it would lead to significant scope creep. Could SSSOM become a TSV format for annotating information about any kind of subject-predicate-object relationship? The more slots become optional and optional slots exist, the more developers will have trouble implementing tools and users trouble finding a tool that does what they are looking for. Why does the |
Maybe, but it seems there is clear interest in being able to represent such “literal mappings”. So the options are:
A likely outcome of this option is that people who need to handle this case will, in effect, “fork” SSSOM to create their own variant that can represent literal mappings. If several people do that, we will end up in the same situation as we were for general mappings before SSSOM: everyone will represent literal mappings with their own custom format, which will all be slightly incompatible with each other.
Two problems with that approach. First, for now it is incomplete. The “literal profile” defines a Second, even when it is complete, the “literal profile” will be a mess to implement, at least in non-duck-typed languages. There is no relation between
I see no obvious drawbacks to that approach, and only benefits. Notably:
a. This allows for either side of the mapping (subject or object) to be the literal. If
b. Consequently, this allows inversion of mappings according to SSSOM’s standard rules (contrary to the profile proposed in #235, where the literal can only be on the subject side).
c. As a side-effect, this even allows for literal-to-literal mappings, should anyone ever need to do that.
d. This allows mixing literal and non-literal mappings, should anyone ever need to do that. Not saying this is necessarily a good idea, but the approach automatically makes it possible without anything special to do. By contrast, the separate fork/profile route would never allow that unless we explicitly plan for this possibility.
e. Implementation-wise, this should be a breeze.
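To make points (a) and (b) concrete, here is a minimal sketch of inversion when either side may be a literal. It assumes, purely for illustration, that the literal is carried in the label slot and flagged by a type value of `rdfs literal`; the `Mapping` class, the `invert` helper and the inverse-predicate table are hypothetical and not taken from any SSSOM implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: a small table of inverse predicates, limited to SKOS mapping relations.
INVERSE_PREDICATE = {
    "skos:exactMatch": "skos:exactMatch",
    "skos:closeMatch": "skos:closeMatch",
    "skos:relatedMatch": "skos:relatedMatch",
    "skos:broadMatch": "skos:narrowMatch",
    "skos:narrowMatch": "skos:broadMatch",
}


@dataclass(frozen=True)
class Mapping:
    """Hypothetical single mapping class; either side may be an entity or a literal."""
    predicate_id: str
    subject_id: Optional[str] = None
    subject_label: Optional[str] = None
    subject_type: Optional[str] = None
    object_id: Optional[str] = None
    object_label: Optional[str] = None
    object_type: Optional[str] = None


def invert(mapping: Mapping) -> Mapping:
    """Swap subject and object; nothing here is specific to literals."""
    if mapping.predicate_id not in INVERSE_PREDICATE:
        raise ValueError(f"no known inverse for {mapping.predicate_id}")
    return Mapping(
        predicate_id=INVERSE_PREDICATE[mapping.predicate_id],
        subject_id=mapping.object_id,
        subject_label=mapping.object_label,
        subject_type=mapping.object_type,
        object_id=mapping.subject_id,
        object_label=mapping.subject_label,
        object_type=mapping.subject_type,
    )


# Invented example: a literal on the subject side, an ontology class on the object side.
literal_mapping = Mapping(
    predicate_id="skos:broadMatch",
    subject_label="T cell",
    subject_type="rdfs literal",
    object_id="CL:0000084",
    object_type="owl class",
)
inverted = invert(literal_mapping)
```

Because the literal is just another value of the same class, the inversion code needs no special case, which is the implementation benefit claimed in (e).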
Apart from I do agree that the fact that most slots are optional can complicate the use of SSSOM, though. This, in fact, is where the notion of “profile” would be interesting, but it would be different from the type of “profile” that has been proposed in #235. A “profile” could simply be a list of slots that, within the profile, should be considered mandatory. The spec could define a few such profiles, and users would be free to define their own. The idea being that, once you have declared a set to adhere to a given profile (and the parser has verified that the set is indeed compliant with the indicated profile), you no longer have to worry about which slots are present or not because you already know that all slots mandated by your profile are present (if they were not, the parser would have rejected the set outright). |
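The "profile as a list of mandatory slots" idea above could be sketched roughly as follows. The profile names (`minimal`, `curated`) and the helper functions are invented for illustration; only the slot names are existing SSSOM slots. This is one possible reading of the proposal, not something defined by the spec.

```python
# Hypothetical profiles: each one is simply the set of slots it makes mandatory.
PROFILES = {
    "minimal": {"subject_id", "predicate_id", "object_id", "mapping_justification"},
    "curated": {
        "subject_id",
        "predicate_id",
        "object_id",
        "mapping_justification",
        "author_id",
        "mapping_date",
    },
}


def missing_slots(mapping, profile_name):
    """Return the sorted list of slots required by the profile but absent or empty."""
    required = PROFILES[profile_name]
    return sorted(slot for slot in required if not mapping.get(slot))


def check_set(mappings, profile_name):
    """Reject the whole set as soon as one mapping violates the declared profile."""
    for index, mapping in enumerate(mappings):
        missing = missing_slots(mapping, profile_name)
        if missing:
            raise ValueError(
                f"mapping {index} lacks {missing}, required by profile {profile_name!r}"
            )
    return True


# The mapping discussed in this thread, with an assumed justification value.
example = {
    "subject_id": "DOID:0070556",
    "predicate_id": "skos:exactMatch",
    "object_id": "MESH:C535731",
    "mapping_justification": "semapv:ManualMappingCuration",
}
```

Once `check_set` has accepted a set, downstream code can read the mandated slots without checking for their presence, which is the point of the proposal.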
Okay, I understand why an official "fork" for literals is desirable and your argument in 3 for adapting the model by adding literal to the list of possible values for It looks to me like Has anyone tried mixing different types in the same mappingSet? On the one hand, it would be really convenient for me to curate both a mapping between entities, and a mapping between an entity and literal in a single TSV file like this (top = literal to owl:Class, bottom = owl:Class to owl:Class):
I assume I can fairly easily tell which is which with just these two mappings, primarily by the predicate I chose to use. But if both mappings used
I like this idea for a "profile". It seems fairly straightforward to say in the standard mappings "profile" Profiles like this could be defined in the mappingSet metadata. Curators could be alerted that everything in a set is of a particular type (or set of allowed types), preventing the confusion I mentioned above. It does lose some of the convenience of creating mappings between very different types in the same file. I suppose you could always define a super "profile" that allows anything from the other defined profiles and then create tools to merge or split profiles. |
Yes. However it’s unclear to me whether it is suitable here (the poor documentation of the model doesn’t help). Can it be used outside of an RDF context? If I have a list of, say, scRNAseq cell cluster names and I want to map them to Cell Ontology IDs, would it be correct to use Maybe it would be fine, maybe not. I just don’t know. Whoever came up with the values for the
Do you mean, mixing mappings with different Or did you mean, mixing (normal)
I am sorry but I don’t understand your example at all. The second mapping states that DOID:0070556 is an exact match to MESH:C535731; the first one seems to state that DOID:0070556 is not an exact synonym to MESH:C535731. I don’t understand what that is supposed to mean. Why does MESH:C535731 have a different label in the two mappings? Why is the (Besides,
Again, I don’t understand what you mean here. The spec does not and will not mandate which predicate to use (at most it can recommend that some predicates be used or conversely discourage the use of some others, but that’s it). Just because “literal mappings” would become an officially supported type of mapping does not mean that the spec would force you to use
Something like that, yes.
I would not envision allowing the definition of a profile in a mapping set’s metadata. Instead, profiles should be defined externally, and a mapping set would simply declare that they use a specific profile. Allowing each mapping set to define its own profile seems like a needless complication to me.
You don’t need profiles to do that.
Or you can just not use profiles, if you need to merge several sets that are compliant with different profiles. Profiles, if we ever create them, would not be a mandatory feature – mapping sets would not have to have a profile. |
Sorry for barging in here, I don't have time to comment on here much. Here is my very short take:
|
OK.
Where is the trace of those “lots of meetings”? All I know about is:
We have a very different opinion of what can be considered “done”. I say it again: the literal profile is right now unusable. There are way too many questions left open about how it can/should be used. The only thing the literal profile does for now is cause confusion, by leading people to believe they can use SSSOM to represent literal mappings, which is absolutely not the case. The EBI has already started to publish “literal mapping sets” (see James’ message above) in Aren’t we supposed to care about “interoperability”?
OK.
Hard disagree. You’re taking the easy path now without consideration for how hard you will make things in the future. That may be fine for software development in general (“move fast and break things”, as the tech bros of Silicon Valley are saying), but when designing a (hopefully) long-term standard, you want to move slow and fix things. Right now, the “literal profile” is a half-assed design that no one knows how to use (not even you apparently). Leaving it like that and kicking the can down the road can only come back to hit us hard in the future. |
@gouttegd, the confusion caused by the example table I shared is exactly the point I was trying to make about mixing literal and entity mappings. The source of both mappings is the same MESH entity, but the top mapping is between the literal in the @matentzn, barging in... hahaha. Like you said, you put in work earlier when you had time. I'm sorry I couldn't contribute more at an earlier stage, but I'm fine with leaving literals as unofficial. Can I ask why |
A somewhat naive comment (it's hard to keep all these arguments clear without spending many hours, I thank you who have devoted that time):
With that in mind, I still think these basic thoughts could apply:
I agree that if literals are not crisply specified in this standard, the chance of divergence and even competing standards is high. But if you think literal-included triples are not really mappings for SSSOM, then that's the principled decision on which you should stand, and that other thing is not a profile, it's a different standard. |
Implementing literal mapping to me is achieving the final step to make SSSOM a duplicate of RDF (and even RDF-star as we can say things about the triple). I don't think we need another RDF. You can drop the Simple, you can change Ontology to Resource and you get SSRM. |
In the interest of driving SSSOM 1.0 home in the coming weeks and the enormous amount of things to unpack in all the comments given here, I am ok with yanking the literal profile from the standard, for now (not happy but I can read a room 😂). I can move it to another repo and develop it independently as a non-Standard, and make sure we communicate use cases clearly for this. One day in the far future we can move this "profile" or "standard for something else" back here and have a vote. Please voice your objections to this approach until 1st August; I will be responsible for the move! |
This is just another version of “kicking the can down the road”, only in a different repo. If the intention is that at some point the “literal mapping” becomes a part of SSSOM, we should think about how this will be done right now. Adding in the future a new class of mappings is a completely different beast than adding or removing a slot in the existing For now, all the code dealing with SSSOM (in Python, Java, or any other language) can be built around the assumption that there is only one class of mapping. This is not something that will be easy to change, and the longer that assumption stays around, the harder it will be to change. So if you already know that at some point you want the standard (and its implementations) to deal with several types of mappings (e.g
If you do this (make SSSOM 1.0 with no room for more than one class of mappings, then come back later with a proposition for another class as if it was an afterthought), I can already tell you what my vote will be: No. Absolutely not. |
Hi, typing from my phone as I’m away camping without access to a computer. For us (biocurators at EBI) mapping from term to term or string to term are both classes of the same problem. We often have datasets that require both types of mappings to get to the types of identifiers we want. For example I am currently working with a dataset that has a mix of chemical names and CAS numbers. I want to map the CAS number where available (obviously a use case for core SSSOM) and otherwise map the chemical name (literal mapping). This is perhaps not the best example but I can dig out unlimited more when I get back to the office. At EBI we use two tools for these term and literal mappings respectively: OXO and ZOOMA. So far we have maintained the databases for these tools internally, which is not in the FAIR spirit of our community. We are therefore opening up OXO using SSSOM and hope to do the same with ZOOMA. Just like term mappings, literal mappings are context dependent (we maintain different literal mapping sets per project in ZOOMA for this reason) and have metadata associated, e.g. lexical match or manual curation, a mapping author, a date, etc. I don’t think solving these problems twice by making a new SSSLiteralOM complete with website, issue tracker and so on is the best way to spend our time when we already have the community mindshare (or so I thought) and infrastructure here to support it. In fact this is extremely unlikely to happen with the resources we have, so ZOOMA’s data would stay loosely specified and difficult to use - but I thought we left this kind of thing in the past and moved towards trying to agree on things to enable interoperability. |
@jamesamcl I am not against representing literal mappings in SSSOM. I do share a bit of @jonquet ’s concerns about re-inventing RDF, but from what I’ve seen in the wild I am afraid that horse has left the barn anyway: people have already started to use SSSOM/TSV to serialise arbitrary RDF triples and not only triples that represent “mappings”. (This is a concern that has already been mentioned in #324). This is not what SSSOM is intended for, but I don’t think there’s much we can do about it. Once you put a tool in people’s hands, they will use it in any way they like. A kitchen knife is not supposed to be used to turn screws, but people will use one for that purpose if they don’t have a screwdriver. So what? I don’t think we should prevent SSSOM from being useful to manipulate mappings just because people find it useful to do other things with it (including things they shouldn’t do). But if we are to allow literal mappings to be represented in SSSOM, we should do so correctly, and I am sorry but #235 is not a correct solution in its current state. I see two ways of representing literal mappings in SSSOM: A) Having a separate If we want to go that route, I will insist that these questions must be addressed ASAP, before SSSOM 1.0 is published, because as I have stated above, this route breaks the assumption that there is only one class of mapping. That assumption has been there since the beginning of SSSOM, and is still present everywhere in the current form of the standard even after #235 has been merged. In particular, the So if we now have to deal with more classes of mappings than just
Whatever method we choose is going to have huge implications on SSSOM implementations (especially implementations in statically typed languages), so I am flatly opposed to postponing any decision on this to after 1.0. I don’t care if this means that 1.0 is going to be delayed by 10 months until we figure out how to do it. B) Shoehorn literal mappings into the existing That would be a much less invasive change, with far fewer implications on implementations, because the assumption that there will only ever be one class of mapping would stand. For that reason: (1) I tend to favor that route; (2) if we want to go that route we can easily postpone that to after 1.0. |
Alright, here we go. #384 introduces an alternative model to the "literal mapping" proposal we have previously added. It is built on the following assumptions:
I do understand that there are various opposing views on the need for a "literal" profile, but I think this super minimal intervention will satisfy both sides. In essence, we do not have a literal profile; we have a convention that allows us to represent an "entity" by its label ( Huge thanks to @gouttegd 🙏 who managed to steer this massive carrier ship after it had left the harbor. This is rarely successful and needed a huge amount of thought, testing, and patience (mostly with me and my constant questions), and I am supremely happy we managed to make it! 🎉 THAT WAS IT FOLKS - the last issue before SSSOM 1.0 (#189). Thanks to all of you who helped and contributed; now the carrier ship has sailed off the horizon, hopefully, to connect the isolated shores of our data islands! |
For those who had started to create pseudo-SSSOM/TSV files using the “literal profile” (even though this has never been officially feasible since that profile had never been connected to the rest of the spec), SSSOM-Java will support reading such files and converting them to the new proposed convention. That is, given an input file like this:
SSSOM-CLI will silently convert it into:
|
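For readers who want a feel for the kind of conversion described above, here is a rough Python sketch. It is not SSSOM-Java's code and makes two loud assumptions: that the old "literal profile" rows carry the literal in a column named `literal`, and that the new convention stores it in `subject_label` with `subject_type` set to `rdfs literal`. The `convert_row` helper is hypothetical.

```python
def convert_row(row):
    """Rewrite one literal-profile row (a dict) into the assumed new convention.

    Assumption: the old profile uses a `literal` column; rows without it are
    treated as ordinary mappings and returned unchanged (as a copy).
    """
    converted = dict(row)
    if "literal" in converted:
        # Move the literal to the subject label and flag the subject as a literal.
        converted["subject_label"] = converted.pop("literal")
        converted["subject_type"] = "rdfs literal"
    return converted


# Invented example row, not taken from any published mapping set.
old_row = {
    "literal": "T cell",
    "predicate_id": "skos:exactMatch",
    "object_id": "CL:0000084",
    "mapping_justification": "semapv:ManualMappingCuration",
}
new_row = convert_row(old_row)
```

The remaining columns are copied untouched, so metadata such as the justification survives the conversion.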
As discussed in #197, we are now going to provide a basic spec for a literal mapping. This is the suggestion: