-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experimental: defining "extension" slot as a list of key-value pairs #263
Conversation
range: string | ||
range: key value pair | ||
multivalued: true | ||
inlined_as_list: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is false it makes the serialized dict slightly more idiomatic
This needs to be dealt with in in #263
# value: MYINTERNALNAMESPACE:mapping_id | ||
# - key: funded_by | ||
# value: foaf:fundedBy | ||
subject_id predicate_id object_id mapping_justification modified mapping_id funded_by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the risk of clashes if a future version of SSSOM defines, say, mapping_id
as a standard field?
I’d suggest mandating that columns defined by this extension mechanism should be prefixed, e.g. with ext_
or similar.
That is, if an extension is defined as follows:
#extension_definition:
# - key: mapping_id
# value: MYINTERNALNAMESPACE:mapping_id
Then in the TSV file it should appear as:
subject_id predicate_id [...] ext_mapping_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either that or adding a SSSOM version number in the mapping set metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly what I was musing about before I stopped working on it. The downside of using "ext" prefixes for column names is that I already know projects using SSSOM that really don't want to change their column names which are part of a large metadata model for mappings (with a few organisation specific columns like, ehem, species, Grant_id, and other Shananigans. I would have much liked the prefix solution for cleanliness.
My thought was that if a custom column is defined in the mapping set header, it trumps any definitions in sssom schema. This is super messy, but it's unclear to me how I should even approach this question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's unclear to me how I should even approach this question.
My suggestion:
- Make a list of those extra column names that you already know are used in the wild.
- Mark those column names in the spec as being
reserved
, meaning that future SSSOM versions will never reuse those names. - From that moment on, new extension columns MUST use prefixed names. Existing projects that are currently using “reserved” names MAY continue to do so, though they are encouraged to switch to prefixed versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may not be the cleanest solution, but it’s a pretty common way of dealing with existing legacy stuff in specifications and standards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very French way to go about it, while the real world clearly works under Mediterranean principles.. People will just not care about this constraint in the spec - the only consequence it will have is that we can't use normal processing tools on many mapping sets in the wild. I know I sound negative but man, after 10 years of lobbying with great charme and all we barely managed to get people to agree to publish ontologies in a common syntax - it has proven nearly impossible to tell these same people to use a common set of annotation properties to describe their metadata.
On the other hand, I cant see a different solution either! I am 51% inclined to go your way. I wouldn't even bother to enumerate existing columns, they are too messy anyways. Just force the ext_
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a real-world approach. Look at any IETF RFCs, any IANA registry: they are littered with this kind of “reserved” attributes, keywords, identifiers. Most of them exist for only one reason: to deal with this kind of mess and the fact that many people don’t give a shit about standards, and standards have to take that into account.
Superceded by #375 |
Resolves [#ISSUE, #ISSUE]
docs/
have been added/updated if necessarymake test
has been run locally[Description, mentioning at least relevant #ISSUE and how it was addressed. A bulleted list of all changes performed by the PR is is helpful.]