ADR to handle dataset information in TIMDEX #125
While I don't know that I feel strongly enough to veto this option, I do have concerns.
While I agree that, in hindsight, the `literary_form` field seems to over-fit the data model to one data source, I'm not sure how comfortable I am creating a vocabulary that encompasses all of these values: `nonfiction`, `fiction`, `point`, `polygon`, `raster`, `image`.
Terms like `point`, `polygon`, and `raster` feel like they address the internal structure of the record, but `nonfiction` and `fiction` are more about the content (akin to genre). Equivalent literary terms feel like they might be `prose`, `verse`, or `drama` (which are values that I don't think appear in any current field?).
`image` is a value I also struggle with in this set, as I understand it to be closely related to (if not synonymous with) `raster`. I vaguely remember seeing raster data that is not an image, so maybe there's something here I'm missing, though.
@matt-bernhardt would your concern be addressed by the different `kind` provided by this Object?
Example:
GIS record
Literary form example
I remain unsure myself if `kind` addresses the "sameness" across fields. When I imagine using this data in a user interface, I find myself agreeing with your concern that the geo values look super weird alongside the literary form values. We could address that by exposing aggregations that are filtered to the separate `kind` values, but then it starts to feel slightly weird again: why are we combining things in OpenSearch that we would then split out again in GraphQL, so that nobody would ever use the combined data? It definitely starts to feel like, if we are concerned we'll never use the data together (because it is indeed "different"), we need to consider whether it ever should have been together in the first place. I'm still not sure.
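To make the "filter aggregations by `kind`" idea concrete, here is a minimal sketch. It assumes each entry in the combined field is an object with `kind` and `value` keys; that shape, the two `kind` labels, and the helper name `count_values_by_kind` are hypothetical and not taken from the ADR. It only illustrates that a combined field ends up being split back out per `kind` before anyone can use it.

```python
from collections import Counter, defaultdict

# Hypothetical entries from a combined field. The {"kind", "value"} shape and
# the kind labels below are assumptions made for illustration only.
ENTRIES = [
    {"kind": "Literary form", "value": "fiction"},
    {"kind": "Literary form", "value": "nonfiction"},
    {"kind": "Literary form", "value": "fiction"},
    {"kind": "Geometry type", "value": "polygon"},
    {"kind": "Geometry type", "value": "raster"},
]


def count_values_by_kind(entries):
    """Count values separately under each kind, like one aggregation per kind."""
    counts = defaultdict(Counter)
    for entry in entries:
        counts[entry["kind"]][entry["value"]] += 1
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {kind: dict(values) for kind, values in counts.items()}


buckets = count_values_by_kind(ENTRIES)
```

Each key of `buckets` would back its own filter in a UI, which is the same split the comment above worries about: the values are stored together but never shown together.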
I'm finding Option 4 pretty appealing now.
One question: what if we had a record for a print map? If the Aardvark record had `dct_format_s="Print map"`, are we okay with this value mapping to `file_formats`?
Maybe helpful for discussion, this spreadsheet has `dct_format_s` and `gbl_resourceType_sm` values across all MIT and OGM records.
In support of Option 4, I'm hard pressed to find a value that isn't a digital file format in essence (where even most of the complex ones could be mapped to "Online resource" or something similar).
Another question: if we go this route, are we okay losing mimetype as a value in a TIMDEX record? My inclination is that it would be okay, as I'd imagine this field is not widely used yet.
If we did want specifics about mimetypes, perhaps that could get added to the `links` section? Seems more meaningful for a mimetype to point precisely to an actual resource vs the record as an abstract whole.
Agree that including it with the actual download link would be more useful. Maybe a `Link.file_type` attribute?
If we do go this route, we may want to change the name from `file_formats` so it's more broadly applicable, but I don't have a suggestion yet.
I've had time to look more closely at the spreadsheet linked above, and I don't think we'll encounter "Print Maps", at least in the `gismit` or `gisogm` sources. So maybe this is moot? I think it's safe to say that the values we get back from `dct_format_s` in the MITAardvark files will uniformly be digital in nature. And therefore, `file_formats` could be applicable.
If we did repurpose `file_formats`, and therefore needed to move those mimetype data points to another field, and we did like putting them on the `links` entries, I might propose `mimetype` explicitly; no ambiguity there.
If we go that route, I think we'd want to make sure the links made sense though. If we're just pointing someone back to an Alma record, or ASpace page, I'm not sure that a `link.mimetype=application/pdf` makes sense.
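As a rough sketch of the "mimetype on the link" idea, assuming a link object with an optional `mimetype` attribute: the `Link` dataclass, its field names, the `build_link` helper, and the example URLs are all hypothetical and are not the actual TIMDEX model. Guessing from the URL's file extension via the standard library's `mimetypes.guess_type` is used only as a stand-in heuristic, to show that links to landing pages would naturally carry no mimetype.

```python
import mimetypes
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    """Hypothetical link entry; field names are illustrative assumptions."""

    url: str
    kind: str
    mimetype: Optional[str] = None  # left unset unless the link targets a file


def build_link(url: str, kind: str) -> Link:
    """Attach a mimetype only when one can be inferred from the URL's extension."""
    guessed, _encoding = mimetypes.guess_type(url)
    return Link(url=url, kind=kind, mimetype=guessed)


# A direct file download gets a mimetype; a catalog landing page does not.
download = build_link("https://example.org/files/map.pdf", "Download")
catalog = build_link("https://example.org/catalog/record/12345", "Catalog record")
```

Keeping `mimetype` optional addresses the concern above: a link back to an Alma record or ASpace page simply leaves the attribute empty instead of claiming a file type for the record as a whole.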
I'm not sure I can weigh in on what to do with the mimetype values at the moment, but as far as the rest of this choice is concerned, I think I'd prefer Option 4 over the others, at least as I understand them now.
To be clear, this option would leave the current `literary_form` field untouched?