schema.org markup for video objects #834
e6ca5ce to 6c568b4
@arcadiafalcone would you take a look at the specs (especially spec/lib/metadata/schema_dot_org_spec.rb) to see if the Cocina is realistic? If there are other scenarios to include in the test, let me know.
Is there a schema.org reason for joining title and subtitle with a new line rather than the more usual semicolon?
The DOI might be in identifier.uri instead of identifier.value for some objects. purl/spec/lib/metadata/schema_dot_org_spec.rb Line 199 in a94ffa5
And the ORCID may also be in either identifier.value or identifier.uri - MODS doesn't make the distinction. purl/spec/lib/metadata/schema_dot_org_spec.rb Line 389 in a94ffa5
db06769 to 6eb5412
Thanks @arcadiafalcone. I'm fixing title and subtitle concatenation since I was erroneously using the same approach as for the description. Will use semi-colon to concatenate. For the DOI in the identifier.uri, would this spec suffice? purl/spec/lib/metadata/schema_dot_org_spec.rb Line 198 in 6eb5412
For the ORCID, the current code / specs handle a contributor having |
My error, it should be title: subtitle (colon rather than semicolon).
Re: DOI in identifier.uri, so long as it confirms that it is the right domain for a DOI (not all values in this field will be DOIs). Re: ORCID, identifier.value should always have a type. (If it doesn't, that's a data problem.)
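For reference, a minimal sketch of the two points above: joining title and subtitle with a colon, and accepting a DOI from identifier.uri only when the host is doi.org. The helper names (`full_title`, `doi_url`) and the plain-Hash identifier shape are assumptions for illustration, not the code in this PR, and the DOI in the usage note is made up.

```ruby
require 'uri'

# Hypothetical helper: join title and subtitle as "title: subtitle".
def full_title(title, subtitle = nil)
  [title, subtitle].compact.map(&:strip).reject(&:empty?).join(': ')
end

# Hypothetical helper: return the first identifier that is a doi.org URL.
# Checks `uri` first, then `value`, since either field may hold the DOI.
# Other URIs in the same field (not DOIs) are skipped.
def doi_url(identifiers)
  identifiers.each do |identifier|
    candidate = identifier['uri'] || identifier['value']
    next unless candidate.is_a?(String)

    host = begin
      URI.parse(candidate).host
    rescue URI::InvalidURIError
      nil
    end
    return candidate if host == 'doi.org'
  end
  nil
end
```

With this sketch, `doi_url([{ 'uri' => 'https://example.org/x' }, { 'uri' => 'https://doi.org/10.1234/abcd' }])` would skip the first entry and return the second.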
d907b40 to 0f4d2f5
Thanks, @arcadiafalcone. I've adjusted the specs and code for these.
lib/metadata/schema_dot_org.rb
Outdated
@@ -44,38 +42,58 @@ def dataset?
    false
  end

  def video?
    # Only return video metadata if world-downloadable.
    video = JsonPath.new("$.description.form[?(@['value'] == 'moving image' && @['type'] == 'resource type')]").on(@cocina_json)
Why is this coming from description?
That was @arcadiafalcone's recommendation: #824 (comment). Is there a different field in the cocina we should consider?
I would expect https://cocina.sul.stanford.edu/models/media to include audio as well, and we only want video. Am I not understanding the type correctly?
I'm fairly sure that I'm guessing here. Alternatively, resource's type = "https://cocina.sul.stanford.edu/models/resources/video"
@arcadiafalcone could you make a recommendation about using a fileset resource type vs the descriptive metadata for videos?
Descriptive metadata is dependent on the user to create it, so the fileset resource type is probably a bit more reliable (it's a core metadata field that's usually provided, but a user could make an error).
Thanks, that's helpful!
Is the schema.org expectation that a "video" = streaming video? Or would it include a video file that may be downloadable but isn't presented in a player? If the expectation is streaming, then the fileset resource type is the only guarantee that it will be presented that way.
We may also have description that says something is a video when the video itself hasn't been digitized. I'm not aware of examples but we do have "audio" where only the record label/album cover/liner notes have been digitized.
I've updated the code to use the fileset resource type to determine whether the object is a video.
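A minimal sketch of the fileset-based check, for anyone following along. It assumes the Cocina JSON has been parsed into a plain Ruby Hash and uses `dig` rather than JsonPath. The constant and method names are hypothetical, and the actual implementation in lib/metadata/schema_dot_org.rb may differ.

```ruby
# Fileset resource type for video, as quoted earlier in this thread.
VIDEO_FILESET_TYPE = 'https://cocina.sul.stanford.edu/models/resources/video'

# Hypothetical helper: return the filesets whose type marks them as video.
# Assumes filesets live under structural.contains in the parsed Cocina Hash.
def video_filesets(cocina)
  filesets = cocina.dig('structural', 'contains') || []
  filesets.select { |fileset| fileset['type'] == VIDEO_FILESET_TYPE }
end
```

Because this looks at structural metadata rather than description.form, an object described as a "moving image" whose video was never digitized would not match.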
lib/metadata/schema_dot_org.rb
Outdated
  end

- def access
+ def access?
Shouldn't this take into account the file permissions as well?
I think I need help understanding whether it's possible to have an object with access.download == "world" while the video itself is not world-downloadable. Is that the scenario you're thinking of, @justinlittman?
Correct.
It's possible. It's less likely than the inverse case, where the access.download is none but one or more files are world-downloadable.
I've adjusted the code to look at the access.download permissions for the first file where hasMimeType includes video. It also requires the object-level rights to be "world". If you think we need to allow those to be download == none, let me know @andrewjbtw
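Roughly, the two-level check described above could look like the sketch below. It assumes parsed Cocina JSON as a plain Hash, with files nested under each fileset's structural.contains (as in the JsonPath expression in the diff). All method names here are hypothetical.

```ruby
# Object-level check: the object's own download rights must be "world".
def object_world_downloadable?(cocina)
  cocina.dig('access', 'download') == 'world'
end

# First file in the fileset whose MIME type includes "video".
def first_video_file(fileset)
  files = fileset.dig('structural', 'contains') || []
  files.find { |file| file['hasMimeType'].to_s.include?('video') }
end

# File-level check: that first video file must itself be world-downloadable.
def video_file_world_downloadable?(fileset)
  file = first_video_file(fileset)
  !file.nil? && file.dig('access', 'download') == 'world'
end
```

Requiring both checks covers the case raised above, where the object is world-downloadable but the video file is not. The inverse case (object `none`, file `world`) is excluded by this sketch.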
@lwrubel Can a video object have more than one video file?
fdd0892 to d2fea9c
@justinlittman Looking at the canonical examples, I see that it is possible to have multiple video files (e.g. https://purl.stanford.edu/yj807zw8315). I'm trying to figure out what makes the most sense for schema.org. There is an ItemList with limited usage, but it's intended for video carousels more than multi-file objects. Honestly, I'm not sure it's important to surface thumbnails or embedUrls for more than one video if the PURL currently shows the first one, with access to the rest. They share descriptive metadata. (Not sure if there is rights variability among a list of videos, but we could pull the embedUrl and thumbnail for the first video that is world-downloadable.)
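A sketch of the "first video that is world-downloadable" idea for multi-file objects, assuming parsed Cocina JSON as a plain Hash. The method name and the `video/` prefix match on hasMimeType are assumptions. It only selects the file and leaves thumbnail and embedUrl construction aside.

```ruby
VIDEO_TYPE = 'https://cocina.sul.stanford.edu/models/resources/video'

# Hypothetical helper: walk the video filesets in order and return the first
# video file whose own download access is "world", or nil if there is none.
def first_world_downloadable_video(cocina)
  filesets = cocina.dig('structural', 'contains') || []
  filesets.each do |fileset|
    next unless fileset['type'] == VIDEO_TYPE

    files = fileset.dig('structural', 'contains') || []
    files.each do |file|
      next unless file['hasMimeType'].to_s.start_with?('video/')
      return file if file.dig('access', 'download') == 'world'
    end
  end
  nil
end
```

If rights do vary among the videos in an object, this would skip a restricted first video and surface the next public one.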
08aa387 to 97e4daf
Noting here that there seem to be very few videos that meet our criteria for the uploadDate of the object having an
I think I've addressed the comments, and this is otherwise fine to proceed on any further code review. We can tweak the date logic or broaden/narrow criteria further if needed, of course.
lib/metadata/schema_dot_org.rb
Outdated
  def format_specific_fields
    if dataset?
      return { "identifier": identifier,
               "isAccessibleForFree": access?,
What is the implication if this is "false"? Does it make sense to include the metadata in those cases?
Google Dataset Search includes datasets that are not free, or require a DUA. There's a "free" filter. So it's worth including non-free datasets.
But why publicize something that we can't provide?
@amyehodge I am making the assumption that there are SDR-deposited datasets that are not freely available but possible to use, once the user goes through some process (e.g. a DUA or contacting the author). Or that creators of Stanford-only datasets may still want some kind of visibility in Google Dataset Search. But maybe that is not a realistic scenario or something we support?
Yes, we do have datasets like this. One example is https://purl.stanford.edu/sg732vt3619, which has a PURL but in order to gain access you must 1) be a Stanford person and 2) agree to the Data Use Agreement, so the files aren't downloadable from the PURL. We would want this crawled.
Another example here https://purl.stanford.edu/gh587bx9720 where the data files are actually on the PURL.
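To illustrate the outcome of this thread: dataset markup is emitted whether or not the files are freely downloadable, and isAccessibleForFree carries the distinction. This is a simplified sketch with a hypothetical method name and only two of the fields; it is not the method from the diff.

```ruby
# Hypothetical, simplified version of the dataset-specific fields.
# The markup is always produced; `world_downloadable` only sets the flag.
def dataset_fields(identifier:, world_downloadable:)
  {
    'identifier' => identifier,
    'isAccessibleForFree' => world_downloadable
  }
end
```

For a restricted dataset like the DUA example above, this would yield `"isAccessibleForFree" => false` while still exposing the identifier to crawlers.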
lib/metadata/schema_dot_org.rb
Outdated
  def video?
    # Only return video metadata if world-downloadable.
    video = JsonPath.new("$.structural.contains[?(@['type'] == 'https://cocina.sul.stanford.edu/models/resources/video')]").on(@cocina_json)
    return true if video.any? && access? && video_access?
Can you name the methods so that the difference between access? and video_access? is clearer?
Also, the method name video? suggests that it answers the question of whether this is a video. But this method does more than that. Perhaps "render_video_metadata?" or similar.
Changing access? to object_access? and video? to render_video_metadata?
lib/metadata/schema_dot_org.rb
Outdated
  end

  def schema_type?
-   dataset?
+   dataset? || video?
This method is poorly named since it is taking into account rights as well as the schema type.
Changed to render_video_metadata?
lib/metadata/schema_dot_org.rb
Outdated
    # need to find the file that is the one for the video (based on mime-type). Then get the access and download rights for that.
    file_access = JsonPath.new('$[*].structural.contains[*][?(@.hasMimeType =~ /video/)].access.download').first(video)

    return true if file_access == 'world'
L108-110 could just be file_access == 'world'
Oh, right. Done here and in dataset?
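For readers unfamiliar with the idiom, the simplification is just returning the comparison directly. A small before/after sketch follows; the method names are hypothetical, and the explicit `false` in the "before" version is an assumption about what followed the `return true if` line.

```ruby
# Before: early return followed by an explicit false.
def world_access_verbose?(file_access)
  return true if file_access == 'world'

  false
end

# After: the comparison already evaluates to true or false.
def world_access?(file_access)
  file_access == 'world'
end
```

Both versions return the same value for every input, so the change affects readability only.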
97e4daf to 27a6fdb
Resolves #824. Adds schema.org JSON for videos that are world-downloadable. Example (https://sul-purl-stage.stanford.edu/vx195sw6395):
<script type="application/ld+json">{"@context":"http://schema.org","@type":"VideoObject","name":"Structural relief and fish abundance","description":"In BIOHOPK185 (Ecology and Conservation of Kelp Forest Communities), students learned how to use underwater video as a tool for underwater research. In addition, students were encouraged to use the cameras to collect videos that captured the beauty of the kelp forest and the process of scientific diving at Hopkins Marine Station. In 2023, students were tasked with presenting the results of one-week research projects that used transect video surveys in a five minute movie. This collection contains these movies.\n\nFunding for video equipment was provided by the Stanford Accelerator for Learning through the Gordon and Betty Moore Foundation Grant GBMF10266 to support the work of Virtual Field Trips.","thumbnailUrl":"https://sul-stacks-stage.stanford.edu/file/druid:vx195sw6395/pf804qh8129_kfe_project_2023_mattioli_thumb.jp2","uploadDate":"2023","embedUrl":"https://embed-stage.stanford.edu/iframe/?url=https%3A%2F%2Fsul-purl-stage.stanford.edu%2Fvx195sw6395"}</script>
Example 2: (https://sul-purl-uat.stanford.edu/bc169zr6817) checked on Google's Rich Results Test:
<script type="application/ld+json">{"@context":"http://schema.org","@type":"VideoObject","name":"SCRF session - 1 - PIP-II","thumbnailUrl":"https://stacks-uat.stanford.edu/file/druid:bc169zr6817/bc169zr6817_SRF_AWLC_Oct20-3_thumb.jp2","uploadDate":"2020-10-20","embedUrl":"https://embed-uat.stanford.edu/iframe/?url=https%3A%2F%2Fsul-purl-uat.stanford.edu%2Fbc169zr6817"}</script>
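As a rough illustration of how output like the examples above could be assembled, here is a minimal sketch. The method name and keyword arguments are hypothetical, and the embedUrl pattern is inferred from the two examples rather than taken from the PR code.

```ruby
require 'json'
require 'uri'

# Hypothetical builder for the VideoObject JSON-LD shown in the examples.
# `embed_base` is the embed service root (e.g. "https://embed-uat.stanford.edu");
# the PURL is percent-encoded into the iframe URL's `url` parameter.
def video_object_json(name:, thumbnail_url:, upload_date:, purl_url:, embed_base:, description: nil)
  {
    '@context' => 'http://schema.org',
    '@type' => 'VideoObject',
    'name' => name,
    'description' => description,
    'thumbnailUrl' => thumbnail_url,
    'uploadDate' => upload_date,
    'embedUrl' => "#{embed_base}/iframe/?url=#{URI.encode_www_form_component(purl_url)}"
  }.compact.to_json # compact drops description when it is nil
end
```

The result would be placed inside a `<script type="application/ld+json">` tag on the PURL page, as in the two examples.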