schema.org markup for video objects #834

Merged
merged 1 commit into from
Dec 11, 2023

Conversation

Contributor

@lwrubel lwrubel commented Nov 28, 2023

Resolves #824. Adds schema.org JSON for videos that are world-downloadable. Example (https://sul-purl-stage.stanford.edu/vx195sw6395):

<script type="application/ld+json">{"@context":"http://schema.org","@type":"VideoObject","name":"Structural relief and fish abundance","description":"In BIOHOPK185 (Ecology and Conservation of Kelp Forest Communities), students learned how to use underwater video as a tool for underwater research. In addition, students were encouraged to use the cameras to collect videos that captured the beauty of the kelp forest and the process of scientific diving at Hopkins Marine Station. In 2023, students were tasked with presenting the results of one-week research projects that used transect video surveys in a five minute movie. This collection contains these movies.\n\nFunding for video equipment was provided by the Stanford Accelerator for Learning through the Gordon and Betty Moore Foundation Grant GBMF10266 to support the work of Virtual Field Trips.","thumbnailUrl":"https://sul-stacks-stage.stanford.edu/file/druid:vx195sw6395/pf804qh8129_kfe_project_2023_mattioli_thumb.jp2","uploadDate":"2023","embedUrl":"https://embed-stage.stanford.edu/iframe/?url=https%3A%2F%2Fsul-purl-stage.stanford.edu%2Fvx195sw6395"}</script>

Example 2: (https://sul-purl-uat.stanford.edu/bc169zr6817) checked on Google's Rich Results Test:
<script type="application/ld+json">{"@context":"http://schema.org","@type":"VideoObject","name":"SCRF session - 1 - PIP-II","thumbnailUrl":"https://stacks-uat.stanford.edu/file/druid:bc169zr6817/bc169zr6817_SRF_AWLC_Oct20-3_thumb.jp2","uploadDate":"2020-10-20","embedUrl":"https://embed-uat.stanford.edu/iframe/?url=https%3A%2F%2Fsul-purl-uat.stanford.edu%2Fbc169zr6817"}</script>
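
For readers who want to see how markup like the two examples above could be assembled, here is a minimal sketch. It is not the code from this PR: `video_object` and `json_ld_script` are hypothetical helper names, and the assumption that the embed URL is the embed host plus `/iframe/?url=` followed by the percent-encoded PURL is inferred only from the example output.

```ruby
require 'json'
require 'uri'

# Hypothetical helper (not from the PR): builds a VideoObject hash shaped
# like the examples above. All parameter names are assumptions.
def video_object(name:, thumbnail_url:, upload_date:, purl_url:, embed_base:, description: nil)
  fields = {
    '@context' => 'http://schema.org',
    '@type' => 'VideoObject',
    'name' => name,
    'description' => description,
    'thumbnailUrl' => thumbnail_url,
    'uploadDate' => upload_date,
    # Assumption: the PURL is percent-encoded into the embed service's url parameter.
    'embedUrl' => "#{embed_base}/iframe/?url=#{URI.encode_www_form_component(purl_url)}"
  }
  fields.compact # drop description when the object has none
end

# Wraps the hash in the script tag used for JSON-LD.
def json_ld_script(fields)
  %(<script type="application/ld+json">#{JSON.generate(fields)}</script>)
end

example = video_object(
  name: 'SCRF session - 1 - PIP-II',
  thumbnail_url: 'https://stacks-uat.stanford.edu/file/druid:bc169zr6817/bc169zr6817_SRF_AWLC_Oct20-3_thumb.jp2',
  upload_date: '2020-10-20',
  purl_url: 'https://sul-purl-uat.stanford.edu/bc169zr6817',
  embed_base: 'https://embed-uat.stanford.edu'
)
puts json_ld_script(example)
```

Dropping nil fields with `compact` mirrors Example 2, which has no description key at all rather than an empty one.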

@lwrubel lwrubel force-pushed the t824-video-schema branch 4 times, most recently from e6ca5ce to 6c568b4 Compare December 1, 2023 22:12
@lwrubel lwrubel changed the title [WIP] Initial schema.org markup for video objects [DRAFT] Initial schema.org markup for video objects Dec 1, 2023
@lwrubel lwrubel changed the title [DRAFT] Initial schema.org markup for video objects [DRAFT] schema.org markup for video objects Dec 1, 2023
@lwrubel
Contributor Author

lwrubel commented Dec 1, 2023

@arcadiafalcone would you take a look at the specs (especially spec/lib/metadata/schema_dot_org_spec.rb) to see if the Cocina is realistic? If there are other scenarios to include in the test, let me know.

@arcadiafalcone
Collaborator

Is there a schema.org reason for joining title and subtitle with a new line rather than the more usual semicolon?

"name": 'My Dataset\nMore title'

The DOI might be in identifier.uri instead of identifier.value for some objects.

{ "value": "https://doi.org/10.25740/hj293cv5980",

And the ORCID may also be in either identifier.value or identifier.uri - MODS doesn't make the distinction.

"identifier": {"uri": "https://orcid.org/0000-0000-0000-0000"}}]

@lwrubel lwrubel force-pushed the t824-video-schema branch 2 times, most recently from db06769 to 6eb5412 Compare December 5, 2023 14:31
@lwrubel
Contributor Author

lwrubel commented Dec 5, 2023

Thanks @arcadiafalcone. I'm fixing title and subtitle concatenation since I was erroneously using the same approach as for the description. Will use semi-colon to concatenate.

For the DOI in the identifier.uri, would this spec suffice?

context 'with DOI in identifier uri' do

For the ORCID, the current code / specs handle a contributor having identifier.uri or the contributor having identifier.value with type being "orcid". Is there also a case where it could be identifier.value without a type?

@arcadiafalcone
Collaborator

My error, it should be title: subtitle (colon rather than semicolon).
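
A minimal sketch of the agreed "title: subtitle" joining, for illustration only. The helper name `schema_name` is hypothetical, and the assumption that a Cocina title carries its parts in a `structuredValue` array typed `main title` and `subtitle` is not confirmed anywhere in this thread.

```ruby
# Hypothetical helper (not from the PR). Assumes a title hash with either a
# plain 'value' or a 'structuredValue' array of typed parts.
def schema_name(title)
  parts = Array(title['structuredValue'])
  return title['value'] if parts.empty?

  main = parts.find { |part| part['type'] == 'main title' }&.dig('value')
  sub = parts.find { |part| part['type'] == 'subtitle' }&.dig('value')
  # Join with a colon and a space, as recommended above; a missing part is skipped.
  [main, sub].compact.join(': ')
end

title = {
  'structuredValue' => [
    { 'value' => 'My Dataset', 'type' => 'main title' },
    { 'value' => 'More title', 'type' => 'subtitle' }
  ]
}
puts schema_name(title)
```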

@arcadiafalcone
Collaborator

Re: DOI in identifier.uri, so long as it confirms that it is the right domain for a DOI (not all values in this field will be DOIs).

Re: ORCID, identifier.value should always have a type. (If it doesn't, that's a data problem.)
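
The two rules above (accept a DOI only when the value is on the DOI domain, and accept an ORCID from identifier.uri or from a typed identifier.value) could look roughly like this. This is a sketch under stated assumptions, not the merged implementation: the helper names are hypothetical, `doi.org` is taken as the DOI domain from the example in this thread, and matching on `orcid.org` in the URI is an assumption.

```ruby
require 'uri'

# Hypothetical helper: true only for http(s) URLs on the doi.org host.
def doi_url?(string)
  uri = URI.parse(string.to_s)
  uri.is_a?(URI::HTTP) && uri.host == 'doi.org' && uri.path.length > 1
rescue URI::InvalidURIError
  false
end

# Hypothetical helper: checks both identifier.uri and identifier.value,
# since the DOI may be in either field.
def doi_from(identifiers)
  identifiers.each do |identifier|
    [identifier['uri'], identifier['value']].compact.each do |candidate|
      return candidate if doi_url?(candidate)
    end
  end
  nil
end

# Hypothetical helper: a value without a type is treated as a data problem
# and ignored, per the comment above.
def orcid_from(identifier)
  return identifier['uri'] if identifier['uri'].to_s.include?('orcid.org')
  return identifier['value'] if identifier['type'].to_s.casecmp?('orcid')

  nil
end

puts doi_from([{ 'uri' => 'https://example.org/other' },
               { 'uri' => 'https://doi.org/10.25740/hj293cv5980' }])
```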

@lwrubel lwrubel force-pushed the t824-video-schema branch 3 times, most recently from d907b40 to 0f4d2f5 Compare December 6, 2023 17:21
@lwrubel
Contributor Author

lwrubel commented Dec 6, 2023

Thanks, @arcadiafalcone. I've adjusted the specs and code for these.

@lwrubel lwrubel changed the title [DRAFT] schema.org markup for video objects schema.org markup for video objects Dec 6, 2023
@lwrubel lwrubel marked this pull request as ready for review December 6, 2023 17:38
@@ -44,38 +42,58 @@ def dataset?
false
end

def video?
# Only return video metadata if world-downloadable.
video = JsonPath.new("$.description.form[?(@['value'] == 'moving image' && @['type'] == 'resource type')]").on(@cocina_json)
Contributor

Why is this coming from description?

Contributor Author

That was @arcadiafalcone's recommendation: #824 (comment). Is there a different field in the cocina we should consider?

Contributor

Contributor Author

I would expect https://cocina.sul.stanford.edu/models/media to include audio as well, and we only want video. Am I not understanding the type correctly?

Contributor

I'm fairly sure that I'm guessing here. Alternatively, resource's type = "https://cocina.sul.stanford.edu/models/resources/video"

Contributor Author

@arcadiafalcone could you make a recommendation about using a fileset resource type vs the descriptive metadata for videos?

Collaborator

Descriptive metadata is dependent on the user to create it, so the fileset resource type is probably a bit more reliable (it's a core metadata field that's usually provided, but a user could make an error).

Contributor Author

Thanks, that's helpful!

@andrewjbtw andrewjbtw Dec 8, 2023

Is the schema.org expectation that a "video" = streaming video? Or would it include a video file that may be downloadable but isn't presented in a player? If the expectation is streaming, then the fileset resource type is the only guarantee that it will be presented that way.

We may also have description that says something is a video when the video itself hasn't been digitized. I'm not aware of examples but we do have "audio" where only the record label/album cover/liner notes have been digitized.

Contributor Author

I've updated the code to use the fileset resource type for determining it's a video.

end

def access
def access?
Contributor

Shouldn't this take into account the file permissions as well?

Contributor Author

I think I need help understanding if it's possible to have an object with access.download == "world" and the video itself is not world-downloadable. Is that the scenario you're thinking of, @justinlittman?

Contributor

Correct.

It's possible. It's less likely than the inverse case, where the access.download is none but one or more files are world-downloadable.

Contributor Author

@lwrubel lwrubel Dec 8, 2023

I've adjusted the code to look at the access.download permissions for the first file where hasMimeType includes video. It also looks at the object-level rights and for those to be "world". If you think we need to allow those to be download == none, let me know @andrewjbtw

@justinlittman
Contributor

@lwrubel Can a video object have more than one video file?

@lwrubel
Contributor Author

lwrubel commented Dec 7, 2023

@justinlittman Looking at the canonical examples, I see that it is possible to have multiple video files (e.g. https://purl.stanford.edu/yj807zw8315). I'm trying to figure out what makes the most sense for schema.org.

There is an ItemList with limited usage, but it's intended for video carousels more than multi-file objects. Honestly, I'm not sure if it's important to surface thumbnails or embedUrls of more than one video, if on the PURL we currently show the first one, with access to the rest. They share descriptive metadata. (Not sure if there is rights variability among a list of videos, but we could pull the embedURL and thumbnail for the first video that is world-downloadable.)
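
To make the last idea concrete, here is a rough sketch of picking the first world-downloadable video file in a multi-file object. The helper name is hypothetical, and the nesting assumed here (file sets under `structural.contains`, each holding files under its own `structural.contains`, with `hasMimeType` and `access.download` on each file) is an assumption about the Cocina JSON rather than something this thread spells out.

```ruby
# Hypothetical helper (not from the PR): returns the first file whose MIME
# type is a video type and whose download access is 'world', or nil.
def first_world_downloadable_video(cocina)
  filesets = Array(cocina.dig('structural', 'contains'))
  files = filesets.flat_map { |fileset| Array(fileset.dig('structural', 'contains')) }
  files.find do |file|
    file['hasMimeType'].to_s.start_with?('video/') &&
      file.dig('access', 'download') == 'world'
  end
end

# Made-up sample data: the first video is restricted, the second is open.
sample = {
  'structural' => {
    'contains' => [
      { 'structural' => { 'contains' => [
        { 'filename' => 'a.mp4', 'hasMimeType' => 'video/mp4', 'access' => { 'download' => 'none' } }
      ] } },
      { 'structural' => { 'contains' => [
        { 'filename' => 'b.mp4', 'hasMimeType' => 'video/mp4', 'access' => { 'download' => 'world' } }
      ] } }
    ]
  }
}
puts first_world_downloadable_video(sample)['filename']
```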

@lwrubel lwrubel force-pushed the t824-video-schema branch 2 times, most recently from 08aa387 to 97e4daf Compare December 8, 2023 19:10
@lwrubel lwrubel changed the title schema.org markup for video objects [HOLD] schema.org markup for video objects Dec 8, 2023
@lwrubel lwrubel changed the title [HOLD] schema.org markup for video objects schema.org markup for video objects Dec 8, 2023
@lwrubel
Contributor Author

lwrubel commented Dec 8, 2023

Noting here that there seem to be very few videos that meet our criteria for the uploadDate, which requires the object to have an event.date.type == 'publication'. Next week I will run a report to see how few they really are. I don't know if there are other date types that might be logical for this purpose, @arcadiafalcone. Here's Google's description of the field: https://developers.google.com/search/docs/appearance/structured-data/video#video-object.
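
For reference, the publication-date criterion described above could be sketched as follows. The helper name `upload_date` is hypothetical, and the assumed layout (a `description.event` array whose entries each hold a `date` array of typed values) is an assumption about the Cocina JSON.

```ruby
# Hypothetical helper (not from the PR): returns the value of the first
# event date typed 'publication', or nil when the object has none.
def upload_date(cocina)
  events = Array(cocina.dig('description', 'event'))
  dates = events.flat_map { |event| Array(event['date']) }
  dates.find { |date| date['type'] == 'publication' }&.dig('value')
end

# Made-up sample data with one creation date and one publication date.
sample = {
  'description' => {
    'event' => [
      { 'date' => [{ 'value' => '2019', 'type' => 'creation' }] },
      { 'date' => [{ 'value' => '2020-10-20', 'type' => 'publication' }] }
    ]
  }
}
puts upload_date(sample)
```

Objects with only other date types return nil here, which is the scarcity the report would measure.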

@lwrubel
Contributor Author

lwrubel commented Dec 8, 2023

I think I've addressed the comments, and this is otherwise fine to proceed on any further code review. We can tweak the date logic or broaden/narrow criteria further if needed, of course.

Contributor

@justinlittman justinlittman left a comment

Sorry.


def format_specific_fields
if dataset?
return { "identifier": identifier,
"isAccessibleForFree": access?,
Contributor

What is the implication if this is "false"? Does it make sense to include the metadata in those cases?

Contributor Author

Google Dataset Search includes datasets that are not free, or require a DUA. There's a "free" filter. So it's worth including non-free datasets.

Contributor

But why publicize something that we can't provide?

Contributor Author

@amyehodge I am making the assumption that there are SDR-deposited datasets that are not freely available but possible to use, once the user goes through some process (e.g. a DUA or contacting the author). Or that creators of Stanford-only datasets may still want some kind of visibility in Google Dataset Search. But maybe that is not a realistic scenario or something we support?

Collaborator

Yes, we do have datasets like this. One example is https://purl.stanford.edu/sg732vt3619, which has a PURL but in order to gain access you must 1) be a Stanford person and 2) agree to the Data Use Agreement, so the files aren't downloadable from the PURL. We would want this crawled.

Collaborator

Another example here https://purl.stanford.edu/gh587bx9720 where the data files are actually on the PURL.

def video?
# Only return video metadata if world-downloadable.
video = JsonPath.new("$.structural.contains[?(@['type'] == 'https://cocina.sul.stanford.edu/models/resources/video')]").on(@cocina_json)
return true if video.any? && access? && video_access?
Contributor

Can you name the methods such that it is clearer how access? is different from video_access??

Also, the method name video? suggests that it answers the question whether this is a video. But this method does more than that. Perhaps "render_video_metadata?" or similar.

Contributor Author

Changing access? to object_access?. Changing video? to render_video_metadata?
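
With the renamed methods, the three checks discussed in this review (the object has a video file set, the object is world-downloadable, and the video file itself is world-downloadable) can be sketched without JsonPath as below. This is an illustration with plain Hash traversal, not the merged code: each method takes the Cocina hash as an argument here, and the nested `structural.contains` layout for files is an assumption.

```ruby
# Sketch only. File sets typed as video resources, per the review discussion.
def video_filesets(cocina)
  Array(cocina.dig('structural', 'contains')).select do |fileset|
    fileset['type'] == 'https://cocina.sul.stanford.edu/models/resources/video'
  end
end

# Object-level rights: the object as a whole is world-downloadable.
def object_access?(cocina)
  cocina.dig('access', 'download') == 'world'
end

# File-level rights: the first file with a video MIME type is world-downloadable.
def video_access?(cocina)
  files = video_filesets(cocina).flat_map { |fileset| Array(fileset.dig('structural', 'contains')) }
  first_video = files.find { |file| file['hasMimeType'].to_s.include?('video') }
  first_video&.dig('access', 'download') == 'world'
end

# Video metadata is rendered only when all three conditions hold.
def render_video_metadata?(cocina)
  video_filesets(cocina).any? && object_access?(cocina) && video_access?(cocina)
end
```

Each predicate returns the comparison directly instead of using `return true if ...`, so a failed check yields false rather than nil.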

end

def schema_type?
dataset?
dataset? || video?
Contributor

This method is poorly named since it is taking into account rights as well as the schema type.

Contributor Author

Changed to render_video_metadata?

# need to find the file that is the one for the video (based on mime-type). Then get the access and download rights for that.
file_access = JsonPath.new('$[*].structural.contains[*][?(@.hasMimeType =~ /video/)].access.download').first(video)

return true if file_access == 'world'
Contributor

L108-110 could just be file_access == 'world'

Contributor Author

Oh, right. Done here and in dataset?

@lwrubel lwrubel merged commit db48b44 into main Dec 11, 2023
1 check passed
@lwrubel lwrubel deleted the t824-video-schema branch December 11, 2023 20:18
Development

Successfully merging this pull request may close these issues.

Add schema.org markup for Videos
5 participants