Filter OAI output #680

Open
2 tasks
mlhale7 opened this issue Aug 23, 2024 · 0 comments


mlhale7 commented Aug 23, 2024

Story

The OAI-PMH feed currently includes attachments that we never want to share: OCR, HOCR, TEI, transcript, and OBJ files. Only the parent "Work" should be included in the feed; every file attached to a work should be left out.

In addition to this filtering, the OAI-PMH record for each work must either include links to the object and to its thumbnail image directly, or include an identifier from which those links can be generated.

Acceptance Criteria

  • Only parent works are present in the OAI-PMH feed
  • Every work in the feed has a link to the work record and a link to the thumbnail image, structured as they will be on the live site, in all metadata formats (both oai_dc and mods)

Screenshots / Video

Here is an example of the object and thumbnail links in MODS on Islandora for reference so that this can be easily added:

<location>
     <physicalLocation valueURI="http://id.loc.gov/authorities/names/no2014027633">University of Tennessee, Knoxville. Special Collections</physicalLocation>
     <url access="object in context" usage="primary display">https://digital.lib.utk.edu/collections/islandora/object/acwiley%3A319</url>
     <url access="preview">https://digital.lib.utk.edu/collections/islandora/object/acwiley%3A319/datastream/TN/view</url>
</location>

Note that <physicalLocation> has to come before <url> for the MODS to follow the schema - https://www.loc.gov/standards/mods/userguide/location.html
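
As a rough illustration of the ordering constraint above, the following Python sketch builds a `<location>` element with `<physicalLocation>` emitted before both `<url>` elements. The function name `build_location` and its parameters are hypothetical, the element is left un-namespaced to match the sample above, and nothing here reflects how Hyku actually generates its MODS.

```python
import xml.etree.ElementTree as ET


def build_location(physical_location, authority_uri, object_url, thumbnail_url):
    """Return a MODS-style <location> element with children in schema order.

    Hypothetical helper: all four values are supplied by the caller.
    """
    location = ET.Element("location")

    # <physicalLocation> is appended first so it precedes every <url>.
    phys = ET.SubElement(location, "physicalLocation", {"valueURI": authority_uri})
    phys.text = physical_location

    # Link to the work record as displayed on the site.
    obj = ET.SubElement(
        location, "url", {"access": "object in context", "usage": "primary display"}
    )
    obj.text = object_url

    # Link to the thumbnail image.
    thumb = ET.SubElement(location, "url", {"access": "preview"})
    thumb.text = thumbnail_url

    return location


if __name__ == "__main__":
    # Values copied from the Islandora example above.
    base = "https://digital.lib.utk.edu/collections/islandora/object/acwiley%3A319"
    element = build_location(
        "University of Tennessee, Knoxville. Special Collections",
        "http://id.loc.gov/authorities/names/no2014027633",
        base,
        base + "/datastream/TN/view",
    )
    print(ET.tostring(element, encoding="unicode"))
```

Passing the URLs in as arguments keeps the sketch independent of how the live site will structure its work and thumbnail links, which this issue leaves open.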

Here are samples of records that should NOT appear in the feed:

    <record>
        <header>
            <identifier>oai:hyku:24ac22aa-106b-4a88-a346-9e264d13d972</identifier>
            <datestamp>2023-09-01T03:59:26Z</datestamp>
            <setSpec>collection:admin_set/default</setSpec>
        </header>
        <metadata>
            <oai_dc:dc
                xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
                xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                xmlns:dc="http://purl.org/dc/elements/1.1/"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
                <dc:publisher>utk</dc:publisher>
                <dc:rights>http://rightsstatements.org/vocab/InC/1.0/</dc:rights>
                <dc:title>504 error (shana)</dc:title>
            </oai_dc:dc>
        </metadata>
    </record>
    <record>
        <header>
            <identifier>oai:hyku:9a77b15d-554d-4dfc-a49f-09fbcee8118c</identifier>
            <datestamp>2024-07-22T23:24:01Z</datestamp>
            <setSpec>collection:admin_set/default</setSpec>
        </header>
        <metadata>
            <oai_dc:dc
                xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
                xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                xmlns:dc="http://purl.org/dc/elements/1.1/"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
                <dc:description>OCR for alumnus:1507298774</dc:description>
                <dc:title>OCR</dc:title>
            </oai_dc:dc>
        </metadata>
    </record>
    <record>
        <header>
            <identifier>oai:hyku:2f4595f1-7611-490a-b4b6-94336055d037</identifier>
            <datestamp>2024-07-22T23:23:58Z</datestamp>
            <setSpec>collection:admin_set/default</setSpec>
        </header>
        <metadata>
            <oai_dc:dc
                xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
                xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                xmlns:dc="http://purl.org/dc/elements/1.1/"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
                <dc:description>TRANSCRIPT for jsevier:9</dc:description>
                <dc:title>TRANSCRIPT</dc:title>
            </oai_dc:dc>
        </metadata>
    </record>
    <record>
        <header>
            <identifier>oai:hyku:f47c15e5-5055-44b7-9b55-526b3e3bfc68</identifier>
            <datestamp>2024-07-22T23:23:58Z</datestamp>
            <setSpec>collection:admin_set/default</setSpec>
        </header>
        <metadata>
            <oai_dc:dc
                xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
                xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                xmlns:dc="http://purl.org/dc/elements/1.1/"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
                <dc:description>TEI for jsevier:9</dc:description>
                <dc:title>TEI</dc:title>
            </oai_dc:dc>
        </metadata>
    </record>
    <record>
        <header>
            <identifier>oai:hyku:1211e8b1-040f-439d-b5d0-f1ad54f42e8d</identifier>
            <datestamp>2024-07-22T23:23:59Z</datestamp>
            <setSpec>collection:admin_set/default</setSpec>
        </header>
        <metadata>
            <oai_dc:dc
                xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
                xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                xmlns:dc="http://purl.org/dc/elements/1.1/"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
                <dc:description>OBJ for jsevier:25</dc:description>
                <dc:title>OBJ</dc:title>
            </oai_dc:dc>
        </metadata>
    </record>
    <record>
        <header>
            <identifier>oai:hyku:72f7889f-da05-4363-a42a-c74354351672</identifier>
            <datestamp>2024-07-22T23:24:00Z</datestamp>
            <setSpec>collection:admin_set/default</setSpec>
        </header>
        <metadata>
            <oai_dc:dc
                xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
                xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                xmlns:dc="http://purl.org/dc/elements/1.1/"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
                <dc:description>OBJ for jsevier:2</dc:description>
                <dc:title>OBJ</dc:title>
            </oai_dc:dc>
        </metadata>
    </record>
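
One way to spot-check the first acceptance criterion against a harvested `ListRecords` response is sketched below in Python. It assumes attachment records can be recognized by a `dc:title` equal to the file type, as in the OCR, TRANSCRIPT, TEI, and OBJ samples above; the `HOCR` title is a guess, and the helper name `find_attachment_records` and the identifiers in `SAMPLE` are made up. A title match would not catch the first sample ("504 error (shana)"), so this is only a partial check on the feed, not the filter itself.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Assumption: attachments carry their file type as the title.
ATTACHMENT_TITLES = {"OCR", "HOCR", "TEI", "TRANSCRIPT", "OBJ"}


def find_attachment_records(oai_xml):
    """Return (identifier, title) pairs for records that look like attachments."""
    root = ET.fromstring(oai_xml)
    flagged = []
    for record in root.iter("{%s}record" % NS["oai"]):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        for title in record.iterfind(".//dc:title", NS):
            text = (title.text or "").strip()
            if text.upper() in ATTACHMENT_TITLES:
                flagged.append((identifier, text))
                break  # one match per record is enough
    return flagged


# Made-up response with one attachment-like record and one parent work.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:hyku:example-attachment</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>OCR</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:hyku:example-work</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example parent work</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    for identifier, title in find_attachment_records(SAMPLE):
        print(identifier, title)
```

An empty result for a full harvest would be consistent with "only parent works are present", though it would not prove it.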

Testing Instructions and Sample Files

Notes

@mlhale7 mlhale7 added this to utk-hyku Aug 23, 2024
@mlhale7 mlhale7 converted this from a draft issue Aug 23, 2024
Projects
Status: Ready for Development