Item identifiers including filename extensions #109

Open
ghukill opened this issue Jan 31, 2025 · 1 comment
Labels
question Further information is requested


ghukill commented Jan 31, 2025

A question-and-response comment in a recent PR touches on this: #105 (comment).

Two interrelated questions emerge for me:

  1. Should item identifiers in DSC ever contain filename extensions (e.g. ".pdf")?
  2. If they do include them, do these directly become identifiers anywhere in DSpace?

I understand that stakeholders, at least in the past, have included filename extensions in some kind of Item Identifier column in CSVs used for uploads. If the answer to #2 above is "No", then it probably doesn't matter; it sounds like it's a 100% internal identifier that is gone after DSC and DSS ingest the item.

Another question, though: how does this scale for items that may have multiple files? Suppose it's a SimpleCSV-style workflow and there are 3 PDFs. Is this when we get into juggling _01 suffixes in the filenames? And is that suffix somehow stripped from the item identifier?

ghukill added the question label Jan 31, 2025
@ehanson8
Contributor

Ahhh, I get your concern now. #2 is a no: it is purely for DSC and DSS, and it is never written to the metadata.

create_dspace_metadata even has this line to ensure it never goes in:

        for field_name, field_mapping in self.metadata_mapping.items():
            if field_name not in ["item_identifier"]:

                field_value = item_metadata.get(field_mapping["source_field_name"])
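
For anyone reading along, here is a self-contained sketch of that exclusion. It assumes a mapping shaped like the one the excerpt implies (each entry carries a `source_field_name`). The function name `build_metadata`, the sample mapping, and the output shape are hypothetical and are not DSC's actual code:

```python
def build_metadata(metadata_mapping, item_metadata):
    """Collect mapped field values, skipping the internal item identifier.

    Hypothetical helper for illustration; not the real create_dspace_metadata.
    """
    entries = []
    for field_name, field_mapping in metadata_mapping.items():
        # The item identifier is only used internally, so it is never emitted.
        if field_name == "item_identifier":
            continue
        value = item_metadata.get(field_mapping["source_field_name"])
        if value:
            entries.append({"key": field_name, "value": value})
    return entries


# Assumed sample data: one internal identifier column and one real field.
mapping = {
    "item_identifier": {"source_field_name": "item_id"},
    "dc.title": {"source_field_name": "title"},
}
row = {"item_id": "D123.pdf", "title": "Example title"}

metadata = build_metadata(mapping, row)
print(metadata)
```

Even with `D123.pdf` in the identifier column, only the title entry ends up in the result.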

The item identifier in the metadata CSV can be D123.pdf, but it is just as often D123 when multiple files are expected (e.g. D123_01.pdf, D123_verso.pdf). Checking that the item identifier is "in" the file name just happens to work in both cases.
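
A rough sketch of why the "in" check covers both styles of identifier. The helper `files_for_item` and the file list are made up for illustration and may not match how DSC actually does the lookup:

```python
def files_for_item(item_identifier, filenames):
    """Return the filenames that contain the item identifier as a substring."""
    return [name for name in filenames if item_identifier in name]


# Assumed example file listing.
filenames = ["D123.pdf", "D123_01.pdf", "D123_verso.pdf", "D456.pdf"]

# Identifier with an extension: matches only the single file of that name.
with_ext = files_for_item("D123.pdf", filenames)

# Identifier without an extension: matches every file sharing the prefix.
without_ext = files_for_item("D123", filenames)

print(with_ext)
print(without_ext)
```

One caveat of a plain substring check: an identifier like D123 would also match a file named D1234.pdf, so identifiers need to be distinct enough that one is not contained in another item's filenames.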

Does that make sense?
