Use a common data package format to deliver data #88

Open
RKrahl opened this issue Aug 28, 2018 · 0 comments
Labels
enhancement: New feature or request
idea: An idea that might need some discussion first

Comments

@RKrahl
Member

RKrahl commented Aug 28, 2018

At the moment, the getData call returns either a single file or a ZIP file containing multiple files. In the latter case, the ZIP file contains only the requested files and the path of datafiles within the ZIP file is determined by the zipMapper class in the plugin.

The present feature request suggests using a standardized package format whenever a ZIP file is returned. Such a package should contain a minimum set of metadata along with the files. There are different efforts to define a common package format:

  • BagIt, which is a draft IETF specification.
  • A recommendation from the RDA Research Data Repository Interoperability WG defines a package format that is based on BagIt but adds a few more requirements on the included metadata.
  • There was an "Approaches to Research Data Packaging" BoF meeting at the last RDA plenary in March, with the goal of starting another group on data packaging in RDA. I don't know whether such a group is going to be established, though. The session page links to more existing package formats.

In general, these package formats can be serialized as ZIP files, so this would fit into the scheme of the getData call. The advantage would be that some metadata is included in the returned data, so that there is a chance to understand what this blob of data is supposed to be. The metadata also includes manifest files with checksums, so that the receiving end can check the integrity of the data. Another advantage would be improved interoperability with other tools and repositories that understand the package format. The drawback would be a little more effort in preparing the package and slightly larger downloads. I would estimate that the difference is negligible compared to the size of the original data, though.
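To make the idea more concrete, here is a minimal sketch in Python of what a BagIt-style package serialized as a ZIP file could look like, and how a client might check it against the manifest. It is not taken from the server code: the function names `build_bag_zip` and `verify_bag_zip`, the top-level directory name `bag`, the choice of SHA-256, and the `bag-info.txt` entries are all assumptions for illustration. The layout (a `bagit.txt` declaration, a `data/` payload directory, and a `manifest-sha256.txt` tag file) follows my reading of the BagIt specification, and the sketch does not cover every rule of the specification.

```python
import hashlib
import io
import zipfile


def build_bag_zip(files, bag_name="bag", info=None):
    """Return the bytes of a ZIP-serialized, BagIt-style bag.

    files: dict mapping a relative payload path to its content (bytes).
    info:  optional dict of bag-info.txt entries (hypothetical metadata).
    """
    manifest_lines = []
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Bag declaration; the version number is an assumption.
        zf.writestr(
            f"{bag_name}/bagit.txt",
            "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        )
        for path, content in sorted(files.items()):
            # Payload files live below data/ inside the bag.
            payload_path = f"data/{path}"
            zf.writestr(f"{bag_name}/{payload_path}", content)
            digest = hashlib.sha256(content).hexdigest()
            manifest_lines.append(f"{digest}  {payload_path}\n")
        # Payload manifest: one "<checksum>  <path>" line per file.
        zf.writestr(f"{bag_name}/manifest-sha256.txt", "".join(manifest_lines))
        if info:
            text = "".join(f"{key}: {value}\n" for key, value in info.items())
            zf.writestr(f"{bag_name}/bag-info.txt", text)
    return buf.getvalue()


def verify_bag_zip(blob, bag_name="bag"):
    """Return the payload paths whose checksum differs from the manifest."""
    mismatched = []
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        manifest = zf.read(f"{bag_name}/manifest-sha256.txt").decode("utf-8")
        for line in manifest.splitlines():
            digest, payload_path = line.split(maxsplit=1)
            content = zf.read(f"{bag_name}/{payload_path}")
            if hashlib.sha256(content).hexdigest() != digest:
                mismatched.append(payload_path)
    return mismatched
```

A client receiving such a download could call `verify_bag_zip(blob)` and treat an empty list as "all payload checksums match". The extra cost per package is a few small text files plus one hash computation per datafile, which is the overhead referred to above.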

RKrahl added the enhancement and idea labels on Aug 28, 2018