
BDBag release 1.5.0

mikedarcy released this 18 Oct 23:09

Release Notes

Milestone feature release

  • Added the materialize CLI command and API function. The materialize function is essentially a bag bootstrapper: when invoked, it attempts to fully reconstitute a bag by performing a sequence of actions that depends on the input path parameter. If path is an actionable URL or a URI of a resolvable identifier scheme, the file it references is first downloaded to the current directory. Next, if the path value (or the previously downloaded file) is a local path to a supported archive format, the archive is extracted to the current directory. Then, if the path value (or the previously extracted directory) is a valid bag directory, the function attempts to resolve any remote file references listed in the bag's fetch.txt file. Finally, full validation is run on the materialized bag. If any of these steps fails, an error is raised.

  • Refactored identifier resolution into a modular plug-in system. Added support for DOI and DataGUID identifier schemes in addition to existing ARK/Minid schemes. Additional schemes can be supported by creating a compliant "plug-in" resolver class and configuring it via the bdbag.json configuration file.

  • BagIt specification version compliance is now configurable. The default specification version is 0.97, which permits heterogeneous mixing of checksums in bag payload manifests. Fixes #27 and reverts the restriction introduced in release 1.3.0.

  • Implemented cloud storage fetch transports for access to secured Amazon S3 and Google Cloud Storage (GCS) via the boto3 library. GCS bucket and object access via boto3 is only supported when the target GCS bucket is set to "interoperability mode". The boto3 library is an optional runtime dependency and need only be installed if automatic download of S3 or GS URLs from fetch.txt entries is desired. Various parameters controlling this fetch handler are exposed via the bdbag.json configuration file and can be tuned accordingly. Fixes #25.

  • Numerous improvements to the HTTP fetch handler:

    • Support for "Authorization" header based authentication via the keychain.json configuration file. This authentication mode allows for Bearer Token authentication scenarios such as those used in OAuth 2.0 authorization flows.
    • Improved handling for cookie-based authentication. Added a configurable mechanism that scans for multiple Mozilla/Netscape/CURL/WGET compatible cookie files, merges them, and automatically uses them in outbound HTTP fetch requests.
    • Exposed some of the requests module's session parameters in the bdbag.json configuration file. This allows for tuning such values as connect/read retry count, backoff factor, and the status code retry forcelist, along with the option of disabling automatic redirect following.

  • Refactored bdbag.json configuration file processing into a separate module and significantly increased the scope of the configuration file. Added a basic mechanism for versioning the configuration file and upgrading existing config files to newer versions while preserving forward-compatible configuration settings, when possible.

  • Improved unit test coverage.

  • Updated documentation.
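
The order of operations described for materialize above can be sketched roughly as follows. This is only an illustration of the control flow, not bdbag's implementation: the function name `materialize_sketch` and every callable it takes are hypothetical stand-ins for the download, extract, fetch, and validate steps.

```python
def materialize_sketch(path, *, is_remote, download, is_archive, extract,
                       is_bag, resolve_fetch, validate):
    """Run the materialize steps in order; any step may raise to abort."""
    # Step 1: a URL or resolvable identifier is downloaded first.
    if is_remote(path):
        path = download(path)
    # Step 2: a local path to a supported archive is extracted.
    if is_archive(path):
        path = extract(path)
    # Step 3: for a bag directory, remote references in fetch.txt are resolved.
    if is_bag(path):
        resolve_fetch(path)
    # Step 4: full validation runs on the result.
    validate(path)
    return path
```

Because each step is just a call that may raise, a failure at any point propagates to the caller, which mirrors the "an error is raised" behavior in the notes.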
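
To give a feel for the modular identifier resolution mentioned above, here is a minimal sketch of a scheme-based resolver lookup. The class and function names (`ResolverSketch`, `DoiResolverSketch`, `find_resolver`) are invented for illustration and are not bdbag's plug-in interface; the only outside fact assumed is that a DOI name can be dereferenced through the public `https://doi.org/` proxy.

```python
class ResolverSketch:
    """Hypothetical base class: one resolver handles one or more schemes."""
    schemes = ()

    def resolve(self, identifier):
        raise NotImplementedError


class DoiResolverSketch(ResolverSketch):
    schemes = ("doi",)

    def resolve(self, identifier):
        # Map "doi:<name>" to a URL on the public DOI proxy.
        name = identifier.partition(":")[2]
        return "https://doi.org/" + name


def find_resolver(identifier, resolvers):
    """Pick the first configured resolver that claims the identifier's scheme."""
    scheme = identifier.partition(":")[0].lower()
    for resolver in resolvers:
        if scheme in resolver.schemes:
            return resolver
    raise ValueError("no resolver configured for scheme %r" % scheme)
```

Supporting another scheme in this sketch means adding one more subclass to the list passed to `find_resolver`, which is the general idea behind a plug-in design.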
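
For the cloud storage transports noted above, the sketch below shows one way an `s3://` or `gs://` URL from fetch.txt might be split into the pieces an S3-style client needs. The helper `parse_bucket_url` is hypothetical; the GCS endpoint shown is the public XML API host, which accepts S3-style requests when interoperability access is enabled.

```python
from urllib.parse import urlparse

# Public GCS XML API endpoint, usable by S3-style clients in interoperability mode.
GCS_ENDPOINT = "https://storage.googleapis.com"


def parse_bucket_url(url):
    """Split an s3:// or gs:// URL into bucket, object key, and endpoint override."""
    parts = urlparse(url)
    if parts.scheme not in ("s3", "gs"):
        raise ValueError("unsupported scheme: %r" % parts.scheme)
    return {
        "bucket": parts.netloc,
        "key": parts.path.lstrip("/"),
        # None means "use the client's default Amazon S3 endpoint".
        "endpoint_url": GCS_ENDPOINT if parts.scheme == "gs" else None,
    }
```

With boto3 installed, values like these could be handed to `boto3.client("s3", endpoint_url=...)` and then `client.download_file(bucket, key, filename)`; credentials and the handler's actual configuration keys are outside the scope of this sketch.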
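
The cookie file merging described for the HTTP fetch handler can be approximated with the standard library alone. The function `merge_cookie_files` below is an illustrative assumption, not bdbag code: it loads several Mozilla/Netscape-format cookie files into a single jar and skips any file that is missing or unreadable.

```python
from http.cookiejar import LoadError, MozillaCookieJar


def merge_cookie_files(paths):
    """Load every readable Netscape-format cookie file into one jar."""
    jar = MozillaCookieJar()
    for path in paths:
        try:
            # load() adds to the jar; cookies with the same domain, path,
            # and name are overwritten by the file loaded later.
            jar.load(path, ignore_discard=True, ignore_expires=True)
        except (OSError, LoadError):
            continue  # missing or malformed file: skip it in this sketch
    return jar
```

A jar built this way can be attached to an HTTP session so that its cookies are sent on outbound requests, which is the effect the release note describes.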
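
The configuration upgrade mechanism mentioned above could work along these lines: start from the new version's defaults and carry over only those old settings that still exist with a compatible type. The function `upgrade_config` and the `config_version` key are hypothetical names chosen for this sketch and may not match the actual bdbag.json layout.

```python
import copy


def upgrade_config(old, defaults):
    """Return a config based on the new defaults, keeping compatible old values."""
    new = copy.deepcopy(defaults)
    for key, value in old.items():
        if key == "config_version" or key not in defaults:
            continue  # keep the new version marker; drop settings the new schema lacks
        if isinstance(value, dict) and isinstance(defaults[key], dict):
            new[key] = upgrade_config(value, defaults[key])  # merge nested sections
        elif type(value) is type(defaults[key]):
            new[key] = value  # same type as the default: treat as still valid
    return new
```

For example, upgrading `{"config_version": "1.0", "retries": 3, "obsolete": True}` against defaults of `{"config_version": "2.0", "retries": 5}` keeps the user's `retries` value, drops `obsolete`, and takes the version marker from the defaults.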