Clean up DataPackage to address issues with submitting datasets #2567

Open
1 of 10 tasks
robyngit opened this issue Nov 13, 2024 · 0 comments

robyngit commented Nov 13, 2024

We've been encountering persistent issues with the handling of dataset submission errors. The DataPackage collection is a central component of this process, but in the 7+ years since its creation, it has accumulated issues that may be contributing to the problems we're seeing. Cleaning up the DataPackage will help us improve error handling and make it easier to add unit tests and resource map validation.

Assessment of DataPackage:

  • The DataPackage is a Backbone Collection being used like a Model, which complicates tracking and responding to property changes.
  • Both DataPackage and PackageModel have methods that handle serialization of resource maps.
  • There's no validation of resource maps before they are serialized and saved. Since Metacat also does not validate resource maps, invalid resource maps can be saved without error, which leads to broken datasets (the "missing files" issue).
  • Some errors during the save process are not caught or communicated to the user, which results in the endless spinner issue.
  • The rdflib dependency has not been updated in over 7 years.
  • The DataPackage contains methods that are incomplete or unused (e.g. transferQueue).
  • The DataPackage was intended to replace the older PackageModel, but this transition was never fully completed. Some functionality still depends on PackageModel, which remains in use in some places.
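
To illustrate the missing validation step described above, here is a minimal sketch of a pre-save check. It assumes the resource map has already been parsed into a plain object; the `aggregates` and `documents` field names and the `validateResourceMap` function are hypothetical and are not part of MetacatUI or rdflib.

```javascript
// Hypothetical, simplified stand-in for a parsed resource map:
//   aggregates: array of IDs the map aggregates
//   documents:  { metadataId: [dataId, ...] }
// These field names are assumptions for illustration only.
function validateResourceMap(resourceMap, memberIds) {
  const errors = [];
  const aggregated = new Set(resourceMap.aggregates);
  const members = new Set(memberIds);

  // Every member of the package must be aggregated by the resource map.
  for (const id of members) {
    if (!aggregated.has(id)) {
      errors.push(`Member "${id}" is not aggregated by the resource map.`);
    }
  }

  // Every aggregated ID must correspond to a known package member.
  for (const id of aggregated) {
    if (!members.has(id)) {
      errors.push(`Aggregated ID "${id}" is not a package member.`);
    }
  }

  // Both sides of every "documents" statement must be aggregated.
  for (const [metadataId, dataIds] of Object.entries(resourceMap.documents)) {
    for (const id of [metadataId, ...dataIds]) {
      if (!aggregated.has(id)) {
        errors.push(`"documents" statement references unaggregated ID "${id}".`);
      }
    }
  }

  return { valid: errors.length === 0, errors };
}

// Usage: a map that omits one data file is rejected before it is saved.
const report = validateResourceMap(
  { aggregates: ["eml-1", "data-1"], documents: { "eml-1": ["data-1"] } },
  ["eml-1", "data-1", "data-2"],
);
console.log(report.valid, report.errors);
```

A check along these lines would let the editor refuse to serialize a map that omits a file, rather than relying on Metacat to reject it.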

Where DataPackage vs PackageModel are in use:

  • DataPackage (to clean up)

    • EML211EditorView
  • PackageModel (to deprecate)

    • SearchResultView
    • DownloadButtonView
    • PackageTableView (deprecated)
  • Both DataPackage & PackageModel

    • DataPackageView
    • MetadataView

During the cleanup, we will:

  • Add validation for resource maps before submission.
  • Improve error detection and handling during the save process.
  • Refactor DataPackage as a Model containing a Collection of DataONEObject models (EML/ScienceMetadata, DataObjects, and nested DataPackages).
  • Separate system metadata into its own model.
  • Fully transition from PackageModel to DataPackage, then deprecate PackageModel.
  • Fix all linting errors and warnings.
  • Update rdflib to the latest version.
  • Make sure we don't set multiple listeners on the same event (remove listeners before re-adding them).
  • Write unit tests for DataPackage, at least for the most critical parts.
  • Simplify and modularize complex methods.
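
As a rough sketch of the improved error handling in the save process, the wrapper below guarantees that the spinner is always dismissed, whether the save succeeds, fails, or never responds. The names `saveWithFeedback`, `saveFn`, and the `ui` object are hypothetical placeholders, and the timeout is an added assumption rather than something the current code is known to do.

```javascript
// Hypothetical helper: `saveFn` stands in for whatever performs the save and
// returns a promise; `ui` stands in for whatever shows the spinner and
// messages. Neither name comes from MetacatUI.
async function saveWithFeedback(saveFn, ui, timeoutMs = 30000) {
  ui.showSpinner();
  let timer;
  // Reject if the save never settles, so the user is not left waiting.
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Save timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  try {
    // Synchronous throws from saveFn() are also caught by this try block.
    const result = await Promise.race([saveFn(), timeout]);
    ui.showSuccess();
    return { ok: true, result };
  } catch (error) {
    // Surface every failure to the user instead of swallowing it.
    ui.showError(String(error?.message ?? error));
    return { ok: false, error };
  } finally {
    // Runs on success, failure, and timeout alike.
    clearTimeout(timer);
    ui.hideSpinner();
  }
}
```

Routing every save through a single function like this would also give unit tests one place to assert that each failure path ends with an error message and a hidden spinner.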
@robyngit robyngit self-assigned this Nov 13, 2024
robyngit added a commit that referenced this issue Nov 13, 2024
robyngit added a commit that referenced this issue Nov 13, 2024
robyngit added a commit that referenced this issue Nov 14, 2024
In MetadataView. Necessary because we removed two unused props from DataPackage.

Issue #2567
robyngit added a commit that referenced this issue Dec 18, 2024
- To handle file uploads, downloads, transformations, etc. more robustly

Issue #2567