This repository has been archived by the owner on Apr 5, 2024. It is now read-only.

Provide Tabular Data Packages as a download #222

Closed
Stephen-Gates opened this issue Apr 28, 2018 · 5 comments


@Stephen-Gates

While there are lots of interesting extras in the datapackage.zip provided for download, it goes beyond the basic tabular data package specification, which prevents the file from being opened and unpacked by apps such as Data Curator.

Please consider providing Tabular Data Packages for download.

@rufuspollock
Member

rufuspollock commented Apr 28, 2018

@Stephen-Gates can you detail exactly what is in there that prevents it from being opened? We of course want it openable in Data Curator 😄

@Stephen-Gates
Author

@rufuspollock we check for "one, and only one json file" in the package at #L36 in importPackage.js

When I download the zip from https://datahub.io/core/currency-codes#data and open it in Data Curator, I get:

screenshot 2018-04-29 06 24 15

@mattRedBox the datapackage.json contains some package properties that are part of the spec that we currently don't support, e.g.

screenshot 2018-04-29 06 30 48

and some others that aren't in the spec, e.g.

screenshot 2018-04-29 06 31 41

Other metadata extensions and variations from a tabular data package in the datapackage.json, which I'm not sure how we handle, include:

  • at the data package level:
    • a readme property (assume Data Curator would ignore)
    • no "profile": "tabular-data-package"
    • includes a data package view (assume Data Curator would ignore)
  • at the data resource level:
    • datahub property (assume Data Curator would ignore)
    • a json data resource "profile": "data-resource"
    • a csv data resource is also "profile": "data-resource" not "profile": "tabular-data-resource"

Matt, if we got past the "one, and only one json file" error, I assume we'd just drop the above properties until we support them (see Data Curator #730, 419). Would Data Curator object to anything else?
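To make the "just drop the above properties" idea concrete, here is a minimal Python sketch (Data Curator itself is JavaScript, so this only illustrates the approach, not the app's code). The allow-lists `SUPPORTED_PACKAGE_KEYS` and `SUPPORTED_RESOURCE_KEYS`, the function name `drop_unsupported`, and the example descriptor are all hypothetical; the real set of properties Data Curator supports would need to be checked against the app.

```python
import copy

# Hypothetical allow-lists for illustration only; they are NOT taken from
# Data Curator and would need to match what the app really supports.
SUPPORTED_PACKAGE_KEYS = {"name", "title", "profile", "licenses", "resources"}
SUPPORTED_RESOURCE_KEYS = {"name", "path", "format", "profile", "schema"}


def drop_unsupported(descriptor: dict) -> dict:
    """Return a copy of the descriptor keeping only allow-listed properties."""
    # Keep allow-listed package-level properties; deep-copy so the input
    # descriptor is never mutated.
    cleaned = {
        key: copy.deepcopy(value)
        for key, value in descriptor.items()
        if key in SUPPORTED_PACKAGE_KEYS
    }
    # Apply the same filter to every resource.
    cleaned["resources"] = [
        {key: value for key, value in resource.items()
         if key in SUPPORTED_RESOURCE_KEYS}
        for resource in cleaned.get("resources", [])
    ]
    return cleaned


# Made-up descriptor loosely modelled on the properties listed above.
EXAMPLE = {
    "name": "currency-codes",
    "readme": "Some readme text",
    "views": [{"name": "example-view"}],
    "resources": [
        {
            "name": "example-csv",
            "path": "data/example.csv",
            "profile": "data-resource",
            "datahub": {"type": "derived/csv"},
        }
    ],
}

if __name__ == "__main__":
    print(drop_unsupported(EXAMPLE))
```

Note that this sketch silently discards the extra properties, so re-exporting the package would lose them; whether that is acceptable is part of the question above.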

Given the spec says,

A Data Package descriptor MUST be a valid JSON object. (JSON is defined in RFC 4627). When available as a file it MUST be named datapackage.json and it MUST be placed in the top-level directory (relative to any other resources provided as part of the data package).

We should be able to locate the datapackage.json in the zip file and ignore the other json resources.
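As a rough illustration of that lookup, the following Python sketch opens a zip and reads only the `datapackage.json` at the archive root, ignoring any other JSON files. It assumes the descriptor sits at the root of the archive (not inside a wrapping folder); the function names and the file names in the example archive are made up for the demonstration and are not the actual contents of the DataHub download.

```python
import io
import json
import zipfile

DESCRIPTOR_NAME = "datapackage.json"


def find_descriptor(zip_bytes: bytes) -> dict:
    """Parse the top-level datapackage.json, ignoring other JSON members."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Member names are archive-relative paths, so an exact match on
        # "datapackage.json" means the file is in the top-level directory.
        if DESCRIPTOR_NAME not in archive.namelist():
            raise ValueError("no datapackage.json in the top-level directory")
        with archive.open(DESCRIPTOR_NAME) as handle:
            return json.load(handle)


def build_example_zip() -> bytes:
    """Build an in-memory zip with a descriptor plus an extra JSON resource."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(DESCRIPTOR_NAME, json.dumps({"name": "currency-codes"}))
        archive.writestr("data/example.csv", "code,name\nAUD,Australian Dollar\n")
        archive.writestr("data/example.json", "[]")  # second JSON file
    return buffer.getvalue()


if __name__ == "__main__":
    print(find_descriptor(build_example_zip()))
```

With this approach the second JSON file no longer matters, because the descriptor is identified by its required name and location rather than by being the only `.json` file in the package.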

@ghost

ghost commented Apr 30, 2018

Hi @Stephen-Gates
If you need me to change the behaviour of the app, I can certainly do that. My understanding from what you have told me is that we want Data Curator to be opinionated, i.e. to enforce what the spec says. From reading the snippet you provided, it doesn't say specifically that there can be only one JSON file, but the spec also uses "it"/"descriptor", i.e. the singular, which I interpret to mean there should be only one. If that's not correct, some clarity in the spec would help. Otherwise just let me know, and I can change the behaviour in a later sprint if there's room.

I would have to check what happens with the other properties. Much of the time we just pull them in or push them out and ignore what's not needed, but there are also places where properties are completely reset, so I'm not 100% sure off the top of my head. Set a reminder for me somewhere and I can look into it with an example zip to check.

@ghost

ghost commented Apr 30, 2018

ok just saw the issue raised. Understood.

@Stephen-Gates
Author

@mattRedBox we are opinionated and a tabular data package only has one JSON file in the spec.

Let’s put it on the Data Curator backlog and hope DataHub publishes tabular data packages, or that we find time to address it in the future.
