Add ZEP 9 (extension naming) draft #65
base: main
🛠️ We propose defining two categories of names for immediate use by extensions: raw names and URI-based names.
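To make the two categories concrete, here is a minimal sketch of how an implementation might tell them apart. It assumes a name counts as URI-based when it parses with both a scheme and a host (e.g. a hypothetical "https://example.com/zarr/my-codec"), and that everything else (e.g. "bytes") is a raw name. The function names are illustrative only and do not come from the ZEP or any Zarr library.

```python
from typing import Optional
from urllib.parse import urlparse


def classify_extension_name(name: str) -> str:
    """Return "uri" or "raw" for an extension name.

    Assumption: URI-based means the string has both a scheme and a host;
    the ZEP draft may define the boundary differently.
    """
    parsed = urlparse(name)
    if parsed.scheme and parsed.netloc:
        return "uri"
    return "raw"


def name_owner(name: str) -> Optional[str]:
    """Return the host that controls a URI-based name, or None for raw names.

    Raw names have no embedded owner, so they would need some separate
    registration process instead.
    """
    if classify_extension_name(name) == "uri":
        return urlparse(name).hostname
    return None


print(classify_extension_name("bytes"))
print(name_owner("https://example.com/zarr/my-codec"))
```

The host returned by name_owner is what would let a URI-based scheme lean on DNS for ownership, whereas raw names carry no such information.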
This would be a good place to explain why exactly we want to use URIs instead of some other kind of prefixing scheme. I understand that URIs are mentioned in the spec, but the spec doesn't give any reason for their use.
What I like about URLs is that they delegate the name-registration problem to DNS. But I am also unclear on the advantages of using fake https URLs compared to something slightly less verbose, like <domain>/<suffix>, which less strongly implies a real resource. Someone is surely going to try to fetch the documents, find them missing, and conclude that the URLs are just stale. Then there is also the question of http versus https: are other schemes, such as ftp, allowed?
Aside from that, the added https:// part just adds verbosity. TensorStore relies on the JSON representation itself for specifying certain metadata, and making it more verbose makes it harder to read and write.
Reading the proposal a bit more closely, I see that the URLs are recommended to resolve to some description of the extension; that is just not a requirement. I was thrown off a bit by the later examples showing https://zarr.dev/array/data_type, which didn't seem like a plausible real URL: for one thing it does not mention zarr v3 at all, and having a separate document for each field in the zarr v3 metadata is also rather different from the current documentation.
> this would be a good place to add an explanation for why exactly we want to use URIs

Thanks, will do.

> I see that the URLs are recommended to resolve to some description of the extension, that is just not a requirement

Is this a point in favor, or a request that it become a requirement?

> I was thrown off a bit by the later examples showing https://zarr.dev/array/data_type which didn't seem like a plausible actual URL

Sorry, perhaps chosen too quickly. I'll review the example for plausibility.
Josh, thanks so much for this important ZEP, which addresses some critical issues around V3. 🙏
Regarding "raw" name registration: can you clarify a bit more how you propose this would work?
I think it would be better to leave delegation to the URL-based names, or at least to names that are clearly prefixed with the delegate's name, to make the nature of the name clearer.
Regarding "@context": my expectation is that, given the nature of the zarr metadata, each name is typically specified only once in a given array's metadata, and, apart from the standard names in the zarr specification itself, a given URL prefix is also unlikely to be used more than once. Consequently, I'm concerned that an additional "@context" mechanism (similar to xmlns) would not really reduce verbosity, but would make the metadata more complicated for computers to parse and harder for humans to read, since they would now need to keep track of this extra layer of name indirection.
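To illustrate the indirection being discussed, here is a minimal sketch of JSON-LD-style prefix expansion applied to array metadata. Everything here is hypothetical: neither an "@context" key nor the "ex:my-codec" compact syntax is part of the Zarr v3 spec, and the example.com URL is made up. The point is only that a reader must consult a second table before it knows which extension a name refers to.

```python
from typing import Any, Dict, List


def expand_name(name: str, context: Dict[str, str]) -> str:
    """Expand a compact "prefix:suffix" name using a hypothetical @context map.

    Names whose prefix is not in the context (including full URIs such as
    "https://...") are returned unchanged.
    """
    prefix, sep, suffix = name.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return name


def expanded_codec_names(metadata: Dict[str, Any]) -> List[str]:
    """Return the fully expanded name of every codec in the metadata."""
    context = metadata.get("@context", {})
    return [expand_name(codec["name"], context) for codec in metadata.get("codecs", [])]


metadata = {
    "@context": {"ex": "https://example.com/zarr/codecs/"},
    "codecs": [{"name": "bytes"}, {"name": "ex:my-codec"}],
}
print(expanded_codec_names(metadata))
```

With only one use of the "ex" prefix, the context entry is longer than the single full URI it saves, which is the verbosity trade-off raised above.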
In general, I would say we're trying to define how things get technically added more than how the decisions get made, but here are a few points from my perspective:
I would lean less toward, say, schema review and more toward evaluating whether the name can be safely given out: will it lead to confusion, name squatting, etc.? But this is open for discussion and perhaps a matter for an additional governance document.
I think that depends on which versioning scheme the implementation chooses. If versioning is "within extension", then it's just one PR. If it's "versioning by name", then it would be multiple PRs, but that would also put the burden on the reviewers to be comfortable with accepting the multiple names.
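The two versioning schemes differ in what a reader has to look up. Below is a minimal sketch, assuming a hypothetical "version" key inside the extension's "configuration" for the within-extension case, and hypothetical ".../v1", ".../v2" suffixes for the by-name case; neither convention is defined by the spec or the ZEP.

```python
from typing import Any, Dict, Set, Tuple


def resolve_within_extension(
    entry: Dict[str, Any], supported: Dict[str, Set[int]]
) -> Tuple[str, int]:
    """One registered name; the version travels inside the configuration.

    Assumption: the hypothetical "version" key defaults to 1 when absent.
    """
    name = entry["name"]
    version = entry.get("configuration", {}).get("version", 1)
    if version not in supported.get(name, set()):
        raise ValueError(f"unsupported extension {name!r} version {version!r}")
    return name, version


def resolve_by_name(entry: Dict[str, Any], supported: Set[str]) -> str:
    """Each version is a separately registered name, e.g. ".../my-codec/v2"."""
    name = entry["name"]
    if name not in supported:
        raise ValueError(f"unsupported extension {name!r}")
    return name


codec = "https://example.com/zarr/my-codec"
print(resolve_within_extension({"name": codec, "configuration": {"version": 2}}, {codec: {1, 2}}))
print(resolve_by_name({"name": codec + "/v2"}, {codec + "/v1", codec + "/v2"}))
```

In the first scheme a single name is registered once and new versions need no new registration; in the second, every version adds a name that reviewers must accept.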
Agreed.
Phase 3 definitely needs multiple proposals with weighted pros & cons after phase 1 and phase 2 are complete. I think it's useful though to take a full specification that doesn't need designing and see which modifications (if any) are needed to make it useful. For example, is a single flag in the metadata which states which version of the context one is currently using sufficient?
This is a bit like RDFa's "initial contexts". Were this elevated to a zarr "extension point", it could then also take a URI (at the risk of verbosity). P.S. apologies to @context. I suggest we write
Pushed the minor clarifications. I would work towards merging this as a draft ZEP. Questions and comments would continue to be welcome in issues, PRs, Zulip, or most other places. The bulk of feedback on phase 1, however, is likely better placed on the PR with spec changes in zarr-developers/zarr-specs#330. Once those clarifications have been released (e.g., tagged as Timeline:
I agree we should merge this ASAP. According to the ZEP process, the feedback period starts once the draft is published. Let's use the spec PR to discuss and give further feedback.
URIs have been chosen due to their self-documenting nature;
therefore the URL SHOULD resolve to a human-readable explanation of the extension, but
implementations SHOULD NOT attempt to resolve the URL during processing.
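The SHOULD NOT clause means a reader identifies an extension by comparing the name string against what it already supports, never by fetching the URL. A minimal sketch, assuming a hypothetical in-process registry (REGISTRY, register, and get_codec are illustrative names, not a zarr-python API) and a made-up example.com codec URL:

```python
from typing import Callable, Dict

Codec = Callable[[bytes], bytes]

# Hypothetical local registry mapping extension names to bundled handlers.
# Names are treated as opaque strings; nothing is fetched over the network.
REGISTRY: Dict[str, Codec] = {}


def register(name: str) -> Callable[[Codec], Codec]:
    """Decorator that records a handler under an extension name."""
    def decorator(func: Codec) -> Codec:
        REGISTRY[name] = func
        return func
    return decorator


@register("https://example.com/zarr/codecs/identity")
def identity(data: bytes) -> bytes:
    """A trivial codec that returns its input unchanged."""
    return data


def get_codec(name: str) -> Codec:
    """Look up a codec by exact name; unknown names fail without resolving the URL."""
    try:
        return REGISTRY[name]
    except KeyError:
        raise ValueError(f"unsupported extension: {name}") from None


print(get_codec("https://example.com/zarr/codecs/identity")(b"abc"))
```

Exact string matching keeps processing deterministic and offline; the human-readable page behind the URL remains documentation for people, not input for the implementation.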
I think a URI is self-documenting only if it resolves to some form of documentation, which is not true of URIs in general, so I wouldn't say they have a "self-documenting nature". I'm also not sure about the "self" part: we want the zarr extension to be documented, not the URI.
IMO there's zero value in URIs that don't resolve to anything: they are just long strings, and everyone will ask why, e.g., https is needed if the string doesn't resolve to a resource. But there would be value in a URL that resolves to a JSON Schema, for example. That's a concrete resource that clients can do something with, and I think STAC uses URLs this way.
This ZEP, drafted with @normanrz and reviewed by the rest of the @zarr-developers/steering-council, attempts to unblock lingering spec-related issues such as:
TODOs: