-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recast as a stand alone document #20
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dear David et al.
Thanks for the new version. I have some comments.
-
Since this is now a separate convention, will you put its attributes in Appendix A? Perhaps that would be a good idea, like for UGRID.
-
I am confused about the location variable. The data is contained in an orthogonal array of fragments. I understand that to mean that the decomposition of any given aggregated dimension into fragments is broadcast along all the other fragment dimensions. That means the decomposition needs to be specified only once for each fragment dimension, and I'd therefore expect the location variable to have dimensions (number of fragment dimensions, largest fragment dimension size, 2).
-
The file variable could be omitted if all fragments are in the same file as the aggregation variable.
-
The format variable could be omitted if every fragment is either in the same file as the aggregation variable or is a netCDF file i.e. assume by default than an external file is netCDF.
-
I am confused about the appearance of the heading "Example 1". This doesn't seem to begin an example.
-
Future-proofing would replace Gregorian with standard.
Best wishes
Jonathan
Dear @JonathanGregory Thank you for reviewing this. Here are some responses to your points, which are largely meant to explain why it has been done like it has, not to necessarily imply that we won't change it! We can take from here with this background in mind.
We're currently assuming that the CF conventions do not recognise CFA. If/when this is the case, we'd either move the text into CF, or link to like we do for UGRID. In either case, appendix would indeed get some entries.
I see. Example 2 gives us: // Existing. Location has shape (2, 1, 1, 1, 4, 2)
// Location has shape (4, 2) = shape of fragment array + (number of fragment dims, 2)
location = 0, 5,
0, 0,
0, 72,
0, 143,
6, 11,
0, 0,
0, 72,
0, 143 ; I think you are suggesting: // Jonathan's proposal.
// Location has shape (4, 2, 2) = (number of fragment dims, largest fragment dimension size, 2)
location = 0, 5, 6, 11,
0, 0, _, _,
0, 72, _, _,
0, 143, _ , _; I might also suggest // Another "dask-like" idea: record the sizes, not the ranges.
// Location has shape (4, 2) = (number of fragment dims, largest fragment dimension size)
location = 6, 6,
1, _,
73, _
144, _ ;
This was touched on in #7. We felt that a key feature of these conventions was the ability to aggregate arbitrary file formats, so making a special case for netCDF fragments doesn't really fit in with that. Admittedly the conventions currently only specify how to The case when all fragments are in the parent file is certainly possible, but would be unusual - so creating a special case for
I have assumed CFA does not need to be used with CF, and this example was meant to to show how they can play nicely together. I now think that perhaps this is a mistake, and CFA must be used with CF, in which case the italicised text of example 1 would become main body text (like it was in previous versions). What do @bnlawrence @nmassey001 @sadielbartholomew think?
Cutting edge stuff! Done. |
Dear David
Yes, that's right. Isn't there a lot of redundancy in your earlier version? Your dask-like version is even better, since they must be contiguous.
Alternatively, you could allow the format variable to be a scalar if all the entries in it are the same (all Cheers Jonathan |
Dear @JonathanGregory, I like the idea of using a scalar I'm not so keen on encoding a common format in the @nmassey001 - what do you thank about these format suggestions, and the |
(You may not be surprised to learn that my take on the UML is the text is more ambiguous and if the UML is correct, I personally don't care about the text :-) ... however, I'm in a minority of one on this, but you can bet that I care about 0.6.1!) Before getting into the detail I think we need to address the first paragraph :-). The existing readme states:
But I don't think that's quite right because they don't start from the problem. I'd say:
In doing so we allow the possibility of a CFA aggregation sitting alongside a normal variable and avoid using the phrase "master" file (which I know wasn't in your version, but I am conscious we need to avoid it.) I believe I am otherwise content with things now and agree with using a scalar for format! |
Thanks, @bnlawrence. I have updated the Introduction as you suggest - much better. Do you have an opinion on whether CFA must be used with CF? I'm now of the opinion that there is no use-case for creating a CFA dataset that is not otherwise CF-compliant, and so CFA should be cast as as "add-on" to CF, rather than an add-on to vanilla netCDF. I'll put together a commit for allowing a scalar |
Scalar |
I think there is a use case for using CFA conventions for the aggregation of non-CF compliant netcdf files, but they would need to have enough compliance around coordinates for it to work. Specific examples :
|
Hi Bryan, I think the CF-compliance of the fragments is not the issue, rather the CF-compliance of the of the CFA file. Does that make sense? |
I don't understand how the indexing works here:
Should the first 6 in your version of the location array be a 0? As an aside, I don't think I've picked up on this before, but can we switch to Python / Numpy / C / every sensible programming language style slice definitions? i.e. there are five elements between |
Hi @nmassey001, The first The For example, the current:
would become:
I'm not sure the Fortran and Julia folks would agree with you, but moving to sizes would make the slice definition go away, anyway! |
@davidhassell Aha, they are sizes. Got it! |
Yes, that's fine, other software can ignore the CF'ness and just use aggregation goodies. Sorry for the misunderstanding. |
Thanks @bnlawrence and @nmassey001 - I'll address the CF compliance and location issues in separate commits ... |
Hi @bnlawrence,
I look forward to developing this over at #21 ... :) |
Following an off-line discussion on 2021-07-27, merging this now and tagging master at v0.6 |
Previously the document silently assumed that it was part of the CF conventions, which is not to be the case (at this time, at least). This PR aims to make the document independent.