Replies: 6 comments
-
[I can't edit the original post at the top myself, but here is some reflection from my side]
Introduction/rationale
As a community, we should be very happy that cfconventions.org provides a CF convention for trajectory data. However, the present convention (see here) may not be as suitable or flexible for our needs as we would hope. Hence, this discussion is meant to gather thoughts on what we would like to see in a CF convention for Lagrangian particle trajectory data, so that we can come to a proposal for the cfconventions team.
-
I have two main issues with the present trajectories CF-convention:
-
A few notes:
I don't think that the CF spec requires that they be in that order. For the most part, CF recommends an order but doesn't require it, and I don't think the Trajectory spec even recommends one. Though I certainly learn best through examples, and all the examples have that order. I think that the "trajectory" variable:
is the specification for which dimension is which. That being said, I don't think "easy to plot with MPL" should be the primary motivation for choosing dimension order. Dimension order can have a very large impact on performance when reading/writing data, and on working with data in memory. So CF usually recommends choosing the dimension order so that data that is likely to be used together is stored together -- e.g. for model results, time is usually the first dimension (the slowest-varying one in netCDF's row-major layout). I think that applies here, too.
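To make the performance point a bit more concrete, here is a minimal sketch, assuming plain row-major (C-order) storage with no chunking, which is how netCDF lays out a fixed-size variable by default. The helper `flat_offset` and the sizes `n_traj` and `n_obs` are made up for illustration; nothing here comes from the CF document itself.

```python
def flat_offset(index, shape):
    """Position of one element in row-major (C order) storage.

    In this layout the LAST dimension varies fastest, so elements that
    differ only in the last index sit next to each other.
    """
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


n_traj, n_obs = 3, 4

# Layout A, dims (trajectory, obs): the samples of trajectory 0 are adjacent.
offsets_a = [flat_offset((0, o), (n_traj, n_obs)) for o in range(n_obs)]

# Layout B, dims (obs, trajectory): the samples of trajectory 0 are strided,
# but all particles at a single time step are adjacent instead.
offsets_b = [flat_offset((o, 0), (n_obs, n_traj)) for o in range(n_obs)]

print(offsets_a)  # [0, 1, 2, 3]
print(offsets_b)  # [0, 3, 6, 9]
```

So which order is "fast" depends on the access pattern: reading one whole trajectory favours layout A, while reading a snapshot of all particles at one time favours layout B.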
CF also has very little to say about what names you use [*] -- obs and trajectory are their examples, but you can use whatever names you want. This does make it a bit awkward to process an arbitrary file, but it is flexible, and once you write the code, it's not too bad:
The "observation" dimension can be called whatever you want. More later .... [*] -- the one exception I know of is that when a variable has the same name as a dimension, then it is a "coordinate" variable. Which means that in the Trajectory format:
The trajectory variable is a coordinate variable. But you could call them both "traj" and that would be perfectly valid.
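As an illustration of processing a file without relying on names, here is a rough sketch that looks for the variable carrying `cf_role = "trajectory_id"` and takes its first dimension as the trajectory dimension. It is not the code from the original post. To keep it dependency-free, the dataset is modelled as a plain dict of `{"dims": ..., "attrs": ...}` entries, which is an assumption standing in for whatever netCDF4 or xarray would give you; the names `drifter` and `step` and both helper functions are hypothetical.

```python
def find_trajectory_dim(variables):
    """Return (id variable name, trajectory dimension name).

    Relies only on the cf_role attribute, never on variable names.
    """
    for name, var in variables.items():
        if var["attrs"].get("cf_role") == "trajectory_id":
            dims = var["dims"]
            # A single-trajectory file may store the id as a scalar.
            return name, (dims[0] if dims else None)
    raise ValueError('no variable with cf_role = "trajectory_id"')


def find_obs_dim(variables, traj_dim):
    """Return the dimension of the time variable that is not the trajectory one."""
    for var in variables.values():
        if var["attrs"].get("standard_name") == "time":
            others = [d for d in var["dims"] if d != traj_dim]
            return others[0] if others else None
    raise ValueError('no variable with standard_name = "time"')


# A file that uses neither "trajectory" nor "obs" as names.
variables = {
    "drifter": {"dims": ("drifter",), "attrs": {"cf_role": "trajectory_id"}},
    "time": {"dims": ("drifter", "step"), "attrs": {"standard_name": "time"}},
    "lon": {"dims": ("drifter", "step"), "attrs": {"standard_name": "longitude"}},
    "lat": {"dims": ("drifter", "step"), "attrs": {"standard_name": "latitude"}},
}

id_var, traj_dim = find_trajectory_dim(variables)
obs_dim = find_obs_dim(variables, traj_dim)
print(id_var, traj_dim, obs_dim)  # drifter drifter step
```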
-
In trajan we need to detect the data layout automatically. Having tested this on several models and different drifter datasets, I miss an unambiguous and definite way to know what the layout is. I would prefer the data layout to be defined in an attribute, or even in a grid_mapping-like variable. Using the latter is kind of a hack that is adapted to netCDF, but it is used in other CF type datasets. It would allow the grid mapping variable to define the layout, the various coordinate variables, and which variables are positions (lon, lat). This would leave much less to chance.

A disadvantage with netCDF, Zarr, etc. is that they don't have a proper schema: it is very easy to create incorrectly defined datasets. Trajan already helps to build CF-compliant drifter datasets, but we should perhaps consider constructing CF-compliant xarray datasets that are suitable for models as well? It would depend on how easy it is to make it general across models' needs and optimizations.

A few other points that came up when implementing this standard:
My plan for trajan is to have a
-
Sorry I haven't had time to reply earlier, and I still don't have time to do it justice, but a couple quick notes:
Indeed, we all want that!
It would be easier with a full grid_mapping, but it should be doable with: trajectory:cf_role = "trajectory_id"; and all the other ways to determine coordinates, standard names, etc. If not, then CF needs a new feature -- and we should propose one. As far as software auto-detection goes: it's a reality that folks don't always do CF, or do it right, but my recommended approach is one of:
(that's what we do in gridded and PyGNOME for gridded model results) or:
That's what's being done in the xarray_subset_grid project. Now that I've got a lot of experience with this, I recommend the second option -- it's much easier to be clear to your users -- it fails early, not buried in the depths of the code, and it simplifies the core code -- you can assume everything is as it should be.
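The "fail early" idea can be sketched roughly as below: check the dataset once at the entry point, report everything that is missing, and let the rest of the code assume a well-formed input. This is only an illustration, not what xarray_subset_grid or trajan actually do. The dataset is again modelled as plain dicts, the exception class and function name are made up, and treating `featureType` and `cf_role` as hard requirements is a strictness choice of this sketch.

```python
class NotCFTrajectoryError(ValueError):
    """Raised up front when a dataset does not look like a CF trajectory file."""


def validate_trajectory_dataset(global_attrs, variables):
    """Collect every problem, then fail once with a readable message."""
    problems = []

    # CF treats the featureType value as case-insensitive.
    if str(global_attrs.get("featureType", "")).lower() != "trajectory":
        problems.append('global attribute featureType is not "trajectory"')

    ids = [n for n, v in variables.items()
           if v["attrs"].get("cf_role") == "trajectory_id"]
    if len(ids) != 1:
        problems.append(
            f'expected exactly one cf_role = "trajectory_id" variable, found {len(ids)}')

    for std_name in ("time", "latitude", "longitude"):
        if not any(v["attrs"].get("standard_name") == std_name
                   for v in variables.values()):
            problems.append(f'no variable with standard_name = "{std_name}"')

    if problems:
        raise NotCFTrajectoryError("; ".join(problems))


good_vars = {
    "trajectory": {"dims": ("trajectory",), "attrs": {"cf_role": "trajectory_id"}},
    "time": {"dims": ("trajectory", "obs"), "attrs": {"standard_name": "time"}},
    "lat": {"dims": ("trajectory", "obs"), "attrs": {"standard_name": "latitude"}},
    "lon": {"dims": ("trajectory", "obs"), "attrs": {"standard_name": "longitude"}},
}
validate_trajectory_dataset({"featureType": "trajectory"}, good_vars)  # passes silently

try:
    validate_trajectory_dataset({}, {"x": {"dims": ("n",), "attrs": {}}})
except NotCFTrajectoryError as err:
    message = str(err)
    print(message)
```

The user sees all the problems at load time, in one message, instead of an obscure error from somewhere deep in the plotting or analysis code.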
That sounds like my second option -- I agree that that's a good way to go.
This is why I think we do need an extension to CF -- that way we could get the lossless conversion to a full array, and also a more compact and efficient storage layout for the common case of results from particle tracking models. We have been doing that for years with PyGNOME. I have recently started an xarray implementation of it for working with this format. It's not quite complete, but the idea is that you can either:
I think trajan could use (2) sooner rather than later to easily support these files. Code here -- not complete, but I'd love help: https://github.com/NOAA-ORR-ERD/nc_particles/tree/new_code (look at the new_code branch).
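For readers who have not worked with ragged storage, here is a minimal sketch of what "compact layout plus lossless conversion to a full array" can look like. It is not the nc_particles code. It assumes a contiguous ragged layout (one flat list of values plus a count per row), uses `None` as the padding marker, and the function names are made up for illustration. A row could be a trajectory or a time step, depending on how the model writes its output.

```python
def ragged_to_full(values, row_sizes):
    """Expand a flat value list plus per-row counts into a padded 2D list."""
    if sum(row_sizes) != len(values):
        raise ValueError("row_sizes do not add up to len(values)")
    width = max(row_sizes, default=0)
    full, start = [], 0
    for n in row_sizes:
        row = values[start:start + n]
        full.append(row + [None] * (width - n))  # pad short rows at the end
        start += n
    return full


def full_to_ragged(full):
    """Inverse of ragged_to_full: strip trailing padding, rebuild the counts."""
    values, row_sizes = [], []
    for row in full:
        n = len(row)
        while n > 0 and row[n - 1] is None:
            n -= 1
        values.extend(row[:n])
        row_sizes.append(n)
    return values, row_sizes


# Three rows holding 3, 1 and 2 samples respectively.
values = [10.0, 10.1, 10.2, 20.0, 30.0, 30.1]
row_sizes = [3, 1, 2]

full = ragged_to_full(values, row_sizes)
print(full)
# [[10.0, 10.1, 10.2], [20.0, None, None], [30.0, 30.1, None]]

round_trip = full_to_ragged(full)
print(round_trip == (values, row_sizes))  # True
```

The round trip is only lossless here because padding is always trailing and the padding marker can never be a real value. A real implementation would need an equally unambiguous fill value, or would have to keep the counts alongside the full array.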
-
Thank you very much for these inputs, @ChrisBarker-NOAA!
Yes, you must be right that the dimensions are not supposed to have fixed names (e.g. Line 25 in b75beb7).
I see that you say that the trajectory-dimension is always the first dimension of the variable with
The trajectory dimension is now stored by TrajAn as
We have also made a new commandline utility to inspect the trajectory information:
-
@gauteh @erikvansebille