String handling #111
Comments
@znichollscr thanks so much for getting in touch! I absolutely do have thoughts (!) Unfortunately, it's rather complicated. I'll just quickly list the key problems I think are involved here, and afterward relate them to what I think is going on ...
For reference, I'm hoping to get the v0.2 release out in the next week or so. Sadly, I'm not sure we can promise to squeeze fixes to string data handling into this release, but we might take a look at it. I don't have time this morning to explain the problems you are reporting in more detail (though I certainly will soon), but it would be useful to know what you really need to achieve here: is ncdata really important, or do you just need a fix so you can handle this type of data with Iris?
Hi @pp-mo, thanks for the great reply.
Good question. Actually, the answer is, for the very immediate future: I don't need this at all, I just worked around it. However, given that I'd noticed this, I figured I would report it. The back story of what I'm actually trying to do is below, if it's of interest.

Back story

I'm helping out with getting all the forcings data published for the CMIP7 AR7 fast-track. As I go along, I'm just trying to put issues in helpful places to generally try and help out the ecosystem (e.g. this issue was a result of trying to get a dust dataset, which includes a region dimension, in the right format for the ESGF: PCMDI/input4MIPs_CVs#140).

In practice, what I'm really trying to do is work out a sane way to write CF-compliant, CMIP-controlled-vocabulary-compliant netCDF files. When I started https://github.com/climate-resource/input4mips_validation, the simplest way was to use iris. However, in general I find it way easier to work with xarray. In trying to work out how to convert from xarray to iris, I stumbled upon ncdata. I then learnt about the existence of https://github.com/NCAS-CMS/cf-python. That makes it pretty obvious: if you want to write CF-compliant files, use the package which implements the spec. Having said that, I didn't want to convert my entire stack to being cf-python based. (I think that's also the genius of ncdata: you don't have to choose, you just work in whatever format is best for you, then convert at the end.) So, my next thought was to help out with the ncdata cf-python converter, which is why I popped up here: #95 (although, as you can tell, I've then had no time since).

The issue probably boils down to this: the only package that is actually tightly coupled to the netCDF API is netCDF4. All the other packages (except maybe ncdata) are data containers, so it seems like they make a tradeoff between complete fidelity to what is in the underlying netCDF file and usability.
However, I don't know the netCDF spec well enough to know if that's actually the case, or whether there is a package other than netCDF4 which will just give you a view into a netCDF file, without doing helpful conversions along the way (again, maybe that's the goal of ncdata?). In the meantime, I'm just bumbling my way through, working around quirks as I need to and doing my best to report things where I see them without creating a ton of noise for everyone else.
At the outset, I'm not sure if this is a user error or a bug in ncdata or a bug upstream in xarray/iris or a bug downstream in netCDF4. I'm asking because it occurs in ncdata, but feel free to send me looking elsewhere if that makes more sense.
I was playing around with writing strings into a netCDF file. There seem to be multiple ways to do this, some of which work fine, while others raise errors.
For running all these demos, I used a Python 3.11 virtual environment with the following requirements.txt file. I'm working on a Mac.
Requirements
Passing example
If you create the array using a character array, everything seems to work.
The output netCDF file also looks sensible
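For readers unfamiliar with the term, a "character array" here means a fixed-width array of single bytes with an extra string-length dimension, which is the classic way netCDF stores text. The snippet below is not the code from this issue; it is only a dependency-free sketch of that layout. The helper names `to_char_array` and `from_char_array` and the region names are made up for illustration. In practice, netCDF4's `stringtochar` and `chartostring` do this job on NumPy arrays.

```python
def to_char_array(strings, width=None, encoding="utf-8"):
    """Encode strings as rows of single bytes, null-padded to a common width.

    Hypothetical helper: mimics the (n_strings, string_length) layout of a
    netCDF character variable using plain Python lists.
    """
    encoded = [s.encode(encoding) for s in strings]
    if width is None:
        # The width is measured in bytes, not characters.
        width = max((len(b) for b in encoded), default=0)
    rows = []
    for b in encoded:
        if len(b) > width:
            raise ValueError(f"{b!r} does not fit in {width} bytes")
        padded = b.ljust(width, b"\x00")
        rows.append([padded[i:i + 1] for i in range(width)])
    return rows


def from_char_array(rows, encoding="utf-8"):
    """Join each row of bytes, strip the null padding, and decode."""
    return [b"".join(row).rstrip(b"\x00").decode(encoding) for row in rows]


regions = ["Sahara", "Gobi", "Australia"]  # made-up example values
chars = to_char_array(regions)
print(len(chars), len(chars[0]))  # number of strings, string length in bytes
print(from_char_array(chars) == regions)
```

The round trip only works if the same encoding is used in both directions, which is one reason it matters who does the encoding (your code or the library).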
Failing example 1 - something to do with encoding
If you create the array using a character array but let netCDF4 do the encoding, the string encoding doesn't seem to work if you load the file with iris and then try to convert with ncdata (which suggests the bug is in iris?).
The underlying netCDF file looks sensible though.
Failing example 2 - variable length strings
If you write using a variable-length string, then the error appears to come from ncdata. However, iris also can't load the file, so maybe this just isn't a supported use case.
The underlying netCDF seems to be valid, but maybe I'm missing something.
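To make the difference between the two storage styles concrete, here is a rough, dependency-free sketch comparing them. It is an illustration under stated assumptions, not the code from this issue: `describe_layouts` is a hypothetical helper, the region names are made up, and the byte counts ignore any per-string overhead in the file format. The dtype labels follow netCDF4-python's convention, where a character variable is created with `"S1"` and a variable-length string variable with `str`.

```python
def describe_layouts(strings, encoding="utf-8"):
    """Compare two common ways of storing a 1-D list of strings.

    Hypothetical helper for illustration only; it does not touch any file.
    """
    encoded = [s.encode(encoding) for s in strings]
    strlen = max((len(b) for b in encoded), default=0)
    return {
        # Fixed width: one byte per element, plus a string-length dimension.
        "char_array": {
            "dtype": "S1",
            "shape": (len(strings), strlen),
            "stored_bytes": len(strings) * strlen,
        },
        # Variable length: one string per element, no extra dimension.
        "vlen_string": {
            "dtype": "str",
            "shape": (len(strings),),
            "stored_bytes": sum(len(b) for b in encoded),  # payload only
        },
    }


layouts = describe_layouts(["Sahara", "Gobi", "Australia"])
for name, info in layouts.items():
    print(name, info)
```

A reader that expects the extra string-length dimension will see a differently shaped variable in the variable-length case, so it is plausible for a tool to support one layout and not the other.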
@pp-mo not sure if you have any thoughts?