ARD workflow for time series analysis of ACCESS-OM2-01 daily output #462
Comments
thanks @sb4233 - do you mind if we edit the issue title so that it's more focused and descriptive? Also: can you provide details on the specific use case? Thanks!
Btw, @sb4233 note that the `cosima_cookbook` Python package is deprecated, so no method will be added to it. I think the issue is that the data is chunked in time based on how the files are saved as netCDF (e.g., every 3 months for 0.1-degree model output). So if you need to do a time-series analysis at every point, you need to rechunk in time. I've bumped into this before and didn't find a better solution, but perhaps I was just naïve! Btw, you might wanna have a look at the xrft package? Sorry if I misunderstood and this is not something useful.
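To be concrete, the rechunking I mean looks something like this (a rough sketch only; the file pattern, variable name, dimension names, and chunk sizes are placeholders for the 0.1-degree daily output):

```python
import xarray as xr

# Placeholder file pattern and variable name for 0.1-degree daily output.
ds = xr.open_mfdataset("ocean_daily_3d_u_*.nc", parallel=True)

# The files are written in ~3-month pieces, so each dask chunk spans only a
# short time window. Per-point time-series work wants the opposite layout:
# one contiguous chunk along time, smaller chunks in space.
u = ds["u"].chunk({"time": -1, "yt_ocean": 50, "xt_ocean": 50})
```

Note that the rechunk itself is expensive (it shuffles data across all the input files), so it's usually worth doing once and saving the result rather than repeating it in every session.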
@sb4233 would you be able to add some code snippets so we can see what you're trying to do?
Yeah sure, please go ahead and edit the title.
Thanks for the suggestion; seems like xrft could be useful, as it uses the dask API.
Nothing special, essentially just trying this function below:
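Something along these lines (a hypothetical sketch of the calculation: `scipy.signal.coherence` at each grid point, here wrapped with `xarray.apply_ufunc` so dask can apply it over space; the array names, shapes, and parameters below are made up):

```python
import numpy as np
import xarray as xr
from scipy import signal

def coherence_1d(x, y, fs=1.0, nperseg=256):
    """Magnitude-squared coherence between two 1-D time series."""
    _, cxy = signal.coherence(x, y, fs=fs, nperseg=nperseg)
    return cxy

# Made-up stand-ins for two fields with dims (time, y, x); each grid point's
# full time series must sit in a single chunk for this to work.
rng = np.random.default_rng(0)
shape, dims = (1024, 8, 8), ["time", "y", "x"]
da1 = xr.DataArray(rng.standard_normal(shape), dims=dims).chunk({"time": -1})
da2 = xr.DataArray(rng.standard_normal(shape), dims=dims).chunk({"time": -1})

# Vectorise the 1-D coherence over the spatial dimensions.
nfreq = 256 // 2 + 1  # one-sided spectrum length for nperseg=256
coh = xr.apply_ufunc(
    coherence_1d, da1, da2,
    input_core_dims=[["time"], ["time"]],
    output_core_dims=[["freq"]],
    vectorize=True,
    dask="parallelized",
    dask_gufunc_kwargs={"output_sizes": {"freq": nfreq}},
    output_dtypes=[np.float64],
)
coh = coh.compute()  # result has dims (y, x, freq)
```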
Hey @sb4233, hopefully that new title is representative of your use case (one shared by others). Next steps might be to access the daily ACCESS-OM2-01 output via ... Looking forward to documenting better practice for these specific use cases with you and others.
@sb4233 - a very useful ref from @dougiesquire et al. ..., and for storage of any temporary intermediate ARD collections on ...
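For example, writing such an intermediate ARD copy might look like this (a sketch only; the paths, names, and chunk sizes are placeholders):

```python
import xarray as xr

# Placeholder inputs: open lazily, then rechunk contiguously in time.
ds = xr.open_mfdataset("ocean_daily_3d_u_*.nc", parallel=True)
u = ds["u"].chunk({"time": -1, "yt_ocean": 50, "xt_ocean": 50})

# Write the rechunked copy once to scratch storage as zarr; all later
# analysis then reads time-contiguous chunks instead of the raw archive.
u.to_dataset(name="u").to_zarr("/scratch/<project>/u_ard.zarr", mode="w")

# Subsequent sessions re-open the ARD collection lazily.
u_ard = xr.open_zarr("/scratch/<project>/u_ard.zarr")["u"]
```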
@sb4233 et al, here's the kind of overall workflow I'm suggesting each of these specific heuristics could contribute to. You can see and download our full poster from OMO2024 here: https://go.csiro.au/FwLink/climate_ARD
Hi,
I have been trying to do some spectral analysis using variables from ACCESS-OM2 output. Because the data is large and chunked, any kind of analysis is very slow. For example, I am calculating the coherence between two variables (using `scipy.signal.coherence`) at every grid point for a specific domain (356x500). The actual calculation takes only about 3-4 minutes (non-chunked), but on the chunked data it takes forever (as the data is being loaded into memory).

As a cheap alternative I found that saving the data as early as possible in my calculation works (for example, saving the data just after selecting the variable for the region of interest), i.e., reducing the number of operations I need to do while the data is in a chunked state. But even then it takes several hours per variable to save it to a netCDF file.
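For reference, the workaround is roughly this (a sketch; the variable and coordinate names are illustrative):

```python
import xarray as xr

# Select the region of interest first, save it, and run the expensive
# analysis against the small file rather than the full chunked archive.
ds = xr.open_mfdataset("ocean_daily_3d_u_*.nc", parallel=True)
subset = ds["u"].sel(xt_ocean=slice(-230, -180), yt_ocean=slice(-60, -30))
subset.to_netcdf("u_region.nc")
```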
I wanted to know if there is a better way to effectively chunk large datasets so that processing time can be reduced as much as possible. Maybe adding a method to `cosima_cookbook` which can dynamically chunk large datasets based on the operation being performed on them? I am new to this kind of programming, so any help would be much appreciated :)