Common location for observational datasets? #37

Open
nusbaume opened this issue Jan 5, 2024 · 7 comments

@nusbaume
Collaborator

nusbaume commented Jan 5, 2024

The ADF can currently compare model simulations to certain reanalysis and observational datasets, which are accessible to all ADF users, at least when running on Casper. However, all of those observational datasets currently live under people's personal directories, for example here:

/glade/work/nusbaume/SE_projects/model_diagnostics/ADF_obs

I suspect that this strategy will be difficult to maintain long-term, especially as other non-ADF diagnostics are brought into CUPiD that need their own specialized datasets.

Given this, should we work to identify a common location where all of the CUPiD-relevant observational datasets are stored? If so, there are several related questions that should probably be discussed at some point:

  • The observations directory should be globally readable, but who should have write access?
  • How should these files be organized? Should it be a flat directory structure, or should there be subdirectories?
  • What, if any, metadata should we require "official" observational data files to have?
  • Should the files be backed up somewhere, or at least the scripts that were used to generate the files?
  • Should the data located in this directory be accessible outside the NCAR machines? This would probably only matter if someone was trying to run CUPiD on a machine that didn't have access to the glade filesystem.
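On the "globally readable" point, a small audit script could help whoever ends up maintaining the shared directory. The following is only a minimal sketch: the function name `find_not_world_readable` is hypothetical, and it inspects the classic Unix permission bits only, so it will not notice restrictions imposed by ACLs or group membership.

```python
import os
import stat


def find_not_world_readable(root):
    """List entries beneath root that 'other' users cannot read.

    Files need the other-read bit; directories need other-read plus
    other-execute so they can be listed and traversed. The root itself
    is not checked, and symlinks are skipped.
    """
    blocked = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue  # a link's own mode bits are not meaningful here
            needed = stat.S_IROTH
            if stat.S_ISDIR(mode):
                needed |= stat.S_IXOTH
            if mode & needed != needed:
                blocked.append(path)
    return sorted(blocked)
```

Calling it on a candidate observations directory returns a sorted list of paths that would need a permissions change before every user could read them; an empty list means nothing was flagged by this (limited) check.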

I assume it will take multiple group discussions to figure all of this out, but I just wanted to open this issue now, as it may impact how at least the ADF is integrated into CUPiD.

@dabail10
Collaborator

dabail10 commented Jan 5, 2024

We have our sea ice observational data in:

/glade/campaign/cesm/development/pcwg/ice/data

For the CICE Consortium data, one thing we do is put it on Zenodo. I think they allow 50 GB datasets. We end up breaking the datasets up. We also tag releases of the model here:

https://zenodo.org/communities/cice-consortium?q=&l=list&p=1&s=10&sort=newest

@gustavo-marques
Collaborator

We are trying to put all the relevant ocean datasets (obs and reanalysis) under /glade/campaign/cgd/oce/datasets
We are also working on an intake catalog, which can be found at https://github.com/NCAR/oce-catalogs
The idea is that all the datasets in the directory above will be included in this catalog.

@wwieder
Collaborator

wwieder commented Jan 12, 2024

I'm not sure how this is supposed to be set up, but when trying to run CUPiD out of the box I get the following permission error:
PermissionError: [Errno 13] Permission denied: '/glade/campaign/cgd/oce/datasets/cesm/tx2_3/mld/deBoyer2004/deBoyer04_MLD_remapped_to_tx2_3.nc'

@gustavo-marques
Collaborator

I changed the permissions on this file and it is now readable by everyone. Let me know if you are still having issues.

@wwieder
Collaborator

wwieder commented Jan 13, 2024

I'm still getting the same permission error, and can't see anything below the /glade/campaign/cgd/oce/ directory.
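A file can be world-readable and still unreachable if one of its parent directories does not allow other users to traverse it, which may be what is happening here. As a rough way to locate the blocking level, here is a minimal sketch. The function name `first_blocked_component` and its `start` parameter are hypothetical, and it looks only at the "other" permission bits, so ACLs and group permissions are ignored.

```python
import stat
from pathlib import Path


def first_blocked_component(path, start="/"):
    """Return the first path component below start that blocks 'other' users.

    Directories must have the other-execute (search) bit to be traversed;
    the final file must have the other-read bit. Returns None if nothing
    is flagged. path must be an absolute path located under start.
    """
    current = Path(start)
    for part in Path(path).relative_to(current).parts:
        current = current / part
        try:
            mode = current.stat().st_mode
        except PermissionError:
            # We cannot even stat this entry, so its parent denies us access.
            return current.parent
        needed = stat.S_IXOTH if stat.S_ISDIR(mode) else stat.S_IROTH
        if mode & needed != needed:
            return current
    return None
```

Run against the path in the error message, this would report the highest directory (or the file itself) whose mode bits shut out other users, which is the level where a `chmod` would be needed. Note that listing a directory's contents additionally requires the read bit, which this sketch does not check.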

@gustavo-marques
Collaborator

Please try again.

@wwieder
Collaborator

wwieder commented Jan 16, 2024

That did the trick. Thanks @gustavo-marques
