settle on IO naming scheme #3001

Open
mahf708 opened this issue Sep 18, 2024 · 10 comments

Comments

@mahf708
Contributor

mahf708 commented Sep 18, 2024

We should likely settle on a naming scheme for our IO. Currently, we have three file classes of interest, each with a different scheme:

| file type | current pattern | purpose |
| --- | --- | --- |
| restart | [case_name].scream.r.[restart_spec].[date_spec].nc | restart simulation from a checkpoint |
| restart history | [any_string].rhist.[restart_spec].[date_spec].nc | restart simulation from a checkpoint |
| history | [any_string].[output_spec].[date_spec].nc | save simulation output |

Proposed changes

| file type | proposed pattern |
| --- | --- |
| restart | [case_name].[model_name].r.[restart_spec].[date_spec].nc |
| restart history | [case_name].[model_name].rh.[any_string].[output_spec].[date_spec].nc |
| history | [case_name].[model_name].h.[any_string].[output_spec].[date_spec].nc |

Some notes

  • any_string will come from the yaml files, e.g., myNcQcQr
  • output_spec and restart_spec are EAMxx-specific, e.g., INSTANT.nhours_x3, for output and restart frequency
  • restart history takes output specs in its name to avoid bugs like #2981 ("bug in same out, different freq scenario")
  • model_name is scream for now, but we should change that to eamxx at some point
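To make the proposed patterns concrete, here is a minimal sketch that assembles the three file names from their components. The function names, the case name `mycase`, the date string `0001-01-01-00000`, and the second frequency `nhours_x6` are placeholders invented for illustration; this is not how EAMxx builds its file names internally.

```python
def restart_name(case_name, model_name, restart_spec, date_spec):
    """Model restart: [case_name].[model_name].r.[restart_spec].[date_spec].nc"""
    return f"{case_name}.{model_name}.r.{restart_spec}.{date_spec}.nc"


def output_name(kind, case_name, model_name, any_string, output_spec, date_spec):
    """History ("h") or restart history ("rh") file name.

    Both kinds share the same trailing components, so they differ only
    in the suffix that follows the model name.
    """
    if kind not in ("h", "rh"):
        raise ValueError(f"kind must be 'h' or 'rh', got {kind!r}")
    return f"{case_name}.{model_name}.{kind}.{any_string}.{output_spec}.{date_spec}.nc"


if __name__ == "__main__":
    # Hypothetical values; only myNcQcQr, INSTANT.nhours_x3 and scream
    # are taken from the notes above.
    case, model, date = "mycase", "scream", "0001-01-01-00000"
    print(restart_name(case, model, "INSTANT.nhours_x3", date))
    for spec in ("INSTANT.nhours_x3", "INSTANT.nhours_x6"):
        print(output_name("h", case, model, "myNcQcQr", spec, date))
        print(output_name("rh", case, model, "myNcQcQr", spec, date))
```

Because `output_spec` is part of the restart history name, two streams that share the same `any_string` but write at different frequencies get distinct `rh` files, which is the scenario described in #2981.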

Rationale

  • this will make us follow E3SM conventions while maintaining our own custom IO settings, a decent compromise
  • this will enable us to make use of extensive infrastructure tooling (e.g., short-term archive, which is good for production but equally important for testing) without long-standing silent bugs...

Comment and vote below!

@bartgol
Contributor

bartgol commented Sep 18, 2024

For rhist files, you have [case_name].[model_name].rhist.[any_string].[output_spec].[restart_spec].[date_spec].nc. I don't think we need to put [restart_spec] in the filename. The driver (or, better, the OM) can reconstruct the rest from runtime options, but with this pattern it would have to know the restart_spec of the previous run, which is annoying. Besides, with [any_string].[output_spec] in the filename, we are already guaranteed to be unique. Also, without [restart_spec], the .h file name is very close to the .rhist file name, so we can easily match them.

@bartgol
Contributor

bartgol commented Sep 18, 2024

Side question: do ppl prefer .h or .hist? I don't have a strong preference, but since we have .rhist, having .hist may make things more consistent? Dunno, just a thought. I'm fine either way.

@PeterCaldwell
Contributor

Maybe just .h and .rh? To call them history files feels anachronistic to me anyways, so might as well bury the meaning entirely while minimizing name length?

@mahf708
Contributor Author

mahf708 commented Sep 30, 2024

Updated the op with Peter's recommendation as well as Luca's recommendation.

@bartgol
Contributor

bartgol commented Oct 1, 2024

Should we set an end date for this discussion? That is, if no objection is heard by Oct X, we just implement what we currently have in the table above?

@bartgol
Contributor

bartgol commented Oct 2, 2024

I'm going to propose another convention for the suffixes:

  • model restart: .r
  • history: .h
  • history restart: .h.r

@PeterCaldwell
Contributor

I like .r and .h, but why .h.r instead of .rh or .hr? Extra periods seem annoying.

@bartgol
Contributor

bartgol commented Oct 2, 2024

Well, it would have allowed me to treat all restart files with a single regex in config_archive.xml. Lazy me. I will go with .rh, since that may still do the trick actually (I have to check).

@mahf708
Contributor Author

mahf708 commented Oct 2, 2024

In config_archive.xml, we should distinguish between the three types, so I don't think you will gain anything from any shortcut here. Let's go with rh to keep it as close to the general conventions as possible.

e.g., for EAM

    <rest_file_extension>[ri]</rest_file_extension>
    <rest_file_extension>rh\d*</rest_file_extension>
    <rest_file_extension>rs</rest_file_extension>
    <hist_file_extension>h\d*.*\.nc$</hist_file_extension>
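As a rough way to see which file names those extension patterns would pick up, the sketch below tries each one against a few sample names. The way the patterns are anchored here (a `case.component.` prefix, plus a literal dot after a restart extension) is an assumption made for illustration and is not taken from CIME's archiving code; the `classify` helper and the sample file names are likewise hypothetical.

```python
import re

# Extension patterns copied from the EAM example above.
REST_EXTENSIONS = [r"[ri]", r"rh\d*", r"rs"]
HIST_EXTENSIONS = [r"h\d*.*\.nc$"]


def classify(filename, case_name, comp_name):
    """Return (kind, pattern) for the first pattern that matches, else None.

    Assumption: the extension sits right after "<case>.<comp>." and a
    restart extension is followed by a literal dot.
    """
    prefix = re.escape(case_name) + r"\." + re.escape(comp_name) + r"\."
    for ext in REST_EXTENSIONS:
        if re.match(prefix + ext + r"\.", filename):
            return ("restart", ext)
    for ext in HIST_EXTENSIONS:
        if re.match(prefix + ext, filename):
            return ("history", ext)
    return None


if __name__ == "__main__":
    samples = [
        ("mycase.eam.r.0001-01-01-00000.nc", "eam"),
        ("mycase.eam.rh0.0001-01-01-00000.nc", "eam"),
        ("mycase.eam.h0.0001-01.nc", "eam"),
        ("mycase.scream.rh.myNcQcQr.INSTANT.nhours_x3.0001-01-01-00000.nc", "scream"),
        ("mycase.scream.h.myNcQcQr.INSTANT.nhours_x3.0001-01-01-00000.nc", "scream"),
    ]
    for name, comp in samples:
        print(name, "->", classify(name, "mycase", comp))
```

Under these assumptions, the `r`, `rh`, and `h` suffixes each land on a different pattern, so the three file types stay distinguishable.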

@bartgol
Contributor

bartgol commented Oct 2, 2024

I just thought that that XML syntax does not explain the kind of restart file. Since CIME does not seem to care about the fine-grain distinction about model/hist/surface restart files, I was thinking that just using .r. in all restart-type files would have been a universal format for CIME.

But yeah, I ended up using rh for model output restart files.


3 participants