-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flush and close output files that are full #3032
Changes from all commits
463ffbc
f1b3a4b
e871e77
f9d8bc9
eabd70f
488c789
c99a7d9
033d31e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -243,16 +243,17 @@ setup (const ekat::Comm& io_comm, const ekat::ParameterList& params, | |
m_resume_output_file = last_output_filename!="" and not restart_pl.get("force_new_file",false); | ||
if (m_resume_output_file) { | ||
int num_snaps = scorpio::get_attribute<int>(rhist_file,"GLOBAL","last_output_file_num_snaps"); | ||
|
||
m_output_file_specs.filename = last_output_filename; | ||
m_output_file_specs.is_open = true; | ||
m_output_file_specs.storage.num_snapshots_in_file = num_snaps; | ||
|
||
if (m_output_file_specs.storage.snapshot_fits(m_output_control.next_write_ts)) { | ||
// The setup_file call will not register any new variable (the file is in Append mode, | ||
// so all dims/vars must already be in the file). However, it will register decompositions, | ||
// since those are a property of the run, not of the file. | ||
m_output_file_specs.filename = last_output_filename; | ||
m_output_file_specs.is_open = true; | ||
setup_file(m_output_file_specs,m_output_control); | ||
} else { | ||
m_output_file_specs.close(); | ||
} | ||
} | ||
scorpio::release_file(rhist_file); | ||
|
@@ -299,6 +300,24 @@ void OutputManager::init_timestep (const util::TimeStamp& start_of_step, const R | |
return; | ||
} | ||
|
||
// Make sure dt is in the control | ||
m_output_control.set_dt(dt); | ||
|
||
if (start_of_step==m_case_t0 and m_avg_type==OutputAvgType::Instant and | ||
m_output_file_specs.storage.type!=NumSnaps and m_output_control.frequency_units=="nsteps") { | ||
// This is the 1st step of the whole run, and a very sneaky corner case. Bear with me. | ||
// When we call run, we also compute next_write_ts. Then, we use next_write_ts to see if the | ||
// next output step will fit in the currently open file, and, if not, close it right away. | ||
// For a storage type!=NumSnaps, we need to have a valid timestamp for next_write_ts, which | ||
// for freq=nsteps requires to know dt. But at t=case_t0, we did NOT have dt, which means we | ||
// computed next_write_ts=last_write_ts (in terms of date:time, the num_steps is correct). | ||
// This means that at that time we deemed that the next_write_ts definitely fit in the same | ||
// file as last_write_ts (date/time are the same!), which may or may not be true for non NumSnaps | ||
// storage. To fix this, we recompute next_write_ts here, and close the file if it doesn't. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. minor: doesn't ... what? fit? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Overall, this seems a bit too complex? Some questions:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's true for storage type "NumSnaps". But with type "one_month" (say), we can't say if the file is full unless we know the time stamp of the next write. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, at a given time step, we know (can calculate) its time stamp, right? Then, why can't we deduce if one_month or one_day is ending here? Do we not know dt? I understand the logic can be too convoluted, but it is still doable, no? We don't have to do it now, but trying to understand if it is doable at all (ignoring the fact that we may choose not to make the code super ugly for some corner case) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We cannot compute the next time stamp during t0 output. The driver does not have info about If you want to compute |
||
m_output_control.compute_next_write_ts(); | ||
close_or_flush_if_needed (m_output_file_specs,m_output_control); | ||
} | ||
|
||
// Check if the end of this timestep will correspond to an output step. If not, there's nothing to do | ||
const auto& end_of_step = start_of_step+dt; | ||
|
||
|
@@ -340,10 +359,6 @@ void OutputManager::run(const util::TimeStamp& timestamp) | |
"The most likely cause is an output frequency that is faster than the atm timestep.\n" | ||
"Try to increase 'Frequency' and/or 'frequency_units' in your output yaml file.\n"); | ||
|
||
// Update counters | ||
++m_output_control.nsamples_since_last_write; | ||
++m_checkpoint_control.nsamples_since_last_write; | ||
|
||
if (m_atm_logger) { | ||
m_atm_logger->debug("[OutputManager::run] filename_prefix: " + m_filename_prefix + "\n"); | ||
} | ||
|
@@ -373,6 +388,14 @@ void OutputManager::run(const util::TimeStamp& timestamp) | |
const bool is_full_checkpoint_step = is_checkpoint_step && has_checkpoint_data && not is_output_step; | ||
const bool is_write_step = is_output_step || is_checkpoint_step; | ||
|
||
// Update counters | ||
++m_output_control.nsamples_since_last_write; | ||
if (not is_t0_output) { | ||
// In case REST_OPT=nsteps, don't count t0 output as one of those steps | ||
// NOTE: for m_output_control, it doesn't matter, since it'll be reset to 0 before we return | ||
Comment on lines
+394
to
+395
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmmmmm.... I see where things get weird! I wonder if shifting the indexing altogether can help? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Tbc, this small issue is unrelated from closing the file at the right time. To be honest, I think we could flat out rm the line that updates nsamples_since_last_write for checkpoint control: it is not used anyways! And since, as the comment states, for output control it doesn't matter, we may as well remove this if block altogether... |
||
++m_checkpoint_control.nsamples_since_last_write; | ||
} | ||
|
||
// Create and setup output/checkpoint file(s), if necessary | ||
start_timer(timer_root+"::get_new_file"); | ||
auto setup_output_file = [&](IOControl& control, IOFileSpecs& filespecs) { | ||
|
@@ -393,10 +416,6 @@ void OutputManager::run(const util::TimeStamp& timestamp) | |
snapshot_start = m_case_t0; | ||
snapshot_start += m_time_bnds[0]; | ||
} | ||
if (filespecs.is_open and not filespecs.storage.snapshot_fits(snapshot_start)) { | ||
release_file(filespecs.filename); | ||
filespecs.close(); | ||
} | ||
|
||
// Check if we need to open a new file | ||
if (not filespecs.is_open) { | ||
|
@@ -546,10 +565,7 @@ void OutputManager::run(const util::TimeStamp& timestamp) | |
scorpio::write_var(filespecs.filename, "time_bnds", m_time_bnds.data()); | ||
} | ||
|
||
// Check if we need to flush the output file | ||
if (filespecs.file_needs_flush()) { | ||
flush_file (filespecs.filename); | ||
} | ||
close_or_flush_if_needed(filespecs,control); | ||
}; | ||
|
||
start_timer(timer_root+"::update_snapshot_tally"); | ||
|
@@ -908,7 +924,18 @@ void OutputManager::set_file_header(const IOFileSpecs& file_specs) | |
set_str_att("Conventions","CF-1.8"); | ||
set_str_att("product",e2str(file_specs.ftype)); | ||
} | ||
/*===============================================================================================*/ | ||
void OutputManager:: | ||
close_or_flush_if_needed ( IOFileSpecs& file_specs, | ||
const IOControl& control) const | ||
{ | ||
if (not file_specs.storage.snapshot_fits(control.next_write_ts)) { | ||
scorpio::release_file(file_specs.filename); | ||
file_specs.close(); | ||
} else if (file_specs.file_needs_flush()) { | ||
scorpio::flush_file (file_specs.filename); | ||
} | ||
} | ||
|
||
void OutputManager:: | ||
push_to_logger() | ||
{ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -118,6 +118,10 @@ class OutputManager | |
void finalize(); | ||
|
||
long long res_dep_memory_footprint () const; | ||
|
||
// For debug and testing purposes | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Mind elaborating what debug and testing we mean here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, I want to be able to verify the correctness of the control/filespecs structs during unit tests. The comment was meant to say "this is not really needed at runtime". |
||
const IOControl& output_control () const { return m_output_control; } | ||
const IOFileSpecs& output_file_specs () const { return m_output_file_specs; } | ||
protected: | ||
|
||
std::string compute_filename (const IOControl& control, | ||
|
@@ -133,6 +137,10 @@ class OutputManager | |
void setup_file ( IOFileSpecs& filespecs, | ||
const IOControl& control); | ||
|
||
// If a file can be closed (next snap won't fit) or needs flushing, do so | ||
void close_or_flush_if_needed ( IOFileSpecs& file_specs, | ||
const IOControl& control) const; | ||
|
||
// Manage logging of info to atm.log | ||
void push_to_logger(); | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like above.