@rtodling
I was running what is essentially the develop branch of GEOSadas (for reasons not pertinent to this issue) with the basic C48f test case, which must not have been tested in this branch.
I hit an issue that you will need to resolve. It requires properly removing the bkg files needed by mkiau.x after it runs and BEFORE GEOSgcm.x runs. Here are the details:
When I ran that test case, GEOSgcm.x failed at 3z with a netCDF error: it was trying to inquire the varid for a given name from the ncid of a file it was in the process of writing, and the error code said the variable did not exist in the file. I put in a print to determine what the variable was and, as you can see, it was EVAP:
AGCM Date: 2019/01/17 Time: 02:52:30 Throughput(days/day)[Avg Tot Run]: 303.1 338.0 338.8 TimeRemaining(Est) 000:01:12 26.8% : 14.9% Mem Comm:Used
Writing: 549 Slices to File: C48f_ben.inst3_3d_asm_Np.20190117_0300z.nc4
Writing: 1083 Slices to File: C48f_ben.inst3_3d_asm_Nv.20190117_0300z.nc4
Writing: 378 Slices to File: C48f_ben.tavg3_3d_cld_Cp.20190117_0130z.nc4
Writing: 365 Slices to File: C48f_ben.tavg3_3d_mst_Ne.20190117_0130z.nc4
Writing: 729 Slices to File: C48f_ben.bkg.eta.20190117_0300z.nc4
Writing: 52 Slices to File: C48f_ben.bkg.sfc.20190117_0300z.nc4
Writing: 290 Slices to File: C48f_ben.cbkg.eta.20190117_0300z.nc4
Writing: 587 Slices to File: C48f_ben.vtx.mix.20190117_03z.nc4
Writing: 729 Slices to File: C48f_ben.asm.eta.20190117_0300z.nc4
bmaa failed write variable EVAP
pe=00085 FAIL at line=00030 NetCDF4_put_var.H <status=-49>
pe=00085 FAIL at line=00842 ServerThread.F90 <status=-49>
pe=00085 FAIL at line=00138 BaseServer.F90 <status=-49>
This was weird. The only plausible way it could fail to find the variable in the file is if the file already existed, so I put in more prints and made the code stop if it tried to open an already existing file whose name contains the experiment id. I saw this:
AGCM Date: 2019/01/17 Time: 02:52:30 Throughput(days/day)[Avg Tot Run]: 311.5 353.6 354.4 TimeRemaining(Est) 000:01:10 31.8% : 28.1% Mem Comm:Used
Writing: 549 Slices to File: C48f_ben.inst3_3d_asm_Np.20190117_0300z.nc4
Writing: 1083 Slices to File: C48f_ben.inst3_3d_asm_Nv.20190117_0300z.nc4
Writing: 378 Slices to File: C48f_ben.tavg3_3d_cld_Cp.20190117_0130z.nc4
Writing: 365 Slices to File: C48f_ben.tavg3_3d_mst_Ne.20190117_0130z.nc4
Writing: 729 Slices to File: C48f_ben.bkg.eta.20190117_0300z.nc4
Writing: 52 Slices to File: C48f_ben.bkg.sfc.20190117_0300z.nc4
Writing: 290 Slices to File: C48f_ben.cbkg.eta.20190117_0300z.nc4
Writing: 587 Slices to File: C48f_ben.vtx.mix.20190117_03z.nc4
Writing: 729 Slices to File: C48f_ben.asm.eta.20190117_0300z.nc4
pe=00049 FAIL at line=00265 NetCDF4_FileFormatter.F90 <file exists: C48f_ben.bkg.eta.20190117_0300z.nc4>
pe=00006 FAIL at line=00265 NetCDF4_FileFormatter.F90 <file exists: C48f_ben.cbkg.eta.20190117_0300z.nc4>
pe=00095 FAIL at line=00265 NetCDF4_FileFormatter.F90 <file exists: C48f_ben.bkg.sfc.20190117_0300z.nc4>
I thought that was odd: why does the file exist? I re-ran the experiment and stopped it as soon as the GSI started. When I did a listing in the fvwork, I saw that those files were already there at the time the experiment was created. I realized they must be the backgrounds from the previous segment needed for mkiau.x.
If you look at the History.rc.tmpl that comes with the develop branch of GEOSadas, you will see that the bkg.sfc collection has an "EVAP" variable, and that collection does not start writing until 3z, when it produces the backgrounds for the next segment. That is when GEOSgcm.x was crashing. BUT the bkg.eta files that get copied in when the experiment is created, to produce the increments for the current segment, don't have EVAP.
So what is going on is that at 3z History tries to write the bkg.eta file, but it already exists. When a file already exists, the server just opens it and tries to write to it, so of course the varid inquiry for EVAP fails!
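The failure mode described above can be reproduced with a toy model. This is only a sketch: `ToyDataset` is a hypothetical stand-in for a self-describing file such as netCDF, not the netCDF or MAPL API, and the variable names `TS` and `U10M` are made up for illustration. The point it shows is that a writer which reopens an existing file inherits that file's variable table, so a lookup for a variable added later fails. As far as I know, the `status=-49` in the log above is netCDF's `NC_ENOTVAR` ("variable not found"), which matches this picture.

```python
import json
import os
import tempfile


class ToyDataset:
    """Toy self-describing file: a JSON header listing variable names.

    Hypothetical stand-in for a netCDF file; this is not the netCDF API.
    """

    def __init__(self, path, variables=None, clobber=False):
        self.path = path
        if os.path.exists(path) and not clobber:
            # Existing file: reuse the variable table already stored on disk.
            with open(path) as f:
                self.variables = json.load(f)["variables"]
        else:
            # New (or clobbered) file: define the variables requested now.
            self.variables = list(variables or [])
            with open(path, "w") as f:
                json.dump({"variables": self.variables}, f)

    def inq_varid(self, name):
        # Lookup by name: fails if the header does not list the variable.
        if name not in self.variables:
            raise KeyError(f"variable not found: {name}")
        return self.variables.index(name)


def demo():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "expid.bkg.sfc.nc4")
    # Stale file left over from an earlier step, written without EVAP.
    ToyDataset(path, variables=["TS", "U10M"])
    # The writer now expects EVAP but silently reopens the stale file.
    reopened = ToyDataset(path, variables=["TS", "U10M", "EVAP"])
    try:
        reopened.inq_varid("EVAP")
        return "found"
    except KeyError:
        return "not found"
```

In this toy, passing `clobber=True` when reopening rewrites the header and the lookup succeeds, which is the difference between truly overwriting a file and appending to whatever is already there.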
This is really a problem with the DAS scripting
The DAS scripting should be removing the old background files before GEOSgcm.x runs; a file that History will try to write should not already be there. The fact that this worked before means you were just lucky and apparently were not changing the contents of the bkg files.
HistoryGridComp should check, when it decides to write a file, whether that file already exists, and error out if it does, since otherwise the problem can surface at a different point in the code where the error is less clear. I will make that change in our development branch so that an existing file is caught and reported when History decides it is time to write a new file, rather than during the actual writing, where the error is more confusing.
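The idea of failing early on an existing file can be sketched in a few lines. This is an illustration in Python only; the History code itself is Fortran and is not shown here, and `create_output_file` is a hypothetical name. It relies on Python's exclusive-creation mode (`"x"`), which raises `FileExistsError` at open time instead of reusing the file.

```python
def create_output_file(path):
    """Create `path` for writing, failing immediately if it already exists."""
    # Mode "x" is exclusive creation: open() raises FileExistsError
    # instead of silently reusing a file left behind by an earlier step.
    return open(path, "xb")
```

With a check like this, the error names the offending file at the moment the writer decides to create it, rather than later when a variable lookup fails.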
What I don't understand is why overwriting a file would depend on what the existing file contains, unless NC4 overwriting is a very different beast from binary overwriting. Binary overwriting of a sequential file could not care less what was in the original file. But perhaps NC4 opens the content list ... and then, sure, if something got added or removed it would be a problem.
In any case, I will work on removing the files from the cycle before the model starts.
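A minimal sketch of such a cleanup step, written in Python purely for illustration (the actual DAS scripts are not shown in this thread and may well use a different language). The function name `remove_stale_backgrounds` and the default patterns are assumptions, inferred from the file names in the log above (`<expid>.bkg.*.nc4` and `<expid>.cbkg.*.nc4`); it would have to run after mkiau.x has consumed the files and before GEOSgcm.x starts.

```python
import glob
import os


def remove_stale_backgrounds(workdir, expid, patterns=("bkg.*", "cbkg.*")):
    """Delete leftover background files for `expid` in `workdir`.

    The patterns are assumptions based on the file names in the log.
    Returns the sorted list of removed paths.
    """
    removed = []
    # Escape the fixed part so special characters in paths are taken literally.
    prefix = glob.escape(os.path.join(workdir, expid))
    for pattern in patterns:
        for path in glob.glob(f"{prefix}.{pattern}.nc4"):
            os.remove(path)
            removed.append(path)
    return sorted(removed)
```

Called as `remove_stale_backgrounds(fvwork, "C48f_ben")`, where `fvwork` is the path of the work directory, this would remove the bkg.eta, bkg.sfc and cbkg.eta files and leave other History output alone.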