Describe the bug
When running ResStock on Kestrel, some jobs fail because they hit the time limit. A few weeks ago, only a few jobs failed (around 5 of 50). This week the problem has become more serious: 2 of 3 jobs, or all 3 jobs, failed.
The error message in job.out-* is:
DEBUG:2024-03-04 16:35:42:buildstockbatch.base:Using OpenStudio version: 3.7.0 with SHA: d5269793f1
DEBUG:2024-03-04 16:35:42:__main__:Output directory = /kfs2/projects/redlineres/tcm/summer_phoenix_tcm1_0304
slurmstepd: error: *** JOB 2828658 ON x3001c0s33b0n0 CANCELLED AT 2024-03-04T16:45:29 DUE TO TIME LIMIT ***
Even for jobs that finished successfully, some took about 10 minutes to complete, although the log reports a simulation time of only 0.63 minutes:
DEBUG:2024-03-05 12:48:35:buildstockbatch.base:Using OpenStudio version: 3.7.0 with SHA: d5269793f1
DEBUG:2024-03-05 12:48:35:__main__:Output directory = /kfs2/projects/redlineres/tcm/winter_boston_tcm1_0305
DEBUG:2024-03-05 12:58:07:__main__:Trimming buildstock.csv
DEBUG:2024-03-05 12:58:07:__main__:Buildstock.csv trimmed to 168 rows.
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 104 concurrent workers.
[Parallel(n_jobs=-1)]: Done 25 out of 208 | elapsed: 16.2s remaining: 2.0min
[Parallel(n_jobs=-1)]: Done 49 out of 208 | elapsed: 17.9s remaining: 58.2s
[Parallel(n_jobs=-1)]: Done 73 out of 208 | elapsed: 20.8s remaining: 38.5s
[Parallel(n_jobs=-1)]: Done 97 out of 208 | elapsed: 23.7s remaining: 27.2s
[Parallel(n_jobs=-1)]: Done 121 out of 208 | elapsed: 28.4s remaining: 20.4s
[Parallel(n_jobs=-1)]: Done 145 out of 208 | elapsed: 30.7s remaining: 13.4s
[Parallel(n_jobs=-1)]: Done 169 out of 208 | elapsed: 32.5s remaining: 7.5s
[Parallel(n_jobs=-1)]: Done 193 out of 208 | elapsed: 34.0s remaining: 2.6s
[Parallel(n_jobs=-1)]: Done 208 out of 208 | elapsed: 37.5s finished
INFO:2024-03-05 12:58:45:__main__:Simulation time: 0.63 minutes
INFO:2024-03-05 12:58:45:__main__:Writing results to /kfs2/projects/redlineres/tcm/winter_boston_tcm1_0305/results/simulation_output/results_job1.json.gz
INFO:2024-03-05 12:58:45:__main__:Compressing simulation outputs to /kfs2/projects/redlineres/tcm/winter_boston_tcm1_0305/results/simulation_output/simulations_job1.tar.gz
INFO:2024-03-05 12:58:46:__main__:batch complete
INFO:2024-03-05 12:58:46:__main__:Cleaning up /tmp/scratch
DEBUG:2024-03-05 12:58:46:__main__:Removing /tmp/scratch/buildstock
DEBUG:2024-03-05 12:58:46:__main__:Removing /tmp/scratch/weather
DEBUG:2024-03-05 12:58:47:__main__:Removing /tmp/scratch/output
DEBUG:2024-03-05 12:58:47:__main__:Removing /tmp/scratch/housing_characteristics
DEBUG:2024-03-05 12:58:47:__main__:Removing /tmp/scratch/openstudio.simg
real 10m31.240s
user 39m14.246s
sys 11m14.233s
Platform:
Workaround method
Increase minutes_per_sim to a larger value. For example, I use minutes_per_sim=6 for a run with 600 models.
If only a few jobs failed, rerun the failed jobs: https://buildstockbatch.readthedocs.io/en/stable/run_sims.html#re-running-failed-array-jobs
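To get a rough feel for how minutes_per_sim might translate into a time limit, here is a small sketch. It assumes the requested time is the number of simulation rounds per node (simulations per job divided by concurrent workers, rounded up) multiplied by minutes_per_sim. That relationship is an assumption made for illustration and is not confirmed anywhere in this issue; `estimated_walltime_minutes` is a hypothetical helper. The 208 simulations and 104 workers come from the successful log above.

```python
import math


def estimated_walltime_minutes(n_sims_per_job, workers_per_node, minutes_per_sim):
    """Hypothetical estimate: rounds of simulations per node times
    minutes_per_sim. This formula is an assumption for illustration."""
    rounds = math.ceil(n_sims_per_job / workers_per_node)
    return math.ceil(rounds * minutes_per_sim)


# 208 simulations on 104 concurrent workers, as in the successful log.
for minutes_per_sim in (3, 6, 10):
    limit = estimated_walltime_minutes(208, 104, minutes_per_sim)
    print(f"minutes_per_sim={minutes_per_sim}: about {limit} min")
```

Under this assumption, minutes_per_sim=6 would give about 12 minutes for such a job, which would cover the 10m31s wall-clock time of the successful run shown earlier. Treat the numbers as a sanity check only.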