segmentation fault during shutdown, sedgem, can't generate output for restart #264
Thinking about/investigating this further:
Following up:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
real 0m7.147s
So this is a similar error with somewhat different information, but the same origin: the error is caught at line 357 of ./genie.job. Looking at the output of the successfully completed simulation:
I've uploaded the output of the successful run, the output from the failed restart, and the user-config for the restart to the same box directory. The user-config was just a copy of the previous one, restarted from the prior output with 500 kyr requested instead of 1 Myr. Any ideas? Thanks!
For what it is worth, ulimit -a
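Since ulimit reports per-shell resource limits, one quick thing to look at is the stack size, which is a common (though by no means the only) cause of segfaults in Fortran executables. Whether it has anything to do with this crash is an assumption; nothing in the backtrace confirms it. A minimal sketch of checking the limit and raising the soft limit as far as the hard limit allows:

```shell
# Read the soft and hard stack-size limits for the current shell
# (values are in KiB, or the word "unlimited").
soft_stack=$(ulimit -S -s)
hard_stack=$(ulimit -H -s)
echo "stack soft=$soft_stack hard=$hard_stack"

# Raise the soft limit to the hard limit. This affects only this shell
# session and the processes started from it.
ulimit -S -s "$hard_stack"
echo "stack soft is now $(ulimit -S -s)"
```

The change only applies to jobs launched from the same shell afterwards, so ./runmuffin.sh would need to be started from that session for it to have any effect.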
Further information: the problem seems to originate with this restart simulation:
https://umd.box.com/s/ld8ihq9bu2lnl4uwd32jabq4aad63to7
If I restart from this simulation, the new simulation runs to completion. But a restart from the new simulation then fails with a memory allocation/segfault error in the first few years. For instance:
./runmuffin.sh cgenie.eb_go_gs_ac_bg_sg_rg_gl_eg.wolr0570t6.BASES PALEO exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN4s 10 exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN3c
Then make a copy of the experiment *4s as *4sa and run it with a restart from *4s:
./runmuffin.sh cgenie.eb_go_gs_ac_bg_sg_rg_gl_eg.wolr0570t6.BASES PALEO exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN4sa 10 exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN4s
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
SPIN3c and SPIN4 are identical experiments run with the same base-config:
diff exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN3c exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN4
This is using release v0.9.33. On completing the simulation, shutdown fails at sedgem shutdown. Is there any way to recover without having to rerun the experiment? This is a spinup simulation, running 1e6 years without acceleration. What I would like to do is use this as a restart file for another spinup simulation, but attempting to do so produces a similar segmentation fault originating with the same line number in genie.job, but after only a few years of simulation/saving. Below is the error message.
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7f8b700218c2 in ???
#1 0x7f8b70020a55 in ???
#2 0x7f8b6fd6204f in ???
at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3 0x563636d2b3db in ???
#4 0x563636d05838 in ???
#5 0x56363696a9ca in ???
#6 0x563636973f5e in ???
#7 0x56363695291e in ???
#8 0x7f8b6fd4d249 in __libc_start_call_main
at ../sysdeps/nptl/libc_start_call_main.h:58
#9 0x7f8b6fd4d304 in __libc_start_main_impl
at ../csu/libc-start.c:360
#10 0x563636952940 in ???
#11 0xffffffffffffffff in ???
./genie.job: line 357: 1263740 Segmentation fault ./genie.exe
real 26121m14.894s
user 26117m14.995s
sys 1m44.597s
cp: cannot stat 'fort.2': No such file or directory
ERROR: !!!!!!!!!! ERROR PROCESSING !!!!!!!!!!
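Because the run crashed during shutdown, one check that might be worth doing before reusing its output as a restart is whether any file was left empty. This is a minimal sketch: the function name list_empty_files and the directory passed to it are hypothetical placeholders, not part of muffin, and a file that is truncated but non-empty would not be caught.

```shell
# List zero-byte regular files under a directory tree.
# $1: path to an experiment output directory (placeholder; substitute your own).
list_empty_files() {
    # -size 0 matches only files whose size is exactly zero bytes.
    find "$1" -type f -size 0 -print
}
```

For example, list_empty_files /path/to/output/of/SPIN4 (with the real output path substituted) prints nothing if every file has some content.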
Thanks in advance for suggestions on how to proceed.
Per advice I am putting the base and user-config file I was using, and the entire output of the 1 Myr experiment, and the restart file it started from, here:
https://umd.box.com/s/qhp196dotupisnd8ufnbkxvjfpvmm7qj
Run command was:
./runmuffin.sh cgenie.eb_go_gs_ac_bg_sg_rg_gl_eg.wolr0570t6.BASES PALEO exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN4 1000000 exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN3c
I can also make a copy of the user-config and run from the failed user-config as restart, and see the same error crop up in just the first few years, e.g.
cp -rp exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN4 exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN5
./runmuffin.sh cgenie.eb_go_gs_ac_bg_sg_rg_gl_eg.wolr0570t6.BASES PALEO exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN5 100 exp27.CBSGRL.wolr0570t6.OMEN_Prdxo.SPIN4
Let me know if you can see it OK?
Thank you!