Segmentation fault when trying to profile application #389

Closed
hmcezar opened this issue Jul 22, 2022 · 14 comments · Fixed by #391

@hmcezar

hmcezar commented Jul 22, 2022

I'm trying to use fil-profile without success, getting a segmentation fault. I'm using the latest PyPI version with Python 3.9. Using the conda-forge version didn't help either.

This is what I get when trying to run in my conda environment:

fil-profile run -m hymd dpc.toml final_step1_centered.hdf5 --seed 400755
=fil-profile= Memory usage will be written out at exit, and opened automatically in a browser.
=fil-profile= You can also run the following command while the program is still running to write out peak memory usage up to that point: kill -s SIGUSR2 31591
=fil-profile= WARNING: Fil does not (yet) support tracking memory in subprocesses.
Fatal Python error: Segmentation fault

Current thread 0x00007f681fe1e140 (most recent call first):
  File "<frozen importlib._bootstrap>", line 228 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1181 in exec_module
  File "<frozen importlib._bootstrap>", line 680 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 986 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1007 in _find_and_load
  File "/home/hmcezar/anaconda3/envs/hymd-py39/lib/python3.9/site-packages/pfft/__init__.py", line 3 in <module>
  File "<frozen importlib._bootstrap>", line 228 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 850 in exec_module
  File "<frozen importlib._bootstrap>", line 680 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 986 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1007 in _find_and_load
  File "/home/hmcezar/anaconda3/envs/hymd-py39/lib/python3.9/site-packages/pmesh/pm.py", line 5 in <module>
  File "<frozen importlib._bootstrap>", line 228 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 850 in exec_module
  File "<frozen importlib._bootstrap>", line 680 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 986 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1007 in _find_and_load
  File "/home/hmcezar/anaconda3/envs/hymd-py39/lib/python3.9/site-packages/pmesh/__init__.py", line 2 in <module>
  File "<frozen importlib._bootstrap>", line 228 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 850 in exec_module
  File "<frozen importlib._bootstrap>", line 680 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 986 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1007 in _find_and_load
  File "<frozen importlib._bootstrap>", line 228 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 972 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1007 in _find_and_load
  File "/home/hmcezar/Dev/HyMD/hymd/main.py", line 8 in <module>
  File "<frozen importlib._bootstrap>", line 228 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 850 in exec_module
  File "<frozen importlib._bootstrap>", line 680 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 986 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1007 in _find_and_load
  File "/home/hmcezar/Dev/HyMD/hymd/__init__.py", line 2 in <module>
  File "<frozen importlib._bootstrap>", line 228 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 850 in exec_module
  File "<frozen importlib._bootstrap>", line 680 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 986 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1007 in _find_and_load
  File "/home/hmcezar/anaconda3/envs/hymd-py39/lib/python3.9/runpy.py", line 111 in _get_module_details
  File "/home/hmcezar/anaconda3/envs/hymd-py39/lib/python3.9/runpy.py", line 147 in _get_module_details
  File "/home/hmcezar/anaconda3/envs/hymd-py39/lib/python3.9/runpy.py", line 221 in run_module
  File "/home/hmcezar/anaconda3/envs/hymd-py39/lib/python3.9/site-packages/filprofiler/_tracer.py", line 135 in trace_until_exit
  File "/home/hmcezar/anaconda3/envs/hymd-py39/lib/python3.9/site-packages/filprofiler/_script.py", line 269 in stage_2
  File "/home/hmcezar/anaconda3/envs/hymd-py39/lib/python3.9/site-packages/filprofiler/_script.py", line 279 in <module>
  File "/home/hmcezar/anaconda3/envs/hymd-py39/lib/python3.9/runpy.py", line 87 in _run_code
  File "/home/hmcezar/anaconda3/envs/hymd-py39/lib/python3.9/runpy.py", line 197 in _run_module_as_main
zsh: segmentation fault (core dumped)  fil-profile run -m hymd dpc.toml final_step1_centered.hdf5 --seed 400755

The software runs fine without fil-profiler.
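The "Fatal Python error: Segmentation fault" header and the "most recent call first" stack above follow the format produced by Python's standard faulthandler module. As a minimal sketch, unrelated to Fil's own code, the following deliberately crashes a child interpreter to show how such a dump is produced and captured. The null-pointer read through ctypes is only an illustrative trigger, and the return-code check assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Child program: read from address 0 to force a segfault (illustrative trigger only).
CRASHING_CODE = "import ctypes; ctypes.string_at(0)"


def run_with_faulthandler(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a child interpreter with faulthandler enabled via -X."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )


result = run_with_faulthandler(CRASHING_CODE)
# On POSIX, a child killed by a signal reports the negated signal number.
crashed_with_segv = result.returncode == -signal.SIGSEGV
print("killed by SIGSEGV:", crashed_with_segv)
print(result.stderr)
```

The captured stderr lists the Python frames that were active at the moment of the crash, which is how the import chain ending in pfft can be read off the dump above.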

I also tried using it on an HPC cluster with a completely different environment. In this case, the application starts, but I still get a segfault while my simulation is being set up:

fil-profile run -m hymd dpc.toml final_step1_centered.hdf5 --seed 400755 > out
=fil-profile= Memory usage will be written out at exit, and opened automatically in a browser.
=fil-profile= You can also run the following command while the program is still running to write out peak memory usage up to that point: kill -s SIGUSR2 29631
=fil-profile= WARNING: Fil does not (yet) support tracking memory in subprocesses.
[login-3:29631:0:29631] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
Segmentation fault (core dumped)

The application I'm trying to profile is HyMD.

@itamarst
Collaborator

That sounds frustrating, sorry it's not working. I will take a look.

@hmcezar
Author

hmcezar commented Jul 22, 2022

No need to be sorry :) thank you for having a look.

Please tell me if I can help in any way.

@itamarst
Collaborator

The good news is that it's easy to reproduce: you just need to import hymd.
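A reproducer of this kind can be sketched as a small helper that runs a one-line import script in a child process and reports whether the child was killed by a signal. This is only a sketch: the `import_crashes` helper and its `launcher` parameter are hypothetical names introduced here, and the check relies on the POSIX convention of negative return codes for signal deaths.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def import_crashes(module: str, launcher=None) -> bool:
    """Return True if importing `module` kills the child process with a signal.

    `launcher` is the command prefix used to run the one-line script; it
    defaults to the current interpreter.
    """
    launcher = launcher or [sys.executable]
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "repro.py"
        script.write_text(f"import {module}\n")
        result = subprocess.run([*launcher, str(script)], capture_output=True)
    # On POSIX, a negative return code means the child died from a signal
    # (for example -11 for SIGSEGV).
    return result.returncode < 0


# A standard-library module is used here so the sketch runs anywhere.
print(import_crashes("json"))
```

With hymd and Fil installed, something like `import_crashes("hymd", launcher=["fil-profile", "run"])` would exercise the failing case, assuming `fil-profile` is on the PATH and that the launcher propagates the child's termination status.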

@itamarst
Collaborator

  1. Fil uses an alternative memory allocator, jemalloc.
  2. It looks like the crash is caused by fftw calling free() as implemented by jemalloc.

Potential causes:

  1. jemalloc has a bug.
  2. fftw is doing something wrong with memory management, and glibc's memory allocation APIs happen to be more forgiving in this scenario.
  3. There is some reasonable, but in this case impactful, semantic difference between jemalloc and glibc's malloc, so it is less a memory management bug in fftw and more a case of fftw making incorrect assumptions.
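
One general way problems like the second and third can arise when an allocator is swapped in is that memory obtained from one allocator ends up being released through another. Whether that is what happens between fftw and jemalloc here is not established in this thread. The following is only a toy Python model of why such a mismatch is a problem: the `ToyAllocator` class and its addresses are hypothetical, and it does not reflect how jemalloc or glibc work internally.

```python
class ToyAllocator:
    """Toy stand-in for a memory allocator that only knows its own pointers."""

    def __init__(self, name: str, base: int):
        self.name = name
        self._next = base      # next fake address to hand out
        self._live = set()     # addresses currently allocated by this allocator

    def malloc(self, size: int) -> int:
        ptr = self._next
        self._next += size
        self._live.add(ptr)
        return ptr

    def free(self, ptr: int) -> None:
        if ptr not in self._live:
            # The toy raises; a real allocator might instead corrupt its
            # state or crash when handed a pointer it did not allocate.
            raise RuntimeError(f"{self.name}: free() of foreign pointer {ptr:#x}")
        self._live.remove(ptr)


allocator_a = ToyAllocator("allocator A", base=0x1000)
allocator_b = ToyAllocator("allocator B", base=0x9000)

ptr = allocator_a.malloc(64)
try:
    allocator_b.free(ptr)  # mismatched pair: allocated by A, freed by B
    mismatch_detected = False
except RuntimeError as exc:
    mismatch_detected = True
    print(exc)

allocator_a.free(ptr)  # matched pair succeeds
```
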

I will try a newer version of jemalloc; if that fixes it, great. Otherwise, options include:

  1. Stop using jemalloc in Fil. This is probably not too hard: I've figured out alternatives in Sciagraph, my commercial variant of Fil.
  2. Create a smaller reproducer and file an upstream bug with FFTW. This probably won't help in the short term.
  3. ...

@itamarst
Collaborator

A newer jemalloc didn't help 😢 So I will think a bit about whether there are any other obvious solutions, but the next step is probably ripping out jemalloc.

@itamarst
Collaborator

If you want to get unblocked, you could alternatively try memray (https://bloomberg.github.io/memray/).

@hmcezar
Author

hmcezar commented Jul 22, 2022

New jemalloc didn't help 😢 So will think a bit if there's any other obvious solutions, but probably next step is ripping out jemalloc.

Oh, I'm sorry to hear that, but I think it might be a problem that doesn't affect a lot of packages, so I think that can wait for now.

Thank you for pointing me to memray, but unfortunately, I found some problems there too as reported here.

But thank you for looking into the issue! I'll keep my eyes open for future versions!

@itamarst
Collaborator

A third option is Sciagraph, my commercial version of Fil, which also does performance profiling. It is aimed at running in production, so the goal is to run at full speed: https://sciagraph.com

It's free for now (and when I do start charging it'll likely have an academic discount).

@itamarst
Collaborator

I'll probably try ripping out jemalloc now, though; it's probably not very much work.

@hmcezar
Author

hmcezar commented Jul 22, 2022

A third option is my commercial version of Fil, which also does performance profiling, aimed for running in production so goal is running at full speed: https://sciagraph.com

It's free for now (and when I do start charging it'll likely have an academic discount).

I'll definitely check it out!

@itamarst
Collaborator

OK, I think I got Fil working. Or at least, import hymd doesn't crash 😀 So I will open a PR, and if that looks good I can do a release.

@itamarst
Collaborator

It looks like there's a decent chance you'd be impacted by #390, which I am trying to fix anyway, so I will try to get at least a workaround into this release.

@itamarst
Collaborator

I have tagged release 2022.07.0, it should be up on PyPI within 30 minutes. Let me know if you have other problems, questions, or anything else I can help with.

@hmcezar
Author

hmcezar commented Jul 25, 2022

Thank you very much for the fix! Just tried it and it works perfectly.
