Segfault on NAS benchmark MultiGrid (MG) #56

Open
hbrunie opened this issue Jul 17, 2020 · 4 comments

@hbrunie

hbrunie commented Jul 17, 2020

Hello,

I tried to run Herbgrind on MG from the NAS Parallel Benchmarks (C++ serial version: NAS-bench) and got a segfault:

```
$HOME/utils/tools/herbgrind/valgrind/herbgrind-install/bin/valgrind --tool=herbgrind ../bin/mg.S
==28521== Herbgrind, a valgrind tool for Herbie
==28521== Copyright (C) 2016-2017, and GNU GPL'd, by Alex Sanchez-Stern
==28521== Using Valgrind-3.15.0.GIT and LibVEX; rerun with -h for copyright info
==28521== Command: ../bin/mg.S
==28521==
NAS Parallel Benchmarks 4.0 OpenMP C++ version - MG Benchmark
Developed by: Dalvan Griebler & Júnior Löff

 No input file. Using compiled defaults
 Size:  32x 32x 32 (class_npb S)
 Iterations:   4
==28521==
==28521== Process terminating with default action of signal 11 (SIGSEGV)
==28521==  Access not within mapped region at address 0x1207CB43F8
==28521==    at 0x404CB1: resid(double***, double***, double***, int, int, int, double*, int) (in /global/u1/h/hbrunie/benchmarks/NPB-CPP/NPB-SER/bin/mg.S)
==28521==    by 0x400EEA: main (in /global/u1/h/hbrunie/benchmarks/NPB-CPP/NPB-SER/bin/mg.S)
==28521==  If you believe this happened as a result of a stack
==28521==  overflow in your program's main thread (unlikely but
==28521==  possible), you can try to increase the size of the
==28521==  main thread stack using the --main-stacksize= flag.
==28521==  The main thread stack size used in this run was 16777216.
==28521==
Didn't find any marks!
Segmentation fault
```

Could you help me debug this?

Thanks,
Hugo Brunie

@HazardousPeach
Contributor

Hey Hugo, thanks for getting in touch!

It looks like the segfault is happening in client code, not in Herbgrind code, so if the program doesn't segfault when run uninstrumented, then Herbgrind is somehow interfering with the client state (which it's really not supposed to do).

My best bet for debugging something like this is to first test it under Nulgrind (selected with --tool=none), the Valgrind tool that does all the normal Valgrind decompiling and JITing but doesn't actually do any instrumentation. If that still segfaults, the problem is in Valgrind code. If not, the next step would be to go into the Herbgrind source and start removing sections of the instrumentation, to narrow down which part causes the segfault. That work can be a little harrowing, so stay in touch on the issue, and I'll try to help out when I have free cycles.

Cheers,
Alex
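The triage order suggested above (run natively, then under Nulgrind, then under Herbgrind) can be scripted so that every step is rerun the same way. This is a hypothetical helper, not part of Herbgrind or Valgrind: the function names status_of, diagnose and triage, the VALGRIND variable and the file name triage.sh are made up for illustration. It treats any non-zero exit status as a failure; a client killed by SIGSEGV surfaces as 139 (128 + 11), since Valgrind re-raises the signal, as the trailing "Segmentation fault" in the log shows. Nulgrind is Valgrind's standard null tool; it is assumed here to be available from the same install that provided Memcheck. To mirror what was done later in the thread, --tool=memcheck can be substituted for --tool=none.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper (not part of Herbgrind or Valgrind).
# It runs one program natively, under Nulgrind (--tool=none) and under
# Herbgrind, and reports which layer a crash most likely comes from.
# Any non-zero exit status counts as a failure; a SIGSEGV shows up as 139.

# Print the exit status of a command, discarding the command's own output.
status_of() {
  "$@" >/dev/null 2>&1
  echo "$?"
}

# Map three exit statuses (native, --tool=none, --tool=herbgrind) to a verdict,
# checking them in the order suggested in the thread.
diagnose() {
  local native=$1 none=$2 herb=$3
  if [ "$native" -ne 0 ]; then
    echo "client: fails without Valgrind"
  elif [ "$none" -ne 0 ]; then
    echo "valgrind: fails under --tool=none"
  elif [ "$herb" -ne 0 ]; then
    echo "herbgrind: fails only under --tool=herbgrind"
  else
    echo "no failure reproduced"
  fi
}

# Usage: triage PROGRAM [ARGS...]
# VALGRIND selects the launcher; it defaults to "valgrind" on PATH.
triage() {
  if [ "$#" -eq 0 ]; then
    echo "usage: triage PROGRAM [ARGS...]" >&2
    return 2
  fi
  local vg="${VALGRIND:-valgrind}"
  diagnose \
    "$(status_of "$@")" \
    "$(status_of "$vg" --tool=none "$@")" \
    "$(status_of "$vg" --tool=herbgrind "$@")"
}

# Example (file name hypothetical):
#   source triage.sh
#   VALGRIND=$HOME/utils/tools/herbgrind/valgrind/herbgrind-install/bin/valgrind \
#     triage ../bin/mg.S
```

If Nulgrind behaves the way Memcheck did in this thread, the example run would be expected to land on the Herbgrind verdict, which corresponds to the instrumentation-bisection step.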

@hbrunie
Author

hbrunie commented Jul 20, 2020

I don't know how to build Nulgrind, but I tested Memcheck from the same Valgrind install, and it worked fine.
I guess Memcheck exercises at least as much as Nulgrind does, so we can move on to the second debugging step.

Maybe I can use gdb on herbgrind?

@hbrunie
Author

hbrunie commented Jul 20, 2020

GDB did not help much.
Note that both Herbgrind and NAS MG are compiled with -g.

```
Starting program: /global/u1/h/hbrunie/utils/tools/herbgrind/valgrind/herbgrind-install/bin/valgrind --tool=herbgrind ./bin/mg.S
Missing separate debuginfos, use: zypper install glibc-debuginfo-2.26-13.45.1.x86_64
process 38724 is executing new program: /global/u1/h/hbrunie/utils/tools/herbgrind/valgrind/herbgrind-install/lib/valgrind/herbgrind-amd64-linux
==38724== Herbgrind, a valgrind tool for Herbie
==38724== Copyright (C) 2016-2017, and GNU GPL'd, by Alex Sanchez-Stern
==38724== Using Valgrind-3.15.0.GIT and LibVEX; rerun with -h for copyright info
==38724== Command: ./bin/mg.S
==38724==

Program received signal SIGSEGV, Segmentation fault.
0x0000001002c06f6c in ?? ()
(gdb) bt
#0 0x0000001002c06f6c in ?? ()
#1 0x0000001002a8df30 in ?? ()
#2 0x0000000000013347 in ?? ()
#3 0x000000100200d210 in ?? ()
#4 0x0000000000000000 in ?? ()
```
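Part of why this backtrace is unhelpful is that gdb was started on the valgrind launcher, which then exec'd herbgrind-amd64-linux (see the "executing new program" line). gdb is therefore looking at Valgrind's own process, and the ?? frames most likely belong to Valgrind's generated code rather than to mg.S. Stock Valgrind ships a gdbserver, driven through the vgdb helper, that lets gdb inspect the client program instead. A minimal sketch, assuming the Herbgrind fork keeps that standard gdbserver support and installs vgdb next to valgrind (neither is confirmed in this thread); this is an interactive two-terminal session rather than a script:

```shell
# Terminal 1: run mg.S under Herbgrind with Valgrind's gdbserver enabled.
# --vgdb-error=0 makes Valgrind wait for gdb before the client starts.
$HOME/utils/tools/herbgrind/valgrind/herbgrind-install/bin/valgrind \
  --tool=herbgrind --vgdb=yes --vgdb-error=0 ../bin/mg.S

# Terminal 2: debug the client binary itself, not the Valgrind launcher.
gdb ../bin/mg.S
# At the (gdb) prompt (vgdb path assumed to sit next to valgrind):
#   (gdb) target remote | $HOME/utils/tools/herbgrind/valgrind/herbgrind-install/bin/vgdb
#   (gdb) continue
#   (gdb) bt
```

When it starts waiting, Valgrind should also print the exact target remote command to use. If this works with Herbgrind, bt at the SIGSEGV should refer to the client's own frames, such as resid and main, using mg.S's -g debug info.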

@HazardousPeach
Contributor

Yeah, unfortunately gdb doesn't play well with Valgrind: it seems to have trouble emulating the crazy stuff Valgrind does, and it segfaults even when Herbgrind/Valgrind wouldn't otherwise. I did some digging myself, and it looks like you can remove most of the Herbgrind code, leaving only the creation of Shadow Temporaries, and it still crashes. My hunch is that something in the allocator breaks when the program has so many floating-point ops that need shadow values allocated.
