Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault using symmetric eigenvalue problems for large matrices #4868

Closed
jishnub opened this issue Aug 13, 2024 · 15 comments · Fixed by #4871
Closed

Segmentation fault using symmetric eigenvalue problems for large matrices #4868

jishnub opened this issue Aug 13, 2024 · 15 comments · Fixed by #4871

Comments

@jishnub
Copy link

jishnub commented Aug 13, 2024

Cross posting from the julia issue tracker (JuliaLang/julia#55471), symmetric eigenvalue problems (dsyev/dsyevd/dsyevr) seem to be segfaulting for matrix sizes of 64x64 and above on x86_64-linux-gnu using OpenBLAS v0.3.28.

How to reproduce in Julia (using a nightly version > v"1.12.0-DEV.1038" that includes OpenBLAS v0.3.28):

julia> using LinearAlgebra

julia> H = Hermitian(rand(1024, 1024));

julia> eigen(H); # calls `dsyevd`, but the issue is seen for other function as well
[1]    56499 segmentation fault (core dumped)  julia +nightly

System info:

julia> versioninfo()
Julia Version 1.12.0-DEV.1038
Commit 9d222b87d77 (2024-08-12 01:13 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_EDITOR = subl

Matrices smaller than 64x64 do not produce this error.

@martin-frbg
Copy link
Collaborator

Somewhat strange as there is no obvious (to me) change in 0.3.28 that could have caused this, and no such failures in the LAPACK testsuite even if I increase matrix sizes significantly beyond the defaults. That is at least when compiled with gcc, maybe LLVM18 plays a role here although the cpu seems to be fairly standard AVX2-only

@martin-frbg
Copy link
Collaborator

that said, the backtrace (from Julia 1.12.0.DEV.1043 aka 96fd25a764c) has dgemm_itcopy called from dsyr2k_UN (with no changes to either DGEMM or DSYR2K code in 0.3.28, but maybe Julia used a version earlier than 0.3.27 previously ?)

@giordano
Copy link
Contributor

For what is worth, I believe the segmentation fault may depend on the version of GCC used. I can see it when we compile OpenBLAS with GCC 11 but not with GCC 8. I'm trying other combinations now.

but maybe Julia used a version earlier than 0.3.27 previously ?

Julia nightly was using OpenBLAS v0.3.27 until 3 days ago. Upcoming Julia v1.11 will use OpenBLAS v0.3.27 (feature freeze was a few months ago), Julia v1.10 (current stable version) uses OpenBLAS v0.3.23.

@giordano
Copy link
Contributor

I'm trying other combinations now.

It seems to work for me when compiling OpenBLAS with GCC 12.

@martin-frbg
Copy link
Collaborator

Can you tell me your build options for OpenBLAS please ? I'm trying with clang/flang-new from LLVM18.1 but getting a missing reference to FortranACharacterCompareScalar1 when I replace the library that comes with the Julia nightly

@giordano
Copy link
Contributor

It seems to work for me when compiling OpenBLAS with GCC 12.

I was wrong with all my references to GCC versions, I forgot a detail that was mentioned in JuliaLang/julia#55471, but not here: the segmentation fault happens when setting the number of threads with openblas_set_num_threads, in particular when using a number of threads larger than 3. In my tests with different builds I was missing to set the number of threads.

Can you tell me your build options for OpenBLAS please ?

For x86_64 it should be

make USE_THREAD=1 GEMM_MULTITHREADING_THRESHOLD=400 NO_AFFINITY=1 NO_STATIC=1 NUM_THREADS=512 INTERFACE64=1 SYMBOLSUFFIX=64_ DYNAMIC_ARCH=1 TARGET=GENERIC all

@giordano
Copy link
Contributor

As an additional datapoint, I don't get any crash when setting OPENBLAS_NUM_THREADS:

$ OPENBLAS_NUM_THREADS=4 julia +nightly -q
julia> using LinearAlgebra

julia> BLAS.get_num_threads()
4

julia> H = Hermitian(ones(1024, 1024));

julia> eigen(H);

julia>

If the environment variable OPENBLAS_NUM_THREADS is set before starting Julia, we don't call openblas_set_num_threads automatically.

Is there a different setup of threading in OpenBLAS when the env var is set?

@martin-frbg
Copy link
Collaborator

I could reproduce it without setting any environment variable, perhaps the difference is just in random memory layout/contents if this turns out to be uninitialized or overflowed memory

@martin-frbg
Copy link
Collaborator

Indeed looks like the code reshuffle to support other threading backends has created a loophole for starting a thread without having a memory buffer allocated for it. Working on a patch (or rather, I have a patch but no full understanding of the root cause yet)

@giordano
Copy link
Contributor

giordano commented Aug 15, 2024

@martin-frbg I backported your patch in #4871 with JuliaPackaging/Yggdrasil#9262, but I'm still see segmentation faults on some machines, but not everywhere:

  • AMD Ryzen 9 3950X 16-Core Processor: works
  • AMD EPYC 7742 64-Core Processor: segfaults
  • Neoverse-V2 (Nvidia Grace): segfaults

What's interesting is that Ryzen 9 3950X and EPYC 7742 have the same OpenBLAS core type, both "Zen", but Ryzen 9 3950X has also fewer threads.

@martin-frbg
Copy link
Collaborator

Indeed looks like I simplified yesterday's patch too far before committing, sorry. Have reproduced the persistance of the problem on the EPYC in the gcc compile farm and will push an updated PR shortly

@giordano
Copy link
Contributor

Out of curiosity, did you manage to craft a C reproducer? I'm failing badly at that 😞

@martin-frbg
Copy link
Collaborator

I haven't even tried yet, and I'm wondering if this may somehow be connected to how Julia uses OpenBLAS (though I couldn't say what might be special about it). If it was that easy to get on the "bad" code path simply by calling TRMM or GEMM, this should have blown up in the LAPACK, numpy and scipy testsuites or even one of our trivial benchmarks prior to the release.

@giordano
Copy link
Contributor

@ViralBShah over at JuliaLang/julia#55471 (comment) prepared a C program which I can confirm segfaults on a macbook M1 for me with openblas v0.3.28, but not v0.3.23, nor with #4879:

#include <stdio.h>
#include <stdlib.h>

int main() {
    long n = 1024;
    double *a;
    a = (double*) malloc(n*n*sizeof(double));

    openblas_set_num_threads64_(16);

    // Check if the memory has been successfully
    // allocated by malloc or not
    if (a == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    for(long row=0; row<n; row++){
        for(long col=0; col<=row; col++){
            a[row*n + col] = 1.0;
        }
    }

    // Allocate space for eigenvalues
    double *w;
    w = (double*) malloc(n * sizeof(double));

    if (w == NULL) {
        printf("Memory not allocated.\n");
        exit(0);
    }

    // Leading dimension of the matrix
    long lda = n;

    // workspace query
    double *work = (double*) malloc(sizeof(double));
    long lwork = -1;
    long *iwork = (long*) malloc(sizeof(long));
    long liwork = -1;
    long info = -100;

    char jobz = 'V';
    char uplo = 'U';

    dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);

    // Workspace allocation
    lwork = work[0];
    work = (double *) malloc(lwork * sizeof(double));
    liwork = iwork[0];
    iwork = (long *) malloc(liwork * sizeof(long));

    dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);

    // Print eigenvalues
    printf("First 5 Eigenvalues:\n");
    for (int i = 0; i < 5; i++) {
        printf("%f\n", w[i]);
    }

    return 0;
}

However the same program doesn't segfault for me with OpenBLAS v0.3.28 on the other machines I mentioned at #4868 (comment) where I could still reproduce the segfault after #4871.

@giordano
Copy link
Contributor

However the same program doesn't segfault for me with OpenBLAS v0.3.28 on the other machines I mentioned at #4868 (comment) where I could still reproduce the segfault after #4871.

Actually, I can reproduce it on the other machines, but I need to increase the number of threads quite a bit, perhaps depending on the system. For example this segfaults for me on AMD EPYC 7742 with 384 threads:

-    openblas_set_num_threads64_(16);
+    openblas_set_num_threads64_(384);
$ gcc -o test test.c  -L${HOME}/.julia/juliaup/julia-nightly/lib/julia -Wl,-rpath,${HOME}/.julia/juliaup/julia-nightly/lib/julia -lopenblas64_
test.c: In function ‘main’:
test.c:9:5: warning: implicit declaration of function ‘openblas_set_num_threads64_’ [-Wimplicit-function-declaration]
     openblas_set_num_threads64_(384);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~
test.c:46:5: warning: implicit declaration of function ‘dsyevd_64_’ [-Wimplicit-function-declaration]
     dsyevd_64_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, iwork, &liwork, &info, 1, 1);
     ^~~~~~~~~~
$ ./test
Segmentation fault (core dumped)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants