All_reduce_perf segfaults with Custom Built RCCL #72

Open
tks2004 opened this issue Apr 19, 2024 · 1 comment

tks2004 commented Apr 19, 2024

Problem Description

all_reduce_perf segfaults when run against a custom-built RCCL. It works fine when RCCL is loaded from /opt/rocm-6.0.0/lib.

Operating System

SLES15.4

CPU

AMD EPYC 7A53

GPU

AMD Instinct MI250

ROCm Version

ROCm 6.0.0

ROCm Component

rccl

Steps to Reproduce

Libfabric version 1.15.2.0
RCCL was custom built using:
CXX=hipcc cmake -DCMAKE_PREFIX_PATH=${RCCL_ROOT} -DCMAKE_INSTALL_PREFIX=${RCCL_ROOT}

AWS Libfabric
./autogen.sh
CC=hipcc ./configure --prefix=${RCCL_ROOT} --with-hip=/opt/rocm-6.0.0/ --with-rccl=$RCCL_ROOT --with-libfabric=$OFI_ROOT --prefix=$RCCL_ROOT --disable-tests --with-gdrcopy=$GDRCOPY --with-mpi=$MPI_HOME

git clone https://github.com/ROCm/rccl-tests.git
cd rccl-tests
make MPI=1 MPI_HOME=${MPI_HOME} HIP_HOME=/opt/rocm-6.0.0/ CUSTOM_RCCL_LIB=${RCCL_ROOT}/lib

all_reduce_perf segfaults at:
GDB:
#0 0x000014ca4d0a14ca in ncclTopoCompute(ncclTopoSystem*, ncclTopoGraph*) ()
from /<CUSTOM_PATH>/lib/librccl.so.1
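
Since the run works with the librccl from /opt/rocm-6.0.0/lib and fails with the custom build, it may help to confirm which copy of librccl.so.1 is picked up first at run time. The following is a minimal sketch, not part of RCCL or rccl-tests: `first_lib_dir` is a hypothetical helper that only walks a colon-separated search path such as LD_LIBRARY_PATH. It does not account for RPATH/RUNPATH or the loader cache, so `ldd` on the test binary remains the authoritative check.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RCCL): print the first directory on a
# colon-separated search path that contains the given library file name.
first_lib_dir() {
    local name="$1" path="$2" dir
    local IFS=':'
    for dir in $path; do
        # Skip empty entries such as those produced by "a::b".
        [ -n "$dir" ] || continue
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Report which LD_LIBRARY_PATH entry would supply librccl.so.1, if any.
first_lib_dir librccl.so.1 "${LD_LIBRARY_PATH:-}" \
    || echo "librccl.so.1 not found on LD_LIBRARY_PATH"
```

If the printed directory is not ${RCCL_ROOT}/lib, the binary may be mixing the custom build with the system copy. Running the test with NCCL_DEBUG=INFO set may also show more detail about the topology setup before the crash.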

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

tks2004 changed the title from "Segfaults with Custom Built RCCL" to "All_reduce_perf segfaults with Custom Built RCCL" on Apr 19, 2024

tks2004 commented Apr 23, 2024

Any update on this?
