tks2004 changed the title from "Segfaults with Custom Built RCCL" to "All_reduce_perf segfaults with Custom Built RCCL" on Apr 19, 2024.
Problem Description
all_reduce_perf segfaults when run against a custom-built RCCL. It works fine with the RCCL shipped in /opt/rocm-6.0.0/lib.
Operating System
SLES15.4
CPU
AMD EPYC 7A53
GPU
AMD Instinct MI250
ROCm Version
ROCm 6.0.0
ROCm Component
rccl
Steps to Reproduce
Libfabric 1.15.2.0
RCCL was custom-built using:
CXX=hipcc cmake -DCMAKE_PREFIX_PATH=${RCCL_ROOT} -DCMAKE_INSTALL_PREFIX=${RCCL_ROOT}
The AWS Libfabric plugin was built with:
./autogen.sh
CC=hipcc ./configure --prefix=${RCCL_ROOT} --with-hip=/opt/rocm-6.0.0/ --with-rccl=$RCCL_ROOT --with-libfabric=$OFI_ROOT --disable-tests --with-gdrcopy=$GDRCOPY --with-mpi=$MPI_HOME
git clone https://github.com/ROCm/rccl-tests.git
cd rccl-tests
make MPI=1 MPI_HOME=${MPI_HOME} HIP_HOME=/opt/rocm-6.0.0/ CUSTOM_RCCL_LIB=${RCCL_ROOT}/lib
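Because the stock library in /opt/rocm-6.0.0/lib is reported to work, it can help to confirm which librccl.so.1 the freshly built test binary actually resolves. A minimal sketch that filters `ldd` output follows; the helper name `rccl_lib_from_ldd` is hypothetical, and the `./build` path in the usage comment assumes the default rccl-tests output directory.

```shell
# Hypothetical helper: print the librccl path that the dynamic loader
# resolves, reading `ldd` output on stdin.
rccl_lib_from_ldd() {
  # Matching ldd lines look like:
  #   librccl.so.1 => /some/prefix/lib/librccl.so.1 (0x00007f...)
  # or, when the library cannot be resolved:
  #   librccl.so.1 => not found
  awk '$1 ~ /^librccl\.so/ && $2 == "=>" {
    if ($3 == "not") print "not found"; else print $3
  }'
}

# Usage (assumes the default ./build output directory of rccl-tests):
#   ldd ./build/all_reduce_perf | rccl_lib_from_ldd
```

If this prints a path under /opt/rocm-6.0.0/lib rather than ${RCCL_ROOT}/lib, the test binary is not picking up the custom build.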
all_reduce_perf segfaults in ncclTopoCompute. GDB backtrace:
#0 0x000014ca4d0a14ca in ncclTopoCompute(ncclTopoSystem*, ncclTopoGraph*) ()
from /<CUSTOM_PATH>/lib/librccl.so.1
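To compare the same all_reduce_perf binary against the stock and custom libraries, one option is to prepend a library directory to LD_LIBRARY_PATH for each run. This is a sketch under assumptions: the helper name `run_with_rccl` is hypothetical, the single-process arguments in the comments (`-b 8 -e 128M -f 2 -g 1`) are illustrative and not taken from the report, and an RPATH embedded in the binary at link time may take precedence over LD_LIBRARY_PATH (check the result with `ldd`).

```shell
# Hypothetical helper: run a command with the given RCCL library directory
# prepended to LD_LIBRARY_PATH (keeping any existing entries after it).
run_with_rccl() {
  local libdir=$1
  shift
  LD_LIBRARY_PATH="${libdir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" "$@"
}

# Stock ROCm RCCL (reported to work); illustrative single-process run:
#   run_with_rccl /opt/rocm-6.0.0/lib ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 1
# Custom RCCL under gdb in batch mode, printing a backtrace after the crash:
#   run_with_rccl "${RCCL_ROOT}/lib" gdb -batch -ex run -ex bt \
#     --args ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 1
```

Keeping the binary fixed and swapping only the library directory isolates the difference to librccl.so.1 itself, assuming the loader honors LD_LIBRARY_PATH for this binary.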
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response