Calculate gpu_map for parseGraph if devIds != Ranks for NCCL_RINGS #1301

Open: wants to merge 1 commit into base: develop

BKitor (Contributor) commented Aug 15, 2024

When using NCCL_RINGS, parseGraph can fail if ROCR_VISIBLE_DEVICES is set. If the visible devices aren't numbered consecutively from 0, a gpu_map is needed.
This patch adds a check that builds a gpu_map when NCCL_RINGS is set. The mapping preserves GPU device-number ordering, and should let users and developers easily explore RCCL performance on subsets of their system.
With the patch, users can set NCCL_RINGS using values that correspond to RCCL ranks.

Failure with ROCm 6.1:

(base) user@supergenoa5:~/bkitor$ ./run_rccl-test-4g.sh
	librccl.so.1 => /opt/rocm-6.1.0/lib/llvm/bin/../../../lib/librccl.so.1 (0x00007fca1aa00000)
NCCL_RINGS		=1 2 3 0 | 0 3 2 1 | 1 3 2 0 | 0 2 3 1 | 1 2 0 3 | 3 0 2 1
ROCR_VISIBLE_DEVICES	=4,5,6,7
/home/user/rccl-tests/build/all_reduce_perf -g 4 -c 1 -t 1
# nThread 1 nGpus 4 minBytes 33554432 maxBytes 33554432 step: 1048576(bytes) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
rccl-tests: Version develop:c5cae38
# Using devices
#   Rank  0 Pid  40875 on supergenoa5 device  0 [0000:ad:00.0] AMD Instinct MI300X
#   Rank  1 Pid  40875 on supergenoa5 device  1 [0000:b2:00.0] AMD Instinct MI300X
#   Rank  2 Pid  40875 on supergenoa5 device  2 [0000:b6:00.0] AMD Instinct MI300X
#   Rank  3 Pid  40875 on supergenoa5 device  3 [0000:ba:00.0] AMD Instinct MI300X
supergenoa5: Test NCCL failure /home/user/rccl-tests/build/hipify/common.cu.cpp:1262 'internal error - please report this issue to the NCCL developers / '
 .. supergenoa5 pid 40875: Test failure /home/user/rccl-tests/build/hipify/common.cu.cpp:1140

Expected results with the proposed patch:

	librccl.so.1 => /home/user/bkitor/rccl/build/lib/librccl.so.1 (0x00007f8046600000)
NCCL_RINGS		=1 2 3 0 | 0 3 2 1 | 1 3 2 0 | 0 2 3 1 | 1 2 0 3 | 3 0 2 1
ROCR_VISIBLE_DEVICES	=4,5,6,7
/home/user/rccl-tests/build/all_reduce_perf -g 4 -c 1 -t 1
# nThread 1 nGpus 4 minBytes 33554432 maxBytes 33554432 step: 1048576(bytes) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
rccl-tests: Version develop:c5cae38
# Using devices
#   Rank  0 Pid  40898 on supergenoa5 device  0 [0000:ad:00.0] AMD Instinct MI300X
#   Rank  1 Pid  40898 on supergenoa5 device  1 [0000:b2:00.0] AMD Instinct MI300X
#   Rank  2 Pid  40898 on supergenoa5 device  2 [0000:b6:00.0] AMD Instinct MI300X
#   Rank  3 Pid  40898 on supergenoa5 device  3 [0000:ba:00.0] AMD Instinct MI300X
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
    33554432       8388608     float     sum      -1    389.9   86.07  129.10      0    395.0   84.95  127.42      0
# Errors with asterisks indicate errors that have exceeded the maximum threshold.
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 128.26
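As a sanity check on the table, assuming the rccl-tests convention inherited from nccl-tests: algbw is the message size S divided by the time t (GB = 10^9 bytes), and for all-reduce busbw scales algbw by 2(n-1)/n, where n is the number of ranks (here n = 4). Using the out-of-place numbers:

$$
\begin{aligned}
\text{algbw} &= \frac{S}{t} = \frac{33554432\ \text{B}}{389.9\ \mu\text{s}} \approx 86.06\ \text{GB/s} \\
\text{busbw} &= \text{algbw}\cdot\frac{2(n-1)}{n} = 86.07 \cdot \frac{2 \cdot 3}{4} \approx 129.1\ \text{GB/s} \\
\text{avg busbw} &= \frac{129.10 + 127.42}{2} = 128.26\ \text{GB/s}
\end{aligned}
$$

The computed algbw differs from the reported 86.07 only because the displayed time is rounded.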
