When using Distributed Data Parallel (DDP) with two AMD GPUs communicating via ROCm-aware MPI, `AMDGPU.synchronize()` is necessary at several steps; otherwise the state of the optimizer is inconsistent or the averaged gradients are wrong.

This is a follow-up from this discussion:
https://discourse.julialang.org/t/distributed-data-parallel-training-with-2-gpus-fails-with-flux-jl-on-amd-gpus/125993/6
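For context, the pattern involved is roughly the following (a minimal sketch, not the actual script from the Discourse thread; the model, the toy data, and the optimiser are placeholders, and it assumes Flux's `gpu` is backed by AMDGPU): each rank computes local gradients, the gradients are averaged with `MPI.Allreduce!`, and `AMDGPU.synchronize()` is placed around the MPI calls so the GPU has finished writing the buffers MPI is about to read.

```julia
# Sketch of one DDP training step with Flux.jl, AMDGPU.jl and ROCm-aware MPI.jl.
# All names below are placeholders, not taken from the issue's script.
using MPI, AMDGPU, Flux

MPI.Init()
comm   = MPI.COMM_WORLD
nranks = MPI.Comm_size(comm)

model     = Dense(4 => 1) |> gpu                # toy model, replicated on every rank
opt_state = Flux.setup(Descent(0.01f0), model)

x = AMDGPU.rand(Float32, 4, 8)                  # this rank's mini-batch (placeholder data)
y = AMDGPU.rand(Float32, 1, 8)

grads = Flux.gradient(m -> Flux.mse(m(x), y), model)[1]

AMDGPU.synchronize()                            # gradient kernels must finish before MPI reads the buffers
for g in (grads.weight, grads.bias)             # for a real model, loop over every gradient array
    MPI.Allreduce!(g, +, comm)                  # sum gradients across ranks (GPU-aware MPI)
    g ./= nranks                                # average
end

AMDGPU.synchronize()                            # averaged gradients fully written before the optimizer uses them
Flux.update!(opt_state, model, grads)
```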
The serial code (using `SERIAL=true`) works as expected.

The output of this program without `AMDGPU.synchronize()` is:

My environment:
Just using MPI and AMDGPU, we can see that without `AMDGPU.synchronize()` the sent message is wrong in this example: rank zero gets the correct message in only 2 out of 20 tries. With `AMDGPU.synchronize()` all received messages are correct.

Thanks to @pxl-th for suggesting that this is a synchronization issue.