[QUESTION] The Reason for calling torch.cuda.synchronize() in func recv_from_prev_pipeline_rank_/send_to_next_pipeline_rank #1149
Unanswered · CCCCarpediem asked this question in Q&A · 0 replies
Why do we need to call torch.cuda.synchronize() to synchronize all streams in /megatron/core/inference/communication_utils.py?

The comment in the source describes it as "To protect against race condition when using batch_isend_irecv()", but event record / event wait operations are already inserted into the communication and compute streams to enforce ordering between them. Thanks.
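For reference, this is roughly the pattern the question is about: a batched point-to-point receive followed by a device-wide synchronization. This is a minimal sketch, not the exact Megatron-LM source; the prev_rank argument and the buffer setup are illustrative assumptions.

```python
import torch
import torch.distributed as dist

def recv_from_prev_pipeline_rank_(recv_buffer: torch.Tensor, prev_rank: int):
    # Post the receive as a batched P2P operation (assumes an
    # initialized process group and a preallocated recv_buffer).
    recv_op = dist.P2POp(dist.irecv, recv_buffer, prev_rank)
    reqs = dist.batch_isend_irecv([recv_op])
    for req in reqs:
        req.wait()
    # The call the question asks about: blocks the host until ALL
    # streams on the device are idle; the source comments it as
    # protecting against a race condition with batch_isend_irecv().
    torch.cuda.synchronize()
```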
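And this is the kind of event-based ordering the question argues should already make the full synchronization unnecessary: record an event on the communication stream and have the compute stream wait on it, so the streams are ordered on-device without blocking the host. The stream setup here is an illustrative assumption, not code from Megatron-LM.

```python
import torch

comm_stream = torch.cuda.Stream()          # hypothetical communication stream
compute_stream = torch.cuda.current_stream()
comm_done = torch.cuda.Event()

with torch.cuda.stream(comm_stream):
    # ... communication kernels would be enqueued here ...
    comm_done.record(comm_stream)

# Work enqueued on the compute stream after this point waits for the
# communication stream on-device; the host thread is never blocked,
# unlike with torch.cuda.synchronize().
compute_stream.wait_event(comm_done)
```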