[QUESTION] The Reason for calling torch.cuda.synchronize() in func recv_from_prev_pipeline_rank_/send_to_next_pipeline_rank #1149
Unanswered · CCCCarpediem asked this question in Q&A · 0 replies
Why do we need to call torch.cuda.synchronize() to synchronize all streams in /megatron/core/inference/communication_utils.py?

The comment in the source describes it as "To protect against race condition when using batch_isend_irecv()", but event record / event wait operations are already inserted into the communication and compute streams to enforce ordering between them. Thanks.
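For reference, this is roughly the pattern the question is about: a batched point-to-point receive followed by a device-wide synchronization. This is a minimal sketch, not the exact Megatron-LM source; the prev_rank argument and the buffer setup are illustrative assumptions.

```python
import torch
import torch.distributed as dist

def recv_from_prev_pipeline_rank_(recv_buffer: torch.Tensor, prev_rank: int):
    # Post the receive as a batched P2P operation (assumes an
    # initialized process group and a preallocated recv_buffer).
    recv_op = dist.P2POp(dist.irecv, recv_buffer, prev_rank)
    reqs = dist.batch_isend_irecv([recv_op])
    for req in reqs:
        req.wait()
    # The call the question asks about: blocks the host until ALL
    # streams on the device are idle; the source comments it as
    # protecting against a race condition with batch_isend_irecv().
    torch.cuda.synchronize()
```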
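And this is the kind of event-based ordering the question argues should already make the full synchronization unnecessary: record an event on the communication stream and have the compute stream wait on it, so the streams are ordered on-device without blocking the host. The stream setup here is an illustrative assumption, not code from Megatron-LM.

```python
import torch

comm_stream = torch.cuda.Stream()          # hypothetical communication stream
compute_stream = torch.cuda.current_stream()
comm_done = torch.cuda.Event()

with torch.cuda.stream(comm_stream):
    # ... communication kernels would be enqueued here ...
    comm_done.record(comm_stream)

# Work enqueued on the compute stream after this point waits for the
# communication stream on-device; the host thread is never blocked,
# unlike with torch.cuda.synchronize().
compute_stream.wait_event(comm_done)
```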