Fix issues for MCore DDP. #1474
Conversation
/te-ci pytorch
@Victarry Just to confirm, MCore now requires param.grad to be allocated?
MCore always requires param.grad to be allocated when gradient_accumulation_fusion=True, but TE 2.0 changed the return value from an empty tensor to None.
I'm not familiar with the distributed optimizer. Maybe @deepakn94 can provide some comments?
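For illustration, here is a minimal hypothetical sketch of the difference being discussed. The function name and structure are invented for this example and are not TE's actual backward code; the only point is the tensor-vs-None return value.

```python
from typing import Optional

import torch

def wgrad_to_return(weight: torch.nn.Parameter,
                    fuse_wgrad_accumulation: bool,
                    pre_2_0_behavior: bool) -> Optional[torch.Tensor]:
    """Hypothetical sketch, not TE's actual backward code."""
    if not fuse_wgrad_accumulation:
        raise NotImplementedError("real wgrad computation omitted from this sketch")
    # With fusion, the weight gradient has already been accumulated into
    # weight.main_grad by the fused GEMM, so the returned value is only a placeholder.
    if pre_2_0_behavior:
        # Returning a dummy tensor keeps param.grad allocated, which MCore relies on.
        return torch.empty_like(weight)
    # TE 2.0 behavior: returning None leaves param.grad as None and breaks MCore
    # when gradient_accumulation_fusion=True.
    return None
```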
/te-ci pytorch
I found that the above change will cause the UT with CPU offloading to fail, and the reasons are as follows (see TransformerEngine/transformer_engine/pytorch/module/linear.py, lines 268 to 273 at f0d22ca).
In the original version of the code,
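For context on how this can interact with CPU offloading, here is a generic sketch using plain PyTorch saved-tensor hooks. This is not TE's offloading implementation; it only shows that offload hooks receive exactly the objects saved for backward, so saving the parameter itself versus tensor.data changes what they see.

```python
import torch

def pack_to_cpu(tensor: torch.Tensor):
    # Hypothetical offload hook: move the saved tensor to CPU. It receives exactly
    # the object the autograd graph saved, so saving `param` versus `param.data`
    # changes what arrives here.
    return tensor.to("cpu", non_blocking=True)

def unpack_from_cpu(tensor: torch.Tensor):
    # Bring the saved tensor back when backward needs it.
    return tensor.to("cuda") if torch.cuda.is_available() else tensor

x = torch.randn(8, 8, requires_grad=True)
w = torch.nn.Parameter(torch.randn(8, 8))
with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
    y = (x @ w).sum()
y.backward()
```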
To keep the fix MR simple and get MCore working as soon as possible, I added
Yes, the distributed optimizer also has this requirement for the same reason.
Yup, exactly. MCore has had this requirement for the last year plus. Changing the return value to None is a breaking change for us.
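A hedged sketch of why this matters for DDP and the distributed optimizer. This is not MCore's actual implementation, just the pattern that assumes param.grad is allocated after backward.

```python
import torch

def accumulate_into_main_grad(params):
    # Illustrative only, not MCore's actual code: the gradient-handling path
    # assumes param.grad is an allocated tensor after backward.
    for param in params:
        if getattr(param, "grad_added_to_main_grad", False):
            # The fused wgrad kernel already wrote the gradient into param.main_grad.
            continue
        # If the framework returned None instead of an (empty) tensor for the
        # weight gradient, param.grad is None here and this call fails.
        param.main_grad.add_(param.grad)
```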
/te-ci pytorch
/te-ci pytorch
@timmoon10 @ksivaman could I have your approval for this bug fix? The UT failures seem to be caused by unrelated bugs. Thanks!
/te-ci pytorch
LGTM
Description
This PR fixes MCore DDP compatibility: MCore requires param.grad to be allocated when gradient_accumulation_fusion=True, but TE 2.0 changed the returned weight gradient from an empty tensor to None, which breaks MCore's DDP and distributed optimizer.
Fixes # (issue)
Type of change
Changes
Please list the changes introduced in this PR:
- Changed prepare_for_saving from tensor_list.append(tensor.data) to tensor_list.append(tensor), since appending tensor.data removes param attributes like grad_added_to_main_grad (see the sketch below).
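A simplified sketch of the change described above; the surrounding function is abbreviated and is not the exact TE code. It also demonstrates why tensor.data loses the attribute.

```python
import torch

def prepare_for_saving(tensors):
    # Simplified sketch of the change, not the exact TE code.
    tensor_list = []
    for tensor in tensors:
        # Before: tensor_list.append(tensor.data) -- .data creates a new tensor
        # object, dropping attributes such as grad_added_to_main_grad.
        # After: append the tensor itself so those attributes survive.
        tensor_list.append(tensor)
    return tensor_list

# Illustration of the attribute loss:
param = torch.nn.Parameter(torch.zeros(4))
param.grad_added_to_main_grad = True
print(hasattr(param.data, "grad_added_to_main_grad"))  # False: new tensor object
print(hasattr(param, "grad_added_to_main_grad"))        # True: same object
```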
Checklist: