[Bug]Fix init issue for layer_norm in sequence_parallel for non-CUDA device. #450

Open · wants to merge 1 commit into main
Conversation

ys950902

Sequence-parallel support was added to the layernorm in Megatron-DeepSpeed, but in the current implementation non-CUDA devices use `from torch.nn import LayerNorm`, and `torch.nn.LayerNorm` has no attribute named `sequence_parallel`. This causes an init error on non-CUDA devices.

This PR fixes that issue.
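A minimal sketch of the failure mode described above, assuming the non-CUDA path constructs the layer with a `sequence_parallel` keyword argument (the exact call site is not shown in this PR):

```python
import torch
from torch.nn import LayerNorm

# torch.nn.LayerNorm does not know about `sequence_parallel`, so the
# non-CUDA code path fails at construction time:
try:
    ln = LayerNorm(1024, eps=1e-5, sequence_parallel=True)
except TypeError as err:
    print(err)  # unexpected keyword argument 'sequence_parallel'
```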

@ys950902 ys950902 changed the title Fix init issue for layer_norm in sequence_parallel for non-CUDA device. [Bug]Fix init issue for layer_norm in sequence_parallel for non-CUDA device. Sep 29, 2024
@ys950902 (Author) commented Sep 29, 2024

Layernorm was added in #429. Hi @polisettyvarma, could you please also take a look at this PR and confirm that the layernorm change is okay with you? Many thanks!

@polisettyvarma

@ys950902 Will this feature work correctly? Have you done any accuracy checks for it?

@@ -13,7 +13,7 @@
     from .fused_rmsnorm import RMSNorm
 else:
     from .rmsnorm import RMSNorm
-    from torch.nn import LayerNorm
+    from .layernorm import LayerNorm
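The contents of the new `.layernorm` module are not part of this hunk; a plausible minimal sketch, assuming it simply wraps `torch.nn.LayerNorm` and records the `sequence_parallel` flag that the Megatron code expects (names and details below are assumptions, not the actual file):

```python
import torch

class LayerNorm(torch.nn.LayerNorm):
    """torch.nn.LayerNorm that additionally accepts Megatron's `sequence_parallel` flag."""

    def __init__(self, normalized_shape, eps=1e-5, sequence_parallel=False, **kwargs):
        super().__init__(normalized_shape, eps=eps, **kwargs)
        self.sequence_parallel = sequence_parallel
        # Mark the affine parameters (when present) so Megatron can all-reduce
        # their gradients across the sequence-parallel group.
        if self.elementwise_affine:
            setattr(self.weight, 'sequence_parallel', sequence_parallel)
            if self.bias is not None:
                setattr(self.bias, 'sequence_parallel', sequence_parallel)
```

Because `forward()` is inherited unchanged, such a wrapper would behave identically to `torch.nn.LayerNorm` in the non-sequence-parallel case.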
@tjruwase commented Sep 30, 2024

Can you please share the failure that is being fixed? I have two concerns about this change:

  1. It is quite subtle, since it does not show the connection to sequence-parallelism.
  2. It is unclear to me that the new LayerNorm is equivalent to torch.nn.LayerNorm for the non-sequence-parallel case. Maintaining parity with torch.nn.LayerNorm imposes an extra development burden.

So, I would like to further understand the problem and explore alternative solutions. Thanks!
