
Enabled configurable auto Tensor Parallelism (TP) for the inference of diverse models #6553

Open · wants to merge 2 commits into master
Conversation

gyou2021

Auto TP in auto_tp.py needs to handle linear-type modules in emerging complex models. Three cases arise:

1. The output of some linear modules must go through an all-reduce after running on multiple HPU/GPU cards, yet the names of those modules may differ from the names tp_parser() currently recognizes.
2. The weights of some linear modules CANNOT be split across multiple HPU/GPU cards.
3. The weights of some linear modules should NOT be split across multiple HPU/GPU cards, because the all-gather needed afterward (to collect results from all cards) would degrade performance.

In case 1 the module's Linear type should be changed to DeepSpeed's LinearAllreduce type; in cases 2 and 3 the module should keep its Linear type. To handle these cases easily, this PR proposes configurable auto TP: tp_parser() adds the case-1 modules from a module-name list stored in the environment variable 'allReduceLinearItems', and _replace_module() adds the case-2 and case-3 modules from a list stored in the environment variable 'keepLinearItems'. Both lists are configurable, either set directly in the environment or through a configuration file.
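For concreteness, here is a minimal sketch of how these variables might be consumed. It is illustrative only, not the PR's actual diff; the helper names and the comma-separated format are assumptions.

```python
import os

# Illustrative sketch (not the PR's actual diff). Both variables are assumed
# to hold comma-separated module names, e.g. "dense_4h_to_h,out_proj".
all_reduce_linear_items = [name for name in os.environ.get('allReduceLinearItems', '').split(',') if name]
keep_linear_items = [name for name in os.environ.get('keepLinearItems', '').split(',') if name]

def needs_all_reduce(module_name: str) -> bool:
    # Case 1: tp_parser() would mark these modules so that Linear is
    # replaced by DeepSpeed's LinearAllreduce.
    return any(item in module_name for item in all_reduce_linear_items)

def keep_as_linear(module_name: str) -> bool:
    # Cases 2 and 3: _replace_module() would leave these modules as plain Linear.
    return any(item in module_name for item in keep_linear_items)
```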


delock commented Sep 19, 2024

Hi @gyou2021, I like the goal of avoiding repetition of the same logic from L296 to L315, but I'm also concerned that models enabled by those lines will no longer run out of the box with this PR. That may be unfriendly to self-helping users who lack access to proper BKC documentation for the various models.

Could allReduceLinearItems have an initial value as a built-in list, then prepend entries from os.environ for runtime configurability? If a model enabled through the environment is a public model, its entries should be contributed to the built-in list to provide an out-of-box (OOB) experience, right?
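A minimal sketch of that suggestion (the built-in entries shown are hypothetical, not DeepSpeed's actual defaults):

```python
import os

# Hypothetical built-in defaults so public models work out of the box.
DEFAULT_ALL_REDUCE_LINEAR_ITEMS = ['dense_4h_to_h', 'out_proj']

env_items = [name for name in os.environ.get('allReduceLinearItems', '').split(',') if name]
# Prepend runtime entries from the environment; built-ins remain as a fallback.
allReduceLinearItems = env_items + DEFAULT_ALL_REDUCE_LINEAR_ITEMS
```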

```diff
 for module in module_list:
     for key, submodule in module._modules.items():
         if isinstance(submodule, nn.Linear):
             layer_list = layer_list + ["." + key]
-        elif isinstance(submodule, nn.LayerNorm) or key == 'LayerNorm' or key == 'layer_norm':
+        elif isinstance(submodule, nn.LayerNorm) or key in norm_layer_name_list:
```
A contributor commented:

Can this be replaced equivalently?

```python
norm_layer_name_list = ['LayerNorm', 'layer_norm', 'ln_1', 'ln_2']  # ln_1, ln_2 for Qwen
```
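For illustration, a small helper built on that list (the helper name is hypothetical, not part of the PR):

```python
import torch.nn as nn

# Consolidated list: one place to register norm-layer names for new models.
norm_layer_name_list = ['LayerNorm', 'layer_norm', 'ln_1', 'ln_2']  # ln_1, ln_2 for Qwen

def is_norm_layer(key: str, submodule: nn.Module) -> bool:
    # Equivalent to chaining key == 'LayerNorm' or key == 'layer_norm' or ...,
    # but supporting a new model only requires appending to the list.
    return isinstance(submodule, nn.LayerNorm) or key in norm_layer_name_list
```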

```python
allReduceLinearItems = os.environ['allReduceLinearItems']
```
A collaborator commented:

Could the environment variable keep a DeepSpeed prefix, i.e. DS_ALL_REDUCE_LINEAR_ITEMS? This would make the naming more consistent with existing DeepSpeed knobs.
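A minimal sketch of the renamed lookup (DS_ALL_REDUCE_LINEAR_ITEMS is the reviewer's proposal; the empty-string default is an assumption, not merged behavior):

```python
import os

# Hypothetical renamed knob following the DS_* convention; the empty-string
# default lets models run out of the box when the variable is unset.
raw_items = os.environ.get('DS_ALL_REDUCE_LINEAR_ITEMS', '')
all_reduce_linear_items = [name for name in raw_items.split(',') if name]
```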

@gyou2021 the command you issued was incorrect. Please try again.

Examples are:

@microsoft-github-policy-service agree

and

@microsoft-github-policy-service agree company="your company"

