
Break down parallelize_llama for inference cases #402

Merged
kwen2501 merged 4 commits into gh/kwen2501/6/base from gh/kwen2501/6/head Jun 14, 2024

Conversation

kwen2501 (Contributor) commented Jun 14, 2024

Stack from ghstack (oldest at bottom):

Breaking up `parallelize_llama` into:

- `apply_tp`
- `apply_ac`
- `apply_compile`
- `apply_dp`

This enables functionality reuse in inference cases, where one does not need activation checkpointing or data parallelism.

It also improves code modularity and readability.
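
As a rough sketch of the resulting structure (the `apply_*` names come from this PR, but the signatures, config fields, and stub bodies below are illustrative assumptions, not torchtitan's actual API):

```python
# Hypothetical sketch of the split; only the apply_* names are from the PR.
import torch.nn as nn

def apply_tp(model: nn.Module, tp_mesh, job_config) -> nn.Module:
    """Apply tensor parallelism to the model (stub)."""
    return model

def apply_ac(model: nn.Module, job_config) -> nn.Module:
    """Apply activation checkpointing (stub); training-only."""
    return model

def apply_compile(model: nn.Module) -> nn.Module:
    """Apply torch.compile to each transformer block (stub)."""
    return model

def apply_dp(model: nn.Module, dp_mesh, job_config) -> nn.Module:
    """Apply data parallelism, e.g. FSDP (stub)."""
    return model

def parallelize_llama(model, world_mesh, parallel_dims, job_config):
    # Training composes all four; inference callers can reuse apply_tp
    # alone and skip activation checkpointing and data parallelism.
    if parallel_dims.tp_enabled:
        model = apply_tp(model, world_mesh["tp"], job_config)
    if job_config.activation_checkpoint.mode != "none":
        model = apply_ac(model, job_config)
    if job_config.training.compile:
        model = apply_compile(model)
    if parallel_dims.dp_enabled:
        model = apply_dp(model, world_mesh["dp"], job_config)
    return model
```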

kwen2501 added a commit that referenced this pull request Jun 14, 2024
ghstack-source-id: d8a32ad293ce8f1fafa141e3bbfa06654db75910
Pull Request resolved: #402

facebook-github-bot added the CLA Signed label Jun 14, 2024

tianyu-l (Contributor) left a comment

Thanks for the change -- making them more modular is great! I had some comments: basically, I think we should make each sub-function more modular and pass in only the related arguments and configs, leaving experimental configs and interacting flags in parallelize_llama.
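
To illustrate the suggestion (a hypothetical example, not code from the PR or review): rather than threading the full `job_config` through every helper, each helper would receive only the slice it acts on:

```python
# Hypothetical illustration of the review suggestion; names and fields are made up.
def apply_ac(model, ac_config):
    """Takes only the activation-checkpointing settings, not the whole job_config."""
    for layer in model.layers:
        ...  # wrap each layer according to ac_config.mode
    return model

# parallelize_llama keeps experimental configs and interacting flags,
# passing down just the relevant section:
#   model = apply_ac(model, job_config.activation_checkpoint)
```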

(10 review threads on torchtitan/parallelisms/parallelize_llama.py, all resolved)
kwen2501 changed the title from "Break up parallelize_llama for inference cases" to "Break down parallelize_llama for inference cases" Jun 14, 2024
kwen2501 added a commit that referenced this pull request Jun 14, 2024
ghstack-source-id: 9aeee4c063c63eed380cac219c9c8e1eb4169f9d
Pull Request resolved: #402

kwen2501 added a commit that referenced this pull request Jun 14, 2024
ghstack-source-id: 72e37e2e506af6115f9cb18179543dd6df602961
Pull Request resolved: #402
tianyu-l (Contributor) left a comment

LGTM. Thanks for helping make it modularized!

tianyu-l (Contributor) commented on lines 303 to 305:

```python
if job_config.model.norm_type == "fused_rmsnorm":
    raise NotImplementedError(
        "fused_rmsnorm not yet compatible with TP. Please use layernorm or rmsnorm."
    )
```

this can be removed thanks to #404

kwen2501 (Contributor, Author) replied:

Yeah, noticed in CI. Removed.

kwen2501 added a commit that referenced this pull request Jun 14, 2024
ghstack-source-id: fc8e221b5047337f59dea31f2c51d6168fe4fe88
Pull Request resolved: #402
kwen2501 merged commit 042a00c into gh/kwen2501/6/base Jun 14, 2024. 5 checks passed.
kwen2501 deleted the gh/kwen2501/6/head branch June 14, 2024 20:44
tianyu-l pushed a commit to tianyu-l/torchtitan_intern24 that referenced this pull request Aug 16, 2024
ghstack-source-id: fc8e221b5047337f59dea31f2c51d6168fe4fe88
Pull Request resolved: pytorch#402
philippguevorguian pushed a commit to YerevaNN/YNNtitan that referenced this pull request Aug 17, 2024
ghstack-source-id: fc8e221b5047337f59dea31f2c51d6168fe4fe88
Pull Request resolved: pytorch#402