Break down parallelize_llama for inference cases #402
Conversation
ghstack-source-id: d8a32ad293ce8f1fafa141e3bbfa06654db75910 Pull Request resolved: #402
Thanks for the change -- making them more modular is great!!
I had some comments -- basically I think we should make each sub-function more modular, passing in only the related arguments and configs, and leaving experimental configs and interacting flags in `parallelize_llama`.
Breaking up `parallelize_llama` into:
- `apply_tp`
- `apply_ac`
- `apply_compile`
- `apply_dp`

This is for functionality reuse in inference cases, because one would not need activation checkpointing or DP there. Can also improve code modularity and readability.
ghstack-source-id: 9aeee4c063c63eed380cac219c9c8e1eb4169f9d Pull Request resolved: #402
ghstack-source-id: 72e37e2e506af6115f9cb18179543dd6df602961 Pull Request resolved: #402
LGTM. Thanks for helping make it modularized!
```python
if job_config.model.norm_type == "fused_rmsnorm":
    raise NotImplementedError(
        "fused_rmsnorm not yet compatible with TP. Please use layernorm or rmsnorm."
    )
```
This can be removed thanks to #404.
Yeah, noticed in CI. Removed.
ghstack-source-id: fc8e221b5047337f59dea31f2c51d6168fe4fe88 Pull Request resolved: #402
Stack from ghstack (oldest at bottom):

Breaking up `parallelize_llama` into:
- `apply_tp`
- `apply_ac`
- `apply_compile`
- `apply_dp`

This is for functionality reuse in inference cases, because one would not need activation checkpointing or DP there. Can also improve code modularity and readability.
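The composition described above can be sketched as follows. The helper names (`apply_tp`, `apply_ac`, `apply_compile`, `apply_dp`) come from this PR; the function bodies, signatures, and the `training` flag below are illustrative placeholders, not torchtitan's actual implementation.

```python
# Hypothetical sketch of the modular structure this PR introduces.
# Each apply_* pass is a stub that records its application; in the real
# code these would transform the model with DTensor/TP, activation
# checkpointing, torch.compile, and FSDP respectively.

def apply_tp(model, config):
    # Tensor parallelism: needed for both training and inference.
    model.setdefault("applied", []).append("tp")
    return model

def apply_ac(model, config):
    # Activation checkpointing: a training-only memory optimization.
    model.setdefault("applied", []).append("ac")
    return model

def apply_compile(model, config):
    # Compilation of transformer blocks (e.g. torch.compile).
    model.setdefault("applied", []).append("compile")
    return model

def apply_dp(model, config):
    # Data parallelism (e.g. FSDP): training-only.
    model.setdefault("applied", []).append("dp")
    return model

def parallelize_llama(model, config, training=True):
    # Training composes all four passes; inference reuses only the
    # passes it needs, skipping activation checkpointing and DP.
    model = apply_tp(model, config)
    if training:
        model = apply_ac(model, config)
    model = apply_compile(model, config)
    if training:
        model = apply_dp(model, config)
    return model

train_model = parallelize_llama({}, {}, training=True)
infer_model = parallelize_llama({}, {}, training=False)
print(train_model["applied"])  # ['tp', 'ac', 'compile', 'dp']
print(infer_model["applied"])  # ['tp', 'compile']
```

With this split, an inference entry point can call `apply_tp` and `apply_compile` directly instead of going through the monolithic training path.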