[FSDP2] Allowed `List[nn.Module]` as arg (#127786)
This PR allows `fully_shard`'s first argument to be `List[nn.Module]` instead of strictly `nn.Module`. This allows more flexible grouping of modules/parameters for communication, which can lead to memory savings and/or more efficient communication.

**Approach**
At a high level, we can think of a model as a tree of modules. Previously, we could only select specific module nodes in this tree as representing one FSDP parameter group. With this PR, we can select a group of module nodes, effectively becoming a single super node.

To implement the runtime schedule, we define new forward hooks that run based on the following semantics:
- If a module is the first to run the pre-hook, actually run the given pre-hook. Otherwise, the pre-hook is a no-op.
- If a module is the last to run the post-hook, actually run the given post-hook. Otherwise, the post-hook is a no-op.
- First and last are determined by scoreboarding against a set of the modules.
- This set must get cleared at the end of backward in the case that >=1 module in the list is never used, in which case we still want the forward hooks to run in the next forward after this backward.

Beyond these new forward hooks, everything else is some simple generalization from `Module` to `List[Module]` or `Tuple[Module, ...]`.

**Examples**
This PR enables wrapping Llama models more efficiently by grouping the final norm and output linear together: pytorch/torchtitan#382.

If at least one of the modules in the list does not run forward before backward, then there will be a warning message like:
```
1 of the 2 modules passed to fully_shard did not run forward before backward, which is error-prone since FSDP post-forward/pre-backward logic will not run for these modules. We recommend passing only modules that run forward together.
Modules that did not run forward: [FSDPLinear(in_features=1, out_features=1, bias=True)]
```

Pull Request resolved: #127786
Approved by: https://github.com/yf225, https://github.com/weifengpy
ghstack dependencies: #127773
ghstack-source-id: c70870546bed6108b163062b88415cbba3d37925
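
**Sketch of the hook semantics**
The first/last scoreboarding described under **Approach** can be illustrated with a small pure-Python sketch. This is not the code from this PR: `GroupHookScoreboard` and its method names are hypothetical, plain strings stand in for `nn.Module` instances, and the zero-argument hooks stand in for the real FSDP pre-forward/post-forward logic.

```python
from typing import Callable, Hashable, Iterable, List, Set


class GroupHookScoreboard:
    """Hypothetical helper: run a group's pre-hook once and post-hook once per forward."""

    def __init__(
        self,
        modules: Iterable[Hashable],
        pre_hook: Callable[[], None],
        post_hook: Callable[[], None],
    ) -> None:
        self._modules: Set[Hashable] = set(modules)
        self._pre_hook = pre_hook
        self._post_hook = post_hook
        # Scoreboard: modules in the group whose post-forward is still pending.
        self._to_run: Set[Hashable] = set()

    def pre_forward(self, module: Hashable) -> None:
        # An empty scoreboard means this module is the first of the group to run.
        if not self._to_run:
            self._to_run.update(self._modules)
            self._pre_hook()
        # Otherwise the pre-hook is a no-op.

    def post_forward(self, module: Hashable) -> None:
        self._to_run.discard(module)
        # The module that empties the scoreboard is the last of the group to run.
        if not self._to_run:
            self._post_hook()
        # Otherwise the post-hook is a no-op.

    def end_of_backward(self) -> List[Hashable]:
        # Report modules that never ran forward, then clear the scoreboard so
        # the next forward still triggers the pre-hook.
        missing = sorted(self._to_run, key=repr)
        self._to_run.clear()
        return missing


calls: List[str] = []
board = GroupHookScoreboard(
    ["norm", "output"],
    pre_hook=lambda: calls.append("pre"),
    post_hook=lambda: calls.append("post"),
)

# Iteration 1: both modules run forward, so each hook fires exactly once.
for name in ("norm", "output"):
    board.pre_forward(name)
    board.post_forward(name)
missing_after_full = board.end_of_backward()

# Iteration 2: "output" never runs forward, so the post-hook never fires.
board.pre_forward("norm")
board.post_forward("norm")
missing_after_partial = board.end_of_backward()

# Iteration 3: because the scoreboard was cleared, the pre-hook fires again.
for name in ("norm", "output"):
    board.pre_forward(name)
    board.post_forward(name)
```

In this sketch the second iteration leaves `"output"` on the scoreboard, which corresponds to the situation where the warning above is emitted. Clearing the set at the end of backward is what lets the third iteration run the pre-hook again.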
Showing 8 changed files with 402 additions and 72 deletions.