fix offload gpu tests etc #10366

Merged: 2 commits merged into main from fix-max_memory on Jan 21, 2025
Conversation

@yiyixuxu (Collaborator) commented on Dec 24, 2024:

This PR:

  1. fixes the GPU offload tests
  2. refactors the Sana transformer so it works with device_map="auto" (see the sketch below)
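
For context, a minimal sketch of what item 2 enables. This example is not from the PR; the checkpoint id and dtype are assumptions for illustration.

```python
# Illustrative sketch: loading the Sana transformer with an automatic device map,
# which the refactor in this PR is meant to support.
import torch
from diffusers import SanaTransformer2DModel

transformer = SanaTransformer2DModel.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",  # assumed checkpoint id
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # let accelerate place submodules across available devices
)
```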

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu changed the title from "[WIP] fix offload gpu tests & a few device_map related refactor" to "[WIP] fix offload gpu tests etc" on Jan 13, 2025
```diff
@@ -1080,7 +1080,7 @@ def test_cpu_offload(self):
         torch.manual_seed(0)
         base_output = model(**inputs_dict)

-        model_size = compute_module_persistent_sizes(model)[""]
+        model_size = compute_module_sizes(model)[""]
```
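
For reference, a rough sketch of the test pattern around this hunk (not the exact diffusers test). The helper name `check_cpu_offload`, the `inputs_dict` fixture, and the 0.6 memory split are assumptions for illustration.

```python
# Rough sketch: an offload test computes the model size, caps GPU memory below it
# so accelerate is forced to place part of the model on CPU, then checks outputs match.
import tempfile

import torch
from accelerate.utils import compute_module_sizes


def check_cpu_offload(model, inputs_dict):
    torch.manual_seed(0)
    base_output = model(**inputs_dict)

    # Total parameter/buffer size in bytes for the whole model ("" is the root key).
    model_size = compute_module_sizes(model)[""]
    max_memory = {0: int(model_size * 0.6), "cpu": model_size * 2}

    with tempfile.TemporaryDirectory() as tmp_dir:
        model.save_pretrained(tmp_dir)
        offloaded = model.__class__.from_pretrained(
            tmp_dir, device_map="auto", max_memory=max_memory
        )

        torch.manual_seed(0)
        offloaded_output = offloaded(**inputs_dict)
        # Diffusers model outputs support integer indexing; [0] is the main tensor.
        assert torch.allclose(base_output[0], offloaded_output[0], atol=1e-5)
```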
@yiyixuxu changed the title from "[WIP] fix offload gpu tests etc" to "fix offload gpu tests etc" on Jan 13, 2025
@yiyixuxu requested review from DN6 and a-r-r-o-w on January 13, 2025
@a-r-r-o-w (Member) left a comment:

LGTM

```python
    self, hidden_states: torch.Tensor, temb: torch.Tensor, scale_shift_table: torch.Tensor
) -> torch.Tensor:
    hidden_states = self.norm(hidden_states)
    shift, scale = (scale_shift_table[None] + temb[:, None].to(scale_shift_table.device)).chunk(2, dim=1)
```
@a-r-r-o-w (Member) commented on the snippet above:
Not really a fan of this kind of device casting in forward, but okay to keep it since we don't have a better solution yet. These usually end up creating problems for anything that modifies device/dtype with hooks, and we then have to use workarounds.

Going forward, I think nn.Parameters can be put in their own dummy nn.Module so that device map, or other things we're introducing (like group offloading or fp8 layerwise upcasting), work out of the box, as those will handle the weight/type casting of inputs in overridden pre-hook methods. If this sounds good, I'll do future model integrations with this design.
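
For illustration, a minimal sketch of that idea with a hypothetical class name (not code from this PR): once the parameter lives in its own submodule, device-map and offloading hooks can move it and cast its inputs in pre-forward hooks, so no manual `.to(...)` is needed in the block's forward.

```python
# Hypothetical sketch of the suggested design: keep the scale/shift table inside
# its own tiny nn.Module so device_map / offloading hooks manage its placement.
import torch
import torch.nn as nn


class ScaleShiftTable(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # The parameter now lives in a real submodule instead of on the parent block.
        self.table = nn.Parameter(torch.randn(2, dim) / dim**0.5)

    def forward(self, temb: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # A pre-forward hook attached by accelerate can move/cast `temb` to match
        # `self.table` before this runs, removing the explicit device cast.
        shift, scale = (self.table[None] + temb[:, None]).chunk(2, dim=1)
        return shift, scale
```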

@yiyixuxu (Collaborator, Author) replied:

ohh, I actually did not think about this at all (I just copied it from the original code). Could you explain why we need this device casting here?

@a-r-r-o-w (Member) replied:

Ah okay, I see. I think I missed it when reviewing the PR that added Sana, otherwise I would probably have removed it then. I'm not really sure why it is needed here, and I think it might be okay to remove.

@DN6 merged commit a1f9a71 into main on Jan 21, 2025
15 checks passed
@yiyixuxu deleted the fix-max_memory branch on January 21, 2025