NPU Adaption for Sanna #10409

Open

leisuzz wants to merge 18 commits into main

leisuzz (Contributor) commented Dec 30, 2024

What does this PR do?

Adapts DreamBooth LoRA training for Sana to run on NPUs.

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

leisuzz (Contributor, Author) commented Dec 30, 2024

@sayakpaul Please take a look at this PR, which makes Sana suitable for NPU training. Thank you so much!

@@ -979,10 +982,10 @@ def main(args):
)

# VAE should always be kept in fp32 for SANA (?)
- vae.to(dtype=torch.float32)
+ vae.to(accelerator.device, dtype=torch.float32)
Member:

This is not needed, as we conditionally move the VAE on and off the accelerator device.
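The reviewer's point is that the VAE's dtype is fixed once at load time, while its device placement is handled separately around the encoding step. A minimal sketch of that pattern, assuming an object with a torch-style in-place .to(device) method such as a torch.nn.Module; the helper name temporarily_on is hypothetical and is not taken from the training script:

```python
from contextlib import contextmanager


@contextmanager
def temporarily_on(module, device, offload_device="cpu"):
    """Move `module` to `device` for the duration of the block, then offload it.

    Hypothetical helper. Assumes `module.to(device)` moves the object in place,
    as torch.nn.Module.to does. Only the device changes, so a VAE cast to
    float32 once at load time stays float32.
    """
    module.to(device)
    try:
        yield module
    finally:
        # Offload even if the block raised, so accelerator memory is released.
        module.to(offload_device)
```

In a training script this would wrap only the latent-encoding step (for example, with temporarily_on(vae, accelerator.device): ...), so the fp32 VAE does not occupy accelerator memory for the rest of training.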

transformer.to(accelerator.device, dtype=weight_dtype)
# because Gemma2 is particularly suited for bfloat16.
- text_encoder.to(dtype=torch.bfloat16)
+ text_encoder.to(accelerator.device, dtype=torch.bfloat16)
sayakpaul (Member) left a comment

Training-related changes look straightforward to me. Could we also add a note about this support in the README?

I will leave the model-related changes for @yiyixuxu to review.

sayakpaul requested a review from yiyixuxu on December 30, 2024 04:04

leisuzz (Contributor, Author) commented Dec 30, 2024

@sayakpaul Thanks for your help! You can definitely add this support in the README! By the way, I think once we've done the SD3 LoRA function, we should change the Accelerate loading and saving functions, at least in the training scripts for DreamBooth LoRA.

sayakpaul (Member) commented

> You can definitely add this support in the README!

Oh, I was wondering if you could just add a note about the NPU support in the README directly.

> By the way, I think once we've done the SD3 LoRA function, we should change the Accelerate loading and saving functions, at least in the training scripts for DreamBooth LoRA.

This is not relevant to this PR, so we can ignore it.

leisuzz (Contributor, Author) commented Dec 30, 2024

@sayakpaul I'm not sure what the process is, as neither Flux nor SD has such notes. By the way, training should automatically run on the NPU if one is available.
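Accelerate selects the device on its own, which is why the training script needs no NPU-specific branch. The function below is a hypothetical illustration of such a preference order (NPU, then CUDA, then CPU), not Accelerate's actual implementation; availability is passed in as flags so it runs without torch or torch_npu installed:

```python
def resolve_device(npu_available: bool, cuda_available: bool, index: int = 0) -> str:
    """Hypothetical device resolver: prefer an NPU, then CUDA, else fall back to CPU."""
    if npu_available:
        return f"npu:{index}"
    if cuda_available:
        return f"cuda:{index}"
    # No accelerator found: CPU has no device index in this sketch.
    return "cpu"
```

In practice the flags could come from diffusers.utils.is_torch_npu_available() and torch.cuda.is_available(), but in an Accelerate-based script accelerator.device already holds the chosen device.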

sayakpaul (Member) commented

Okay we can leave it out of this PR then and open a future PR to add that note.

leisuzz (Contributor, Author) commented Dec 30, 2024

Sure, thanks for your help!

leisuzz requested a review from sayakpaul on January 3, 2025 01:45

sayakpaul (Member) left a comment

Changes in the training script look good to me. Off to @yiyixuxu for the other changes.

leisuzz (Contributor, Author) commented Jan 6, 2025

@yiyixuxu Please take a look. Thank you!

@@ -119,6 +120,12 @@ def __init__(
# 2. Cross Attention
if cross_attention_dim is not None:
self.norm2 = nn.LayerNorm(dim, elementwise_affine=norm_elementwise_affine, eps=norm_eps)

if is_torch_npu_available():
Collaborator:

Same comment as in the other PR: let's not update the default attention processor logic for now; we can set it manually.

Contributor (Author):

> Same comment as in the other PR: let's not update the default attention processor logic for now; we can set it manually.

I've pushed an update; please take a look. This version simply sets up NPU flash attention (FA) directly.

Contributor (Author):

I will let you know when the full test is complete.

Contributor (Author):

@yiyixuxu This still requires modifying the sana_transformer file, so I think checking in __init__ is the best option for now.

leisuzz (Contributor, Author) commented Jan 8, 2025

@yiyixuxu Please take a look at this modification. If it is fine, I will update the FLUX one as well once this has been merged. Thank you!

@@ -294,6 +294,10 @@ def __init__(
processor = (
AttnProcessor2_0() if hasattr(F, "scaled_dot_product_attention") and self.scale_qk else AttnProcessor()
)

Collaborator:

Umm, I don't think we should change the default attention processor here. Let's keep this logic in SANA :)
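The hunk above picks AttnProcessor2_0 when scaled_dot_product_attention is available and scale_qk is set, and AttnProcessor otherwise. The sketch below restates that rule as a pure function and adds a prefer_npu switch standing in for the proposed global NPU default, which the reviewer asked to keep out of this shared code. The exact NPU condition in the PR is not visible here, so that branch is an assumption; processor classes are represented by name only:

```python
def default_processor_name(
    has_sdpa: bool,
    scale_qk: bool,
    npu_available: bool = False,
    prefer_npu: bool = False,
) -> str:
    """Name of the processor an attention layer would get by default.

    has_sdpa mirrors hasattr(F, "scaled_dot_product_attention"). prefer_npu is a
    hypothetical switch modelling the proposed NPU default (reverted after review).
    """
    # Proposed branch: use the NPU processor whenever an NPU is present.
    if prefer_npu and npu_available:
        return "AttnProcessorNPU"
    # Existing rule from the hunk: SDPA-based processor if usable, else the classic one.
    if has_sdpa and scale_qk:
        return "AttnProcessor2_0"
    return "AttnProcessor"
```

With prefer_npu left False, an NPU machine keeps AttnProcessor2_0, which is the behavior the reviewer wants to preserve for every model.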

leisuzz (Contributor, Author) commented Jan 14, 2025

@yiyixuxu I've changed the logic back in SANA, thanks for your help!

@@ -119,6 +120,13 @@ def __init__(
# 2. Cross Attention
if cross_attention_dim is not None:
self.norm2 = nn.LayerNorm(dim, elementwise_affine=norm_elementwise_affine, eps=norm_eps)

Collaborator:

@lawrence-cj let me know if it's ok with you to default to NPU attention when it's available:)

Contributor:

Oh, I'm not familiar with NPU training and inference. Is this NPU device very popular in the diffusers community?

yiyixuxu (Collaborator) left a comment

Thanks! Let's wait for the SANA author to see whether they like the default processor change; it's OK with me otherwise.

yiyixuxu (Collaborator) commented

Actually, I think it's not great to have this behavior (automatically using NPU attention when it is available) only for Sana.

Maybe let's not do this unless we want to change the default for everything? You can still explicitly set the NPU attention processor, no?
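Explicitly setting the processor, as suggested, can be limited to the cross-attention layers so that the other attention layers keep their processors. A minimal sketch, assuming the name-to-processor mapping returned by a diffusers model's attn_processors property, with keys such as transformer_blocks.0.attn2.processor; the helper replace_cross_attn_processors is hypothetical and works on plain dicts:

```python
from typing import Callable, Dict, TypeVar

P = TypeVar("P")


def replace_cross_attn_processors(
    processors: Dict[str, P],
    make_processor: Callable[[], P],
    marker: str = ".attn2.",
) -> Dict[str, P]:
    """Return a new dict in which cross-attention processors are replaced.

    Only keys containing `marker` (assumed to identify cross-attention, e.g.
    "transformer_blocks.0.attn2.processor") get a fresh processor from
    `make_processor`; every other entry, such as Sana's linear self-attention
    (assumed to live in attn1), is kept unchanged. The input dict is not mutated,
    and the result has the same keys, one entry per attention layer.
    """
    return {
        name: make_processor() if marker in name else proc
        for name, proc in processors.items()
    }
```

On an NPU, the result could then be applied with something like transformer.set_attn_processor(replace_cross_attn_processors(transformer.attn_processors, AttnProcessorNPU)), leaving the default untouched on other devices. Treat the key format and this wiring as assumptions to verify against the installed diffusers version.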

leisuzz (Contributor, Author) commented Jan 17, 2025

Hi @yiyixuxu, because the default attention processor is AttnProcessor2_0, we have to either change it inside the model or change the processor-selection logic in attention_processor.py.

4 participants