
Bugfix for NPU not supporting float64 #10123

Open · wants to merge 4 commits into main
Conversation

Contributor
@baymax591 commented Dec 5, 2024

What does this PR do?

When using the FLUX model on NPU devices, an error occurs in the embeddings.py file. The root cause is that freqs_dtype is float64, which is not supported on NPU. To resolve the issue, a check on device.type was added: when running on an NPU, float32 is used instead.
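A minimal sketch of that check (the helper name select_freqs_dtype is illustrative, not the actual diff in embeddings.py):

import torch

def select_freqs_dtype(device: torch.device) -> torch.dtype:
    # Ascend NPU kernels have no float64 support, so fall back to
    # float32 there; keep float64 elsewhere for precision.
    if device.type == "npu":
        return torch.float32
    return torch.float64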

Example used:

import torch
from diffusers import FluxPipeline

# Load FLUX.1-dev from a local checkpoint in bfloat16; device_map="balanced"
# spreads the model components across the available devices.
pipe = FluxPipeline.from_pretrained(
    "/data/baymax/models/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    device_map="balanced",
)

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),  # fixed seed for reproducibility
).images[0]
image.save("flux-dev.png")

Before this PR

(baymax) [root@modelfoundry-prod-node-0002 baymax]# python test_diffusers.py 
Loading checkpoint shards: 100%|███████████████████████████████████████████████| 2/2 [00:02<00:00,  1.18s/it]
Loading pipeline components...:  71%|██████████████████████████████            | 5/7 [00:04<00:01,  1.40it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|██████████████████████████████████████████| 7/7 [00:12<00:00,  1.82s/it]
  0%|                                                                                 | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/data/baymax/test_diffusers.py", line 11, in <module>
    image = pipe(
  File "/root/miniconda3/envs/baymax/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/baymax/lib/python3.10/site-packages/diffusers/pipelines/flux/pipeline_flux.py", line 730, in __call__
    noise_pred = self.transformer(
  File "/root/miniconda3/envs/baymax/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/baymax/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/baymax/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/miniconda3/envs/baymax/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 475, in forward
    image_rotary_emb = self.pos_embed(ids)
  File "/root/miniconda3/envs/baymax/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/baymax/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/baymax/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 761, in forward
    cos, sin = get_1d_rotary_pos_embed(
  File "/root/miniconda3/envs/baymax/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 683, in get_1d_rotary_pos_embed
    freqs_cos = freqs.cos().repeat_interleave(2, dim=1).float()  # [S, D]
RuntimeError: call aclnnRepeatInterleaveIntWithDim failed, detail:EZ1001: [PID: 483017] 2024-12-04-17:12:59.064.468 self not implemented for DT_DOUBLE, should be in dtype support list [DT_UINT8,DT_INT8,DT_INT16,DT_INT32,DT_INT64,DT_BOOL,DT_FLOAT16,DT_FLOAT,DT_BFLOAT16,].

[ERROR] 2024-12-04-17:12:59 (PID:483017, Device:4, RankID:-1) ERR01005 OPS internal error
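The failing line boils down to calling repeat_interleave on a float64 tensor, which the NPU operator library rejects. A hypothetical minimal reproduction (assuming a working torch_npu install; the shapes are illustrative):

import torch
import torch_npu  # Ascend adapter; registers the "npu" device type

# Build a float64 frequency table the way get_1d_rotary_pos_embed does.
freqs = torch.outer(torch.arange(16), torch.arange(8)).double().to("npu")
freqs.cos().repeat_interleave(2, dim=1)   # RuntimeError: no DT_DOUBLE kernel
freqs.float().cos().repeat_interleave(2, dim=1)  # works once cast to float32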

After this PR

(baymax) [root@modelfoundry-prod-node-0002 baymax]# python test_diffusers.py 
Loading checkpoint shards: 100%|███████████████████████████████████████████| 2/2 [00:03<00:00,  1.67s/it]
Loading pipeline components...:  57%|█████████████████████▋                | 4/7 [00:17<00:16,  5.51s/it]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|██████████████████████████████████████| 7/7 [00:18<00:00,  2.70s/it]
100%|████████████████████████████████████████████████████████████████████| 50/50 [01:05<00:00,  1.31s/it]

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

cc @yiyixuxu @sayakpaul

@sayakpaul sayakpaul requested review from yiyixuxu and a-r-r-o-w and removed request for yiyixuxu December 5, 2024 02:41

github-actions bot commented Jan 4, 2025

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Jan 4, 2025
Member
@a-r-r-o-w left a comment

I think the changes should be relatively safe, as this PR only contains device-dependent changes.

It may be cleaner to do a follow-up that handles this kind of device-specific dtype selection with a better design in the scheduler. Off to @yiyixuxu for review; I can handle any pipelines we're missing or that were newly added while this PR was stale.

@a-r-r-o-w a-r-r-o-w added wip and removed stale Issues that haven't received updates labels Jan 7, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator
@hlky left a comment

I've applied this to the newer code and changed it from a combined is_mps_or_is_npu check to separate is_mps and is_npu flags, matching what was done in FluxPosEmbed.
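For reference, the separated flags look roughly like this (a sketch in the style of FluxPosEmbed.forward, not the exact committed code):

is_mps = ids.device.type == "mps"
is_npu = ids.device.type == "npu"
# Neither backend supports float64, so compute the rotary frequencies in float32 there.
freqs_dtype = torch.float32 if (is_mps or is_npu) else torch.float64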

Contributor Author
@baymax591

cc @yiyixuxu

@hlky hlky added close-to-merge and removed wip labels Jan 20, 2025