Scheduling fixes on MPS #10549

Merged
4 commits merged into huggingface:main on Jan 16, 2025

Conversation

@hlky (Collaborator) commented Jan 13, 2025

What does this PR do?

The segfault in the MPS scheduler tests is caused by randn_like; there are a few related PyTorch issues about problems with *_like functions on MPS.
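
A minimal sketch of the workaround (illustrative only; the tensor names are made up, not the actual test code):

```python
import torch

device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
sample = torch.randn(4, 3, 8, 8).to(device)

# torch.randn_like(sample) is what segfaults in the MPS scheduler tests.
# Workaround: build the noise from the shape instead, then move it to the device.
noise = torch.randn(sample.shape).to(device)
```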

float64 is unsupported on MPS, and timesteps are float64 in scheduling_heun_discrete and scheduling_lms_discrete. This change should be fine because the timestep is downcast later anyway.
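
A rough illustration of the dtype issue (not the exact scheduler code; the values are made up):

```python
import numpy as np
import torch

# np.linspace produces float64 by default, and MPS cannot hold float64 tensors,
# so a float64 timestep schedule cannot simply be moved to the MPS device.
timesteps = np.linspace(0, 999, 50)

# Keeping the schedule in float32 avoids that; the timestep is downcast to the
# model dtype later anyway, so nothing meaningful is lost.
timesteps = timesteps.astype(np.float32)
t = torch.from_numpy(timesteps)
```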

In test_schedulers, using .to(sample.device, dtype=sample.dtype) instead of .to(sample.device).to(sample.dtype) should be equivalent but compatible with MPS.
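
A small illustration of the difference (the variable names are hypothetical, not the actual test code):

```python
import torch

sample = torch.zeros(2, 3, dtype=torch.float16)
residual = torch.ones(2, 3, dtype=torch.float64)

# Chained form: .to(sample.device) first materializes a float64 tensor on the
# target device, which is exactly what MPS cannot do.
# residual = residual.to(sample.device).to(sample.dtype)

# Combined form: device and dtype change in a single call, so the float64
# tensor is converted as it is moved.
residual = residual.to(sample.device, dtype=sample.dtype)
```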

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@hlky requested a review from yiyixuxu on January 13, 2025 at 05:49
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu (Collaborator)

Maybe it is easier to only use np.float32 for MPS? Some of the models we recently integrated are very sensitive to precision (e.g. Mochi, LTX).

cc @bghira here for his opinions too

@bghira (Contributor) commented Jan 13, 2025

Sana is especially sensitive, but it could be like the RoPE for Flux, where we went from fp64 to fp32 and saw no real degradation. If it won't work on MPS, maybe some CPU fallback code can work for those systems, but that sounds like an upstream PyTorch limitation.

I guess I'd give it a whirl and see if the known sensitive models have an issue, and document potential instabilities with PyTorch on MPS (which is in general a good idea to temper expectations).

@hlky (Collaborator, Author) commented Jan 14, 2025

Given the timesteps range, casting int64->int32 should be lossless, no? And when the timestep is cast to a float type before the model, int32->float16 etc. should also be lossless, no? (See the sketch at the end of this comment.) Anyway, it seems like the main fix for the CI failure is

noise = torch.randn(scaled_sample.shape).to(torch_device)

There are some issues on PyTorch regarding *_like failures on MPS.

self.timesteps is cast to torch.int64 in some schedulers, and add_noise (which is what the failing test was for) already handles casting for MPS, so we can revert these np.int64->np.int32 changes.
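
For reference, a quick standalone check of the losslessness argument (a sketch, not code from the PR; it assumes the usual 1000 training timesteps):

```python
import torch

# Timestep values stay in [0, 1000), far below 2**31 - 1, so narrowing
# int64 -> int32 cannot lose information here.
t64 = torch.arange(1000, dtype=torch.int64)
t32 = t64.to(torch.int32)
assert torch.equal(t64, t32.to(torch.int64))

# Integers up to 2048 are exactly representable in float16, so casting these
# values int32 -> float16 before the model call is also exact.
assert torch.equal(t32.to(torch.float16).to(torch.int32), t32)
```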

@hlky (Collaborator, Author) commented Jan 16, 2025

The np.int64->np.int32 changes were not needed. With this PR, all scheduler tests on MPS are passing:

917 passed, 15 skipped, 46 deselected, 15 warnings in 15.48s

@yiyixuxu (Collaborator) left a review comment

thanks!

@yiyixuxu merged commit 08e62fe into huggingface:main on Jan 16, 2025
12 checks passed