
[LoRA] feat: support loading loras into 4bit quantized Flux models. #10578

Merged
merged 6 commits from 4bit-lora-loading into main on Jan 15, 2025

Conversation

sayakpaul
Member

@sayakpaul sayakpaul commented Jan 14, 2025

What does this PR do?

We broke Flux LoRA (not Control LoRA) loading for 4-bit BnB-quantized Flux models in v0.32.0 when we added support for Flux Control LoRAs (yes, this only applies to Flux).

To reproduce:

Code
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, FluxTransformer2DModel, FluxPipeline
from huggingface_hub import hf_hub_download


transformer_4bit = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=DiffusersBitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
    ),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer_4bit,
    torch_dtype=torch.bfloat16,
).to("cuda")

pipe.load_lora_weights(
    hf_hub_download("ByteDance/Hyper-SD", "Hyper-FLUX.1-dev-8steps-lora.safetensors"), 
    adapter_name="hyper-sd"
)
pipe.set_adapters("hyper-sd", adapter_weights=0.125)

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."

image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    max_sequence_length=512,
    num_inference_steps=8,
    guidance_scale=50,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("out.jpg")

The code above runs on the v0.31.0-release branch but fails on v0.32.0-release as well as on main. This went uncaught because we don't have a test for it.

This PR attempts to partially fix the problem so that we at least restore behavior similar to what we had on the v0.31.0-release branch. I want to use this PR to refine how we're doing that, ship it first, and tackle the TODOs in a follow-up. Once this PR is done, we might have to do a patch release.

Related issue: #10550

@sayakpaul sayakpaul changed the title [WIP] [LoRA] feat: support loading loras into 4bit quantized models. [WIP] [LoRA] feat: support loading loras into 4bit quantized Flux models. Jan 14, 2025
Comment on lines 1989 to 1991
if quantization_config.load_in_4bit:
    expansion_shape = torch.Size(expansion_shape).numel()
    expansion_shape = ((expansion_shape + 1) // 2, 1)
Member Author

Only 4bit bnb models flatten.

Member

How about adding a comment along the lines of: "Handle 4bit bnb weights, which are flattened and compress 2 params into 1".

I'm not quite sure why we need (shape+1) // 2, maybe this could be added to the comment too.

Member Author

Yeah, this comes from bitsandbytes. Cc: @matthewdouglas

Member

This is for rounding, i.e., if expansion_shape is odd there will be an additional 8-bit element with just one 4-bit value packed into it.
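
For illustration, a minimal sketch of the packing arithmetic being described (the dimensions are illustrative, taken from the Flux context_embedder shape mentioned later in this thread):

import torch

# Logical (unquantized) weight shape of a hypothetical linear layer.
out_features, in_features = 3072, 4096
numel = torch.Size((out_features, in_features)).numel()  # 12582912

# bnb 4-bit storage packs two 4-bit values into one uint8 element and keeps the
# result as a flattened (n, 1) tensor; the "+ 1" rounds up when numel is odd, so
# the last byte then holds only a single 4-bit value.
packed_shape = ((numel + 1) // 2, 1)
print(packed_shape)  # (6291456, 1)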

@BenjaminBossan BenjaminBossan left a comment
Member

Thanks for fixing the issue with loading LoRA into Flux models that are quantized with 4 bit bnb.

The issue of the actual parameter shape is a common trap; it would be great if the original shape could be retrieved from bnb. I'm not sure if there is a method for that.

Sayak, could you quickly sketch why this used to work and no longer does? IIRC, there was no special handling for 4 bit bnb previously, was there?

module_weight_shape = module_weight.shape
expansion_shape = (out_features, in_features)
quantization_config = getattr(transformer, "quantization_config", None)
if quantization_config and quantization_config.quant_method == "bitsandbytes":
Member

Would it make sense to have a utility function to get the shape to avoid code duplication?

Member Author

Yes, this needs to happen. As I mentioned, this PR is very much a PoC to gather feedback, and I will refine it. But I wanted to first explore if this is a good way to approach the problem.

@sayakpaul
Member Author

Sayak, could you quickly sketch why this used to work and no longer does? IIRC, there was no special handling for 4 bit bnb previously, was there?

It used to work because we didn't have any support for loading Flux Control LoRAs. Some relevant pieces of code:

def _maybe_expand_transformer_param_shape_or_error_(

def _maybe_expand_lora_state_dict(cls, transformer, lora_state_dict):

@BenjaminBossan
Member

It used to because we didn't have any support for loading Flux Control LoRA

Okay, so Flux control LoRA + 4bit bnb was never possible. From the initial description, I got the wrong impression that this is a full regression.

@sayakpaul
Member Author

Okay, so Flux control LoRA + 4bit bnb was never possible. From the initial description, I got the wrong impression that this is a full regression.

  1. Flux control LoRA + 4bit bnb was never possible.
  2. However, Flux LoRA + 4bit bnb was possible. So, I think Flux Control LoRA support introduced a regression at this point as we cannot do Flux LoRA + 4bit bnb anymore? What am I missing?

@BenjaminBossan
Member

  1. Flux control LoRA + 4bit bnb was never possible.
  2. However, Flux LoRA + 4bit bnb was possible. So, I think Flux Control LoRA support introduced a regression at this point as we cannot do Flux LoRA + 4bit bnb anymore? What am I missing?

We fully agree there. What I meant is that I misunderstood your original message to mean that Flux control LoRA + 4 bit bnb used to work and was trying to understand why it breaks now. This is what I meant by "full regression".

@matthewdouglas
Member

The issue of the actual parameter shape is a common trap, it would be great if the original shape could be retrieved from bnb, not sure if there is a method for that.

@BenjaminBossan That's a good point. The Params4bit and Linear4bit should have a quant_state attribute on them, which has a shape to indicate the original shape. I haven't used this much myself but it could be an option here.

@sayakpaul
Member Author

We fully agree there. What I meant is that I misunderstood your original message to mean that Flux control LoRA + 4 bit bnb used to work and was trying to understand why it breaks now. This is what I meant by "full regression".

Thanks, @BenjaminBossan. I will work on the suggestions to make it ready for another review pass.

That's a good point. The Params4bit and Linear4bit should have a quant_state attribute on them, which has a shape to indicate the original shape. I haven't used this much myself but it could be an option here.

Thanks @matthewdouglas! Will try to see if we can use it here.

@sayakpaul
Member Author

Just confirmed that this works:

from diffusers import FluxTransformer2DModel

model = FluxTransformer2DModel.from_pretrained(
    "hf-internal-testing/flux.1-dev-nf4-pkg", subfolder="transformer"
)
print(model.context_embedder.weight.quant_state.shape)
# torch.Size([3072, 4096])

@sayakpaul sayakpaul marked this pull request as ready for review January 15, 2025 04:14
@sayakpaul sayakpaul changed the title [WIP] [LoRA] feat: support loading loras into 4bit quantized Flux models. [LoRA] feat: support loading loras into 4bit quantized Flux models. Jan 15, 2025
@sayakpaul sayakpaul requested a review from DN6 January 15, 2025 04:14
@sayakpaul
Member Author

@BenjaminBossan I have addressed the rest of the feedback. PTAL.

@DN6 ready for your review, too. Just as an FYI, I have run all the LoRA-related integration tests for Flux and they all pass.

    base_weight_param_name: str = None,
) -> "torch.Size":
    def _get_weight_shape(weight: torch.Tensor):
        return weight.quant_state.shape if weight.__class__.__name__ == "Params4bit" else weight.shape
Collaborator

8bit params preserve the original shape?

Member Author

They do.
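
For reference, a quick sketch contrasting the two shapes on the 4-bit side, using the NF4 test checkpoint already referenced in this thread (the packed storage shape in the last comment is an expectation based on the packing arithmetic above, not a verified output):

from diffusers import FluxTransformer2DModel

model = FluxTransformer2DModel.from_pretrained(
    "hf-internal-testing/flux.1-dev-nf4-pkg", subfolder="transformer"
)
weight = model.context_embedder.weight
print(type(weight).__name__)     # Params4bit
print(weight.quant_state.shape)  # original shape: torch.Size([3072, 4096])
print(weight.shape)              # flattened, packed storage shape, expected ~torch.Size([6291456, 1])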

module_path = (
    base_weight_param_name.rsplit(".weight", 1)[0]
    if base_weight_param_name.endswith(".weight")
    else base_weight_param_name
Collaborator

In what case would this param name not end with .weight, given that the subsequent call to _get_weight_shape assumes there is a weight attribute?

Member Author

True. Let me modify that. Thanks for catching.

@sayakpaul sayakpaul requested a review from DN6 January 15, 2025 06:51
@sayakpaul
Member Author

@DN6 applied the changes and re-ran all the required tests; they pass.

@sayakpaul sayakpaul merged commit 2432f80 into main Jan 15, 2025
15 checks passed
@sayakpaul sayakpaul deleted the 4bit-lora-loading branch January 15, 2025 07:10
@BenjaminBossan
Member

Thanks for fixing this, Sayak.

DN6 pushed a commit that referenced this pull request Jan 15, 2025
…10578)

* feat: support loading loras into 4bit quantized models.

* updates

* update

* remove weight check.