Released by @sayakpaul on 22 Oct 14:15 · 69 commits to main since this release

v0.31.0: Stable Diffusion 3.5 Large, CogView3, Quantization, Training Scripts, and more

Stable Diffusion 3.5 Large

Stable Diffusion 3.5 Large is Stability AI’s latest text-to-image generation model and the next iteration of Stable Diffusion 3. It comes with two checkpoints (both with 8B parameters):

  • A regular one
  • A timestep-distilled one enabling few-step inference

Make sure to fill out the form on the model page, and then run huggingface-cli login before running the code below.

# make sure to update diffusers
# pip install -U diffusers
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
	"stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=40,
    height=1024,
    width=1024,
    guidance_scale=4.5,
).images[0]

image.save("sd3_hello_world.png")

Follow the documentation to learn more.

CogView3-Plus

We added CogView3-Plus, a new text-to-image model from the THUDM team! The model is DiT-based and supports image generation at resolutions from 512px to 2048px. Thanks to @zRzRzRzRzRzRzR for contributing it!

from diffusers import CogView3PlusPipeline
import torch

pipe = CogView3PlusPipeline.from_pretrained("THUDM/CogView3-Plus-3B", torch_dtype=torch.float16)

# Optional memory optimizations; enable_model_cpu_offload manages device
# placement itself, so we skip the explicit .to("cuda") call.
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

prompt = "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background."

image = pipe(
    prompt=prompt,
    guidance_scale=7.0,
    num_images_per_prompt=1,
    num_inference_steps=50,
    width=1024,
    height=1024,
).images[0]

image.save("cogview3.png")

Refer to the documentation to learn more.

Quantization

We have landed native quantization support in Diffusers, starting with bitsandbytes as the first quantization backend. With this, we hope to make large diffusion models much more accessible to run on consumer hardware.

The example below shows how to run Flux.1 Dev with the NF4 data type. Make sure to install the required libraries first:

pip install -Uq git+https://github.com/huggingface/transformers@main
pip install -Uq bitsandbytes
pip install -Uq diffusers

With those installed, configure NF4 quantization and load the transformer:

from diffusers import BitsAndBytesConfig, FluxTransformer2DModel
import torch

ckpt_id = "black-forest-labs/FLUX.1-dev"
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model_nf4 = FluxTransformer2DModel.from_pretrained(
    ckpt_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16
)
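As a quick sanity check of the memory savings, you can print the quantized model's footprint. A minimal sketch, assuming diffusers models expose get_memory_footprint() (mirroring the transformers API):

# Assumed helper: reports parameter memory in bytes. The bf16 transformer
# alone is over 20 GB; NF4 should land at roughly a quarter of that.
print(f"NF4 footprint: {model_nf4.get_memory_footprint() / 1e9:.2f} GB")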

Then, we use model_nf4 to instantiate the FluxPipeline:

from diffusers import FluxPipeline

pipeline = FluxPipeline.from_pretrained(
    ckpt_id, 
    transformer=model_nf4,
    torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()

prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree.  As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"

image = pipeline(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=50,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
image.save("whimsical.png")

Follow the documentation to learn more. Additionally, check out this Colab Notebook that runs Flux.1 Dev end-to-end with NF4 quantization.

Training scripts

We have a fresh batch of training scripts with this release.

Video model fine-tuning can be quite expensive, so we have worked on cogvideox-factory, a repository that provides memory-optimized scripts to fine-tune the Cog family of models.

Misc

  • We now support loading different kinds of Flux LoRAs, including Kohya, TheLastBen, and Xlabs formats; see the sketch after this list.
  • Loading Xlabs Flux ControlNets is also now supported. Thanks to @Anghellia for contributing it!
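The different Flux LoRA formats load through the same API. A minimal sketch (the LoRA repository id below is a placeholder; substitute any Kohya-, TheLastBen-, or Xlabs-style Flux LoRA):

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder repository id; the loader detects the LoRA format and converts it.
pipe.load_lora_weights("your-username/your-flux-lora")

image = pipe(
    "a photo of a cat holding a sign that says hello world",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_lora.png")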


Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @ighoshsubho
    • Feature flux controlnet img2img and inpaint pipeline (#9408)
    • flux controlnet control_guidance_start and control_guidance_end implement (#9571)
  • @noskill
    • adapt masked im2im pipeline for SDXL (#7790)
  • @saqlain2204
    • [Tests] Reduce the model size in the lumina test (#8985)
    • Add Differential Diffusion to Kolors (#9423)
  • @hlky
    • [Schedulers] Add exponential sigmas / exponential noise schedule (#9499)
    • Add Noise Schedule/Schedule Type to Schedulers Overview documentation (#9504)
    • Add exponential sigmas to other schedulers and update docs (#9518)
    • [Schedulers] Add beta sigmas / beta noise schedule (#9509)
    • Add beta sigmas to other schedulers and update docs (#9538)
    • FluxMultiControlNetModel (#9647)
    • Add pred_original_sample to if not return_dict path (#9649)
    • Convert list/tuple of SD3ControlNetModel to SD3MultiControlNetModel (#9652)
    • Convert list/tuple of HunyuanDiT2DControlNetModel to HunyuanDiT2DMultiControlNetModel (#9651)
    • Refactor SchedulerOutput and add pred_original_sample in DPMSolverSDE, Heun, KDPM2Ancestral and KDPM2 (#9650)
    • Slight performance improvement to Euler, EDMEuler, FlowMatchHeun, KDPM2Ancestral (#9616)
    • Add prompt scheduling callback to community scripts (#9718)
  • @yiyixuxu
    • a few fix for SingleFile tests (#9522)
    • update get_parameter_dtype (#9526)
    • flux controlnet fix (control_modes batch & others) (#9507)
    • [sd3] make sure height and size are divisible by 16 (#9573)
    • (authored by @Anghellia) Add support of Xlabs Controlnets #9638 (#9687)
    • minor doc/test update (#9734)
    • fix singlestep dpm tests (#9716)
  • @PromeAIpro
    • [examples] add train flux-controlnet scripts in example. (#9324)
  • @juancopi81
    • Add PAG support to StableDiffusionControlNetPAGInpaintPipeline (#8875)
  • @glide-the
    • fix: CogVideox train dataset _preprocess_data crop video (#9574)
    • Docs: CogVideoX (#9578)
  • @SahilCarterr
    • add PAG support for SD Img2Img (#9463)
    • Added Lora Support to SD3 Img2Img Pipeline (#9659)
  • @ryanlyn
    • Flux - soft inpainting via differential diffusion (#9268)
  • @zRzRzRzRzRzRzR
    • CogView3Plus DiT (#9570)
  • @tolgacangoz
    • [Community Pipeline] Add 🪆Matryoshka Diffusion Models (#9157)
    • Fix schedule_shifted_power usage in 🪆Matryoshka Diffusion Models (#9723)
  • @linoytsaban
    • [SD3 dreambooth-lora training] small updates + bug fixes (#9682)
    • [Flux] Add advanced training script + support textual inversion inference (#9434)
    • [advanced flux lora script] minor updates to readme (#9705)