Dreambooth finetune FLUX dev CLIPTextModel #10925

Open
Wuyiche opened this issue Feb 28, 2025 · 0 comments
Labels: bug (Something isn't working)

Wuyiche commented Feb 28, 2025

Describe the bug

ValueError: Sequence length must be less than max_position_embeddings (got sequence length: 77 and max_position_embeddings: 0

I used four A100 GPUs to run a full fine-tune of the FLUX.1-dev model, following https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux.md

I used the toy dog dataset (5 images) for fine-tuning.
I ran into the following problem with max_position_embeddings in CLIPTextModel:

Reproduction

[rank1]: Traceback (most recent call last):
[rank1]: File "/data/AIGC/diffusers/examples/dreambooth/train_dreambooth_flux.py", line 1812, in
[rank1]: main(args)
[rank1]: File "/data/AIGC/diffusers/examples/dreambooth/train_dreambooth_flux.py", line 1351, in main
[rank1]: instance_prompt_hidden_states, instance_pooled_prompt_embeds, instance_text_ids = compute_text_embeddings(
[rank1]: File "/data/AIGC/diffusers/examples/dreambooth/train_dreambooth_flux.py", line 1339, in compute_text_embeddings
[rank1]: prompt_embeds, pooled_prompt_embeds, text_ids = encode_prompt(
[rank1]: File "/data/AIGC/diffusers/examples/dreambooth/train_dreambooth_flux.py", line 963, in encode_prompt
[rank1]: pooled_prompt_embeds = _encode_prompt_with_clip(
[rank1]: File "/data/AIGC/diffusers/examples/dreambooth/train_dreambooth_flux.py", line 937, in _encode_prompt_with_clip
[rank1]: prompt_embeds = text_encoder(text_input_ids.to(device), output_hidden_states=False)
[rank1]: File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank1]: return forward_call(*args, **kwargs)
[rank1]: File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1056, in forward
[rank1]: return self.text_model(
[rank1]: File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank1]: return forward_call(*args, **kwargs)
[rank1]: File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 947, in forward
[rank1]: hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
[rank1]: File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank1]: return forward_call(*args, **kwargs)
[rank1]: File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 283, in forward
[rank1]: raise ValueError(
[rank1]: ValueError: Sequence length must be less than max_position_embeddings (got sequence length: 77 and max_position_embeddings: 0
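For context, my understanding of the failing path from the traceback is that the instance prompt is tokenized to a fixed length of 77 and then passed through the CLIP text encoder. Here is a rough sketch of that step with the plain transformers API (variable names are mine, not the exact ones in train_dreambooth_flux.py; the checkpoint is the same one used in my training script):

from transformers import CLIPTokenizer, CLIPTextModel

# Illustrative sketch of the call path that fails: tokenize the prompt to
# 77 tokens, then run the CLIP text encoder.
model_path = "black-forest-labs/FLUX.1-dev"
tokenizer = CLIPTokenizer.from_pretrained(model_path, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_path, subfolder="text_encoder")

text_inputs = tokenizer(
    "a photo of sks dog",
    padding="max_length",
    max_length=77,  # the script pads CLIP prompts to 77 tokens
    truncation=True,
    return_tensors="pt",
)
# This is the call that raises the ValueError inside CLIPTextEmbeddings
prompt_embeds = text_encoder(text_inputs.input_ids, output_hidden_states=False)
pooled_prompt_embeds = prompt_embeds.pooler_output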

I tried setting max_position_embeddings when loading the CLIPTextModel, but it doesn't help:

text_encoder_one = class_one.from_pretrained(
    args.pretrained_model_name_or_path,
    subfolder="text_encoder",
    revision=args.revision,
    variant=args.variant,
    max_position_embeddings=77,
    ignore_mismatched_sizes=True,
)
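For completeness, this is roughly the explicit config override I expected the kwarg above to be equivalent to (a sketch only; I have not found a variant that avoids the error):

from transformers import CLIPTextConfig, CLIPTextModel

# Load the text encoder config from the checkpoint and force the value
# explicitly before instantiating the model (sketch, not verified to fix anything).
config = CLIPTextConfig.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="text_encoder"
)
config.max_position_embeddings = 77
text_encoder_one = CLIPTextModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder",
    config=config,
)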

My training script is as follows:

export MODEL_NAME="black-forest-labs/FLUX.1-dev"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="trained-flux"

accelerate launch train_dreambooth_flux.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="bf16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --guidance_scale=1 \
  --gradient_accumulation_steps=4 \
  --optimizer="prodigy" \
  --learning_rate=1. \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub

Logs

System Info

  • 🤗 Diffusers version: 0.33.0.dev0
  • Platform: Linux-5.4.0-146-generic-x86_64-with-glibc2.31
  • Running on Google Colab?: No
  • Python version: 3.10.16
  • PyTorch version (GPU?): 2.6.0+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.29.1
  • Transformers version: 4.49.0
  • Accelerate version: 1.4.0
  • PEFT version: 0.14.0
  • Bitsandbytes version: not installed
  • Safetensors version: 0.5.3
  • xFormers version: not installed
  • Accelerator: 4× NVIDIA A100-SXM4-40GB, 40960 MiB
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

Wuyiche added the bug label on Feb 28, 2025