
dpo_vlm.py #2563

Open · 5 of 9 tasks
liuchaohu opened this issue Jan 12, 2025 · 1 comment
Labels
🐛 bug Something isn't working 🏋 DPO Related to DPO 👁️ VLM Related to Visual Language Models

Comments

@liuchaohu

System Info

trl env

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
accelerate launch examples/scripts/dpo_vlm.py \
    --dataset_name HuggingFaceH4/rlaif-v_formatted \
    --model_name_or_path HuggingFaceM4/idefics2-8b \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 32 \
    --dataset_num_proc 32 \
    --output_dir dpo_idefics_rlaif-v \
    --bf16 \
    --torch_dtype bfloat16 \
    --gradient_checkpointing \
    --use_peft \
    --lora_target_modules=all-linear
"""
import os
import torch
from datasets import load_dataset, features
from transformers import AutoModelForVision2Seq, AutoProcessor

from trl import (
    DPOConfig,
    # DPOTrainer,
    ModelConfig,
    ScriptArguments,
    TrlParser,
    get_kbit_device_map,
    get_peft_config,
    get_quantization_config,
)

# NOTE: imports a local, modified copy of dpo_trainer.py instead of trl.DPOTrainer
# (the library import above is commented out)
from dpo_trainer import DPOTrainer


if __name__ == "__main__":
    parser = TrlParser((ScriptArguments, DPOConfig, ModelConfig))
    script_args, training_args, model_args = parser.parse_args_and_config()

    ################
    # Model & Tokenizer
    ################
    torch_dtype = (
        model_args.torch_dtype if model_args.torch_dtype in ["auto", None] else getattr(torch, model_args.torch_dtype)
    )
    quantization_config = get_quantization_config(model_args)

    model_kwargs = dict(
        revision=model_args.model_revision,
        attn_implementation=model_args.attn_implementation,
        torch_dtype=torch_dtype,
        device_map=get_kbit_device_map() if quantization_config is not None else None,
        quantization_config=quantization_config,
        low_cpu_mem_usage=True,
    )
    model = AutoModelForVision2Seq.from_pretrained(
        model_args.model_name_or_path,
        trust_remote_code=model_args.trust_remote_code,
        **model_kwargs,
    )
    peft_config = get_peft_config(model_args)
    if peft_config is None:
        ref_model = AutoModelForVision2Seq.from_pretrained(
            model_args.model_name_or_path,
            trust_remote_code=model_args.trust_remote_code,
            **model_kwargs,
        )
    else:
        ref_model = None
    processor = AutoProcessor.from_pretrained(
        model_args.model_name_or_path, trust_remote_code=model_args.trust_remote_code, do_image_splitting=False
    )
    tokenizer = processor.tokenizer

    # Set up the chat template
    if model.config.model_type == "idefics2":
        pass  # the processor already has a valid chat template
    elif model.config.model_type == "paligemma":
        processor.chat_template = """{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}<|im_start|>{% if message['role'] == 'user' %}USER: {% else %}ASSISTANT: {% endif %}{% for item in message['content'] if item['type'] == 'text' %}{{ item['text'] }}<|im_end|>{% endfor %}{% if message['role'] == 'user' %} {% else %}{{eos_token}}{% endif %}{% endfor %}{% if add_generation_prompt %}ASSISTANT: {% endif %}"""
    elif model.config.model_type == "llava":
        processor.chat_template = """{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{% if message['role'] == 'user' %}USER: {% else %}ASSISTANT: {% endif %}{% for item in message['content'] %}{% if item['type'] == 'text' %}{{ item['text'] }}{% elif item['type'] == 'image' %}<image>{% endif %}{% endfor %}{% if message['role'] == 'user' %} {% else %}{{eos_token}}{% endif %}{% endfor %}{% if add_generation_prompt %}ASSISTANT: {% endif %}"""

    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    if script_args.ignore_bias_buffers:
        # torch distributed hack: DDP cannot handle boolean buffers, so tell it to ignore them
        model._ddp_params_and_buffers_to_ignore = [
            name for name, buffer in model.named_buffers() if buffer.dtype == torch.bool
        ]

    ################
    # Dataset
    ################
    # dataset = load_dataset(script_args.dataset_name, name=script_args.dataset_config)

    def format(examples):
        """
        Convert prompt from "xxx" to [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "xxx"}]}]
        and chosen and rejected from "xxx" to [{"role": "assistant", "content": [{"type": "text", "text": "xxx"}]}].
        Images are wrapped in a list.
        """
        output = {"images": [], "prompt": [], "chosen": [], "rejected": []}
        for image, question, chosen, rejected in zip(examples["image"], examples["question"], examples["chosen"], examples["rejected"]):
            prompt = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": question}]}]
            chosen = [{"role": "assistant", "content": [{"type": "text", "text": chosen}]}]
            rejected = [{"role": "assistant", "content": [{"type": "text", "text": rejected}]}]
            output["images"].append([image])
            output["prompt"].append(prompt)
            output["chosen"].append(chosen)
            output["rejected"].append(rejected)
        return output


    dataset = load_dataset("data/openbmb/RLAIF-V-Dataset", split="train", num_proc=os.cpu_count())
    dataset = dataset.select(range(100))
    cols = dataset.column_names
    print(os.cpu_count())
    dataset = dataset.map(format, batched=True, writer_batch_size=4, batch_size=4, remove_columns=cols, num_proc=24)
    f = dataset.features
    f["images"] = features.Sequence(features.Image(decode=True))  # to avoid bytes
    dataset = dataset.cast(f)
    dataset = dataset.train_test_split(test_size=0.05)

    ################
    # Training
    ################
    trainer = DPOTrainer(
        model,
        ref_model,
        args=training_args,
        train_dataset=dataset[script_args.dataset_train_split],
        eval_dataset=dataset[script_args.dataset_test_split] if training_args.eval_strategy != "no" else None,
        processing_class=processor,
        peft_config=peft_config,
    )

    trainer.train()

    # Save and push to hub
    trainer.save_model(training_args.output_dir)
    if training_args.push_to_hub:
        trainer.push_to_hub(dataset_name=script_args.dataset_name)
My VS Code launch configuration (launch.json):

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Debug with Arguments",
      "type": "python",
      "request": "launch",
      "program": "${file}", 
      "console": "integratedTerminal",
      "args": [
        "--dataset_name", "../data/openbmb/RLAIF-V-Dataset",
        "--model_name_or_path", "../data/llava-hf/llava-v1.6-vicuna-7b-hf",
        "--per_device_train_batch_size", "1",
        "--gradient_accumulation_steps", "1",
        "--output_dir", "dpo_idefics_rlaif-v",
        "--bf16",
        "--torch_dtype", "bfloat16",
        "--learning_rate", "1e-5",
        "--rpo_alpha", "0.1",
        "--gradient_checkpointing",
        "--use_peft",
        "--lora_target_modules=all-linear",
        "--dataset_num_proc", "1",
        "--attn_implementation", "flash_attention_2",
        "--logging_steps", "1",
        "--output_dir", "results/debug",
      ]
    }
  ]
}

The model is llava-v1.6-vicuna-7b-hf, and the dataset is RLAIF-V-Dataset.

In dpo_trainer.py, self.train_dataset contains:

Dataset({
    features: ['images', 'prompt_input_ids', 'pixel_values', 'chosen_input_ids', 'rejected_input_ids', 'image_sizes'],
    num_rows: 100
})

However, by the time the program reaches DataCollatorForPreference, pixel_values has disappeared.
As a result, the model never receives pixel_values during training, yet it still receives image_sizes, which is very strange.
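
For context, here is a rough, runnable sketch of the pruning mechanism that could cause this, assuming the standard transformers Trainer behavior: with remove_unused_columns=True (the default), the trainer drops every dataset column that is not on its signature-column list before the batch ever reaches the collator. The prune_columns helper and the signature list below are illustrative assumptions, not the actual TRL source; a list that contains image_sizes but not pixel_values would reproduce exactly the behavior described above.

# Illustrative sketch of the trainer's column pruning (assumed behavior, not the TRL source)
from datasets import Dataset

def prune_columns(dataset, signature_columns):
    """Drop every column the trainer does not list as an expected input."""
    ignored = [c for c in dataset.column_names if c not in signature_columns]
    return dataset.remove_columns(ignored)

toy = Dataset.from_dict({
    "prompt_input_ids": [[1, 2]],
    "chosen_input_ids": [[3]],
    "rejected_input_ids": [[4]],
    "pixel_values": [[0.0]],
    "image_sizes": [[336, 336]],
})
# Hypothetical signature list that includes image_sizes but not pixel_values
signature_columns = ["prompt_input_ids", "chosen_input_ids", "rejected_input_ids", "image_sizes"]
print(prune_columns(toy, signature_columns).column_names)
# ['prompt_input_ids', 'chosen_input_ids', 'rejected_input_ids', 'image_sizes']
# pixel_values is already gone before DataCollatorForPreference runs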

Expected behavior

I would like to know why the dataset itself contains pixel_values but the column disappears during data collation.

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks (no screenshots; more on code blocks)
  • Any traceback provided is complete
@liuchaohu (Author)

I found the reason why pixel_values disappears.
The script must be run with the parameter "--remove_unused_columns false", otherwise pixel_values is removed before collation.
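
For anyone hitting the same problem, here is a minimal sketch of the workaround in config form; remove_unused_columns is a standard TrainingArguments field that DPOConfig inherits, and the output_dir value is just a placeholder:

# Minimal sketch of the workaround: keep all dataset columns so that
# pixel_values survives until DataCollatorForPreference
from trl import DPOConfig

training_args = DPOConfig(
    output_dir="results/debug",   # placeholder path
    remove_unused_columns=False,  # same effect as passing --remove_unused_columns false on the CLI
)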

@August-murr August-murr added 🐛 bug Something isn't working 🏋 DPO Related to DPO 👁️ VLM Related to Visual Language Models labels Jan 14, 2025