MiniCPM-V-2_6 training fails with the latest code #6655

Closed
1 task done
ML-GCN opened this issue Jan 15, 2025 · 3 comments
Labels
solved This problem has been already solved

Comments

ML-GCN commented Jan 15, 2025

Reminder

  • I have read the above rules and searched the existing issues.

System Info

root@ecs-50958108:/workspace/train/LLaMA-Factory# llamafactory-cli env
[2025-01-15 08:20:32,880] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-5.15.0-86-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • PyTorch version: 2.4.0a0+07cecf4168.nv24.05 (GPU)
  • Transformers version: 4.46.1
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA A100-PCIE-40GB
  • DeepSpeed version: 0.16.2

Reproduction

sh file

model

model_name_or_path: /dataNfs/pre-trained/MiniCPM-V-2_6
trust_remote_code: true

method

stage: sft
do_train: true
finetuning_type: lora
lora_target: all

dataset

dataset_dir: data
dataset: mllm_demo # video: mllm_video_demo
template: minicpm_v
cutoff_len: 32000
max_samples: 10000
overwrite_cache: true
preprocessing_num_workers: 16
image_resolution: 1003520

output

output_dir: /dataNfs/checkpoint/visual/MiniCPM-V-2_6/test
logging_steps: 500
save_steps: 2000
plot_loss: true
overwrite_output_dir: true

train

per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

eval

#val_size: 0.1
#per_device_eval_batch_size: 1
#eval_strategy: steps
#eval_steps: 500

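Error
[image]
I am using the provided mllm_demo dataset. Is the data format incompatible, or is there some other cause? I would appreciate an answer. Thanks.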

Others

No response

ML-GCN added the labels bug (Something isn't working) and pending (This problem is yet to be addressed) Jan 15, 2025
hiyouga (Owner) commented Jan 15, 2025

Manually update the model file: https://huggingface.co/openbmb/MiniCPM-V-2_6/blob/main/modeling_minicpmv.py
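
One possible way to apply this fix is sketched below. It is an illustration under assumptions, not the maintainers' procedure: it assumes the `huggingface_hub` package is installed, that the `openbmb/MiniCPM-V-2_6` repository is accessible from the training machine (a login may be needed if access is restricted), and that the local model directory is the `model_name_or_path` from the report. The helper name `refresh_model_file` and its `fetch` parameter are hypothetical; only `hf_hub_download` is a real library call.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def refresh_model_file(
    model_dir: str,
    repo_id: str = "openbmb/MiniCPM-V-2_6",
    filename: str = "modeling_minicpmv.py",
    fetch: Optional[Callable[[], str]] = None,
) -> Path:
    """Replace model_dir/filename with a freshly fetched copy, keeping a .bak backup.

    `fetch` is a hypothetical hook that returns the path of the new file;
    by default the file is pulled from the Hugging Face Hub.
    """
    if fetch is None:
        # Imported lazily so the helper can be exercised without the library installed.
        from huggingface_hub import hf_hub_download

        def fetch() -> str:
            # Downloads into the local Hugging Face cache and returns the cached path.
            return hf_hub_download(repo_id=repo_id, filename=filename)

    target = Path(model_dir) / filename
    if target.exists():
        # Keep the stale copy so the change can be reverted.
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    # copy2 follows symlinks, so a symlinked cache entry is copied as a regular file.
    shutil.copy2(fetch(), target)
    return target
```

With the directory from the report, the call would be `refresh_model_file("/dataNfs/pre-trained/MiniCPM-V-2_6")`. Since the config sets `trust_remote_code: true`, the modeling code is taken from the model files rather than from the installed Transformers package, which is why a stale local copy can break training after the training framework is updated.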

hiyouga closed this as completed Jan 15, 2025
hiyouga added the label solved (This problem has been already solved) and removed the labels bug (Something isn't working) and pending (This problem is yet to be addressed) Jan 15, 2025
ML-GCN (Author) commented Jan 15, 2025

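Manually update the model file: https://huggingface.co/openbmb/MiniCPM-V-2_6/blob/main/modeling_minicpmv.py

It works after the update. One more question:
[image]
Do these three warnings have any effect? I never saw warnings like these when using qwen2vl.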

hiyouga (Owner) commented Jan 15, 2025

No, they do not.
