Support InternLM3 Dense 8B Model #6640

Merged
6 commits merged into hiyouga:main on Jan 14, 2025

Conversation

@hhaAndroid (Contributor) commented on Jan 14, 2025

Support InternLM3 Dense 8B Model.

Create a new file examples/train_full/internlm3_full_sft.yaml with the following content:

### model
model_name_or_path: internlm/internlm3-8b-instruct
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json  # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]

### dataset
dataset: alpaca_en,alpaca_zh
template: intern3
cutoff_len: 4096
max_samples: 10000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/internlm3/full/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 1.0e-6
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 5000000000

Then launch training:

# single GPU
DISABLE_VERSION_CHECK=1 llamafactory-cli train examples/train_full/internlm3_full_sft.yaml
# single node, multi-GPU (via torchrun)
DISABLE_VERSION_CHECK=1 FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/internlm3_full_sft.yaml

Note: InternLM3 Dense 8B currently requires transformers==4.47.1, so it is necessary to specify DISABLE_VERSION_CHECK=1.
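
As a quick pre-flight check, here is a minimal sketch (not part of this PR) that verifies the installed transformers version before bypassing LLaMA-Factory's own check with DISABLE_VERSION_CHECK=1:

# Minimal sketch, not part of the PR: fail early if the installed transformers
# version is older than what InternLM3 needs.
import transformers
from packaging import version

if version.parse(transformers.__version__) < version.parse("4.47.1"):
    raise RuntimeError(
        f"InternLM3 requires transformers >= 4.47.1, found {transformers.__version__}"
    )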

@hiyouga (Owner) left a comment

Thanks for the nice integration. We have left a comment about the template class; could you please resolve it?

@hhaAndroid (Contributor, Author) commented on Jan 14, 2025

@hiyouga Hello, why is it no longer necessary to specify efficient_eos=True? And is there a suitable place to inform users that they must use transformers 4.47.1? Thank you.

@hiyouga (Owner) left a comment

LGTM

@hiyouga (Owner) commented on Jan 14, 2025

> @hiyouga Hello, why is it no longer necessary to specify efficient_eos=True? And is there a suitable place to inform users that they must use transformers 4.47.1? Thank you.

@hhaAndroid The refactored template does not need to change the eos token; instead, it uses format_assistant to apply the chat template properly.
To notify users to use the newer version, we can let the framework print a warning message in the patcher, like:

from ..extras.packages import is_transformers_version_greater_than

if getattr(config, "model_type", None) == "internlm3" and not is_transformers_version_greater_than("4.47.1"):
    logger.warning_rank0_once("InternLM3 model requires transformers >= 4.47.1, please upgrade it.")

Or raise an exception:

if getattr(config, "model_type", None) == "internlm3" and not is_transformers_version_greater_than("4.47.1"):
    raise RuntimeError("InternLM3 model requires transformers >= 4.47.1, please upgrade it.")

For reference, this mirrors the existing Qwen-specific patch in the patcher:

if getattr(config, "model_type", None) == "qwen":
    setattr(config, "use_flash_attn", model_args.flash_attn == "fa2")
    for dtype_name, dtype in [("fp16", torch.float16), ("bf16", torch.bfloat16), ("fp32", torch.float32)]:
        setattr(config, dtype_name, model_args.compute_dtype == dtype)
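
For illustration, here is a minimal sketch of a template registration that relies on format_assistant to emit the end-of-turn token; the import paths, slot strings, and special tokens are assumptions for illustration, not the merged intern3 definition:

# Illustrative sketch only, not the merged intern3 template. It shows the idea
# described above: the assistant formatter appends the end-of-turn token itself,
# so the template no longer needs efficient_eos=True.
from llamafactory.data.formatter import StringFormatter
from llamafactory.data.template import _register_template

_register_template(
    name="intern3_sketch",  # hypothetical name to avoid clashing with the real template
    format_user=StringFormatter(slots=["<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
    format_assistant=StringFormatter(slots=["{{content}}<|im_end|>\n"]),  # end-of-turn token emitted here
    stop_words=["<|im_end|>"],
)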

@hiyouga merged commit deacc00 into hiyouga:main on Jan 14, 2025 (12 checks passed).
@hiyouga added the "solved" label (This problem has been already solved) on Jan 14, 2025.

@sebm123 commented on Jan 14, 2025, quoting the PR description above.

1587causalai pushed a commit to 1587causalai/llama_factory that referenced this pull request on Feb 18, 2025:
* support internlm3

* update

* update

* update

* add hint