[core / tests] v1 slow tests #1218
@@ -2,15 +2,11 @@ name: Slow tests (on push)

 on:
   push:
-    branches: [ main ]
-    paths:
-      # Run only when python files are modified
-      - "trl/**.py"
-      - "examples/**.py"
+    branches: [ add-slow-tests ]
 env:
   RUN_SLOW: "yes"
   IS_GITHUB_CI: "1"
-  SLACK_API_TOKEN: ${{ secrets.SLACK_API_TOKEN }}
+  SLACK_API_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}

 jobs:
@@ -34,18 +30,20 @@ jobs:
       - name: Pip install
         run: |
           source activate trl
-          pip install -e . --no-deps
-          pip install pytest-reportlog
+          pip install -e ".[test]" --no-deps
+          pip install pytest-reportlog parameterized

-      - name: Run common tests on single GPU
+      - name: Run slow SFT tests on single GPU
         if: always()
         run: |
           source activate trl
-          make tests_common_gpu
+          make slow_sft_tests

-      - name: Run slow tests on single GPU
+      - name: Run slow DPO tests on single GPU
         if: always()
         run: |
           source activate trl
-          make slow_tests_single_gpu
+          make slow_dpo_tests
Review comment: Thinking about it more, what do you think about renaming/structuring the tests? At the moment they are a bit intertwined, which I find a bit confusing.

Reply: I agree, let me push a few things.

Reply: I modified the make command to have a more global command that tests DPO + SFT. In the future, if we want to test, say, KTO, we just need to extend that command.
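For readers following along, here is a rough sketch of what such a consolidated make recipe could run. The target name, test-file paths, and log names are assumptions for illustration; the Makefile itself is not part of the visible diff.

```bash
# Hypothetical consolidated "slow tests" recipe: run each trainer's slow suite in turn,
# writing one pytest-reportlog file per suite. Extending it to e.g. KTO would just mean
# appending one more pytest invocation.
source activate trl
pytest tests/slow/test_sft_slow.py -s -v --report-log "sft_slow.log"
pytest tests/slow/test_dpo_slow.py -s -v --report-log "dpo_slow.log"
```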
       - name: Generate Report
         if: always()
@@ -74,24 +72,35 @@ jobs:
       - name: Pip install
         run: |
           source activate trl
-          pip install -e . --no-deps
-          pip install pytest-reportlog

-      - name: Run common tests on single GPU
Review comment: I propose to support common tests in a follow-up PR; right now many of them fail because of device-mismatch problems. I think it is OK to not have them for now, as these tests are run anyway on the CI.

Reply: Sounds good, that would indeed be a nice addition, since we had issues in the past with device mismatch which this would have caught.
+          pip install -e ".[test]" --no-deps
+          pip install pytest-reportlog parameterized

+      - name: Run slow SFT tests on single GPU
         if: always()
         run: |
           source activate trl
-          make tests_common_gpu
+          make slow_sft_tests

-      - name: Run slow tests on multi GPU
+      - name: Run slow DPO tests on single GPU
         if: always()
         run: |
           source activate trl
-          make slow_tests_multi_gpu
+          make slow_dpo_tests

+      - name: Run end-to-end SFT examples tests on multi GPU
+        if: always()
+        run: |
+          source activate trl
+          pip install deepspeed
+          make run_sft_examples

+      - name: Run end-to-end DPO examples tests on multi GPU
+        if: always()
+        run: |
+          source activate trl
+          pip install deepspeed
+          make run_dpo_examples

       - name: Generate Reports
         if: always()
         run: |
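The new example steps above only call `make run_sft_examples` and `make run_dpo_examples`; the actual work happens in the shell scripts added below, which read an optional `TRL_ACCELERATE_CONFIG` environment variable. A minimal sketch of how such a recipe could drive them follows; the script path and config file names are assumptions, not shown in this diff.

```bash
# Hypothetical recipe body: run the SFT example once per accelerate config.
# An empty TRL_ACCELERATE_CONFIG makes the script fall back to its default QLoRA + PEFT path.
for cfg in "" "examples/accelerate_configs/multi_gpu.yaml" "examples/accelerate_configs/deepspeed_zero1.yaml"; do
    TRL_ACCELERATE_CONFIG="$cfg" bash commands/run_sft.sh || exit 1
done
```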
@@ -0,0 +1,56 @@
#!/bin/bash
# This script runs a DPO example end-to-end on a tiny model using different possible configurations
# but defaults to QLoRA + PEFT
OUTPUT_DIR="test_dpo/"
MODEL_NAME="HuggingFaceM4/tiny-random-LlamaForCausalLM"
MAX_STEPS=5
BATCH_SIZE=2
SEQ_LEN=128

# Handle extra arguments in case one passes accelerate configs.
EXTRA_ACCELERATE_ARGS=""
EXTRA_TRAINING_ARGS="""--use_peft \
    --load_in_4bit
"""

# This is a hack to get the number of available GPUs
mapfile -t num_gpus < <(nvidia-smi --format=csv --query-gpu=index | tail -n+2 | wc -l)
NUM_GPUS=${num_gpus[0]}

if [[ "${TRL_ACCELERATE_CONFIG}" == "" ]]; then
  EXTRA_ACCELERATE_ARGS=""
else
  EXTRA_ACCELERATE_ARGS="--config_file $TRL_ACCELERATE_CONFIG"
  # For DeepSpeed configs we need to set the `--fp16` flag to comply with our configs exposed
  # on `examples/accelerate_configs` and our runners do not support bf16 mixed precision training.
  if [[ $TRL_ACCELERATE_CONFIG == *"deepspeed"* ]]; then
    EXTRA_TRAINING_ARGS="--fp16"
  else
    echo "Keeping QLoRA + PEFT"
  fi
fi

CMD="""
accelerate launch $EXTRA_ACCELERATE_ARGS \
    --num_processes $NUM_GPUS \
    `pwd`/examples/scripts/dpo.py \
    --model_name_or_path $MODEL_NAME \
    --output_dir $OUTPUT_DIR \
    --max_steps $MAX_STEPS \
    --per_device_train_batch_size $BATCH_SIZE \
    --max_length $SEQ_LEN \
    $EXTRA_TRAINING_ARGS
"""

echo "Starting program..."

{ # try
    echo $CMD
    eval "$CMD"
} || { # catch
    # save log for exception
    echo "Operation Failed!"
    exit 1
}
exit 0
Review comment on lines +48 to +56: The bash script will be run on the assumption that …
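For reference, here is a hypothetical direct invocation of the script above; the script path and config path are assumptions, since file names are not visible in this diff.

```bash
# Default run: QLoRA + PEFT, using however many GPUs nvidia-smi reports.
bash commands/run_dpo.sh

# With a DeepSpeed accelerate config, the script swaps the PEFT/4-bit flags for --fp16.
TRL_ACCELERATE_CONFIG="examples/accelerate_configs/deepspeed_zero1.yaml" bash commands/run_dpo.sh
```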
@@ -0,0 +1,58 @@
#!/bin/bash
# This script runs an SFT example end-to-end on a tiny model using different possible configurations
# but defaults to QLoRA + PEFT
OUTPUT_DIR="test_sft/"
MODEL_NAME="HuggingFaceM4/tiny-random-LlamaForCausalLM"
DATASET_NAME="imdb"
MAX_STEPS=5
BATCH_SIZE=2
SEQ_LEN=128

# Handle extra arguments in case one passes accelerate configs.
EXTRA_ACCELERATE_ARGS=""
EXTRA_TRAINING_ARGS="""--use_peft \
    --load_in_4bit
"""

# This is a hack to get the number of available GPUs
mapfile -t num_gpus < <(nvidia-smi --format=csv --query-gpu=index | tail -n+2 | wc -l)
NUM_GPUS=${num_gpus[0]}

if [[ "${TRL_ACCELERATE_CONFIG}" == "" ]]; then
  EXTRA_ACCELERATE_ARGS=""
else
  EXTRA_ACCELERATE_ARGS="--config_file $TRL_ACCELERATE_CONFIG"
  # For DeepSpeed configs we need to set the `--fp16` flag to comply with our configs exposed
  # on `examples/accelerate_configs` and our runners do not support bf16 mixed precision training.
  if [[ $TRL_ACCELERATE_CONFIG == *"deepspeed"* ]]; then
    EXTRA_TRAINING_ARGS="--fp16"
  else
    echo "Keeping QLoRA + PEFT"
  fi
fi

CMD="""
accelerate launch $EXTRA_ACCELERATE_ARGS \
    --num_processes $NUM_GPUS \
    `pwd`/examples/scripts/sft.py \
    --model_name $MODEL_NAME \
    --dataset_name $DATASET_NAME \
    --output_dir $OUTPUT_DIR \
    --max_steps $MAX_STEPS \
    --batch_size $BATCH_SIZE \
    --seq_length $SEQ_LEN \
    $EXTRA_TRAINING_ARGS
"""

echo "Starting program..."

{ # try
    echo $CMD
    eval "$CMD"
} || { # catch
    # save log for exception
    echo "Operation Failed!"
    exit 1
}
exit 0
@@ -55,7 +55,7 @@ RUN source activate trl && \
     transformers \
     accelerate \
     peft \
-    trl
+    trl[test]@git+https://github.com/huggingface/trl
Review comment: It actually makes sense to always build TRL from source on our Docker images.
 RUN source activate trl && \
     pip freeze | grep trl
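Outside the Docker build, the equivalent install of the new dependency spec would be the standard pip direct-URL form (shown here only as a usage note, not something added by this PR):

```bash
# Install TRL from the main branch together with its [test] extras,
# matching the spec used in the Dockerfile above.
pip install "trl[test] @ git+https://github.com/huggingface/trl"
```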
@@ -0,0 +1,16 @@
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: "NO"
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: 'bf16'
Review comment (on `mixed_precision: 'bf16'`): shouldn't that also be …?

Reply: After thinking a bit, I think we can keep everything bf16; let me push something.
num_machines: 1
num_processes: 8
Review comment: isn't that 8 GPUs?

Reply: Yes, but it gets overwritten in the shell script to keep the config file untouched.
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
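As the exchange above notes, `num_processes: 8` is not what actually runs: the launcher scripts pass `--num_processes $NUM_GPUS` on the command line, and `accelerate launch` gives CLI flags precedence over the config file. A minimal illustration, with flag values borrowed from the scripts above:

```bash
# The config file asks for 8 processes, but the explicit CLI flag wins,
# so only as many processes as detected GPUs are launched.
accelerate launch --config_file "$TRL_ACCELERATE_CONFIG" \
    --num_processes "$NUM_GPUS" \
    examples/scripts/sft.py --model_name "$MODEL_NAME" --max_steps 5
```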
Review comment: I adapted the DPO script to make sure it supports QLoRA; I feel this feature is quite under-rated today and should be publicized much more.

Review comment: To modify after final pass.