[core / tests] v1 slow tests #1218

Merged on Jan 17, 2024 · 47 commits (changes shown from the first 44 commits).
Commits (all by younesbelkada):

38b8d90  v1 slow tests (Jan 11, 2024)
b2d6f41  nit (Jan 11, 2024)
10191f1  add qlora tests for DPO (Jan 11, 2024)
62743a3  add decorator (Jan 11, 2024)
affd0db  release memory + log reports (Jan 11, 2024)
51c90b3  report to none to avoid seg fault issues (Jan 11, 2024)
770ba3c  update setup (Jan 11, 2024)
c524a65  fix (Jan 11, 2024)
7d75884  add exampel testing (Jan 11, 2024)
827df67  fix nit (Jan 11, 2024)
41a6839  change temp filename (Jan 11, 2024)
a2dd624  add workflow file (Jan 11, 2024)
cc15412  fix comment (Jan 11, 2024)
4582f4f  add slack push script (Jan 12, 2024)
d82983e  more tests for DPO (Jan 12, 2024)
a0552d5  add dpo example tests (Jan 12, 2024)
c06cbd7  another makefile command (Jan 12, 2024)
8b606d2  fix (Jan 12, 2024)
4dabf99  add paths + clean up (Jan 12, 2024)
8a9a62b  nit (Jan 12, 2024)
e9fcc94  Merge remote-tracking branch 'origin/main' into add-slow-tests (Jan 12, 2024)
403d2cd  Update slow-tests.yml (Jan 12, 2024)
c7ddba4  trigger tests (Jan 12, 2024)
33ce5dc  Merge branch 'add-slow-tests' of https://github.com/lvwerra/trl into … (Jan 12, 2024)
976fd54  up (Jan 12, 2024)
210dcaf  up (Jan 12, 2024)
9ecd19b  more fixes (Jan 12, 2024)
434fb28  fix (Jan 12, 2024)
9585ce8  final fixes (Jan 12, 2024)
4ea24af  minor fixes (Jan 12, 2024)
37d5596  oops (Jan 12, 2024)
2823ab9  add more text (Jan 12, 2024)
e16abf2  fix (Jan 12, 2024)
b2bdcb4  more (Jan 12, 2024)
07c491a  trigger CI (Jan 12, 2024)
83df91c  up (Jan 12, 2024)
b4d875e  fix (Jan 12, 2024)
34e4ee7  remove (Jan 12, 2024)
ceedd8d  run the tests on 2 GPUs only (Jan 12, 2024)
01b8ac0  final fix SFT (Jan 12, 2024)
57d1401  Merge remote-tracking branch 'origin/main' into add-slow-tests (Jan 15, 2024)
7f9762f  revert config files + address comments (Jan 15, 2024)
573d8da  fix (Jan 15, 2024)
350529e  add Phi (Jan 15, 2024)
f7cb79b  Merge remote-tracking branch 'origin/main' into add-slow-tests (Jan 17, 2024)
cc3b430  final fixes (Jan 17, 2024)
321bba1  final fix (Jan 17, 2024)
42 changes: 16 additions & 26 deletions .github/workflows/slow-tests.yml
@@ -2,15 +2,11 @@ name: Slow tests (on push)

 on:
   push:
-    branches: [ main ]
-    paths:
-      # Run only when python files are modified
-      - "trl/**.py"
-      - "examples/**.py"
+    branches: [ add-slow-tests ]
Review comment from younesbelkada (PR author) on the branch filter above: to modify after the final pass.

 env:
   RUN_SLOW: "yes"
   IS_GITHUB_CI: "1"
-  SLACK_API_TOKEN: ${{ secrets.SLACK_API_TOKEN }}
+  SLACK_API_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}


 jobs:
@@ -34,18 +30,14 @@ jobs:
       - name: Pip install
         run: |
           source activate trl
-          pip install -e . --no-deps
-          pip install pytest-reportlog
-
-      - name: Run common tests on single GPU
-        run: |
-          source activate trl
-          make tests_common_gpu
+          pip install -e ".[test]" --no-deps
+          pip install pytest-reportlog parameterized

-      - name: Run slow tests on single GPU
+      - name: Run slow SFT tests on single GPU
         if: always()
         run: |
           source activate trl
-          make slow_tests_single_gpu
+          make slow_tests

       - name: Generate Report
         if: always()
@@ -74,23 +66,21 @@ jobs:
       - name: Pip install
         run: |
           source activate trl
-          pip install -e . --no-deps
-          pip install pytest-reportlog
-
-      - name: Run common tests on single GPU
Review thread on the removed "Run common tests" step:

younesbelkada (PR author): I propose to support the common tests in a follow-up PR. Right now many of them fail because of device-mismatch problems, and I think it is OK not to have them for now, as these tests already run on the regular CI.

Maintainer: Sounds good, that would indeed be a nice addition, since we had issues in the past with device mismatches that this would have caught.
-        run: |
-          source activate trl
-          make tests_common_gpu
+          pip install -e ".[test]" --no-deps
+          pip install pytest-reportlog parameterized

-      - name: Run slow tests on multi GPU
+      - name: Run slow SFT tests on Multi GPU
         if: always()
         run: |
           source activate trl
-          make slow_tests_multi_gpu
+          make slow_tests

-      - name: Run end-to-end SFT examples tests on multi GPU
+      - name: Run end-to-end examples tests on multi GPU
         if: always()
         run: |
           source activate trl
-          make run_sft_examples
+          pip install deepspeed
+          make test_examples

       - name: Generate Reports
         if: always()
24 changes: 23 additions & 1 deletion Makefile
@@ -1,7 +1,10 @@
-.PHONY: test precommit benchmark_core benchmark_aux
+.PHONY: test precommit benchmark_core benchmark_aux common_tests slow_tests test_examples

 check_dirs := examples tests trl

+ACCELERATE_CONFIG_PATH = `pwd`/examples/accelerate_configs
+COMMAND_FILES_PATH = `pwd`/commands
+
 test:
 	python -m pytest -n auto --dist=loadfile -s -v ./tests/

@@ -13,3 +16,22 @@ benchmark_core:

 benchmark_aux:
 	bash ./benchmark/benchmark_aux.sh
+
+tests_common_gpu:
+	python -m pytest tests/test_* $(if $(IS_GITHUB_CI),--report-log "common_tests.log",)
+
+slow_tests:
+	python -m pytest tests/slow/test_* $(if $(IS_GITHUB_CI),--report-log "slow_tests.log",)
+
+test_examples:
+	touch temp_results_sft_tests.txt
+	for file in $(ACCELERATE_CONFIG_PATH)/*.yaml; do \
+		TRL_ACCELERATE_CONFIG=$${file} bash $(COMMAND_FILES_PATH)/run_sft.sh; \
+		echo $$?','$${file} >> temp_results_sft_tests.txt; \
+	done
+
+	touch temp_results_dpo_tests.txt
+	for file in $(ACCELERATE_CONFIG_PATH)/*.yaml; do \
+		TRL_ACCELERATE_CONFIG=$${file} bash $(COMMAND_FILES_PATH)/run_dpo.sh; \
+		echo $$?','$${file} >> temp_results_dpo_tests.txt; \
+	done
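
Usage note (not part of the diff): on a GPU machine with TRL installed from source, the new targets could be driven locally along these lines, assuming the slow-test decorator added in this PR keys off the RUN_SLOW variable that the workflow sets:

RUN_SLOW=yes make slow_tests   # runs tests/slow/test_*; with IS_GITHUB_CI set it also writes slow_tests.log
make test_examples             # loops every config in examples/accelerate_configs through run_sft.sh and run_dpo.sh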
56 changes: 56 additions & 0 deletions commands/run_dpo.sh
@@ -0,0 +1,56 @@
#!/bin/bash
# This script runs a DPO example end-to-end on a tiny model using different possible configurations,
# but defaults to QLoRA + PEFT
OUTPUT_DIR="test_dpo/"
MODEL_NAME="HuggingFaceM4/tiny-random-LlamaForCausalLM"
MAX_STEPS=5
BATCH_SIZE=2
SEQ_LEN=128

# Handle extra arguments in case one passes accelerate configs.
EXTRA_ACCELERATE_ARGS=""
EXTRA_TRAINING_ARGS="""--use_peft \
--load_in_4bit
"""

# Set your number of GPUs here
NUM_GPUS=2

if [[ "${TRL_ACCELERATE_CONFIG}" == "" ]]; then
EXTRA_ACCELERATE_ARGS=""
else
EXTRA_ACCELERATE_ARGS="--config_file $TRL_ACCELERATE_CONFIG"
# For DeepSpeed configs we need to set the `--fp16` flag to comply with the configs exposed
# in `examples/accelerate_configs`, since our runners do not support bf16 mixed-precision training.
if [[ $TRL_ACCELERATE_CONFIG == *"deepspeed"* ]]; then
EXTRA_TRAINING_ARGS="--fp16"
else
echo "Keeping QLoRA + PEFT"
fi
fi


CMD="""
accelerate launch $EXTRA_ACCELERATE_ARGS \
--num_processes $NUM_GPUS \
--mixed_precision 'fp16' \
`pwd`/examples/scripts/dpo.py \
--model_name_or_path $MODEL_NAME \
--output_dir $OUTPUT_DIR \
--max_steps $MAX_STEPS \
--per_device_train_batch_size $BATCH_SIZE \
--max_length $SEQ_LEN \
$EXTRA_TRAINING_ARGS
"""

echo "Starting program..."

{ # try
echo $CMD
eval "$CMD"
} || { # catch
# save log for exception
echo "Operation Failed!"
exit 1
}
exit 0
Review comment from younesbelkada (PR author) on the try/catch block (lines +48 to +56):
The bash script runs on the assumption that if the training fails it returns exit status 1, and 0 otherwise. The Makefile command then reads the exit status of the previous bash command to record whether the script failed.
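
Downstream reporting (for example the Slack push script added in this PR, whose contents are not shown here) can then consume those status files. A minimal hypothetical sketch of such a consumer, assuming the "<exit status>,<config path>" format written by `make test_examples`:

#!/bin/bash
# Hypothetical sketch: scan a results file and report any configuration
# whose run did not exit cleanly.
failures=0
while IFS=',' read -r status config; do
    if [ "$status" != "0" ]; then
        echo "FAILED: $config (exit status $status)"
        failures=$((failures + 1))
    fi
done < temp_results_sft_tests.txt
echo "$failures failing configuration(s)"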

59 changes: 59 additions & 0 deletions commands/run_sft.sh
@@ -0,0 +1,59 @@
#!/bin/bash
# This script runs an SFT example end-to-end on a tiny model using different possible configurations
# but defaults to QLoRA + PEFT
OUTPUT_DIR="test_sft/"
MODEL_NAME="HuggingFaceM4/tiny-random-LlamaForCausalLM"
DATASET_NAME="imdb"
MAX_STEPS=5
BATCH_SIZE=2
SEQ_LEN=128


# Handle extra arguments in case one passes accelerate configs.
EXTRA_ACCELERATE_ARGS=""
EXTRA_TRAINING_ARGS="""--use_peft \
--load_in_4bit
"""

# Set your number of GPUs here
NUM_GPUS=2

if [[ "${TRL_ACCELERATE_CONFIG}" == "" ]]; then
EXTRA_ACCELERATE_ARGS=""
else
EXTRA_ACCELERATE_ARGS="--config_file $TRL_ACCELERATE_CONFIG"
# For DeepSpeed configs we need to set the `--fp16` flag to comply with the configs exposed
# in `examples/accelerate_configs`, since our runners do not support bf16 mixed-precision training.
if [[ $TRL_ACCELERATE_CONFIG == *"deepspeed"* ]]; then
EXTRA_TRAINING_ARGS="--fp16"
else
echo "Keeping QLoRA + PEFT"
fi
fi


CMD="""
accelerate launch $EXTRA_ACCELERATE_ARGS \
--num_processes $NUM_GPUS \
--mixed_precision 'fp16' \
`pwd`/examples/scripts/sft.py \
--model_name $MODEL_NAME \
--dataset_name $DATASET_NAME \
--output_dir $OUTPUT_DIR \
--max_steps $MAX_STEPS \
--batch_size $BATCH_SIZE \
--seq_length $SEQ_LEN \
$EXTRA_TRAINING_ARGS
"""

echo "Starting program..."

{ # try
echo $CMD
eval "$CMD"
} || { # catch
# save log for exception
echo "Operation Failed!"
exit 1
}
exit 0
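
Usage note (not part of the diff): both scripts are pointed at an accelerate config through the TRL_ACCELERATE_CONFIG variable, exactly as `make test_examples` does. A hypothetical standalone invocation:

TRL_ACCELERATE_CONFIG=examples/accelerate_configs/single_gpu.yaml bash commands/run_sft.sh
echo $?   # 0 on success, 1 on failure; the Makefile loop records this per config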
2 changes: 1 addition & 1 deletion docker/trl-latest-gpu/Dockerfile
@@ -55,7 +55,7 @@ RUN source activate trl && \
     transformers \
     accelerate \
     peft \
-    trl
+    trl[test]@git+https://github.com/huggingface/trl
Review comment from younesbelkada (PR author): it actually makes sense to always build TRL from source in our Docker images.


 RUN source activate trl && \
     pip freeze | grep trl
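
The same source install works outside the image too; a hypothetical one-liner with the test extras:

pip install "trl[test] @ git+https://github.com/huggingface/trl"
pip freeze | grep trl   # verify which revision got installed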
16 changes: 16 additions & 0 deletions examples/accelerate_configs/single_gpu.yaml
@@ -0,0 +1,16 @@
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: "NO"
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: 'bf16'
Review thread on `mixed_precision: 'bf16'`:

Maintainer: shouldn't that also be fp16?

younesbelkada (PR author): after thinking a bit, I think we can keep everything bf16; let me push something.

num_machines: 1
num_processes: 8
Review thread on `num_processes: 8`:

Maintainer: isn't that 8 GPUs?

younesbelkada (PR author): yes, but it gets overwritten in the shell script so the config file stays untouched (see the sketch after this file).

rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
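
To illustrate the override mentioned in the thread above: `accelerate launch` gives command-line flags precedence over the config file, which is how the run scripts pin the process count without editing the yaml. A hypothetical minimal invocation:

# --num_processes on the CLI overrides num_processes: 8 from the config,
# so the shared yaml never needs editing
accelerate launch \
    --config_file examples/accelerate_configs/single_gpu.yaml \
    --num_processes 2 \
    --mixed_precision 'fp16' \
    examples/scripts/sft.py --max_steps 5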
53 changes: 46 additions & 7 deletions examples/scripts/dpo.py
Review comment from younesbelkada (PR author): I adapted the DPO script to make sure it supports QLoRA. I feel this feature is quite underrated today and should be publicized much more.
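
For reference, a hypothetical QLoRA invocation of the adapted script, mirroring the flags that commands/run_dpo.sh passes:

# QLoRA: 4-bit base model + LoRA adapters; with --use_peft no separate
# reference model is loaded
python examples/scripts/dpo.py \
    --model_name_or_path HuggingFaceM4/tiny-random-LlamaForCausalLM \
    --use_peft \
    --load_in_4bit \
    --max_steps 5 \
    --output_dir test_dpo/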

@@ -21,11 +21,12 @@
 from typing import Dict, Optional

 import torch
+from accelerate import PartialState
 from datasets import Dataset, load_dataset
 from peft import LoraConfig
-from transformers import AutoModelForCausalLM, AutoTokenizer, HfArgumentParser, TrainingArguments
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, HfArgumentParser, TrainingArguments

-from trl import DPOTrainer
+from trl import DPOTrainer, is_xpu_available


 # Define and parse arguments.
@@ -45,6 +46,13 @@ class ScriptArguments:
     gradient_accumulation_steps: Optional[int] = field(
         default=1, metadata={"help": "the number of gradient accumulation steps"}
     )
+    output_dir: Optional[str] = field(default="output", metadata={"help": "the output directory"})
+    fp16: Optional[bool] = field(
+        default=False, metadata={"help": "Whether to activate fp16 mixed precision during training"}
+    )
+    bf16: Optional[bool] = field(
+        default=False, metadata={"help": "Whether to activate bf16 mixed precision during training"}
+    )
     max_length: Optional[int] = field(default=512, metadata={"help": "max length of each sample"})
     max_prompt_length: Optional[int] = field(default=128, metadata={"help": "max length of each sample's prompt"})
     max_target_length: Optional[int] = field(
@@ -83,6 +91,9 @@ class ScriptArguments:
             "help": "key word arguments to be passed along `torch.utils.checkpoint.checkpoint` method - e.g. `use_reentrant=False`"
         },
     )
+    load_in_8bit: Optional[bool] = field(default=False, metadata={"help": "load the model in 8 bits precision"})
+    load_in_4bit: Optional[bool] = field(default=False, metadata={"help": "load the model in 4 bits precision"})
+    generate_during_eval: Optional[bool] = field(default=False, metadata={"help": "Generate during evaluation"})


 def extract_anthropic_prompt(prompt_and_response):
@@ -126,16 +137,43 @@ def split_prompt_and_responses(sample) -> Dict[str, str]:
     parser = HfArgumentParser(ScriptArguments)
     script_args = parser.parse_args_into_dataclasses()[0]

+    if script_args.load_in_8bit and script_args.load_in_4bit:
+        raise ValueError("You can't load the model in 8 bits and 4 bits at the same time")
+    elif script_args.load_in_8bit or script_args.load_in_4bit:
+        quantization_config = BitsAndBytesConfig(
+            load_in_8bit=script_args.load_in_8bit, load_in_4bit=script_args.load_in_4bit
+        )
+        # Copy the model to each device
+        device_map = (
+            {"": f"xpu:{PartialState().local_process_index}"}
+            if is_xpu_available()
+            else {"": PartialState().local_process_index}
+        )
+        torch_dtype = torch.bfloat16
+    else:
+        device_map = None
+        quantization_config = None
+        torch_dtype = None
+
     # 1. load a pretrained model
-    model = AutoModelForCausalLM.from_pretrained(script_args.model_name_or_path)
+    model = AutoModelForCausalLM.from_pretrained(
+        script_args.model_name_or_path,
+        device_map=device_map,
+        quantization_config=quantization_config,
+        torch_dtype=torch_dtype,
+    )

     if script_args.ignore_bias_buffers:
         # torch distributed hack
         model._ddp_params_and_buffers_to_ignore = [
             name for name, buffer in model.named_buffers() if buffer.dtype == torch.bool
         ]

-    model_ref = AutoModelForCausalLM.from_pretrained(script_args.model_name_or_path)
+    if not script_args.use_peft:
+        model_ref = AutoModelForCausalLM.from_pretrained(script_args.model_name_or_path)
+    else:
+        # If one uses PEFT, there is no need to load a reference model
+        model_ref = None

     tokenizer = AutoTokenizer.from_pretrained(script_args.model_name_or_path)
     if tokenizer.pad_token is None:
@@ -158,11 +196,12 @@ def split_prompt_and_responses(sample) -> Dict[str, str]:
         logging_first_step=True,
         logging_steps=10,  # match results in blog post
         eval_steps=500,
-        output_dir="./test",
+        output_dir=script_args.output_dir,
         optim="rmsprop",
         warmup_steps=150,
         report_to=script_args.report_to,
-        bf16=True,
+        bf16=script_args.bf16,
+        fp16=script_args.fp16,
         gradient_checkpointing=script_args.gradient_checkpointing,
         # TODO: uncomment that on the next transformers release
         # gradient_checkpointing_kwargs=script_args.gradient_checkpointing_kwargs,
@@ -190,7 +229,7 @@ def split_prompt_and_responses(sample) -> Dict[str, str]:
         max_length=script_args.max_length,
         max_target_length=script_args.max_target_length,
         max_prompt_length=script_args.max_prompt_length,
-        generate_during_eval=True,
+        generate_during_eval=script_args.generate_during_eval,
         peft_config=peft_config,
     )
7 changes: 5 additions & 2 deletions examples/scripts/sft.py
@@ -55,14 +55,16 @@ class ScriptArguments:
     peft_lora_r: Optional[int] = field(default=64, metadata={"help": "the r parameter of the LoRA adapters"})
     peft_lora_alpha: Optional[int] = field(default=16, metadata={"help": "the alpha parameter of the LoRA adapters"})
     logging_steps: Optional[int] = field(default=1, metadata={"help": "the number of logging steps"})
-    use_auth_token: Optional[bool] = field(default=True, metadata={"help": "Use HF auth token to access the model"})
+    use_auth_token: Optional[bool] = field(default=False, metadata={"help": "Use HF auth token to access the model"})
     num_train_epochs: Optional[int] = field(default=3, metadata={"help": "the number of training epochs"})
     max_steps: Optional[int] = field(default=-1, metadata={"help": "the number of training steps"})
     save_steps: Optional[int] = field(
         default=100, metadata={"help": "Number of updates steps before two checkpoint saves"}
     )
     save_total_limit: Optional[int] = field(default=10, metadata={"help": "Limits total number of checkpoints."})
     push_to_hub: Optional[bool] = field(default=False, metadata={"help": "Push the model to HF Hub"})
+    fp16: Optional[bool] = field(default=False, metadata={"help": "Whether to activate fp16 mixed precision"})
Review comment from younesbelkada (PR author) on the new fp16/bf16 fields: those were missing before.
+    bf16: Optional[bool] = field(default=False, metadata={"help": "Whether to activate bf16 mixed precision"})
     gradient_checkpointing: Optional[bool] = field(
         default=False, metadata={"help": "Whether to use gradient checkpointing or no"}
     )
@@ -115,7 +117,6 @@ class ScriptArguments:
 training_args = TrainingArguments(
     output_dir=script_args.output_dir,
     per_device_train_batch_size=script_args.batch_size,
-    gradient_accumulation_steps=script_args.gradient_accumulation_steps,
     learning_rate=script_args.learning_rate,
     logging_steps=script_args.logging_steps,
     num_train_epochs=script_args.num_train_epochs,
@@ -126,6 +127,8 @@ class ScriptArguments:
     push_to_hub=script_args.push_to_hub,
     hub_model_id=script_args.hub_model_id,
     gradient_checkpointing=script_args.gradient_checkpointing,
+    fp16=script_args.fp16,
+    bf16=script_args.bf16,
     # TODO: uncomment that on the next release
     # gradient_checkpointing_kwargs=script_args.gradient_checkpointing_kwargs,
 )
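
Usage note (not part of the diff): with the new fields wired into TrainingArguments, precision can now be toggled from the command line. A hypothetical run mirroring commands/run_sft.sh:

python examples/scripts/sft.py \
    --model_name HuggingFaceM4/tiny-random-LlamaForCausalLM \
    --dataset_name imdb \
    --output_dir test_sft/ \
    --max_steps 5 \
    --batch_size 2 \
    --seq_length 128 \
    --fp16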