[core / tests] v1 slow tests #1218
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I left some early comments @lvwerra - I would appreciate it if you could have a look while I continue roughly testing everything!
.github/workflows/slow-tests.yml (Outdated)
# Run only when python files are modified
- "trl/**.py"
- "examples/**.py"
branches: [ add-slow-tests ]
To modify after final pass
pip install -e . --no-deps
pip install pytest-reportlog

- name: Run common tests on single GPU
I propose to support common tests in a follow-up PR. Right now many of them fail because of device mismatch problems; I think it is OK not to have them for now, since these tests are run on the CI anyway.
Sounds good, that would indeed be a nice addition, since we had issues in the past with device mismatches, which this would have caught.
{ # try
    echo $CMD
    eval "$CMD"
} || { # catch
    # save log for exception
    echo "Operation Failed!"
    exit 1
}
exit 0
The bash script is run under the following assumption: if the training fails, it returns exit status 1, otherwise 0. The Makefile command then retrieves the exit status of the previous bash command to determine whether the script failed or not.
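For illustration, anything that consumes this convention (for example, a test or CI step that launches one of the example scripts) only needs to check the return code. A minimal sketch, assuming a hypothetical wrapper script; this is not the PR's actual code:

```python
# Minimal sketch of consuming the exit-status convention: launch the training
# command and treat any non-zero exit status as a failure.
import subprocess

cmd = "bash scripts/run_example.sh"  # hypothetical wrapper that exits 1 on failure
result = subprocess.run(cmd, shell=True)

if result.returncode != 0:
    raise RuntimeError(f"Training command failed with exit status {result.returncode}")
print("Training command succeeded (exit status 0)")
```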
@@ -55,7 +55,7 @@ RUN source activate trl && \
    transformers \
    accelerate \
    peft \
-   trl
+   trl[test]@git+https://github.com/huggingface/trl
It actually makes sense to always build TRL from source in our Docker images.
examples/scripts/dpo.py (Outdated)
I adapted the DPO script to make sure it supports QLoRA; I feel this feature is quite underrated today and should be publicized much more.
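For context, QLoRA here means loading the base model in 4-bit and training LoRA adapters on top. A minimal sketch of that setup, where the checkpoint, hyperparameters, and target modules are illustrative rather than the script's exact values:

```python
# Hedged sketch of a QLoRA setup for DPO: 4-bit quantized base model + LoRA adapters.
# Checkpoint, hyperparameters, and target modules are illustrative only.
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # illustrative checkpoint
    quantization_config=bnb_config,
    device_map={"": 0},
)

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
# The peft_config is then passed to DPOTrainer; with a quantized base model and no
# explicit ref_model, the trainer can fall back to the adapter-disabled base model
# as the implicit reference.
```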
@@ -63,6 +63,8 @@ class ScriptArguments:
    )
    save_total_limit: Optional[int] = field(default=10, metadata={"help": "Limits total number of checkpoints."})
    push_to_hub: Optional[bool] = field(default=False, metadata={"help": "Push the model to HF Hub"})
    fp16: Optional[bool] = field(default=False, metadata={"help": "Whether to activate fp16 mixed precision"})
Those were missing before
scripts/log_reports.py (Outdated)
This is mostly copied and adapted from the equivalent script in peft.
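Roughly, that script reads the JSON-lines file produced by pytest-reportlog and turns failures into a Slack report. A minimal sketch of the parsing side (the file name is illustrative, and this is not the PR's exact code):

```python
# Hedged sketch of the log-parsing step: pytest-reportlog writes one JSON object
# per pytest report, so failed test calls can be collected line by line.
import json
from pathlib import Path

failed_tests = []
for line in Path("log.jsonl").read_text().splitlines():  # illustrative file name
    record = json.loads(line)
    if (
        record.get("$report_type") == "TestReport"
        and record.get("when") == "call"
        and record.get("outcome") == "failed"
    ):
        failed_tests.append(record["nodeid"])

if failed_tests:
    print(f"{len(failed_tests)} failed test(s):")
    print("\n".join(failed_tests))
else:
    print("All tests passed 🎉")
```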
Thanks a lot for working on this @younesbelkada, it's awesome! Left a few small comments. My main question is about how to structure and organize the test suite. At the moment I am a bit confused, since the DPO/SFT tests are actually not in the test suite but in the example scripts.
main_training_function: main
mixed_precision: 'bf16'
num_machines: 1
num_processes: 8
Isn't that 8 GPUs?
Yes, but it gets overwritten in the shell script to keep the config file untouched.
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: 'bf16'
Shouldn't that also be fp16?
After thinking a bit, I think we can keep everything bf16; let me push something.
tests/slow/testing_constants.py (Outdated)
# limitations under the License.

# TODO: push them under trl-org
MODELS_TO_TEST = ["HuggingFaceM4/tiny-random-LlamaForCausalLM", "HuggingFaceM4/tiny-random-MistralForCausalLM"]
Maybe two other archs could be Mixtral and Phi
I can add Phi; however, the Mixtral tiny checkpoint is quite large (~200MB), making it longer to test :/
.github/workflows/slow-tests.yml (Outdated)
make slow_tests_single_gpu
make slow_dpo_tests
What do you think about just calling that slow_tests or slow_accelerator_tests, and doing both SFT and DPO together rather than separately? If we add more and more trainers to the tests, it gets a bit confusing.
Thinking about it more, what do you think about renaming/structuring the tests into:
- slow_tests: they actually run the test suite and test small, self-contained parts of the code
- script_tests: they run the example scripts with different options, which tests the pipeline end to end
What do you think? At the moment they are a bit intertwined, which I find a bit confusing.
I agree, let me push a few things.
I modified the make command into a more global command that tests DPO + SFT. In the future, if we want to test, say, KTO, we just need to extend that command.
Looks good, just a few small questions!
Thanks @lvwerra for all your time reviewing this big PR!
* v1 slow tests
* nit
* add qlora tests for DPO
* add decorator
* release memory + log reports
* report to none to avoid seg fault issues
* update setup
* fix
* add exampel testing
* fix nit
* change temp filename
* add workflow file
* fix comment
* add slack push script
* more tests for DPO
* add dpo example tests
* another makefile command
* fix
* add paths + clean up
* nit
* Update slow-tests.yml
* trigger tests
* up
* up
* more fixes
* fix
* final fixes
* minor fixes
* oops
* add more text
* fix
* more
* trigger CI
* up
* fix
* remove
* run the tests on 2 GPUs only
* final fix SFT
* revert config files + address comments
* fix
* add Phi
* final fixes
* final fix
What does this PR do?
This PR makes the current CI stronger and more robust by introducing slow GPU tests, in order to catch issues such as #1216 before each release (this happened twice this week!). The goal is to run as many tests as possible to cover almost all use cases of the most-used classes.
I'll progressively update the PR by:
- Adding the correct workflow file
- Adding the Slack messaging system for getting feedback from the CI (for examples)
- Extending the tests with
and test everything
cc @lvwerra @lewtun
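To make the shape of these slow tests concrete, here is a hedged sketch of what one such GPU test might look like. It is not the PR's exact code: the SFTTrainer arguments may differ across trl versions, and the dataset slice and hyperparameters are illustrative. It parametrizes over the tiny checkpoints listed in tests/slow/testing_constants.py and runs a couple of training steps end to end on GPU, with `report_to="none"` echoing the "report to none to avoid seg fault issues" commit above.

```python
# Hedged sketch of a slow GPU test: parametrized over the tiny checkpoints from
# tests/slow/testing_constants.py, it runs a couple of training steps end to end.
import pytest
import torch
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

MODELS_TO_TEST = [
    "HuggingFaceM4/tiny-random-LlamaForCausalLM",
    "HuggingFaceM4/tiny-random-MistralForCausalLM",
]


@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires a GPU")
@pytest.mark.parametrize("model_id", MODELS_TO_TEST)
def test_sft_trainer_runs_on_gpu(model_id, tmp_path):
    dataset = load_dataset("imdb", split="train[:32]")  # illustrative tiny slice
    args = TrainingArguments(
        output_dir=str(tmp_path),
        per_device_train_batch_size=2,
        max_steps=2,
        report_to="none",  # avoids logger side effects on the CI
    )
    trainer = SFTTrainer(
        model=model_id,
        args=args,
        train_dataset=dataset,
        dataset_text_field="text",
        max_seq_length=128,
    )
    trainer.train()
```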