
[core / tests] v1 slow tests #1218

Merged: 47 commits merged into main from add-slow-tests on Jan 17, 2024
Conversation

@younesbelkada (Contributor) commented on Jan 11, 2024:

What does this PR do?

This PR makes the current CI stronger and more robust by introducing GPU slow tests, in order to catch issues such as #1216 before each release (this happened twice this week!). The goal is to run as many tests as possible, covering almost all use cases of the most-used classes.

I'll progressively update the PR by:

  • Adding the correct workflow file (a sketch of the idea follows this list)
  • Adding the Slack messaging system for getting feedback from the CI (for examples)
  • Extending the tests with:
    • Example tests with different DeepSpeed configurations
    • DPO with multiple GPUs (ref model on GPU 1 and active model on GPU 0)

and then testing everything.
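
A minimal sketch of the kind of workflow file described above (not the PR's actual file; the runner labels, workflow name, and make target are assumptions):

name: Slow GPU tests
on:
  push:
    paths:
      - "trl/**.py"
      - "examples/**.py"
jobs:
  slow-tests:
    runs-on: [self-hosted, single-gpu]
    steps:
      - uses: actions/checkout@v4
      - name: Install
        run: |
          pip install -e . --no-deps
          pip install pytest-reportlog
      - name: Run slow tests
        run: make slow_tests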

cc @lvwerra @lewtun

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@younesbelkada (author) left a comment:

I left some early comments, @lvwerra. I would appreciate it if you could have a look while I continue roughly testing everything!

# Run only when python files are modified
- "trl/**.py"
- "examples/**.py"
branches: [ add-slow-tests ]
@younesbelkada (author):

To modify after final pass

pip install -e . --no-deps
pip install pytest-reportlog

- name: Run common tests on single GPU
@younesbelkada (author):

I propose to support common tests in a follow-up PR; right now many of them fail because of device-mismatch problems. I think it is OK not to have them for now, as these tests are run on the CI anyway.

@lvwerra (Member):

Sounds good, that would indeed be a nice addition, since we had issues in the past with device mismatches that this would have caught.
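
A hedged sketch of the kind of device-placement check such a common test could perform (get_trainer_model is a hypothetical helper, not TRL API):

import pytest
import torch

@pytest.mark.skipif(not torch.cuda.is_available(), reason="needs a GPU")
def test_all_parameters_on_one_device():
    # hypothetical helper: builds a trainer and returns its model
    model = get_trainer_model()
    devices = {p.device for p in model.parameters()}
    assert len(devices) == 1, f"device mismatch: parameters on {devices}"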

Comment on lines +48 to +56
{ # try
  echo "$CMD"
  eval "$CMD"
} || { # catch: report the failure and propagate a non-zero exit status
  echo "Operation Failed!"
  exit 1
}
exit 0
@younesbelkada (author):

The bash script is run on the assumption that if the training fails it returns exit status 1, and otherwise 0. The Makefile command then reads the exit status of that bash command to determine whether the script failed.
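
A minimal sketch of how a Makefile target might consume that exit status (the target name is illustrative, not the PR's actual Makefile; the recipe lines need tab indentation):

slow_dpo_tests:
	bash commands/run_dpo.sh; \
	if [ $$? -ne 0 ]; then echo "slow DPO run failed"; exit 1; fi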

@@ -55,7 +55,7 @@ RUN source activate trl && \
     transformers \
     accelerate \
     peft \
-    trl
+    trl[test]@git+https://github.com/huggingface/trl
@younesbelkada (author):

It actually makes sense to always build TRL from source in our Docker images.

@younesbelkada (author):

I adapted the DPO script to make sure it supports QLoRA; I feel this feature is quite underrated today and should be publicized much more.
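
A minimal sketch, assuming bitsandbytes and peft are installed, of the kind of QLoRA setup being enabled (the model id and hyperparameters are illustrative):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig

# quantize the frozen base weights to 4-bit NF4
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceM4/tiny-random-LlamaForCausalLM",
    quantization_config=bnb_config,
)
# only the LoRA adapters are trained; a trainer such as DPOTrainer can take
# this peft_config and wrap the quantized model itself
peft_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")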

@@ -63,6 +63,8 @@ class ScriptArguments:
     )
     save_total_limit: Optional[int] = field(default=10, metadata={"help": "Limits total number of checkpoints."})
     push_to_hub: Optional[bool] = field(default=False, metadata={"help": "Push the model to HF Hub"})
+    fp16: Optional[bool] = field(default=False, metadata={"help": "Whether to activate fp16 mixed precision"})
@younesbelkada (author):

Those were missing before

@younesbelkada (author):

This is mostly copied and adapted from the peft equivalent.

@lvwerra (Member) left a comment:

Thanks a lot for working on this @younesbelkada, it's awesome! I left a few small comments. My main question is about how to structure and organize the test suite: at the moment I am a bit confused, since the DPO/SFT tests are actually not in the test suite but in the example scripts.

main_training_function: main
mixed_precision: 'bf16'
num_machines: 1
num_processes: 8
@lvwerra (Member):

isn't that 8 GPUs?

@younesbelkada (author):

Yes, but it gets overwritten in the shell script, so the config file stays untouched.
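
For illustration: accelerate's --num_processes flag overrides the value from the config file, so a 2-GPU run can reuse the 8-process config unchanged (the script path is illustrative):

accelerate launch \
  --config_file examples/accelerate_configs/deepspeed_zero1.yaml \
  --num_processes 2 \
  examples/scripts/dpo.py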

gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: 'bf16'
@lvwerra (Member):

shouldn't that also be fp16?

@younesbelkada (author):

After thinking about it a bit, I think we can keep everything in bf16; let me push something.

# limitations under the License.

# TODO: push them under trl-org
MODELS_TO_TEST = ["HuggingFaceM4/tiny-random-LlamaForCausalLM", "HuggingFaceM4/tiny-random-MistralForCausalLM"]
@lvwerra (Member):

Maybe two other archs could be Mixtral and Phi

@younesbelkada (author):

I can add Phi; however, the Mixtral tiny checkpoint is quite large (~200MB), which makes it slower to test :/
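
A minimal sketch (a hypothetical test, not the PR's actual suite) of how MODELS_TO_TEST can drive the slow tests via parametrization:

import pytest
import torch
from transformers import AutoModelForCausalLM

MODELS_TO_TEST = ["HuggingFaceM4/tiny-random-LlamaForCausalLM", "HuggingFaceM4/tiny-random-MistralForCausalLM"]

@pytest.mark.parametrize("model_id", MODELS_TO_TEST)
def test_tiny_model_loads_on_gpu(model_id):
    # slow test: skip on machines without a GPU
    if not torch.cuda.is_available():
        pytest.skip("needs a GPU")
    model = AutoModelForCausalLM.from_pretrained(model_id).to("cuda")
    assert next(model.parameters()).is_cuda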

Comment on lines 48 to 46
make slow_tests_single_gpu
make slow_dpo_tests
@lvwerra (Member):

What do you think about just calling that slow_tests or slow_accelerator_tests and running both SFT and DPO together rather than separately? If we add more and more trainers to the tests, it gets a bit confusing.

Thinking about it more, what do you think about renaming/structuring the tests into:

  • slow_tests: they actually run the test suite and test small, self-contained parts of the code
  • script_tests: they run the example scripts with different options, which tests the pipeline end to end

What do you think? At the moment they are a bit intertwined, which I find a bit confusing.

@younesbelkada (author):

I agree, let me push a few things.

@younesbelkada (author):

I modified the make command into a more global command that tests DPO + SFT. In the future, if we want to test, say, KTO, we just need to extend that command.
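
A hedged sketch of such an aggregated target (the target names are illustrative, not the PR's actual Makefile):

slow_tests:
	$(MAKE) slow_sft_tests
	$(MAKE) slow_dpo_tests
	# a future KTO suite would be one more line here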

@younesbelkada younesbelkada requested a review from lvwerra January 15, 2024 16:25
@lvwerra (Member) left a comment:

Looks good, just a few small questions!

@younesbelkada younesbelkada marked this pull request as ready for review January 17, 2024 09:08
@younesbelkada (author):

Thanks @lvwerra for all your time reviewing this big PR!

@younesbelkada younesbelkada merged commit ef209e3 into main Jan 17, 2024
9 checks passed
@younesbelkada younesbelkada deleted the add-slow-tests branch January 17, 2024 09:17
lapp0 pushed a commit to lapp0/trl that referenced this pull request May 10, 2024
* v1 slow tests

* nit

* add qlora tests for DPO

* add decorator

* release memory + log reports

* report to none to avoid seg fault issues

* update setup

* fix

* add example testing

* fix nit

* change temp filename

* add workflow file

* fix comment

* add slack push script

* more tests for DPO

* add dpo example tests

* another makefile command

* fix

* add paths + clean up

* nit

* Update slow-tests.yml

* trigger tests

* up

* up

* more fixes

* fix

* final fixes

* minor fixes

* oops

* add more text

* fix

* more

* trigger CI

* up

* fix

* remove

* run the tests on 2 GPUs only

* final fix SFT

* revert config files + address comments

* fix

* add Phi

* final fixes

* final fix