Remove tyro #1176
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks a lot for working on this @vwxyzjn ! I think it all looks good, modulo a question I have about `ppo_config.py`: shouldn't we also be removing the `tyro` import and related bits about `tyro.conf.Suppress`?
```python
trust_remote_code: bool = field(default=False, metadata={"help": "Enable `trust_remote_code`"})

# LoraConfig
use_peft: bool = field(default=False, metadata={"help": "whether to use peft"})
lora_alpha: Optional[float] = field(default=16, metadata={"help": "the lora alpha parameter"})
```
Not for this PR, but I think we may want to have something like a `ModelArguments` or `ModelConfig` class that collects all hyperparams associated with model loading, since this tends to be pretty similar across SFT/RM/DPO etc.
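Not from this PR, but a minimal sketch of what such a shared model-loading config could look like (field names are illustrative, not the project's final API):

```python
# Illustrative only: a shared config collecting model-loading hyperparameters.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelConfig:
    model_name_or_path: Optional[str] = field(default=None, metadata={"help": "model checkpoint to load"})
    trust_remote_code: bool = field(default=False, metadata={"help": "enable `trust_remote_code`"})
    # PEFT / LoRA
    use_peft: bool = field(default=False, metadata={"help": "whether to use PEFT"})
    lora_r: int = field(default=16, metadata={"help": "LoRA r"})
    lora_alpha: float = field(default=16, metadata={"help": "LoRA alpha"})
    lora_dropout: float = field(default=0.05, metadata={"help": "LoRA dropout"})
    # Quantization
    load_in_4bit: bool = field(default=False, metadata={"help": "load the model in 4-bit"})
    load_in_8bit: bool = field(default=False, metadata={"help": "load the model in 8-bit"})
```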
examples/scripts/reward_modeling.py (Outdated)
```python
),
model_name: str = field(default="facebook/opt-350m", metadata={"help": "the model name"})
dataset_name: str = field(default="Anthropic/hh-rlhf", metadata={"help": "the dataset name"})
dataset_text_field: str = field(default="text", metadata={"help": "the text field of the dataset"})
```
Not for this PR, but I think we should refactor the `RewardTrainer` to be similar to the `SFTTrainer`, so that all these args can be captured in `RewardConfig` and the preprocessing done on the fly.
```python
ddpo_config.project_kwargs = {
    "logging_dir": "./logs",
    "automatic_checkpoint_naming": True,
    "total_limit": 5,
    "project_dir": "./save",
}
```
This is required; otherwise it errors out.
examples/scripts/reward_modeling.py (Outdated)
```python
args.reward_config.evaluation_strategy = "steps" if args.eval_split != "none" else "no"
parser = HfArgumentParser((ScriptArguments, RewardConfig))
args, reward_config = parser.parse_args_into_dataclasses()
reward_config.gradient_checkpointing_kwargs = dict(use_reentrant=False)
```
This is required; otherwise it errors out.
Scripts seem to run fine :)
Not necessarily in this PR, but we can also do , but currently it errors out. The reason is as follows @younesbelkada — presumably the `bool | Literal[...]` union annotation, which `HfArgumentParser` has only limited support for. So not sure if we need to change peft or :

```python
from dataclasses import dataclass, field
from typing import Literal

from transformers import HfArgumentParser


@dataclass
class TestConfig:
    init_lora_weights: bool | Literal["gaussian", "loftq"] = field(
        default=True,
        metadata={
            "help": (
                "How to initialize the weights of the LoRA layers. Passing True (default) results in the default "
                "initialization from the reference implementation from Microsoft. Passing 'gaussian' results "
                "in Gaussian initialization scaled by the LoRA rank for linear and layers. Setting the initialization "
                "to False leads to completely random initialization and is discouraged. "
                "Pass `'loftq'` to use LoftQ initialization"
            ),
        },
    )


parser = HfArgumentParser((TestConfig))
args = parser.parse_args_into_dataclasses()
```
If I'm not mistaken, we can encapsulate the BnB and LoRA configs into a single config class.
@lewtun that's a great idea. Let me refactor things using that approach.
Great job adding the `ModelConfig` @vwxyzjn ! This is looking really good and I left a few small comments on things we can remove / tidy up. Regarding the DPO training curves, the better metric to track for overfitting is the train/val loss IMO, but in general DPO does overfit quite quickly and this turns out not to matter much in practice.
@younesbelkada there were some merge conflicts in . With this PR we can keep the code as is: the users can do .
The CI failed but it seems unrelated to the change in this PR. See #1273.
Very nice clean up @vwxyzjn ! Thanks a lot for the refactor - just as a sanity check before merging, could you run `make test_examples` to make sure the example scripts will not fail on the GPU CI? Only DPO with DS-1, 2 & 3 should fail but all other configurations should pass.
````diff
@@ -426,13 +426,13 @@ To use Flash Attention 2, first install the latest `flash-attn` package:
 pip install -U flash-attn
 ```
 
-And add `use_flash_attention_2=True` when calling `from_pretrained`:
+And add `attn_implementation="flash_attention_2"` when calling `from_pretrained`:
````
Could you add a documentation section here to explain what `ModelConfig` does and how to use it together with the other utility methods that you have exposed on the main init? 🙏
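For reference, a hedged sketch of how the `ModelConfig` pattern might be wired together with those utilities; it assumes `get_peft_config`, `get_quantization_config`, and `get_kbit_device_map` are the helpers exposed on the main init and that `ModelConfig` carries `model_name_or_path` and `trust_remote_code` fields:

```python
# Hedged sketch: assumes these helpers and ModelConfig fields match what the PR exposes.
from transformers import AutoModelForCausalLM, HfArgumentParser, TrainingArguments

from trl import ModelConfig, get_kbit_device_map, get_peft_config, get_quantization_config

parser = HfArgumentParser((TrainingArguments, ModelConfig))
training_args, model_config = parser.parse_args_into_dataclasses()

# Turn the CLI flags collected in ModelConfig into model-loading kwargs.
quantization_config = get_quantization_config(model_config)
model = AutoModelForCausalLM.from_pretrained(
    model_config.model_name_or_path,
    trust_remote_code=model_config.trust_remote_code,
    device_map=get_kbit_device_map() if quantization_config is not None else None,
    quantization_config=quantization_config,
)

# Returns a peft LoraConfig when --use_peft is passed, otherwise None.
peft_config = get_peft_config(model_config)
```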
Done!
```python
if __name__ == "__main__":
    parser = HfArgumentParser((ScriptArguments, TrainingArguments, ModelConfig))
```
Just to confirm - `--bf16` is passed directly through the terminal command and parsed thanks to the `HfArgumentParser`?
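For context, `HfArgumentParser` generates a CLI flag for every dataclass field, so a boolean such as `bf16` can be switched on directly from the terminal. A minimal standalone sketch with a hypothetical `DemoArgs` dataclass (not from this PR):

```python
# Standalone sketch with a hypothetical DemoArgs dataclass (not part of the PR).
from dataclasses import dataclass, field

from transformers import HfArgumentParser


@dataclass
class DemoArgs:
    bf16: bool = field(default=False, metadata={"help": "use bfloat16 mixed precision"})


parser = HfArgumentParser(DemoArgs)
# Simulate `python script.py --bf16`; HfArgumentParser maps the field to a CLI flag.
(args,) = parser.parse_args_into_dataclasses(args=["--bf16"])
print(args.bf16)  # True
```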
Thanks @younesbelkada for the comment!
Is this expected?
Huge work @vwxyzjn ! Thanks so much for this refactor !
* refactor
* Remove tyro in `ppo.py`
* quick update
* update default args
* quick push
* precommit
* refactor
* quick change
* remove tyro
* quick change
* precommit
* quick change
* fix hello_world
* remove docstring diffences
* add `module load cuda/12.1`
* push changes
* precommit
* make dpo runnable
* fix circular import
* quick fix
* refactor
* quick update
* path change
* update plots
* fix docs
* quick change
* Update trl/trainer/model_config.py (Co-authored-by: lewtun <[email protected]>)
* Update trl/trainer/model_config.py (Co-authored-by: lewtun <[email protected]>)
* Update trl/trainer/utils.py (Co-authored-by: lewtun <[email protected]>)
* Update examples/scripts/dpo.py (Co-authored-by: lewtun <[email protected]>)
* address comments. use attn_implementation
* precommit
* remove duplicate code
* update peft.py
* fix test no op dep
* Update trl/trainer/utils.py (Co-authored-by: Younes Belkada <[email protected]>)
* Apply suggestions from code review (Co-authored-by: lewtun <[email protected]>, Co-authored-by: Younes Belkada <[email protected]>)
* precommit
* add docs

---------

Co-authored-by: lewtun <[email protected]>
Co-authored-by: Younes Belkada <[email protected]>
The command to run is simply `python examples/scripts/ppo.py --log_with wandb`. There is no evidence of significant regression in the new refactor, though the learning curves appear less smooth. This was the learning curve before: