Multiple Errors with PPOTrainer. error in ppo_trainer.dataloader #2340

Open
Debolena7 opened this issue Nov 10, 2024 · 3 comments
Labels
🐛 bug Something isn't working 🏋 PPO Related to PPO

Comments

@Debolena7

All of the notebooks are throwing errors. The PPOTrainer class also requires a different set of arguments now: 'processing_class' instead of 'tokenizer', plus 'policy', 'ref_policy', 'reward_model', 'value_model', and 'train_dataset', and there is no 'optimizer' argument anymore. I resolved all of these by passing the correct arguments, but then hit a new error.
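For reference, a minimal sketch of a call with those arguments (the model classes, model id, and variable names are illustrative, not from the original script):

from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoModelForSequenceClassification, AutoTokenizer
from trl import PPOConfig, PPOTrainer

model_id = "gpt2"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# policy and reference policy are causal LMs; reward and value models return a scalar per sequence
policy = AutoModelForCausalLM.from_pretrained(model_id)
ref_policy = AutoModelForCausalLM.from_pretrained(model_id)
reward_model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)
value_model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)

# the train dataset should already be tokenized (token ids only, no raw strings)
train_dataset = Dataset.from_dict({"input_ids": tokenizer(["a prompt", "another prompt"])["input_ids"]})

ppo_trainer = PPOTrainer(
    config=PPOConfig(output_dir="./ppo_out"),
    processing_class=tokenizer,  # replaces the old 'tokenizer' argument
    policy=policy,
    ref_policy=ref_policy,
    reward_model=reward_model,
    value_model=value_model,
    train_dataset=train_dataset,
)

The traceback: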


Traceback (most recent call last):
  File "/u/student/2020/ai20resch11003/miniconda3/envs/rlhf_new/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 782, in convert_to_tensors
    tensor = as_tensor(value)
  File "/u/student/2020/ai20resch11003/miniconda3/envs/rlhf_new/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 738, in as_tensor
    return torch.tensor(value)
ValueError: too many dimensions 'str'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/u/student/2020/ai20resch11003/RLHF_new/hf_example.py", line 235, in <module>
    for epoch, batch in tqdm(enumerate(ppo_trainer.dataloader)):
  File "/u/student/2020/ai20resch11003/miniconda3/envs/rlhf_new/lib/python3.9/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/u/student/2020/ai20resch11003/miniconda3/envs/rlhf_new/lib/python3.9/site-packages/accelerate/data_loader.py", line 552, in __iter__
    current_batch = next(dataloader_iter)
  File "/u/student/2020/ai20resch11003/miniconda3/envs/rlhf_new/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/u/student/2020/ai20resch11003/miniconda3/envs/rlhf_new/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 673, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/u/student/2020/ai20resch11003/miniconda3/envs/rlhf_new/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
    return self.collate_fn(data)
  File "/u/student/2020/ai20resch11003/miniconda3/envs/rlhf_new/lib/python3.9/site-packages/transformers/data/data_collator.py", line 271, in __call__
    batch = pad_without_fast_tokenizer_warning(
  File "/u/student/2020/ai20resch11003/miniconda3/envs/rlhf_new/lib/python3.9/site-packages/transformers/data/data_collator.py", line 66, in pad_without_fast_tokenizer_warning
    padded = tokenizer.pad(*pad_args, **pad_kwargs)
  File "/u/student/2020/ai20resch11003/miniconda3/envs/rlhf_new/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 3548, in pad
    return BatchEncoding(batch_outputs, tensor_type=return_tensors)
  File "/u/student/2020/ai20resch11003/miniconda3/envs/rlhf_new/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 240, in __init__
    self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
  File "/u/student/2020/ai20resch11003/miniconda3/envs/rlhf_new/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 798, in convert_to_tensors
    raise ValueError(
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`filename` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
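(That last message usually means the dataset passed as train_dataset still contains raw string columns, `filename` here, which the default collator cannot stack into tensors. A minimal sketch of one way to preprocess, assuming a prompt column named "text"; the column and variable names are illustrative:)

def tokenize(example):
    # keep only the token ids; leftover string columns such as `filename` break the collator
    return {"input_ids": tokenizer(example["text"], truncation=True)["input_ids"]}

train_dataset = raw_dataset.map(tokenize, remove_columns=raw_dataset.column_names)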

I am trying this notebook: https://github.com/huggingface/trl/blob/main/examples/research_projects/toxicity/scripts/gpt-j-6b-toxicity.py

If these notebooks are outdated, please mention the correct 'trl' and 'transformers' package versions that are supposed to be installed before using them. That would help a lot.

please help :(

@qgallouedec
Member

qgallouedec commented Nov 10, 2024

We know that a lot of the notebooks/docs are outdated. Sorry for the inconvenience.
This was a deliberate choice that has allowed us to move faster on the lib's evolution. For more information, see #2174 (comment). But you can be sure that it will soon be completely up to date.
Most docs and notebooks should work with trl==0.11.
I agree with you that the notebooks should mention this. Feel free to open a PR to that effect if you want to contribute.
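For example, pinning the release when installing (the wildcard pin is an assumption; it picks up the latest 0.11.x patch release):

pip install "trl==0.11.*"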

@qgallouedec qgallouedec added 🐛 bug Something isn't working 🏋 PPO Related to PPO labels Nov 10, 2024
@Debolena7
Author

Debolena7 commented Nov 10, 2024

Thank you so much for your prompt reply. Changing the trl package version resolved the errors. I had been trying several RLHF code examples from Hugging Face and also from YouTube for a week, and all had multiple issues. I was stuck for so many days. Thanks again.

@Mrinh212375

Mrinh212375 commented Nov 14, 2024

@Debolena7 @qgallouedec ...

config = PPOConfig(
    #model_name="google/gemma-2-2b-it",
    learning_rate=1.41e-5,
    mini_batch_size=5,
    batch_size=20,
    output_dir='/kaggle/working/'
)

ppo_trainer = PPOTrainer(config=config,
                         processing_class='PreTrainedTokenizerBase',
                         policy=model,
                         ref_policy=ref_model,
                         reward_model=rm_model,
                         #tokenizer=tokenizer,
                         train_dataset=ppo_training_dataset,
                         data_collator=collator)

When I try to run the above code snippet, I get the following error:

[screenshot of the error traceback]
How do I pass the module from the HF preTrainedWrapper class?
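Presumably processing_class expects the actual tokenizer instance rather than a class-name string. A sketch under that assumption, reusing the variables from the snippet above (the model id comes from the commented-out model_name):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

ppo_trainer = PPOTrainer(config=config,
                         processing_class=tokenizer,  # pass the instance, not the string 'PreTrainedTokenizerBase'
                         policy=model,
                         ref_policy=ref_model,
                         reward_model=rm_model,
                         train_dataset=ppo_training_dataset,
                         data_collator=collator)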
