
Invalid combination of arguments related to empty batch #676

Open
SoumiDas opened this issue Sep 21, 2024 · 3 comments

Comments

@SoumiDas

Hi,

I have been trying to do DP-based fine-tuning on a dataset using the Pythia 1B model. I receive the following error at epoch 5 when I increase the dataset size to around 1000.

TypeError: zeros() received an invalid combination of arguments - got (tuple, dtype=type), but expected one of:

  • (tuple of ints size, *, tuple of names names, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
  • (tuple of ints size, *, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)

This arises from lines 60-61 of opacus/data_loader.py, which check whether len(batch) > 0 and try to collate. Where am I going wrong, or what would be a workaround?
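For context, the empty-batch handling around those lines follows roughly this pattern (a simplified sketch of the Opacus collate wrapper, not the exact source; `sample_empty_shapes` and `dtypes` stand in for whatever Opacus derives from the dataset):

```python
import torch

def wrap_collate_with_empty(collate_fn, sample_empty_shapes, dtypes):
    """Sketch: delegate to the real collate_fn for non-empty batches,
    otherwise return zero tensors shaped like an empty batch."""
    def collate(batch):
        if len(batch) > 0:
            return collate_fn(batch)
        # torch.zeros requires dtype to be a torch.dtype. If a dataset
        # field is not a tensor (e.g. a string), the derived "dtype" can
        # end up being a plain Python type, which raises exactly the
        # TypeError quoted above: got (tuple, dtype=type).
        return [
            torch.zeros(shape, dtype=dtype)
            for shape, dtype in zip(sample_empty_shapes, dtypes)
        ]
    return collate
```

One plausible reading of `got (tuple, dtype=type)` is that a Python `type` object rather than a `torch.dtype` reached `zeros()`, which can happen when samples contain non-tensor fields; this is an inference from the traceback, not a confirmed diagnosis.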

Please help!

P.S. The configurations I use are: number of epochs = 5, training set size = 1000, batch size = 8, and I am using BatchMemoryManager with max_physical_batch_size = 8.

@EnayatUllah
Contributor

Are you using Poisson subsampling? Also, can you use the bug report Colab so that we can look at your code and reproduce the issue?
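For what it's worth, occasional empty batches are expected under Poisson sampling: each example is included in a batch independently, so a draw can come back empty. A back-of-the-envelope check for the settings in the post (an illustration of why the error could surface only after several epochs, not a confirmed diagnosis):

```python
# With Poisson sampling, each of n examples is included independently
# with rate q = expected_batch_size / n, so a logical batch is empty
# with probability (1 - q) ** n.
n, expected_batch_size = 1000, 8
q = expected_batch_size / n
p_empty = (1 - q) ** n                   # roughly exp(-8), about 3.3e-4
draws = 5 * (n // expected_batch_size)   # ~625 logical batches over 5 epochs
p_hit = 1 - (1 - p_empty) ** draws
print(f"P(empty batch per draw)    ~ {p_empty:.1e}")
print(f"P(>=1 empty over 5 epochs) ~ {p_hit:.0%}")   # about 18%
```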

@kanchanchy

kanchanchy commented Oct 13, 2024

@SoumiDas It looks like a problem with the batch size, although that is very strange. I was facing the exact same issue with batch size 8. Later, I changed the batch size to 12 and the problem was resolved.

@EnayatUllah
Contributor

@kanchanchy, thanks for sharing this! I tried to reproduce the issue with the settings mentioned in the post: number of epochs = 5, training set = 1000, batch size = max_physical_batch_size = 8, with BatchMemoryManager (with noise_multiplier = 0.1 and max_grad_norm = 1.0), on a toy dataset and model, but I wasn't able to.

It's possible that I am missing something, so it would be great if one of you (@kanchanchy or @SoumiDas) could reproduce this in the bug report Colab, thanks!
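For reference, a minimal sketch of the setup described above (the toy data, model, and hyperparameters are placeholders standing in for the reporter's Pythia 1B pipeline; note that a tensor-only toy dataset may not trigger the error if the failure depends on non-tensor fields in the batch):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine
from opacus.utils.batch_memory_manager import BatchMemoryManager

# Toy stand-ins: 1000 samples, batch size 8, 5 epochs, as in the post.
dataset = TensorDataset(torch.randn(1000, 16), torch.randint(0, 2, (1000,)))
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
train_loader = DataLoader(dataset, batch_size=8)

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=0.1,
    max_grad_norm=1.0,  # Poisson sampling is on by default
)

criterion = torch.nn.CrossEntropyLoss()
for epoch in range(5):
    with BatchMemoryManager(
        data_loader=train_loader,
        max_physical_batch_size=8,
        optimizer=optimizer,
    ) as memory_safe_loader:
        for x, y in memory_safe_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
```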
