Training fails when the last batch has only one sample. #15

Open
glicerico opened this issue Feb 12, 2021 · 2 comments

Comments

@glicerico
Contributor

Training fails if the size of the training/validation/test set divided by the batch size leaves a remainder of 1.
In my use case, the validation set has 18753 elements, so using a batch size of 16 leaves only one element in the last batch, and the following error occurs:

Epoch 0: 100%|█████████▉| 12208/12209 [1:55:15<00:00,  1.77it/s, loss=1.960, v_num=gsq1]
Traceback (most recent call last):
  File "main.py", line 53, in <module>
    trainer.fit(model)                                                                                                                                                       
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 444, in fit                                          
    results = self.accelerator_backend.train()                                                                                                                               
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 63, in train                            
    results = self.train_or_test()                                                                                                                                           
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test                        
    results = self.trainer.train()                                                                                                                                           
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 493, in train                                        
    self.train_loop.run_training_epoch()           
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 589, in run_training_epoch
    self.trainer.run_evaluation(test_mode=False)                                       
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 578, in run_evaluation
    output = self.evaluation_loop.evaluation_step(test_mode, batch, batch_idx, dataloader_idx)         
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 171, in evaluation_step
    output = self.trainer.accelerator_backend.validation_step(args)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 87, in validation_step
    output = self.__validation_step(args)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 95, in __validation_step
    output = self.trainer.model.validation_step(*args)
  File "/root/CASA-Dialogue-Act-Classifier/Trainer.py", line 82, in validation_step
    loss = F.cross_entropy(logits, targets)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/functional.py", line 2468, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/functional.py", line 2260, in nll_loss
    if input.size(0) != target.size(0):
IndexError: dimension specified as 0 but tensor has no dimensions
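
The numbers above are easy to check, and the traceback hints at where the shape goes wrong: `log_softmax(input, 1)` ran without error, so it is probably `target` that has no dimensions. The snippet below is a minimal sketch of how that can happen, assuming the targets pass through a bare `.squeeze()` before reaching the loss. That is an assumption, since the code in `Trainer.py` is not shown here, and the `targets` tensor is a hypothetical stand-in for the real labels.

```python
import torch

n_samples, batch_size = 18753, 16
last_batch_size = n_samples % batch_size  # samples left over for the final batch

# Hypothetical stand-in for the labels of that final batch, shape (last_batch_size, 1)
targets = torch.zeros(last_batch_size, 1, dtype=torch.long)

# A bare squeeze() removes every size-1 dimension, including the batch dimension
squeezed = targets.squeeze()

# squeeze(-1) only removes the trailing dimension, so the batch dimension survives
safe = targets.squeeze(-1)

print(last_batch_size, squeezed.dim(), tuple(safe.shape))
```

If this is indeed the cause, squeezing only the specific dimension would keep single-sample batches usable instead of discarding them.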
@glicerico
Contributor Author

A workaround is to remove one element from the dataset.
A more complete solution is to have the dataloader drop the last batch when it is incomplete (as explained here). This shouldn't hurt much with large datasets, but may not be convenient with small ones.
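
For illustration, dropping the incomplete batch can be sketched with the `drop_last` argument of PyTorch's `DataLoader`. The toy dataset below is a made-up stand-in for the real one, sized so that a batch size of 16 leaves a single sample over. Whether PR #16 takes this route is not stated here.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the real dataset: 17 samples, so batch_size=16 leaves one over
dataset = TensorDataset(torch.randn(17, 4), torch.zeros(17, dtype=torch.long))

# drop_last=True discards the final batch when it is smaller than batch_size
loader = DataLoader(dataset, batch_size=16, drop_last=True)

batch_sizes = [features.shape[0] for features, _ in loader]
print(batch_sizes)
```

The leftover sample is simply skipped in each epoch. With `shuffle=True`, a different sample is dropped each time, so no sample is permanently excluded from training.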

@glicerico
Contributor Author

Proposing a solution in PR #16
