
[BUG] I think TensorDict doesn't work with pin_memory in a dataloader #679

Closed
andersonbcdefg opened this issue Feb 18, 2024 · 8 comments
Labels: bug (Something isn't working)
Describe the bug

It seems like the batch size goes missing when PyTorch attempts to pin the TensorDict's memory.

To Reproduce

Use a TensorDict as the dataset (in my case, the TensorDict lives inside a more complex IterableDataset class), and feed it to a PyTorch DataLoader with pin_memory=True. I believe this happens because the memory-pinning function tries to create a new TensorDict without passing the batch size.

ValueError: Caught ValueError in pin memory thread for device 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/tensordict/_td.py", line 1234, in _parse_batch_size
    return torch.Size(batch_size)
TypeError: 'NoneType' object is not iterable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/pin_memory.py", line 36, in do_one_step
    data = pin_memory(data, device)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/pin_memory.py", line 72, in pin_memory
    return type(data)([pin_memory(sample, device) for sample in data])  # type: ignore[call-arg]
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/pin_memory.py", line 72, in <listcomp>
    return type(data)([pin_memory(sample, device) for sample in data])  # type: ignore[call-arg]
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/pin_memory.py", line 62, in pin_memory
    return type(data)({k: pin_memory(sample, device) for k, sample in data.items()})  # type: ignore[call-arg]
  File "/usr/local/lib/python3.9/dist-packages/tensordict/_td.py", line 223, in __init__
    self._batch_size = self._parse_batch_size(source, batch_size)
  File "/usr/local/lib/python3.9/dist-packages/tensordict/_td.py", line 1240, in _parse_batch_size
    raise ValueError(
ValueError: batch size was not specified when creating the TensorDict instance and it could not be retrieved from source
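The root cause is independent of tensordict: the dataloader's pin-memory helper rebuilds a Mapping by calling `type(data)(...)` with only the key/value pairs, which breaks any subclass whose constructor requires extra arguments. A minimal stand-in sketch (`BatchDict` is a hypothetical class mimicking TensorDict's required `batch_size`, not part of any library):

```python
# Hypothetical Mapping subclass that, like TensorDict, needs a
# constructor argument beyond the key/value pairs.
class BatchDict(dict):
    def __init__(self, source, batch_size=None):
        if batch_size is None:
            # Mirrors tensordict's _parse_batch_size error.
            raise ValueError(
                "batch size was not specified when creating the instance"
            )
        super().__init__(source)
        self.batch_size = batch_size

data = BatchDict({"obs": [1, 2, 3]}, batch_size=(3,))

# What torch/utils/data/_utils/pin_memory.py effectively did (pre-fix):
# rebuild the mapping from its items alone, dropping batch_size.
try:
    rebuilt = type(data)({k: v for k, v in data.items()})
except ValueError as exc:
    print("reconstruction failed:", exc)
```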

Expected behavior

Pinning memory just works and doesn't cause an exception.

System info

Installed from pip, 0.3.0, used with NVIDIA A6000 and Torch 2.2, Python 3.9.16

Describe the characteristics of your environment:

  • Describe how the library was installed (pip, source, ...)
  • Python version
  • Versions of any other relevant libraries
import tensordict, numpy, sys, torch
print(tensordict.__version__, numpy.__version__, sys.version, sys.platform, torch.__version__)

Checklist

  • [x] I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)
@andersonbcdefg andersonbcdefg added the bug Something isn't working label Feb 18, 2024
vmoens commented Feb 18, 2024

This is somewhat similar to huggingface/accelerate#2405.
There are two things we can do here: (1) you could call tensordict.pin_memory within the collate_fn, and (2) PyTorch should use PyTree within the dataloader's pin_memory.
I will make a PR for (2)
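Workaround (1) can be sketched with a stub container (`StubTensorDict` and all names below are hypothetical stand-ins; with the real library you would call `TensorDict.pin_memory()` in the same spot). The idea: pin inside the collate function, via the object's own method, and leave `pin_memory=False` on the DataLoader so its built-in reconstruction is never invoked and the batch metadata survives.

```python
# Stub standing in for TensorDict: pinning via the object's own method
# keeps its metadata (batch_size) intact, unlike rebuilding it from items.
class StubTensorDict(dict):
    def __init__(self, source, batch_size):
        super().__init__(source)
        self.batch_size = batch_size

    def pin_memory(self):
        # Real code would pin each underlying tensor; here we just mark it.
        self.pinned = True
        return self

def collate_fn(batch):
    # Merge samples, then pin inside the collate step instead of
    # relying on DataLoader(pin_memory=True).
    merged = StubTensorDict(
        {k: [sample[k] for sample in batch] for k in batch[0]},
        batch_size=(len(batch),),
    )
    return merged.pin_memory()

batch = [StubTensorDict({"obs": i}, batch_size=()) for i in range(4)]
out = collate_fn(batch)
print(out.batch_size, out.pinned)  # (4,) True
```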

pytorchmergebot pushed a commit to pytorch/pytorch that referenced this issue Mar 1, 2024
…memory` and `collate_fn` (#120553)

A user-defined `Mapping` type may contain metadata (e.g., pytorch/tensordict#679, #120195 (comment)). Simply using `type(mapping)({k: v for k, v in mapping.items()})` does not take this metadata into account. This PR uses `copy.copy(mapping)` to create a clone of the original collection and iteratively updates the elements in the cloned collection. This preserves the metadata in the original collection via `copy.copy(...)` rather than relying on the `__init__` method of the user-defined class.
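The difference between the two reconstruction strategies can be demonstrated in plain Python (`MetaDict` is a hypothetical dict subclass carrying metadata, standing in for TensorDict):

```python
import copy

# Hypothetical Mapping subclass carrying metadata, like TensorDict's batch_size.
class MetaDict(dict):
    def __init__(self, source=(), batch_size=None):
        super().__init__(source)
        self.batch_size = batch_size

original = MetaDict({"a": 1}, batch_size=(1,))

# Old approach: re-calling __init__ with only the items drops the metadata.
rebuilt = type(original)({k: v for k, v in original.items()})
print(rebuilt.batch_size)  # None

# Fixed approach (as described for #120553): clone with copy.copy, which
# preserves instance attributes, then update the elements in place.
clone = copy.copy(original)
for k, v in original.items():
    clone[k] = v
print(clone.batch_size)  # (1,)
```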

Reference:

- pytorch/tensordict#679
- #120195

Closes #120195

Pull Request resolved: #120553
Approved by: https://github.com/vmoens
vmoens commented Mar 6, 2024

This will now work on torch nightlies!

@vmoens vmoens closed this as completed Mar 6, 2024
andersonbcdefg commented Mar 6, 2024 via email

SamGalanakis commented:
Hello @vmoens, I'm having a possibly related issue. I have a custom collate function that returns a TensorDict, and with pin_memory=True on the DataLoader I am seeing this warning:

/opt/conda/lib/python3.11/site-packages/tensordict/tensorclass.py:1108: UserWarning:

The method <bound method TensorDictBase.pin_memory of TensorDict(
    fields={
    },
    batch_size=torch.Size([]),
    device=None,
    is_shared=False)> wasn't explicitly implemented for tensorclass. This fallback will be deprecated in future releases because it is inefficient and non-compilable. Please raise an issue in tensordict repo to support this method!

Any ideas?
I am using the latest nightly from: ghcr.io/pytorch/pytorch-nightly:2.5.0.dev20240806-cuda12.1-cudnn9-devel

vmoens commented Aug 6, 2024

I'll fix that, thanks for reporting

haithamkhedr commented:

> I'll fix that thanks for reporting

@vmoens Is this fixed already? Thanks

Mxbonn commented Sep 20, 2024

> I'll fix that thanks for reporting

any update on this?

vmoens commented Sep 20, 2024

I think it is! Trying to make a release asap with this and other fixes
