Describe the bug
While working on #490, I found that if I have `bitsandbytes` installed in a GPU-enabled environment, I get an error when running `test_adaptive_compression`, which happens to be the only test that uses `TrainingAverager` under the hood.
I dug into it a bit, and the failure seems to be caused by a `CUDA error: initialization error` raised by PyTorch, which AFAIK shows up when we try to initialize the CUDA context twice. More specifically, it appears while the optimizer state is being initialized in `TrainingAverager`. My guess is that the context is created once when importing `bitsandbytes` and then again when using something (anything?) from GPU-enabled PyTorch later. We are sunsetting support for `TrainingAverager` anyway, but it's not obvious to me how to correctly migrate this test away from that class.
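For reference, here is a minimal, hypothetical sketch (not the actual hivemind code path) of the most common way I know of to trigger this exact PyTorch error: the CUDA context gets created in a parent process, and a forked child process then tries to use CUDA. Whether this is really what `bitsandbytes` + `TrainingAverager` do together is exactly the open question above.

```python
# Hypothetical repro of the error class, NOT the TrainingAverager code path:
# forking after the parent has initialized CUDA makes the child's CUDA calls
# fail with "CUDA error: initialization error".
import multiprocessing as mp

import torch


def child_uses_cuda():
    # The forked child inherits a CUDA context it cannot actually use.
    x = torch.ones(4, device="cuda")  # RuntimeError: CUDA error: initialization error
    print(x.sum())


if __name__ == "__main__":
    _ = torch.zeros(1, device="cuda")  # initializes the CUDA context in the parent
    p = mp.get_context("fork").Process(target=child_uses_cuda)
    p.start()
    p.join()
```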
To Reproduce
Install the environment on a GPU-enabled system and run `CUDA_LAUNCH_BLOCKING=1 pytest -s --full-trace tests/test_compression.py`. Then uninstall `bitsandbytes`, comment out the parts of `test_compression.py` that rely on it (mostly `test_tensor_compression`), and run the same command.
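For clarity, the two runs side by side (the `pip uninstall` line is my assumption about how the package was installed; the pytest command is verbatim from above):

```bash
# Run 1: bitsandbytes installed -> test_adaptive_compression fails
CUDA_LAUNCH_BLOCKING=1 pytest -s --full-trace tests/test_compression.py

# Run 2: bitsandbytes removed and the parts of tests/test_compression.py that
# depend on it (mostly test_tensor_compression) commented out -> passes
pip uninstall -y bitsandbytes
CUDA_LAUNCH_BLOCKING=1 pytest -s --full-trace tests/test_compression.py
```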
Environment