Description
When I am using a `PipelineModule` in a Ray trainer after `deepspeed.initialize`, I always encounter the runtime error `RuntimeError: 0 active drivers ([]). There should only be one.` But when I check the driver and GPU status in the same process, the driver is there.
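A check of that kind (my own illustrative sketch, not the snippet from the original report) run inside the same Ray worker process might look like:

```python
# Hypothetical driver/GPU status check inside the Ray worker process;
# not the original snippet from this report.
import subprocess
import torch

print(torch.cuda.is_available())   # reports True in the worker
print(torch.cuda.device_count())   # reports the expected GPU count
subprocess.run(["nvidia-smi"])     # driver version and GPUs are listed
```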
Triton Information
What version of Triton are you using?
3.2.0
The full wheel filename I can see is triton-3.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata.
Are you using the Triton container or did you build it yourself?
Triton was installed as a dependency when installing DeepSpeed 0.14.4.
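A quick way to confirm which Triton build DeepSpeed pulled in (an assumed check, not from the report):

```python
# Confirm the Triton version installed alongside DeepSpeed.
import triton
print(triton.__version__)  # 3.2.0, per the report above
```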
To Reproduce
Steps to reproduce the behavior.
Start a Ray cluster, then run the forward pass in the training step; the error shows up there.
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
The original model I am using is the PyTorch version of the Gemma 2 27B model from https://github.com/google/gemma_pytorch/tree/main. I wrapped it with my Causal definition and a "piped" version of it.
Then I created a piped module from it.
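As a rough, hypothetical stand-in for that setup (placeholder linear layers instead of the real Gemma wrapper, a single process instead of a Ray actor, and a made-up DeepSpeed config):

```python
# Hypothetical single-process sketch of the failing path; placeholder
# layers stand in for the "piped" Gemma model. Not the original code.
import os
import torch
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

# Single-process distributed env so this can run standalone.
os.environ.setdefault("RANK", "0")
os.environ.setdefault("LOCAL_RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
deepspeed.init_distributed()  # PipelineModule needs torch.distributed up

# Placeholder layers instead of the real Gemma 2 27B wrapper.
layers = [nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 1)]
model = PipelineModule(layers=layers, loss_fn=nn.MSELoss(), num_stages=1)

ds_config = {
    "train_batch_size": 4,
    "train_micro_batch_size_per_gpu": 4,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config=ds_config,
)

def batches():
    # PipelineEngine expects (inputs, labels) tuples from the iterator.
    while True:
        yield torch.randn(4, 16), torch.randn(4, 1)

# Per the report, the RuntimeError surfaces during this forward pass.
loss = engine.train_batch(data_iter=batches())
```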
Expected behavior
A clear and concise description of what you expected to happen.
The model should go through and finish the forward pass.