Is your feature request related to a problem? Please describe.
The Python backend does not yet support implicit state management for sequence inference. When will this be supported?
Why we need this feature
We are from the Ant Group AI-Infra team. The majority of our models are implemented with the Python backend, and we have a critical demand for real-time streaming services such as speech-to-text conversion. After reviewing the Triton documentation, we found that the server framework and backends such as ONNX Runtime, PyTorch, and TensorRT already support this feature. However, the Python backend, which is the most widely used in our business, has not yet implemented it. We would like to know whether the team has any plans to support this in the future.
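For context, this is roughly what implicit state management looks like in a backend that already supports it: the server stores the state tensor and feeds it back on the next request of the same sequence, so the model itself stays stateless. A sketch of the `config.pbtxt` fragment, based on the Triton model-configuration docs; the tensor names `INPUT_STATE`/`OUTPUT_STATE` and the shapes are placeholders:

```
sequence_batching {
  state [
    {
      input_name: "INPUT_STATE"
      output_name: "OUTPUT_STATE"
      data_type: TYPE_FP32
      dims: [ -1 ]
      initial_state: {
        data_type: TYPE_FP32
        dims: [ 1 ]
        zero_data: true
        name: "initial state"
      }
    }
  ]
}
```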
Future benefits:
If this feature were also supported in the Python backend, our vLLM-based services could leverage sequence inference to achieve further performance optimizations. Until then, a workaround is to manage state explicitly in `model.py`, as sketched below.
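A minimal sketch of that explicit-state workaround using the sequence batcher's control inputs, which the Python backend does support. It assumes `config.pbtxt` declares `sequence_batching` with `CONTROL_SEQUENCE_START` / `CONTROL_SEQUENCE_END` / `CONTROL_SEQUENCE_CORRID` control inputs mapped to tensors named `START`, `END`, and `CORRID`, plus an `IN`/`OUT` tensor pair; all of these names are illustrative placeholders, not a fixed API:

```python
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """Sketch of explicit (manual) sequence-state management in the
    Python backend, keyed by the sequence's correlation ID."""

    def initialize(self, args):
        # Per-sequence accumulated state, keyed by correlation ID.
        self._states = {}

    def execute(self, requests):
        responses = []
        for request in requests:
            start = pb_utils.get_input_tensor_by_name(request, "START").as_numpy().flat[0]
            end = pb_utils.get_input_tensor_by_name(request, "END").as_numpy().flat[0]
            corrid = int(pb_utils.get_input_tensor_by_name(request, "CORRID").as_numpy().flat[0])
            chunk = pb_utils.get_input_tensor_by_name(request, "IN").as_numpy()

            if start:
                # First request of the sequence: initialize its state.
                self._states[corrid] = np.zeros_like(chunk)

            # Toy "stateful" computation: a running sum over the sequence.
            state = self._states[corrid] + chunk
            self._states[corrid] = state

            if end:
                # Last request of the sequence: release its state.
                del self._states[corrid]

            out = pb_utils.Tensor("OUT", state)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```

Note that this keeps all state in the Python process's memory, so it only works with a single model instance and gives up the transparent state transfer that implicit state management provides natively in the ONNX Runtime, PyTorch, and TensorRT backends, which is why native support in the Python backend would still be valuable.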
zhuichao001 changed the title from "Python Backend support Sequence Inference" to "Python Backend support implicit state management for Sequence Inference" on Feb 13, 2025.