This PR introduces an engine system to swap Whisper engines between faster-whisper and TensorRT-LLM.
API
- MIT License!
- Added the ability to swap the Whisper engine from the default faster-whisper to TensorRT-LLM, which is much faster. #285
- Added support for distil models like `distil-large-v2` and `distil-large-v3`. These work with the TensorRT-LLM engine.
- Added a `batch_size` parameter for the endpoints. It doesn't do anything yet, but the TensorRT-LLM engine supports batch processing of files, and the idea is to add this feature along with dynamic batching.
- Overall tighter control over dependencies, and various dependency updates.
Diarization
- Started work implementing Nvidia NeMo's new long-form diarization class. Currently it still consumes too much memory.
Documentation
Thanks to contributor @aleksandr-smechov, to the NeMo team for their work, and to the WhisperS2T project for the initial code for the TensorRT-LLM backend, which in turn builds on TensorRT-LLM's Whisper example.