This PR introduces an engine system to swap Whisper engines between faster-whisper and TensorRT-LLM.
API
- MIT License!
- Added the ability to swap the Whisper engine from the default faster-whisper to TensorRT-LLM, which is much faster. #285
- Added support for distil models like `distil-large-v2` and `distil-large-v3`. These work with the TensorRT-LLM engine.
- Added a `batch_size` parameter for the endpoints. It doesn't do anything yet, but the TensorRT-LLM engine supports batch processing of files, and the idea is to add this feature along with dynamic batching.
- Overall tighter control over dependencies, and various dependency updates.
Diarization
- Started work implementing Nvidia NeMo's new long-form diarization class. Currently it still consumes too much memory.
Documentation
Thanks to contributor @aleksandr-smechov, to the NeMo team for their work, and to the WhisperS2T project for the initial code for the TensorRT-LLM backend, which in turn builds on TensorRT-LLM's Whisper example.