Releases: Wordcab/wordcab-transcribe
v0.5.3
This release introduces an engine system to swap Whisper engines between `faster-whisper` and `TensorRT-LLM`.
API
- MIT License!
- Added the ability to swap the Whisper "engine" from the default faster-whisper to TensorRT-LLM, which is much faster. #285
- Added support for distil models like `distil-large-v2` and `distil-large-v3`. These work with the TensorRT-LLM engine.
- Added a `batch_size` parameter for the endpoints. It doesn't do anything yet, but the TensorRT-LLM engine supports batch processing of files, and the idea is to add this feature along with dynamic batching.
- Overall tighter control over dependencies, and various dependency updates.
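The engine swap described above can be sketched as a small registry that maps an engine name to a backend class. This is a hypothetical illustration; the class and function names (`get_engine`, `FasterWhisperEngine`, `TensorRTLLMEngine`) are not the project's actual API.

```python
# Hypothetical engine registry sketch; real names in wordcab-transcribe differ.

class FasterWhisperEngine:
    name = "faster-whisper"

class TensorRTLLMEngine:
    name = "tensorrt-llm"

_ENGINES = {
    "faster-whisper": FasterWhisperEngine,
    "tensorrt-llm": TensorRTLLMEngine,
}

def get_engine(engine_name: str):
    """Return the engine class registered under `engine_name`."""
    try:
        return _ENGINES[engine_name]
    except KeyError:
        raise ValueError(
            f"Unknown engine {engine_name!r}; expected one of {sorted(_ENGINES)}"
        ) from None
```

Registering engines by name keeps the swap a one-line configuration change rather than a code change.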
Diarization
- Started work implementing Nvidia NeMo's new long-form diarization class. Currently it still consumes too much memory.
Documentation
Thanks to contributor @aleksandr-smechov, to the NeMo team for their work, and to the WhisperS2T project for the initial code for the TensorRT-LLM backend (and, by extension, TensorRT-LLM's Whisper example).
v0.5.2
This release introduces several features to allow remote execution and single-service deployment.
API
- Added the possibility to choose `RemoteExecution` or `LocalExecution` for transcription and diarization services #258
- Implemented single-service deployment with the new `only_transcription` and `only_diarization` asr types #261
- Added new endpoints to manage remote execution servers #263
- Allow the user to auto-switch between local and remote execution for all services #266
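The auto-switch between local and remote execution can be sketched as a small dispatcher that prefers a configured remote server and otherwise falls back to the local machine. The names here (`pick_execution`, `gpu_index`, `url`) are illustrative assumptions, not the project's actual settings.

```python
# Hypothetical local/remote execution dispatch sketch; the project's actual
# service classes and configuration fields may differ.
from dataclasses import dataclass

@dataclass
class LocalExecution:
    index: int  # e.g. a local GPU index

@dataclass
class RemoteExecution:
    url: str  # a remote transcription or diarization server

def pick_execution(remote_urls=None, gpu_index: int = 0):
    """Prefer a remote server when one is configured, else run locally."""
    if remote_urls:
        return RemoteExecution(url=remote_urls[0])
    return LocalExecution(index=gpu_index)
```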
Diarization
- Adjusted VAD speech padding and updated the diarization logic #271
Bug Fixes
Documentation
- Added new documentation to the project via `mkdocs-material` and GitHub pages #269
Thanks to contributors @aleksandr-smechov @chainyo
v0.5.1
v0.5.0
This release is a significant change from `poetry` to `hatch`, with many improvements to CI, tests, local development, and dependency handling.
API
- Added a warmup for inference #201
- Added `repetition_penalty` parameter #207
- Added `num_speakers` parameter #195
- Improved the `time_and_tell` function #213
- Updated the API schemas #188
- Added transcription parameters for control #213
Transcription
- Added `bfloat16` to compute types #209
Diarization
- Added empty audio catch during diarization #223 #225
- Reimplemented the entire diarization module to skip NeMo module installation #186 #202
CI
- Added concurrency on CI tests #191
Contributors:
@aleksandr-smechov @chainyo
v0.4.0
This release includes a lot of improvements and a new License, starting with v0.4.0 of wordcab-transcribe (inspired by the HFOIL).
The new License WTLv0.1
The new License prevents anyone from using this project, from v0.4.0 (included) onward, to sell a self-hosted version of this software without an agreement from Wordcab.
But you can still use the project for research, personal use, or even as a backend tool for your projects.
API
- Fixed `CortexResponse` for Svix size limit #101
- Made `alignment` non-critical if the process fails #105
- Added multi-GPU support for transcription, alignment, and diarization #114
- Added the `audio_duration` (in seconds) in the API response #127
- Added a catch for invalid or empty audio file #128
- Added a log about the number of detected and used GPUs at launch #138
- Updated pydantic to v2 #157
- Added an audio file global download queue #168
- Added the new WTL v0.1 License #177 #183 #184
Transcription
- Added the `vocab` feature #124
- Added an `internal_vad` parameter that helps with empty utterances #142 #173
- Added a new fallback for empty segments during transcription #149
- Added the `float32` compute type for the transcription model #157
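The "fallback for empty segments" idea above can be sketched as a filter that drops segments whose text is blank after stripping, so downstream steps never see empty utterances. The function name and segment shape are illustrative assumptions, not the project's internals.

```python
# Hypothetical empty-segment filter sketch; segment fields are assumed to be
# dicts with a "text" key, which may not match the project's data model.

def drop_empty_segments(segments):
    """Keep only segments that contain actual text after stripping whitespace."""
    return [seg for seg in segments if seg.get("text", "").strip()]
```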
Diarization
- Decomposed the diarization process into sub-modules and optimized diarization inference #180
Alignment
- Added new `cs`, `in`, `sl`, and `th` alignment models #164
Post-processing
Instructions
- Improvement of the contributions instructions #131
Deploy
- Update error payload for Svix in cortex endpoint #118
- Docker image updated to `cuda:11.7.1` #133
- Update Svix payload in cortex endpoint #144
- Add a configuration file using Nginx for custom deploy #146
Need improvements / Not fully working
- Added the possibility to use extra transcription models for specific languages #110
Contributors:
@chainyo @aleksandr-smechov @jissagn
v0.3.1
TL;DR: Transcription is now on steroids: 2x faster than the current faster-whisper implementation.
API
- Add `time_and_tell` decorator on specific functions to time individual processes when `debug=True` #77
- Add a `LoggingMiddleware` when `debug=True` #77
- Add a fallback for `dual_channel` if the audio file is not stereo #87
Transcription
- Add quality metrics for the batch process and fallback if the quality is under defined thresholds #89
- Implement `word_timestamps` for the batch process #91
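The quality-metric fallback mentioned above can be sketched as a threshold check: if any batched segment scores below a confidence bar, re-run on the slower sequential path. The threshold value and the `avg_logprob` field name are illustrative assumptions, not the project's actual metrics.

```python
# Hypothetical batch-quality check sketch; the real thresholds and metric
# names in wordcab-transcribe may differ.

AVG_LOGPROB_THRESHOLD = -1.0  # illustrative cutoff, not the project's value

def needs_fallback(batch_results, threshold=AVG_LOGPROB_THRESHOLD):
    """Return True when any batched segment scores below the threshold."""
    return any(r["avg_logprob"] < threshold for r in batch_results)
```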
Post-processing
- Fix timestamps format during the post-processing step #86
v0.3.0
Documentation
- Improve `.env` readability for an easier API configuration #52
- Add README instructions for profiling container #72
API
- Add authentication when the API is not in debug mode #56
- Fix the audio file endpoint inputs #59
- All submitted files are converted into `.wav` 16kHz for consistency #60
- Reworked and more coherent Request/Response models for the API endpoints #60
- Streamline the post-process functions (with or without alignment/diarization) #63
- Simplify timestamps conversion in outputs #63
- Fix blocking non-async functions #67
- Huge API rework for handling concurrent requests better #71
- Fix Exception/Error returns through the API -> raised errors should be more transparent for the user #72
- VAD now uses the ONNX and faster-whisper implementation #72
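The 16 kHz `.wav` conversion mentioned above is typically done with ffmpeg. Below is a minimal sketch that only builds the command line (the `-y`, `-i`, and `-ar` flags are standard ffmpeg options); the project's actual implementation may use different flags or a library.

```python
# Minimal sketch of an ffmpeg command for 16 kHz WAV conversion; assumes
# ffmpeg is installed. The helper name is illustrative, not the project's.
def ffmpeg_to_wav_cmd(src: str, dst: str):
    """Build an ffmpeg command that writes a 16 kHz PCM WAV file."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output if it exists
        "-i", src,       # input file (any container/codec ffmpeg supports)
        "-ar", "16000",  # resample to 16 kHz
        dst,
    ]

# e.g. subprocess.run(ffmpeg_to_wav_cmd("in.mp3", "out.wav"), check=True)
```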
AI models
- Add `alignment` (from `whisperX`) as a new possible step #51
- Fix alignment for `fr`, `de`, `es`, and `it` models #59
- Add dual_channel transcription process for stereo audio file #60
- Add the choice to use `diarization` or not #63
- Implement Batch request process for transcription #72
Deploy
- Docker is aligned with the local setup now #55
- Improve Dockerfile and commands to use cache for models #55
Contributors:
@aleksandr-smechov @chainyo
v0.2.0
- Replace diarization with NVIDIA NeMo asr toolkit
- Update config.py and add validators for necessary config settings
- Update the Docker image with the latest from NVIDIA
- Fix dependencies and versions
- Fix the Python version to 3.9 locally and on Docker
- New available timestamps format: `ms`. Now the user can choose between `hms`, `s` (default) and `ms`.
- Remove unused `num_speakers` parameter.
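The three timestamp formats above can be illustrated with a small conversion helper; the function name is an assumption for this sketch, not the project's API.

```python
# Hypothetical sketch of the `hms` / `s` / `ms` timestamp formats.

def format_timestamp(seconds: float, fmt: str = "s"):
    """Render a timestamp in `hms`, `s` (default), or `ms`."""
    if fmt == "s":
        return seconds
    if fmt == "ms":
        return int(seconds * 1000)
    if fmt == "hms":
        h, rem = divmod(int(seconds), 3600)
        m, s = divmod(rem, 60)
        return f"{h:02d}:{m:02d}:{s:02d}"
    raise ValueError(f"Unknown timestamp format: {fmt!r}")
```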