Releases: OpenNMT/CTranslate2
CTranslate2 3.9.1
Fixes and improvements
- Fix missing alignments in the `Whisper.align` result due to a bug in the DTW implementation
- Fix error when converting a Whisper model from a path
CTranslate2 3.9.0
New features
- Support BLOOM language models
- Add method `Whisper.align` to return the text/audio alignment and implement word-level timestamps
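A minimal sketch of the new alignment method. The model directory `whisper-tiny-ct2` is a hypothetical converted model, and `mel`, `text_tokens`, and `num_frames` are assumed to come from the usual Whisper preprocessing (log-Mel spectrogram and tokenized transcription); the exact argument names follow the 3.9-era Python API as documented.

```python
import ctranslate2

# Hypothetical inputs: "whisper-tiny-ct2" is a converted Whisper model
# directory; `mel` is a ctranslate2.StorageView holding the log-Mel
# spectrogram; `text_tokens` is the tokenized transcription; `num_frames`
# is the number of audio frames before padding.
model = ctranslate2.models.Whisper("whisper-tiny-ct2")

results = model.align(
    mel,
    start_sequence=[50258, 50259, 50359],  # e.g. <|startoftranscript|><|en|><|transcribe|>
    text_tokens=[text_tokens],
    num_frames=[num_frames],
)

# Each result exposes token-level text/audio alignments that can be
# post-processed into word-level timestamps.
print(results[0].alignments)
```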
Fixes and improvements
- Do not force `intra_threads` to 1 when loading a model on the GPU, as some ops may still run on the CPU
- Disable multithreading when copying a batch of small arrays
CTranslate2 3.8.0
New features
- Experimental support of AVX512 in manually vectorized functions: this code path is not enabled by default but can be enabled by setting the environment variable `CT2_FORCE_CPU_ISA=AVX512`
- Add Transformers converter option `copy_files` to copy any files from the Hugging Face model to the converted model directory
- Expose some Whisper parameters:
  - `max_initial_timestamp_index`
  - `suppress_blank`
  - `suppress_tokens`
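The two opt-in features above can be sketched from the command line. The model name and file names below are illustrative; `--copy_files` takes the names of files to carry over from the Hugging Face repository.

```shell
# Convert a Hugging Face model and copy its tokenizer file into the
# output directory so the converted model is self-contained:
ct2-transformers-converter --model openai/whisper-tiny \
    --output_dir whisper-tiny-ct2 \
    --copy_files tokenizer.json

# Opt in to the experimental AVX512 code path at run time
# (run_inference.py is a hypothetical script):
CT2_FORCE_CPU_ISA=AVX512 python run_inference.py
```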
Fixes and improvements
- Reduce conversion time for large models by skipping some weights comparisons
- Reduce maximum memory usage when converting Transformers models with `--quantization float16`
- Set FP32 compute type for FP16 convolutions to match the PyTorch behavior and accuracy
- Update oneDNN to 3.0.1
CTranslate2 3.7.0
Changes
- Rename the "float" compute type to "float32" for clarity. "float" is still accepted for backward compatibility.
New features
- Add the environment variable `CT2_CUDA_TRUE_FP16_GEMM`. This flag is enabled by default so that FP16 GEMMs run in full FP16. When disabled, the compute type of FP16 GEMMs is set to FP32, which is the default behavior of PyTorch and TensorFlow.
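A sketch of toggling the flag, assuming the usual CTranslate2 convention that a boolean environment variable is disabled with the value `0` (the script name is hypothetical):

```shell
# Disable true-FP16 GEMMs to reproduce the PyTorch/TensorFlow default
# (FP16 storage with FP32 accumulation), trading some speed for precision:
CT2_CUDA_TRUE_FP16_GEMM=0 python run_inference.py
```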
Fixes and improvements
- Improve the numerical precision of Whisper models running in FP16 by setting the FP32 compute type for GEMMs (same behavior as PyTorch)
- Improve support for running the Whisper models with INT16 quantization
- Ensure the Whisper decoding does not continue past `max_length`, which could previously happen when the prompt was longer than `max_length / 2`
- Include the EOS score in the score returned by Whisper during greedy search
CTranslate2 3.6.0
New features
- Build the Windows Python wheels with cuDNN to enable GPU execution of Whisper models
- Add the model attribute `Whisper.is_multilingual`
Fixes and improvements
- Reduce the beam search memory usage by not duplicating the decoder states that are the same in each beam (e.g. the projected memory keys and values)
- Optimize the dot product attention during beam search by moving the query beam dimension to the time dimension
- Fix support of English-only Whisper models
- Include the prefix tokens (if they exist) in the output of `Whisper.generate`
- Log a warning when the model weights are implicitly converted to another type
CTranslate2 3.5.1
Fixes and improvements
- Whisper: fix an incorrect timestamp rule that prevented timestamps from being generated in pairs
- Whisper: ignore the EOS token when applying the length penalty to match the original implementation
CTranslate2 3.5.0
New features
- Add a patience factor for beam search to continue decoding until `beam_size * patience` hypotheses are finished, as described in Kasai et al. 2022
- Implement all GELU variants and select them accordingly when converting models:
  - Tanh approximation (already implemented)
  - Sigmoid approximation
  - Reference implementation based on the CDF
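The patience factor plugs into the existing decoding options. A minimal sketch, assuming a converted translation model in a hypothetical directory `ende-ct2` and SentencePiece-style input tokens:

```python
import ctranslate2

# "ende-ct2" is a hypothetical converted model directory.
translator = ctranslate2.Translator("ende-ct2")

results = translator.translate_batch(
    [["▁Hello", "▁world", "!"]],
    beam_size=4,
    patience=2,  # keep decoding until beam_size * patience = 8 hypotheses are finished
)
print(results[0].hypotheses[0])
```

With the default `patience=1`, decoding stops as soon as `beam_size` hypotheses are finished; larger values explore more candidates at extra cost.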
Fixes and improvements
- Fix incorrect outputs of T5 models due to a bug in the CUDA kernel of the RMS normalization
- Raise an error if the Whisper input shape is incorrect
- Optimize the transposition operator used in the multi-head attention when running on GPU
- Remove the upper limit in `python_requires` to facilitate the package installation with tools like Poetry and PDM
CTranslate2 3.4.0
Fixes and improvements
- Fix incorrect vocabulary in M2M100 models after conversion with `transformers>=4.24`
- Fix incorrect model outputs when executing with very large batch sizes on GPU
- Fix memory error in biased decoding: the vector of divergence was read and updated past its length
- Allow setting `prefix_bias_beta` > 0 with `beam_size` == 1
- Prevent timestamps from decreasing during Whisper generation
- Make some error messages more helpful when implementing a custom converter
CTranslate2 3.3.0
New features
- Support T5 models, including the variants T5v1.1 and mT5
- Support loading the model files from memory:
  - Python: see the `files` argument in the constructor of classes loading models
  - C++: see the `models::ModelMemoryReader` class
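A sketch of in-memory loading on the Python side, useful when model files live in an archive or object store rather than on disk. The model directory name is hypothetical, and the `files` argument is assumed to accept a mapping from file name to a binary file-like object:

```python
import os

import ctranslate2

# "ende-ct2" is a hypothetical converted model directory; here we read its
# files into file objects up front instead of letting the library open them.
model_dir = "ende-ct2"
files = {
    name: open(os.path.join(model_dir, name), "rb")
    for name in os.listdir(model_dir)
}

# The model is then constructed from the in-memory file objects.
translator = ctranslate2.Translator("ende-ct2", files=files)
```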
Fixes and improvements
- Improve the quantization accuracy of OPT models by applying the SmoothQuant technique during conversion (pre-computed activation scales should be passed to the converter option `--activation_scales`)
- Fix conversion of BART-like models from Hugging Face that use a different number of encoder and decoder layers
- Fix compilation when no BLAS CPU backend is selected
- Remove no longer relevant CMake warning when the project is compiled without oneDNN
- Update oneDNN to 3.0
- Update oneMKL to 2023.0
CTranslate2 3.2.0
New features
- Add decoding option `suppress_sequences` to prevent specific sequences of tokens from being generated
- Add decoding option `end_token` to stop the decoding on a different token than the model EOS token
- Allow returning multiple random hypotheses from greedy search + random sampling when setting `num_hypotheses` > 1
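The three options above combine naturally in one generation call. A hedged sketch, assuming a converted generator model in a hypothetical directory `gpt2-ct2` and illustrative token strings:

```python
import ctranslate2

# "gpt2-ct2" is a hypothetical converted generator model directory.
generator = ctranslate2.Generator("gpt2-ct2")

results = generator.generate_batch(
    [["<|endoftext|>"]],
    suppress_sequences=[["bad", "word"]],  # never emit this exact token sequence
    end_token="<|endoftext|>",             # stop on this token instead of the model EOS
    sampling_topk=10,                      # random sampling instead of greedy argmax
    num_hypotheses=3,                      # return 3 random hypotheses per input
)
for hypothesis in results[0].sequences:
    print(hypothesis)
```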
Fixes and improvements
- Improve support for batch generation with the Whisper model:
  - Improve performance of batch generation with a context (we only require the prompts to have the same length, which is easily done by adapting the number of previous text tokens)
  - Support batch mode for option `return_no_speech_prob`
  - Support cases where some prompts in the batch have the token `<|notimestamps|>` but not others
- Enable the Conv1D layer in more Python wheels:
  - macOS x64 (using oneDNN)
  - macOS ARM64 (using a custom implementation)
  - Linux AArch64 (using a custom implementation)
- Update the OpenNMT-py converter to support the latest checkpoint structure
- Generalize the `TransformerSpec` constructor to accept arbitrary encoder and decoder specifications
- Remove the global compilation flag `-ffast-math`, which introduces unwanted side effects, and enable it only for the layer norm CPU kernel where it is actually useful
- Fix CMake error on Windows when setting `-DOPENMP_RUNTIME=COMP`