Releases: huggingface/text-embeddings-inference
v1.6.0
What's Changed
- feat: support multiple backends at the same time by @OlivierDehaene in #440
- feat: GTE classification head by @kozistr in #441
- feat: Implement GTE model to support the non-flash-attn version by @kozistr in #446
- feat: Implement MPNet model (#363) by @kozistr in #447
Full Changelog: v1.5.1...v1.6.0
v1.5.1
What's Changed
- Download `model.onnx_data` by @kozistr in #343
- Rename 'Sentence Transformers' to 'sentence-transformers' in docstrings by @Wauplin in #342
- fix: add serde default for truncation direction by @drbh in #399
- fix: metrics unbounded memory by @OlivierDehaene in #409
- Fix to allow health check w/o auth by @kozistr in #360
- Update `ort` crate version to `2.0.0-rc.4` to support ONNX IR version 10 by @kozistr in #361
- adds curl to fix healthcheck by @WissamAntoun in #376
- fix: use num_cpus::get to check as get_physical does not check cgroups by @OlivierDehaene in #410
- fix: use status code 400 when batch is empty by @OlivierDehaene in #413
- fix: add cls pooling as default for BERT variants by @OlivierDehaene in #426
- feat: auto limit string if truncate is set by @OlivierDehaene in #428
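The truncation fixes above combine with the auto-truncate feature from v1.2.1. A minimal sketch of a request body that opts into server-side truncation; the field names and the accepted direction values are assumptions for illustration, not taken from these notes:

```python
import json

# Hypothetical /embed request combining truncation options
# (field names and values are assumed, not confirmed by the changelog).
payload = {
    "inputs": "a very long document " * 100,
    "truncate": True,                # let the server cut inputs to the model's max length
    "truncation_direction": "Left",  # assumed value: keep the end of the text
}

body = json.dumps(payload)
print(body[:60])
```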
New Contributors
- @Wauplin made their first contribution in #342
- @XciD made their first contribution in #345
- @WissamAntoun made their first contribution in #376
Full Changelog: v1.5.0...v1.5.1
v1.5.0
Notable Changes
- ONNX runtime for CPU deployments: greatly improves CPU deployment throughput
- Add `/similarity` route
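The new `/similarity` route scores one source sentence against a list of candidates. A minimal sketch of the request body, assuming it mirrors the sentence-similarity task format (the field names are an assumption, not taken from these notes):

```python
import json

# Hypothetical /similarity request body (field names assumed,
# mirroring the common sentence-similarity payload shape).
payload = {
    "inputs": {
        "source_sentence": "What is the capital of France?",
        "sentences": [
            "Paris is the capital of France.",
            "Berlin is the capital of Germany.",
        ],
    }
}

body = json.dumps(payload)
# The server would be expected to return one similarity score per candidate.
print(body)
```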
What's Changed
- tokenizer max limit on input size by @ErikKaum in #324
- docs: air-gapped deployments by @OlivierDehaene in #326
- feat(onnx): add onnx runtime for better CPU perf by @OlivierDehaene in #328
- feat: add `/similarity` route by @OlivierDehaene in #331
- fix(ort): fix mean pooling by @OlivierDehaene in #332
- chore(candle): update flash attn by @OlivierDehaene in #335
- v1.5.0 by @OlivierDehaene in #336
Full Changelog: v1.4.0...v1.5.0
v1.4.0
Notable Changes
- Cuda support for the Qwen2 model architecture
What's Changed
- feat(candle): support Qwen2 on Cuda by @OlivierDehaene in #316
- fix(candle): fix last token pooling
Full Changelog: v1.3.0...v1.4.0
v1.3.0
Notable Changes
- New truncation direction parameter
- Cuda support for JinaCode model architecture
- Cuda support for Mistral model architecture
- Cuda support for Alibaba GTE model architecture
- New prompt name parameter: you can now include a prompt name in the request body to prepend a pre-prompt to your input, based on the Sentence Transformers configuration. You can also set a default prompt or prompt name so a pre-prompt is always added to your requests.
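A minimal sketch of a request using the prompt name parameter described above; the parameter name `prompt_name` and the example prompt key are assumptions for illustration:

```python
import json

# Hypothetical /embed request selecting a named pre-prompt
# (parameter name "prompt_name" is assumed, not confirmed by these notes).
# The server would prepend the prompt registered under this name in the
# model's Sentence Transformers configuration before embedding the input.
payload = {
    "inputs": "How do I reset my password?",
    "prompt_name": "query",  # assumed key in the model's prompts config
}

body = json.dumps(payload)
print(body)
```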
What's Changed
- Ci migration to K8s by @glegendre01 in #269
- chore: map compute_cap from GPU name by @haixiw in #276
- chore: cover Nvidia T4/L4 GPU by @haixiw in #284
- feat(ci): add trufflehog secrets detection by @McPatate in #286
- Community contribution code of conduct by @LysandreJik in #291
- Update README.md by @michaelfeil in #277
- Upgrade tokenizers to 0.19.1 to deal with breaking change in tokenizers by @scriptator in #266
- Add env for OTLP service name by @kozistr in #285
- Fix CI build timeout by @fxmarty in #296
- fix(router): payload limit was not correctly applied by @OlivierDehaene in #298
- feat(candle): better cuda error by @OlivierDehaene in #300
- feat(router): add truncation direction parameter by @OlivierDehaene in #299
- Support for Jina Code model by @patricebechard in #292
- feat(router): add base64 encoding_format for OpenAI API by @OlivierDehaene in #301
- fix(candle): fix FlashJinaCodeModel by @OlivierDehaene in #302
- fix: use malloc_trim to cleanup pages by @OlivierDehaene in #307
- feat(candle): add FlashMistral by @OlivierDehaene in #308
- feat(candle): add flash gte by @OlivierDehaene in #310
- feat: add default prompts by @OlivierDehaene in #312
- Add optional CORS allow any option value in http server cli by @kir-gadjello in #260
- Update `HUGGING_FACE_HUB_TOKEN` to `HF_API_TOKEN` in README by @kevinhu in #263
- v1.3.0 by @OlivierDehaene in #313
New Contributors
- @haixiw made their first contribution in #276
- @McPatate made their first contribution in #286
- @LysandreJik made their first contribution in #291
- @michaelfeil made their first contribution in #277
- @scriptator made their first contribution in #266
- @fxmarty made their first contribution in #296
- @patricebechard made their first contribution in #292
- @kir-gadjello made their first contribution in #260
- @kevinhu made their first contribution in #263
Full Changelog: v1.2.3...v1.3.0
v1.2.3
What's Changed
- fix limit peak memory to build cuda-all docker image by @OlivierDehaene in #246
Full Changelog: v1.2.2...v1.2.3
v1.2.2
What's Changed
- fix(gke): accept null values for vertex env vars by @OlivierDehaene in #243
- fix: fix cpu image to not default on the sagemaker entrypoint
Full Changelog: v1.2.1...v1.2.2
v1.2.1
TEI is now Apache 2.0!
What's Changed
- Document how to send batched inputs by @osanseviero in #222
- feat: add auto-truncate arg by @OlivierDehaene in #224
- feat: add PredictPair to proto by @OlivierDehaene in #225
- fix: fix auto_truncate for openai by @OlivierDehaene in #228
- Change license to Apache 2.0 by @OlivierDehaene in #231
- feat: Amazon SageMaker compatible images by @JGalego in #103
- fix(CI): fix build all by @OlivierDehaene in #236
- fix: fix cuda-all image by @OlivierDehaene in #239
- Add SageMaker CPU images and validate by @philschmid in #240
New Contributors
- @osanseviero made their first contribution in #222
- @JGalego made their first contribution in #103
- @philschmid made their first contribution in #240
Full Changelog: v1.2.0...v1.2.1
v1.2.0
What's Changed
- add cuda all image to facilitate deployment by @OlivierDehaene in #186
- add splade pooling to Bert by @OlivierDehaene in #187
- support vertex api endpoint by @drbh in #184
- readme examples by @plaggy in #180
- add_pooling_layer for bert classification by @OlivierDehaene in #190
- add /embed_sparse route by @OlivierDehaene in #191
- Applying `Cargo.toml` optimization options by @somehowchris in #201
- Add Dockerfile-arm64 to allow docker builds on Apple M1/M2 architecture by @iandoe in #209
- configurable payload limit by @OlivierDehaene in #210
- add api_key for request authorization by @OlivierDehaene in #211
- add all methods to vertex API by @OlivierDehaene in #192
- add `/decode` route by @OlivierDehaene in #212
- Input Types Compatibility with OpenAI's API (#112) by @OlivierDehaene in #214
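With request authorization from #211, clients must attach the configured key to each call. A minimal sketch of building an authorized request; the bearer-token header scheme is an assumption for illustration, not confirmed by these notes:

```python
import json

# Hypothetical authorized request to a TEI server configured with an API key
# (the "Authorization: Bearer ..." scheme is assumed, not confirmed here).
API_KEY = "my-secret-key"  # placeholder value
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
body = json.dumps({"inputs": "Hello world"})
print(headers["Authorization"])
```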
New Contributors
- @drbh made their first contribution in #184
- @plaggy made their first contribution in #180
- @somehowchris made their first contribution in #201
- @iandoe made their first contribution in #209
Full Changelog: v1.1.0...v1.2.0
v1.1.0
Highlights
- Splade pooling
What's Changed
- Update Dockerfile to install curl by @jpbalarini in #117
- fix loading of bert classification models by @OlivierDehaene in #173
- splade pooling by @OlivierDehaene in #174
New Contributors
- @jpbalarini made their first contribution in #117
Full Changelog: v1.0.0...v1.1.0