-
Notifications
You must be signed in to change notification settings - Fork 35
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'main' of https://github.com/StarlightSearch/EmbedAnything
- Loading branch information
Showing
3 changed files
with
38 additions
and
5 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
--- | ||
draft: false | ||
date: 2025-01-01 | ||
authors: | ||
- sonam | ||
- akshay | ||
slug: modernBERT | ||
title: version 0.5 | ||
--- | ||
|
||
We are thrilled to share that EmbedAnything version 0.5 is out now and comprise of insane development like support for ModernBert and ReRanker models. Along with Ingestion pipeline support for DocX, and HTML let’s get in details. | ||
|
||
The best of all have been support for late-interaction model, both ColPali and ColBERT on onnx. | ||
|
||
1. **ModernBert** Support: Well it made quite a splash, and we were obliged to add it, in the fastest inference engine, embedanything. In addition to being faster and more accurate, ModernBERT also increases context length to 8k tokens (compared to just 512 for most encoders), and is the first encoder-only model that includes a large amount of code in its training data. | ||
2. **ColPali- Onnx** : Running the ColPali model directly on a local machine might not always be feasible. To address this, we developed a **quantized version of ColPali**. Find it on our hugging face, link [here](https://huggingface.co/starlight-ai/colpali-v1.2-merged-onnx). You could also run it both on Candle and on ONNX. | ||
3. **ColBERT**: ColBERT is a *fast* and *accurate* retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds. | ||
4. **ReRankers:** EmbedAnything recently contributed for the support of reranking models to Candle so as to add it in our own library. It can support any kind of reranking models. Precision meets performance! Use reranking models to refine your retrieval results for even greater accuracy. | ||
5. **Jina V3:** Also contributed to V3 models, for Jina can seamlessly integrate any V3 model. | ||
6. **𝗗𝗢𝗖𝗫 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴** | ||
|
||
Effortlessly extract text from .docx files and convert it into embeddings. Simplify your document workflows like never before! | ||
|
||
7. **𝗛𝗧𝗠𝗟 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴:** | ||
|
||
Parsing and embedding HTML documents just got easier! | ||
|
||
✅ Extract rich metadata with embeddings | ||
✅ Handle code blocks separately for better context | ||
|
||
Supercharge your documentation retrieval with these advanced capabilities. |