---
title: Reranking with FastEmbed
weight: 8
---

# How to use rerankers with FastEmbed

## Rerankers

A reranker is a model that improves the ordering of search results. A subset of documents is first prefetched with a fast, simple method (e.g., BM25 or dense embeddings). Then a reranker, a more powerful and precise but slower and heavier model, re-evaluates this subset to refine how relevant each document is to the query.

Rerankers analyze in-depth token-level interactions between the query and each document, which makes them expensive to run but precise at judging relevance. Since they trade speed for accuracy, they are best applied to **a limited candidate set** rather than the entire corpus.

## Goal of this Tutorial

It's common to use [cross-encoder](https://sbert.net/examples/applications/cross-encoder/README.html) models as rerankers. This tutorial uses [Jina Reranker v2 Base Multilingual](https://jina.ai/news/jina-reranker-v2-for-agentic-rag-ultra-fast-multilingual-function-calling-and-code-search/), a cross-encoder reranker supported in FastEmbed.

We use the `all-MiniLM-L6-v2` dense embedding model (also supported in FastEmbed) as a first-stage retriever and then refine its results with `Jina Reranker v2`.

## Setup

Install `fastembed`.

```bash
pip install fastembed
```

Import the cross-encoder interface and text embeddings for the first-stage retrieval.

```python
from fastembed import TextEmbedding
from fastembed.rerank.cross_encoder import TextCrossEncoder
```

You can list the cross-encoder rerankers supported in FastEmbed.

```python
TextCrossEncoder.list_supported_models()
```

This command displays the available models, with details including each model's size, sources, model file, description, and license.

<details>
<summary> <span style="background-color: gray; color: black;"> Available models </span> </summary>

```python
[{'model': 'Xenova/ms-marco-MiniLM-L-6-v2',
  'size_in_GB': 0.08,
  'sources': {'hf': 'Xenova/ms-marco-MiniLM-L-6-v2'},
  'model_file': 'onnx/model.onnx',
  'description': 'MiniLM-L-6-v2 model optimized for re-ranking tasks.',
  'license': 'apache-2.0'},
 {'model': 'Xenova/ms-marco-MiniLM-L-12-v2',
  'size_in_GB': 0.12,
  'sources': {'hf': 'Xenova/ms-marco-MiniLM-L-12-v2'},
  'model_file': 'onnx/model.onnx',
  'description': 'MiniLM-L-12-v2 model optimized for re-ranking tasks.',
  'license': 'apache-2.0'},
 {'model': 'BAAI/bge-reranker-base',
  'size_in_GB': 1.04,
  'sources': {'hf': 'BAAI/bge-reranker-base'},
  'model_file': 'onnx/model.onnx',
  'description': 'BGE reranker base model for cross-encoder re-ranking.',
  'license': 'mit'},
 {'model': 'jinaai/jina-reranker-v1-tiny-en',
  'size_in_GB': 0.13,
  'sources': {'hf': 'jinaai/jina-reranker-v1-tiny-en'},
  'model_file': 'onnx/model.onnx',
  'description': 'Designed for blazing-fast re-ranking with 8K context length and fewer parameters than jina-reranker-v1-turbo-en.',
  'license': 'apache-2.0'},
 {'model': 'jinaai/jina-reranker-v1-turbo-en',
  'size_in_GB': 0.15,
  'sources': {'hf': 'jinaai/jina-reranker-v1-turbo-en'},
  'model_file': 'onnx/model.onnx',
  'description': 'Designed for blazing-fast re-ranking with 8K context length.',
  'license': 'apache-2.0'},
 {'model': 'jinaai/jina-reranker-v2-base-multilingual',
  'size_in_GB': 1.11,
  'sources': {'hf': 'jinaai/jina-reranker-v2-base-multilingual'},
  'model_file': 'onnx/model.onnx',
  'description': 'A multi-lingual reranker model for cross-encoder re-ranking with 1K context length and sliding window',
  'license': 'cc-by-nc-4.0'}]
```
</details>

Now, load the first-stage retriever and the reranker.

```python
dense_embedding_model = TextEmbedding("sentence-transformers/all-MiniLM-L6-v2")
reranker = TextCrossEncoder(model_name="jinaai/jina-reranker-v2-base-multilingual")
```

The model files will be fetched and downloaded, with progress shown.
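
If you want to control where the model files are stored, FastEmbed models accept an optional `cache_dir` argument. A minimal sketch, assuming that parameter and an illustrative local path:

```python
# `cache_dir` redirects model downloads to a folder of your choice
# ("models" here is a hypothetical path; by default a system cache location is used).
dense_embedding_model = TextEmbedding(
    "sentence-transformers/all-MiniLM-L6-v2",
    cache_dir="models",
)
```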

## Embed & index data for the first-stage retrieval

We will vectorize a toy movie description dataset with the `all-MiniLM-L6-v2` model and save the embeddings in Qdrant to use them for the first-stage retrieval.

Then, we will use the cross-encoder reranking model to rerank a small subset of the data retrieved in the first stage.

<details>
<summary> <span style="background-color: gray; color: black;"> Movie description dataset </span> </summary>

```python
descriptions = ["In 1431, Jeanne d'Arc is placed on trial on charges of heresy. The ecclesiastical jurists attempt to force Jeanne to recant her claims of holy visions.",
                "A film projectionist longs to be a detective, and puts his meagre skills to work when he is framed by a rival for stealing his girlfriend's father's pocketwatch.",
                "A group of high-end professional thieves start to feel the heat from the LAPD when they unknowingly leave a clue at their latest heist.",
                "A petty thief with an utter resemblance to a samurai warlord is hired as the lord's double. When the warlord later dies the thief is forced to take up arms in his place.",
                "A young boy named Kubo must locate a magical suit of armour worn by his late father in order to defeat a vengeful spirit from the past.",
                "A biopic detailing the 2 decades that Punjabi Sikh revolutionary Udham Singh spent planning the assassination of the man responsible for the Jallianwala Bagh massacre.",
                "When a machine that allows therapists to enter their patients' dreams is stolen, all hell breaks loose. Only a young female therapist, Paprika, can stop it.",
                "An ordinary word processor has the worst night of his life after he agrees to visit a girl in Soho whom he met that evening at a coffee shop.",
                "A story that revolves around drug abuse in the affluent north Indian State of Punjab and how the youth there have succumbed to it en-masse resulting in a socio-economic decline.",
                "A world-weary political journalist picks up the story of a woman's search for her son, who was taken away from her decades ago after she became pregnant and was forced to live in a convent.",
                "Concurrent theatrical ending of the TV series Neon Genesis Evangelion (1995).",
                "During World War II, a rebellious U.S. Army Major is assigned a dozen convicted murderers to train and lead them into a mass assassination mission of German officers.",
                "The toys are mistakenly delivered to a day-care center instead of the attic right before Andy leaves for college, and it's up to Woody to convince the other toys that they weren't abandoned and to return home.",
                "A soldier fighting aliens gets to relive the same day over and over again, the day restarting every time he dies.",
                "After two male musicians witness a mob hit, they flee the state in an all-female band disguised as women, but further complications set in.",
                "Exiled into the dangerous forest by her wicked stepmother, a princess is rescued by seven dwarf miners who make her part of their household.",
                "A renegade reporter trailing a young runaway heiress for a big story joins her on a bus heading from Florida to New York, and they end up stuck with each other when the bus leaves them behind at one of the stops.",
                "Story of 40-man Turkish task force who must defend a relay station.",
                "Spinal Tap, one of England's loudest bands, is chronicled by film director Marty DiBergi on what proves to be a fateful tour.",
                "Oskar, an overlooked and bullied boy, finds love and revenge through Eli, a beautiful but peculiar girl."]
```
</details>

```python
descriptions_embeddings = list(
    dense_embedding_model.embed(descriptions)
)
```
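
`embed` returns a lazy generator, which is why the result is wrapped in `list`. For larger datasets you can also tune the batch size; a sketch assuming FastEmbed's `batch_size` parameter:

```python
# Embeddings are computed lazily in batches; `batch_size` trades memory for throughput.
descriptions_embeddings = list(
    dense_embedding_model.embed(descriptions, batch_size=16)
)
```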

Let's upload the embeddings to Qdrant.

Install `qdrant-client`.

```bash
pip install qdrant-client
```

Qdrant Client has a simple in-memory mode that lets you experiment locally on small data volumes.
Alternatively, you can run your experiments on [a free cluster](https://qdrant.tech/documentation/cloud/create-cluster/#create-a-cluster) in Qdrant Cloud.

```python
from qdrant_client import QdrantClient, models

qdrant_client = QdrantClient(":memory:")  # Qdrant is running from RAM.
```
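
If you go with a Qdrant Cloud cluster instead, the client connects with a URL and an API key. A minimal sketch; the URL and key below are placeholders for your own cluster's credentials:

```python
# Placeholders only: substitute the URL and API key of your Qdrant Cloud cluster.
qdrant_client = QdrantClient(
    url="https://xyz-example.cloud.qdrant.io:6333",
    api_key="<your-api-key>",
)
```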

Let's create a [collection](https://qdrant.tech/documentation/concepts/collections/) for our movie data.

```python
qdrant_client.create_collection(
    collection_name="movies",
    vectors_config={
        "embedding": models.VectorParams(
            size=384,  # size of `all-MiniLM-L6-v2` embeddings
            distance=models.Distance.COSINE
        )
    }
)
```
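
Note that `create_collection` raises an error if the collection already exists, so re-running the snippet fails. One way to make it safe to re-run, assuming the client's `collection_exists` check (available in recent `qdrant-client` versions):

```python
# Create the collection only if it isn't there yet.
if not qdrant_client.collection_exists("movies"):
    qdrant_client.create_collection(
        collection_name="movies",
        vectors_config={
            "embedding": models.VectorParams(size=384, distance=models.Distance.COSINE)
        },
    )
```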

And upload the embeddings to it.

```python
qdrant_client.upload_points(
    collection_name="movies",
    points=[
        models.PointStruct(
            id=idx,
            payload={
                "description": description
            },
            vector={
                "embedding": vector
            }
        )
        for idx, (description, vector) in enumerate(
            zip(descriptions, descriptions_embeddings)
        )
    ],
)
```

## First-stage retrieval

Let's see how relevant the results are when using only the `all-MiniLM-L6-v2`-based dense retriever.

```python
query = '''A story about a strong historically significant female figure.'''
query_embedded = list(dense_embedding_model.query_embed(query))[0]

initial_retrieval = qdrant_client.query_points(
    collection_name="movies",
    using="embedding",
    query=query_embedded,
    with_payload=True,
    limit=10
)

description_hits = []
for i, hit in enumerate(initial_retrieval.points):
    print(f'''Result number {i+1} is \"{hit.payload["description"]}\"''')
    description_hits.append(hit.payload["description"])
```

The results are as follows:

```bash
Result number 1 is "A world-weary political journalist picks up the story of a woman's search for her son, who was taken away from her decades ago after she became pregnant and was forced to live in a convent."
Result number 2 is "Exiled into the dangerous forest by her wicked stepmother, a princess is rescued by seven dwarf miners who make her part of their household."
...
Result number 9 is "A biopic detailing the 2 decades that Punjabi Sikh revolutionary Udham Singh spent planning the assassination of the man responsible for the Jallianwala Bagh massacre."
Result number 10 is "In 1431, Jeanne d'Arc is placed on trial on charges of heresy. The ecclesiastical jurists attempt to force Jeanne to recant her claims of holy visions."
```

We can see that the description of *"The Messenger: The Story of Joan of Arc"*, the most fitting movie, is only 10th in the results.

Let's try refining the order of the retrieved subset with `Jina Reranker v2`. It takes a query and a set of documents (movie descriptions) as input and computes relevance scores based on token-level interactions between the query and each document.

```python
new_scores = list(reranker.rerank(query, description_hits))  # returns scores between the query and each document

ranking = [(i, score) for i, score in enumerate(new_scores)]  # saving document indices
ranking.sort(key=lambda x: x[1], reverse=True)  # sorting in order of relevance defined by the reranker

for i, rank in enumerate(ranking):
    print(f'''Reranked result number {i+1} is \"{description_hits[rank[0]]}\"''')
```

The reranker puts the desired movie in the first position by relevance.

```bash
Reranked result number 1 is "In 1431, Jeanne d'Arc is placed on trial on charges of heresy. The ecclesiastical jurists attempt to force Jeanne to recant her claims of holy visions."
Reranked result number 2 is "Exiled into the dangerous forest by her wicked stepmother, a princess is rescued by seven dwarf miners who make her part of their household."
...
Reranked result number 9 is "An ordinary word processor has the worst night of his life after he agrees to visit a girl in Soho whom he met that evening at a coffee shop."
Reranked result number 10 is "A biopic detailing the 2 decades that Punjabi Sikh revolutionary Udham Singh spent planning the assassination of the man responsible for the Jallianwala Bagh massacre."
```

## Conclusion

Rerankers refine search results by reordering retrieved candidates based on a deeper semantic analysis. For efficiency, apply them **only to a small subset of retrieved results**.
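
Putting the pieces together, the whole two-stage pipeline from this tutorial can be wrapped in a small helper. A minimal sketch reusing the objects defined above; `prefetch_limit` and `top_k` are illustrative parameters:

```python
def two_stage_search(query: str, prefetch_limit: int = 10, top_k: int = 3) -> list[str]:
    # Stage 1: cheap dense retrieval over the whole collection.
    query_embedded = list(dense_embedding_model.query_embed(query))[0]
    hits = qdrant_client.query_points(
        collection_name="movies",
        using="embedding",
        query=query_embedded,
        with_payload=True,
        limit=prefetch_limit,
    ).points
    documents = [hit.payload["description"] for hit in hits]
    # Stage 2: expensive cross-encoder reranking over the small candidate set.
    scores = reranker.rerank(query, documents)
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return [document for document, _ in ranked[:top_k]]

print(two_stage_search("A story about a strong historically significant female figure."))
```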

Balance speed and accuracy in your search using the power of rerankers!