From e3a5682393b1912cc96593f40b1ef26f845bfa95 Mon Sep 17 00:00:00 2001 From: Nirant Kasliwal Date: Wed, 20 Mar 2024 18:38:38 +0530 Subject: [PATCH 01/10] Re-organize docs --- .../Binary Quantization with Qdrant.ipynb | 15 ++++++++++++++- .../Retrieval_with_FastEmbed.ipynb | 0 .../Usage_With_Qdrant.ipynb | 0 .../Binary Quantization from Scratch.ipynb | 17 +---------------- 4 files changed, 15 insertions(+), 17 deletions(-) rename docs/{experimental => With Qdrant}/Binary Quantization with Qdrant.ipynb (96%) rename docs/{examples => With Qdrant}/Retrieval_with_FastEmbed.ipynb (100%) rename docs/{examples => With Qdrant}/Usage_With_Qdrant.ipynb (100%) diff --git a/docs/experimental/Binary Quantization with Qdrant.ipynb b/docs/With Qdrant/Binary Quantization with Qdrant.ipynb similarity index 96% rename from docs/experimental/Binary Quantization with Qdrant.ipynb rename to docs/With Qdrant/Binary Quantization with Qdrant.ipynb index 89e11eea..8bcd16b2 100644 --- a/docs/experimental/Binary Quantization with Qdrant.ipynb +++ b/docs/With Qdrant/Binary Quantization with Qdrant.ipynb @@ -4,7 +4,20 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Binary Quantization with Qdrant\n", + "# Binary Quantization with Qdrant & OpenAI Embedding\n", + "\n", + "---\n", + "In the world of large-scale data retrieval and processing, efficiency is crucial. With the exponential growth of data, the ability to retrieve information quickly and accurately can significantly affect system performance. This blog post explores a technique known as binary quantization applied to OpenAI embeddings, demonstrating how it can enhance **retrieval latency by 20x** or more.\n", + "\n", + "## What Are OpenAI Embeddings?\n", + "OpenAI embeddings are numerical representations of textual information. They transform text into a vector space where semantically similar texts are mapped close together. 
This mathematical representation enables computers to understand and process human language more effectively.\n", + "\n", + "## Binary Quantization\n", + "Binary quantization is a method which converts continuous numerical values into binary values (0 or 1). It simplifies the data structure, allowing faster computations. Here's a brief overview of the binary quantization process applied to OpenAI embeddings:\n", + "\n", + "1. **Load Embeddings**: OpenAI embeddings are loaded from parquet files.\n", + "2. **Binary Transformation**: The continuous valued vectors are converted into binary form. Here, values greater than 0 are set to 1, and others remain 0.\n", + "3. **Comparison & Retrieval**: Binary vectors are used for comparison using logical XOR operations and other efficient algorithms.\n", "\n", "Binary Quantization is a promising approach to improve retrieval speeds and reduce memory footprint of vector search engines. In this notebook we will show how to use Qdrant to perform binary quantization of vectors and perform fast similarity search on the resulting index.\n", "\n", diff --git a/docs/examples/Retrieval_with_FastEmbed.ipynb b/docs/With Qdrant/Retrieval_with_FastEmbed.ipynb similarity index 100% rename from docs/examples/Retrieval_with_FastEmbed.ipynb rename to docs/With Qdrant/Retrieval_with_FastEmbed.ipynb diff --git a/docs/examples/Usage_With_Qdrant.ipynb b/docs/With Qdrant/Usage_With_Qdrant.ipynb similarity index 100% rename from docs/examples/Usage_With_Qdrant.ipynb rename to docs/With Qdrant/Usage_With_Qdrant.ipynb diff --git a/docs/experimental/Binary Quantization from Scratch.ipynb b/docs/experimental/Binary Quantization from Scratch.ipynb index cb4f5c8d..5a3369cd 100644 --- a/docs/experimental/Binary Quantization from Scratch.ipynb +++ b/docs/experimental/Binary Quantization from Scratch.ipynb @@ -3,22 +3,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "# Binary Quantization of OpenAI Embedding\n", - "---\n", - "\n", - "In the 
world of large-scale data retrieval and processing, efficiency is crucial. With the exponential growth of data, the ability to retrieve information quickly and accurately can significantly affect system performance. This blog post explores a technique known as binary quantization applied to OpenAI embeddings, demonstrating how it can enhance **retrieval latency by 20x** or more.\n", - "\n", - "## What Are OpenAI Embeddings?\n", - "OpenAI embeddings are numerical representations of textual information. They transform text into a vector space where semantically similar texts are mapped close together. This mathematical representation enables computers to understand and process human language more effectively.\n", - "\n", - "## Binary Quantization\n", - "Binary quantization is a method which converts continuous numerical values into binary values (0 or 1). It simplifies the data structure, allowing faster computations. Here's a brief overview of the binary quantization process applied to OpenAI embeddings:\n", - "\n", - "1. **Load Embeddings**: OpenAI embeddings are loaded from parquet files.\n", - "2. **Binary Transformation**: The continuous valued vectors are converted into binary form. Here, values greater than 0 are set to 1, and others remain 0.\n", - "3. **Comparison & Retrieval**: Binary vectors are used for comparison using logical XOR operations and other efficient algorithms." 
- ] + "source": [] }, { "cell_type": "markdown", From da0a9cb2116c214a574c301dd5a2cf4348cad282 Mon Sep 17 00:00:00 2001 From: Nirant Kasliwal Date: Wed, 20 Mar 2024 19:09:27 +0530 Subject: [PATCH 02/10] Rename notebooks --- .../Binary_Quantization_with_Qdrant.ipynb} | 0 docs/{With Qdrant => qdrant}/Retrieval_with_FastEmbed.ipynb | 0 docs/{With Qdrant => qdrant}/Usage_With_Qdrant.ipynb | 0 3 files changed, 0 insertions(+), 0 deletions(-) rename docs/{With Qdrant/Binary Quantization with Qdrant.ipynb => qdrant/Binary_Quantization_with_Qdrant.ipynb} (100%) rename docs/{With Qdrant => qdrant}/Retrieval_with_FastEmbed.ipynb (100%) rename docs/{With Qdrant => qdrant}/Usage_With_Qdrant.ipynb (100%) diff --git a/docs/With Qdrant/Binary Quantization with Qdrant.ipynb b/docs/qdrant/Binary_Quantization_with_Qdrant.ipynb similarity index 100% rename from docs/With Qdrant/Binary Quantization with Qdrant.ipynb rename to docs/qdrant/Binary_Quantization_with_Qdrant.ipynb diff --git a/docs/With Qdrant/Retrieval_with_FastEmbed.ipynb b/docs/qdrant/Retrieval_with_FastEmbed.ipynb similarity index 100% rename from docs/With Qdrant/Retrieval_with_FastEmbed.ipynb rename to docs/qdrant/Retrieval_with_FastEmbed.ipynb diff --git a/docs/With Qdrant/Usage_With_Qdrant.ipynb b/docs/qdrant/Usage_With_Qdrant.ipynb similarity index 100% rename from docs/With Qdrant/Usage_With_Qdrant.ipynb rename to docs/qdrant/Usage_With_Qdrant.ipynb From e980e78defca06ef57d9e7f4d4dc437d3d852053 Mon Sep 17 00:00:00 2001 From: Nirant Kasliwal Date: Thu, 21 Mar 2024 09:21:07 +0530 Subject: [PATCH 03/10] Move nbs --- docs/examples/Hybrid_Search.ipynb | 1084 +++++++++++++++++++++++++++++ docs/how-to/Hybrid_Search.ipynb | 1075 ++++++++++++++++++++++++++++ 2 files changed, 2159 insertions(+) create mode 100644 docs/examples/Hybrid_Search.ipynb create mode 100644 docs/how-to/Hybrid_Search.ipynb diff --git a/docs/examples/Hybrid_Search.ipynb b/docs/examples/Hybrid_Search.ipynb new file mode 100644 index 00000000..5319a5f1 
--- /dev/null +++ b/docs/examples/Hybrid_Search.ipynb @@ -0,0 +1,1084 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Hybrid Search with FastEmbed & Qdrant\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## What will we do?\n", + "This notebook demonstrates the usage of Hybrid Search with FastEmbed & Qdrant. \n", + "\n", + "1. Setup: Download and install the required dependencies\n", + "2. Preview data: Load and preview the data\n", + "3. Create Sparse Embeddings: Create SPLADE++ embeddings for the data\n", + "4. Create Dense Embeddings: Create BGE-Base-en-v1.5 embeddings for the data\n", + "5. Indexing: Index the embeddings using Qdrant\n", + "6. Search: Perform Hybrid Search using FastEmbed & Qdrant\n", + "\n", + "## Setup\n", + "\n", + "To get started, you need only a few dependencies, which we'll install next:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# !pip install -qU qdrant-client fastembed datasets transformers" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'0.2.5'" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import json\n", + "from typing import List\n", + "\n", + "from datasets import load_dataset\n", + "from qdrant_client import QdrantClient\n", + "from transformers import AutoTokenizer\n", + "from qdrant_client.http.models import VectorParams, SparseVectorParams, Distance, SparseIndexParams, PointStruct\n", + "\n", + "import fastembed\n", + "from fastembed.sparse.sparse_text_embedding import SparseEmbedding, SparseTextEmbedding\n", + "from fastembed.text.text_embedding import TextEmbedding\n", + "\n", + "fastembed.__version__" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "dataset = 
load_dataset(\"tasksource/esci\")\n", + "# We'll select the first 100 examples for this demo\n", + "dataset = dataset[\"train\"].select(range(100))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Preview Data" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
example_idqueryquery_idproduct_idproduct_localeesci_labelsmall_versionlarge_versionproduct_titleproduct_descriptionproduct_bullet_pointproduct_brandproduct_colorproduct_text
00revent 80 cfm0B000MOO21WusIrrelevant01Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...NoneWhisperCeiling fans feature a totally enclosed...PanasonicWhitePanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...
1291891bathroom fan without light13723B000MOO21WusExact11Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...NoneWhisperCeiling fans feature a totally enclosed...PanasonicWhitePanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...
21revent 80 cfm0B07X3Y6B1VusExact01Homewerks 7141-80 Bathroom Fan Integrated LED ...NoneOUTSTANDING PERFORMANCE: This Homewerk's bath ...Homewerks80 CFMHomewerks 7141-80 Bathroom Fan Integrated LED ...
32revent 80 cfm0B07WDM7MQQusExact01Homewerks 7140-80 Bathroom Fan Ceiling Mount E...NoneOUTSTANDING PERFORMANCE: This Homewerk's bath ...HomewerksWhiteHomewerks 7140-80 Bathroom Fan Ceiling Mount E...
43revent 80 cfm0B07RH6Z8KWusExact01Delta Electronics RAD80L BreezRadiance 80 CFM ...This pre-owned or refurbished product has been...Quiet operation at 1.5 sones\\nBuilt-in thermos...DELTA ELECTRONICS (AMERICAS) LTD.WhiteDelta Electronics RAD80L BreezRadiance 80 CFM ...
\n", + "
" + ], + "text/plain": [ + " example_id query query_id product_id \\\n", + "0 0 revent 80 cfm 0 B000MOO21W \n", + "1 291891 bathroom fan without light 13723 B000MOO21W \n", + "2 1 revent 80 cfm 0 B07X3Y6B1V \n", + "3 2 revent 80 cfm 0 B07WDM7MQQ \n", + "4 3 revent 80 cfm 0 B07RH6Z8KW \n", + "\n", + " product_locale esci_label small_version large_version \\\n", + "0 us Irrelevant 0 1 \n", + "1 us Exact 1 1 \n", + "2 us Exact 0 1 \n", + "3 us Exact 0 1 \n", + "4 us Exact 0 1 \n", + "\n", + " product_title \\\n", + "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", + "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", + "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... \n", + "\n", + " product_description \\\n", + "0 None \n", + "1 None \n", + "2 None \n", + "3 None \n", + "4 This pre-owned or refurbished product has been... \n", + "\n", + " product_bullet_point \\\n", + "0 WhisperCeiling fans feature a totally enclosed... \n", + "1 WhisperCeiling fans feature a totally enclosed... \n", + "2 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", + "3 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", + "4 Quiet operation at 1.5 sones\\nBuilt-in thermos... \n", + "\n", + " product_brand product_color \\\n", + "0 Panasonic White \n", + "1 Panasonic White \n", + "2 Homewerks 80 CFM \n", + "3 Homewerks White \n", + "4 DELTA ELECTRONICS (AMERICAS) LTD. White \n", + "\n", + " product_text \n", + "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", + "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", + "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... 
" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = dataset.to_pandas()\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
example_idqueryquery_idproduct_idproduct_localeesci_labelsmall_versionlarge_versionproduct_titleproduct_descriptionproduct_bullet_pointproduct_brandproduct_colorproduct_text
00revent 80 cfm0B000MOO21WusIrrelevant01Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...NoneWhisperCeiling fans feature a totally enclosed...PanasonicWhitePanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...
1291891bathroom fan without light13723B000MOO21WusExact11Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...NoneWhisperCeiling fans feature a totally enclosed...PanasonicWhitePanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...
21revent 80 cfm0B07X3Y6B1VusExact01Homewerks 7141-80 Bathroom Fan Integrated LED ...NoneOUTSTANDING PERFORMANCE: This Homewerk's bath ...Homewerks80 CFMHomewerks 7141-80 Bathroom Fan Integrated LED ...
32revent 80 cfm0B07WDM7MQQusExact01Homewerks 7140-80 Bathroom Fan Ceiling Mount E...NoneOUTSTANDING PERFORMANCE: This Homewerk's bath ...HomewerksWhiteHomewerks 7140-80 Bathroom Fan Ceiling Mount E...
43revent 80 cfm0B07RH6Z8KWusExact01Delta Electronics RAD80L BreezRadiance 80 CFM ...This pre-owned or refurbished product has been...Quiet operation at 1.5 sones\\nBuilt-in thermos...DELTA ELECTRONICS (AMERICAS) LTD.WhiteDelta Electronics RAD80L BreezRadiance 80 CFM ...
\n", + "
" + ], + "text/plain": [ + " example_id query query_id product_id \\\n", + "0 0 revent 80 cfm 0 B000MOO21W \n", + "1 291891 bathroom fan without light 13723 B000MOO21W \n", + "2 1 revent 80 cfm 0 B07X3Y6B1V \n", + "3 2 revent 80 cfm 0 B07WDM7MQQ \n", + "4 3 revent 80 cfm 0 B07RH6Z8KW \n", + "\n", + " product_locale esci_label small_version large_version \\\n", + "0 us Irrelevant 0 1 \n", + "1 us Exact 1 1 \n", + "2 us Exact 0 1 \n", + "3 us Exact 0 1 \n", + "4 us Exact 0 1 \n", + "\n", + " product_title \\\n", + "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", + "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", + "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... \n", + "\n", + " product_description \\\n", + "0 None \n", + "1 None \n", + "2 None \n", + "3 None \n", + "4 This pre-owned or refurbished product has been... \n", + "\n", + " product_bullet_point \\\n", + "0 WhisperCeiling fans feature a totally enclosed... \n", + "1 WhisperCeiling fans feature a totally enclosed... \n", + "2 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", + "3 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", + "4 Quiet operation at 1.5 sones\\nBuilt-in thermos... \n", + "\n", + " product_brand product_color \\\n", + "0 Panasonic White \n", + "1 Panasonic White \n", + "2 Homewerks 80 CFM \n", + "3 Homewerks White \n", + "4 DELTA ELECTRONICS (AMERICAS) LTD. White \n", + "\n", + " product_text \n", + "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", + "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", + "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... 
" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = df[df.product_locale == \"us\"]\n", + "df = df[df.product_text.notna()]\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "100" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Sparse Embeddings" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "aabde43c3c3043248e4a1eff6755a17b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Fetching 9 files: 0%| | 0/9 [00:00\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
example_idqueryquery_idproduct_idproduct_localeesci_labelsmall_versionlarge_versionproduct_titleproduct_descriptionproduct_bullet_pointproduct_brandproduct_colorproduct_text
00revent 80 cfm0B000MOO21WusIrrelevant01Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...NoneWhisperCeiling fans feature a totally enclosed...PanasonicWhitePanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...
1291891bathroom fan without light13723B000MOO21WusExact11Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...NoneWhisperCeiling fans feature a totally enclosed...PanasonicWhitePanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...
21revent 80 cfm0B07X3Y6B1VusExact01Homewerks 7141-80 Bathroom Fan Integrated LED ...NoneOUTSTANDING PERFORMANCE: This Homewerk's bath ...Homewerks80 CFMHomewerks 7141-80 Bathroom Fan Integrated LED ...
32revent 80 cfm0B07WDM7MQQusExact01Homewerks 7140-80 Bathroom Fan Ceiling Mount E...NoneOUTSTANDING PERFORMANCE: This Homewerk's bath ...HomewerksWhiteHomewerks 7140-80 Bathroom Fan Ceiling Mount E...
43revent 80 cfm0B07RH6Z8KWusExact01Delta Electronics RAD80L BreezRadiance 80 CFM ...This pre-owned or refurbished product has been...Quiet operation at 1.5 sones\\nBuilt-in thermos...DELTA ELECTRONICS (AMERICAS) LTD.WhiteDelta Electronics RAD80L BreezRadiance 80 CFM ...
\n", + "" + ], + "text/plain": [ + " example_id query query_id product_id \\\n", + "0 0 revent 80 cfm 0 B000MOO21W \n", + "1 291891 bathroom fan without light 13723 B000MOO21W \n", + "2 1 revent 80 cfm 0 B07X3Y6B1V \n", + "3 2 revent 80 cfm 0 B07WDM7MQQ \n", + "4 3 revent 80 cfm 0 B07RH6Z8KW \n", + "\n", + " product_locale esci_label small_version large_version \\\n", + "0 us Irrelevant 0 1 \n", + "1 us Exact 1 1 \n", + "2 us Exact 0 1 \n", + "3 us Exact 0 1 \n", + "4 us Exact 0 1 \n", + "\n", + " product_title \\\n", + "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", + "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", + "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... \n", + "\n", + " product_description \\\n", + "0 None \n", + "1 None \n", + "2 None \n", + "3 None \n", + "4 This pre-owned or refurbished product has been... \n", + "\n", + " product_bullet_point \\\n", + "0 WhisperCeiling fans feature a totally enclosed... \n", + "1 WhisperCeiling fans feature a totally enclosed... \n", + "2 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", + "3 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", + "4 Quiet operation at 1.5 sones\\nBuilt-in thermos... \n", + "\n", + " product_brand product_color \\\n", + "0 Panasonic White \n", + "1 Panasonic White \n", + "2 Homewerks 80 CFM \n", + "3 Homewerks White \n", + "4 DELTA ELECTRONICS (AMERICAS) LTD. White \n", + "\n", + " product_text \n", + "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", + "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", + "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... 
" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = dataset.to_pandas()\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
example_idqueryquery_idproduct_idproduct_localeesci_labelsmall_versionlarge_versionproduct_titleproduct_descriptionproduct_bullet_pointproduct_brandproduct_colorproduct_text
00revent 80 cfm0B000MOO21WusIrrelevant01Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...NoneWhisperCeiling fans feature a totally enclosed...PanasonicWhitePanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...
1291891bathroom fan without light13723B000MOO21WusExact11Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...NoneWhisperCeiling fans feature a totally enclosed...PanasonicWhitePanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...
21revent 80 cfm0B07X3Y6B1VusExact01Homewerks 7141-80 Bathroom Fan Integrated LED ...NoneOUTSTANDING PERFORMANCE: This Homewerk's bath ...Homewerks80 CFMHomewerks 7141-80 Bathroom Fan Integrated LED ...
32revent 80 cfm0B07WDM7MQQusExact01Homewerks 7140-80 Bathroom Fan Ceiling Mount E...NoneOUTSTANDING PERFORMANCE: This Homewerk's bath ...HomewerksWhiteHomewerks 7140-80 Bathroom Fan Ceiling Mount E...
43revent 80 cfm0B07RH6Z8KWusExact01Delta Electronics RAD80L BreezRadiance 80 CFM ...This pre-owned or refurbished product has been...Quiet operation at 1.5 sones\\nBuilt-in thermos...DELTA ELECTRONICS (AMERICAS) LTD.WhiteDelta Electronics RAD80L BreezRadiance 80 CFM ...
\n", + "
" + ], + "text/plain": [ + " example_id query query_id product_id \\\n", + "0 0 revent 80 cfm 0 B000MOO21W \n", + "1 291891 bathroom fan without light 13723 B000MOO21W \n", + "2 1 revent 80 cfm 0 B07X3Y6B1V \n", + "3 2 revent 80 cfm 0 B07WDM7MQQ \n", + "4 3 revent 80 cfm 0 B07RH6Z8KW \n", + "\n", + " product_locale esci_label small_version large_version \\\n", + "0 us Irrelevant 0 1 \n", + "1 us Exact 1 1 \n", + "2 us Exact 0 1 \n", + "3 us Exact 0 1 \n", + "4 us Exact 0 1 \n", + "\n", + " product_title \\\n", + "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", + "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", + "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... \n", + "\n", + " product_description \\\n", + "0 None \n", + "1 None \n", + "2 None \n", + "3 None \n", + "4 This pre-owned or refurbished product has been... \n", + "\n", + " product_bullet_point \\\n", + "0 WhisperCeiling fans feature a totally enclosed... \n", + "1 WhisperCeiling fans feature a totally enclosed... \n", + "2 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", + "3 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", + "4 Quiet operation at 1.5 sones\\nBuilt-in thermos... \n", + "\n", + " product_brand product_color \\\n", + "0 Panasonic White \n", + "1 Panasonic White \n", + "2 Homewerks 80 CFM \n", + "3 Homewerks White \n", + "4 DELTA ELECTRONICS (AMERICAS) LTD. White \n", + "\n", + " product_text \n", + "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", + "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", + "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... 
" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = df[df.product_locale == \"us\"]\n", + "df = df[df.product_text.notna()]\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "100" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Sparse Embeddings" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "aabde43c3c3043248e4a1eff6755a17b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Fetching 9 files: 0%| | 0/9 [00:00 Date: Thu, 21 Mar 2024 10:36:14 +0530 Subject: [PATCH 04/10] Working Sparse and Dense Search --- docs/how-to/Hybrid_Search.ipynb | 956 ++++++++++++++------------------ 1 file changed, 406 insertions(+), 550 deletions(-) diff --git a/docs/how-to/Hybrid_Search.ipynb b/docs/how-to/Hybrid_Search.ipynb index 031bd088..5a150d9c 100644 --- a/docs/how-to/Hybrid_Search.ipynb +++ b/docs/how-to/Hybrid_Search.ipynb @@ -56,10 +56,22 @@ "import json\n", "from typing import List\n", "\n", + "import numpy as np\n", + "import pandas as pd\n", "from datasets import load_dataset\n", "from qdrant_client import QdrantClient\n", + "from qdrant_client.http.models import (\n", + " Distance,\n", + " NamedSparseVector,\n", + " NamedVector,\n", + " SparseVector,\n", + " PointStruct,\n", + " SearchRequest,\n", + " SparseIndexParams,\n", + " SparseVectorParams,\n", + " VectorParams,\n", + ")\n", "from transformers import AutoTokenizer\n", - "from qdrant_client.http.models import VectorParams, SparseVectorParams, Distance, SparseIndexParams, PointStruct\n", "\n", "import fastembed\n", "from 
fastembed.sparse.sparse_text_embedding import SparseEmbedding, SparseTextEmbedding\n", @@ -72,216 +84,39 @@ "cell_type": "code", "execution_count": 3, "metadata": {}, - "outputs": [], - "source": [ - "dataset = load_dataset(\"tasksource/esci\")\n", - "# We'll select the first 100 examples for this demo\n", - "dataset = dataset[\"train\"].select(range(100))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Preview Data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, "outputs": [ { "data": { - "text/html": [ - "
[HTML table preview stripped during extraction; its five rows match the text/plain output that follows]\n", - "
" - ], "text/plain": [ - " example_id query query_id product_id \\\n", - "0 0 revent 80 cfm 0 B000MOO21W \n", - "1 291891 bathroom fan without light 13723 B000MOO21W \n", - "2 1 revent 80 cfm 0 B07X3Y6B1V \n", - "3 2 revent 80 cfm 0 B07WDM7MQQ \n", - "4 3 revent 80 cfm 0 B07RH6Z8KW \n", - "\n", - " product_locale esci_label small_version large_version \\\n", - "0 us Irrelevant 0 1 \n", - "1 us Exact 1 1 \n", - "2 us Exact 0 1 \n", - "3 us Exact 0 1 \n", - "4 us Exact 0 1 \n", - "\n", - " product_title \\\n", - "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", - "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", - "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... \n", - "\n", - " product_description \\\n", - "0 None \n", - "1 None \n", - "2 None \n", - "3 None \n", - "4 This pre-owned or refurbished product has been... \n", - "\n", - " product_bullet_point \\\n", - "0 WhisperCeiling fans feature a totally enclosed... \n", - "1 WhisperCeiling fans feature a totally enclosed... \n", - "2 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", - "3 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", - "4 Quiet operation at 1.5 sones\\nBuilt-in thermos... \n", - "\n", - " product_brand product_color \\\n", - "0 Panasonic White \n", - "1 Panasonic White \n", - "2 Homewerks 80 CFM \n", - "3 Homewerks White \n", - "4 DELTA ELECTRONICS (AMERICAS) LTD. White \n", - "\n", - " product_text \n", - "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", - "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", - "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... 
" + "Dataset({\n", + " features: ['example_id', 'query', 'query_id', 'product_id', 'product_locale', 'esci_label', 'small_version', 'large_version', 'product_title', 'product_description', 'product_bullet_point', 'product_brand', 'product_color', 'product_text'],\n", + " num_rows: 919\n", + "})" ] }, - "execution_count": 5, + "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "df = dataset.to_pandas()\n", - "df.head()" + "dataset = load_dataset(\"tasksource/esci\")\n", + "# We'll select the first 1000 examples for this demo\n", + "dataset = dataset[\"train\"].select(range(1000))\n", + "dataset = dataset.filter(lambda x: x['product_locale'] == \"us\")\n", + "dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Preview Data" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 4, "metadata": {}, "outputs": [ { @@ -340,23 +175,6 @@ " Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...\n", " \n", " \n", - " 1\n", - " 291891\n", - " bathroom fan without light\n", - " 13723\n", - " B000MOO21W\n", - " us\n", - " Exact\n", - " 1\n", - " 1\n", - " Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...\n", - " None\n", - " WhisperCeiling fans feature a totally enclosed...\n", - " Panasonic\n", - " White\n", - " Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...\n", - " \n", - " \n", " 2\n", " 1\n", " revent 80 cfm\n", @@ -401,343 +219,164 @@ " 0\n", " 1\n", " Delta Electronics RAD80L BreezRadiance 80 CFM ...\n", - " This pre-owned or refurbished product has been...\n", - " Quiet operation at 1.5 sones\\nBuilt-in thermos...\n", - " DELTA ELECTRONICS (AMERICAS) LTD.\n", - " White\n", - " Delta Electronics RAD80L BreezRadiance 80 CFM ...\n", - " \n", - " \n", - "\n", - "" - ], - "text/plain": [ - " example_id query query_id product_id \\\n", - "0 0 revent 80 cfm 0 B000MOO21W \n", - "1 291891 bathroom fan without light 13723 B000MOO21W \n", - "2 1 revent 80 cfm 0 B07X3Y6B1V \n", - "3 2 revent 
80 cfm 0 B07WDM7MQQ \n", - "4 3 revent 80 cfm 0 B07RH6Z8KW \n", - "\n", - " product_locale esci_label small_version large_version \\\n", - "0 us Irrelevant 0 1 \n", - "1 us Exact 1 1 \n", - "2 us Exact 0 1 \n", - "3 us Exact 0 1 \n", - "4 us Exact 0 1 \n", - "\n", - " product_title \\\n", - "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", - "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", - "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... \n", - "\n", - " product_description \\\n", - "0 None \n", - "1 None \n", - "2 None \n", - "3 None \n", - "4 This pre-owned or refurbished product has been... \n", - "\n", - " product_bullet_point \\\n", - "0 WhisperCeiling fans feature a totally enclosed... \n", - "1 WhisperCeiling fans feature a totally enclosed... \n", - "2 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", - "3 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", - "4 Quiet operation at 1.5 sones\\nBuilt-in thermos... \n", - "\n", - " product_brand product_color \\\n", - "0 Panasonic White \n", - "1 Panasonic White \n", - "2 Homewerks 80 CFM \n", - "3 Homewerks White \n", - "4 DELTA ELECTRONICS (AMERICAS) LTD. White \n", - "\n", - " product_text \n", - "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", - "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", - "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... 
" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df = df[df.product_locale == \"us\"]\n", - "df = df[df.product_text.notna()]\n", - "df.head()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "100" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "len(df)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Sparse Embeddings" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "aabde43c3c3043248e4a1eff6755a17b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "Fetching 9 files: 0%| | 0/9 [00:00This pre-owned or refurbished product has been...\n", + " Quiet operation at 1.5 sones\\nBuilt-in thermos...\n", + " DELTA ELECTRONICS (AMERICAS) LTD.\n", + " White\n", + " Delta Electronics RAD80L BreezRadiance 80 CFM ...\n", + " \n", + " \n", + " 5\n", + " 4\n", + " revent 80 cfm\n", + " 0\n", + " B07QJ7WYFQ\n", + " us\n", + " Exact\n", + " 0\n", + " 1\n", + " Panasonic FV-08VRE2 Ventilation Fan with Reces...\n", + " None\n", + " The design solution for Fan/light combinations...\n", + " Panasonic\n", + " White\n", + " Panasonic FV-08VRE2 Ventilation Fan with Reces...\n", + " \n", + " \n", + "\n", + "" + ], "text/plain": [ - "model.onnx: 0%| | 0.00/1.34G [00:00 List[PointStruct]:\n", + " sparse_vectors = df[\"sparse_embedding\"].tolist()\n", + " product_texts = df[\"combined_text\"].tolist()\n", + " dense_vectors = df[\"dense_embedding\"].tolist()\n", + " points = []\n", + " for idx, (text, sparse_vector, dense_vector) in enumerate(zip(product_texts, sparse_vectors, dense_vectors)):\n", + " # print(sparse_vector)\n", + " sparse_vector = SparseVector(indices=sparse_vector.indices.tolist(), 
values=sparse_vector.values.tolist())\n", + " point = PointStruct(\n", + " id=idx,\n", + " payload={\"text\": text}, # Add any additional payload if necessary\n", + " vector={\n", + " \"text-sparse\": sparse_vector,\n", + " \"text-dense\": dense_vector.tolist(),\n", + " },\n", + " )\n", + " points.append(point)\n", + " return points\n", + "\n", + "\n", + "points: List[PointStruct] = make_points(df)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "UpdateResult(operation_id=0, status=)" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "client.upsert(collection_name, points)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[[ScoredPoint(id=0, version=0, score=0.7569294929688511, payload={'text': 'Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceiling Mounted Fan\\nPanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceiling Mounted Fan\\nPanasonic\\nWhite\\nNone\\nWhisperCeiling fans feature a totally enclosed condenser motor and a double-tapered, dolphin-shaped bladed blower wheel to quietly move air\\nDesigned to give you continuous, trouble-free operation for many years thanks in part to its high-quality components and permanently lubricated motors which wear at a slower pace\\nDetachable adaptors, firmly secured duct ends, adjustable mounting brackets (up to 26-in), fan/motor units that detach easily from the housing and uncomplicated wiring all lend themselves to user-friendly installation\\nThis Panasonic fan has a built-in damper to prevent backdraft, which helps to prevent outside air from coming through the fan\\n0.35 amp\\nWhisperCeiling fans feature a totally enclosed condenser motor and a double-tapered, dolphin-shaped bladed blower wheel to quietly move air\\nDesigned to give you continuous, trouble-free operation for many years thanks in part to 
its high-quality components and permanently lubricated motors which wear at a slower pace\\nDetachable adaptors, firmly secured duct ends, adjustable mounting brackets (up to 26-in), fan/motor units that detach easily from the housing and uncomplicated wiring all lend themselves to user-friendly installation\\nThis Panasonic fan has a built-in damper to prevent backdraft, which helps to prevent outside air from coming through the fan\\n0.35 amp'}, vector=None, shard_key=None),\n", + " ScoredPoint(id=11, version=0, score=0.7182552708165841, payload={'text': 'Panasonic FV-0811VF5 WhisperFit EZ Retrofit Ventilation Fan, 80 or 110 CFM\\nPanasonic FV-0811VF5 WhisperFit EZ Retrofit Ventilation Fan, 80 or 110 CFM\\nPanasonic\\nWhite\\nNone\\nRetrofit Solution: Ideal for residential remodeling, hotel construction or renovations\\nLow Profile: 5-5/8-Inch housing depth fits in a 2 x 6 construction\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 80 or 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nRetrofit Solution: Ideal for residential remodeling, hotel construction or renovations\\nLow Profile: 5-5/8-Inch housing depth fits in a 2 x 6 construction\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 80 or 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy'}, vector=None, shard_key=None),\n", + " ScoredPoint(id=7, version=0, score=0.7113100728799315, payload={'text': 'Panasonic FV-0510VS1 WhisperValue DC Ventilation Fan, 50-80-100 CFM\\nPanasonic FV-0510VS1 WhisperValue DC Ventilation Fan, 50-80-100 CFM\\nPanasonic\\nWhite\\nNone\\nInstallation: Features a low profile can ideal for residential construction; celiing or wall mount\\nPrecision Spot Ventilation: Quiet and powerful 
ventilation while removing moisture and pollution\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 100 CFM\\nSlimmest Design: With a 3-3/8-Inch housing depth, WhisperValue DC is slimmest design available\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nInstallation: Features a low profile can ideal for residential construction; celiing or wall mount\\nPrecision Spot Ventilation: Quiet and powerful ventilation while removing moisture and pollution\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 100 CFM\\nSlimmest Design: With a 3-3/8-Inch housing depth, WhisperValue DC is slimmest design available\\nEnergy Star Rated: Delivers powerful airflow without wasting energy'}, vector=None, shard_key=None),\n", + " ScoredPoint(id=4, version=0, score=0.7076588242072893, payload={'text': 'Panasonic FV-08VRE2 Ventilation Fan with Recessed LED (Renewed)\\nPanasonic FV-08VRE2 Ventilation Fan with Recessed LED (Renewed)\\nPanasonic\\nWhite\\nNone\\nThe design solution for Fan/light combinations\\nEnergy Star rated architectural grade recessed Fan/LED light\\nQuiet, energy efficient and powerful 80 CFM ventilation hidden above the Ceiling\\nLED lamp is dimmable\\nBeautiful Lighting with 6-1/2”aperture and advanced luminaire design\\nThe design solution for Fan/light combinations\\nEnergy Star rated architectural grade recessed Fan/LED light\\nQuiet, energy efficient and powerful 80 CFM ventilation hidden above the Ceiling\\nLED lamp is dimmable\\nBeautiful Lighting with 6-1/2”aperture and advanced luminaire design'}, vector=None, shard_key=None),\n", + " ScoredPoint(id=6, version=0, score=0.7005667734073653, payload={'text': 'Panasonic FV-0510VSL1 WhisperValue DC Ventilation Fan with Light, 50-80-100 CFM\\nPanasonic FV-0510VSL1 WhisperValue DC Ventilation Fan with Light, 50-80-100 CFM\\nPanasonic\\nWhite\\nNone\\nInstallation: Features a low profile can ideal for residential construction; celiing or 
wall mount\\nArchitectural Design: Architectural grade light fixture that gives powerful, yet quiet ventilation\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 100 CFM\\nSlimmest Design: With a 3-3/8-Inch housing depth, WhisperValue DC is slimmest design available\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nInstallation: Features a low profile can ideal for residential construction; celiing or wall mount\\nArchitectural Design: Architectural grade light fixture that gives powerful, yet quiet ventilation\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 100 CFM\\nSlimmest Design: With a 3-3/8-Inch housing depth, WhisperValue DC is slimmest design available\\nEnergy Star Rated: Delivers powerful airflow without wasting energy'}, vector=None, shard_key=None),\n", + " ScoredPoint(id=5, version=0, score=0.6900119765792, payload={'text': 'Panasonic FV-0511VQ1 WhisperCeiling DC Ventilation Fan, Speed Selector, SmartFlow Technology, Quiet,White\\nPanasonic FV-0511VQ1 WhisperCeiling DC Ventilation Fan, Speed Selector, SmartFlow Technology, Quiet,White\\nPanasonic\\nWhite\\nNone\\nInstallation: Features a 4-inch or 6-inch duct adaptor ideal for new construction and renovations\\nPrecision Spot Ventilation: Quiet and powerful ventilation while removing moisture and pollution\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nInstallation: Features a 4-inch or 6-inch duct adaptor ideal for new construction and renovations\\nPrecision Spot Ventilation: Quiet and powerful ventilation while removing moisture and pollution\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and 
trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy'}, vector=None, shard_key=None),\n", + " ScoredPoint(id=9, version=0, score=0.6695256112486503, payload={'text': \"Delta Electronics (Americas) Ltd. RAD80 Delta BreezRadiance Series 80 CFM Fan with Heater, 10.5W, 1.5 Sones\\nDelta Electronics (Americas) Ltd. RAD80 Delta BreezRadiance Series 80 CFM Fan with Heater, 10.5W, 1.5 Sones\\nDELTA ELECTRONICS (AMERICAS) LTD.\\nWith Heater\\nNone\\nQuiet operation at 1.5 Sones\\nPrecision engineered with DC brushless motor for extended reliability, this Fan will outlast many household appliances\\nGalvanized steel construction resists corrosion, equipped with metal duct adapter\\nFan impeller Stops If obstructed, for safe worry-free operation\\nPeace of mind quality, performance and reliability from the world's largest DC brushless Fan Manufacturer\\nQuiet operation at 1.5 Sones\\nPrecision engineered with DC brushless motor for extended reliability, this Fan will outlast many household appliances\\nGalvanized steel construction resists corrosion, equipped with metal duct adapter\\nFan impeller Stops If obstructed, for safe worry-free operation\\nPeace of mind quality, performance and reliability from the world's largest DC brushless Fan Manufacturer\"}, vector=None, shard_key=None),\n", + " ScoredPoint(id=1, version=0, score=0.6501192046745539, payload={'text': \"Homewerks 7141-80 Bathroom Fan Integrated LED Light Ceiling Mount Exhaust Ventilation, 1.1 Sones, 80 CFM\\nHomewerks 7141-80 Bathroom Fan Integrated LED Light Ceiling Mount Exhaust Ventilation, 1.1 Sones, 80 CFM\\nHomewerks\\n80 CFM\\nNone\\nOUTSTANDING PERFORMANCE: This Homewerk's bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. 
This exhaust fan is 1.1 sones at 80 CFM which means it’s able to manage spaces up to 80 square feet and is very quiet..\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a modern style round shape and has an 4000K Cool White Light LED Light. AC motor.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in. 13 in round grill and 4 in round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\\nOUTSTANDING PERFORMANCE: This Homewerk's bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1.1 sones at 80 CFM which means it’s able to manage spaces up to 80 square feet and is very quiet..\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a modern style round shape and has an 4000K Cool White Light LED Light. AC motor.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in. 
13 in round grill and 4 in round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\"}, vector=None, shard_key=None),\n", + " ScoredPoint(id=3, version=0, score=0.6466917604403921, payload={'text': 'Delta Electronics RAD80L BreezRadiance 80 CFM Heater/Fan/Light Combo White (Renewed)\\nDelta Electronics RAD80L BreezRadiance 80 CFM Heater/Fan/Light Combo White (Renewed)\\nDELTA ELECTRONICS (AMERICAS) LTD.\\nWhite\\nThis pre-owned or refurbished product has been professionally inspected and tested to work and look like new. How a product becomes part of Amazon Renewed, your destination for pre-owned, refurbished products: A customer buys a new product and returns it or trades it in for a newer or different model. That product is inspected and tested to work and look like new by Amazon-qualified suppliers. Then, the product is sold as an Amazon Renewed product on Amazon. If not satisfied with the purchase, renewed products are eligible for replacement or refund under the Amazon Renewed Guarantee.\\nQuiet operation at 1.5 sones\\nBuilt-in thermostat regulates temperature. Energy efficiency at 7.6 CFM/Watt\\nPrecision engineered with DC brushless motor for extended reliability, this fan will outlast many household appliances\\nGalvanized steel construction resists corrosion\\nDuct: Detachable 4-inch Plastic Duct Adapter\\nQuiet operation at 1.5 sones\\nBuilt-in thermostat regulates temperature. 
Energy efficiency at 7.6 CFM/Watt\\nPrecision engineered with DC brushless motor for extended reliability, this fan will outlast many household appliances\\nGalvanized steel construction resists corrosion\\nDuct: Detachable 4-inch Plastic Duct Adapter'}, vector=None, shard_key=None),\n", + " ScoredPoint(id=2, version=0, score=0.6462960006277003, payload={'text': 'Homewerks 7140-80 Bathroom Fan Ceiling Mount Exhaust Ventilation, 1.5 Sones, 80 CFM, White\\nHomewerks 7140-80 Bathroom Fan Ceiling Mount Exhaust Ventilation, 1.5 Sones, 80 CFM, White\\nHomewerks\\nWhite\\nNone\\nOUTSTANDING PERFORMANCE: This Homewerk\\'s bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1. 5 sone at 110 CFM which means it’s able to manage spaces up to 110 square feet\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a grille modern style.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in and a 4\" round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\\nOUTSTANDING PERFORMANCE: This Homewerk\\'s bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1. 
5 sone at 110 CFM which means it’s able to manage spaces up to 110 square feet\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a grille modern style.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in and a 4\" round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited'}, vector=None, shard_key=None)],\n", + " [ScoredPoint(id=0, version=0, score=22.959232330322266, payload={'text': 'Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceiling Mounted Fan\\nPanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceiling Mounted Fan\\nPanasonic\\nWhite\\nNone\\nWhisperCeiling fans feature a totally enclosed condenser motor and a double-tapered, dolphin-shaped bladed blower wheel to quietly move air\\nDesigned to give you continuous, trouble-free operation for many years thanks in part to its high-quality components and permanently lubricated motors which wear at a slower pace\\nDetachable adaptors, firmly secured duct ends, adjustable mounting brackets (up to 26-in), fan/motor units that detach easily from the housing and uncomplicated wiring all lend themselves to user-friendly installation\\nThis Panasonic fan has a built-in damper to prevent backdraft, which helps to prevent outside air from coming through the fan\\n0.35 amp\\nWhisperCeiling fans feature a totally enclosed condenser motor and a 
double-tapered, dolphin-shaped bladed blower wheel to quietly move air\\nDesigned to give you continuous, trouble-free operation for many years thanks in part to its high-quality components and permanently lubricated motors which wear at a slower pace\\nDetachable adaptors, firmly secured duct ends, adjustable mounting brackets (up to 26-in), fan/motor units that detach easily from the housing and uncomplicated wiring all lend themselves to user-friendly installation\\nThis Panasonic fan has a built-in damper to prevent backdraft, which helps to prevent outside air from coming through the fan\\n0.35 amp'}, vector=None, shard_key=None),\n", + " ScoredPoint(id=5, version=0, score=20.864572525024414, payload={'text': 'Panasonic FV-0511VQ1 WhisperCeiling DC Ventilation Fan, Speed Selector, SmartFlow Technology, Quiet,White\\nPanasonic FV-0511VQ1 WhisperCeiling DC Ventilation Fan, Speed Selector, SmartFlow Technology, Quiet,White\\nPanasonic\\nWhite\\nNone\\nInstallation: Features a 4-inch or 6-inch duct adaptor ideal for new construction and renovations\\nPrecision Spot Ventilation: Quiet and powerful ventilation while removing moisture and pollution\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nInstallation: Features a 4-inch or 6-inch duct adaptor ideal for new construction and renovations\\nPrecision Spot Ventilation: Quiet and powerful ventilation while removing moisture and pollution\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy'}, vector=None, shard_key=None),\n", + " ScoredPoint(id=6, version=0, score=20.659486770629883, payload={'text': 
'Panasonic FV-0510VSL1 WhisperValue DC Ventilation Fan with Light, 50-80-100 CFM\\nPanasonic FV-0510VSL1 WhisperValue DC Ventilation Fan with Light, 50-80-100 CFM\\nPanasonic\\nWhite\\nNone\\nInstallation: Features a low profile can ideal for residential construction; celiing or wall mount\\nArchitectural Design: Architectural grade light fixture that gives powerful, yet quiet ventilation\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 100 CFM\\nSlimmest Design: With a 3-3/8-Inch housing depth, WhisperValue DC is slimmest design available\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nInstallation: Features a low profile can ideal for residential construction; celiing or wall mount\\nArchitectural Design: Architectural grade light fixture that gives powerful, yet quiet ventilation\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 100 CFM\\nSlimmest Design: With a 3-3/8-Inch housing depth, WhisperValue DC is slimmest design available\\nEnergy Star Rated: Delivers powerful airflow without wasting energy'}, vector=None, shard_key=None),\n", + " ScoredPoint(id=4, version=0, score=20.283283233642578, payload={'text': 'Panasonic FV-08VRE2 Ventilation Fan with Recessed LED (Renewed)\\nPanasonic FV-08VRE2 Ventilation Fan with Recessed LED (Renewed)\\nPanasonic\\nWhite\\nNone\\nThe design solution for Fan/light combinations\\nEnergy Star rated architectural grade recessed Fan/LED light\\nQuiet, energy efficient and powerful 80 CFM ventilation hidden above the Ceiling\\nLED lamp is dimmable\\nBeautiful Lighting with 6-1/2”aperture and advanced luminaire design\\nThe design solution for Fan/light combinations\\nEnergy Star rated architectural grade recessed Fan/LED light\\nQuiet, energy efficient and powerful 80 CFM ventilation hidden above the Ceiling\\nLED lamp is dimmable\\nBeautiful Lighting with 6-1/2”aperture and advanced luminaire design'}, vector=None, shard_key=None),\n", + " 
ScoredPoint(id=7, version=0, score=20.057729721069336, payload={'text': 'Panasonic FV-0510VS1 WhisperValue DC Ventilation Fan, 50-80-100 CFM\\nPanasonic FV-0510VS1 WhisperValue DC Ventilation Fan, 50-80-100 CFM\\nPanasonic\\nWhite\\nNone\\nInstallation: Features a low profile can ideal for residential construction; celiing or wall mount\\nPrecision Spot Ventilation: Quiet and powerful ventilation while removing moisture and pollution\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 100 CFM\\nSlimmest Design: With a 3-3/8-Inch housing depth, WhisperValue DC is slimmest design available\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nInstallation: Features a low profile can ideal for residential construction; celiing or wall mount\\nPrecision Spot Ventilation: Quiet and powerful ventilation while removing moisture and pollution\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 100 CFM\\nSlimmest Design: With a 3-3/8-Inch housing depth, WhisperValue DC is slimmest design available\\nEnergy Star Rated: Delivers powerful airflow without wasting energy'}, vector=None, shard_key=None),\n", + " ScoredPoint(id=11, version=0, score=20.000377655029297, payload={'text': 'Panasonic FV-0811VF5 WhisperFit EZ Retrofit Ventilation Fan, 80 or 110 CFM\\nPanasonic FV-0811VF5 WhisperFit EZ Retrofit Ventilation Fan, 80 or 110 CFM\\nPanasonic\\nWhite\\nNone\\nRetrofit Solution: Ideal for residential remodeling, hotel construction or renovations\\nLow Profile: 5-5/8-Inch housing depth fits in a 2 x 6 construction\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 80 or 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nRetrofit Solution: Ideal for residential remodeling, hotel construction or renovations\\nLow Profile: 5-5/8-Inch housing depth fits in a 2 x 6 
construction\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 80 or 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy'}, vector=None, shard_key=None),\n", + " ScoredPoint(id=9, version=0, score=8.690065383911133, payload={'text': \"Delta Electronics (Americas) Ltd. RAD80 Delta BreezRadiance Series 80 CFM Fan with Heater, 10.5W, 1.5 Sones\\nDelta Electronics (Americas) Ltd. RAD80 Delta BreezRadiance Series 80 CFM Fan with Heater, 10.5W, 1.5 Sones\\nDELTA ELECTRONICS (AMERICAS) LTD.\\nWith Heater\\nNone\\nQuiet operation at 1.5 Sones\\nPrecision engineered with DC brushless motor for extended reliability, this Fan will outlast many household appliances\\nGalvanized steel construction resists corrosion, equipped with metal duct adapter\\nFan impeller Stops If obstructed, for safe worry-free operation\\nPeace of mind quality, performance and reliability from the world's largest DC brushless Fan Manufacturer\\nQuiet operation at 1.5 Sones\\nPrecision engineered with DC brushless motor for extended reliability, this Fan will outlast many household appliances\\nGalvanized steel construction resists corrosion, equipped with metal duct adapter\\nFan impeller Stops If obstructed, for safe worry-free operation\\nPeace of mind quality, performance and reliability from the world's largest DC brushless Fan Manufacturer\"}, vector=None, shard_key=None),\n", + " ScoredPoint(id=2, version=0, score=8.60843563079834, payload={'text': 'Homewerks 7140-80 Bathroom Fan Ceiling Mount Exhaust Ventilation, 1.5 Sones, 80 CFM, White\\nHomewerks 7140-80 Bathroom Fan Ceiling Mount Exhaust Ventilation, 1.5 Sones, 80 CFM, White\\nHomewerks\\nWhite\\nNone\\nOUTSTANDING PERFORMANCE: This Homewerk\\'s bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1. 
5 sone at 110 CFM which means it’s able to manage spaces up to 110 square feet\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a grille modern style.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in and a 4\" round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\\nOUTSTANDING PERFORMANCE: This Homewerk\\'s bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1. 5 sone at 110 CFM which means it’s able to manage spaces up to 110 square feet\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a grille modern style.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in and a 4\" round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. 
We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited'}, vector=None, shard_key=None),\n", + " ScoredPoint(id=1, version=0, score=8.450508117675781, payload={'text': \"Homewerks 7141-80 Bathroom Fan Integrated LED Light Ceiling Mount Exhaust Ventilation, 1.1 Sones, 80 CFM\\nHomewerks 7141-80 Bathroom Fan Integrated LED Light Ceiling Mount Exhaust Ventilation, 1.1 Sones, 80 CFM\\nHomewerks\\n80 CFM\\nNone\\nOUTSTANDING PERFORMANCE: This Homewerk's bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1.1 sones at 80 CFM which means it’s able to manage spaces up to 80 square feet and is very quiet..\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a modern style round shape and has an 4000K Cool White Light LED Light. AC motor.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in. 13 in round grill and 4 in round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\\nOUTSTANDING PERFORMANCE: This Homewerk's bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. 
This exhaust fan is 1.1 sones at 80 CFM which means it’s able to manage spaces up to 80 square feet and is very quiet..\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a modern style round shape and has an 4000K Cool White Light LED Light. AC motor.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in. 13 in round grill and 4 in round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\"}, vector=None, shard_key=None),\n", + " ScoredPoint(id=14, version=0, score=8.03786563873291, payload={'text': 'Broan Very Quiet Ceiling Bathroom Exhaust Fan, ENERGY STAR Certified, 0.3 Sones, 80 CFM\\nBroan Very Quiet Ceiling Bathroom Exhaust Fan, ENERGY STAR Certified, 0.3 Sones, 80 CFM\\nBroan-NuTone\\nWhite\\nNone\\nHIGH-QUALITY FAN: Very quiet, energy efficient exhaust fan runs on 0. 3 Sones and is motor engineered for continuous operation\\nEFFICIENT: Operates at 80 CFM in bathrooms up to 75 sq. ft. for a high-quality performance. Dimmable Capability: Non Dimmable\\nEASY INSTALLATION: Fan is easy to install and/or replace existing product for DIY\\'ers and needs only 2\" x 8\" construction space. 
Can be used over bathtubs or showers when connected to a GFCI protected branch circuit\\nFEATURES: Includes hanger bar system for fast, flexible installation for all types of construction and a 6\" ducting for superior performance\\nCERTIFIED: ENERGY STAR qualified and HVI Certified to ensure the best quality for your home\\nHIGH-QUALITY FAN: Very quiet, energy efficient exhaust fan runs on 0. 3 Sones and is motor engineered for continuous operation\\nEFFICIENT: Operates at 80 CFM in bathrooms up to 75 sq. ft. for a high-quality performance. Dimmable Capability: Non Dimmable\\nEASY INSTALLATION: Fan is easy to install and/or replace existing product for DIY\\'ers and needs only 2\" x 8\" construction space. Can be used over bathtubs or showers when connected to a GFCI protected branch circuit\\nFEATURES: Includes hanger bar system for fast, flexible installation for all types of construction and a 6\" ducting for superior performance\\nCERTIFIED: ENERGY STAR qualified and HVI Certified to ensure the best quality for your home'}, vector=None, shard_key=None)]]" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "query_text = \"panasonic fans\"\n", + "\n", + "# # Compute sparse and dense vectors\n", + "query_sparse_vectors: List[SparseEmbedding] = make_sparse_embedding(query_text)\n", + "query_dense_vector: List[np.ndarray] = make_dense_embedding(query_text)\n", + "\n", + "client.search_batch(\n", + " collection_name=collection_name,\n", + " requests=[\n", + " SearchRequest(\n", + " vector=NamedVector(\n", + " name=\"text-dense\",\n", + " vector=query_dense_vector[0],\n", + " ),\n", + " limit=10,\n", + " with_payload=True,\n", + " ),\n", + " SearchRequest(\n", + " vector=NamedSparseVector(\n", + " name=\"text-sparse\",\n", + " vector=SparseVector(\n", + " indices=query_sparse_vectors[0].indices.tolist(),\n", + " values=query_sparse_vectors[0].values.tolist(),\n", + " ),\n", + " ),\n", + " limit=10,\n", + " 
with_payload=True,\n", + " ),\n", + " ],\n", + ")" + ] + }, { "cell_type": "code", "execution_count": null, From 0d7700b9b15b948c680431fa10161ade76473e63 Mon Sep 17 00:00:00 2001 From: Nirant Kasliwal Date: Fri, 22 Mar 2024 15:15:27 +0530 Subject: [PATCH 05/10] Add RRF --- docs/how-to/Hybrid_Search.ipynb | 402 +++++++++++++++++++++++++++----- 1 file changed, 342 insertions(+), 60 deletions(-) diff --git a/docs/how-to/Hybrid_Search.ipynb b/docs/how-to/Hybrid_Search.ipynb index 5a150d9c..12d217c5 100644 --- a/docs/how-to/Hybrid_Search.ipynb +++ b/docs/how-to/Hybrid_Search.ipynb @@ -18,9 +18,12 @@ "1. Setup: Download and install the required dependencies\n", "2. Preview data: Load and preview the data\n", "3. Create Sparse Embeddings: Create SPLADE++ embeddings for the data\n", - "4. Create Dense Embeddings: Create BGE-Base-en-v1.5 embeddings for the data\n", + "4. Create Dense Embeddings: Create BGE-Large-en-v1.5 embeddings for the data\n", "5. Indexing: Index the embeddings using Qdrant\n", "6. Search: Perform Hybrid Search using FastEmbed & Qdrant\n", + "7. Ranking: Rank the search results with Reciprocal Rank Fusion (RRF)\n", + "8. Evaluation: Evaluate the search results\n", + "9. 
Conclusion: Summarize the results\n", "\n", "## Setup\n", "\n", @@ -38,7 +41,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 41, "metadata": {}, "outputs": [ { @@ -47,7 +50,7 @@ "'0.2.5'" ] }, - "execution_count": 2, + "execution_count": 41, "metadata": {}, "output_type": "execute_result" } @@ -70,6 +73,7 @@ " SparseIndexParams,\n", " SparseVectorParams,\n", " VectorParams,\n", + " ScoredPoint,\n", ")\n", "from transformers import AutoTokenizer\n", "\n", @@ -784,7 +788,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 47, "metadata": {}, "outputs": [], "source": [ @@ -792,13 +796,14 @@ " sparse_vectors = df[\"sparse_embedding\"].tolist()\n", " product_texts = df[\"combined_text\"].tolist()\n", " dense_vectors = df[\"dense_embedding\"].tolist()\n", + " rows = df.to_dict(orient=\"records\")\n", " points = []\n", " for idx, (text, sparse_vector, dense_vector) in enumerate(zip(product_texts, sparse_vectors, dense_vectors)):\n", " # print(sparse_vector)\n", " sparse_vector = SparseVector(indices=sparse_vector.indices.tolist(), values=sparse_vector.values.tolist())\n", " point = PointStruct(\n", " id=idx,\n", - " payload={\"text\": text}, # Add any additional payload if necessary\n", + " payload={\"text\": text, \"product_id\": rows[idx]['product_id']}, # Add any additional payload if necessary\n", " vector={\n", " \"text-sparse\": sparse_vector,\n", " \"text-dense\": dense_vector.tolist(),\n", @@ -813,7 +818,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 48, "metadata": {}, "outputs": [ { @@ -822,7 +827,7 @@ "UpdateResult(operation_id=0, status=)" ] }, - "execution_count": 23, + "execution_count": 48, "metadata": {}, "output_type": "execute_result" } @@ -831,80 +836,357 @@ "client.upsert(collection_name, points)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Search" + ] + }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 55, + 
"metadata": {}, + "outputs": [], + "source": [ + "def search(query_text: str):\n", + " # # Compute sparse and dense vectors\n", + " query_sparse_vectors: List[SparseEmbedding] = make_sparse_embedding(query_text)\n", + " query_dense_vector: List[np.ndarray] = make_dense_embedding(query_text)\n", + "\n", + " search_results = client.search_batch(\n", + " collection_name=collection_name,\n", + " requests=[\n", + " SearchRequest(\n", + " vector=NamedVector(\n", + " name=\"text-dense\",\n", + " vector=query_dense_vector[0],\n", + " ),\n", + " limit=10,\n", + " with_payload=True,\n", + " ),\n", + " SearchRequest(\n", + " vector=NamedSparseVector(\n", + " name=\"text-sparse\",\n", + " vector=SparseVector(\n", + " indices=query_sparse_vectors[0].indices.tolist(),\n", + " values=query_sparse_vectors[0].values.tolist(),\n", + " ),\n", + " ),\n", + " limit=10,\n", + " with_payload=True,\n", + " ),\n", + " ],\n", + " )\n", + "\n", + " return search_results\n", + "\n", + "query_text = \"panasonic fans\"\n", + "search_results = search(query_text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Ranking" + ] + }, + { + "cell_type": "code", + "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "[[ScoredPoint(id=0, version=0, score=0.7569294929688511, payload={'text': 'Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceiling Mounted Fan\\nPanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceiling Mounted Fan\\nPanasonic\\nWhite\\nNone\\nWhisperCeiling fans feature a totally enclosed condenser motor and a double-tapered, dolphin-shaped bladed blower wheel to quietly move air\\nDesigned to give you continuous, trouble-free operation for many years thanks in part to its high-quality components and permanently lubricated motors which wear at a slower pace\\nDetachable adaptors, firmly secured duct ends, adjustable mounting brackets (up to 26-in), fan/motor units that detach easily from the housing and uncomplicated wiring all lend themselves 
to user-friendly installation\\nThis Panasonic fan has a built-in damper to prevent backdraft, which helps to prevent outside air from coming through the fan\\n0.35 amp\\nWhisperCeiling fans feature a totally enclosed condenser motor and a double-tapered, dolphin-shaped bladed blower wheel to quietly move air\\nDesigned to give you continuous, trouble-free operation for many years thanks in part to its high-quality components and permanently lubricated motors which wear at a slower pace\\nDetachable adaptors, firmly secured duct ends, adjustable mounting brackets (up to 26-in), fan/motor units that detach easily from the housing and uncomplicated wiring all lend themselves to user-friendly installation\\nThis Panasonic fan has a built-in damper to prevent backdraft, which helps to prevent outside air from coming through the fan\\n0.35 amp'}, vector=None, shard_key=None),\n", - " ScoredPoint(id=11, version=0, score=0.7182552708165841, payload={'text': 'Panasonic FV-0811VF5 WhisperFit EZ Retrofit Ventilation Fan, 80 or 110 CFM\\nPanasonic FV-0811VF5 WhisperFit EZ Retrofit Ventilation Fan, 80 or 110 CFM\\nPanasonic\\nWhite\\nNone\\nRetrofit Solution: Ideal for residential remodeling, hotel construction or renovations\\nLow Profile: 5-5/8-Inch housing depth fits in a 2 x 6 construction\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 80 or 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nRetrofit Solution: Ideal for residential remodeling, hotel construction or renovations\\nLow Profile: 5-5/8-Inch housing depth fits in a 2 x 6 construction\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 80 or 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy'}, vector=None, 
shard_key=None),\n", - " ScoredPoint(id=7, version=0, score=0.7113100728799315, payload={'text': 'Panasonic FV-0510VS1 WhisperValue DC Ventilation Fan, 50-80-100 CFM\\nPanasonic FV-0510VS1 WhisperValue DC Ventilation Fan, 50-80-100 CFM\\nPanasonic\\nWhite\\nNone\\nInstallation: Features a low profile can ideal for residential construction; celiing or wall mount\\nPrecision Spot Ventilation: Quiet and powerful ventilation while removing moisture and pollution\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 100 CFM\\nSlimmest Design: With a 3-3/8-Inch housing depth, WhisperValue DC is slimmest design available\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nInstallation: Features a low profile can ideal for residential construction; celiing or wall mount\\nPrecision Spot Ventilation: Quiet and powerful ventilation while removing moisture and pollution\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 100 CFM\\nSlimmest Design: With a 3-3/8-Inch housing depth, WhisperValue DC is slimmest design available\\nEnergy Star Rated: Delivers powerful airflow without wasting energy'}, vector=None, shard_key=None),\n", - " ScoredPoint(id=4, version=0, score=0.7076588242072893, payload={'text': 'Panasonic FV-08VRE2 Ventilation Fan with Recessed LED (Renewed)\\nPanasonic FV-08VRE2 Ventilation Fan with Recessed LED (Renewed)\\nPanasonic\\nWhite\\nNone\\nThe design solution for Fan/light combinations\\nEnergy Star rated architectural grade recessed Fan/LED light\\nQuiet, energy efficient and powerful 80 CFM ventilation hidden above the Ceiling\\nLED lamp is dimmable\\nBeautiful Lighting with 6-1/2”aperture and advanced luminaire design\\nThe design solution for Fan/light combinations\\nEnergy Star rated architectural grade recessed Fan/LED light\\nQuiet, energy efficient and powerful 80 CFM ventilation hidden above the Ceiling\\nLED lamp is dimmable\\nBeautiful Lighting with 6-1/2”aperture and 
advanced luminaire design'}, vector=None, shard_key=None),\n", - " ScoredPoint(id=6, version=0, score=0.7005667734073653, payload={'text': 'Panasonic FV-0510VSL1 WhisperValue DC Ventilation Fan with Light, 50-80-100 CFM\\nPanasonic FV-0510VSL1 WhisperValue DC Ventilation Fan with Light, 50-80-100 CFM\\nPanasonic\\nWhite\\nNone\\nInstallation: Features a low profile can ideal for residential construction; celiing or wall mount\\nArchitectural Design: Architectural grade light fixture that gives powerful, yet quiet ventilation\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 100 CFM\\nSlimmest Design: With a 3-3/8-Inch housing depth, WhisperValue DC is slimmest design available\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nInstallation: Features a low profile can ideal for residential construction; celiing or wall mount\\nArchitectural Design: Architectural grade light fixture that gives powerful, yet quiet ventilation\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 100 CFM\\nSlimmest Design: With a 3-3/8-Inch housing depth, WhisperValue DC is slimmest design available\\nEnergy Star Rated: Delivers powerful airflow without wasting energy'}, vector=None, shard_key=None),\n", - " ScoredPoint(id=5, version=0, score=0.6900119765792, payload={'text': 'Panasonic FV-0511VQ1 WhisperCeiling DC Ventilation Fan, Speed Selector, SmartFlow Technology, Quiet,White\\nPanasonic FV-0511VQ1 WhisperCeiling DC Ventilation Fan, Speed Selector, SmartFlow Technology, Quiet,White\\nPanasonic\\nWhite\\nNone\\nInstallation: Features a 4-inch or 6-inch duct adaptor ideal for new construction and renovations\\nPrecision Spot Ventilation: Quiet and powerful ventilation while removing moisture and pollution\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: 
Delivers powerful airflow without wasting energy\\nInstallation: Features a 4-inch or 6-inch duct adaptor ideal for new construction and renovations\\nPrecision Spot Ventilation: Quiet and powerful ventilation while removing moisture and pollution\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy'}, vector=None, shard_key=None),\n", - " ScoredPoint(id=9, version=0, score=0.6695256112486503, payload={'text': \"Delta Electronics (Americas) Ltd. RAD80 Delta BreezRadiance Series 80 CFM Fan with Heater, 10.5W, 1.5 Sones\\nDelta Electronics (Americas) Ltd. RAD80 Delta BreezRadiance Series 80 CFM Fan with Heater, 10.5W, 1.5 Sones\\nDELTA ELECTRONICS (AMERICAS) LTD.\\nWith Heater\\nNone\\nQuiet operation at 1.5 Sones\\nPrecision engineered with DC brushless motor for extended reliability, this Fan will outlast many household appliances\\nGalvanized steel construction resists corrosion, equipped with metal duct adapter\\nFan impeller Stops If obstructed, for safe worry-free operation\\nPeace of mind quality, performance and reliability from the world's largest DC brushless Fan Manufacturer\\nQuiet operation at 1.5 Sones\\nPrecision engineered with DC brushless motor for extended reliability, this Fan will outlast many household appliances\\nGalvanized steel construction resists corrosion, equipped with metal duct adapter\\nFan impeller Stops If obstructed, for safe worry-free operation\\nPeace of mind quality, performance and reliability from the world's largest DC brushless Fan Manufacturer\"}, vector=None, shard_key=None),\n", - " ScoredPoint(id=1, version=0, score=0.6501192046745539, payload={'text': \"Homewerks 7141-80 Bathroom Fan Integrated LED Light Ceiling Mount Exhaust Ventilation, 1.1 Sones, 80 CFM\\nHomewerks 7141-80 Bathroom Fan Integrated LED Light Ceiling 
Mount Exhaust Ventilation, 1.1 Sones, 80 CFM\\nHomewerks\\n80 CFM\\nNone\\nOUTSTANDING PERFORMANCE: This Homewerk's bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1.1 sones at 80 CFM which means it’s able to manage spaces up to 80 square feet and is very quiet..\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a modern style round shape and has an 4000K Cool White Light LED Light. AC motor.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in. 13 in round grill and 4 in round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\\nOUTSTANDING PERFORMANCE: This Homewerk's bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1.1 sones at 80 CFM which means it’s able to manage spaces up to 80 square feet and is very quiet..\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a modern style round shape and has an 4000K Cool White Light LED Light. 
AC motor.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in. 13 in round grill and 4 in round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\"}, vector=None, shard_key=None),\n", - " ScoredPoint(id=3, version=0, score=0.6466917604403921, payload={'text': 'Delta Electronics RAD80L BreezRadiance 80 CFM Heater/Fan/Light Combo White (Renewed)\\nDelta Electronics RAD80L BreezRadiance 80 CFM Heater/Fan/Light Combo White (Renewed)\\nDELTA ELECTRONICS (AMERICAS) LTD.\\nWhite\\nThis pre-owned or refurbished product has been professionally inspected and tested to work and look like new. How a product becomes part of Amazon Renewed, your destination for pre-owned, refurbished products: A customer buys a new product and returns it or trades it in for a newer or different model. That product is inspected and tested to work and look like new by Amazon-qualified suppliers. Then, the product is sold as an Amazon Renewed product on Amazon. If not satisfied with the purchase, renewed products are eligible for replacement or refund under the Amazon Renewed Guarantee.\\nQuiet operation at 1.5 sones\\nBuilt-in thermostat regulates temperature. Energy efficiency at 7.6 CFM/Watt\\nPrecision engineered with DC brushless motor for extended reliability, this fan will outlast many household appliances\\nGalvanized steel construction resists corrosion\\nDuct: Detachable 4-inch Plastic Duct Adapter\\nQuiet operation at 1.5 sones\\nBuilt-in thermostat regulates temperature. 
Energy efficiency at 7.6 CFM/Watt\\nPrecision engineered with DC brushless motor for extended reliability, this fan will outlast many household appliances\\nGalvanized steel construction resists corrosion\\nDuct: Detachable 4-inch Plastic Duct Adapter'}, vector=None, shard_key=None),\n", - " ScoredPoint(id=2, version=0, score=0.6462960006277003, payload={'text': 'Homewerks 7140-80 Bathroom Fan Ceiling Mount Exhaust Ventilation, 1.5 Sones, 80 CFM, White\\nHomewerks 7140-80 Bathroom Fan Ceiling Mount Exhaust Ventilation, 1.5 Sones, 80 CFM, White\\nHomewerks\\nWhite\\nNone\\nOUTSTANDING PERFORMANCE: This Homewerk\\'s bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1. 5 sone at 110 CFM which means it’s able to manage spaces up to 110 square feet\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a grille modern style.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in and a 4\" round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\\nOUTSTANDING PERFORMANCE: This Homewerk\\'s bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1. 
5 sone at 110 CFM which means it’s able to manage spaces up to 110 square feet\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a grille modern style.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in and a 4\" round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited'}, vector=None, shard_key=None)],\n", - " [ScoredPoint(id=0, version=0, score=22.959232330322266, payload={'text': 'Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceiling Mounted Fan\\nPanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceiling Mounted Fan\\nPanasonic\\nWhite\\nNone\\nWhisperCeiling fans feature a totally enclosed condenser motor and a double-tapered, dolphin-shaped bladed blower wheel to quietly move air\\nDesigned to give you continuous, trouble-free operation for many years thanks in part to its high-quality components and permanently lubricated motors which wear at a slower pace\\nDetachable adaptors, firmly secured duct ends, adjustable mounting brackets (up to 26-in), fan/motor units that detach easily from the housing and uncomplicated wiring all lend themselves to user-friendly installation\\nThis Panasonic fan has a built-in damper to prevent backdraft, which helps to prevent outside air from coming through the fan\\n0.35 amp\\nWhisperCeiling fans feature a totally enclosed condenser motor and a 
double-tapered, dolphin-shaped bladed blower wheel to quietly move air\\nDesigned to give you continuous, trouble-free operation for many years thanks in part to its high-quality components and permanently lubricated motors which wear at a slower pace\\nDetachable adaptors, firmly secured duct ends, adjustable mounting brackets (up to 26-in), fan/motor units that detach easily from the housing and uncomplicated wiring all lend themselves to user-friendly installation\\nThis Panasonic fan has a built-in damper to prevent backdraft, which helps to prevent outside air from coming through the fan\\n0.35 amp'}, vector=None, shard_key=None),\n", - " ScoredPoint(id=5, version=0, score=20.864572525024414, payload={'text': 'Panasonic FV-0511VQ1 WhisperCeiling DC Ventilation Fan, Speed Selector, SmartFlow Technology, Quiet,White\\nPanasonic FV-0511VQ1 WhisperCeiling DC Ventilation Fan, Speed Selector, SmartFlow Technology, Quiet,White\\nPanasonic\\nWhite\\nNone\\nInstallation: Features a 4-inch or 6-inch duct adaptor ideal for new construction and renovations\\nPrecision Spot Ventilation: Quiet and powerful ventilation while removing moisture and pollution\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nInstallation: Features a 4-inch or 6-inch duct adaptor ideal for new construction and renovations\\nPrecision Spot Ventilation: Quiet and powerful ventilation while removing moisture and pollution\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy'}, vector=None, shard_key=None),\n", - " ScoredPoint(id=6, version=0, score=20.659486770629883, payload={'text': 
'Panasonic FV-0510VSL1 WhisperValue DC Ventilation Fan with Light, 50-80-100 CFM\\nPanasonic FV-0510VSL1 WhisperValue DC Ventilation Fan with Light, 50-80-100 CFM\\nPanasonic\\nWhite\\nNone\\nInstallation: Features a low profile can ideal for residential construction; celiing or wall mount\\nArchitectural Design: Architectural grade light fixture that gives powerful, yet quiet ventilation\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 100 CFM\\nSlimmest Design: With a 3-3/8-Inch housing depth, WhisperValue DC is slimmest design available\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nInstallation: Features a low profile can ideal for residential construction; celiing or wall mount\\nArchitectural Design: Architectural grade light fixture that gives powerful, yet quiet ventilation\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 100 CFM\\nSlimmest Design: With a 3-3/8-Inch housing depth, WhisperValue DC is slimmest design available\\nEnergy Star Rated: Delivers powerful airflow without wasting energy'}, vector=None, shard_key=None),\n", - " ScoredPoint(id=4, version=0, score=20.283283233642578, payload={'text': 'Panasonic FV-08VRE2 Ventilation Fan with Recessed LED (Renewed)\\nPanasonic FV-08VRE2 Ventilation Fan with Recessed LED (Renewed)\\nPanasonic\\nWhite\\nNone\\nThe design solution for Fan/light combinations\\nEnergy Star rated architectural grade recessed Fan/LED light\\nQuiet, energy efficient and powerful 80 CFM ventilation hidden above the Ceiling\\nLED lamp is dimmable\\nBeautiful Lighting with 6-1/2”aperture and advanced luminaire design\\nThe design solution for Fan/light combinations\\nEnergy Star rated architectural grade recessed Fan/LED light\\nQuiet, energy efficient and powerful 80 CFM ventilation hidden above the Ceiling\\nLED lamp is dimmable\\nBeautiful Lighting with 6-1/2”aperture and advanced luminaire design'}, vector=None, shard_key=None),\n", - " 
ScoredPoint(id=7, version=0, score=20.057729721069336, payload={'text': 'Panasonic FV-0510VS1 WhisperValue DC Ventilation Fan, 50-80-100 CFM\\nPanasonic FV-0510VS1 WhisperValue DC Ventilation Fan, 50-80-100 CFM\\nPanasonic\\nWhite\\nNone\\nInstallation: Features a low profile can ideal for residential construction; celiing or wall mount\\nPrecision Spot Ventilation: Quiet and powerful ventilation while removing moisture and pollution\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 100 CFM\\nSlimmest Design: With a 3-3/8-Inch housing depth, WhisperValue DC is slimmest design available\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nInstallation: Features a low profile can ideal for residential construction; celiing or wall mount\\nPrecision Spot Ventilation: Quiet and powerful ventilation while removing moisture and pollution\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 50, 80, 100 CFM\\nSlimmest Design: With a 3-3/8-Inch housing depth, WhisperValue DC is slimmest design available\\nEnergy Star Rated: Delivers powerful airflow without wasting energy'}, vector=None, shard_key=None),\n", - " ScoredPoint(id=11, version=0, score=20.000377655029297, payload={'text': 'Panasonic FV-0811VF5 WhisperFit EZ Retrofit Ventilation Fan, 80 or 110 CFM\\nPanasonic FV-0811VF5 WhisperFit EZ Retrofit Ventilation Fan, 80 or 110 CFM\\nPanasonic\\nWhite\\nNone\\nRetrofit Solution: Ideal for residential remodeling, hotel construction or renovations\\nLow Profile: 5-5/8-Inch housing depth fits in a 2 x 6 construction\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 80 or 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nRetrofit Solution: Ideal for residential remodeling, hotel construction or renovations\\nLow Profile: 5-5/8-Inch housing depth fits in a 2 x 6 
construction\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 80 or 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy'}, vector=None, shard_key=None),\n", - " ScoredPoint(id=9, version=0, score=8.690065383911133, payload={'text': \"Delta Electronics (Americas) Ltd. RAD80 Delta BreezRadiance Series 80 CFM Fan with Heater, 10.5W, 1.5 Sones\\nDelta Electronics (Americas) Ltd. RAD80 Delta BreezRadiance Series 80 CFM Fan with Heater, 10.5W, 1.5 Sones\\nDELTA ELECTRONICS (AMERICAS) LTD.\\nWith Heater\\nNone\\nQuiet operation at 1.5 Sones\\nPrecision engineered with DC brushless motor for extended reliability, this Fan will outlast many household appliances\\nGalvanized steel construction resists corrosion, equipped with metal duct adapter\\nFan impeller Stops If obstructed, for safe worry-free operation\\nPeace of mind quality, performance and reliability from the world's largest DC brushless Fan Manufacturer\\nQuiet operation at 1.5 Sones\\nPrecision engineered with DC brushless motor for extended reliability, this Fan will outlast many household appliances\\nGalvanized steel construction resists corrosion, equipped with metal duct adapter\\nFan impeller Stops If obstructed, for safe worry-free operation\\nPeace of mind quality, performance and reliability from the world's largest DC brushless Fan Manufacturer\"}, vector=None, shard_key=None),\n", - " ScoredPoint(id=2, version=0, score=8.60843563079834, payload={'text': 'Homewerks 7140-80 Bathroom Fan Ceiling Mount Exhaust Ventilation, 1.5 Sones, 80 CFM, White\\nHomewerks 7140-80 Bathroom Fan Ceiling Mount Exhaust Ventilation, 1.5 Sones, 80 CFM, White\\nHomewerks\\nWhite\\nNone\\nOUTSTANDING PERFORMANCE: This Homewerk\\'s bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1. 
5 sone at 110 CFM which means it’s able to manage spaces up to 110 square feet\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a grille modern style.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in and a 4\" round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\\nOUTSTANDING PERFORMANCE: This Homewerk\\'s bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1. 5 sone at 110 CFM which means it’s able to manage spaces up to 110 square feet\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a grille modern style.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in and a 4\" round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. 
We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited'}, vector=None, shard_key=None),\n", - " ScoredPoint(id=1, version=0, score=8.450508117675781, payload={'text': \"Homewerks 7141-80 Bathroom Fan Integrated LED Light Ceiling Mount Exhaust Ventilation, 1.1 Sones, 80 CFM\\nHomewerks 7141-80 Bathroom Fan Integrated LED Light Ceiling Mount Exhaust Ventilation, 1.1 Sones, 80 CFM\\nHomewerks\\n80 CFM\\nNone\\nOUTSTANDING PERFORMANCE: This Homewerk's bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1.1 sones at 80 CFM which means it’s able to manage spaces up to 80 square feet and is very quiet..\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a modern style round shape and has an 4000K Cool White Light LED Light. AC motor.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in. 13 in round grill and 4 in round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\\nOUTSTANDING PERFORMANCE: This Homewerk's bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. 
This exhaust fan is 1.1 sones at 80 CFM which means it’s able to manage spaces up to 80 square feet and is very quiet..\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a modern style round shape and has an 4000K Cool White Light LED Light. AC motor.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in. 13 in round grill and 4 in round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\"}, vector=None, shard_key=None),\n", - " ScoredPoint(id=14, version=0, score=8.03786563873291, payload={'text': 'Broan Very Quiet Ceiling Bathroom Exhaust Fan, ENERGY STAR Certified, 0.3 Sones, 80 CFM\\nBroan Very Quiet Ceiling Bathroom Exhaust Fan, ENERGY STAR Certified, 0.3 Sones, 80 CFM\\nBroan-NuTone\\nWhite\\nNone\\nHIGH-QUALITY FAN: Very quiet, energy efficient exhaust fan runs on 0. 3 Sones and is motor engineered for continuous operation\\nEFFICIENT: Operates at 80 CFM in bathrooms up to 75 sq. ft. for a high-quality performance. Dimmable Capability: Non Dimmable\\nEASY INSTALLATION: Fan is easy to install and/or replace existing product for DIY\\'ers and needs only 2\" x 8\" construction space. 
Can be used over bathtubs or showers when connected to a GFCI protected branch circuit\\nFEATURES: Includes hanger bar system for fast, flexible installation for all types of construction and a 6\" ducting for superior performance\\nCERTIFIED: ENERGY STAR qualified and HVI Certified to ensure the best quality for your home\\nHIGH-QUALITY FAN: Very quiet, energy efficient exhaust fan runs on 0. 3 Sones and is motor engineered for continuous operation\\nEFFICIENT: Operates at 80 CFM in bathrooms up to 75 sq. ft. for a high-quality performance. Dimmable Capability: Non Dimmable\\nEASY INSTALLATION: Fan is easy to install and/or replace existing product for DIY\\'ers and needs only 2\" x 8\" construction space. Can be used over bathtubs or showers when connected to a GFCI protected branch circuit\\nFEATURES: Includes hanger bar system for fast, flexible installation for all types of construction and a 6\" ducting for superior performance\\nCERTIFIED: ENERGY STAR qualified and HVI Certified to ensure the best quality for your home'}, vector=None, shard_key=None)]]" + "[('A', 0.033465871107430434),\n", + " ('B', 0.033465871107430434),\n", + " ('D', 0.03320985472238179),\n", + " ('C', 0.03294544435749548),\n", + " ('E', 0.01775980832584606)]" ] }, - "execution_count": 24, + "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "query_text = \"panasonic fans\"\n", + "def rrf(rank_lists, alpha=60, default_rank=1000):\n", + " \"\"\"\n", + " Optimized Reciprocal Rank Fusion (RRF) using NumPy for large rank lists.\n", "\n", - "# # Compute sparse and dense vectors\n", - "query_sparse_vectors: List[SparseEmbedding] = make_sparse_embedding(query_text)\n", - "query_dense_vector: List[np.ndarray] = make_dense_embedding(query_text)\n", - "\n", - "client.search_batch(\n", - " collection_name=collection_name,\n", - " requests=[\n", - " SearchRequest(\n", - " vector=NamedVector(\n", - " name=\"text-dense\",\n", - " vector=query_dense_vector[0],\n", - 
" ),\n", - " limit=10,\n", - " with_payload=True,\n", - " ),\n", - " SearchRequest(\n", - " vector=NamedSparseVector(\n", - " name=\"text-sparse\",\n", - " vector=SparseVector(\n", - " indices=query_sparse_vectors[0].indices.tolist(),\n", - " values=query_sparse_vectors[0].values.tolist(),\n", - " ),\n", - " ),\n", - " limit=10,\n", - " with_payload=True,\n", - " ),\n", - " ],\n", - ")" + " :param rank_lists: A list of rank lists. Each rank list should be a list of (item, rank) tuples.\n", + " :param alpha: The parameter alpha used in the RRF formula. Default is 60.\n", + " :param default_rank: The default rank assigned to items not present in a rank list. Default is 1000.\n", + " :return: Sorted list of items based on their RRF scores.\n", + " \"\"\"\n", + " # Consolidate all unique items from all rank lists\n", + " all_items = set(item for rank_list in rank_lists for item, _ in rank_list)\n", + "\n", + " # Create a mapping of items to indices\n", + " item_to_index = {item: idx for idx, item in enumerate(all_items)}\n", + "\n", + " # Initialize a matrix to hold the ranks, filled with the default rank\n", + " rank_matrix = np.full((len(all_items), len(rank_lists)), default_rank)\n", + "\n", + " # Fill in the actual ranks from the rank lists\n", + " for list_idx, rank_list in enumerate(rank_lists):\n", + " for item, rank in rank_list:\n", + " rank_matrix[item_to_index[item], list_idx] = rank\n", + "\n", + " # Calculate RRF scores using NumPy operations\n", + " rrf_scores = np.sum(1.0 / (alpha + rank_matrix), axis=1)\n", + "\n", + " # Sort items based on RRF scores\n", + " sorted_indices = np.argsort(-rrf_scores) # Negative for descending order\n", + "\n", + " # Retrieve sorted items\n", + " sorted_items = [(list(item_to_index.keys())[idx], rrf_scores[idx]) for idx in sorted_indices]\n", + "\n", + " return sorted_items\n", + "\n", + "# Example usage\n", + "rank_list1 = [('A', 1), ('B', 2), ('C', 3)]\n", + "rank_list2 = [('B', 1), ('C', 2), ('D', 3)]\n", + "rank_list3 
= [('A', 2), ('D', 1), ('E', 3)]\n", + "\n", + "# Combine the rank lists\n", + "sorted_items = rrf([rank_list1, rank_list2, rank_list3])\n", + "sorted_items" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Based on this, let's convert our sparse and dense results into rank lists. And then, we'll use the Reciprocal Rank Fusion (RRF) algorithm to combine them." + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [], + "source": [ + "def rank_list(search_result: List[ScoredPoint]):\n", + " return [(point.id, rank+1) for rank, point in enumerate(search_result)]\n", + "\n", + "dense_rank_list, sparse_rank_list = rank_list(search_results[0]), rank_list(search_results[1])" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 52, "metadata": {}, "outputs": [], - "source": [] + "source": [ + "rrf_rank_list = rrf([dense_rank_list, sparse_rank_list])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Evaluation\n", + "\n", + "Unlike a traditional IR dataset, we have ESCI labels: Exact, Substitute, Complementary, and Irrelevant.\n", + "\n", + "To get a sense of how well our search performs, we'll measure the number of \"Exact\" labels in the top-k search results." + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
example_idqueryquery_idproduct_idproduct_localeesci_labelsmall_versionlarge_versionproduct_titleproduct_descriptionproduct_bullet_pointproduct_brandproduct_colorproduct_text
00revent 80 cfm0B000MOO21WusIrrelevant01Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...NoneWhisperCeiling fans feature a totally enclosed...PanasonicWhitePanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...
1291891bathroom fan without light13723B000MOO21WusExact11Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...NoneWhisperCeiling fans feature a totally enclosed...PanasonicWhitePanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...
21revent 80 cfm0B07X3Y6B1VusExact01Homewerks 7141-80 Bathroom Fan Integrated LED ...NoneOUTSTANDING PERFORMANCE: This Homewerk's bath ...Homewerks80 CFMHomewerks 7141-80 Bathroom Fan Integrated LED ...
32revent 80 cfm0B07WDM7MQQusExact01Homewerks 7140-80 Bathroom Fan Ceiling Mount E...NoneOUTSTANDING PERFORMANCE: This Homewerk's bath ...HomewerksWhiteHomewerks 7140-80 Bathroom Fan Ceiling Mount E...
43revent 80 cfm0B07RH6Z8KWusExact01Delta Electronics RAD80L BreezRadiance 80 CFM ...This pre-owned or refurbished product has been...Quiet operation at 1.5 sones\\nBuilt-in thermos...DELTA ELECTRONICS (AMERICAS) LTD.WhiteDelta Electronics RAD80L BreezRadiance 80 CFM ...
\n", + "
" + ], + "text/plain": [ + " example_id query query_id product_id \\\n", + "0 0 revent 80 cfm 0 B000MOO21W \n", + "1 291891 bathroom fan without light 13723 B000MOO21W \n", + "2 1 revent 80 cfm 0 B07X3Y6B1V \n", + "3 2 revent 80 cfm 0 B07WDM7MQQ \n", + "4 3 revent 80 cfm 0 B07RH6Z8KW \n", + "\n", + " product_locale esci_label small_version large_version \\\n", + "0 us Irrelevant 0 1 \n", + "1 us Exact 1 1 \n", + "2 us Exact 0 1 \n", + "3 us Exact 0 1 \n", + "4 us Exact 0 1 \n", + "\n", + " product_title \\\n", + "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", + "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", + "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... \n", + "\n", + " product_description \\\n", + "0 None \n", + "1 None \n", + "2 None \n", + "3 None \n", + "4 This pre-owned or refurbished product has been... \n", + "\n", + " product_bullet_point \\\n", + "0 WhisperCeiling fans feature a totally enclosed... \n", + "1 WhisperCeiling fans feature a totally enclosed... \n", + "2 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", + "3 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", + "4 Quiet operation at 1.5 sones\\nBuilt-in thermos... \n", + "\n", + " product_brand product_color \\\n", + "0 Panasonic White \n", + "1 Panasonic White \n", + "2 Homewerks 80 CFM \n", + "3 Homewerks White \n", + "4 DELTA ELECTRONICS (AMERICAS) LTD. White \n", + "\n", + " product_text \n", + "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", + "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", + "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... 
" + ] + }, + "execution_count": 58, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "source_df.head()" + ] } ], "metadata": { From 3745a87f913ccb747b119a79f3382f5c02b21654 Mon Sep 17 00:00:00 2001 From: Nirant Kasliwal Date: Wed, 27 Mar 2024 18:51:43 +0530 Subject: [PATCH 06/10] Refactor code to improve performance and readability --- docs/examples/Hybrid_Search.ipynb | 1039 +++++++----------------- docs/how-to/Hybrid_Search.ipynb | 1213 ----------------------------- 2 files changed, 263 insertions(+), 1989 deletions(-) delete mode 100644 docs/how-to/Hybrid_Search.ipynb diff --git a/docs/examples/Hybrid_Search.ipynb b/docs/examples/Hybrid_Search.ipynb index 5319a5f1..6a0bcf9f 100644 --- a/docs/examples/Hybrid_Search.ipynb +++ b/docs/examples/Hybrid_Search.ipynb @@ -18,9 +18,10 @@ "1. Setup: Download and install the required dependencies\n", "2. Preview data: Load and preview the data\n", "3. Create Sparse Embeddings: Create SPLADE++ embeddings for the data\n", - "4. Create Dense Embeddings: Create BGE-Base-en-v1.5 embeddings for the data\n", + "4. Create Dense Embeddings: Create BGE-Large-en-v1.5 embeddings for the data\n", "5. Indexing: Index the embeddings using Qdrant\n", "6. Search: Perform Hybrid Search using FastEmbed & Qdrant\n", + "7. 
Ranking: Rank the search results with Reciprocal Rank Fusion (RRF)\n", "\n", "## Setup\n", "\n", @@ -56,10 +57,23 @@ "import json\n", "from typing import List\n", "\n", + "import numpy as np\n", + "import pandas as pd\n", "from datasets import load_dataset\n", "from qdrant_client import QdrantClient\n", + "from qdrant_client.http.models import (\n", + " Distance,\n", + " NamedSparseVector,\n", + " NamedVector,\n", + " SparseVector,\n", + " PointStruct,\n", + " SearchRequest,\n", + " SparseIndexParams,\n", + " SparseVectorParams,\n", + " VectorParams,\n", + " ScoredPoint,\n", + ")\n", "from transformers import AutoTokenizer\n", - "from qdrant_client.http.models import VectorParams, SparseVectorParams, Distance, SparseIndexParams, PointStruct\n", "\n", "import fastembed\n", "from fastembed.sparse.sparse_text_embedding import SparseEmbedding, SparseTextEmbedding\n", @@ -75,8 +89,10 @@ "outputs": [], "source": [ "dataset = load_dataset(\"tasksource/esci\")\n", - "# We'll select the first 100 examples for this demo\n", - "dataset = dataset[\"train\"].select(range(100))" + "# We'll select the first 1000 examples for this demo\n", + "dataset = dataset[\"train\"].select(range(1000))\n", + "dataset = dataset.filter(lambda x: x['product_locale'] == \"us\")\n", + "dataset" ] }, { @@ -88,407 +104,39 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
example_idqueryquery_idproduct_idproduct_localeesci_labelsmall_versionlarge_versionproduct_titleproduct_descriptionproduct_bullet_pointproduct_brandproduct_colorproduct_text
00revent 80 cfm0B000MOO21WusIrrelevant01Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...NoneWhisperCeiling fans feature a totally enclosed...PanasonicWhitePanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...
1291891bathroom fan without light13723B000MOO21WusExact11Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...NoneWhisperCeiling fans feature a totally enclosed...PanasonicWhitePanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...
21revent 80 cfm0B07X3Y6B1VusExact01Homewerks 7141-80 Bathroom Fan Integrated LED ...NoneOUTSTANDING PERFORMANCE: This Homewerk's bath ...Homewerks80 CFMHomewerks 7141-80 Bathroom Fan Integrated LED ...
32revent 80 cfm0B07WDM7MQQusExact01Homewerks 7140-80 Bathroom Fan Ceiling Mount E...NoneOUTSTANDING PERFORMANCE: This Homewerk's bath ...HomewerksWhiteHomewerks 7140-80 Bathroom Fan Ceiling Mount E...
43revent 80 cfm0B07RH6Z8KWusExact01Delta Electronics RAD80L BreezRadiance 80 CFM ...This pre-owned or refurbished product has been...Quiet operation at 1.5 sones\\nBuilt-in thermos...DELTA ELECTRONICS (AMERICAS) LTD.WhiteDelta Electronics RAD80L BreezRadiance 80 CFM ...
\n", - "
" - ], - "text/plain": [ - " example_id query query_id product_id \\\n", - "0 0 revent 80 cfm 0 B000MOO21W \n", - "1 291891 bathroom fan without light 13723 B000MOO21W \n", - "2 1 revent 80 cfm 0 B07X3Y6B1V \n", - "3 2 revent 80 cfm 0 B07WDM7MQQ \n", - "4 3 revent 80 cfm 0 B07RH6Z8KW \n", - "\n", - " product_locale esci_label small_version large_version \\\n", - "0 us Irrelevant 0 1 \n", - "1 us Exact 1 1 \n", - "2 us Exact 0 1 \n", - "3 us Exact 0 1 \n", - "4 us Exact 0 1 \n", - "\n", - " product_title \\\n", - "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", - "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", - "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... \n", - "\n", - " product_description \\\n", - "0 None \n", - "1 None \n", - "2 None \n", - "3 None \n", - "4 This pre-owned or refurbished product has been... \n", - "\n", - " product_bullet_point \\\n", - "0 WhisperCeiling fans feature a totally enclosed... \n", - "1 WhisperCeiling fans feature a totally enclosed... \n", - "2 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", - "3 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", - "4 Quiet operation at 1.5 sones\\nBuilt-in thermos... \n", - "\n", - " product_brand product_color \\\n", - "0 Panasonic White \n", - "1 Panasonic White \n", - "2 Homewerks 80 CFM \n", - "3 Homewerks White \n", - "4 DELTA ELECTRONICS (AMERICAS) LTD. White \n", - "\n", - " product_text \n", - "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", - "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", - "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... 
" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "df = dataset.to_pandas()\n", + "source_df = dataset.to_pandas()\n", + "df = source_df.drop_duplicates(subset=[\"product_text\", \"product_title\", \"product_bullet_point\", \"product_brand\"])\n", + "df = df.dropna(subset=[\"product_text\", \"product_title\", \"product_bullet_point\", \"product_brand\"])\n", "df.head()" ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
example_idqueryquery_idproduct_idproduct_localeesci_labelsmall_versionlarge_versionproduct_titleproduct_descriptionproduct_bullet_pointproduct_brandproduct_colorproduct_text
00revent 80 cfm0B000MOO21WusIrrelevant01Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...NoneWhisperCeiling fans feature a totally enclosed...PanasonicWhitePanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...
1291891bathroom fan without light13723B000MOO21WusExact11Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...NoneWhisperCeiling fans feature a totally enclosed...PanasonicWhitePanasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil...
21revent 80 cfm0B07X3Y6B1VusExact01Homewerks 7141-80 Bathroom Fan Integrated LED ...NoneOUTSTANDING PERFORMANCE: This Homewerk's bath ...Homewerks80 CFMHomewerks 7141-80 Bathroom Fan Integrated LED ...
32revent 80 cfm0B07WDM7MQQusExact01Homewerks 7140-80 Bathroom Fan Ceiling Mount E...NoneOUTSTANDING PERFORMANCE: This Homewerk's bath ...HomewerksWhiteHomewerks 7140-80 Bathroom Fan Ceiling Mount E...
43revent 80 cfm0B07RH6Z8KWusExact01Delta Electronics RAD80L BreezRadiance 80 CFM ...This pre-owned or refurbished product has been...Quiet operation at 1.5 sones\\nBuilt-in thermos...DELTA ELECTRONICS (AMERICAS) LTD.WhiteDelta Electronics RAD80L BreezRadiance 80 CFM ...
\n", - "
" - ], - "text/plain": [ - " example_id query query_id product_id \\\n", - "0 0 revent 80 cfm 0 B000MOO21W \n", - "1 291891 bathroom fan without light 13723 B000MOO21W \n", - "2 1 revent 80 cfm 0 B07X3Y6B1V \n", - "3 2 revent 80 cfm 0 B07WDM7MQQ \n", - "4 3 revent 80 cfm 0 B07RH6Z8KW \n", - "\n", - " product_locale esci_label small_version large_version \\\n", - "0 us Irrelevant 0 1 \n", - "1 us Exact 1 1 \n", - "2 us Exact 0 1 \n", - "3 us Exact 0 1 \n", - "4 us Exact 0 1 \n", - "\n", - " product_title \\\n", - "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", - "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", - "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... \n", - "\n", - " product_description \\\n", - "0 None \n", - "1 None \n", - "2 None \n", - "3 None \n", - "4 This pre-owned or refurbished product has been... \n", - "\n", - " product_bullet_point \\\n", - "0 WhisperCeiling fans feature a totally enclosed... \n", - "1 WhisperCeiling fans feature a totally enclosed... \n", - "2 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", - "3 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", - "4 Quiet operation at 1.5 sones\\nBuilt-in thermos... \n", - "\n", - " product_brand product_color \\\n", - "0 Panasonic White \n", - "1 Panasonic White \n", - "2 Homewerks 80 CFM \n", - "3 Homewerks White \n", - "4 DELTA ELECTRONICS (AMERICAS) LTD. White \n", - "\n", - " product_text \n", - "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", - "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", - "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... 
" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "df = df[df.product_locale == \"us\"]\n", - "df = df[df.product_text.notna()]\n", - "df.head()" + "print(f\"Catalog Item Count: {len(df)}\\nQueries: {len(source_df)}\")" ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "100" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], + "source": [ + "df[\"combined_text\"] = df[\"product_title\"] + \"\\n\" + df[\"product_text\"] + \"\\n\" + df[\"product_bullet_point\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "len(df)" ] @@ -502,262 +150,9 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "aabde43c3c3043248e4a1eff6755a17b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "Fetching 9 files: 0%| | 0/9 [00:00 List[PointStruct]:\n", + " sparse_vectors = df[\"sparse_embedding\"].tolist()\n", + " product_texts = df[\"combined_text\"].tolist()\n", + " dense_vectors = df[\"dense_embedding\"].tolist()\n", + " rows = df.to_dict(orient=\"records\")\n", + " points = []\n", + " for idx, (text, sparse_vector, dense_vector) in enumerate(zip(product_texts, sparse_vectors, dense_vectors)):\n", + " # print(sparse_vector)\n", + " sparse_vector = SparseVector(indices=sparse_vector.indices.tolist(), values=sparse_vector.values.tolist())\n", + " point = PointStruct(\n", + " id=idx,\n", + " payload={\"text\": text, \"product_id\": rows[idx]['product_id']}, # Add any additional payload if necessary\n", + " vector={\n", + " \"text-sparse\": sparse_vector,\n", + " \"text-dense\": dense_vector.tolist(),\n", + " },\n", + " )\n", + " 
points.append(point)\n", + " return points\n", + "\n", + "\n", + "points: List[PointStruct] = make_points(df)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "client.upsert(collection_name, points)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Search" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def search(query_text: str):\n", + " # # Compute sparse and dense vectors\n", + " query_sparse_vectors: List[SparseEmbedding] = make_sparse_embedding(query_text)\n", + " query_dense_vector: List[np.ndarray] = make_dense_embedding(query_text)\n", + "\n", + " search_results = client.search_batch(\n", + " collection_name=collection_name,\n", + " requests=[\n", + " SearchRequest(\n", + " vector=NamedVector(\n", + " name=\"text-dense\",\n", + " vector=query_dense_vector[0],\n", + " ),\n", + " limit=10,\n", + " with_payload=True,\n", + " ),\n", + " SearchRequest(\n", + " vector=NamedSparseVector(\n", + " name=\"text-sparse\",\n", + " vector=SparseVector(\n", + " indices=query_sparse_vectors[0].indices.tolist(),\n", + " values=query_sparse_vectors[0].values.tolist(),\n", + " ),\n", + " ),\n", + " limit=10,\n", + " with_payload=True,\n", + " ),\n", + " ],\n", + " )\n", + "\n", + " return search_results\n", + "\n", + "query_text = \"panasonic fans\"\n", + "search_results = search(query_text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Ranking" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def rrf(rank_lists, alpha=60, default_rank=1000):\n", + " \"\"\"\n", + " Optimized Reciprocal Rank Fusion (RRF) using NumPy for large rank lists.\n", + "\n", + " :param rank_lists: A list of rank lists. Each rank list should be a list of (item, rank) tuples.\n", + " :param alpha: The parameter alpha used in the RRF formula. 
Default is 60.\n", + " :param default_rank: The default rank assigned to items not present in a rank list. Default is 1000.\n", + " :return: Sorted list of items based on their RRF scores.\n", + " \"\"\"\n", + " # Consolidate all unique items from all rank lists\n", + " all_items = set(item for rank_list in rank_lists for item, _ in rank_list)\n", + "\n", + " # Create a mapping of items to indices\n", + " item_to_index = {item: idx for idx, item in enumerate(all_items)}\n", + "\n", + " # Initialize a matrix to hold the ranks, filled with the default rank\n", + " rank_matrix = np.full((len(all_items), len(rank_lists)), default_rank)\n", + "\n", + " # Fill in the actual ranks from the rank lists\n", + " for list_idx, rank_list in enumerate(rank_lists):\n", + " for item, rank in rank_list:\n", + " rank_matrix[item_to_index[item], list_idx] = rank\n", + "\n", + " # Calculate RRF scores using NumPy operations\n", + " rrf_scores = np.sum(1.0 / (alpha + rank_matrix), axis=1)\n", + "\n", + " # Sort items based on RRF scores\n", + " sorted_indices = np.argsort(-rrf_scores) # Negative for descending order\n", + "\n", + " # Retrieve sorted items\n", + " sorted_items = [(list(item_to_index.keys())[idx], rrf_scores[idx]) for idx in sorted_indices]\n", + "\n", + " return sorted_items\n", + "\n", + "# Example usage\n", + "rank_list1 = [('A', 1), ('B', 2), ('C', 3)]\n", + "rank_list2 = [('B', 1), ('C', 2), ('D', 3)]\n", + "rank_list3 = [('A', 2), ('D', 1), ('E', 3)]\n", + "\n", + "# Combine the rank lists\n", + "sorted_items = rrf([rank_list1, rank_list2, rank_list3])\n", + "sorted_items" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Based on this, let's convert our sparse and dense results into rank lists. And then, we'll use the Reciprocal Rank Fusion (RRF) algorithm to combine them." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def rank_list(search_result: List[ScoredPoint]):\n", + " return [(point.id, rank+1) for rank, point in enumerate(search_result)]\n", + "\n", + "dense_rank_list, sparse_rank_list = rank_list(search_results[0]), rank_list(search_results[1])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "rrf_rank_list = rrf([dense_rank_list, sparse_rank_list])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "rrf_rank_list" + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/docs/how-to/Hybrid_Search.ipynb b/docs/how-to/Hybrid_Search.ipynb deleted file mode 100644 index 12d217c5..00000000 --- a/docs/how-to/Hybrid_Search.ipynb +++ /dev/null @@ -1,1213 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Hybrid Search with FastEmbed & Qdrant\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## What will we do?\n", - "This notebook demonstrates the usage of Hybrid Search with FastEmbed & Qdrant. \n", - "\n", - "1. Setup: Download and install the required dependencies\n", - "2. Preview data: Load and preview the data\n", - "3. Create Sparse Embeddings: Create SPLADE++ embeddings for the data\n", - "4. Create Dense Embeddings: Create BGE-Large-en-v1.5 embeddings for the data\n", - "5. Indexing: Index the embeddings using Qdrant\n", - "6. Search: Perform Hybrid Search using FastEmbed & Qdrant\n", - "7. Ranking: Rank the search results with Reciprocal Rank Fusion (RRF)\n", - "8. Evaluation: Evaluate the search results\n", - "9. 
Conclusion: Summarize the results\n", - "\n", - "## Setup\n", - "\n", - "In order to get started, you need only two dependencies, and we'll install them next:" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "# !pip install -qU qdrant-client fastembed datasets transformers" - ] - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'0.2.5'" - ] - }, - "execution_count": 41, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import json\n", - "from typing import List\n", - "\n", - "import numpy as np\n", - "import pandas as pd\n", - "from datasets import load_dataset\n", - "from qdrant_client import QdrantClient\n", - "from qdrant_client.http.models import (\n", - " Distance,\n", - " NamedSparseVector,\n", - " NamedVector,\n", - " SparseVector,\n", - " PointStruct,\n", - " SearchRequest,\n", - " SparseIndexParams,\n", - " SparseVectorParams,\n", - " VectorParams,\n", - " ScoredPoint,\n", - ")\n", - "from transformers import AutoTokenizer\n", - "\n", - "import fastembed\n", - "from fastembed.sparse.sparse_text_embedding import SparseEmbedding, SparseTextEmbedding\n", - "from fastembed.text.text_embedding import TextEmbedding\n", - "\n", - "fastembed.__version__" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Dataset({\n", - " features: ['example_id', 'query', 'query_id', 'product_id', 'product_locale', 'esci_label', 'small_version', 'large_version', 'product_title', 'product_description', 'product_bullet_point', 'product_brand', 'product_color', 'product_text'],\n", - " num_rows: 919\n", - "})" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "dataset = load_dataset(\"tasksource/esci\")\n", - "# We'll select the first 1000 examples for this demo\n", - "dataset = 
dataset[\"train\"].select(range(1000))\n", - "dataset = dataset.filter(lambda x: x['product_locale'] == \"us\")\n", - "dataset" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Preview Data" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
[HTML table preview lost in extraction; the same five rows appear in the text/plain output that follows]\n", - "
" - ], - "text/plain": [ - " example_id query query_id product_id product_locale \\\n", - "0 0 revent 80 cfm 0 B000MOO21W us \n", - "2 1 revent 80 cfm 0 B07X3Y6B1V us \n", - "3 2 revent 80 cfm 0 B07WDM7MQQ us \n", - "4 3 revent 80 cfm 0 B07RH6Z8KW us \n", - "5 4 revent 80 cfm 0 B07QJ7WYFQ us \n", - "\n", - " esci_label small_version large_version \\\n", - "0 Irrelevant 0 1 \n", - "2 Exact 0 1 \n", - "3 Exact 0 1 \n", - "4 Exact 0 1 \n", - "5 Exact 0 1 \n", - "\n", - " product_title \\\n", - "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", - "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", - "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... \n", - "5 Panasonic FV-08VRE2 Ventilation Fan with Reces... \n", - "\n", - " product_description \\\n", - "0 None \n", - "2 None \n", - "3 None \n", - "4 This pre-owned or refurbished product has been... \n", - "5 None \n", - "\n", - " product_bullet_point \\\n", - "0 WhisperCeiling fans feature a totally enclosed... \n", - "2 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", - "3 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", - "4 Quiet operation at 1.5 sones\\nBuilt-in thermos... \n", - "5 The design solution for Fan/light combinations... \n", - "\n", - " product_brand product_color \\\n", - "0 Panasonic White \n", - "2 Homewerks 80 CFM \n", - "3 Homewerks White \n", - "4 DELTA ELECTRONICS (AMERICAS) LTD. White \n", - "5 Panasonic White \n", - "\n", - " product_text \n", - "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", - "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", - "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... \n", - "5 Panasonic FV-08VRE2 Ventilation Fan with Reces... 
" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "source_df = dataset.to_pandas()\n", - "df = source_df.drop_duplicates(subset=[\"product_text\", \"product_title\", \"product_bullet_point\", \"product_brand\"])\n", - "df = df.dropna(subset=[\"product_text\", \"product_title\", \"product_bullet_point\", \"product_brand\"])\n", - "df.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Catalog Item Count: 176\n", - "Queries: 919\n" - ] - } - ], - "source": [ - "print(f\"Catalog Item Count: {len(df)}\\nQueries: {len(source_df)}\")" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "df[\"combined_text\"] = df[\"product_title\"] + \"\\n\" + df[\"product_text\"] + \"\\n\" + df[\"product_bullet_point\"]" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "176" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "len(df)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Sparse Embeddings" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "e0432a19066144718055687feed872d7", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "Fetching 9 files: 0%| | 0/9 [00:00 List[PointStruct]:\n", - " sparse_vectors = df[\"sparse_embedding\"].tolist()\n", - " product_texts = df[\"combined_text\"].tolist()\n", - " dense_vectors = df[\"dense_embedding\"].tolist()\n", - " rows = df.to_dict(orient=\"records\")\n", - " points = []\n", - " for idx, (text, sparse_vector, dense_vector) in enumerate(zip(product_texts, sparse_vectors, dense_vectors)):\n", - " # 
print(sparse_vector)\n", - " sparse_vector = SparseVector(indices=sparse_vector.indices.tolist(), values=sparse_vector.values.tolist())\n", - " point = PointStruct(\n", - " id=idx,\n", - " payload={\"text\": text, \"product_id\": rows[idx]['product_id']}, # Add any additional payload if necessary\n", - " vector={\n", - " \"text-sparse\": sparse_vector,\n", - " \"text-dense\": dense_vector.tolist(),\n", - " },\n", - " )\n", - " points.append(point)\n", - " return points\n", - "\n", - "\n", - "points: List[PointStruct] = make_points(df)" - ] - }, - { - "cell_type": "code", - "execution_count": 48, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "UpdateResult(operation_id=0, status=)" - ] - }, - "execution_count": 48, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "client.upsert(collection_name, points)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Search" - ] - }, - { - "cell_type": "code", - "execution_count": 55, - "metadata": {}, - "outputs": [], - "source": [ - "def search(query_text: str):\n", - " # # Compute sparse and dense vectors\n", - " query_sparse_vectors: List[SparseEmbedding] = make_sparse_embedding(query_text)\n", - " query_dense_vector: List[np.ndarray] = make_dense_embedding(query_text)\n", - "\n", - " search_results = client.search_batch(\n", - " collection_name=collection_name,\n", - " requests=[\n", - " SearchRequest(\n", - " vector=NamedVector(\n", - " name=\"text-dense\",\n", - " vector=query_dense_vector[0],\n", - " ),\n", - " limit=10,\n", - " with_payload=True,\n", - " ),\n", - " SearchRequest(\n", - " vector=NamedSparseVector(\n", - " name=\"text-sparse\",\n", - " vector=SparseVector(\n", - " indices=query_sparse_vectors[0].indices.tolist(),\n", - " values=query_sparse_vectors[0].values.tolist(),\n", - " ),\n", - " ),\n", - " limit=10,\n", - " with_payload=True,\n", - " ),\n", - " ],\n", - " )\n", - "\n", - " return search_results\n", - "\n", - "query_text = 
\"panasonic fans\"\n", - "search_results = search(query_text)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Ranking" - ] - }, - { - "cell_type": "code", - "execution_count": 50, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[('A', 0.033465871107430434),\n", - " ('B', 0.033465871107430434),\n", - " ('D', 0.03320985472238179),\n", - " ('C', 0.03294544435749548),\n", - " ('E', 0.01775980832584606)]" - ] - }, - "execution_count": 50, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "def rrf(rank_lists, alpha=60, default_rank=1000):\n", - " \"\"\"\n", - " Optimized Reciprocal Rank Fusion (RRF) using NumPy for large rank lists.\n", - "\n", - " :param rank_lists: A list of rank lists. Each rank list should be a list of (item, rank) tuples.\n", - " :param alpha: The parameter alpha used in the RRF formula. Default is 60.\n", - " :param default_rank: The default rank assigned to items not present in a rank list. 
Default is 1000.\n", - " :return: Sorted list of items based on their RRF scores.\n", - " \"\"\"\n", - " # Consolidate all unique items from all rank lists\n", - " all_items = set(item for rank_list in rank_lists for item, _ in rank_list)\n", - "\n", - " # Create a mapping of items to indices\n", - " item_to_index = {item: idx for idx, item in enumerate(all_items)}\n", - "\n", - " # Initialize a matrix to hold the ranks, filled with the default rank\n", - " rank_matrix = np.full((len(all_items), len(rank_lists)), default_rank)\n", - "\n", - " # Fill in the actual ranks from the rank lists\n", - " for list_idx, rank_list in enumerate(rank_lists):\n", - " for item, rank in rank_list:\n", - " rank_matrix[item_to_index[item], list_idx] = rank\n", - "\n", - " # Calculate RRF scores using NumPy operations\n", - " rrf_scores = np.sum(1.0 / (alpha + rank_matrix), axis=1)\n", - "\n", - " # Sort items based on RRF scores\n", - " sorted_indices = np.argsort(-rrf_scores) # Negative for descending order\n", - "\n", - " # Retrieve sorted items\n", - " sorted_items = [(list(item_to_index.keys())[idx], rrf_scores[idx]) for idx in sorted_indices]\n", - "\n", - " return sorted_items\n", - "\n", - "# Example usage\n", - "rank_list1 = [('A', 1), ('B', 2), ('C', 3)]\n", - "rank_list2 = [('B', 1), ('C', 2), ('D', 3)]\n", - "rank_list3 = [('A', 2), ('D', 1), ('E', 3)]\n", - "\n", - "# Combine the rank lists\n", - "sorted_items = rrf([rank_list1, rank_list2, rank_list3])\n", - "sorted_items" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Based on this, let's convert our sparse and dense results into rank lists. And then, we'll use the Reciprocal Rank Fusion (RRF) algorithm to combine them." 
- ] - }, - { - "cell_type": "code", - "execution_count": 51, - "metadata": {}, - "outputs": [], - "source": [ - "def rank_list(search_result: List[ScoredPoint]):\n", - " return [(point.id, rank+1) for rank, point in enumerate(search_result)]\n", - "\n", - "dense_rank_list, sparse_rank_list = rank_list(search_results[0]), rank_list(search_results[1])" - ] - }, - { - "cell_type": "code", - "execution_count": 52, - "metadata": {}, - "outputs": [], - "source": [ - "rrf_rank_list = rrf([dense_rank_list, sparse_rank_list])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Evaluation\n", - "\n", - "Unlike a traditional IR dataset, we've ESCI labels: Exact, Substitute, Complementary, and Irrrelevant. \n", - "\n", - "To give us a sense of how \"good\" our search is performing, we'll measure the number of \"Exact\" labels in the top-k search results." - ] - }, - { - "cell_type": "code", - "execution_count": 58, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
[HTML table preview lost in extraction; the same five rows appear in the text/plain output that follows]\n", - "
" - ], - "text/plain": [ - " example_id query query_id product_id \\\n", - "0 0 revent 80 cfm 0 B000MOO21W \n", - "1 291891 bathroom fan without light 13723 B000MOO21W \n", - "2 1 revent 80 cfm 0 B07X3Y6B1V \n", - "3 2 revent 80 cfm 0 B07WDM7MQQ \n", - "4 3 revent 80 cfm 0 B07RH6Z8KW \n", - "\n", - " product_locale esci_label small_version large_version \\\n", - "0 us Irrelevant 0 1 \n", - "1 us Exact 1 1 \n", - "2 us Exact 0 1 \n", - "3 us Exact 0 1 \n", - "4 us Exact 0 1 \n", - "\n", - " product_title \\\n", - "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", - "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", - "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... \n", - "\n", - " product_description \\\n", - "0 None \n", - "1 None \n", - "2 None \n", - "3 None \n", - "4 This pre-owned or refurbished product has been... \n", - "\n", - " product_bullet_point \\\n", - "0 WhisperCeiling fans feature a totally enclosed... \n", - "1 WhisperCeiling fans feature a totally enclosed... \n", - "2 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", - "3 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", - "4 Quiet operation at 1.5 sones\\nBuilt-in thermos... \n", - "\n", - " product_brand product_color \\\n", - "0 Panasonic White \n", - "1 Panasonic White \n", - "2 Homewerks 80 CFM \n", - "3 Homewerks White \n", - "4 DELTA ELECTRONICS (AMERICAS) LTD. White \n", - "\n", - " product_text \n", - "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "1 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", - "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", - "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", - "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... 
" - ] - }, - "execution_count": 58, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "source_df.head()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "fst", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.13" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} From 035d33d84d4202cfc45df07895aa5476508be01a Mon Sep 17 00:00:00 2001 From: Nirant Kasliwal Date: Thu, 28 Mar 2024 14:40:40 +0530 Subject: [PATCH 07/10] Add ESCI label for the RRF results --- docs/examples/Hybrid_Search.ipynb | 716 +++++++++++++++++++++++++++--- 1 file changed, 658 insertions(+), 58 deletions(-) diff --git a/docs/examples/Hybrid_Search.ipynb b/docs/examples/Hybrid_Search.ipynb index 6a0bcf9f..5ba8f731 100644 --- a/docs/examples/Hybrid_Search.ipynb +++ b/docs/examples/Hybrid_Search.ipynb @@ -5,7 +5,8 @@ "metadata": {}, "source": [ "# Hybrid Search with FastEmbed & Qdrant\n", - "\n" + "\n", + "Author: [Nirant Kasliwal](https://twitter.com/nirantk)" ] }, { @@ -39,7 +40,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 32, "metadata": {}, "outputs": [ { @@ -48,14 +49,14 @@ "'0.2.5'" ] }, - "execution_count": 3, + "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import json\n", - "from typing import List\n", + "from typing import List, Tuple\n", "\n", "import numpy as np\n", "import pandas as pd\n", @@ -86,12 +87,26 @@ "cell_type": "code", "execution_count": 4, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "Dataset({\n", + " features: ['example_id', 'query', 'query_id', 'product_id', 'product_locale', 'esci_label', 'small_version', 'large_version', 'product_title', 'product_description', 
'product_bullet_point', 'product_brand', 'product_color', 'product_text'],\n", + " num_rows: 919\n", + "})" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "dataset = load_dataset(\"tasksource/esci\")\n", "# We'll select the first 1000 examples for this demo\n", "dataset = dataset[\"train\"].select(range(1000))\n", - "dataset = dataset.filter(lambda x: x['product_locale'] == \"us\")\n", + "dataset = dataset.filter(lambda x: x[\"product_locale\"] == \"us\")\n", "dataset" ] }, @@ -104,9 +119,192 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 5, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "
[HTML table preview lost in extraction; the same five rows appear in the text/plain output that follows]\n", + "
" + ], + "text/plain": [ + " example_id query query_id product_id product_locale \\\n", + "0 0 revent 80 cfm 0 B000MOO21W us \n", + "2 1 revent 80 cfm 0 B07X3Y6B1V us \n", + "3 2 revent 80 cfm 0 B07WDM7MQQ us \n", + "4 3 revent 80 cfm 0 B07RH6Z8KW us \n", + "5 4 revent 80 cfm 0 B07QJ7WYFQ us \n", + "\n", + " esci_label small_version large_version \\\n", + "0 Irrelevant 0 1 \n", + "2 Exact 0 1 \n", + "3 Exact 0 1 \n", + "4 Exact 0 1 \n", + "5 Exact 0 1 \n", + "\n", + " product_title \\\n", + "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", + "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", + "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... \n", + "5 Panasonic FV-08VRE2 Ventilation Fan with Reces... \n", + "\n", + " product_description \\\n", + "0 None \n", + "2 None \n", + "3 None \n", + "4 This pre-owned or refurbished product has been... \n", + "5 None \n", + "\n", + " product_bullet_point \\\n", + "0 WhisperCeiling fans feature a totally enclosed... \n", + "2 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", + "3 OUTSTANDING PERFORMANCE: This Homewerk's bath ... \n", + "4 Quiet operation at 1.5 sones\\nBuilt-in thermos... \n", + "5 The design solution for Fan/light combinations... \n", + "\n", + " product_brand product_color \\\n", + "0 Panasonic White \n", + "2 Homewerks 80 CFM \n", + "3 Homewerks White \n", + "4 DELTA ELECTRONICS (AMERICAS) LTD. White \n", + "5 Panasonic White \n", + "\n", + " product_text \n", + "0 Panasonic FV-20VQ3 WhisperCeiling 190 CFM Ceil... \n", + "2 Homewerks 7141-80 Bathroom Fan Integrated LED ... \n", + "3 Homewerks 7140-80 Bathroom Fan Ceiling Mount E... \n", + "4 Delta Electronics RAD80L BreezRadiance 80 CFM ... \n", + "5 Panasonic FV-08VRE2 Ventilation Fan with Reces... 
" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "source_df = dataset.to_pandas()\n", "df = source_df.drop_duplicates(subset=[\"product_text\", \"product_title\", \"product_bullet_point\", \"product_brand\"])\n", @@ -116,16 +314,25 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 6, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Catalog Item Count: 176\n", + "Queries: 919\n" + ] + } + ], "source": [ "print(f\"Catalog Item Count: {len(df)}\\nQueries: {len(source_df)}\")" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "metadata": {}, "outputs": [], "source": [ @@ -134,9 +341,20 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 8, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "176" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "len(df)" ] @@ -150,9 +368,38 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 9, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "950d0fba9dc848e29e563ed378f7e21e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Fetching 9 files: 0%| | 0/9 [00:00 List[PointStruct]:\n", + "def make_points(df: pd.DataFrame) -> List[PointStruct]:\n", " sparse_vectors = df[\"sparse_embedding\"].tolist()\n", " product_texts = df[\"combined_text\"].tolist()\n", " dense_vectors = df[\"dense_embedding\"].tolist()\n", @@ -374,7 +816,7 @@ " sparse_vector = SparseVector(indices=sparse_vector.indices.tolist(), values=sparse_vector.values.tolist())\n", " point = PointStruct(\n", " id=idx,\n", - " payload={\"text\": text, \"product_id\": rows[idx]['product_id']}, # Add any additional payload if necessary\n", + " payload={\"text\": text, 
\"product_id\": rows[idx][\"product_id\"]}, # Add any additional payload if necessary\n", " vector={\n", " \"text-sparse\": sparse_vector,\n", " \"text-dense\": dense_vector.tolist(),\n", @@ -389,9 +831,20 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 24, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "UpdateResult(operation_id=0, status=)" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "client.upsert(collection_name, points)" ] @@ -405,7 +858,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 46, "metadata": {}, "outputs": [], "source": [ @@ -441,7 +894,8 @@ "\n", " return search_results\n", "\n", - "query_text = \"panasonic fans\"\n", + "\n", + "query_text = \" revent 80 cfm\"\n", "search_results = search(query_text)" ] }, @@ -449,14 +903,36 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Ranking" + "## Ranking\n", + "\n", + "We'll combine the results from the two models using Reciprocal Rank Fusion (RRF). You can read more about RRF [here](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf).\n", + "\n", + "We select RRF for this task because:\n", + "1. It is a simple and effective method for combining search results.\n", + "2. It is robust to the differences in the ranking scores of the two or more ranking lists.\n", + "3. It is easy to implement and requires minimal tuning (only one parameter: alpha, which we don't tune here)." 
] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 47, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "[('A', 0.033465871107430434),\n", + " ('B', 0.033465871107430434),\n", + " ('D', 0.03320985472238179),\n", + " ('C', 0.03294544435749548),\n", + " ('E', 0.01775980832584606)]" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "def rrf(rank_lists, alpha=60, default_rank=1000):\n", " \"\"\"\n", @@ -492,10 +968,11 @@ "\n", " return sorted_items\n", "\n", + "\n", "# Example usage\n", - "rank_list1 = [('A', 1), ('B', 2), ('C', 3)]\n", - "rank_list2 = [('B', 1), ('C', 2), ('D', 3)]\n", - "rank_list3 = [('A', 2), ('D', 1), ('E', 3)]\n", + "rank_list1 = [(\"A\", 1), (\"B\", 2), (\"C\", 3)]\n", + "rank_list2 = [(\"B\", 1), (\"C\", 2), (\"D\", 3)]\n", + "rank_list3 = [(\"A\", 2), (\"D\", 1), (\"E\", 3)]\n", "\n", "# Combine the rank lists\n", "sorted_items = rrf([rank_list1, rank_list2, rank_list3])\n", @@ -511,19 +988,20 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "def rank_list(search_result: List[ScoredPoint]):\n", - " return [(point.id, rank+1) for rank, point in enumerate(search_result)]\n", + " return [(point.id, rank + 1) for rank, point in enumerate(search_result)]\n", + "\n", "\n", "dense_rank_list, sparse_rank_list = rank_list(search_results[0]), rank_list(search_results[1])" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 49, "metadata": {}, "outputs": [], "source": [ @@ -532,19 +1010,141 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 50, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "[(3, 0.032018442622950824),\n", + " (8, 0.03149801587301587),\n", + " (1, 0.03131881575727918),\n", + " (13, 0.030834914611005692),\n", + " (15, 0.030536130536130537),\n", + " (9, 
0.030309988518943745),\n", + " (12, 0.030158730158730156),\n", + " (14, 0.029437229437229435),\n", + " (11, 0.028985507246376812),\n", + " (2, 0.01707242848447961),\n", + " (4, 0.01564927857935627)]" + ] + }, + "execution_count": 50, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "rrf_rank_list" ] }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[Record(id=3, payload={'text': 'Delta Electronics RAD80L BreezRadiance 80 CFM Heater/Fan/Light Combo White (Renewed)\\nDelta Electronics RAD80L BreezRadiance 80 CFM Heater/Fan/Light Combo White (Renewed)\\nDELTA ELECTRONICS (AMERICAS) LTD.\\nWhite\\nThis pre-owned or refurbished product has been professionally inspected and tested to work and look like new. How a product becomes part of Amazon Renewed, your destination for pre-owned, refurbished products: A customer buys a new product and returns it or trades it in for a newer or different model. That product is inspected and tested to work and look like new by Amazon-qualified suppliers. Then, the product is sold as an Amazon Renewed product on Amazon. If not satisfied with the purchase, renewed products are eligible for replacement or refund under the Amazon Renewed Guarantee.\\nQuiet operation at 1.5 sones\\nBuilt-in thermostat regulates temperature. Energy efficiency at 7.6 CFM/Watt\\nPrecision engineered with DC brushless motor for extended reliability, this fan will outlast many household appliances\\nGalvanized steel construction resists corrosion\\nDuct: Detachable 4-inch Plastic Duct Adapter\\nQuiet operation at 1.5 sones\\nBuilt-in thermostat regulates temperature. 
Energy efficiency at 7.6 CFM/Watt\\nPrecision engineered with DC brushless motor for extended reliability, this fan will outlast many household appliances\\nGalvanized steel construction resists corrosion\\nDuct: Detachable 4-inch Plastic Duct Adapter', 'product_id': 'B07RH6Z8KW'}, vector=None, shard_key=None),\n", + " Record(id=8, payload={'text': 'Aero Pure ABF80 L5 W ABF80L5 Ceiling Mount 80 CFM w/LED Light/Nightlight, Energy Star Certified, White Quiet Bathroom Ventilation Fan\\nAero Pure ABF80 L5 W ABF80L5 Ceiling Mount 80 CFM w/LED Light/Nightlight, Energy Star Certified, White Quiet Bathroom Ventilation Fan\\nAero Pure\\nWhite\\nNone\\nQuiet 0.3 Sones, 80 CFM fan with choice of three designer grilles in White, Satin Nickel, or Oil Rubbed Bronze; Full 6 year warranty\\n10W 3000K 800 Lumens LED Light with 0.7W Nightlight included\\nInstallation friendly- Quick-mount adjustable metal bracket for new construction and retrofit; 4”, 5: and 6” metal duct adaptor included\\nMeets today’s demanding building specifications- ETL Listed for wet application, ENERGY STAR certified, CALGreen, JA-8 Compliant for CA Title 24, and ASHRAE 62.2 compliant\\nHousing dimensions- 10 2/5”x10 2/5”x 7 ½”; Grille dimensions- 13”x13”; Fits 2\"x8\" joists\\nQuiet 0.3 Sones, 80 CFM fan with choice of three designer grilles in White, Satin Nickel, or Oil Rubbed Bronze; Full 6 year warranty\\n10W 3000K 800 Lumens LED Light with 0.7W Nightlight included\\nInstallation friendly- Quick-mount adjustable metal bracket for new construction and retrofit; 4”, 5: and 6” metal duct adaptor included\\nMeets today’s demanding building specifications- ETL Listed for wet application, ENERGY STAR certified, CALGreen, JA-8 Compliant for CA Title 24, and ASHRAE 62.2 compliant\\nHousing dimensions- 10 2/5”x10 2/5”x 7 ½”; Grille dimensions- 13”x13”; Fits 2\"x8\" joists', 'product_id': 'B07JY1PQNT'}, vector=None, shard_key=None),\n", + " Record(id=1, payload={'text': \"Homewerks 7141-80 Bathroom Fan Integrated 
LED Light Ceiling Mount Exhaust Ventilation, 1.1 Sones, 80 CFM\\nHomewerks 7141-80 Bathroom Fan Integrated LED Light Ceiling Mount Exhaust Ventilation, 1.1 Sones, 80 CFM\\nHomewerks\\n80 CFM\\nNone\\nOUTSTANDING PERFORMANCE: This Homewerk's bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1.1 sones at 80 CFM which means it’s able to manage spaces up to 80 square feet and is very quiet..\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a modern style round shape and has an 4000K Cool White Light LED Light. AC motor.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in. 13 in round grill and 4 in round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\\nOUTSTANDING PERFORMANCE: This Homewerk's bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1.1 sones at 80 CFM which means it’s able to manage spaces up to 80 square feet and is very quiet..\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. 
Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a modern style round shape and has an 4000K Cool White Light LED Light. AC motor.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in. 13 in round grill and 4 in round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\", 'product_id': 'B07X3Y6B1V'}, vector=None, shard_key=None),\n", + " Record(id=13, payload={'text': 'Delta BreezSignature VFB25ACH 80 CFM Exhaust Bath Fan with Humidity Sensor\\nDelta BreezSignature VFB25ACH 80 CFM Exhaust Bath Fan with Humidity Sensor\\nDELTA ELECTRONICS (AMERICAS) LTD.\\nWhite\\nNone\\nVirtually silent at less than 0.3 sones\\nPrecision engineered with DC brushless motor for extended reliability\\nEasily switch in and out of humidity sensing mode by toggling wall switch\\nENERGY STAR qualified for efficient cost-saving operation\\nPrecision engineered with DC brushless motor for extended reliability, this fan will outlast many household appliances\\nVirtually silent at less than 0.3 sones\\nPrecision engineered with DC brushless motor for extended reliability\\nEasily switch in and out of humidity sensing mode by toggling wall switch\\nENERGY STAR qualified for efficient cost-saving operation\\nPrecision engineered with DC brushless motor for extended reliability, this fan will outlast many household appliances', 'product_id': 'B003O0MNGC'}, vector=None, shard_key=None),\n", + " Record(id=15, payload={'text': 'Delta Electronics (Americas) 
Ltd. GBR80HLED Delta BreezGreenBuilder Series 80 CFM Fan/Dimmable H, LED Light, Dual Speed & Humidity Sensor\\nDelta Electronics (Americas) Ltd. GBR80HLED Delta BreezGreenBuilder Series 80 CFM Fan/Dimmable H, LED Light, Dual Speed & Humidity Sensor\\nDELTA ELECTRONICS (AMERICAS) LTD.\\nWith LED Light, Dual Speed & Humidity Sensor\\nNone\\nUltra energy-efficient LED module (11-watt equivalent to 60-watt incandescent light) included. Main light output-850 Lumens, 3000K\\nExtracts air at a rate of 80 CFM to properly ventilate bathrooms up to 80 sq. Ft., quiet operation at 0.8 sones\\nPrecision engineered with DC brushless motor for extended reliability, this Fan will outlast many household appliances\\nEnergy Star qualified for efficient cost-saving operation, galvanized steel construction resists corrosion\\nFan impeller Stops If obstructed, for safe worry-free operation, attractive grille gives your bathroom a fresh look\\nUltra energy-efficient LED module (11-watt equivalent to 60-watt incandescent light) included. Main light output-850 Lumens, 3000K\\nExtracts air at a rate of 80 CFM to properly ventilate bathrooms up to 80 sq. Ft., quiet operation at 0.8 sones\\nPrecision engineered with DC brushless motor for extended reliability, this Fan will outlast many household appliances\\nEnergy Star qualified for efficient cost-saving operation, galvanized steel construction resists corrosion\\nFan impeller Stops If obstructed, for safe worry-free operation, attractive grille gives your bathroom a fresh look', 'product_id': 'B01N5Y6002'}, vector=None, shard_key=None),\n", + " Record(id=9, payload={'text': \"Delta Electronics (Americas) Ltd. RAD80 Delta BreezRadiance Series 80 CFM Fan with Heater, 10.5W, 1.5 Sones\\nDelta Electronics (Americas) Ltd. 
RAD80 Delta BreezRadiance Series 80 CFM Fan with Heater, 10.5W, 1.5 Sones\\nDELTA ELECTRONICS (AMERICAS) LTD.\\nWith Heater\\nNone\\nQuiet operation at 1.5 Sones\\nPrecision engineered with DC brushless motor for extended reliability, this Fan will outlast many household appliances\\nGalvanized steel construction resists corrosion, equipped with metal duct adapter\\nFan impeller Stops If obstructed, for safe worry-free operation\\nPeace of mind quality, performance and reliability from the world's largest DC brushless Fan Manufacturer\\nQuiet operation at 1.5 Sones\\nPrecision engineered with DC brushless motor for extended reliability, this Fan will outlast many household appliances\\nGalvanized steel construction resists corrosion, equipped with metal duct adapter\\nFan impeller Stops If obstructed, for safe worry-free operation\\nPeace of mind quality, performance and reliability from the world's largest DC brushless Fan Manufacturer\", 'product_id': 'B01MZIK0PI'}, vector=None, shard_key=None),\n", + " Record(id=12, payload={'text': 'Aero Pure AP80RVLW Super Quiet 80 CFM Recessed Fan/Light Bathroom Ventilation Fan with White Trim Ring\\nAero Pure AP80RVLW Super Quiet 80 CFM Recessed Fan/Light Bathroom Ventilation Fan with White Trim Ring\\nAero Pure\\nWhite\\nNone\\nSuper quiet 80CFM energy efficient fan virtually disappears into the ceiling leaving only a recessed light in view\\nMay be installed over shower when wired to a GFCI breaker and used with a PAR30L 75W (max) CFL\\nBulb not included. Accepts any of the following bulbs: 75W Max. PAR30, 14W Max. BR30 LED, or 75W Max. PAR30L (for use over tub/shower.)\\nSuper quiet 80CFM energy efficient fan virtually disappears into the ceiling leaving only a recessed light in view\\nMay be installed over shower when wired to a GFCI breaker and used with a PAR30L 75W (max) CFL\\nBulb not included. Accepts any of the following bulbs: 75W Max. PAR30, 14W Max. BR30 LED, or 75W Max. 
PAR30L (for use over tub/shower.)', 'product_id': 'B00MARNO5Y'}, vector=None, shard_key=None),\n", + " Record(id=14, payload={'text': 'Broan Very Quiet Ceiling Bathroom Exhaust Fan, ENERGY STAR Certified, 0.3 Sones, 80 CFM\\nBroan Very Quiet Ceiling Bathroom Exhaust Fan, ENERGY STAR Certified, 0.3 Sones, 80 CFM\\nBroan-NuTone\\nWhite\\nNone\\nHIGH-QUALITY FAN: Very quiet, energy efficient exhaust fan runs on 0. 3 Sones and is motor engineered for continuous operation\\nEFFICIENT: Operates at 80 CFM in bathrooms up to 75 sq. ft. for a high-quality performance. Dimmable Capability: Non Dimmable\\nEASY INSTALLATION: Fan is easy to install and/or replace existing product for DIY\\'ers and needs only 2\" x 8\" construction space. Can be used over bathtubs or showers when connected to a GFCI protected branch circuit\\nFEATURES: Includes hanger bar system for fast, flexible installation for all types of construction and a 6\" ducting for superior performance\\nCERTIFIED: ENERGY STAR qualified and HVI Certified to ensure the best quality for your home\\nHIGH-QUALITY FAN: Very quiet, energy efficient exhaust fan runs on 0. 3 Sones and is motor engineered for continuous operation\\nEFFICIENT: Operates at 80 CFM in bathrooms up to 75 sq. ft. for a high-quality performance. Dimmable Capability: Non Dimmable\\nEASY INSTALLATION: Fan is easy to install and/or replace existing product for DIY\\'ers and needs only 2\" x 8\" construction space. 
Can be used over bathtubs or showers when connected to a GFCI protected branch circuit\\nFEATURES: Includes hanger bar system for fast, flexible installation for all types of construction and a 6\" ducting for superior performance\\nCERTIFIED: ENERGY STAR qualified and HVI Certified to ensure the best quality for your home', 'product_id': 'B001E6DMKY'}, vector=None, shard_key=None),\n", + " Record(id=11, payload={'text': 'Panasonic FV-0811VF5 WhisperFit EZ Retrofit Ventilation Fan, 80 or 110 CFM\\nPanasonic FV-0811VF5 WhisperFit EZ Retrofit Ventilation Fan, 80 or 110 CFM\\nPanasonic\\nWhite\\nNone\\nRetrofit Solution: Ideal for residential remodeling, hotel construction or renovations\\nLow Profile: 5-5/8-Inch housing depth fits in a 2 x 6 construction\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 80 or 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy\\nRetrofit Solution: Ideal for residential remodeling, hotel construction or renovations\\nLow Profile: 5-5/8-Inch housing depth fits in a 2 x 6 construction\\nPick-A-Flow Speed Selector: Allows you to pick desired airflow from 80 or 110 CFM\\nFlexible Installation: Comes with Flex-Z Fast bracket for easy, fast and trouble-free installation\\nEnergy Star Rated: Delivers powerful airflow without wasting energy', 'product_id': 'B00XBZFWWM'}, vector=None, shard_key=None),\n", + " Record(id=2, payload={'text': 'Homewerks 7140-80 Bathroom Fan Ceiling Mount Exhaust Ventilation, 1.5 Sones, 80 CFM, White\\nHomewerks 7140-80 Bathroom Fan Ceiling Mount Exhaust Ventilation, 1.5 Sones, 80 CFM, White\\nHomewerks\\nWhite\\nNone\\nOUTSTANDING PERFORMANCE: This Homewerk\\'s bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1. 
5 sone at 110 CFM which means it’s able to manage spaces up to 110 square feet\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a grille modern style.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in and a 4\" round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited\\nOUTSTANDING PERFORMANCE: This Homewerk\\'s bath fan ensures comfort in your home by quietly eliminating moisture and humidity in the bathroom. This exhaust fan is 1. 5 sone at 110 CFM which means it’s able to manage spaces up to 110 square feet\\nBATH FANS HELPS REMOVE HARSH ODOR: When cleaning the bathroom or toilet, harsh chemicals are used and they can leave an obnoxious odor behind. Homewerk’s bathroom fans can help remove this odor with its powerful ventilation\\nBUILD QUALITY: Designed to be corrosion resistant with its galvanized steel construction featuring a grille modern style.\\nEASY INSTALLATION: This exhaust bath fan is easy to install with its no-cut design and ceiling mount ventilation. Ceiling Opening (L) 7-1/2 in x Ceiling Opening (W) 7-1/4 x Ceiling Opening (H) 5-3/4 in and a 4\" round duct connector.\\nHOMEWERKS TRUSTED QUALITY: Be confident in the quality and construction of each and every one of our products. 
We ensure that all of our products are produced and certified to regional, national and international industry standards. We are proud of the products we sell, you will be too. 3 Year Limited', 'product_id': 'B07WDM7MQQ'}, vector=None, shard_key=None),\n", + " Record(id=4, payload={'text': 'Panasonic FV-08VRE2 Ventilation Fan with Recessed LED (Renewed)\\nPanasonic FV-08VRE2 Ventilation Fan with Recessed LED (Renewed)\\nPanasonic\\nWhite\\nNone\\nThe design solution for Fan/light combinations\\nEnergy Star rated architectural grade recessed Fan/LED light\\nQuiet, energy efficient and powerful 80 CFM ventilation hidden above the Ceiling\\nLED lamp is dimmable\\nBeautiful Lighting with 6-1/2”aperture and advanced luminaire design\\nThe design solution for Fan/light combinations\\nEnergy Star rated architectural grade recessed Fan/LED light\\nQuiet, energy efficient and powerful 80 CFM ventilation hidden above the Ceiling\\nLED lamp is dimmable\\nBeautiful Lighting with 6-1/2”aperture and advanced luminaire design', 'product_id': 'B07QJ7WYFQ'}, vector=None, shard_key=None)]" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def find_point_by_id(client: QdrantClient, collection_name: str, rrf_rank_list: List[Tuple[int, float]]):\n", + " return client.retrieve(collection_name=collection_name, ids=[item[0] for item in rrf_rank_list])\n", + "\n", + "\n", + "find_point_by_id(client, collection_name, rrf_rank_list)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, let's check the ESCI (Exact, Substitute, Complement, and Irrelevant) label for the results against the source data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Exact\n", + "Exact\n", + "Exact\n", + "Exact\n", + "Exact\n", + "Exact\n", + "Exact\n", + "Exact\n", + "Exact\n", + "Exact\n", + "Exact\n" + ] + } + ], + "source": [ + "ids = [item[0] for item in rrf_rank_list]\n", + "df[df['query'] == query_text]\n", + "\n", + "for idx in ids:\n", + " print(df.iloc[idx]['esci_label'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This worked well: with k=10, every result we pulled is labeled Exact. That is a strong result for a small dataset like this, using out-of-the-box vectors that are not even fine-tuned for e-commerce." + ] + }, { "cell_type": "code", "execution_count": null, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "11" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(rrf_rank_list)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "In this notebook, we demonstrated the usage of Hybrid Search with FastEmbed & Qdrant. We used FastEmbed to create Sparse and Dense embeddings for the data and indexed them using Qdrant. We then performed Hybrid Search using FastEmbed & Qdrant and ranked the search results using Reciprocal Rank Fusion (RRF)." 
+ ] } ], "metadata": { From 5d0c25b3b4ad50b42edc5f8e0fbc6ebde3931ab2 Mon Sep 17 00:00:00 2001 From: Nirant Date: Thu, 28 Mar 2024 17:24:45 +0530 Subject: [PATCH 08/10] Update docs/examples/Hybrid_Search.ipynb Co-authored-by: Anush --- docs/examples/Hybrid_Search.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/examples/Hybrid_Search.ipynb b/docs/examples/Hybrid_Search.ipynb index 5ba8f731..894ddcec 100644 --- a/docs/examples/Hybrid_Search.ipynb +++ b/docs/examples/Hybrid_Search.ipynb @@ -26,7 +26,7 @@ "\n", "## Setup\n", "\n", - "In order to get started, you need only two dependencies, and we'll install them next:" + "In order to get started, you need a few dependencies, and we'll install them next:" ] }, { From 0800fca2eb05f4ebee5294f99d95575fb99ef8f9 Mon Sep 17 00:00:00 2001 From: Nirant Date: Thu, 28 Mar 2024 17:26:08 +0530 Subject: [PATCH 09/10] Update docs/examples/Hybrid_Search.ipynb Co-authored-by: Anush --- docs/examples/Hybrid_Search.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/examples/Hybrid_Search.ipynb b/docs/examples/Hybrid_Search.ipynb index 894ddcec..0a4d1cf2 100644 --- a/docs/examples/Hybrid_Search.ipynb +++ b/docs/examples/Hybrid_Search.ipynb @@ -819,7 +819,7 @@ " payload={\"text\": text, \"product_id\": rows[idx][\"product_id\"]}, # Add any additional payload if necessary\n", " vector={\n", " \"text-sparse\": sparse_vector,\n", - " \"text-dense\": dense_vector.tolist(),\n", + " \"text-dense\": dense_vector,\n", " },\n", " )\n", " points.append(point)\n", From 783b222064fe271c6e9242a87ddfc3a8ba639d73 Mon Sep 17 00:00:00 2001 From: Nirant Kasliwal Date: Thu, 28 Mar 2024 17:26:17 +0530 Subject: [PATCH 10/10] Remove unnecessary code and update vector format --- docs/examples/Hybrid_Search.ipynb | 48 +++---------------------------- 1 file changed, 4 insertions(+), 44 deletions(-) diff --git a/docs/examples/Hybrid_Search.ipynb b/docs/examples/Hybrid_Search.ipynb index 5ba8f731..19ea98cb 100644 
--- a/docs/examples/Hybrid_Search.ipynb +++ b/docs/examples/Hybrid_Search.ipynb @@ -624,26 +624,6 @@ "product_texts = df[\"combined_text\"].tolist()" ] }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[]" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "[x for x in product_texts if not isinstance(x, str)]" - ] - }, { "cell_type": "code", "execution_count": 17, @@ -781,27 +761,7 @@ }, { "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "SparseVector(indices=[1, 2, 3], values=[0.1, 0.2, 0.3])" - ] - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "SparseVector(indices=[1, 2, 3], values=[0.1, 0.2, 0.3])" - ] - }, - { - "cell_type": "code", - "execution_count": 23, + "execution_count": 59, "metadata": {}, "outputs": [], "source": [ @@ -819,7 +779,7 @@ " payload={\"text\": text, \"product_id\": rows[idx][\"product_id\"]}, # Add any additional payload if necessary\n", " vector={\n", " \"text-sparse\": sparse_vector,\n", - " \"text-dense\": dense_vector.tolist(),\n", + " \"text-dense\": dense_vector,\n", " },\n", " )\n", " points.append(point)\n", @@ -831,7 +791,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 60, "metadata": {}, "outputs": [ { @@ -840,7 +800,7 @@ "UpdateResult(operation_id=0, status=)" ] }, - "execution_count": 24, + "execution_count": 60, "metadata": {}, "output_type": "execute_result" }