Use JinaAI models for embeddings (#14252)
* add generic onnx model class and use jina ai clip models for all embeddings

* fix merge conflict

* add generic onnx model class and use jina ai clip models for all embeddings

* fix merge conflict

* preferred providers

* fix paths

* disable download progress bar

* remove logging of path

* drop and recreate tables on reindex

* use cache paths

* fix model name

* use trust remote code per transformers docs

* ensure tokenizer and feature extractor are correctly loaded

* revert

* manually download and cache feature extractor config

* remove unneeded

* remove old clip and minilm code

* docs update
hawkeye217 authored Oct 9, 2024
1 parent dbeaf43 commit d492562
Showing 7 changed files with 277 additions and 331 deletions.
10 changes: 4 additions & 6 deletions docs/docs/configuration/semantic_search.md
@@ -5,7 +5,7 @@ title: Using Semantic Search

Semantic Search in Frigate allows you to find tracked objects within your review items using either the image itself, a user-defined text description, or an automatically generated one. This feature works by creating _embeddings_ — numerical vector representations — for both the images and text descriptions of your tracked objects. By comparing these embeddings, Frigate assesses their similarities to deliver relevant search results.

Frigate has support for two models to create embeddings, both of which run locally: [OpenAI CLIP](https://openai.com/research/clip) and [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). Embeddings are then saved to Frigate's database.
Frigate has support for [Jina AI's CLIP model](https://huggingface.co/jinaai/jina-clip-v1) to create embeddings, which runs locally. Embeddings are then saved to Frigate's database.

Semantic Search is accessed via the _Explore_ view in the Frigate UI.

@@ -27,13 +27,11 @@ If you are enabling the Search feature for the first time, be advised that Friga

:::

### OpenAI CLIP
### Jina AI CLIP

This model is able to embed both images and text into the same vector space, which allows `image -> image` and `text -> image` similarity searches. Frigate uses this model on tracked objects to encode the thumbnail image and store it in the database. When searching for tracked objects via text in the search box, Frigate will perform a `text -> image` similarity search against this embedding. When clicking "Find Similar" in the tracked object detail pane, Frigate will perform an `image -> image` similarity search to retrieve the closest matching thumbnails.
The vision model is able to embed both images and text into the same vector space, which allows `image -> image` and `text -> image` similarity searches. Frigate uses this model on tracked objects to encode the thumbnail image and store it in the database. When searching for tracked objects via text in the search box, Frigate will perform a `text -> image` similarity search against this embedding. When clicking "Find Similar" in the tracked object detail pane, Frigate will perform an `image -> image` similarity search to retrieve the closest matching thumbnails.

### all-MiniLM-L6-v2

This is a sentence embedding model that has been fine tuned on over 1 billion sentence pairs. This model is used to embed tracked object descriptions and perform searches against them. Descriptions can be created, viewed, and modified on the Search page when clicking on the gray tracked object chip at the top left of each review item. See [the Generative AI docs](/configuration/genai.md) for more information on how to automatically generate tracked object descriptions.
The text model is used to embed tracked object descriptions and perform searches against them. Descriptions can be created, viewed, and modified on the Search page when clicking on the gray tracked object chip at the top left of each review item. See [the Generative AI docs](/configuration/genai.md) for more information on how to automatically generate tracked object descriptions.
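
For illustration only, the following sketch shows why a shared vector space makes `text -> image` search possible: the query text and the stored thumbnails become vectors of the same dimension, and ranking is just a similarity computation. The random vectors stand in for real model output, and `cosine_similarity` and the event names are illustrative; this is not Frigate's code.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Higher cosine similarity means a closer semantic match.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Random 768-dimensional vectors standing in for CLIP text/image embeddings.
rng = np.random.default_rng(0)
query_vec = rng.standard_normal(768)  # embedding of the search text
thumb_vecs = {f"event_{i}": rng.standard_normal(768) for i in range(5)}

# text -> image search: rank thumbnails by similarity to the text query.
ranked = sorted(
    thumb_vecs,
    key=lambda k: cosine_similarity(query_vec, thumb_vecs[k]),
    reverse=True,
)
print(ranked)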

## Usage

2 changes: 1 addition & 1 deletion frigate/embeddings/__init__.py
@@ -73,7 +73,7 @@ class EmbeddingsContext:
def __init__(self, db: SqliteVecQueueDatabase):
self.embeddings = Embeddings(db)
self.thumb_stats = ZScoreNormalization()
self.desc_stats = ZScoreNormalization(scale_factor=3, bias=-2.5)
self.desc_stats = ZScoreNormalization()

# load stats from disk
try:
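For context on the change above: ZScoreNormalization rescales raw search distances so thumbnail and description results can be scored on a comparable scale, and with both embeddings now coming from the same Jina model the description statistics drop the MiniLM-specific scale factor and bias. A rough sketch of the underlying idea, assuming a running mean and standard deviation (Welford's algorithm); Frigate's actual class may differ in detail:

import math

class ZScoreNormalization:
    """Convert raw distances to z-scores using running statistics."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def normalize(self, distances: list) -> list:
        for d in distances:
            self.n += 1
            delta = d - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (d - self.mean)
        std = math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0
        return [(d - self.mean) / std if std > 0 else 0.0 for d in distances]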
86 changes: 66 additions & 20 deletions frigate/embeddings/embeddings.py
@@ -7,6 +7,7 @@
import time
from typing import List, Tuple, Union

import numpy as np
from PIL import Image
from playhouse.shortcuts import model_to_dict

@@ -16,8 +17,7 @@
from frigate.models import Event
from frigate.types import ModelStatusTypesEnum

from .functions.clip import ClipEmbedding
from .functions.minilm_l6_v2 import MiniLMEmbedding
from .functions.onnx import GenericONNXEmbedding

logger = logging.getLogger(__name__)

@@ -53,9 +53,23 @@ def get_metadata(event: Event) -> dict:
)


def serialize(vector: List[float]) -> bytes:
"""Serializes a list of floats into a compact "raw bytes" format"""
return struct.pack("%sf" % len(vector), *vector)
def serialize(vector: Union[List[float], np.ndarray, float]) -> bytes:
"""Serializes a list of floats, numpy array, or single float into a compact "raw bytes" format"""
if isinstance(vector, np.ndarray):
# Convert numpy array to list of floats
vector = vector.flatten().tolist()
elif isinstance(vector, (float, np.float32, np.float64)):
# Handle single float values
vector = [vector]
elif not isinstance(vector, list):
raise TypeError(
f"Input must be a list of floats, a numpy array, or a single float. Got {type(vector)}"
)

try:
return struct.pack("%sf" % len(vector), *vector)
except struct.error as e:
raise ValueError(f"Failed to pack vector: {e}. Vector: {vector}")


def deserialize(bytes_data: bytes) -> List[float]:
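The body of deserialize is folded out of this view. Assuming it is simply the inverse of serialize above (unpacking the packed float32 bytes back into a list), a round trip looks like the sketch below; the function body here is a guess at the folded code, not a quotation of it.

import struct
from typing import List

def deserialize(bytes_data: bytes) -> List[float]:
    """Unpack the raw float32 bytes produced by serialize() back into a list."""
    return list(struct.unpack("%sf" % (len(bytes_data) // 4), bytes_data))

# Round trip: pack three floats, then unpack them again.
packed = struct.pack("3f", 0.1, 0.2, 0.3)
assert [round(v, 4) for v in deserialize(packed)] == [0.1, 0.2, 0.3]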
@@ -74,10 +88,10 @@ def __init__(self, db: SqliteVecQueueDatabase) -> None:
self._create_tables()

models = [
"sentence-transformers/all-MiniLM-L6-v2-model.onnx",
"sentence-transformers/all-MiniLM-L6-v2-tokenizer",
"clip-clip_image_model_vitb32.onnx",
"clip-clip_text_model_vitb32.onnx",
"jinaai/jina-clip-v1-text_model_fp16.onnx",
"jinaai/jina-clip-v1-tokenizer",
"jinaai/jina-clip-v1-vision_model_fp16.onnx",
"jinaai/jina-clip-v1-preprocessor_config.json",
]

for model in models:
@@ -89,10 +103,33 @@ def __init__(self, db: SqliteVecQueueDatabase) -> None:
},
)

self.clip_embedding = ClipEmbedding(
preferred_providers=["CPUExecutionProvider"]
def jina_text_embedding_function(outputs):
return outputs[0]

def jina_vision_embedding_function(outputs):
return outputs[0]

self.text_embedding = GenericONNXEmbedding(
model_name="jinaai/jina-clip-v1",
model_file="text_model_fp16.onnx",
tokenizer_file="tokenizer",
download_urls={
"text_model_fp16.onnx": "https://huggingface.co/jinaai/jina-clip-v1/resolve/main/onnx/text_model_fp16.onnx",
},
embedding_function=jina_text_embedding_function,
model_type="text",
preferred_providers=["CPUExecutionProvider"],
)
self.minilm_embedding = MiniLMEmbedding(

self.vision_embedding = GenericONNXEmbedding(
model_name="jinaai/jina-clip-v1",
model_file="vision_model_fp16.onnx",
download_urls={
"vision_model_fp16.onnx": "https://huggingface.co/jinaai/jina-clip-v1/resolve/main/onnx/vision_model_fp16.onnx",
"preprocessor_config.json": "https://huggingface.co/jinaai/jina-clip-v1/resolve/main/preprocessor_config.json",
},
embedding_function=jina_vision_embedding_function,
model_type="vision",
preferred_providers=["CPUExecutionProvider"],
)
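
The GenericONNXEmbedding class used above lives in the new frigate/embeddings/functions/onnx.py, which is not expanded in this view. Based only on how it is constructed and called here, a heavily simplified sketch of such a wrapper might look like the following; the cache path, the input format, and the omission of download and tokenization handling are all assumptions, not the committed implementation.

from typing import Callable, Dict, List, Optional

import numpy as np
import onnxruntime as ort


class GenericONNXEmbedding:
    """Wrap one ONNX model file behind a callable that returns embeddings."""

    def __init__(
        self,
        model_name: str,
        model_file: str,
        download_urls: Dict[str, str],
        embedding_function: Callable[[List[np.ndarray]], np.ndarray],
        model_type: str,
        preferred_providers: List[str],
        tokenizer_file: Optional[str] = None,
    ) -> None:
        # Assumes the referenced files are already cached locally; the real
        # class downloads them (and the tokenizer/preprocessor) on first use.
        self.embedding_function = embedding_function
        self.model_type = model_type
        self.session = ort.InferenceSession(
            f"/config/model_cache/{model_name}/{model_file}",
            providers=preferred_providers,
        )

    def __call__(self, model_inputs: Dict[str, np.ndarray]) -> np.ndarray:
        # The real class tokenizes text or preprocesses PIL images itself;
        # here the caller is assumed to pass ready-made ONNX input tensors.
        outputs = self.session.run(None, model_inputs)
        return self.embedding_function(outputs)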

Expand All @@ -101,23 +138,30 @@ def _create_tables(self):
self.db.execute_sql("""
CREATE VIRTUAL TABLE IF NOT EXISTS vec_thumbnails USING vec0(
id TEXT PRIMARY KEY,
thumbnail_embedding FLOAT[512]
thumbnail_embedding FLOAT[768]
);
""")

# Create vec0 virtual table for description embeddings
self.db.execute_sql("""
CREATE VIRTUAL TABLE IF NOT EXISTS vec_descriptions USING vec0(
id TEXT PRIMARY KEY,
description_embedding FLOAT[384]
description_embedding FLOAT[768]
);
""")

def _drop_tables(self):
self.db.execute_sql("""
DROP TABLE vec_descriptions;
""")
self.db.execute_sql("""
DROP TABLE vec_thumbnails;
""")

def upsert_thumbnail(self, event_id: str, thumbnail: bytes):
# Convert thumbnail bytes to PIL Image
image = Image.open(io.BytesIO(thumbnail)).convert("RGB")
# Generate embedding using CLIP
embedding = self.clip_embedding([image])[0]
embedding = self.vision_embedding([image])[0]

self.db.execute_sql(
"""
@@ -130,8 +174,7 @@ def upsert_thumbnail(self, event_id: str, thumbnail: bytes):
return embedding

def upsert_description(self, event_id: str, description: str):
# Generate embedding using MiniLM
embedding = self.minilm_embedding([description])[0]
embedding = self.text_embedding([description])[0]

self.db.execute_sql(
"""
@@ -177,7 +220,7 @@ def search_thumbnail(
thumbnail = base64.b64decode(query.thumbnail)
query_embedding = self.upsert_thumbnail(query.id, thumbnail)
else:
query_embedding = self.clip_embedding([query])[0]
query_embedding = self.text_embedding([query])[0]

sql_query = """
SELECT
@@ -211,7 +254,7 @@ def search_thumbnail(
def search_description(
self, query_text: str, event_ids: List[str] = None
) -> List[Tuple[str, float]]:
query_embedding = self.minilm_embedding([query_text])[0]
query_embedding = self.text_embedding([query_text])[0]

# Prepare the base SQL query
sql_query = """
@@ -246,6 +289,9 @@ def search_description(
def reindex(self) -> None:
logger.info("Indexing event embeddings...")

self._drop_tables()
self._create_tables()

st = time.time()
totals = {
"thumb": 0,
166 changes: 0 additions & 166 deletions frigate/embeddings/functions/clip.py

This file was deleted.

