Use sqlite-vec extension instead of chromadb for embeddings #14163

Merged
merged 56 commits into from
Oct 7, 2024
Merged


hawkeye217 (Collaborator) commented Oct 5, 2024

Proposed change

The initial implementation of semantic search used Chroma as its database. Rather than running a separate database and engine just to store vectors/embeddings, this PR refactors the embeddings manager and the semantic search feature to use the sqlite-vec extension for SQLite, storing embeddings directly in Frigate's database.
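The core idea, vectors living in the same SQLite file as the rest of the data and being searched with SQL, can be sketched with Python's standard library alone so it runs without the extension. This is not sqlite-vec's API: sqlite-vec provides a vec0 virtual table queried with MATCH and ORDER BY distance. Here, a hypothetical thumbnail_embeddings table stores packed float32 BLOBs, and a Python function registered with sqlite3's create_function computes cosine distance for a brute-force k-nearest-neighbour scan. All table and helper names are illustrative assumptions.

```python
import math
import sqlite3
import struct


def serialize_f32(vector: list[float]) -> bytes:
    """Pack floats into a compact little-endian float32 BLOB."""
    return struct.pack(f"<{len(vector)}f", *vector)


def deserialize_f32(blob: bytes) -> list[float]:
    """Unpack a float32 BLOB back into a list of Python floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine_distance(a_blob: bytes, b_blob: bytes) -> float:
    """1 - cosine similarity; exposed to SQL in open_db() below."""
    a, b = deserialize_f32(a_blob), deserialize_f32(b_blob)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Register the distance function so the KNN query runs inside SQLite.
    conn.create_function("cosine_distance", 2, cosine_distance)
    # Hypothetical table: embeddings stored alongside the app's other tables.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS thumbnail_embeddings "
        "(id TEXT PRIMARY KEY, embedding BLOB NOT NULL)"
    )
    return conn


def upsert_embedding(conn: sqlite3.Connection, item_id: str, vector: list[float]) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO thumbnail_embeddings (id, embedding) VALUES (?, ?)",
        (item_id, serialize_f32(vector)),
    )


def search(conn: sqlite3.Connection, query: list[float], k: int = 5):
    """Brute-force k-nearest-neighbour search, nearest first."""
    return conn.execute(
        "SELECT id, cosine_distance(embedding, ?) AS distance "
        "FROM thumbnail_embeddings ORDER BY distance LIMIT ?",
        (serialize_f32(query), k),
    ).fetchall()
```

Because the vectors share a file with the other tables, a single connection can both write an event and its embedding, which is the operational simplification the PR describes.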

Queries are still made through Peewee's SqliteQueueDatabase queue manager class.
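Peewee's SqliteQueueDatabase (from playhouse.sqliteq) funnels writes through a single dedicated thread so concurrent callers do not contend for SQLite's write lock. The stdlib sketch below illustrates that single-writer pattern only; it is not Peewee's implementation, and QueuedWriter is a hypothetical name.

```python
import queue
import sqlite3
import threading


class QueuedWriter:
    """Hypothetical single-writer helper: every write runs on one background thread."""

    def __init__(self, path: str):
        self._path = path
        self._queue: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # The connection is created inside the writer thread and never shared.
        conn = sqlite3.connect(self._path)
        while True:
            item = self._queue.get()
            if item is None:  # shutdown sentinel
                break
            sql, params, done, errors = item
            try:
                conn.execute(sql, params)
                conn.commit()
            except sqlite3.Error as exc:
                errors.append(exc)  # hand the error back to the caller
            finally:
                done.set()
        conn.close()

    def execute(self, sql: str, params: tuple = ()) -> None:
        """Queue a write and block until the writer thread has committed it."""
        done, errors = threading.Event(), []
        self._queue.put((sql, params, done, errors))
        done.wait()
        if errors:
            raise errors[0]

    def close(self) -> None:
        self._queue.put(None)
        self._thread.join()
```

Reads can still open their own connections; only writes are serialized.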

This reduces resource usage on the host machine and should, in theory, make semantic search faster.

sqlite-vec is a new package that is still under rapid development. Once it supports metadata filtering and sorting by distance in descending order, further improvements can be made.
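To see why in-index metadata filtering matters, consider the workaround available without it: over-fetch nearest neighbours, then filter and re-sort in application code. The sketch below is a hypothetical illustration, not Frigate's code; the Candidate type and its camera field are assumptions. It shows the two drawbacks: fewer than k rows can survive the filter, and a descending sort only ranks the candidates that were fetched.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    id: str
    distance: float
    camera: str  # hypothetical metadata attribute looked up after the vector search


def post_filter(candidates: list[Candidate], cameras: set[str], k: int,
                descending: bool = False) -> list[Candidate]:
    """Apply a metadata filter and ordering outside the vector index.

    `candidates` is an over-fetched KNN result. Because filtering happens
    afterwards, fewer than k rows may remain, and descending order only
    ranks the fetched candidates, not the whole table.
    """
    kept = [c for c in candidates if c.camera in cameras]
    kept.sort(key=lambda c: c.distance, reverse=descending)
    return kept[:k]
```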

This PR also:

  • Manually fetches the MiniLM ONNX models and transformers (similar to the existing CLIP model)
  • Creates a new class, SqliteVecQueueDatabase, which loads the sqlite-vec extension when semantic search is enabled
  • Removes chromadb everywhere (Dockerfile, s6, UI)
  • Adds the uvicorn dependency (required by FastAPI, but no longer installed once Chroma was removed)
  • Retains the custom pysqlite3 build
  • Adds a new threaded downloader class to download models
  • Updates genai dependencies

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code

Users running dev builds will need to set reindex: True in their config to regenerate their embeddings and store them in the Frigate database. The Chroma database folder at /config/chroma is no longer used and can be removed manually.

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • The code has been formatted using Ruff (ruff format frigate)


netlify bot commented Oct 5, 2024

Deploy Preview for frigate-docs canceled.

🔨 Latest commit: e3a81db
🔍 Latest deploy log: https://app.netlify.com/sites/frigate-docs/deploys/670442a7e77975000848c780

hawkeye217 marked this pull request as ready for review on October 7, 2024, at 18:45
NickM-27 merged commit 24ac9f3 into dev on Oct 7, 2024
13 checks passed
NickM-27 deleted the sqlite-vec branch on October 7, 2024, at 20:30
jameslivulpi commented:

working good and seems more snappy :)

Embedded 25608 thumbnails and 18740 descriptions in 5481.727172374725 seconds
