Embeddings normalization fixes #14284

Merged: 6 commits, Oct 11, 2024


hawkeye217 (Collaborator) commented Oct 11, 2024

Proposed change

For the Jina AI models, text-text cosine similarity is typically higher than text-image cosine similarity. This PR applies normalization to all vector-based searches, uses cosine similarity for image-image searches, and saves Z-score normalization stats only for multi-modal searches.
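To illustrate the Z-score idea described above, here is a minimal sketch, not the code from this PR. It assumes distances arrive as plain floats; the class `RunningZScore`, the function `normalize_distances`, and the `save_stats` flag are hypothetical names introduced for illustration only.

```python
import math


class RunningZScore:
    """Hypothetical helper: running mean/variance via Welford's algorithm."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, values: list[float]) -> None:
        """Fold new distances into the running statistics."""
        for x in values:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        """Population standard deviation of everything seen so far."""
        return math.sqrt(self.m2 / self.n) if self.n > 0 else 0.0

    def normalize(self, values: list[float]) -> list[float]:
        """Convert raw distances to Z-scores using the stored stats."""
        sd = self.stddev
        if sd == 0.0:
            # No spread recorded yet, so every value sits at the mean.
            return [0.0 for _ in values]
        return [(x - self.mean) / sd for x in values]


def normalize_distances(
    stats: RunningZScore, distances: list[float], save_stats: bool
) -> list[float]:
    """Assumed policy: update stored stats only when save_stats is True
    (for example, for a multi-modal search), then normalize."""
    if save_stats:
        stats.update(distances)
    return stats.normalize(distances)
```

In this sketch, a multi-modal search would call `normalize_distances(stats, distances, save_stats=True)`, while other searches would pass `save_stats=False` so their distances do not shift the stored mean and standard deviation.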

  • Use cosine distance instead of Euclidean distance for vector tables (requires users to reindex embeddings)
  • Ensure we fetch model state on Explore page load
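The difference between the two distance functions mentioned in the list above can be sketched in plain Python. This is an illustrative example only, not the vector table code changed by this PR; the function names are assumptions chosen for the sketch.

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance; sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; depends only on the angle between vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


# Same direction, different magnitude.
a = [1.0, 0.0]
b = [2.0, 0.0]
print(euclidean_distance(a, b))  # 1.0
print(cosine_distance(a, b))     # 0.0
```

Because the two functions rank neighbors differently whenever embeddings differ in magnitude, distances computed under one are not comparable to distances computed under the other.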

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code

Additional information

Users running the dev builds will have to set reindex: True in their config to regenerate their embeddings, as the distance function has now been changed to cosine.

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • The code has been formatted using Ruff (ruff format frigate)

netlify bot commented Oct 11, 2024

Deploy Preview for frigate-docs canceled.

🔨 Latest commit: 1165c42
🔍 Latest deploy log: https://app.netlify.com/sites/frigate-docs/deploys/6709679ce5cdbf0008ae64ab

@hawkeye217 hawkeye217 marked this pull request as draft October 11, 2024 17:18
@hawkeye217 hawkeye217 marked this pull request as ready for review October 11, 2024 18:02
@hawkeye217 hawkeye217 merged commit 8a8a0c7 into dev Oct 11, 2024
13 checks passed
@hawkeye217 hawkeye217 deleted the normalization-fixes branch October 11, 2024 18:11