
Add feature to calculate similarity score between two embeddings without storing them in a collection #804

Open · wants to merge 4 commits into main
Conversation

mocobeta (Contributor) commented Feb 28, 2025

Embeddings are great, but they are meaningful only in relation to one another.
I thought it would be helpful to have a command that directly calculates the similarity score between two pieces of content, without storing them in a collection, when trying out embedding models.

This PR adds an embed-score command, which takes two pieces of content and returns their cosine similarity score (and the actual embeddings when the output format is 'json').

# Basic usage
llm embed-score -c1 "I like pelicans" -c2 "I love pelicans" -m 3-small
0.9376833959553552
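For reference, the core calculation behind a command like this is plain cosine similarity between the two embedding vectors. The sketch below is a minimal, dependency-free illustration of that metric; the function name and the example vectors are hypothetical, not taken from this PR's actual implementation.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the vectors divided by
    # the product of their magnitudes. Ranges from -1 to 1;
    # identical directions score 1.0, orthogonal vectors 0.0.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

In practice the two vectors would come from the embedding model selected with -m, and the score above (0.937…) indicates the two pelican sentences are embedded very close together.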

The new command reflects my rough intention, but I'm fully open to any suggestions on the interface/implementation if this feature is worth having in this tool.

Confession: This is a collaborative work with Anthropic's Cline and Claude 3.7. I prompted Cline to write the new function I wanted and post-edited the generated code. Claude did a great job but couldn't produce unit tests that worked and aligned with the existing fixtures (within a reasonable amount of time). The whole process greatly helped me understand the codebase.

Still needs to:

  • Add documentation about the new command
