
Add MTEB output for Google's text-embedding-005 #88

Merged

Conversation

chenblair
Contributor

Output is generated by an in-house Google evaluation script forked off of the original MTEB evaluation code.

Checklist

  • Run tests locally using make test to make sure nothing is broken.
  • Run the results files checker with make pre-push.

@Muennighoff
Contributor

Muennighoff left a comment


Amazing! @KennethEnevoldsen is external the right folder? When naming the folder no_revision_available, the tests failed for @chenblair. I had assumed that 1 would be the wrong folder to put it in, since that one is only for results run with https://github.com/embeddings-benchmark/mteb/blob/25f4f618f1694d1155919c9771c551fa70b5049b/mteb/models/google_models.py#L157 while these results are probably run with the internal Google implementation. Or is 1 fine?

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Jan 8, 2025

Amazing! @KennethEnevoldsen is external the right folder?

External should be fine, unless it is in fact run with our implementation, in which case it belongs in 1. If not, I would recommend that they make a PR of their implementation (it allows users to know how they should use the API).

Output is generated by an in-house Google evaluation script forked off of the original MTEB evaluation code.

That explains part of it. I would rerun it using the latest version of the public code-base.

It seems like you are already using the implemented wrapper (from the model_meta.json):

https://github.com/embeddings-benchmark/mteb/blob/main/mteb/models/google_models.py

So I would assume that it should work out of the box.
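
For illustration, here is a minimal sketch of what rerunning with the public code-base might look like. The registered model name, the task choice, and the output folder are assumptions rather than anything from this PR; check mteb/models/google_models.py for the exact identifier and make sure GCP credentials are configured first.

```python
import mteb

# Assumption: the public wrapper registers the model under this name;
# adjust to whatever google_models.py actually uses.
model = mteb.get_model("google/text-embedding-005")

# Illustrative task selection; a full leaderboard run would cover the whole benchmark.
tasks = mteb.get_tasks(tasks=["Banking77Classification"])

evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results/text-embedding-005")
```

Run this way, the output should land in the folder layout the checks in this repository expect, so the make pre-push step above should pass without manual edits.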

@chenblair
Contributor Author

Thanks for the detailed comments and discussion! I will look into making a PR of our current implementation; the metrics as-is should be ready to merge.

@KennethEnevoldsen merged commit daeacb0 into embeddings-benchmark:main on Jan 9, 2025
2 checks passed
@KennethEnevoldsen
Contributor

Fixes look good, I have merged this in.

@chenblair deleted the text-embedding-005_mteb branch on January 9, 2025 at 21:44
@chenblair
Contributor Author

@Muennighoff

I'm not currently seeing the text-embedding-005 metrics on the main MTEB leaderboard (https://huggingface.co/spaces/mteb/leaderboard). Is there anything else I need to do to have this model populated on the leaderboard?

Thanks!

@Muennighoff
Contributor

I think the paths.json file hasn't been updated? You need to run this function from the source of the repo

def get_paths():
I think? cc @KennethEnevoldsen

The new leaderboard will no longer require that and I think it already shows up there (http://mteb-leaderboard-2-demo.hf.space/?benchmark_name=MTEB%28eng%2C+classic%29)
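
For reference, a rough sketch of what regenerating paths.json could look like, assuming get_paths() is importable from a results.py at the repo root and returns the mapping to serialise; both the module name and the return behaviour are guesses, and the function may already write the file itself.

```python
import json

# Assumption: module path and return value are guesses based on the comment above;
# get_paths() may instead write paths.json as a side effect.
from results import get_paths

paths = get_paths()
with open("paths.json", "w") as f:
    json.dump(paths, f, indent=2)
```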

@Samoed
Collaborator

Samoed commented Jan 18, 2025

I think the paths.json file hasn't been updated? You need to run this function from the source of the repo.

Also, to be added to the old leaderboard, you need to update model_meta.yaml and run refresh.py.
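
A hedged sketch of that old-leaderboard update, assuming model_meta.yaml is a mapping from model names to metadata entries and refresh.py lives next to it in the leaderboard repo; the file layout and field names below are placeholders, so mirror an existing entry rather than copying these.

```python
import subprocess
import yaml  # PyYAML

# Add an entry for the new model; the keys here are placeholders,
# not the real model_meta.yaml schema.
with open("model_meta.yaml") as f:
    meta = yaml.safe_load(f)

meta.setdefault("text-embedding-005", {
    "link": "https://cloud.google.com/vertex-ai",  # placeholder URL
    "is_external": True,                           # placeholder field
})

with open("model_meta.yaml", "w") as f:
    yaml.safe_dump(meta, f, sort_keys=False)

# Then rebuild the cached leaderboard data.
subprocess.run(["python", "refresh.py"], check=True)
```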
