
Add MTEB output for Google's text-embedding-005 #88

Merged

Conversation

chenblair
Contributor

Output is generated by an in-house Google evaluation script forked off of the original MTEB evaluation code.

Checklist

  • Run tests locally using make test to make sure nothing is broken.
  • Run the results files checker with make pre-push.

@Muennighoff
Contributor

Muennighoff left a comment


Amazing! @KennethEnevoldsen is external the right folder? When naming the folder no_revision_available, the tests failed for @chenblair. I had assumed that 1 would be the wrong folder to put it in, since that one is only for results run with https://github.com/embeddings-benchmark/mteb/blob/25f4f618f1694d1155919c9771c551fa70b5049b/mteb/models/google_models.py#L157 while these results are probably run with the internal Google implementation. Or is 1 fine?

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Jan 8, 2025

Amazing! @KennethEnevoldsen is external the right folder?

External should be fine, unless it is in fact run with our implementation, in which case it belongs in 1. If not, I would recommend that they make a PR of their implementation (it allows users to know how they should use the API).

Output is generated by an in-house Google evaluation script forked off of the original MTEB evaluation code.

That explains part of it. I would rerun it using the latest version of the public code-base.

It seems like you are already using the implemented wrapper (from the model_meta.json):

https://github.com/embeddings-benchmark/mteb/blob/main/mteb/models/google_models.py

So I would assume that it should work out of the box.
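
For illustration, here is a minimal sketch of what rerunning with the public code-base might look like. The registered model name, the task choice, and the output folder are assumptions rather than anything from this PR; check mteb/models/google_models.py for the exact identifier and make sure GCP credentials are configured first.

```python
import mteb

# Assumption: the public wrapper registers the model under this name;
# adjust to whatever google_models.py actually uses.
model = mteb.get_model("google/text-embedding-005")

# Illustrative task selection; a full leaderboard run would cover the whole benchmark.
tasks = mteb.get_tasks(tasks=["Banking77Classification"])

evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results/text-embedding-005")
```

Run this way, the output should land in the folder layout the checks in this repository expect, so the make pre-push step above should pass without manual edits.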

@chenblair
Contributor Author

Thanks for the detailed comments and discussion! I will look into making a PR of our current implementation; the metrics as-is should be ready to merge.

@KennethEnevoldsen merged commit daeacb0 into embeddings-benchmark:main on Jan 9, 2025
2 checks passed
@KennethEnevoldsen
Contributor

Fixes look good, I have merged this in.

@chenblair deleted the text-embedding-005_mteb branch on January 9, 2025 at 21:44
@chenblair
Contributor Author

@Muennighoff

I'm not currently seeing the text-embedding-005 metrics on the main MTEB leaderboard (https://huggingface.co/spaces/mteb/leaderboard). Is there anything else I need to do to have this model populated on the leaderboard?

Thanks!

@Muennighoff
Contributor

I think the paths.json file hasn't been updated? You need to run this function from the source of the repo

def get_paths():
I think? cc @KennethEnevoldsen

The new leaderboard will no longer require that and I think it already shows up there (http://mteb-leaderboard-2-demo.hf.space/?benchmark_name=MTEB%28eng%2C+classic%29)
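
For reference, a rough sketch of what regenerating paths.json could look like, assuming get_paths() is importable from a results.py at the repo root and returns the mapping to serialise; both the module name and the return behaviour are guesses, and the function may already write the file itself.

```python
import json

# Assumption: module path and return value are guesses based on the comment above;
# get_paths() may instead write paths.json as a side effect.
from results import get_paths

paths = get_paths()
with open("paths.json", "w") as f:
    json.dump(paths, f, indent=2)
```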

@Samoed
Collaborator

Samoed commented Jan 18, 2025

I think the paths.json file hasn't been updated? You need to run this function from the source of the repo.

Also, to be added to the old leaderboard, you need to update model_meta.yaml and run refresh.py.
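
A hedged sketch of that old-leaderboard update, assuming model_meta.yaml is a mapping from model names to metadata entries and refresh.py lives next to it in the leaderboard repo; the file layout and field names below are placeholders, so mirror an existing entry rather than copying these.

```python
import subprocess
import yaml  # PyYAML

# Add an entry for the new model; the keys here are placeholders,
# not the real model_meta.yaml schema.
with open("model_meta.yaml") as f:
    meta = yaml.safe_load(f)

meta.setdefault("text-embedding-005", {
    "link": "https://cloud.google.com/vertex-ai",  # placeholder URL
    "is_external": True,                           # placeholder field
})

with open("model_meta.yaml", "w") as f:
    yaml.safe_dump(meta, f, sort_keys=False)

# Then rebuild the cached leaderboard data.
subprocess.run(["python", "refresh.py"], check=True)
```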
