
[feat] Integrate NanoBeIR datasets #2966

Open · wants to merge 18 commits into master

Conversation

ArthurCamara
Contributor

As discussed in #2848 (comment), this PR adds a new evaluator based on the NanoBEIR collection of datasets.

It creates one InformationRetrievalEvaluator per dataset and aggregates the results across datasets.

Example:

import logging

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

logger = logging.getLogger(__name__)

# Load a model
model = SentenceTransformer("all-mpnet-base-v2")

datasets = ["QuoraRetrieval", "MSMARCO"]
query_prompts = {
    "QuoraRetrieval": "Instruct: Given a question, retrieve questions that are semantically equivalent to the given question\nQuery: ",
    "MSMARCO": "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: ",
}

evaluator = NanoBEIREvaluator(
    dataset_names=datasets,
    name="NanoBEIR",
    query_prompts=query_prompts,
)

results = evaluator(model)
'''
NanoBeIR Evaluation of the model on ['QuoraRetrieval', 'MSMARCO'] datasets:
Evaluating NanoBeIRNanoQuoraRetrieval
Evaluating NanoBeIRNanoMSMARCO

Average Queries: 50.0
Average Corpus: 5044.5

Aggregated for Score Function: cosine
Accuracy@1: 39.00%
Accuracy@3: 57.00%
Accuracy@5: 66.00%
Accuracy@10: 77.00%
Precision@1: 39.00%
Recall@1: 34.03%
Precision@3: 20.67%
Recall@3: 54.07%
Precision@5: 15.00%
Recall@5: 64.27%
Precision@10: 8.90%
Recall@10: 75.97%
MRR@10: 0.5004
NDCG@10: 0.5513
Aggregated for Score Function: dot
Accuracy@1: 39.00%
Accuracy@3: 57.00%
Accuracy@5: 66.00%
Accuracy@10: 77.00%
Precision@1: 39.00%
Recall@1: 34.03%
Precision@3: 20.67%
Recall@3: 54.07%
Precision@5: 15.00%
Recall@5: 64.27%
Precision@10: 8.90%
Recall@10: 75.97%
MRR@10: 0.5004
NDCG@10: 0.5513
'''
logger.info(evaluator.primary_metric)
# => "cosine_ndcg@10"
logger.info(results["mean"][evaluator.primary_metric])
# => 0.5512516989358924

(Note that this depends on #2951)
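
For reference, a minimal sketch of how this evaluator plugs into training like any other evaluator, assuming the standard SentenceTransformerTrainer API (the tiny train dataset and the loss below are placeholders, not part of this PR):

from datasets import Dataset
from sentence_transformers import SentenceTransformerTrainer, losses

# Placeholder training data: a single (anchor, positive) pair.
train_dataset = Dataset.from_dict({
    "anchor": ["What is the capital of France?"],
    "positive": ["Paris is the capital of France."],
})
loss = losses.MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=loss,
    evaluator=evaluator,  # NanoBEIR metrics are reported alongside the eval loop
)
trainer.train()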

@tomaarsen
Copy link
Collaborator

Although the "Be" portion obviously stands for Benchmark, the abbreviation "BEIR" is usually fully capitalized, so I'd like to propagate that capitalization in this PR as well.

@tomaarsen
Collaborator

tomaarsen commented Oct 17, 2024

I'm experimenting with having all outputs in the final dict, rather than a nested dict. This way, people can use any value from the evaluator to guide, e.g., their early stopping. It should also match the SequentialEvaluator behavior, even though the NanoBEIR results become a bit hectic (i.e., one massive flat dict). See the sketch below.

I hope it's okay if I push into this PR!
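
To illustrate the difference, a short sketch of the two layouts (the exact key names here are hypothetical, assuming dataset-prefixed metrics):

results = evaluator(model)

# Nested layout: one sub-dict per dataset plus an aggregate, e.g.:
#   results["QuoraRetrieval"]["cosine_ndcg@10"]            (hypothetical keys)

# Flattened layout: every metric is a top-level key, so a callback can
# monitor any single value directly, e.g. for early stopping:
#   results["NanoBEIR_NanoQuoraRetrieval_cosine_ndcg@10"]  (hypothetical key)
#   results["NanoBEIR_mean_cosine_ndcg@10"]                (hypothetical key)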
