Dear all,

Thank you very much for this nice tool!

At the moment, repro_eval crashes if a baseline run retrieves no documents for a topic that an advanced run covers.

In our setup, we compare a duoT5 run against a BM25 run, where the duoT5 run re-ranks the BM25 run. There is a topic for which BM25 retrieves only a single document; since duoT5 cannot build any pairs from a single candidate, it retrieves no documents for such topics.

The program fails when the baseline run is missing a topic: https://github.com/irgroup/repro_eval/blob/master/repro_eval/measure/overall_effects.py#L18

The opposite direction should also be handled, i.e., when the advanced run does not retrieve any document for a topic.

A possible solution would be for repro_eval to implement the empty-set behaviour of ir_measures (quoting from the linked page): "queries that appear in the qrels but not the run are given a score of 0. Queries that do not appear in the run may have returned no results, and therefore be scored as such."

What do you think?
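To make the pair-building problem concrete, here is a minimal sketch (not duoT5 itself) of why a pairwise re-ranker produces an empty ranking for a topic with a single candidate: scoring is done over ordered pairs of candidates, and one document yields no pairs.

```python
from itertools import permutations

# A pairwise re-ranker such as duoT5 scores ordered pairs of candidate
# documents. With a single candidate there are no pairs to score, so
# the re-ranked run for that topic is empty.
candidates = ['doc1']
pairs = list(permutations(candidates, 2))
assert pairs == []  # no pairs can be formed from a single document

# With two or more candidates, pairs exist and re-ranking proceeds.
pairs = list(permutations(['doc1', 'doc2'], 2))
assert pairs == [('doc1', 'doc2'), ('doc2', 'doc1')]
```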
A method like the following would maybe solve it (adding a pseudo non-relevant document to the run), provided that all parsing is done through this method:

```python
# Inside the class that handles run parsing; requires `import pytrec_eval`.
@staticmethod
def parse_run(run_file, qrels=None):
    """
    Parse the passed run_file with pytrec_eval.
    For each topic in the qrels that is missing from the run, add a
    single non-relevant pseudo document to implement the empty-set
    behaviour of ir_measures: queries that appear in the qrels but not
    in the run are given a score of 0, so we add a non-relevant
    document for topics with no retrieved results.
    """
    if not qrels:
        qrels = {}
    with open(run_file, 'r') as f_run:
        run = pytrec_eval.parse_run(f_run)
    for topic_in_qrels in qrels.keys():
        if topic_in_qrels not in run:
            run[topic_in_qrels] = {f'non-relevant-pseudo-document-for-topic-{topic_in_qrels}': 1}
    return {t: run[t] for t in sorted(run)}
```

But I am also not too sure whether one should implement this in repro_eval, or rather fix the run.
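As a quick sanity check of the idea, here is a self-contained sketch (the function name, the hand-rolled TREC-format parsing, and the example data are all hypothetical; the real method above delegates parsing to pytrec_eval):

```python
def parse_run_with_empty_topics(run_lines, qrels=None):
    """Parse TREC-format run lines and pad topics missing from the run
    with a single non-relevant pseudo document, mirroring the
    empty-set behaviour of ir_measures."""
    qrels = qrels or {}
    run = {}
    for line in run_lines:
        # TREC run format: topic Q0 doc rank score tag
        topic, _, doc, _, score, _ = line.split()
        run.setdefault(topic, {})[doc] = float(score)
    for topic in qrels:
        if topic not in run:
            run[topic] = {f'non-relevant-pseudo-document-for-topic-{topic}': 1}
    return {t: run[t] for t in sorted(run)}


# Topic 301 is retrieved; topic 302 appears only in the qrels.
run_lines = ['301 Q0 doc1 1 12.5 bm25']
qrels = {'301': {'doc1': 1}, '302': {'doc9': 1}}
result = parse_run_with_empty_topics(run_lines, qrels)
# Topic 302 now holds only the pseudo document, so any standard
# measure evaluates it to 0 instead of crashing on a missing topic.
assert result['302'] == {'non-relevant-pseudo-document-for-topic-302': 1}
```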