Added reproduction log entry for BM25 MS MARCO Passage Ranking #2600

Closed
wants to merge 1,817 commits into from

RMaarefdoust commented Sep 13, 2024

This pull request adds an entry to the reproduction log for the BM25 baseline on MS MARCO Passage Ranking, documenting the reproduction results for that baseline.

Details:

Results reproduced by: [@RMaarefdoust](https://github.com/RMaarefdoust) on 2024-09-17
Commit: [e8b24f6](https://github.com/castorini/anserini/commit/e8b24f69ec45692a5fff641568727c8e46aa195a)

System Environment:

Java Version: 21.0.4
Maven Version: 3.9.4
Python Version: 3.8.8

lintool and others added 30 commits November 27, 2023 20:44
Major refactoring of indexing pipeline (IndexCollection, IndexHnswDenseVectors, and IndexInvertedDenseVectors),
extracting common code paths into AbstractIndexer.
+ Refactored to create HnswDenseSearcher and InvertedDenseSearcher; these will provide Python bindings (later)
+ Refactored SearchHnswDenseVectors and SearchInvertedDenseVectors as wrappers to provide main
+ Improved test coverage
* add PrebuiltIndexHandler
    * add a download progress bar
    * add MD5 checksum checking 
    * add gzip and unzip tarball functionalities
* add corresponding unittests
Also added HNSW int8 regressions: works for cosDPR-distill, issues remain with OpenAI Ada2
+ Improved alignment between SearchCollection and dense vector search classes.
+ Aligned ScoredDoc and ScoredDocs (was previously ScoredDocuments) as container objects for Lucene results.
+ Searchers now use ScoredDoc instead of class-specific Result objects.
+ Tweaked SearchCollection args to use proper camelCasing.
+ Consolidated BaseSearcher class for basic ranked list post-processing functionality.
+ Increased test coverage.
…ings (#2318)

Default of 2047 seems to be too aggressive for hops, getting OOM errors.
lintool and others added 20 commits August 22, 2024 17:03
+ simplified parameters in cases where there are defaults (for BEIR)
+ moved "threads" parameter up closer to beginning of command (for indexing, all regressions)
+ dense searchers batch_search - change method signature to take queries, then qids - to be consistent
  with SimpleSearcher and SimpleImpactSearcher
+ dense searchers: refactor ThreadPoolExecutor to use try-with-resources, see #2579
Bumps [micromatch](https://github.com/micromatch/micromatch) from 4.0.5 to 4.0.8.
- [Release notes](https://github.com/micromatch/micromatch/releases)
- [Changelog](https://github.com/micromatch/micromatch/blob/master/CHANGELOG.md)
- [Commits](micromatch/micromatch@4.0.5...4.0.8)

---
updated-dependencies:
- dependency-name: micromatch
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

lintool (Member) commented Sep 14, 2024

Please continue along the onboarding path until you reach the end of the exercises, and then we'll take a look all at once.

RMaarefdoust (Author)

This pull request adds an entry to the reproduction log for the BM25 baseline on MS MARCO Passage Ranking, documenting the reproduction results for that baseline.

Details:

Results reproduced by: [@RMaarefdoust](https://github.com/RMaarefdoust) on 2024-09-13
Commit: [e8b24f6](https://github.com/castorini/anserini/commit/e8b24f69ec45692a5fff641568727c8e46aa195a)

RMaarefdoust (Author)

I did it.
