
Enable ElasticsearchStore to retrieve with the pure BM25 algorithm without vector search #6

Merged: 1 commit merged on Apr 2, 2024


g-votte (Contributor) commented Mar 31, 2024

This PR was migrated from the main LangChain repository because `libs/elasticsearch` was moved to the separate `langchain-elastic` repository during the review process. The original PR is: langchain-ai/langchain#19314

Description

This pull request implements `BM25RetrievalStrategy` for `ElasticsearchStore`. This retrieval strategy ranks results purely with BM25, without involving vector search.
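To make "without vector search" concrete, below is a minimal sketch of the kind of Elasticsearch request body a BM25-only search boils down to: a full-text `match` query and no `knn` section, so no query vector or embedding model is involved. The helper `bm25_search_body` is hypothetical, and the text field name `text` is an assumption; the exact query that `BM25RetrievalStrategy` builds may differ (for example, in how it applies filters).

```python
from typing import Any, Dict, List, Optional


def bm25_search_body(
    query: str,
    k: int = 4,
    query_field: str = "text",  # assumed name of the field holding page_content
    filters: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Build a hypothetical BM25-only search request body.

    Only a full-text `match` query is used: there is no `knn` section and
    no query vector, so no embeddings are required.
    """
    return {
        "size": k,
        "query": {
            "bool": {
                # `must` clauses contribute to the BM25 relevance score.
                "must": [{"match": {query_field: {"query": query}}}],
                # `filter` clauses restrict matches without affecting the score.
                "filter": filters or [],
            }
        },
    }


body = bm25_search_body("foo", k=10)
print(body)
# With the official client (elasticsearch-py 8.x), such a body could be sent as:
#   Elasticsearch("http://localhost:9200").search(index="test_index", **body)
```

Keeping the query in `must` and any metadata filters in `filter` is the usual Elasticsearch pattern: only the `match` clause influences the BM25 score.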

Usage Example of Introduced Feature

By passing `ElasticsearchStore.BM25RetrievalStrategy()` as the `strategy` constructor argument of `ElasticsearchStore`, users can run pure BM25 searches without vector search. Note that the example below does not specify the `embedding` option: because the search does not use embeddings, none are needed.

```python
from langchain_elasticsearch.vectorstores import ElasticsearchStore

store = ElasticsearchStore(
    es_url="http://localhost:9200",
    index_name="test_index",
    strategy=ElasticsearchStore.BM25RetrievalStrategy(),
)

store.add_texts(
    [
        "foo",
        "foo bar",
        "foo bar baz",
        "bar",
        "bar baz",
        "baz"
    ],
)

results = store.similarity_search(query="foo", k=10)
print(results)
```

The example above outputs:

```
[Document(page_content='foo'), Document(page_content='foo bar'), Document(page_content='foo bar baz')]
```
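The ordering above can be illustrated with a minimal, self-contained sketch of textbook Okapi BM25 scoring, using whitespace tokenization and the parameters `k1=1.2` and `b=0.75` (the Elasticsearch defaults). The function `bm25_rank` is hypothetical and only approximates what Elasticsearch does; Lucene's implementation differs in details such as how document lengths are encoded, so absolute scores will not match Elasticsearch.

```python
import math
from collections import Counter
from typing import List, Tuple


def bm25_rank(
    docs: List[str], query: str, k1: float = 1.2, b: float = 0.75
) -> List[Tuple[str, float]]:
    """Rank docs against a query with textbook Okapi BM25 (whitespace tokens)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scored = []
    for doc, tokens in zip(docs, tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query.split():
            if tf[term] == 0:
                continue
            # Lucene-style IDF, which is always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: shorter-than-average docs are boosted.
            norm = k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        if score > 0:  # like a match query, keep only documents that match
            scored.append((doc, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


texts = ["foo", "foo bar", "foo bar baz", "bar", "bar baz", "baz"]
for doc, score in bm25_rank(texts, "foo"):
    print(f"{score:.3f}  {doc}")
```

In this sketch every match has `tf = 1` and the same IDF, so the ranking is decided by document length: `foo` scores highest, then `foo bar`, then `foo bar baz`, the same order as the output above. Documents without the term `foo` never match, which is consistent with only three results coming back even though `k=10`.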

Details

For more details, please refer to the original PR: langchain-ai/langchain#19314

g-votte (Contributor, Author) commented Mar 31, 2024

I have migrated the changes from the original PR. Although @joemcelroy and @maxjakob have kindly approved the original one, I welcome any additional comments you might have.

The documentation has been excluded from this PR because, in my understanding, the docs will continue to be managed in LangChain's main repository. Consequently, once this PR is merged, I intend to submit a new PR specifically for the documentation in the main repository.

PTAL.
@joemcelroy @miguelgrinberg @maxjakob @baskaryan @efriis

maxjakob (Collaborator) left a comment

Thanks again for your contribution @g-votte!

> The documentation has been excluded from this PR because, in my understanding, the docs will continue to be managed in LangChain's main repository. Consequently, once this PR is merged, I intend to submit a new PR specifically for the documentation in the main repository.

That is correct: the docs still live in the main repo. It would be great if you could create another PR with the documentation change.

maxjakob merged commit 10a96cb into langchain-ai:main on Apr 2, 2024 (12 checks passed)
baskaryan pushed a commit to langchain-ai/langchain that referenced this pull request Apr 9, 2024
…#20098)

This pull request follows up on
#19314 and
langchain-ai/langchain-elastic#6, adding
documentation for the `ElasticsearchStore.BM25RetrievalStrategy`.

Like the other retrieval strategies, `BM25RetrievalStrategy` is now
introduced in the documentation.

### Background
- The `BM25RetrievalStrategy` has been introduced to `langchain-elastic`
via the pull request
langchain-ai/langchain-elastic#6.
- This PR was initially created in the main `langchain` repository but
was moved to `langchain-elastic` during the review process due to the
migration of the partner package.
- The original PR can be found at
#19314.
- As
[commented](#19314 (comment))
by @joemcelroy, documenting the new retrieval strategy is part of the
requirements for its introduction.

Although the `BM25RetrievalStrategy` has been merged into
`langchain-elastic`, its documentation is still to be maintained in the
main `langchain` repository. Therefore, this pull request adds the
documentation portion of `BM25RetrievalStrategy`.

The content of the documentation remains the same as that included in
the original PR, #19314.

---------

Co-authored-by: Max Jakob <[email protected]>
junkeon pushed a commit to UpstageAI/langchain that referenced this pull request Apr 16, 2024
hinthornw pushed a commit to langchain-ai/langchain that referenced this pull request Apr 26, 2024