Implemented a retriever for R2R #1676

Open
wants to merge 3 commits into
base: main


@RamXX RamXX commented Oct 23, 2024

Hello team,

This PR adds a retriever for R2R (RAG-to-Riches) as a new retriever option in DSPy.

R2R is a flexible RAG solution that combines vector search, hybrid search, and knowledge graph capabilities. Its knowledge graph support implements GraphRAG as described in the original Microsoft paper. Together these give DSPy users additional retrieval options.

Features

The R2R features exposed in this PR are:

  • Parameters can be passed at instantiation or later, during forward()
  • Vector and hybrid search with configurable parameters, including kwargs for full control of the input
  • Knowledge graph-based retrieval with local search options (global search is being refactored by R2R)
  • Two operation modes: direct search, and RAG generation by the R2R server
  • Collection management with name/UUID resolution
  • Error handling and input validation throughout
  • Full type hints and docstrings
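
To illustrate the first and fifth items above, here is a minimal sketch of how instance-level defaults might be merged with per-call overrides, and how a collection identifier might be classified as a UUID or a name. It is not the code in this PR: `SketchRetriever`, `resolve_settings`, `is_uuid`, and the `use_hybrid_search` setting are hypothetical names chosen for illustration, and the actual name-to-UUID lookup against the R2R server is left out.

```python
import uuid
from typing import Any, Dict, Optional


def is_uuid(value: str) -> bool:
    """Return True if `value` parses as a UUID, False otherwise."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


class SketchRetriever:
    """Hypothetical stand-in; NOT the R2RRetrieve class from this PR."""

    def __init__(self, collection: Optional[str] = None, k: int = 3,
                 **search_settings: Any) -> None:
        # Values given at instantiation become the defaults for every call.
        self.defaults: Dict[str, Any] = {
            "collection": collection, "k": k, **search_settings,
        }

    def resolve_settings(self, **overrides: Any) -> Dict[str, Any]:
        # Per-call values win over defaults; None means "not supplied".
        supplied = {key: val for key, val in overrides.items() if val is not None}
        settings = {**self.defaults, **supplied}
        collection = settings.get("collection")
        if collection is not None:
            # A UUID could be used directly; a name would need a server lookup.
            settings["collection_kind"] = "uuid" if is_uuid(collection) else "name"
        return settings
```

A `forward()` method could call `resolve_settings(**kwargs)` first, so that anything the caller omits falls back to the value given at instantiation.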

Example

    >>> rm = R2RRetrieve(collection_name="My Collection")
    >>> dspy.configure(lm=lm, rm=rm) # lm previously defined
    >>> retriever = dspy.Retrieve(k=10)
    >>> query = "What is machine learning?"
    >>> topK_passages = retriever(query).passages

Currently out of scope

  • Agent searches
  • Ingestion or any non-query operation
  • Metrics and assessments of previous searches (can be implemented by the user)
  • Chunk metadata included in R2R responses (can be implemented in future releases)

I'm not affiliated with the R2R team, but I use their product in at least one project, and I believe our community can benefit from having this option available. I'm open to any suggestions and willing to keep maintaining it as needed.

Thank you for your consideration.

--Ramiro Salas, a.k.a. DrWho? (DSPy & R2R Discord)


RamXX commented Nov 13, 2024

R2R has made some breaking changes to its API, so this retriever had to be reworked. In addition, because of the way R2R handles sync and async calls, the retriever was not working as expected in Jupyter, or in other environments that run their own async loop, without a wrapper function. I added a helper method to address this, and it is now working again.
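
For readers unfamiliar with the problem, the sketch below shows one common way to call a coroutine from synchronous code, whether or not an event loop is already running in the current thread. It is an assumption about the general technique, not the helper method in this PR: `run_sync` and `fake_search` are hypothetical names, and only the standard library is used.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, List, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run `coro` to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(coro)
    # A loop is already running (for example in Jupyter): run the coroutine
    # on a fresh loop in a worker thread and block until it finishes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def fake_search(query: str) -> List[str]:
    # Hypothetical stand-in for an async client call.
    await asyncio.sleep(0)
    return [f"passage for {query}"]
```

Calling `run_sync(fake_search("What is machine learning?"))` works both in a plain script and in a notebook cell. The trade-off is that the calling thread blocks while the worker thread runs, so inside an already running loop it pauses other tasks until the call returns.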

@RamXX RamXX marked this pull request as draft November 13, 2024 22:45
@RamXX RamXX marked this pull request as ready for review November 14, 2024 04:08