add support for Azure CosmosDB NoSQL native WHERE and query_constructors #44

Open
wants to merge 6 commits into
base: release-langchain-azure-ai-0.1.2

@ianchi ianchi commented Feb 12, 2025

The current metadata filter implementation in the Azure CosmosDB NoSQL vector store uses custom objects, which are native neither to CosmosDB nor to LangChain.
This PR adds the option to filter in the vector store with the more native SQL "WHERE" clause, passed as a string.
It also adds a Translator class, so that it can be used natively with Structured Queries / Filter Expressions.

This PR introduces a way to express filter conditions that is native to CosmosDB NoSQL, namely a simple string, and a way that is native to LangChain, namely a FilterExpression built from LangChain's standard objects.

The original implementation "invented" a custom Condition class, conceptually very similar to LangChain's standard FilterDirective.

A second commit removes the custom-object filtering in favor of this more native approach.
This new repo already includes some breaking changes, so this might be a good moment to move to a standard.

This replaces langchain-ai/langchain#29718

@marlenezw
Collaborator

cc @aayush3011 or @fatmelon

@santiagxf santiagxf changed the base branch from main to release-langchain-azure-ai-0.1.2 February 13, 2025 15:37
@santiagxf
Collaborator

I have changed the target branch to the release branch where we can include it in the next release. @aayush3011 or @fatmelon please take a look and let us know as code owners.

@santiagxf santiagxf requested a review from aayush3011 February 14, 2025 02:06
@fatmelon
Contributor

vCore part looks good to me.
NoSql changes may need Aayush's review

@ianchi
Author

ianchi commented Feb 20, 2025

Hi @aayush3011, please share any comments, I'm open to adapting the code if anything is needed.

We want to use this retriever, but the class still has many issues. This is one of the bigger ones we are facing, and it is a blocker to adoption.

I have also worked on other problems we found, and I'll be submitting further PRs once this is merged (some were already submitted in langchain_community, but will need to be moved here):

  • currently the database is created even if the parameter says not to create it
  • there is no true async support
  • repartition_key handling is partial and erroneous in most use cases (for instance, delete is wrong, and insertion will also have problems if partition_key is anything other than id)
  • get_by_ids is not implemented
  • metadata key is limited to only one use case
  • Document is created without id

We want to use this in production, but in its current state that will be hard.
If you are open to it, I'm willing to contribute the corresponding PRs for review, discussion, and eventual merging.
