Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Bedrock KB metadata to AmazonKnowledgeBasesRetriever Document metadata #36

Closed

Conversation

karlnewell
Copy link

AWS Bedrock Knowledge Bases support custom user metadata per source document. This change adds that metadata to AmazonKnowledgeBasesRetriever Documents metadata as source_metadata.

This addresses the feature request in langchain-ai/langchain#20649

I'm open to suggestions on the source_metadata key name, but I was avoiding metadata["metadata"]

There are a few different ways to handle this. In this PR, source_metadata is set to None if the Bedrock KB result does not contain a metadata key. I prefer this approach as the presence of the source_metadata key is guaranteed for the caller - i.e., the caller does not have to check for it's presence.

If it's preferred to not set source_metadata if it's empty, then here are a couple of options.

for result in results:
    metadata={
        "location": result["location"],
        "score": result["score"] if "score" in result else 0,
    }
    if "metadata" in result:
        metadata["source_metadata"] = result["metadata"]
    documents.append(
        Document(
            page_content=result["content"]["text"],
            metadata=metadata,
        )
    )

Requires Python 3.9+

for result in results:
    documents.append(
        Document(
            page_content=result["content"]["text"],
            metadata={
                "location": result["location"],
                "score": result["score"] if "score" in result else 0} |
                ({'source_metadata': result["metadata"] } if "metadata" in result else {}),
        )
    )

I updated the integration tests to reflect these changes and added a response that includes KB metadata.

I changed the integration test get_relevant_documents to invoke based on

/home/knewell/.cache/pypoetry/virtualenvs/langchain-aws-eO7Ij7mw-py3.9/lib/python3.9/site-packages/langchain_core/_api/deprecation.py:119: LangChainDeprecationWarning: The method BaseRetriever.get_relevant_documents was deprecated in langchain-core 0.1.46 and will be removed in 0.3.0. Use invoke instead.
warn_deprecated(

Two simple typos in README.md were fixed.

@3coins
Copy link
Collaborator

3coins commented May 3, 2024

@alexol91
Do you have any feedback for this PR?

@3coins 3coins added the enhancement New feature or request label May 3, 2024
@karlnewell karlnewell force-pushed the feature/bedrock-kb-metadata branch from 1e91608 to 199f291 Compare May 3, 2024 20:31
AWS Bedrock supports custom user metadata per source document.
This change adds that metadata to AmazonKnowledgeBasesRetriever
Documents metadata as "source_metadata"
@karlnewell
Copy link
Author

Same feature added by #40

@karlnewell karlnewell closed this May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants