
ChatBedrock: add usage metadata #85

Merged
merged 6 commits into main from cc/usage_metadata on Jun 25, 2024
Conversation

@ccurme (Contributor) commented Jun 25, 2024

langchain-core 0.2.2 released a standard field to store usage metadata returned from chat model responses, such as input/output token counts. AIMessage objects have a .usage_metadata attribute which can hold a UsageMetadata dict; for now it holds only token counts. Standardizing this information makes it simpler to track in monitoring/observability platforms and similar applications.
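For illustration, a minimal sketch of the standardized field on an AIMessage (assuming langchain-core >= 0.2.2; the token counts here are made up):

```python
from langchain_core.messages import AIMessage

# The usage_metadata field takes a UsageMetadata dict with standardized keys.
msg = AIMessage(
    content="Hello!",
    usage_metadata={
        "input_tokens": 8,
        "output_tokens": 2,
        "total_tokens": 10,
    },
)
print(msg.usage_metadata["total_tokens"])  # 10
```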

Here we unpack usage metadata returned by the Bedrock API onto AIMessages generated by chat models.
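As a rough illustration of the unpacking (a sketch, not the actual implementation in this PR; the list-wrapped counts mirror the raw Bedrock chunks shown below):

```python
from langchain_core.messages.ai import UsageMetadata

def usage_from_bedrock(usage: dict) -> UsageMetadata:
    # Hypothetical helper: the raw Bedrock usage entries below report counts
    # as one-element lists (e.g. {'input_tokens': [8]}), so sum them defensively.
    input_tokens = sum(usage.get("input_tokens") or [0])
    output_tokens = sum(usage.get("output_tokens") or [0])
    return UsageMetadata(
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        total_tokens=input_tokens + output_tokens,
    )
```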

There are at least two options for implementing this in a streaming context:

  1. (Implemented here) Bedrock currently streams a final chunk containing usage data under "amazon-bedrock-invocationMetrics", which we ignore. These data appear standardized, at least for Anthropic and Mistral (I also checked Cohere and Llama3, but streaming for chat models does not currently work for either). We can emit an additional chunk containing these data; a sketch of consuming that chunk follows this list. The advantage is that we may not have to implement any provider-specific processing. The disadvantage is that the final chunk currently contains a "stop_reason", and if users assume that chunk is the last one, the extra chunk could break workflows.

Before:

```
content='' response_metadata={'usage': {'input_tokens': [8], 'output_tokens': [1]}}
content='Hello' response_metadata={'stop_reason': None}
content='!' response_metadata={'stop_reason': None}
content='' response_metadata={'stop_reason': 'max_tokens', 'usage': {'output_tokens': [2]}}
```

After:

```
content='' response_metadata={'usage': {'input_tokens': [8], 'output_tokens': [1]}}
content='Hello' response_metadata={'stop_reason': None}
content='!' response_metadata={'stop_reason': None}
content='' response_metadata={'stop_reason': 'max_tokens', 'usage': {'output_tokens': [2]}}
content='' usage_metadata={'input_tokens': 8, 'output_tokens': 2, 'total_tokens': 10}
```
  2. (Implemented in commit history) Implement provider-specific processing, specifically for Anthropic. This is what I did first; commit 2b9e400 switches to option 1, and we can revert that commit if we prefer this option.
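For reference, a sketch of how a consumer might pick up the extra usage chunk under option 1 (the model id is a placeholder; any Bedrock chat model that streams usage metrics would do):

```python
from langchain_aws import ChatBedrock

llm = ChatBedrock(model_id="anthropic.claude-3-sonnet-20240229-v1:0")  # placeholder model id

usage = None
for chunk in llm.stream("Hello"):
    # Under option 1, the stream ends with an extra chunk carrying usage_metadata.
    if chunk.usage_metadata is not None:
        usage = chunk.usage_metadata

print(usage)  # e.g. {'input_tokens': 8, 'output_tokens': 2, 'total_tokens': 10}
```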

@ccurme ccurme requested review from 3coins and baskaryan June 25, 2024 15:00
@ccurme ccurme merged commit df76647 into main Jun 25, 2024
12 checks passed
@ccurme ccurme deleted the cc/usage_metadata branch June 25, 2024 16:32