
Add support for context caching #3636

Open

ekzhu opened this issue Oct 3, 2024 · 1 comment

Comments

@ekzhu
Collaborator

ekzhu commented Oct 3, 2024

What feature would you like to be added?

Several APIs support context/prompt/prefix caching, notably Gemini and Claude, as well as most local LLM servers. OpenAI also started supporting this recently. Caching lets calls that condition on the same prefix (e.g., a shared chat history) skip recomputing it, which makes them faster. However, I am not sure whether the caching infrastructure can be made independent of the API provider, or whether we need to support caching separately for each provider (see the sketch below for how one provider exposes it).

Why is this needed?

Speed up inference by avoiding recomputation of shared prompt prefixes.

@ekzhu
Collaborator Author

ekzhu commented Oct 3, 2024

@husseinmozannar Moved your issue here.

@rysweet rysweet added this to the future milestone Oct 22, 2024
@jackgerrits jackgerrits modified the milestones: future, 0.4 Oct 22, 2024
@fniedtner fniedtner modified the milestones: 0.4, future Oct 22, 2024
@fniedtner fniedtner removed the feature label Oct 24, 2024
@husseinmozannar husseinmozannar removed their assignment Dec 7, 2024