What feature would you like to be added?
Certain APIs support context/prompt/prefix caching, notably Gemini and Claude, as well as most local LLM servers; OpenAI also added support recently. This makes calls that condition on the same prefix (e.g. a shared chat history) faster, since the provider caches the already-processed context. However, I am not sure whether the caching infrastructure can be kept independent of the API provider, or whether caching has to be supported separately for each provider. A sketch of what provider-side caching looks like for one provider is shown below.
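For reference, providers expose this differently: Anthropic uses an explicit opt-in marker on the cacheable prefix, Gemini uses a separate cached-content resource, and OpenAI (as far as I understand) caches long prefixes automatically with no request-level opt-in. A minimal sketch of the Anthropic style, assuming the Python SDK; the model name and prompt are placeholders, so check the provider docs for current model IDs and caching limits:

```python
# Sketch of provider-side prompt caching with the Anthropic SDK.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

shared_prefix = "..."  # long system prompt / chat history reused across calls

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # placeholder model ID
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": shared_prefix,
            # Marks this block as cacheable; later calls that send the
            # identical prefix can reuse the provider-side cache entry.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Answer using the shared context."}],
)
```

Because the opt-in mechanism (or lack of one) differs per provider, a single provider-agnostic caching layer may not be straightforward, which is the open question above.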
Why is this needed?
It speeds up inference for calls that reuse the same context.