What feature would you like to be added?
Certain APIs support context/prompt/prefix caching, notably Gemini and Claude, as well as most local LLM servers; OpenAI also added support recently. This makes calls that condition on the same prefix (e.g. a shared chat history) faster, since the provider caches the already-processed context. However, I am not sure whether the caching infrastructure can be kept independent of the API provider, or whether caching has to be supported separately for each provider. A sketch of what provider-side caching looks like for one provider is shown below.
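For reference, providers expose this differently: Anthropic uses an explicit opt-in marker on the cacheable prefix, Gemini uses a separate cached-content resource, and OpenAI (as far as I understand) caches long prefixes automatically with no request-level opt-in. A minimal sketch of the Anthropic style, assuming the Python SDK; the model name and prompt are placeholders, so check the provider docs for current model IDs and caching limits:

```python
# Sketch of provider-side prompt caching with the Anthropic SDK.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

shared_prefix = "..."  # long system prompt / chat history reused across calls

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # placeholder model ID
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": shared_prefix,
            # Marks this block as cacheable; later calls that send the
            # identical prefix can reuse the provider-side cache entry.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Answer using the shared context."}],
)
```

Because the opt-in mechanism (or lack of one) differs per provider, a single provider-agnostic caching layer may not be straightforward, which is the open question above.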
Why is this needed?
It speeds up inference for calls that reuse the same context.