Is there any KV cache movement when using PP/TP? #13653

Answered by comaniac
leizhenyuan asked this question in Q&A

Your understanding is correct. The API you found is used for disaggregated prefill. In that setup, we transfer the KV cache from a prefill worker to a decode worker.
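To make the handoff concrete, here is a toy sketch of the prefill-to-decode transfer. It is not vLLM code: `KVCache`, `PrefillWorker`, `DecodeWorker`, and the in-process `Queue` standing in for the transfer channel are all hypothetical names chosen for illustration, and the "keys" and "values" are dummy numbers rather than real attention tensors.

```python
from dataclasses import dataclass
from queue import Queue


@dataclass
class KVCache:
    """Toy stand-in for one request's KV cache (hypothetical, not a vLLM type)."""
    request_id: str
    keys: list[list[float]]    # one entry per token processed so far
    values: list[list[float]]


def fake_kv(token: int) -> tuple[list[float], list[float]]:
    """Dummy key/value vectors for a token; real ones come from the model."""
    return [float(token), float(token) * 2.0], [float(token) + 1.0, float(token) + 2.0]


class PrefillWorker:
    """Processes the whole prompt once, then ships the cache away."""

    def __init__(self, channel: Queue):
        self.channel = channel

    def prefill(self, request_id: str, prompt_tokens: list[int]) -> None:
        pairs = [fake_kv(t) for t in prompt_tokens]
        cache = KVCache(request_id, [k for k, _ in pairs], [v for _, v in pairs])
        # The transfer step: the cache leaves the prefill worker here.
        self.channel.put(cache)


class DecodeWorker:
    """Receives a prefilled cache and extends it one token at a time."""

    def __init__(self, channel: Queue):
        self.channel = channel
        self.caches: dict[str, KVCache] = {}

    def receive(self) -> str:
        cache = self.channel.get()
        self.caches[cache.request_id] = cache
        return cache.request_id

    def decode_step(self, request_id: str, new_token: int) -> int:
        cache = self.caches[request_id]
        key, value = fake_kv(new_token)
        cache.keys.append(key)
        cache.values.append(value)
        return len(cache.keys)  # cached sequence length after this step


# Usage: one request goes through prefill, transfer, and a single decode step.
channel: Queue = Queue()
prefill_worker = PrefillWorker(channel)
decode_worker = DecodeWorker(channel)

prefill_worker.prefill("req-0", [11, 12, 13])
received_id = decode_worker.receive()
cached_len = decode_worker.decode_step(received_id, 14)
```

The point of the sketch is only that the decode worker never reprocesses the prompt: it starts from the cache it received and appends to it. In a real deployment the channel would be a network or device-to-device transfer rather than an in-process queue.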

Answer selected by leizhenyuan