Is there any KV cache movement when using PP/TP? #13653
I am not familiar with distributed inference. Below is my understanding: But I also found need_recv_kvcache, which is checked when the worker is a driver worker, so I think my understanding above must be wrong. Can anyone answer this question? Thanks.
Replies: 1 comment 1 reply
Your understanding is correct. The API you found is used for disaggregated prefill. In that setup, the KV cache is transferred from a prefill worker to a decode worker.
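To make the distinction concrete, here is a toy sketch. It is not vLLM code: `ToyWorker`, `transfer_kv`, and the string-based cache entries are hypothetical names invented for illustration. It only assumes that, without disaggregation, the worker that prefills a request is also the one that decodes it, so its KV cache stays where it was built.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorker:
    """Hypothetical worker owning a private KV cache (request id -> entries)."""
    name: str
    kv_cache: dict = field(default_factory=dict)

    def prefill(self, request_id, prompt_tokens):
        # Build the KV entries for the whole prompt on this worker.
        self.kv_cache[request_id] = [f"kv({t})" for t in prompt_tokens]

    def decode_step(self, request_id, new_token):
        # Decoding appends to the cache this worker already holds.
        self.kv_cache[request_id].append(f"kv({new_token})")
        return len(self.kv_cache[request_id])


def transfer_kv(src, dst, request_id):
    """Hypothetical stand-in for the prefill -> decode KV cache transfer."""
    dst.kv_cache[request_id] = list(src.kv_cache.pop(request_id))


# Case 1: same worker prefills and decodes, so the cache never moves.
local_worker = ToyWorker("rank_0")
local_worker.prefill("req-1", ["a", "b", "c"])
same_worker_len = local_worker.decode_step("req-1", "d")

# Case 2: disaggregated prefill, so the cache must be handed over first.
prefill_worker = ToyWorker("prefill")
decode_worker = ToyWorker("decode")
prefill_worker.prefill("req-2", ["a", "b", "c"])
transfer_kv(prefill_worker, decode_worker, "req-2")
disagg_len = decode_worker.decode_step("req-2", "d")
```

In the second case the decode worker could not run `decode_step` without the transfer, because it never saw the prompt. That handover is the only place in this sketch where a KV cache changes owner.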