Clarifying how to calculate the KV cache usage in GiB #13803
Unanswered
googlercolin asked this question in Q&A
I would like to calculate the KV cache usage in GiB, but I am not sure whether
num_total_gpu = self.cache_config.num_gpu_blocks
in the code refers to the number of blocks allocated for the KV cache (6.38 GiB, as per my logging output below) or to the total number of GPU blocks (39.50 GiB).
Also, when
scheduler.block_manager.get_num_free_gpu_blocks()
is called, are the free blocks taken only from those allocated for the KV cache (part of the 6.38 GiB), or do they also include the 0.10 fraction (3.95 GiB) outside gpu_memory_utilization?
Logs:
Thank you!
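For reference, here is a minimal sketch of the calculation I have in mind. It assumes that each block stores keys and values for block_size tokens across every layer, so one block takes 2 * num_layers * num_kv_heads * head_size * block_size * dtype_bytes bytes. The helper names and the model dimensions are hypothetical and only for illustration, not taken from the vLLM code. The two block counts would come from the calls quoted above, if they do count only KV cache blocks.

```python
GIB = 1024 ** 3


def kv_block_bytes(num_layers, num_kv_heads, head_size, block_size, dtype_bytes):
    """Bytes needed by one KV cache block (hypothetical helper).

    Assumes a block holds both the key and the value tensor (factor 2)
    for `block_size` tokens in every layer.
    """
    return 2 * num_layers * num_kv_heads * head_size * block_size * dtype_bytes


def kv_cache_usage_gib(num_total_blocks, num_free_blocks, block_bytes):
    """Return (used_gib, total_gib) given block counts and the block size in bytes."""
    used_blocks = num_total_blocks - num_free_blocks
    return used_blocks * block_bytes / GIB, num_total_blocks * block_bytes / GIB


# Illustrative numbers only (assumed, not from my logs):
# 32 layers, 8 KV heads, head size 128, 16 tokens per block, fp16 (2 bytes).
block_bytes = kv_block_bytes(
    num_layers=32, num_kv_heads=8, head_size=128, block_size=16, dtype_bytes=2
)

# In practice these would be cache_config.num_gpu_blocks and
# scheduler.block_manager.get_num_free_gpu_blocks().
used_gib, total_gib = kv_cache_usage_gib(
    num_total_blocks=1024, num_free_blocks=768, block_bytes=block_bytes
)
print(f"block size: {block_bytes} bytes")
print(f"KV cache used: {used_gib:.2f} GiB of {total_gib:.2f} GiB")
```

If this is right, then multiplying num_gpu_blocks by the per-block size should reproduce the 6.38 GiB figure rather than 39.50 GiB, which would also answer the first question. With tensor parallelism the per-GPU number of KV heads would presumably be smaller, so the sketch only covers the single-GPU case.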