
whisper : fix KV cache allocation #2443

Open · wants to merge 1 commit into master

Conversation

ggerganov (Owner)

Alternative to #2433.

Dynamically resize the KV cache based on the number of decoders needed.
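
A minimal sketch of the idea (not the actual whisper.cpp patch; `kv_cache_t` and `kv_cache_resize` are hypothetical names): size the self-attention K/V buffers for exactly the number of decoders in use, instead of a fixed maximum.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

struct kv_cache_t {
    std::vector<uint16_t> k; // f16 keys,   one slot per (layer, ctx, state) element
    std::vector<uint16_t> v; // f16 values, same layout as k
    int n_decoders = 0;      // number of decoders the cache is currently sized for
};

// Grow (or shrink) the cache to hold exactly n_decoders beams.
// Total elements per tensor: n_layer * n_ctx * n_state * n_decoders.
bool kv_cache_resize(kv_cache_t & cache,
                     int n_layer, int n_ctx, int n_state, int n_decoders) {
    const size_t n = (size_t) n_layer * n_ctx * n_state * n_decoders;
    try {
        cache.k.resize(n);
        cache.v.resize(n);
    } catch (const std::bad_alloc &) {
        return false; // analogous to whisper_kv_cache_init() failing on OOM
    }
    cache.n_decoders = n_decoders;
    return true;
}
```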

WilliamTambellini (Contributor) commented Oct 2, 2024

OK, thanks very much @ggerganov.
I'm testing this change, and at the moment I see a huge increase in memory usage, so high that the library easily OOMs:

...
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 560.00 MiB on device 0: cudaMalloc failed: out of memory
whisper_kv_cache_init: failed to allocate memory for the kv cache
whisper_full_with_state: whisper_kv_cache_init() failed for self-attention cache
bin/main: failed to process audio
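
For reference, a back-of-the-envelope estimate, assuming an f16 cache and the large-v3 text-decoder dimensions (n_text_layer = 32, n_text_state = 1280, n_text_ctx = 448): each decoder needs 2 (K and V) × 32 × 448 × 1280 × 2 bytes ≈ 70 MiB of self-attention cache, so 8 decoders would come to 560 MiB, matching the failed allocation above.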

I now have to decrease the beam size to 4 in order to be able to decode anything (using the v3 model) without OOM on a 6 GB GPU.
I guess:

  • my change in PR 2433 only increases the size of the text KV cache (the decoder's self-attention cache?)
  • your current change increases the size of all caches

Is that right?
best
W
