
whisper : fix KV cache allocation #2443

Open · wants to merge 1 commit into master

Conversation

ggerganov (Owner)

Alternative to #2433.

Dynamically resize the KV cache based on the number of decoders needed.
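
A minimal sketch of the idea (not the actual whisper.cpp patch; `kv_cache_t` and `kv_cache_resize` are hypothetical names): size the self-attention K/V buffers for exactly the number of decoders in use, instead of a fixed maximum.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

struct kv_cache_t {
    std::vector<uint16_t> k; // f16 keys,   one slot per (layer, ctx, state) element
    std::vector<uint16_t> v; // f16 values, same layout as k
    int n_decoders = 0;      // number of decoders the cache is currently sized for
};

// Grow (or shrink) the cache to hold exactly n_decoders beams.
// Total elements per tensor: n_layer * n_ctx * n_state * n_decoders.
bool kv_cache_resize(kv_cache_t & cache,
                     int n_layer, int n_ctx, int n_state, int n_decoders) {
    const size_t n = (size_t) n_layer * n_ctx * n_state * n_decoders;
    try {
        cache.k.resize(n);
        cache.v.resize(n);
    } catch (const std::bad_alloc &) {
        return false; // analogous to whisper_kv_cache_init() failing on OOM
    }
    cache.n_decoders = n_decoders;
    return true;
}
```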

WilliamTambellini (Contributor) commented Oct 2, 2024

OK, thanks very much @ggerganov.
I'm testing this change, and at the moment I see a huge increase in memory usage, so high that the library easily OOMs:

...
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 560.00 MiB on device 0: cudaMalloc failed: out of memory
whisper_kv_cache_init: failed to allocate memory for the kv cache
whisper_full_with_state: whisper_kv_cache_init() failed for self-attention cache
bin/main: failed to process audio
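
For reference, a back-of-the-envelope estimate, assuming an f16 cache and the large-v3 text-decoder dimensions (n_text_layer = 32, n_text_state = 1280, n_text_ctx = 448): each decoder needs 2 (K and V) × 32 × 448 × 1280 × 2 bytes ≈ 70 MiB of self-attention cache, so 8 decoders would come to 560 MiB, matching the failed allocation above.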

I now have to decrease the beam size to 4 in order to be able to decode anything (using the v3 model) without OOM on a 6 GB GPU.
I guess:

  • my change in PR 2433 only increases the size of the text KV cache (the decoder's self-attention cache?)
  • your current change increases the size of all caches

Is that right?
best
W
