
Hallucinations and repeats of previous transcriptions when running without reloading model #2445

Closed
nchudleigh opened this issue Oct 2, 2024 · 6 comments


@nchudleigh
Contributor

nchudleigh commented Oct 2, 2024

I'm running into an issue where subsequent runs of the model bleed over results from a previous recording when the model stays loaded in memory.

I've checked all the inputs to the full transcribe call and there is no difference between the two calls, but it seems that something internal to whisper.cpp is not being reset.

Is there anything I need to call within whisper.cpp to reset the model's state?

Example results:

1st transcription (good result):
Hello my name is Mark and this is a boat it floats on water and it's very slow.

2nd transcription (completely different audio that does not contain any of this text):
Hello my name is Mark. Hello my name is Mark. Hello my name is Mark. Hello my name is Mark.

@ggerganov
Owner

Can you confirm that the following patch fixes the issue:

diff --git a/src/whisper.cpp b/src/whisper.cpp
index 9c7c66b..c8ee0f8 100644
--- a/src/whisper.cpp
+++ b/src/whisper.cpp
@@ -1033,6 +1033,8 @@ static void whisper_kv_cache_clear(struct whisper_kv_cache & cache) {
         cache.cells[i].seq_id.clear();
     }
     cache.head = 0;
+
+    ggml_backend_buffer_clear(cache.buffer, 0);
 }
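A minimal, self-contained sketch of what the patch changes. The types and function names below are hypothetical stand-ins, not whisper.cpp's actual structures: the point is that resetting only the cell bookkeeping leaves the backing K/V buffer holding values from the previous run, while zeroing it (as `ggml_backend_buffer_clear(cache.buffer, 0)` does) guarantees a clean slate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for whisper.cpp's KV cache (hypothetical names). */
struct toy_kv_cell {
    int pos;
};

struct toy_kv_cache {
    struct toy_kv_cell *cells;
    float              *buffer;    /* backing data for the K/V tensors */
    size_t              n_cells;
    size_t              buf_elems;
    size_t              head;
};

/* Before the patch: only the bookkeeping is reset; the buffer
 * still contains whatever the previous transcription wrote. */
static void toy_kv_cache_clear_meta(struct toy_kv_cache *cache) {
    for (size_t i = 0; i < cache->n_cells; ++i) {
        cache->cells[i].pos = -1;
    }
    cache->head = 0;
}

/* After the patch: the backing buffer is zeroed as well,
 * analogous to ggml_backend_buffer_clear(cache.buffer, 0). */
static void toy_kv_cache_clear(struct toy_kv_cache *cache) {
    toy_kv_cache_clear_meta(cache);
    memset(cache->buffer, 0, cache->buf_elems * sizeof(float));
}
```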
 
 static void whisper_kv_cache_seq_rm(

@nchudleigh
Contributor Author

Initial testing looks good, sending out to early release group as well.

@ggerganov
Owner

I went ahead and pushed the patch to master. It's a bit strange that clearing the cache makes a difference at all, since the KQ mask should already mask away the unused data from previous runs, so this makes me think there might be some other issue at hand. Let me know if you continue to experience this problem.

@nchudleigh
Contributor Author

@ggerganov It appears to fix the leaks so far, but I will have more feedback from users in the next couple days.

I am also testing the new v3 turbo model on this release candidate, which seems to hallucinate (repetition) a bit. Are you interested in feedback on it? I can spin up a new issue if so.

@ggerganov
Owner

I've mostly accepted that v3 models are busted, so I don't expect much from v3-turbo. Feedback is always appreciated though.

@nchudleigh
Contributor Author

nchudleigh commented Oct 7, 2024

@ggerganov I feel I might as well document it, on the off chance a solution can be found: the performance is otherwise incredible.

(Un)fortunately the hallucination is not consistent. It mostly manifests as repetition.

bygreencn added a commit to bygreencn/whisper.cpp that referenced this issue Oct 12, 2024
# By Georgi Gerganov (18) and others
# Via Georgi Gerganov
* tag 'v1.7.1': (43 commits)
  release : v1.7.1
  vulkan : retry allocation with fallback flags (ggerganov#2451)
  release : v1.7.0
  scripts : bench v3-turbo
  whisper : remove mel leftover constants (396089f)
  whisper : zero-out the KV cache upon clear (ggerganov#2445)
  objc : fix build
  metal : zero-init buffer contexts (#0)
  whisper : revert mel-related changes (#0)
  whisper : adapt to latest ggml (skip) (#0)
  ggml : fix typo in example usage ggml_gallocr_new (ggml/984)
  ggml : fixes after sync (ggml/983)
  ggml-backend : add device and backend reg interfaces (llama/9707)
  Fixed dequant precision issues in Q4_1 and Q5_1 (llama/9711)
  ggml-backend : add device and backend reg interfaces (llama/9707)
  Initial cmake support of SYCL for AMD GPUs (llama/9658)
  vulkan : do not use tensor->extra (llama/9407)
  ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980)
  ggml: refactor cross entropy loss CPU impl. (ggml/976)
  scripts : sync ggml-backend.cpp
  ...

# Conflicts:
#	bindings/javascript/package.json
lyapple2008 pushed a commit to lyapple2008/whisper.cpp.mars that referenced this issue Nov 2, 2024