Hallucinations and repeats of previous transcriptions when running without reloading model #2445
Can you confirm that the following patch fixes the issue?

```diff
diff --git a/src/whisper.cpp b/src/whisper.cpp
index 9c7c66b..c8ee0f8 100644
--- a/src/whisper.cpp
+++ b/src/whisper.cpp
@@ -1033,6 +1033,8 @@ static void whisper_kv_cache_clear(struct whisper_kv_cache & cache) {
         cache.cells[i].seq_id.clear();
     }
     cache.head = 0;
+
+    ggml_backend_buffer_clear(cache.buffer, 0);
 }

 static void whisper_kv_cache_seq_rm(
```
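The essence of the patch can be illustrated with a self-contained toy sketch (the `toy_kv_cache` struct and function names below are invented stand-ins, not the real whisper.cpp types): resetting the per-cell metadata alone leaves stale key/value data sitting in the backend buffer, so the clear must also zero the buffer itself, which is what the added `ggml_backend_buffer_clear` call does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the whisper.cpp KV cache. Before the patch, the real
 * clear function reset the per-cell metadata and the head index, but the
 * backing tensor buffer kept its old contents, so stale keys/values from
 * a previous recording could influence the next decode. */
struct toy_kv_cache {
    int    head;     /* next write position into the cache           */
    size_t size;     /* number of float slots in the backing buffer  */
    float *buffer;   /* stand-in for the ggml backend buffer         */
};

/* Mirrors the patched whisper_kv_cache_clear: reset the metadata AND
 * zero the buffer (the analogue of ggml_backend_buffer_clear(buf, 0)). */
static void toy_kv_cache_clear(struct toy_kv_cache *cache) {
    cache->head = 0;
    memset(cache->buffer, 0, cache->size * sizeof(float)); /* the added step */
}
```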
Initial testing looks good; sending it out to the early release group as well.
I went ahead and pushed the patch to
@ggerganov It appears to fix the leaks so far, but I will have more feedback from users in the next couple of days. I am also testing the new v3 turbo model on this release candidate, which seems to hallucinate (repetition) a bit. Are you interested in feedback on it? I can spin up a new issue if so.
I've mostly accepted that the v3 models are busted, so I don't expect much from v3-turbo. Feedback is always appreciated though.
@ggerganov I feel I might as well document it, on the off chance a solution can be found: the performance is otherwise incredible. (Un)fortunately, the hallucination is not consistent. It mostly manifests as repetition.
```
# By Georgi Gerganov (18) and others
# Via Georgi Gerganov
* tag 'v1.7.1': (43 commits)
  release : v1.7.1
  vulkan : retry allocation with fallback flags (ggerganov#2451)
  release : v1.7.0
  scripts : bench v3-turbo
  whisper : remove mel leftover constants (396089f)
  whisper : zero-out the KV cache upon clear (ggerganov#2445)
  objc : fix build
  metal : zero-init buffer contexts (#0)
  whisper : revert mel-related changes (#0)
  whisper : adapt to latest ggml (skip) (#0)
  ggml : fix typo in example usage ggml_gallocr_new (ggml/984)
  ggml : fixes after sync (ggml/983)
  ggml-backend : add device and backend reg interfaces (llama/9707)
  Fixed dequant precision issues in Q4_1 and Q5_1 (llama/9711)
  ggml-backend : add device and backend reg interfaces (llama/9707)
  Initial cmake support of SYCL for AMD GPUs (llama/9658)
  vulkan : do not use tensor->extra (llama/9407)
  ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980)
  ggml: refactor cross entropy loss CPU impl. (ggml/976)
  scripts : sync ggml-backend.cpp
  ...

# Conflicts:
#   bindings/javascript/package.json
```
I'm running into an issue where subsequent runs of the model bleed over results from a previous recording when the model stays loaded in memory.
I've checked all the inputs to the full transcribe call and there is no difference between the two calls, but it seems that something internal to whisper.cpp is not being reset.
Is there anything I need to call within whisper.cpp to reset the state of the model?
Example results:
1st transcription (good result):
Hello my name is Mark and this is a boat it floats on water and it's very slow.
2nd transcription (completely different audio that does not contain any of the previously transcribed text):
Hello my name is Mark. Hello my name is Mark. Hello my name is Mark. Hello my name is Mark.
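For the question above about resetting state: whisper.cpp does expose per-run state objects (`whisper_init_state()` / `whisper_full_with_state()`), which let the expensive model stay loaded while each transcription gets its own freshly allocated scratch state, so nothing can bleed over between runs. A minimal toy sketch of that pattern follows; all of the `toy_*` names below are invented for illustration and are not the real whisper.cpp API.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy version of the "shared model, per-run state" pattern: the model is
 * read-only and reused across runs; all mutable scratch data lives in a
 * state object that is allocated fresh (zero-initialized) per recording. */

struct toy_model { int weights_loaded; };          /* shared, read-only  */
struct toy_state { float scratch[16]; int used; }; /* per-run, mutable   */

/* Allocate a fresh, zero-initialized state (analogue of whisper_init_state). */
static struct toy_state *toy_state_new(void) {
    return calloc(1, sizeof(struct toy_state));
}

/* Run one "transcription": mutates only the state, never the model
 * (analogue of whisper_full_with_state). Returns 0 on success. */
static int toy_run(const struct toy_model *m, struct toy_state *s, float input) {
    if (!m->weights_loaded) return -1;
    s->scratch[s->used % 16] = input;
    s->used++;
    return 0;
}
```

Allocating a new state per recording (and freeing the old one) sidesteps any leftover internal state, at the cost of re-allocating the KV cache buffers each time.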