
Hallucinations and repeats of previous transcriptions when running without reloading model #2445

Closed
nchudleigh opened this issue Oct 2, 2024 · 6 comments


@nchudleigh
Contributor

nchudleigh commented Oct 2, 2024

I'm running into an issue where subsequent runs of the model bleed over results from a previous recording when the model stays loaded in memory.

I've checked all the inputs to the full transcribe call and there is no difference between the two calls, but it seems that something internal to whisper.cpp is not being reset.

Is there anything I need to call within whisper.cpp to reset the model's state?

Example results:

1st transcription (good result):
Hello my name is Mark and this is a boat it floats on water and it's very slow.

2nd transcription (completely different audio that does not contain any of this text):
Hello my name is Mark. Hello my name is Mark. Hello my name is Mark. Hello my name is Mark.

@ggerganov
Owner

Can you confirm that the following patch fixes the issue:

diff --git a/src/whisper.cpp b/src/whisper.cpp
index 9c7c66b..c8ee0f8 100644
--- a/src/whisper.cpp
+++ b/src/whisper.cpp
@@ -1033,6 +1033,8 @@ static void whisper_kv_cache_clear(struct whisper_kv_cache & cache) {
         cache.cells[i].seq_id.clear();
     }
     cache.head = 0;
+
+    ggml_backend_buffer_clear(cache.buffer, 0);
 }
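A minimal, self-contained sketch of what the patch changes. The types and function names below are hypothetical stand-ins, not whisper.cpp's actual structures: the point is that resetting only the cell bookkeeping leaves the backing K/V buffer holding values from the previous run, while zeroing it (as `ggml_backend_buffer_clear(cache.buffer, 0)` does) guarantees a clean slate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for whisper.cpp's KV cache (hypothetical names). */
struct toy_kv_cell {
    int pos;
};

struct toy_kv_cache {
    struct toy_kv_cell *cells;
    float              *buffer;    /* backing data for the K/V tensors */
    size_t              n_cells;
    size_t              buf_elems;
    size_t              head;
};

/* Before the patch: only the bookkeeping is reset; the buffer
 * still contains whatever the previous transcription wrote. */
static void toy_kv_cache_clear_meta(struct toy_kv_cache *cache) {
    for (size_t i = 0; i < cache->n_cells; ++i) {
        cache->cells[i].pos = -1;
    }
    cache->head = 0;
}

/* After the patch: the backing buffer is zeroed as well,
 * analogous to ggml_backend_buffer_clear(cache.buffer, 0). */
static void toy_kv_cache_clear(struct toy_kv_cache *cache) {
    toy_kv_cache_clear_meta(cache);
    memset(cache->buffer, 0, cache->buf_elems * sizeof(float));
}
```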
 
 static void whisper_kv_cache_seq_rm(

@nchudleigh
Contributor Author

Initial testing looks good, sending out to early release group as well.

@ggerganov
Owner

I went ahead and pushed the patch to master. It's a bit strange that clearing the cache makes a difference at all, since the KQ mask should already mask away the unused data from previous runs, so this makes me think there might be some other issue at hand. Let me know if you continue to experience this problem.

@nchudleigh
Contributor Author

@ggerganov It appears to fix the leaks so far, but I will have more feedback from users in the next couple days.

I am also testing the new v3 turbo model on this release candidate, which seems to hallucinate (repetition) a bit. Are you interested in feedback on it? I can spin up a new issue if so.

@ggerganov
Owner

I've mostly accepted that v3 models are busted, so I don't expect much from v3-turbo. Feedback is always appreciated though.

@nchudleigh
Contributor Author

nchudleigh commented Oct 7, 2024

@ggerganov I feel I might as well document it, on the off chance a solution can be found: the performance is otherwise incredible.

(Un)fortunately the hallucination is not consistent. It mostly manifests as repetition.

bygreencn added a commit to bygreencn/whisper.cpp that referenced this issue Oct 12, 2024
# By Georgi Gerganov (18) and others
# Via Georgi Gerganov
* tag 'v1.7.1': (43 commits)
  release : v1.7.1
  vulkan : retry allocation with fallback flags (ggerganov#2451)
  release : v1.7.0
  scripts : bench v3-turbo
  whisper : remove mel leftover constants (396089f)
  whisper : zero-out the KV cache upon clear (ggerganov#2445)
  objc : fix build
  metal : zero-init buffer contexts (#0)
  whisper : revert mel-related changes (#0)
  whisper : adapt to latest ggml (skip) (#0)
  ggml : fix typo in example usage ggml_gallocr_new (ggml/984)
  ggml : fixes after sync (ggml/983)
  ggml-backend : add device and backend reg interfaces (llama/9707)
  Fixed dequant precision issues in Q4_1 and Q5_1 (llama/9711)
  ggml-backend : add device and backend reg interfaces (llama/9707)
  Initial cmake support of SYCL for AMD GPUs (llama/9658)
  vulkan : do not use tensor->extra (llama/9407)
  ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980)
  ggml: refactor cross entropy loss CPU impl. (ggml/976)
  scripts : sync ggml-backend.cpp
  ...

# Conflicts:
#	bindings/javascript/package.json
lyapple2008 pushed a commit to lyapple2008/whisper.cpp.mars that referenced this issue Nov 2, 2024