Transcribing almost 2000 audio files on a laptop iGPU #2806
Replies: 3 comments
-
Have to do some more testing because, with mostly default settings, the larger model "ggml-large-v3-turbo.bin" outputs nonsense (repeated lines) when it encounters "noise" (a medium-sized crowd all talking at the same time). Interestingly, the model "ggml-tiny.en.bin" just outputs "[INAUDIBLE]", which is better. So far I've adjusted the command line options to include (I wish there was more documentation on what these do):
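For reference, this is the general shape of an invocation with the decoding options commonly suggested for taming repetition on noisy audio. This is a sketch, not a tuned recipe: the flag names are assumed from whisper-cli's help output (check `whisper-cli --help` on your build, as names have changed between versions), and the threshold values here are placeholders.

```shell
#!/bin/sh
# Sketch only: flag names/values assumed from whisper-cli --help;
# verify against your own build before relying on them.
MODEL="models/ggml-large-v3-turbo.bin"   # hypothetical model path
INPUT="recording.wav"                    # hypothetical input file

# -et  entropy threshold: higher values make the decoder discard
#      low-information (often repetitive) segments and retry
# -mc  max context: 0 stops previous output being fed back into the
#      decoder, which is one way repetition loops sustain themselves
# -bo  best-of: number of sampling candidates to keep
CMD="whisper-cli -m $MODEL -f $INPUT -et 2.8 -mc 0 -bo 5 -otxt"
echo "$CMD"
```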
-
So, for my use cases the following adjustments look very promising:
Change the model to
I can't stress how much better
Command line options added:
Script has been updated.
-
Some more screenshots of
-
This post is mainly a thank you to you @ggerganov for making this project. I'm using a Lenovo Yoga 6 (AMD Ryzen 7 7730U, 2 GB VRAM / 16 GB total RAM) to transcribe almost 2,000 audio files (the best machine available to me atm), and thanks to your project supporting Vulkan I can transcribe the files almost twice as fast versus using only the CPU (openai-whisper). The CPU on this machine is very respectable, but the GPU is significantly faster for LLM/inference work. Not to mention that since the CPU is barely being used, I can still use the machine to do other things, including running LM Studio (in CPU-only mode, even though it supports Vulkan perfectly well too) with 15k context to test different 7B/8B models to see which provides the best summaries of the transcriptions :)
The script I'm using for the transcription is here:
https://github.com/toazd/LLM_playground/blob/main/transcribe_audio_whisper.cpp.sh
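The core pattern in a script like that can be sketched as a simple loop over the audio files. This is a minimal sketch under my own assumptions (whisper-cli on PATH, a hypothetical audio/ directory and model path, one .txt transcript written next to each input), not a copy of the linked script:

```shell
#!/bin/sh
# Minimal batch-transcription sketch. Assumptions: whisper-cli is on
# PATH, inputs are .wav/.mp3 under ./audio, and -otxt/-of behave as
# described in whisper-cli --help on your build.
MODEL="models/ggml-large-v3-turbo.bin"   # hypothetical model path

transcribe_one() {
    in="$1"
    base="${in%.*}"                      # strip extension for output name
    # -otxt writes a plain-text transcript; -of sets the output path
    # (without extension), so this produces "$base".txt
    whisper-cli -m "$MODEL" -f "$in" -otxt -of "$base"
}

for f in audio/*.wav audio/*.mp3; do
    [ -e "$f" ] || continue              # skip unmatched glob patterns
    transcribe_one "$f"
done
```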
I'm still testing LLMs for transcription summaries but so far Deepseek-r1-8B and Qwen2.5-7B-instruct (both Q4) have done very well.
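Once LM Studio's local server is running, the summarization step can be driven from the same shell workflow through its OpenAI-compatible endpoint (localhost:1234 is LM Studio's default port). A sketch, with caveats: the model name is a placeholder that must match whatever is loaded in LM Studio, and a real transcript needs proper JSON escaping (e.g. via jq) before being embedded in the payload.

```shell
#!/bin/sh
# Sketch: summarizing a transcript via LM Studio's OpenAI-compatible
# local server. Model name is a placeholder; real transcripts need
# proper JSON escaping before being spliced into the payload.
TRANSCRIPT="(transcript text goes here)"

PAYLOAD=$(printf '{"model":"%s","messages":[{"role":"user","content":"Summarize this transcript: %s"}]}' \
    "qwen2.5-7b-instruct" "$TRANSCRIPT")

# Uncomment once the LM Studio server is started (default port 1234):
# curl -s http://localhost:1234/v1/chat/completions \
#   -H 'Content-Type: application/json' -d "$PAYLOAD"
echo "$PAYLOAD"
```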