Websocket silero VAD works for (opus, pcm8, pcm16) ($500) #518
Comments
Hello, I'll gladly take this issue. My plan is:
Integrate VAD: I will incorporate the 'silero-vad' library, which is well suited to the Friend device, for better voice activity detection.
Adjust Audio Buffer Handling: I'll refine the handling of audio data, managing the buffer size and ensuring that it correctly handles different audio formats, such as Opus and PCM.
Sample Rate and Codec Handling: I'll adjust the VAD parameters and buffer calculations based on the specified sample rate and codec.
Looking forward to your reply, fren :)
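The buffer-handling step in the plan above could look roughly like the sketch below. It is not the project's code: `FrameBuffer`, its parameters, and the 30 ms default window are hypothetical, and it assumes the incoming bytes are already 16-bit PCM (Opus packets would need decoding first). It accumulates websocket chunks of arbitrary size and hands back fixed-size windows that a VAD can score one at a time.

```python
from typing import List


class FrameBuffer:
    """Hypothetical helper: re-chunk arbitrary byte chunks into fixed windows."""

    def __init__(self, sample_rate: int, window_ms: int = 30, sample_width: int = 2):
        # Bytes per window = samples per window * bytes per sample.
        self.frame_bytes = sample_rate * window_ms // 1000 * sample_width
        self._buf = bytearray()

    def push(self, data: bytes) -> List[bytes]:
        """Append incoming bytes and return every complete window now available."""
        self._buf.extend(data)
        frames = []
        while len(self._buf) >= self.frame_bytes:
            frames.append(bytes(self._buf[: self.frame_bytes]))
            del self._buf[: self.frame_bytes]
        return frames
```

Leftover bytes stay in the buffer until the next `push`, so no audio is dropped at chunk boundaries.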
Awesome! Assigning to @0xzre for the next 2 days. Some context of what is in place already:
Hi @0xzre can you submit a Draft PR and show progress?
Fixed for pcm8 and pcm16. Opus is still pending
$500 🤑 should i...
Ended up implementing a shitty VAD. Still does the 80/20. Will keep this in the backlog, but if silero becomes a viable solution, will merge that solution and take it to prod. The baseline is the current implementation: a replacement has to be at least as good at discarding audio, and no worse at delaying the transcript.
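For illustration only, a very simple baseline of the kind described above might be an energy gate with a hangover. This is an assumption about what a crude VAD could look like, not the implementation that was merged; `EnergyGate`, the threshold, and the hangover length are hypothetical, and frames are assumed to be 16-bit little-endian PCM.

```python
import math
import struct


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian PCM frame."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


class EnergyGate:
    """Hypothetical gate: send loud frames, plus a few frames after speech ends."""

    def __init__(self, threshold: float = 500.0, hangover_frames: int = 10):
        self.threshold = threshold
        self.hangover_frames = hangover_frames
        self._remaining = 0

    def should_send(self, frame: bytes) -> bool:
        if rms(frame) >= self.threshold:
            # Speech-like energy: send, and re-arm the hangover counter.
            self._remaining = self.hangover_frames
            return True
        if self._remaining > 0:
            # Quiet frame shortly after speech: still send to avoid clipping words.
            self._remaining -= 1
            return True
        return False
```

The gate decides frame by frame, so it adds no buffering delay to the transcript. A longer hangover clips fewer word endings but discards less silence, which is the trade-off any replacement would be measured against.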
Is your feature request related to a problem? Please describe.
VAD needs to better determine when to send bytes and when not to.
File
transcribe.py, /listen endpoint.
Describe the solution you'd like
Opus: 16 kHz, 16-bit.
pcm8: 8 kHz, for old firmware versions.
pcm16: for recording from the device.
This also needs to work with multiple languages.
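One way to keep the three formats above straight is a small lookup table that the VAD code consults for window sizes. The sketch below is hypothetical: `AudioFormat`, `FORMATS`, and `window_bytes` are illustrative names, and the 16 kHz rate for pcm16 and the 16-bit sample width for both PCM variants are assumptions, since the request only states 8 kHz for pcm8 and 16 kHz, 16-bit for Opus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFormat:
    sample_rate: int    # Hz
    sample_width: int   # bytes per decoded PCM sample
    needs_decode: bool  # True if packets must be decoded to PCM before VAD


# Hypothetical table; values marked "assumed" are not stated in the request.
FORMATS = {
    "pcm8": AudioFormat(sample_rate=8000, sample_width=2, needs_decode=False),    # width assumed
    "pcm16": AudioFormat(sample_rate=16000, sample_width=2, needs_decode=False),  # rate and width assumed
    "opus": AudioFormat(sample_rate=16000, sample_width=2, needs_decode=True),
}


def window_bytes(codec: str, window_ms: int) -> int:
    """Size in bytes of a decoded PCM window of `window_ms` for the given codec."""
    fmt = FORMATS[codec]
    return fmt.sample_rate * window_ms // 1000 * fmt.sample_width
```

For example, `window_bytes("pcm8", 30)` gives 480 bytes and `window_bytes("opus", 30)` gives 960 bytes of decoded PCM. Opus is compressed, so its packets have to be decoded before any sample-level VAD can run on them.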