Websocket silero VAD works for (opus, pcm8, pcm16) ($500) #518

Open
josancamon19 opened this issue Aug 4, 2024 · 6 comments · Fixed by #565
Labels
backend Backend Task (python) Paid Bounty 💰

Comments

@josancamon19
Contributor

Is your feature request related to a problem? Please describe.
VAD needs to determine better when to send or not to send bytes.

File transcribe.py /listen endpoint.

while True:
    data = await websocket.receive_bytes()
    # print(len(data))
    # audio_buffer.extend(data)
    # print(len(audio_buffer), window_size_samples * 2) # * 2 because 16bit
    # TODO: vad not working properly.
    # - PCM still has to collect samples, and while it collects them, still sends them to the socket, so it's like nothing
    # - Opus always says there's no speech (but collection doesn't matter much, as it triggers like 1 per 0.2 seconds)

    # len(data) = 160, 8khz 16bit -> 2 bytes per sample, 80 samples, needs 256 samples, which is 256*2 bytes
    # if len(audio_buffer) >= window_size_samples * 2:
    #     # TODO: vad doesn't work index.html
    #     if is_speech_present(audio_buffer[:window_size_samples * 2], vad_iterator, window_size_samples):
    #         print('*')
    #         # pass
    #     else:
    #         print('-')
    #         audio_buffer = audio_buffer[window_size_samples * 2:]
    #         continue
    #
    #     audio_buffer = audio_buffer[window_size_samples * 2:]

    elapsed_seconds = time.time() - timer_start
    if elapsed_seconds > 20 or not socket2:
        socket1.send(data)
        # print('Sending to socket 1')
        if socket2:
            print('Killing transcript_socket2')
            socket2.finish()
            socket2 = None
    else:
        # print('Sending to socket 2')
        socket2.send(data)

Describe the solution you'd like
Opus at 16 kHz, 16-bit.
pcm8 for the old firmware version (8 kHz).
pcm16 for recording from the device.

This also needs to work with multiple languages.
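
One way to keep the per-codec differences in one place is a small lookup table. The sketch below is illustrative only: `CodecConfig`, `CODECS`, and `config_for` are hypothetical names. The 256-sample window at 8 kHz comes from the code comments in this issue; the 16 kHz rate for pcm16 and the 512-sample window at 16 kHz are assumptions. Opus is marked as needing decoding to PCM before any VAD can look at it; the decoding itself is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecConfig:
    sample_rate: int           # Hz
    window_size_samples: int   # samples per VAD window
    needs_decoding: bool       # True if frames must be decoded to PCM first
    bytes_per_sample: int = 2  # 16-bit samples

    @property
    def window_bytes(self) -> int:
        return self.window_size_samples * self.bytes_per_sample

    @property
    def window_ms(self) -> float:
        return 1000 * self.window_size_samples / self.sample_rate


# Window sizes for 16 kHz and the pcm16 sample rate are assumptions.
CODECS = {
    "opus": CodecConfig(16000, 512, needs_decoding=True),
    "pcm8": CodecConfig(8000, 256, needs_decoding=False),
    "pcm16": CodecConfig(16000, 512, needs_decoding=False),
}


def config_for(codec: str) -> CodecConfig:
    """Look up the settings for a codec name, rejecting unknown names."""
    try:
        return CODECS[codec]
    except KeyError:
        raise ValueError(f"unsupported codec: {codec!r}") from None
```

Under these assumptions every codec ends up with the same 32 ms window, which would keep the VAD latency identical across formats.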

@josancamon19 josancamon19 added Paid Bounty 💰 backend Backend Task (python) labels Aug 4, 2024
@josancamon19 josancamon19 changed the title from "Websocket silero VAD works for (opus, pcm8, pcm16) ($250)" to "Websocket silero VAD works for (opus, pcm8, pcm16) ($500)" Aug 6, 2024
@0xzre
Contributor

0xzre commented Aug 8, 2024

Hello, I'll gladly take this issue. My plan is:

Integrate VAD: I will incorporate the 'silero-vad' library, which is well suited to the Friend device, for better voice activity detection.

Adjust Audio Buffer Handling: I'll refine how audio data is handled, managing the buffer size and ensuring that it correctly handles different audio formats, such as Opus and PCM.

Sample Rate and Codec Handling: I'll adjust the VAD parameters and buffer calculations based on the specified sample rate and codec.
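
As a rough illustration of the buffer handling step in this plan: neural VAD models are usually fed normalized floating-point samples rather than raw bytes, so PCM16 data has to be converted first. The helper below is a hypothetical sketch (`pcm16_to_float` is not a name from the repository) and assumes little-endian signed 16-bit mono samples.

```python
import struct


def pcm16_to_float(data: bytes) -> list:
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    if len(data) % 2:
        raise ValueError("PCM16 data must contain a whole number of 2-byte samples")
    count = len(data) // 2
    samples = struct.unpack(f"<{count}h", data)
    # Dividing by 32768 maps the int16 range onto [-1.0, 1.0).
    return [s / 32768.0 for s in samples]
```

The resulting list can then be wrapped in whatever array or tensor type the chosen VAD expects.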

Looking forward to your reply, fren :)

@josancamon19
Contributor Author

Awesome! Assigning to @0xzre for the next 2 days.

Some context on what is already in place:
https://github.com/BasedHardware/Friend/blob/main/backend/utils/stt/vad.py
https://github.com/BasedHardware/Friend/blob/272b663b0d86832e56a0ccea3656b7f372e8361a/backend/routers/transcribe.py#L66

@josancamon19
Contributor Author

Hi @0xzre can you submit a Draft PR and show progress?

@mdmohsin7
Collaborator

Fixed for pcm8 and pcm16. Opus is still pending

@beastoin
Collaborator

$500 🤑 should i...

@josancamon19
Contributor Author

Ended up implementing a shitty** VAD
https://github.com/wiseman/py-webrtcvad/blob/master/example.py
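
For readers unfamiliar with frame-based VADs, the general shape is: split the audio into short fixed-length frames, classify each frame, and smooth the decisions over a sliding window so that single misclassified frames do not toggle the output. The sketch below is a rough illustration of that pattern, not the code that was merged. `energy_is_speech` is a hypothetical stand-in classifier (a real setup would call the VAD library instead), and the 0.9 ratio and 10-frame padding are assumed values.

```python
import struct
from collections import deque


def frame_generator(frame_ms, audio, sample_rate):
    """Split 16-bit mono PCM bytes into whole frames of frame_ms milliseconds."""
    frame_bytes = int(sample_rate * frame_ms / 1000) * 2
    for offset in range(0, len(audio) - frame_bytes + 1, frame_bytes):
        yield audio[offset:offset + frame_bytes]


def energy_is_speech(frame, threshold=500.0):
    """Hypothetical stand-in classifier: RMS energy above a fixed threshold."""
    count = len(frame) // 2
    samples = struct.unpack(f"<{count}h", frame)
    rms = (sum(s * s for s in samples) / count) ** 0.5
    return rms > threshold


def speech_frames(frames, is_speech, padding_frames=10, ratio=0.9):
    """Yield frames while 'triggered', smoothing decisions over a sliding window."""
    ring = deque(maxlen=padding_frames)
    triggered = False
    for frame in frames:
        ring.append((frame, is_speech(frame)))
        if not triggered:
            if sum(1 for _, v in ring if v) > ratio * ring.maxlen:
                triggered = True
                # Flush the buffered frames so the start of speech is kept.
                for buffered, _ in ring:
                    yield buffered
                ring.clear()
        else:
            yield frame
            if sum(1 for _, v in ring if not v) > ratio * ring.maxlen:
                triggered = False
                ring.clear()
```

The trade-off is visible in the parameters: a longer padding window discards silence more reliably but delays both the start and the end of each forwarded segment.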

Still does the 80/20.
Tried implementing the VAD on the front end:
https://github.com/BasedHardware/Friend/blob/6e2d9903b493681673a93aa39f392228ababb660/app/lib/providers/vad.dart#L13
It consumes 10% more battery on an iPhone 11 than just sending the bytes to the websocket.
It is still maintainable IMO, and if it lowers cellular data usage, that's great.

Will keep this in the backlog, but if silero becomes a viable solution, I'll merge it and take it to prod. The baseline is the current implementation: a replacement has to be at least as good at discarding audio, and no worse at delaying the transcript.

4 participants