Improve Azure Speech TTFB by reusing connection at each voice synthesis #1287

Open
yousri-sellami opened this issue Dec 23, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@yousri-sellami

The Azure Speech SDK communicates with the service over a WebSocket. When the connection is newly established, the latency includes the extra time needed to set it up.

In my experiments with the Python SDK, TTFB (time to first byte) dropped by a factor of about 5 when reusing the SpeechSynthesizer, as Azure recommends.
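For illustration, here is a minimal sketch of the reuse pattern with the plain Python SDK, not a fix for this project. `ReusableSynthesizer` and `make_azure_synthesizer` are hypothetical names, and the key and region are placeholders. The only SDK calls assumed are `SpeechConfig`, `SpeechSynthesizer` and `speak_text_async`.

```python
from typing import Any, Callable, Optional


def make_azure_synthesizer(key: str, region: str) -> Any:
    """Hypothetical factory: builds one SpeechSynthesizer (needs the Azure SDK)."""
    # Imported lazily so the rest of the sketch runs without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.SpeechConfig(subscription=key, region=region)
    # audio_config=None keeps the audio in memory instead of playing it.
    return speechsdk.SpeechSynthesizer(speech_config=config, audio_config=None)


class ReusableSynthesizer:
    """Creates the synthesizer once and reuses it for every request."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._synth: Optional[Any] = None
        self.created = 0  # how many synthesizers were built (for inspection)

    def get(self) -> Any:
        # Build lazily on first use; later calls return the same instance.
        if self._synth is None:
            self._synth = self._factory()
            self.created += 1
        return self._synth

    def speak(self, text: str) -> Any:
        # Blocks until synthesis finishes and returns the SDK result object.
        return self.get().speak_text_async(text).get()
```

Usage would look like `tts = ReusableSynthesizer(lambda: make_azure_synthesizer("<key>", "<region>"))` followed by repeated `tts.speak("...")` calls. Whether the underlying connection is actually reused still depends on how the SDK and the surrounding stream handling behave.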

I tried moving the Azure speech synthesizer into a TTS attribute so that it is instantiated once and its audio config is updated dynamically for each stream. Unfortunately, this doesn't reduce latency. As David suggested, it looks like we would need to keep a single stream open to make sure the connection is reused.

@yousri-sellami added the "bug" (Something isn't working) label Dec 23, 2024
@yousri-sellami changed the title from "Improve Azure Speech TTFB by avoiding reconnecting to Speech SDK at each turn" to "Improve Azure Speech TTFB by avoiding reconnecting to Speech SDK at each voice synthesis" Dec 23, 2024
@yousri-sellami changed the title from "Improve Azure Speech TTFB by avoiding reconnecting to Speech SDK at each voice synthesis" to "Improve Azure Speech TTFB by reusing connection at each voice synthesis" Dec 24, 2024