Improve Azure Speech TTFB by reusing connection at each voice synthesis #1287

yousri-sellami · 2024-12-23T21:36:24Z

The Azure Speech SDK uses a websocket to communicate with the service. When the connection has to be newly established, the time to first byte (TTFB) also includes the extra time needed to open that connection.
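A minimal sketch of the "connect once, reuse afterwards" pattern, assuming a hypothetical `connect` factory that stands in for whatever creates the websocket-backed client. With the real Python SDK that factory would presumably build a `speechsdk.SpeechSynthesizer`, and Azure's latency guidance describes pre-connecting it with `speechsdk.Connection.from_speech_synthesizer(synthesizer)` followed by `connection.open(True)`; treat that wiring as an assumption to check against the SDK docs. `ReusedConnection`, `warm_up` and `reset` are illustrative names, not SDK APIs.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ReusedConnection(Generic[T]):
    """Hypothetical holder: create a connection-backed client once, then reuse it.

    `connect` is an assumed factory (e.g. one that builds the synthesizer).
    Nothing in this class is part of the Azure Speech SDK.
    """

    def __init__(self, connect: Callable[[], T]) -> None:
        self._connect = connect
        self._client: Optional[T] = None
        self._lock = threading.Lock()
        self.connect_count = 0  # how many times the setup cost was paid

    def warm_up(self) -> None:
        # Pay the connection cost before the first request arrives.
        self.get()

    def get(self) -> T:
        # Double-checked locking so concurrent callers share one client.
        if self._client is None:
            with self._lock:
                if self._client is None:
                    self._client = self._connect()
                    self.connect_count += 1
        return self._client

    def reset(self) -> None:
        # Drop the cached client, e.g. after the service closed the socket.
        with self._lock:
            self._client = None
```

Calling `warm_up()` at service startup moves the handshake off the first turn; every later `get()` returns the same object, so `connect_count` stays at 1 until `reset()` is called.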

In my experiments with the Python SDK, reusing the SpeechSynthesizer, as Azure recommends, cut TTFB by roughly a factor of 5.
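One way to compare a fresh connection against a reused one is to time how long an audio stream takes to yield its first non-empty chunk. The sketch below assumes the audio arrives as an iterable of `bytes`; `measure_ttfb` and `fake_stream` are hypothetical helpers, and the `time.sleep` in `fake_stream` only simulates a connection handshake, it is not a measurement of the real service.

```python
import time
from typing import Iterable, Iterator, List, Optional, Tuple


def measure_ttfb(chunks: Iterable[bytes]) -> Tuple[Optional[float], List[bytes]]:
    """Return (seconds until the first non-empty chunk, all chunks).

    Timing starts when iteration starts, so any connection setup done lazily
    inside the iterator is included. Returns None if no audio was produced.
    """
    start = time.perf_counter()
    ttfb: Optional[float] = None
    collected: List[bytes] = []
    for chunk in chunks:
        if ttfb is None and chunk:
            ttfb = time.perf_counter() - start
        collected.append(chunk)
    return ttfb, collected


def fake_stream(setup_delay: float) -> Iterator[bytes]:
    # Hypothetical stand-in for a synthesis stream; the sleep mimics the
    # extra handshake a newly established connection would pay.
    time.sleep(setup_delay)
    yield b"\x00" * 320
    yield b"\x00" * 320
```

Running `measure_ttfb(fake_stream(0.05))` and `measure_ttfb(fake_stream(0.0))` shows the setup delay landing entirely in the first figure, which is the part connection reuse removes.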

I tried making the Azure speech synthesizer an attribute of the TTS service so that it is instantiated only once, and updating its audio config dynamically for each stream. Unfortunately, this doesn't reduce latency. As David suggested, it looks like we would need to keep a single stream to make sure the connection is reused.
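A sketch of the "keep a single stream" idea, under stated assumptions: one long-lived synthesizer with a single audio callback registered once, and audio routed to whichever request is currently active instead of rebuilding the audio config per stream. With the real SDK, `speak` would presumably wrap something like `synthesizer.speak_text_async(text).get()` on a synthesizer created with `audio_config=None`, and `on_chunk` would be connected to its `synthesizing` event (reading `evt.result.audio_data`); that mapping is an assumption. `SingleStreamRouter` and `FakeSynthesizer` are hypothetical.

```python
import threading
from typing import Callable, List, Optional


class SingleStreamRouter:
    """Hypothetical router: one shared synthesizer, per-request audio handlers."""

    def __init__(self, speak: Callable[[str], None]) -> None:
        self._speak = speak  # assumed blocking call on the shared synthesizer
        self._lock = threading.Lock()  # one synthesis at a time, no interleaving
        self._handler: Optional[Callable[[bytes], None]] = None

    def on_chunk(self, chunk: bytes) -> None:
        # Registered once on the shared synthesizer; forwards to the active request.
        handler = self._handler
        if handler is not None:
            handler(chunk)

    def synthesize(self, text: str, on_audio: Callable[[bytes], None]) -> None:
        with self._lock:
            self._handler = on_audio
            try:
                self._speak(text)
            finally:
                self._handler = None


class FakeSynthesizer:
    """Stand-in for the shared synthesizer (not the Azure API)."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[bytes], None]] = []

    def connect(self, cb: Callable[[bytes], None]) -> None:
        self._callbacks.append(cb)

    def speak(self, text: str) -> None:
        # Emit one "audio chunk" per word to every registered callback.
        for word in text.split():
            for cb in self._callbacks:
                cb(word.encode())
```

Usage: create `synth = FakeSynthesizer()` and `router = SingleStreamRouter(synth.speak)`, call `synth.connect(router.on_chunk)` once, then run each turn through `router.synthesize(text, handler)`. The synthesizer, and therefore its connection, is never recreated between turns; only the per-request handler changes.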


yousri-sellami added the bug Something isn't working label Dec 23, 2024

yousri-sellami changed the title ~~Improve Azure Speech TTFB by avoiding reconnecting to Speech SDK at each turn~~ Improve Azure Speech TTFB by avoiding reconnecting to Speech SDK at each voice synthesis Dec 23, 2024

yousri-sellami changed the title ~~Improve Azure Speech TTFB by avoiding reconnecting to Speech SDK at each voice synthesis~~ Improve Azure Speech TTFB by reusing connection at each voice synthesis Dec 24, 2024

