The Azure Speech SDK uses a websocket to communicate with the service. If the connection is newly established, the network latency includes extra time to establish the connection.
In my experiments with the Python SDK, TTFB dropped by a factor of about 5 when reusing the `SpeechSynthesizer`, as Azure recommends.
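For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how TTFB could be measured. It is not the script used for the numbers above: `time_to_first_chunk` and `start_stream` are hypothetical names, and the helper only assumes that the synthesis call can be wrapped in something that yields audio chunks as `bytes`.

```python
import time
from typing import Callable, Iterable, Tuple


def time_to_first_chunk(start_stream: Callable[[], Iterable[bytes]]) -> Tuple[float, bytes]:
    """Return (seconds until the first non-empty chunk, that chunk).

    `start_stream` is a hypothetical callable that starts one synthesis
    request and returns an iterable of audio chunks.
    """
    started = time.perf_counter()
    for chunk in start_stream():
        if chunk:  # skip empty chunks, they carry no audio
            return time.perf_counter() - started, chunk
    raise RuntimeError("stream ended before producing any audio")
```

Calling it once with a freshly created synthesizer and once with a reused one should show the difference, provided both runs use the same text and voice.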
I tried making the Azure speech synthesizer an attribute of `TTS` so that it is instantiated once and its audio config is updated dynamically for each stream. Unfortunately, this doesn't reduce latency. As David suggested, it looks like we would need to keep a single stream open to make sure the connection is reused.
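To make the "instantiate once" idea concrete, here is a rough sketch of the pattern. It is an illustration only, not the code from this repository: `SharedSynthesizer` and `make_azure_synthesizer` are hypothetical names, and as noted above, holding one synthesizer may not be enough on its own if a new stream is created for every synthesis.

```python
from typing import Any, Callable, Optional


class SharedSynthesizer:
    """Create the synthesizer on first use and return the same instance afterwards."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._instance: Optional[Any] = None

    def get(self) -> Any:
        # Build lazily so the cost of construction is paid at most once.
        if self._instance is None:
            self._instance = self._factory()
        return self._instance


def make_azure_synthesizer(key: str, region: str) -> Any:
    # Hypothetical factory. The SDK is imported lazily so the wrapper above
    # can be used and tested without azure-cognitiveservices-speech installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    # audio_config=None keeps the audio in memory instead of playing it back.
    return speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
```

A caller would build it once, for example `shared = SharedSynthesizer(lambda: make_azure_synthesizer(key, region))`, and then call `shared.get().speak_text_async(text).get()` for each synthesis instead of constructing a new `SpeechSynthesizer` every time.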
yousri-sellami changed the title from "Improve Azure Speech TTFB by avoiding reconnecting to Speech SDK at each turn" to "Improve Azure Speech TTFB by avoiding reconnecting to Speech SDK at each voice synthesis" on Dec 23, 2024
yousri-sellami changed the title from "Improve Azure Speech TTFB by avoiding reconnecting to Speech SDK at each voice synthesis" to "Improve Azure Speech TTFB by reusing connection at each voice synthesis" on Dec 24, 2024