Allow users to set output format for TTS in livekit-plugins-openai #1341

Open · zhanghx0905 opened this issue Jan 7, 2025 · 0 comments · Labels: question (Further information is requested)
Hello, I am using edgetts to simulate the OpenAI TTS service for a local, free voice agent. However, edgetts only supports returning audio in MP3 format, while livekit-plugins-openai 0.10.9 and later hard-codes the requested response format to PCM.

Proposed Solution

Add a configuration option or parameter to livekit.plugins.openai.tts.TTS to allow users to specify the desired output format (e.g., output_format='mp3' or output_format='pcm').
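For illustration, the option could look roughly like this on the constructor (output_format below is the proposed parameter and does not exist in the current plugin; the other arguments are the existing ones):

from livekit.plugins import openai

tts = openai.TTS(
    model="tts-1",
    voice="alloy",
    output_format="mp3",  # proposed parameter; today the plugin always requests "pcm"
)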

Temporary Patch

Because of this mismatch, I currently have to monkey-patch livekit.plugins.openai.tts.ChunkedStream at runtime so it requests and decodes MP3 instead, which is not ideal:

import httpx
from livekit.agents import (
    APIConnectionError,
    APIStatusError,
    APITimeoutError,
    tts,
    utils,
)

import openai
from livekit.plugins.openai.tts import (
    OPENAI_TTS_SAMPLE_RATE,
    OPENAI_TTS_CHANNELS,
    ChunkedStream,
)


async def _run(self: ChunkedStream):
    print(f"TTS debug: {self.input_text}")
    # Request MP3 from the (OpenAI-compatible) speech endpoint instead of the
    # plugin's hard-coded PCM format.
    oai_stream = self._client.audio.speech.with_streaming_response.create(
        input=self.input_text,
        model=self._opts.model,
        voice=self._opts.voice,  # type: ignore
        response_format="mp3",
        speed=self._opts.speed,
        timeout=httpx.Timeout(30, connect=self._conn_options.timeout),
    )
    # Decode the MP3 stream back into raw PCM frames for the agents pipeline.
    decoder = utils.codecs.Mp3StreamDecoder()
    request_id = utils.shortuuid()
    audio_bstream = utils.audio.AudioByteStream(
        sample_rate=OPENAI_TTS_SAMPLE_RATE,
        num_channels=OPENAI_TTS_CHANNELS,
    )

    try:
        async with oai_stream as stream:
            async for data in stream.iter_bytes():
                # Decode each MP3 chunk, then re-chunk the resulting PCM into
                # fixed-size audio frames before emitting them to the event channel.
                for mp3_frame in decoder.decode_chunk(data):
                    for frame in audio_bstream.write(mp3_frame.data.tobytes()):
                        self._event_ch.send_nowait(
                            tts.SynthesizedAudio(
                                frame=frame,
                                request_id=request_id,
                            )
                        )

            # Flush any remaining buffered samples once the stream ends.
            for frame in audio_bstream.flush():
                self._event_ch.send_nowait(
                    tts.SynthesizedAudio(
                        frame=frame,
                        request_id=request_id,
                    )
                )

    # Map OpenAI SDK errors to the agents framework's error types.
    except openai.APITimeoutError:
        raise APITimeoutError()
    except openai.APIStatusError as e:
        raise APIStatusError(
            e.message,
            status_code=e.status_code,
            request_id=e.request_id,
            body=e.body,
        )
    except Exception as e:
        raise APIConnectionError() from e


# Monkey-patch the plugin so every synthesis request goes through the MP3 path above.
ChunkedStream._run = _run
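With the patch applied at import time, the rest of the agent is unchanged. A rough usage sketch (the mp3_patch module name is hypothetical; it is just wherever the _run override above lives):

import mp3_patch  # applies ChunkedStream._run = _run as shown above

from livekit.plugins import openai

tts = openai.TTS(model="tts-1", voice="alloy")  # now streams MP3 and decodes it to PCM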