Allow users to set output format for TTS in livekit-plugins-openai #1341

Open · zhanghx0905 opened this issue Jan 7, 2025 · 0 comments · Labels: question (Further information is requested)
Hello, I am using edgetts to simulate the OpenAI TTS service for a local, free voice agent. However, edgetts only supports returning audio in MP3 format, while livekit-plugins-openai 0.10.9 and later hard-codes the requested response format to PCM.

Proposed Solution

Add a configuration option or parameter to livekit.plugins.openai.tts.TTS to allow users to specify the desired output format (e.g., output_format='mp3' or output_format='pcm').
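For illustration, the option could look roughly like this on the constructor (output_format below is the proposed parameter and does not exist in the current plugin; the other arguments are the existing ones):

from livekit.plugins import openai

tts = openai.TTS(
    model="tts-1",
    voice="alloy",
    output_format="mp3",  # proposed parameter; today the plugin always requests "pcm"
)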

Temporary Patch

Because of this mismatch, I currently have to monkey-patch livekit.plugins.openai.tts.ChunkedStream at runtime so it requests and decodes MP3 instead, which is not ideal:

import httpx
from livekit.agents import (
    APIConnectionError,
    APIStatusError,
    APITimeoutError,
    tts,
    utils,
)

import openai
from livekit.plugins.openai.tts import (
    OPENAI_TTS_SAMPLE_RATE,
    OPENAI_TTS_CHANNELS,
    ChunkedStream,
)


async def _run(self: ChunkedStream):
    print(f"TTS debug: {self.input_text}")
    # Request MP3 from the (OpenAI-compatible) speech endpoint instead of the
    # plugin's hard-coded PCM format.
    oai_stream = self._client.audio.speech.with_streaming_response.create(
        input=self.input_text,
        model=self._opts.model,
        voice=self._opts.voice,  # type: ignore
        response_format="mp3",
        speed=self._opts.speed,
        timeout=httpx.Timeout(30, connect=self._conn_options.timeout),
    )
    # Decode the MP3 stream back into raw PCM frames for the agents pipeline.
    decoder = utils.codecs.Mp3StreamDecoder()
    request_id = utils.shortuuid()
    audio_bstream = utils.audio.AudioByteStream(
        sample_rate=OPENAI_TTS_SAMPLE_RATE,
        num_channels=OPENAI_TTS_CHANNELS,
    )

    try:
        async with oai_stream as stream:
            async for data in stream.iter_bytes():
                # Decode each MP3 chunk, then re-chunk the resulting PCM into
                # fixed-size audio frames before emitting them to the event channel.
                for mp3_frame in decoder.decode_chunk(data):
                    for frame in audio_bstream.write(mp3_frame.data.tobytes()):
                        self._event_ch.send_nowait(
                            tts.SynthesizedAudio(
                                frame=frame,
                                request_id=request_id,
                            )
                        )

            # Flush any remaining buffered samples once the stream ends.
            for frame in audio_bstream.flush():
                self._event_ch.send_nowait(
                    tts.SynthesizedAudio(
                        frame=frame,
                        request_id=request_id,
                    )
                )

    # Map OpenAI SDK errors to the agents framework's error types.
    except openai.APITimeoutError:
        raise APITimeoutError()
    except openai.APIStatusError as e:
        raise APIStatusError(
            e.message,
            status_code=e.status_code,
            request_id=e.request_id,
            body=e.body,
        )
    except Exception as e:
        raise APIConnectionError() from e


# Monkey-patch the plugin so every synthesis request goes through the MP3 path above.
ChunkedStream._run = _run
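With the patch applied at import time, the rest of the agent is unchanged. A rough usage sketch (the mp3_patch module name is hypothetical; it is just wherever the _run override above lives):

import mp3_patch  # applies ChunkedStream._run = _run as shown above

from livekit.plugins import openai

tts = openai.TTS(model="tts-1", voice="alloy")  # now streams MP3 and decodes it to PCM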