Consider a SendAudio overload that allows streaming through something other than Stream #242

Open

lucasmeijer opened this issue Oct 3, 2024 · 3 comments
@lucasmeijer

Confirm this is a feature request for the .NET library and not the underlying OpenAI API

  • This is a feature request for the .NET library

Describe the feature or improvement you are requesting

I'm experimenting with the new C# bindings for the Realtime voice API. So far so good, except that sending audio isn't great. My audio, understandably, comes from a microphone, so I get a handful of samples at a time. Right now the only way to get those to the C# binding is SendAudioAsync(Stream), but people in this common scenario don't have a stream. The API forces them to put the samples into a MemoryStream, which is inefficient because it just grows and grows.

The other overload, SendAudioAsync(BinaryData), looked promising, but it only supports sending a complete recording, which is a very rare scenario for a realtime API.

Additional context

No response

@trrwilson
Collaborator

Hello, @lucasmeijer, and thanks for diving into the Realtime API!

That BinaryData-based overload you came across is the intended way to accomplish the event-driven audio you're describing: it maps directly to the WebSocket protocol's underlying input_audio_buffer.append command and can be called repeatedly with bite-sized chunks of input; it doesn't need to be a whole recording all at once!
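
As a rough sketch of what that could look like (the session type and callback names here are illustrative assumptions, not a prescribed pattern), each chunk of captured PCM16 audio can be wrapped in a BinaryData and sent as it arrives:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using OpenAI.RealtimeConversation; // namespace/type names as of the current preview; may change

// Forward each captured chunk as its own append; no accumulating MemoryStream needed.
// `session` is your connected realtime session; `pcm16Chunk`/`byteCount` come from
// whatever audio capture callback you already have.
static async Task OnSamplesCapturedAsync(
    RealtimeConversationSession session,
    byte[] pcm16Chunk,
    int byteCount,
    CancellationToken cancellationToken = default)
{
    BinaryData chunk = BinaryData.FromBytes(pcm16Chunk.AsMemory(0, byteCount));
    await session.SendAudioAsync(chunk, cancellationToken); // one input_audio_buffer.append per call
}
```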

With the microphone input you're using, what kind of samples are you working with, and what would make things easier to integrate? Input audio integration is one of the areas where we'd like to facilitate as much as we can -- although sending those BinaryData blocks individually should work, it's not as pleasant or idiomatic as it ideally would be.

@lucasmeijer
Author

Hey @trrwilson

I misinterpreted the BinaryData overload because it's broken:

In

public async Task SendAudioAsync(BinaryData audio, CancellationToken cancellationToken = default)

you're forgetting to reset _sendingAudio to false like you do in the Stream version.
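
For context on the shape of that fix, here's a paraphrased sketch (not the library's verbatim source) of the guard-flag pattern: the Stream overload clears the flag when it finishes, while the BinaryData overload currently sets it and never clears it, so every subsequent send is blocked.

```csharp
// Paraphrased sketch of the expected shape, not the library's actual source.
public async Task SendAudioAsync(BinaryData audio, CancellationToken cancellationToken = default)
{
    // ... handle any send already in progress, as the real code does ...
    _sendingAudio = true;
    try
    {
        // ... wrap `audio` in an input_audio_buffer.append command and send it over the socket ...
    }
    finally
    {
        _sendingAudio = false; // the reset the BinaryData overload currently skips
    }
}
```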

To your other question:
I'm talking to my microphone directly from C#. There's no great support for this in C#, so I'm P/Invoking into libbass. It's a bit annoying, but not the end of the world. I suspect "add great media support to .NET" is a bit out of scope for the OpenAI library :). Most people probably get their audio from somewhere else, so this should be fine.

I really appreciate that you've kept SendCommandAsync public, so I can work around this bug for now by sending my own JSON payload.
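
In case it helps anyone else hitting this, the workaround looks roughly like the following; it assumes SendCommandAsync accepts the raw JSON as BinaryData, so check the exact signature in the preview you're on:

```csharp
using System;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
using OpenAI.RealtimeConversation; // namespace/type names as of the current preview

// Hand-roll the input_audio_buffer.append event and push it through the public
// SendCommandAsync, bypassing the broken BinaryData overload.
static async Task AppendAudioChunkAsync(
    RealtimeConversationSession session,
    byte[] pcm16Chunk,
    CancellationToken cancellationToken = default)
{
    string payload = JsonSerializer.Serialize(new
    {
        type = "input_audio_buffer.append",          // wire-level event name from the Realtime API
        audio = Convert.ToBase64String(pcm16Chunk),  // raw PCM16 bytes, base64-encoded per the protocol
    });

    await session.SendCommandAsync(BinaryData.FromString(payload), cancellationToken);
}
```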

@trrwilson
Collaborator

Ah, thanks @lucasmeijer -- you're absolutely right, that overload will block forever after the first send. I've corrected the behavior (and added tests) in a development branch and we'll get that fixed in the next preview release.
