
Unexpected tokens used in initial rate_limits.updated #42

Open
samreid opened this issue Oct 7, 2024 · 5 comments

@samreid

samreid commented Oct 7, 2024

On startup, I consistently see nearly 5000 tokens used on "connect". I commented out both addTool calls and set the instructions to 'test', yet I still see output like this on "connect":

00:01.33
server
rate_limits.updated
{
  "type": "rate_limits.updated",
  "event_id": "event_AFWylkC7LwdlyxIrYCHCt",
  "rate_limits": [
    {
      "name": "requests",
      "limit": 5000,
      "remaining": 4999,
      "reset_seconds": 0.012
    },
    {
      "name": "tokens",
      "limit": 20000,
      "remaining": 15482,
      "reset_seconds": 13.554
    }
  ]
}

Note that the remaining value is 15482 of 20000. Is this expected?

Testing with 971323d on a MacBook Air M1 in Chrome Version 129.0.6668.90 (Official Build) (arm64).

Thanks!

UPDATE: I'm testing with the in-browser implementation, not the relay server.
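To track this budget programmatically, the remaining token count can be pulled out of each `rate_limits.updated` payload. The sketch below is illustrative only (the helper is not part of any official SDK); the event shape follows the JSON shown above.

```typescript
// Shape of a single rate-limit entry, per the payload in this issue.
interface RateLimit {
  name: string;
  limit: number;
  remaining: number;
  reset_seconds: number;
}

interface RateLimitsUpdatedEvent {
  type: "rate_limits.updated";
  event_id: string;
  rate_limits: RateLimit[];
}

// Hypothetical helper: find the "tokens" entry and report what's left.
function tokensRemaining(event: RateLimitsUpdatedEvent): number | undefined {
  return event.rate_limits.find((rl) => rl.name === "tokens")?.remaining;
}

// The payload from the report above: 15482 of 20000 tokens remain on connect.
const event: RateLimitsUpdatedEvent = {
  type: "rate_limits.updated",
  event_id: "event_AFWylkC7LwdlyxIrYCHCt",
  rate_limits: [
    { name: "requests", limit: 5000, remaining: 4999, reset_seconds: 0.012 },
    { name: "tokens", limit: 20000, remaining: 15482, reset_seconds: 13.554 },
  ],
};

console.log(tokensRemaining(event)); // 15482
```

In a browser client this helper would be called from the WebSocket `message` handler after parsing the server event JSON and checking `type === "rate_limits.updated"`.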

@khorwood-openai
Contributor

Hey there, can you provide your session ID for any sessions that you encounter this error with? Should be in the session.created event.

@samreid
Author

samreid commented Oct 7, 2024

Yes, I just tested it again and saw similar behavior. Here is the beginning of that session.created event, including the session ID:

{
  "type": "session.created",
  "event_id": "event_AFohAvtiudEvVIO2crDBu",
  "session": {
    "id": "sess_AFoh9g0xJK5jqcwxMWOHz",
    "object": "realtime.session",
    "model": "gpt-4o-realtime-preview-2024-10-01",
    "expires_at": 1728334055,

The rate limits came out like:

    {
      "name": "tokens",
      "limit": 20000,
      "remaining": 14989,
      "reset_seconds": 15.033
    }

This run does have some addTool calls and a paragraph for the conversation instructions.

@dnakov

dnakov commented Oct 7, 2024

It's the max_response_tokens; it seems like it "reserves" those tokens up front.

@bakks
Member

bakks commented Oct 8, 2024

@dnakov is correct - this is like a "reservation" rather than an immediate consumption. It's ~5000 because we're reserving 4096, the max model output size. I'm going to change this behavior to be more forgiving -- it should give you more headroom on the rate limits. Expect an improvement tomorrow.
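The arithmetic behind this explanation can be sketched with the numbers from the first report in this issue. This assumes the reservation model described above (a 4096-token max-output reservation debited on connect, on top of tokens actually consumed by session setup); the variable names are illustrative.

```typescript
// Numbers from the rate_limits.updated payload in the original report.
const tokenLimit = 20000;
const remainingOnConnect = 15482;

// Per the explanation above: the server reserves the max model output size.
const maxResponseReservation = 4096;

// Total debited on connect = limit minus what the server says remains.
const totalDebited = tokenLimit - remainingOnConnect; // 4518

// Tokens plausibly consumed by session setup once the reservation
// is subtracted out (assumption: reservation + real usage = debit).
const actuallyConsumed = totalDebited - maxResponseReservation; // 422

console.log(totalDebited, actuallyConsumed);
```

Under this reading, the ~5000-token hit on connect is mostly the reservation, with only a few hundred tokens of real consumption.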

@kyleboddy

Did this ship, @bakks? I've noticed this double counting / reserving costs far more than OpenAI's initial estimates on their model pages.
