
Unexpected tokens used in initial rate_limits.updated #42

Open
samreid opened this issue Oct 7, 2024 · 5 comments

@samreid

samreid commented Oct 7, 2024

On startup, I consistently see nearly 5000 tokens used on "connect". I commented out both addTool calls and set the instructions to 'test', yet I still see output like this on "connect":

00:01.33
server
rate_limits.updated
{
  "type": "rate_limits.updated",
  "event_id": "event_AFWylkC7LwdlyxIrYCHCt",
  "rate_limits": [
    {
      "name": "requests",
      "limit": 5000,
      "remaining": 4999,
      "reset_seconds": 0.012
    },
    {
      "name": "tokens",
      "limit": 20000,
      "remaining": 15482,
      "reset_seconds": 13.554
    }
  ]
}

Note that the remaining value is 15482 of 20000. Is this expected?

Testing with 971323d on a MacBook Air M1 in Chrome Version 129.0.6668.90 (Official Build) (arm64).

Thanks!

UPDATE: I'm testing with the in-browser implementation, not the relay server.
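To track this budget programmatically, the remaining token count can be pulled out of each `rate_limits.updated` payload. The sketch below is illustrative only (the helper is not part of any official SDK); the event shape follows the JSON shown above.

```typescript
// Shape of a single rate-limit entry, per the payload in this issue.
interface RateLimit {
  name: string;
  limit: number;
  remaining: number;
  reset_seconds: number;
}

interface RateLimitsUpdatedEvent {
  type: "rate_limits.updated";
  event_id: string;
  rate_limits: RateLimit[];
}

// Hypothetical helper: find the "tokens" entry and report what's left.
function tokensRemaining(event: RateLimitsUpdatedEvent): number | undefined {
  return event.rate_limits.find((rl) => rl.name === "tokens")?.remaining;
}

// The payload from the report above: 15482 of 20000 tokens remain on connect.
const event: RateLimitsUpdatedEvent = {
  type: "rate_limits.updated",
  event_id: "event_AFWylkC7LwdlyxIrYCHCt",
  rate_limits: [
    { name: "requests", limit: 5000, remaining: 4999, reset_seconds: 0.012 },
    { name: "tokens", limit: 20000, remaining: 15482, reset_seconds: 13.554 },
  ],
};

console.log(tokensRemaining(event)); // 15482
```

In a browser client this helper would be called from the WebSocket `message` handler after parsing the server event JSON and checking `type === "rate_limits.updated"`.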

@khorwood-openai
Contributor

Hey there, can you provide your session ID for any sessions that you encounter this error with? Should be in the session.created event.

@samreid
Author

samreid commented Oct 7, 2024

Yes, I just tested it again and saw similar behavior. Here is the beginning of that session.created event, including the session ID:

{
  "type": "session.created",
  "event_id": "event_AFohAvtiudEvVIO2crDBu",
  "session": {
    "id": "sess_AFoh9g0xJK5jqcwxMWOHz",
    "object": "realtime.session",
    "model": "gpt-4o-realtime-preview-2024-10-01",
    "expires_at": 1728334055,

The rate limits came out like:

    {
      "name": "tokens",
      "limit": 20000,
      "remaining": 14989,
      "reset_seconds": 15.033
    }

This run does have some addTool calls and a paragraph for the conversation instructions.

@dnakov

dnakov commented Oct 7, 2024

It's the max_response_tokens; it seems like it "reserves" those tokens up front.

@bakks
Member

bakks commented Oct 8, 2024

@dnakov is correct - this is like a "reservation" rather than an immediate consumption. It's ~5000 because we're reserving 4096, the max model output size. I'm going to change this behavior to be more forgiving -- it should give you more headroom on the rate limits. Expect an improvement tomorrow.
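The arithmetic behind this explanation can be sketched with the numbers from the first report in this issue. This assumes the reservation model described above (a 4096-token max-output reservation debited on connect, on top of tokens actually consumed by session setup); the variable names are illustrative.

```typescript
// Numbers from the rate_limits.updated payload in the original report.
const tokenLimit = 20000;
const remainingOnConnect = 15482;

// Per the explanation above: the server reserves the max model output size.
const maxResponseReservation = 4096;

// Total debited on connect = limit minus what the server says remains.
const totalDebited = tokenLimit - remainingOnConnect; // 4518

// Tokens plausibly consumed by session setup once the reservation
// is subtracted out (assumption: reservation + real usage = debit).
const actuallyConsumed = totalDebited - maxResponseReservation; // 422

console.log(totalDebited, actuallyConsumed);
```

Under this reading, the ~5000-token hit on connect is mostly the reservation, with only a few hundred tokens of real consumption.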

@kyleboddy

Did this ship, @bakks? I've noticed this double counting / reserving costs far more than OpenAI's initial estimates on their model pages.
