HTTP2 connection not being reset when error received #1401

davidmytton · 2024-08-24T20:00:05Z

A user is receiving various HTTP2 errors e.g. {"error":{"message":"[internal] Stream closed with error code NGHTTP2_ENHANCE_YOUR_CALM"}} and {"error":{"message":"[internal] Stream closed with error code NGHTTP2_INTERNAL_ERROR"}} for every call to decide, which is then being logged via report.

I'm unsure about the cause of this, but it may be a timeout or error on a previous request. This is referenced in various places, but all suggest fixes have already gone out:

The only way to resolve this is to restart the Node process, which isn't ideal. When we get these errors, can we re-establish the connection?

The text was updated successfully, but these errors were encountered:

blaine-arcjet · 2024-08-26T13:05:45Z

This is a message the load balancer sends back to the client. It can be handled in the client via killing the entire session and reopening from scratch, but we don't have control over that with ConnectRPC. This is something we can revisit when we write our own session manager, but in the meantime @davidmytton can look into the load balancer configuration.

davidmytton · 2024-08-26T13:18:33Z

The load balancer isn't under our control - it's from Fly.io. I've opened a support case with them.

davidmytton · 2024-08-30T14:39:15Z

Fly have now confirmed their proxy may return NGHTTP2_ENHANCE_YOUR_CALM in some circumstances. I'm still working with them to figure out why. I expect it's to do with the network jitter and tail latency we're seeing in yul and yuz whereby enough timeouts cause it to hit an error threshold.

davidmytton · 2024-08-31T21:09:43Z

We have had the high tail latency (network jitter) support case open with Fly.io for over a week. The tail latency causes internal slowdowns in our API which results in a timeout on the client, and I think goes back to the Fly proxy as a reset. Given enough timeouts / resets (1024 by default), this causes the Fly proxy to trigger ENHANCE_YOUR_CALM (here) which node reports as NGHTTP2_ENHANCE_YOUR_CALM.

This is what hyperium/h2 describes at:

https://github.com/hyperium/h2/blob/90359ba6d38843b106967a6ac9419a500ea26873/src/server.rs#L892

However, that error isn't being handled properly by the client - it continues to send requests on the same connection. What we see is the Arcjet SDK client connects to our API endpoint on Fly, Fly proxy proxies it our app, the API processes the request normally and returns the response, the client gets NGHTTP2_ENHANCE_YOUR_CALM. It never sees the response we send.

Some additional notes:

Fly report they're receiving a reset - could this be the connection being reset due to the timeout? See the stream tracking logging below (from here in hyperium/h2)
The full error that the Arcjet client sees is [internal] Stream closed with error code NGHTTP2_INTERNAL_ERROR
When it receives that error it should end the stream

https://github.com/connectrpc/connect-es/blob/dc7d75a1eaaac12a06b4dcffd0cfd370c5432648/packages/connect-node/src/node-universal-handler.spec.ts#L122

There may be a bug in connect-es because the NGHTTP2_ENHANCE_YOUR_CALM error should reset the connection. However, it looks like HTTP2 Requests hang upon receiving 'Stream closed with error code NGHTTP2_INTERNAL_ERROR' nodejs/undici#2675 may prevent that.
RST_STREAM Handling connectrpc/connect-es#1108 may also be relevant
The error is coming from Node internal:

https://github.com/nodejs/node/blob/2f0b3713efd1f8f06a38115393cee8531803ce43/lib/internal/errors.js#L1307

Which I think is then coming into connect-es here:

https://github.com/connectrpc/connect-es/blob/dc7d75a1eaaac12a06b4dcffd0cfd370c5432648/packages/connect-node/src/node-error.ts#L22

Stream tracking

Aug 31, 2024 @ 19:59:04.147141000
received Reset { stream_id: StreamId(1306619), error_code: ENHANCE_YOUR_CALM }

Aug 31, 2024 @ 19:59:04.146676000
send Data { stream_id: StreamId(1306619), flags: (0x1: END_STREAM) }

Aug 31, 2024 @ 19:59:04.146645000
send Headers { stream_id: StreamId(1306619), flags: (0x4: END_HEADERS) }

Aug 31, 2024 @ 19:59:04.142444000
received Data { stream_id: StreamId(1306619), flags: (0x1: END_STREAM) }

Aug 31, 2024 @ 19:59:04.141560000
received Data { stream_id: StreamId(1306619) }

Aug 31, 2024 @ 19:59:04.141500000
received Headers { stream_id: StreamId(1306619), flags: (0x4: END_HEADERS) }

davidmytton added bug Something isn't working node Related to the Node SDK. labels Aug 24, 2024

davidmytton assigned blaine-arcjet Aug 24, 2024

davidmytton changed the title ~~Connection not being reset when error received~~ HTTP2 connection not being reset when error received Aug 25, 2024

davidmytton assigned davidmytton and unassigned blaine-arcjet Sep 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HTTP2 connection not being reset when error received #1401

HTTP2 connection not being reset when error received #1401

davidmytton commented Aug 24, 2024

blaine-arcjet commented Aug 26, 2024

davidmytton commented Aug 26, 2024

davidmytton commented Aug 30, 2024

davidmytton commented Aug 31, 2024 •

edited

Loading

HTTP2 connection not being reset when error received #1401

HTTP2 connection not being reset when error received #1401

Comments

davidmytton commented Aug 24, 2024

blaine-arcjet commented Aug 26, 2024

davidmytton commented Aug 26, 2024

davidmytton commented Aug 30, 2024

davidmytton commented Aug 31, 2024 • edited Loading

davidmytton commented Aug 31, 2024 •

edited

Loading