Deadline exceeded errors are increasing during grpc client deployment #11860
Comments
Additional info: Errors are increasing on the new pods that are created during deployment.
When connections are first established, there are name resolution and connection establishment delays, which may include a TLS handshake, so the initial set of RPCs will face additional delays. Once connections have been established, they stay up, and further RPCs don't have to wait for connection establishment and will be faster. You can try increasing the timeout to a more practical value. You can also try using the Round Robin load balancer on the gRPC client instead of the default PickFirst load balancer, which always chooses the first available connection for an RPC. Also if you are using maxConcurrentCallsPerConnection on the
I thought about increasing the timeout to 3s, but this also affects deadlines while the app is running normally. It is a 50% increase, which means the circuit breaker will take longer to open than it does with 2s deadlines.
It is ultimately an RPC deadline, regardless of whether it was exceeded because the connection didn't establish in time or because the RPC took a long time after successfully connecting to the server.
I hope your question is answered. Please comment to reopen if required.
Versions on client side
io.grpc:grpc-netty-shaded: '1.68.1'
io.grpc:grpc-netty: '1.68.1'
io.grpc:grpc-stub: '1.68.1'
io.grpc:grpc-protobuf: '1.68.1'
Setting
timeout = 2000ms for deadline below
Environment
Running one app using JDK23 as grpc client on openshift
Running one app as grpc server on openshift
Approximate load
~100 calls per second
Problem
During client app deployment, deadline exceeded errors increase and then decrease, returning to normal.
Expected
No deadline exceeded errors; calls are sent and receive responses from the server.
Findings
client log time before call is sent at 12:50:45,895
client log after response at 12:50:49,103
client log “CallOptions deadline exceeded” at 12:50:49.011 -> supposed to be 12:50:47?
server log when call received at 12:50:48.844 -> ~+3sec
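To make the timeline in the findings easier to read, here is a small sketch that recomputes the offsets from the logged timestamps. It assumes the client and server clocks are synchronized and that all timestamps fall on the same day; the class and method names (`DeadlineTimeline`, `millisBetween`, `expectedDeadline`) are illustrative only and are not part of gRPC.

```java
import java.time.Duration;
import java.time.LocalTime;

public class DeadlineTimeline {

    // Accepts both "12:50:45,895" and "12:50:49.011" style log timestamps.
    public static LocalTime parse(String timestamp) {
        return LocalTime.parse(timestamp.replace(',', '.'));
    }

    // Milliseconds elapsed between two log timestamps on the same day.
    public static long millisBetween(String from, String to) {
        return Duration.between(parse(from), parse(to)).toMillis();
    }

    // The point in time at which a deadline of timeoutMillis should expire.
    public static LocalTime expectedDeadline(String sentAt, long timeoutMillis) {
        return parse(sentAt).plus(Duration.ofMillis(timeoutMillis));
    }

    public static void main(String[] args) {
        String sent = "12:50:45,895";            // client log before the call
        String serverReceived = "12:50:48.844";  // server log when call received
        String deadlineLogged = "12:50:49.011";  // client "deadline exceeded" log
        long timeoutMillis = 2000;               // configured deadline

        System.out.println("expected deadline:        " + expectedDeadline(sent, timeoutMillis));
        System.out.println("ms until server received: " + millisBetween(sent, serverReceived));
        System.out.println("ms until error logged:    " + millisBetween(sent, deadlineLogged));
        System.out.println("ms past the deadline:     "
                + (millisBetween(sent, deadlineLogged) - timeoutMillis));
    }
}
```

Under those assumptions, the 2000 ms deadline should have expired at 12:50:47.895, the server saw the call 2949 ms after the client logged sending it, and the error was logged 3116 ms after sending, about 1116 ms later than expected.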
Question
What could be the reason for the increase in deadline exceeded errors? Could it be related to HTTP/2 pooling, concurrent streams, or something else that I couldn't find any clue about on the web? Please comment if you need more code pieces/configs.
Thank you in advance.