This repository has been archived by the owner on Jan 27, 2021. It is now read-only.
What happened?
We used to run on GKE with an old version of Osiris (bb78180, before the http2 PR). Everything worked nicely for months.
We recently upgraded to a more recent commit (472cbec, after the http2 PR) to benefit from recent changes, and now requests fail from time to time. Rolling back to the previous version fixed the issue.
What you expected to happen?
No failed requests.
How to reproduce it (as minimally and precisely as possible):
Output of the ab run with commit 472cbec (broken version):
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests
Server Software:
Server Hostname: localhost
Server Port: 8080
Document Path: /
Document Length: 13 bytes
Concurrency Level: 5
Time taken for tests: 0.849 seconds
Complete requests: 1000
Failed requests: 20
(Connect: 0, Receive: 0, Length: 20, Exceptions: 0)
Total transferred: 127400 bytes
HTML transferred: 12740 bytes
Requests per second: 1177.77 [#/sec] (mean)
Time per request: 4.245 [ms] (mean)
Time per request: 0.849 [ms] (mean, across all concurrent requests)
Transfer rate: 146.53 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     1    4   1.6      4      14
Waiting:        0    4   1.6      3      14
Total:          2    4   1.6      4      14
Percentage of the requests served within a certain time (ms)
50% 4
66% 4
75% 5
80% 5
90% 6
95% 7
98% 9
99% 10
100% 14 (longest request)
Notice the 20 failed requests.
Logs of the osiris-proxy container on the hello-osiris pod:
I1014 07:22:31.007410 1 proxy.go:12] Starting Osiris Proxy -- version devel -- commit 472cbec
I1014 07:22:31.007683 1 proxy.go:115] Healthz and metrics server is listening on :5004
Logs of the hello-osiris container:
2019/10/14 07:22:30 Listening for HTTP/1.x without TLS on :8080
2019/10/14 07:22:30 Listening for h2c (HTTP/2 without TLS) on :8081
2019/10/14 07:22:30 Listening for HTTPS (HTTP/1.x OR HTTP/2 with TLS) on :4430
2019/10/14 07:22:30 Listening for insecure gRPC (no TLS) on :8082
2019/10/14 07:22:30 Note: Due to limitations of SNI, Osiris only supports one TLS-enabled port per application, so this example does not demonstrate gRPC with TLS, although this combination should work.
2019/10/14 07:22:42 Received: GET / HTTP/1.1 (without TLS)
...
What I found so far: when I hit the issue with curl, I get:
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
GET / HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.54.0
Accept: application/json, */*
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server
If I add a time.Sleep(10*time.Second), for example, in the singleConnectionListener.Accept func so that the 2nd call to Accept waits before returning, the test with the 1000 requests using ab works, with 0 failed requests:
func (s *singleConnectionListener) Accept() (net.Conn, error) {
	var c net.Conn
	s.once.Do(func() {
		c = s.conn
	})
	if c != nil {
		return c, nil
	}
	time.Sleep(10 * time.Second) // wait a little before returning
	return nil, io.EOF
}
I'll need to understand why waiting before returning from the Accept func fixes the issue...
Not closing the connection in dynamicProxy.ListenAndServe also fixes the issue. I found out that closing the connection returns an ErrNetClosing error. And if I don't close it, it seems to get closed by something else, because no resources are left held (I integrated prometheus in the proxy to make sure that we don't leak goroutines or file descriptors by not closing the conn).
Environment:
Kubernetes version (kubectl version):
for GKE:
for docker-for-mac:
Osiris versions: bb78180 & 472cbec
Install method (helm install command used):
How to reproduce it, on docker-for-mac:
and then roll back to an older version:
Anything else that we need to know?
Output of the proxy metrics on :5004/metrics:
Output of the ab run with commit bb78180 (good old version):
I'll try to investigate more, but I just wanted to write a bug report first; maybe you'll have an idea of the issue.