Failed requests with Osiris proxy #45

vbehar · 2019-10-14T07:30:17Z

Environment:

Kubernetes distribution (e.g. AKS, ACS Engine, GKE, EKS, etc.): GKE & docker for mac
Kubernetes version (use kubectl version):

for GKE:

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T12:38:00Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.10-gke.0", GitCommit:"569511c9540f78a94cc6a41d895c382d0946c11a", GitTreeState:"clean", BuildDate:"2019-08-21T23:28:44Z", GoVersion:"go1.11.13b4", Compiler:"gc", Platform:"linux/amd64"}

for docker-for-mac:

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T12:38:00Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:16Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

Osiris version (specify by SHA or semver): bb78180 & 472cbec
Install method (specify exact helm install command used):

helm install osiris/osiris-edge --name osiris --devel

What happened?

We used to run on GKE with an old version of Osiris - bb78180, before the http2 PR. Everything worked nicely for months.
We recently upgraded to a more recent commit to benefit from recent changes - 472cbec, after the http2 PR. And now we have failed requests from time to time. Rollbacking to the previous version fixed the issue.

What you expected to happen?

no failed requests.

How to reproduce it (as minimally and precisely as possible):

on docker-for-mac:

$ helm install osiris/osiris-edge --name osiris --devel
$ kubectl apply -f example/hello-osiris.yaml
$ ab -n 1000 -c 5 http://localhost:8080/

and then rollback to an older version:

$ helm upgrade --reset-values osiris osiris/osiris-edge --version=0.0.1-2018.12.21.23.19.56-bb78180
$ kubectl scale deployment hello-osiris --replicas 0
$ kubectl scale deployment hello-osiris --replicas 1
$ ab -n 1000 -c 5 http://localhost:8080/

Anything else that we need to know?

Output of the ab run with commit 472cbec (broken version):

This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        13 bytes

Concurrency Level:      5
Time taken for tests:   0.849 seconds
Complete requests:      1000
Failed requests:        20
   (Connect: 0, Receive: 0, Length: 20, Exceptions: 0)
Total transferred:      127400 bytes
HTML transferred:       12740 bytes
Requests per second:    1177.77 [#/sec] (mean)
Time per request:       4.245 [ms] (mean)
Time per request:       0.849 [ms] (mean, across all concurrent requests)
Transfer rate:          146.53 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     1    4   1.6      4      14
Waiting:        0    4   1.6      3      14
Total:          2    4   1.6      4      14

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      4
  75%      5
  80%      5
  90%      6
  95%      7
  98%      9
  99%     10
 100%     14 (longest request)

Notice the 20 failed requests.

Logs of the osiris-proxy container on the hello-osiris pod:

I1014 07:22:31.007410       1 proxy.go:12] Starting Osiris Proxy -- version devel -- commit 472cbec
I1014 07:22:31.007683       1 proxy.go:115] Healthz and metrics server is listening on :5004

Logs of the hello-osiris container:

2019/10/14 07:22:30 Listening for HTTP/1.x without TLS on :8080
2019/10/14 07:22:30 Listening for h2c (HTTP/2 without TLS) on :8081
2019/10/14 07:22:30 Listening for HTTPS (HTTP/1.x OR HTTP/2 with TLS) on :4430
2019/10/14 07:22:30 Listening for insecure gRPC (no TLS) on :8082
2019/10/14 07:22:30 Note: Due to limitations of SNI, Osiris only supports one TLS-enabled port per application, so this example does not demonstrate gRPC with TLS, although this combination should work.
2019/10/14 07:22:42 Received: GET / HTTP/1.1 (without TLS)
...

$ kubectl logs hello-osiris-6fdc75d555-l8dxj hello-osiris | grep "Received: GET / HTTP/1.1 (without TLS)" | wc -l
    1000

Output of the proxy metrics on :5004/metrics:

{
    "proxyId": "9ac00e99-ec38-4fda-8ea2-f41a26a602d6",
    "connectionsOpened": 1000,
    "connectionsClosed": 1000
}

Output of the ab run with commit bb78180 (good old version):

This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        13 bytes

Concurrency Level:      5
Time taken for tests:   0.881 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      130000 bytes
HTML transferred:       13000 bytes
Requests per second:    1134.97 [#/sec] (mean)
Time per request:       4.405 [ms] (mean)
Time per request:       0.881 [ms] (mean, across all concurrent requests)
Transfer rate:          144.09 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       2
Processing:     2    4   2.0      4      22
Waiting:        1    4   1.8      3      20
Total:          2    4   2.0      4      22

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      5
  75%      5
  80%      5
  90%      6
  95%      8
  98%     10
  99%     12
 100%     22 (longest request)

I'll try to investigate more, but I just wanted to write a bug report first, maybe you'll have an idea of the issue.

The text was updated successfully, but these errors were encountered:

vbehar · 2019-10-14T07:43:45Z

btw is there a reason for still using go 1.11 ? I saw in the source code that there is a copy of the httputil pkg from go 1.12

vbehar · 2019-10-14T12:29:31Z

what I found so far: when I hit the issue with curl, I have:

*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
GET / HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.54.0
Accept: application/json, */*

* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server

and if I add a time.Sleep(10*time.Second) for example in the singleConnectionListener.Accept func for the 2nd call to Accept, the test with the 1000 requests using ab works. 0 failed request:

func (s *singleConnectionListener) Accept() (net.Conn, error) {
	var c net.Conn
	s.once.Do(func() {
		c = s.conn
	})
	if c != nil {
		return c, nil
	}

	time.Sleep(10 * time.Second) // wait a little before returning

	return nil, io.EOF
}

I'll need to understand why waiting before returning from the Accept func fixes the issue...

vbehar · 2019-10-14T13:06:30Z

not closing the connection in dynamicProxy.ListenAndServe also fixes the issue. I found out that closing the connection returns a ErrNetClosing error. And if I don't close it, it seems like it's getting closed by something else, because it doesn't hold on any resources (I integrated prometheus in the proxy to make sure that we don't leak goroutines or file descriptors by not closing the conn)

vbehar mentioned this issue Oct 23, 2019

Proposal: proxy-less mode #48

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failed requests with Osiris proxy #45

Failed requests with Osiris proxy #45

vbehar commented Oct 14, 2019 •

edited

Loading

vbehar commented Oct 14, 2019

vbehar commented Oct 14, 2019 •

edited

Loading

vbehar commented Oct 14, 2019

Failed requests with Osiris proxy #45

Failed requests with Osiris proxy #45

Comments

vbehar commented Oct 14, 2019 • edited Loading

vbehar commented Oct 14, 2019

vbehar commented Oct 14, 2019 • edited Loading

vbehar commented Oct 14, 2019

vbehar commented Oct 14, 2019 •

edited

Loading

vbehar commented Oct 14, 2019 •

edited

Loading