Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KituraKit tests fail with SwiftyRequest 2.0.6 #69

Closed
djones6 opened this issue Jul 9, 2019 · 30 comments
Closed

KituraKit tests fail with SwiftyRequest 2.0.6 #69

djones6 opened this issue Jul 9, 2019 · 30 comments
Assignees
Milestone

Comments

@djones6
Copy link
Contributor

djones6 commented Jul 9, 2019

The KituraKit CI tests have started failing on Linux with SwiftyRequest 2.0.6 or newer:
https://travis-ci.org/IBM-Swift/KituraKit/builds/554336585

They passed with SwiftyRequest 2.0.5:
https://travis-ci.org/IBM-Swift/KituraKit/builds/526027333

It looks likely that it relates to this change: d9915f2

  • my guess would be the use of URLSession.shared, though I haven't investigated beyond looking at this diff.

The symptom seems to be this:

  • the first test that performs a POST request (MainTests.testClientPost) passes.
  • the next test also performs a POST request (MainTests.testClientPostWithIdentifier) and this fails due to a timeout.

(Note: the next test fails with a connect error, however this seems to be an unrelated symptom of the Kitura test server not being stopped+started again in time for the test execution when a previous test times out.)

However, if I reorder / comment out tests, then different tests fail with the same pattern: for example, if I comment out the testClientPost test, then testClientPostWithIdentifier passes, and the test following it fails with a timeout.

I did a little digging with some debugging

This revealed that the first POST test sends a complete HTTP message:

^[[0;39m[DEBUG] [IncomingHTTPSocketProcessor.swift:245 parse(_:from:completeBuffer:)] HTTP parser parsed 316 out of 316 bytes for socket 12^[[0;39m
^[[0;39m[DEBUG] [IncomingHTTPSocketProcessor.swift:246 parse(_:from:completeBuffer:)] Buffered bytes:
POST /users HTTP/1.1^M
Host: localhost:8080^M
Accept-Encoding: deflate, gzip^M
Accept: application/json^M
Content-Type: application/json^M
Connection: keep-alive^M
User-Agent: KituraKitPackageTests.xctest (unknown version) curl/7.35.0^M
Accept-Language: en^M
Content-Length: 43^M
^M
{"name":"John Doe","date":540899256,"id":5}^[[0;39m
^[[0;39m[VERBOSE] [HTTPServerRequest.swift:333 parsingCompleted()] HTTP request from=127.0.0.1; proto=http;^[[0;39m
^[[0;39m[VERBOSE] [CodableRouter.swift:362 postSafely(_:handler:)] Received POST type-safe request^[[0;39m

But the next test sends an incomplete message (Content-Length is specified, but no message body is written):

^[[0;39m[DEBUG] [IncomingHTTPSocketProcessor.swift:245 parse(_:from:completeBuffer:)] HTTP parser parsed 275 out of 275 bytes for socket 20^[[0;39m
^[[0;39m[DEBUG] [IncomingHTTPSocketProcessor.swift:246 parse(_:from:completeBuffer:)] Buffered bytes:
POST /usersid HTTP/1.1^M
Host: localhost:8080^M
Accept-Encoding: deflate, gzip^M
Accept: application/json^M
Content-Type: application/json^M
Connection: keep-alive^M
User-Agent: KituraKitPackageTests.xctest (unknown version) curl/7.35.0^M
Accept-Language: en^M
Content-Length: 43^M
^M
^[[0;39m
/home/djones6/swift51_repo_update/KituraKit/Tests/KituraKitTests/MainTests.swift:192: error: MainTests.testClientPostWithIdentifier : Asynchronous wait failed - Exceeded timeout of 3.0 seconds, with unfulfilled expectations: A response is received from the server -> user

This causes Kitura to wait indefinitely for the remainder of the message to be delivered, but it never is, and the test then times out while expecting a response.

@pushkarnk pushkarnk self-assigned this Jul 10, 2019
@pushkarnk
Copy link

This issue doesn't reproduce with Swift 5.0.1 on trusty, but it does with 5.0.1 on xenial.

@pushkarnk
Copy link

Correction: I do see failures on trusty too.

@pushkarnk
Copy link

I ran the tests with KITURA_NIO=1 close to 50 times and didn't see the failure. Wondering if this is something to do only with the KituraKit tests 🤔

@djones6 djones6 added this to the 2019.14 milestone Jul 10, 2019
@pushkarnk
Copy link

URLSession.shared? NIO/Net Result
No Net Pass
No NIO Pass
Yes Net Fail
Yes NIO Pass

@pushkarnk
Copy link

A significant difference while running with a shared URLSession and Kitura-net is the number of HTTP requests received for the failing test. In the passing case, the number of HTTP requests received is 1. In the failing case it is 2, the first among which seems to be coming on the connection established for the previous test. This has started like a URLSession issue that gets hidden with SwiftNIO running on the server side.

@pushkarnk
Copy link

So, I can confirm that if we don't use keep-alive on each of the connections URLSession creates with the server, the tests doesn't fail. With keep-alive, the failing test's HTTP request is being sent on the connection created for the previous test, which may not be a problem. The problem is that a duplicate request is subsequently sent on a new connection.

@pushkarnk
Copy link

pushkarnk commented Jul 12, 2019

Studied the two subsequent URLSession.dataTask calls for MainTests.testClientPost and MainTests.testClientPostWithIdentifier again. The requests and responses dumped by tcpdump make things more confusing:

So, in short, there are two POST requests.

  • The first request has body {"name":"John Doe","date":540899256,"id":5}.
  • The second request has body {"name":"Jonathan Swift","date":540899256,"id":5}

This is the tcpdump output in the failing case is:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes

Header of the first request arrives from remote port 39558

06:35:38.527113 IP localhost.39558 > localhost.http-alt: Flags [P.], seq 2204924601:2204924874, ack 526032785, win 342, options [nop,nop,TS val 1482498739 ecr 1482498739], length 273: HTTP: POST /users HTTP/1.1
	0x0000:  4500 0145 c05c 4000 4006 7b54 7f00 0001  E..E.\@.@.{T....
	0x0010:  7f00 0001 9a86 1f90 836c 7ab9 1f5a 9f91  .........lz..Z..
	0x0020:  8018 0156 ff39 0000 0101 080a 585d 22b3  ...V.9......X]".
	0x0030:  585d 22b3 504f 5354 202f 7573 6572 7320  X]".POST./users.
	0x0040:  4854 5450 2f31 2e31 0d0a 486f 7374 3a20  HTTP/1.1..Host:.
	0x0050:  6c6f 6361 6c68 6f73 743a 3830 3830 0d0a  localhost:8080..
	0x0060:  4163 6365 7074 2d45 6e63 6f64 696e 673a  Accept-Encoding:
	0x0070:  2064 6566 6c61 7465 2c20 677a 6970 0d0a  .deflate,.gzip..
	0x0080:  436f 6e74 656e 742d 5479 7065 3a20 6170  Content-Type:.ap
	0x0090:  706c 6963 6174 696f 6e2f 6a73 6f6e 0d0a  plication/json..
	0x00a0:  4163 6365 7074 3a20 6170 706c 6963 6174  Accept:.applicat
	0x00b0:  696f 6e2f 6a73 6f6e 0d0a 436f 6e6e 6563  ion/json..Connec
	0x00c0:  7469 6f6e 3a20 6b65 6570 2d61 6c69 7665  tion:.keep-alive
	0x00d0:  0d0a 5573 6572 2d41 6765 6e74 3a20 4b69  ..User-Agent:.Ki
	0x00e0:  7475 7261 4b69 7450 6163 6b61 6765 5465  turaKitPackageTe
	0x00f0:  7374 732e 7863 7465 7374 2028 756e 6b6e  sts.xctest.(unkn
	0x0100:  6f77 6e20 7665 7273 696f 6e29 2063 7572  own.version).cur
	0x0110:  6c2f 372e 3538 2e30 0d0a 4163 6365 7074  l/7.58.0..Accept
	0x0120:  2d4c 616e 6775 6167 653a 2065 6e0d 0a43  -Language:.en..C
	0x0130:  6f6e 7465 6e74 2d4c 656e 6774 683a 2034  ontent-Length:.4
	0x0140:  330d 0a0d 0a                             3....

Body of the first request arrives from remote port 39558

06:35:38.527162 IP localhost.39558 > localhost.http-alt: Flags [P.], seq 273:316, ack 1, win 342, options [nop,nop,TS val 1482498739 ecr 1482498739], length 43: HTTP
	0x0000:  4500 005f c05d 4000 4006 7c39 7f00 0001  E.._.]@.@.|9....
	0x0010:  7f00 0001 9a86 1f90 836c 7bca 1f5a 9f91  .........l{..Z..
	0x0020:  8018 0156 fe53 0000 0101 080a 585d 22b3  ...V.S......X]".
	0x0030:  585d 22b3 7b22 6461 7465 223a 3534 3038  X]".{"date":5408
	0x0040:  3939 3235 362c 2269 6422 3a35 2c22 6e61  99256,"id":5,"na
	0x0050:  6d65 223a 224a 6f68 6e20 446f 6522 7d    me":"John.Doe"}

Response for the first request is sent to remote port 39558

06:35:38.528090 IP localhost.http-alt > localhost.39558: Flags [P.], seq 1:205, ack 316, win 350, options [nop,nop,TS val 1482498740 ecr 1482498739], length 204: HTTP: HTTP/1.1 201 Created
	0x0000:  4500 0100 2575 4000 4006 1681 7f00 0001  E...%u@.@.......
	0x0010:  7f00 0001 1f90 9a86 1f5a 9f91 836c 7bf5  .........Z...l{.
	0x0020:  8018 015e fef4 0000 0101 080a 585d 22b4  ...^........X]".
	0x0030:  585d 22b3 4854 5450 2f31 2e31 2032 3031  X]".HTTP/1.1.201
	0x0040:  2043 7265 6174 6564 0d0a 4461 7465 3a20  .Created..Date:.
	0x0050:  4672 692c 2031 3220 4a75 6c20 3230 3139  Fri,.12.Jul.2019
	0x0060:  2030 353a 3335 3a33 3820 474d 540d 0a43  .05:35:38.GMT..C
	0x0070:  6f6e 7465 6e74 2d54 7970 653a 2061 7070  ontent-Type:.app
	0x0080:  6c69 6361 7469 6f6e 2f6a 736f 6e0d 0a43  lication/json..C
	0x0090:  6f6e 7465 6e74 2d4c 656e 6774 683a 2034  ontent-Length:.4
	0x00a0:  330d 0a43 6f6e 6e65 6374 696f 6e3a 204b  3..Connection:.K
	0x00b0:  6565 702d 416c 6976 650d 0a4b 6565 702d  eep-Alive..Keep-
	0x00c0:  416c 6976 653a 2074 696d 656f 7574 3d36  Alive:.timeout=6
	0x00d0:  300d 0a0d 0a7b 2269 6422 3a35 2c22 6461  0....{"id":5,"da
	0x00e0:  7465 223a 3534 3038 3939 3235 362c 226e  te":540899256,"n
	0x00f0:  616d 6522 3a22 4a6f 686e 2044 6f65 227d  ame":"John.Doe"}

Header for the second request arrives on the same port 39558

06:35:38.534869 IP localhost.39558 > localhost.http-alt: Flags [P.], seq 316:591, ack 205, win 350, options [nop,nop,TS val 1482498747 ecr 1482498740], length 275: HTTP: POST /usersid HTTP/1.1
	0x0000:  4500 0147 c05f 4000 4006 7b4f 7f00 0001  E..G._@.@.{O....
	0x0010:  7f00 0001 9a86 1f90 836c 7bf5 1f5a a05d  .........l{..Z.]
	0x0020:  8018 015e ff3b 0000 0101 080a 585d 22bb  ...^.;......X]".
	0x0030:  585d 22b4 504f 5354 202f 7573 6572 7369  X]".POST./usersi
	0x0040:  6420 4854 5450 2f31 2e31 0d0a 486f 7374  d.HTTP/1.1..Host
	0x0050:  3a20 6c6f 6361 6c68 6f73 743a 3830 3830  :.localhost:8080
	0x0060:  0d0a 4163 6365 7074 2d45 6e63 6f64 696e  ..Accept-Encodin
	0x0070:  673a 2064 6566 6c61 7465 2c20 677a 6970  g:.deflate,.gzip
	0x0080:  0d0a 436f 6e74 656e 742d 5479 7065 3a20  ..Content-Type:.
	0x0090:  6170 706c 6963 6174 696f 6e2f 6a73 6f6e  application/json
	0x00a0:  0d0a 4163 6365 7074 3a20 6170 706c 6963  ..Accept:.applic
	0x00b0:  6174 696f 6e2f 6a73 6f6e 0d0a 436f 6e6e  ation/json..Conn
	0x00c0:  6563 7469 6f6e 3a20 6b65 6570 2d61 6c69  ection:.keep-ali
	0x00d0:  7665 0d0a 5573 6572 2d41 6765 6e74 3a20  ve..User-Agent:.
	0x00e0:  4b69 7475 7261 4b69 7450 6163 6b61 6765  KituraKitPackage
	0x00f0:  5465 7374 732e 7863 7465 7374 2028 756e  Tests.xctest.(un
	0x0100:  6b6e 6f77 6e20 7665 7273 696f 6e29 2063  known.version).c
	0x0110:  7572 6c2f 372e 3538 2e30 0d0a 4163 6365  url/7.58.0..Acce
	0x0120:  7074 2d4c 616e 6775 6167 653a 2065 6e0d  pt-Language:.en.
	0x0130:  0a43 6f6e 7465 6e74 2d4c 656e 6774 683a  .Content-Length:
	0x0140:  2034 390d 0a0d 0a                        .49....

Body of the second request arrives from port 39558

06:35:38.534913 IP localhost.39558 > localhost.http-alt: Flags [P.], seq 591:640, ack 205, win 350, options [nop,nop,TS val 1482498747 ecr 1482498740], length 49: HTTP
	0x0000:  4500 0065 c060 4000 4006 7c30 7f00 0001  E..e.`@.@.|0....
	0x0010:  7f00 0001 9a86 1f90 836c 7d08 1f5a a05d  .........l}..Z.]
	0x0020:  8018 015e fe59 0000 0101 080a 585d 22bb  ...^.Y......X]".
	0x0030:  585d 22b4 7b22 6461 7465 223a 3534 3038  X]".{"date":5408
	0x0040:  3939 3235 362c 2269 6422 3a35 2c22 6e61  99256,"id":5,"na
	0x0050:  6d65 223a 224a 6f6e 6174 6861 6e20 5377  me":"Jonathan.Sw
	0x0060:  6966 7422 7d                             ift"}

A duplicate body pertaining to the second request arrives from port 39558

06:35:38.550290 IP localhost.39558 > localhost.http-alt: Flags [P.], seq 591:640, ack 205, win 350, options [nop,nop,TS val 1482498762 ecr 1482498740], length 49: HTTP
	0x0000:  4500 0065 c061 4000 4006 7c2f 7f00 0001  E..e.a@.@.|/....
	0x0010:  7f00 0001 9a86 1f90 836c 7d08 1f5a a05d  .........l}..Z.]
	0x0020:  8018 015e fe59 0000 0101 080a 585d 22ca  ...^.Y......X]".
	0x0030:  585d 22b4 7b22 6461 7465 223a 3534 3038  X]".{"date":5408
	0x0040:  3939 3235 362c 2269 6422 3a35 2c22 6e61  99256,"id":5,"na
	0x0050:  6d65 223a 224a 6f6e 6174 6861 6e20 5377  me":"Jonathan.Sw
	0x0060:  6966 7422 7d                             ift"}

Header of the second request arrives from a new remote port 39562

06:35:38.573892 IP localhost.39562 > localhost.http-alt: Flags [P.], seq 2927441858:2927442133, ack 3923650617, win 342, options [nop,nop,TS val 1482498786 ecr 1482498786], length 275: HTTP: POST /usersid HTTP/1.1
	0x0000:  4500 0147 8d2e 4000 4006 ae80 7f00 0001  E..G..@.@.......
	0x0010:  7f00 0001 9a8a 1f90 ae7d 37c2 e9de 2839  .........}7...(9
	0x0020:  8018 0156 ff3b 0000 0101 080a 585d 22e2  ...V.;......X]".
	0x0030:  585d 22e2 504f 5354 202f 7573 6572 7369  X]".POST./usersi
	0x0040:  6420 4854 5450 2f31 2e31 0d0a 486f 7374  d.HTTP/1.1..Host
	0x0050:  3a20 6c6f 6361 6c68 6f73 743a 3830 3830  :.localhost:8080
	0x0060:  0d0a 4163 6365 7074 2d45 6e63 6f64 696e  ..Accept-Encodin
	0x0070:  673a 2064 6566 6c61 7465 2c20 677a 6970  g:.deflate,.gzip
	0x0080:  0d0a 436f 6e74 656e 742d 5479 7065 3a20  ..Content-Type:.
	0x0090:  6170 706c 6963 6174 696f 6e2f 6a73 6f6e  application/json
	0x00a0:  0d0a 4163 6365 7074 3a20 6170 706c 6963  ..Accept:.applic
	0x00b0:  6174 696f 6e2f 6a73 6f6e 0d0a 436f 6e6e  ation/json..Conn
	0x00c0:  6563 7469 6f6e 3a20 6b65 6570 2d61 6c69  ection:.keep-ali
	0x00d0:  7665 0d0a 5573 6572 2d41 6765 6e74 3a20  ve..User-Agent:.
	0x00e0:  4b69 7475 7261 4b69 7450 6163 6b61 6765  KituraKitPackage
	0x00f0:  5465 7374 732e 7863 7465 7374 2028 756e  Tests.xctest.(un
	0x0100:  6b6e 6f77 6e20 7665 7273 696f 6e29 2063  known.version).c
	0x0110:  7572 6c2f 372e 3538 2e30 0d0a 4163 6365  url/7.58.0..Acce
	0x0120:  7074 2d4c 616e 6775 6167 653a 2065 6e0d  pt-Language:.en.
	0x0130:  0a43 6f6e 7465 6e74 2d4c 656e 6774 683a  .Content-Length:
	0x0140:  2034 390d 0a0d 0a                        .49....

No body for the second request arrives from the new port 39562

@pushkarnk
Copy link

Surprisingly, things are quite straightforward with Kitura-NIO:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes

Header for the first request arrives from remote port 39596

06:37:32.514466 IP localhost.39596 > localhost.http-alt: Flags [P.], seq 3181728486:3181728759, ack 4033032946, win 342, options [nop,nop,TS val 1482612727 ecr 1482612726], length 273: HTTP: POST /users HTTP/1.1
	0x0000:  4500 0145 78d9 4000 4006 c2d7 7f00 0001  E..Ex.@.@.......
	0x0010:  7f00 0001 9aac 1f90 bda5 52e6 f063 32f2  ..........R..c2.
	0x0020:  8018 0156 ff39 0000 0101 080a 585e dff7  ...V.9......X^..
	0x0030:  585e dff6 504f 5354 202f 7573 6572 7320  X^..POST./users.
	0x0040:  4854 5450 2f31 2e31 0d0a 486f 7374 3a20  HTTP/1.1..Host:.
	0x0050:  6c6f 6361 6c68 6f73 743a 3830 3830 0d0a  localhost:8080..
	0x0060:  4163 6365 7074 2d45 6e63 6f64 696e 673a  Accept-Encoding:
	0x0070:  2064 6566 6c61 7465 2c20 677a 6970 0d0a  .deflate,.gzip..
	0x0080:  436f 6e74 656e 742d 5479 7065 3a20 6170  Content-Type:.ap
	0x0090:  706c 6963 6174 696f 6e2f 6a73 6f6e 0d0a  plication/json..
	0x00a0:  4163 6365 7074 3a20 6170 706c 6963 6174  Accept:.applicat
	0x00b0:  696f 6e2f 6a73 6f6e 0d0a 436f 6e6e 6563  ion/json..Connec
	0x00c0:  7469 6f6e 3a20 6b65 6570 2d61 6c69 7665  tion:.keep-alive
	0x00d0:  0d0a 5573 6572 2d41 6765 6e74 3a20 4b69  ..User-Agent:.Ki
	0x00e0:  7475 7261 4b69 7450 6163 6b61 6765 5465  turaKitPackageTe
	0x00f0:  7374 732e 7863 7465 7374 2028 756e 6b6e  sts.xctest.(unkn
	0x0100:  6f77 6e20 7665 7273 696f 6e29 2063 7572  own.version).cur
	0x0110:  6c2f 372e 3538 2e30 0d0a 4163 6365 7074  l/7.58.0..Accept
	0x0120:  2d4c 616e 6775 6167 653a 2065 6e0d 0a43  -Language:.en..C
	0x0130:  6f6e 7465 6e74 2d4c 656e 6774 683a 2034  ontent-Length:.4
	0x0140:  330d 0a0d 0a                             3....

Body pertaining to the first request arrives from remote port 39596

06:37:32.514540 IP localhost.39596 > localhost.http-alt: Flags [P.], seq 273:316, ack 1, win 342, options [nop,nop,TS val 1482612727 ecr 1482612727], length 43: HTTP
	0x0000:  4500 005f 78da 4000 4006 c3bc 7f00 0001  E.._x.@.@.......
	0x0010:  7f00 0001 9aac 1f90 bda5 53f7 f063 32f2  ..........S..c2.
	0x0020:  8018 0156 fe53 0000 0101 080a 585e dff7  ...V.S......X^..
	0x0030:  585e dff7 7b22 6964 223a 352c 226e 616d  X^..{"id":5,"nam
	0x0040:  6522 3a22 4a6f 686e 2044 6f65 222c 2264  e":"John.Doe","d
	0x0050:  6174 6522 3a35 3430 3839 3932 3536 7d    ate":540899256}

Response is sent to remote port 39596

06:37:32.518937 IP localhost.http-alt > localhost.39596: Flags [P.], seq 1:207, ack 316, win 350, options [nop,nop,TS val 1482612731 ecr 1482612727], length 206: HTTP: HTTP/1.1 201 Created
	0x0000:  4500 0102 18c3 4000 4006 2331 7f00 0001  E.....@.@.#1....
	0x0010:  7f00 0001 1f90 9aac f063 32f2 bda5 5422  .........c2...T"
	0x0020:  8018 015e fef6 0000 0101 080a 585e dffb  ...^........X^..
	0x0030:  585e dff7 4854 5450 2f31 2e31 2032 3031  X^..HTTP/1.1.201
	0x0040:  2043 7265 6174 6564 0d0a 4461 7465 3a20  .Created..Date:.
	0x0050:  4672 692c 2031 3220 4a75 6c20 3230 3139  Fri,.12.Jul.2019
	0x0060:  2030 353a 3337 3a33 3220 474d 540d 0a43  .05:37:32.GMT..C
	0x0070:  6f6e 7465 6e74 2d54 7970 653a 2061 7070  ontent-Type:.app
	0x0080:  6c69 6361 7469 6f6e 2f6a 736f 6e0d 0a43  lication/json..C
	0x0090:  6f6e 7465 6e74 2d4c 656e 6774 683a 2034  ontent-Length:.4
	0x00a0:  330d 0a43 6f6e 6e65 6374 696f 6e3a 204b  3..Connection:.K
	0x00b0:  6565 702d 416c 6976 650d 0a4b 6565 702d  eep-Alive..Keep-
	0x00c0:  416c 6976 653a 2074 696d 656f 7574 3d36  Alive:.timeout=6
	0x00d0:  302e 300d 0a0d 0a7b 226e 616d 6522 3a22  0.0....{"name":"
	0x00e0:  4a6f 686e 2044 6f65 222c 2269 6422 3a35  John.Doe","id":5
	0x00f0:  2c22 6461 7465 223a 3534 3038 3939 3235  ,"date":54089925
	0x0100:  367d                                     6}

Header of the second request arrives from the same port 39596

06:37:32.559200 IP localhost.39596 > localhost.http-alt: Flags [P.], seq 316:591, ack 207, win 350, options [nop,nop,TS val 1482612771 ecr 1482612731], length 275: HTTP: POST /usersid HTTP/1.1
	0x0000:  4500 0147 78dc 4000 4006 c2d2 7f00 0001  E..Gx.@.@.......
	0x0010:  7f00 0001 9aac 1f90 bda5 5422 f063 33c0  ..........T".c3.
	0x0020:  8018 015e ff3b 0000 0101 080a 585e e023  ...^.;......X^.#
	0x0030:  585e dffb 504f 5354 202f 7573 6572 7369  X^..POST./usersi
	0x0040:  6420 4854 5450 2f31 2e31 0d0a 486f 7374  d.HTTP/1.1..Host
	0x0050:  3a20 6c6f 6361 6c68 6f73 743a 3830 3830  :.localhost:8080
	0x0060:  0d0a 4163 6365 7074 2d45 6e63 6f64 696e  ..Accept-Encodin
	0x0070:  673a 2064 6566 6c61 7465 2c20 677a 6970  g:.deflate,.gzip
	0x0080:  0d0a 4163 6365 7074 3a20 6170 706c 6963  ..Accept:.applic
	0x0090:  6174 696f 6e2f 6a73 6f6e 0d0a 436f 6e74  ation/json..Cont
	0x00a0:  656e 742d 5479 7065 3a20 6170 706c 6963  ent-Type:.applic
	0x00b0:  6174 696f 6e2f 6a73 6f6e 0d0a 436f 6e6e  ation/json..Conn
	0x00c0:  6563 7469 6f6e 3a20 6b65 6570 2d61 6c69  ection:.keep-ali
	0x00d0:  7665 0d0a 5573 6572 2d41 6765 6e74 3a20  ve..User-Agent:.
	0x00e0:  4b69 7475 7261 4b69 7450 6163 6b61 6765  KituraKitPackage
	0x00f0:  5465 7374 732e 7863 7465 7374 2028 756e  Tests.xctest.(un
	0x0100:  6b6e 6f77 6e20 7665 7273 696f 6e29 2063  known.version).c
	0x0110:  7572 6c2f 372e 3538 2e30 0d0a 4163 6365  url/7.58.0..Acce
	0x0120:  7074 2d4c 616e 6775 6167 653a 2065 6e0d  pt-Language:.en.
	0x0130:  0a43 6f6e 7465 6e74 2d4c 656e 6774 683a  .Content-Length:
	0x0140:  2034 390d 0a0d 0a                        .49....

Body of the second request arrives from the same port 39596

06:37:32.559247 IP localhost.39596 > localhost.http-alt: Flags [P.], seq 591:640, ack 207, win 350, options [nop,nop,TS val 1482612771 ecr 1482612731], length 49: HTTP
	0x0000:  4500 0065 78dd 4000 4006 c3b3 7f00 0001  E..ex.@.@.......
	0x0010:  7f00 0001 9aac 1f90 bda5 5535 f063 33c0  ..........U5.c3.
	0x0020:  8018 015e fe59 0000 0101 080a 585e e023  ...^.Y......X^.#
	0x0030:  585e dffb 7b22 6964 223a 352c 226e 616d  X^..{"id":5,"nam
	0x0040:  6522 3a22 4a6f 6e61 7468 616e 2053 7769  e":"Jonathan.Swi
	0x0050:  6674 222c 2264 6174 6522 3a35 3430 3839  ft","date":54089
	0x0060:  3932 3536 7d                             9256}

Response for the second response sent to remote port 39596

06:37:32.560227 IP localhost.http-alt > localhost.39596: Flags [P.], seq 207:432, ack 640, win 359, options [nop,nop,TS val 1482612772 ecr 1482612771], length 225: HTTP: HTTP/1.1 201 Created
	0x0000:  4500 0115 18c5 4000 4006 231c 7f00 0001  E.....@.@.#.....
	0x0010:  7f00 0001 1f90 9aac f063 33c0 bda5 5566  .........c3...Uf
	0x0020:  8018 0167 ff09 0000 0101 080a 585e e024  ...g........X^.$
	0x0030:  585e e023 4854 5450 2f31 2e31 2032 3031  X^.#HTTP/1.1.201
	0x0040:  2043 7265 6174 6564 0d0a 4461 7465 3a20  .Created..Date:.
	0x0050:  4672 692c 2031 3220 4a75 6c20 3230 3139  Fri,.12.Jul.2019
	0x0060:  2030 353a 3337 3a33 3220 474d 540d 0a4c  .05:37:32.GMT..L
	0x0070:  6f63 6174 696f 6e3a 2035 0d0a 436f 6e74  ocation:.5..Cont
	0x0080:  656e 742d 5479 7065 3a20 6170 706c 6963  ent-Type:.applic
	0x0090:  6174 696f 6e2f 6a73 6f6e 0d0a 436f 6e74  ation/json..Cont
	0x00a0:  656e 742d 4c65 6e67 7468 3a20 3439 0d0a  ent-Length:.49..
	0x00b0:  436f 6e6e 6563 7469 6f6e 3a20 4b65 6570  Connection:.Keep
	0x00c0:  2d41 6c69 7665 0d0a 4b65 6570 2d41 6c69  -Alive..Keep-Ali
	0x00d0:  7665 3a20 7469 6d65 6f75 743d 3630 2e30  ve:.timeout=60.0
	0x00e0:  0d0a 0d0a 7b22 6e61 6d65 223a 224a 6f6e  ....{"name":"Jon
	0x00f0:  6174 6861 6e20 5377 6966 7422 2c22 6964  athan.Swift","id
	0x0100:  223a 352c 2264 6174 6522 3a35 3430 3839  ":5,"date":54089
	0x0110:  3932 3536 7d                             9256}

@pushkarnk
Copy link

I wonder how the server (Net vs. NIO) changes the nature of the incoming request messages. I checked and confirmed that both Net and NIO use Connection: keep-alive; Keep-Alive: timeout=60. I wonder if we should start with libcurl on the top of our suspect list 🤔

@pushkarnk
Copy link

Ran the test with the latest libcurl (7.65.1) and I still see the failure.

@pushkarnk
Copy link

Ran the test with the latest libcurl (7.65.1) and I still see the failure.

So, this issue isn't related to Kitura/Kitura#1465

@pushkarnk pushkarnk modified the milestones: 2019.14, 2019.15 Jul 17, 2019
@pushkarnk
Copy link

I am now able to repro the issue using this sample program:

import Foundation
import Kitura

func start() {
    let router = Router()
    router.post("/users") { _, response, _ in
        try! response.send("Hello, World!").end()
    }
    Kitura.addHTTPServer(onPort: 8080, with: router)
    Kitura.start()
}

let session = URLSession.shared
func test() {
    start()
    let semaphore = DispatchSemaphore(value: 0)
    makeRequest(semaphore)
    semaphore.wait()
    Kitura.stop()
}

func makeRequest(_ semaphore: DispatchSemaphore) {
    let data = "Hello, World!".data(using: .utf8)
    var request = URLRequest(url: URL(string: "http://localhost:8080/users")!)
    request.httpMethod = "POST"
    session.uploadTask(with: request, from: data) { data, response, error in
        if let data = data, data.count == 13 {
            semaphore.signal()
        }
    }.resume()
}

test()
print("Finished first test")
test()
print("Finished second test")

@pushkarnk
Copy link

The program hangs in the second test() invocation only when using URLSession.shared and Kitura-net. Using a new session in test(), or using Kitura-NIO resolves the issue.

@djones6
Copy link
Contributor Author

djones6 commented Jul 19, 2019

I don't understand how changing the server side affects how the client side is behaving, given that both server implementations are producing the same response. I wonder if it's somehow related to timing?

@pushkarnk
Copy link

Yes, can't rule out a timing issue.

@pushkarnk
Copy link

Studied the two subsequent URLSession.dataTask calls for MainTests.testClientPost and MainTests.testClientPostWithIdentifier again. The requests and responses dumped by tcpdump make things more confusing:

So, in short, there are two POST requests.

  • The first request has body {"name":"John Doe","date":540899256,"id":5}.
  • The second request has body {"name":"Jonathan Swift","date":540899256,"id":5}

This is the tcpdump output in the failing case is:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes

Header of the first request arrives from remote port 39558

06:35:38.527113 IP localhost.39558 > localhost.http-alt: Flags [P.], seq 2204924601:2204924874, ack 526032785, win 342, options [nop,nop,TS val 1482498739 ecr 1482498739], length 273: HTTP: POST /users HTTP/1.1
	0x0000:  4500 0145 c05c 4000 4006 7b54 7f00 0001  E..E.\@.@.{T....
	0x0010:  7f00 0001 9a86 1f90 836c 7ab9 1f5a 9f91  .........lz..Z..
	0x0020:  8018 0156 ff39 0000 0101 080a 585d 22b3  ...V.9......X]".
	0x0030:  585d 22b3 504f 5354 202f 7573 6572 7320  X]".POST./users.
	0x0040:  4854 5450 2f31 2e31 0d0a 486f 7374 3a20  HTTP/1.1..Host:.
	0x0050:  6c6f 6361 6c68 6f73 743a 3830 3830 0d0a  localhost:8080..
	0x0060:  4163 6365 7074 2d45 6e63 6f64 696e 673a  Accept-Encoding:
	0x0070:  2064 6566 6c61 7465 2c20 677a 6970 0d0a  .deflate,.gzip..
	0x0080:  436f 6e74 656e 742d 5479 7065 3a20 6170  Content-Type:.ap
	0x0090:  706c 6963 6174 696f 6e2f 6a73 6f6e 0d0a  plication/json..
	0x00a0:  4163 6365 7074 3a20 6170 706c 6963 6174  Accept:.applicat
	0x00b0:  696f 6e2f 6a73 6f6e 0d0a 436f 6e6e 6563  ion/json..Connec
	0x00c0:  7469 6f6e 3a20 6b65 6570 2d61 6c69 7665  tion:.keep-alive
	0x00d0:  0d0a 5573 6572 2d41 6765 6e74 3a20 4b69  ..User-Agent:.Ki
	0x00e0:  7475 7261 4b69 7450 6163 6b61 6765 5465  turaKitPackageTe
	0x00f0:  7374 732e 7863 7465 7374 2028 756e 6b6e  sts.xctest.(unkn
	0x0100:  6f77 6e20 7665 7273 696f 6e29 2063 7572  own.version).cur
	0x0110:  6c2f 372e 3538 2e30 0d0a 4163 6365 7074  l/7.58.0..Accept
	0x0120:  2d4c 616e 6775 6167 653a 2065 6e0d 0a43  -Language:.en..C
	0x0130:  6f6e 7465 6e74 2d4c 656e 6774 683a 2034  ontent-Length:.4
	0x0140:  330d 0a0d 0a                             3....

Body of the first request arrives from remote port 39558

06:35:38.527162 IP localhost.39558 > localhost.http-alt: Flags [P.], seq 273:316, ack 1, win 342, options [nop,nop,TS val 1482498739 ecr 1482498739], length 43: HTTP
	0x0000:  4500 005f c05d 4000 4006 7c39 7f00 0001  E.._.]@.@.|9....
	0x0010:  7f00 0001 9a86 1f90 836c 7bca 1f5a 9f91  .........l{..Z..
	0x0020:  8018 0156 fe53 0000 0101 080a 585d 22b3  ...V.S......X]".
	0x0030:  585d 22b3 7b22 6461 7465 223a 3534 3038  X]".{"date":5408
	0x0040:  3939 3235 362c 2269 6422 3a35 2c22 6e61  99256,"id":5,"na
	0x0050:  6d65 223a 224a 6f68 6e20 446f 6522 7d    me":"John.Doe"}

Response for the first request is sent to remote port 39558

06:35:38.528090 IP localhost.http-alt > localhost.39558: Flags [P.], seq 1:205, ack 316, win 350, options [nop,nop,TS val 1482498740 ecr 1482498739], length 204: HTTP: HTTP/1.1 201 Created
	0x0000:  4500 0100 2575 4000 4006 1681 7f00 0001  E...%u@.@.......
	0x0010:  7f00 0001 1f90 9a86 1f5a 9f91 836c 7bf5  .........Z...l{.
	0x0020:  8018 015e fef4 0000 0101 080a 585d 22b4  ...^........X]".
	0x0030:  585d 22b3 4854 5450 2f31 2e31 2032 3031  X]".HTTP/1.1.201
	0x0040:  2043 7265 6174 6564 0d0a 4461 7465 3a20  .Created..Date:.
	0x0050:  4672 692c 2031 3220 4a75 6c20 3230 3139  Fri,.12.Jul.2019
	0x0060:  2030 353a 3335 3a33 3820 474d 540d 0a43  .05:35:38.GMT..C
	0x0070:  6f6e 7465 6e74 2d54 7970 653a 2061 7070  ontent-Type:.app
	0x0080:  6c69 6361 7469 6f6e 2f6a 736f 6e0d 0a43  lication/json..C
	0x0090:  6f6e 7465 6e74 2d4c 656e 6774 683a 2034  ontent-Length:.4
	0x00a0:  330d 0a43 6f6e 6e65 6374 696f 6e3a 204b  3..Connection:.K
	0x00b0:  6565 702d 416c 6976 650d 0a4b 6565 702d  eep-Alive..Keep-
	0x00c0:  416c 6976 653a 2074 696d 656f 7574 3d36  Alive:.timeout=6
	0x00d0:  300d 0a0d 0a7b 2269 6422 3a35 2c22 6461  0....{"id":5,"da
	0x00e0:  7465 223a 3534 3038 3939 3235 362c 226e  te":540899256,"n
	0x00f0:  616d 6522 3a22 4a6f 686e 2044 6f65 227d  ame":"John.Doe"}

Header for the second request arrives on the same port 39558

06:35:38.534869 IP localhost.39558 > localhost.http-alt: Flags [P.], seq 316:591, ack 205, win 350, options [nop,nop,TS val 1482498747 ecr 1482498740], length 275: HTTP: POST /usersid HTTP/1.1
	0x0000:  4500 0147 c05f 4000 4006 7b4f 7f00 0001  E..G._@.@.{O....
	0x0010:  7f00 0001 9a86 1f90 836c 7bf5 1f5a a05d  .........l{..Z.]
	0x0020:  8018 015e ff3b 0000 0101 080a 585d 22bb  ...^.;......X]".
	0x0030:  585d 22b4 504f 5354 202f 7573 6572 7369  X]".POST./usersi
	0x0040:  6420 4854 5450 2f31 2e31 0d0a 486f 7374  d.HTTP/1.1..Host
	0x0050:  3a20 6c6f 6361 6c68 6f73 743a 3830 3830  :.localhost:8080
	0x0060:  0d0a 4163 6365 7074 2d45 6e63 6f64 696e  ..Accept-Encodin
	0x0070:  673a 2064 6566 6c61 7465 2c20 677a 6970  g:.deflate,.gzip
	0x0080:  0d0a 436f 6e74 656e 742d 5479 7065 3a20  ..Content-Type:.
	0x0090:  6170 706c 6963 6174 696f 6e2f 6a73 6f6e  application/json
	0x00a0:  0d0a 4163 6365 7074 3a20 6170 706c 6963  ..Accept:.applic
	0x00b0:  6174 696f 6e2f 6a73 6f6e 0d0a 436f 6e6e  ation/json..Conn
	0x00c0:  6563 7469 6f6e 3a20 6b65 6570 2d61 6c69  ection:.keep-ali
	0x00d0:  7665 0d0a 5573 6572 2d41 6765 6e74 3a20  ve..User-Agent:.
	0x00e0:  4b69 7475 7261 4b69 7450 6163 6b61 6765  KituraKitPackage
	0x00f0:  5465 7374 732e 7863 7465 7374 2028 756e  Tests.xctest.(un
	0x0100:  6b6e 6f77 6e20 7665 7273 696f 6e29 2063  known.version).c
	0x0110:  7572 6c2f 372e 3538 2e30 0d0a 4163 6365  url/7.58.0..Acce
	0x0120:  7074 2d4c 616e 6775 6167 653a 2065 6e0d  pt-Language:.en.
	0x0130:  0a43 6f6e 7465 6e74 2d4c 656e 6774 683a  .Content-Length:
	0x0140:  2034 390d 0a0d 0a                        .49....

Body of the second request arrives from port 39558

06:35:38.534913 IP localhost.39558 > localhost.http-alt: Flags [P.], seq 591:640, ack 205, win 350, options [nop,nop,TS val 1482498747 ecr 1482498740], length 49: HTTP
	0x0000:  4500 0065 c060 4000 4006 7c30 7f00 0001  E..e.`@.@.|0....
	0x0010:  7f00 0001 9a86 1f90 836c 7d08 1f5a a05d  .........l}..Z.]
	0x0020:  8018 015e fe59 0000 0101 080a 585d 22bb  ...^.Y......X]".
	0x0030:  585d 22b4 7b22 6461 7465 223a 3534 3038  X]".{"date":5408
	0x0040:  3939 3235 362c 2269 6422 3a35 2c22 6e61  99256,"id":5,"na
	0x0050:  6d65 223a 224a 6f6e 6174 6861 6e20 5377  me":"Jonathan.Sw
	0x0060:  6966 7422 7d                             ift"}

A duplicate body pertaining to the second request arrives from port 39558

06:35:38.550290 IP localhost.39558 > localhost.http-alt: Flags [P.], seq 591:640, ack 205, win 350, options [nop,nop,TS val 1482498762 ecr 1482498740], length 49: HTTP
	0x0000:  4500 0065 c061 4000 4006 7c2f 7f00 0001  E..e.a@.@.|/....
	0x0010:  7f00 0001 9a86 1f90 836c 7d08 1f5a a05d  .........l}..Z.]
	0x0020:  8018 015e fe59 0000 0101 080a 585d 22ca  ...^.Y......X]".
	0x0030:  585d 22b4 7b22 6461 7465 223a 3534 3038  X]".{"date":5408
	0x0040:  3939 3235 362c 2269 6422 3a35 2c22 6e61  99256,"id":5,"na
	0x0050:  6d65 223a 224a 6f6e 6174 6861 6e20 5377  me":"Jonathan.Sw
	0x0060:  6966 7422 7d                             ift"}

Header of the second request arrives from a new remote port 39562

06:35:38.573892 IP localhost.39562 > localhost.http-alt: Flags [P.], seq 2927441858:2927442133, ack 3923650617, win 342, options [nop,nop,TS val 1482498786 ecr 1482498786], length 275: HTTP: POST /usersid HTTP/1.1
	0x0000:  4500 0147 8d2e 4000 4006 ae80 7f00 0001  E..G..@.@.......
	0x0010:  7f00 0001 9a8a 1f90 ae7d 37c2 e9de 2839  .........}7...(9
	0x0020:  8018 0156 ff3b 0000 0101 080a 585d 22e2  ...V.;......X]".
	0x0030:  585d 22e2 504f 5354 202f 7573 6572 7369  X]".POST./usersi
	0x0040:  6420 4854 5450 2f31 2e31 0d0a 486f 7374  d.HTTP/1.1..Host
	0x0050:  3a20 6c6f 6361 6c68 6f73 743a 3830 3830  :.localhost:8080
	0x0060:  0d0a 4163 6365 7074 2d45 6e63 6f64 696e  ..Accept-Encodin
	0x0070:  673a 2064 6566 6c61 7465 2c20 677a 6970  g:.deflate,.gzip
	0x0080:  0d0a 436f 6e74 656e 742d 5479 7065 3a20  ..Content-Type:.
	0x0090:  6170 706c 6963 6174 696f 6e2f 6a73 6f6e  application/json
	0x00a0:  0d0a 4163 6365 7074 3a20 6170 706c 6963  ..Accept:.applic
	0x00b0:  6174 696f 6e2f 6a73 6f6e 0d0a 436f 6e6e  ation/json..Conn
	0x00c0:  6563 7469 6f6e 3a20 6b65 6570 2d61 6c69  ection:.keep-ali
	0x00d0:  7665 0d0a 5573 6572 2d41 6765 6e74 3a20  ve..User-Agent:.
	0x00e0:  4b69 7475 7261 4b69 7450 6163 6b61 6765  KituraKitPackage
	0x00f0:  5465 7374 732e 7863 7465 7374 2028 756e  Tests.xctest.(un
	0x0100:  6b6e 6f77 6e20 7665 7273 696f 6e29 2063  known.version).c
	0x0110:  7572 6c2f 372e 3538 2e30 0d0a 4163 6365  url/7.58.0..Acce
	0x0120:  7074 2d4c 616e 6775 6167 653a 2065 6e0d  pt-Language:.en.
	0x0130:  0a43 6f6e 7465 6e74 2d4c 656e 6774 683a  .Content-Length:
	0x0140:  2034 390d 0a0d 0a                        .49....

No body for the second request arrives from the new port 39562

Oh, but the tcpdump output is the same as pasted in this comment. The HTTP request pertaining to the second message first comes on the first connection (where it is ignored) and a partial request arrives on the second connection. This is a puzzle.

@pushkarnk
Copy link

I tried to track new HTTP connections using the tcpdump command:
tcpdump -i lo "tcp[tcpflags] & (tcp-syn) != 0"
Kitura-net

05:25:23.209797 IP localhost.51178 > localhost.http-alt: Flags [S], seq 2153091303, win 43690, options [mss 65495,sackOK,TS val 17254833 ecr 0,nop,wscale 7], length 0
05:25:23.209847 IP localhost.http-alt > localhost.51178: Flags [S.], seq 251452148, ack 2153091304, win 43690, options [mss 65495,sackOK,TS val 17254833 ecr 17254833,nop,wscale 7], length 0
Finished first test
05:25:23.243223 IP localhost.51180 > localhost.http-alt: Flags [S], seq 3293555298, win 43690, options [mss 65495,sackOK,TS val 17254836 ecr 0,nop,wscale 7], length 0
05:25:23.243260 IP localhost.http-alt > localhost.51180: Flags [S.], seq 1966048554, ack 3293555299, win 43690, options [mss 65495,sackOK,TS val 17254836 ecr 17254836,nop,wscale 7], length 0

Clearly, the two tests make two connections, which is expected.

Kitura-NIO

05:32:28.425719 IP localhost.56438 > localhost.http-alt: Flags [S], seq 3787294488, win 43690, options [mss 65495,sackOK,TS val 17297402 ecr 0,nop,wscale 7], length 0
05:32:28.425754 IP localhost.http-alt > localhost.56438: Flags [S.], seq 1684457496, ack 3787294489, win 43690, options [mss 65495,sackOK,TS val 17297402 ecr 17297402,nop,wscale 7], length 0
Finished first test
Finished second test

Kitura-NIO makes only one connection. This is wrong. When we have one server per test, there's no question of running the two tests on a single connection.

I then tried to track the FIN and RST frames to track connection closures using :
tcpdump -i lo "tcp port 8080 and (tcp[tcpflags] & (tcp-fin|tcp-rst)) != 0"

Then, tracking both new connections and connection closures, the activity on Kitura-net looks like this:

05:28:20.964559 IP localhost.52884 > localhost.http-alt: Flags [S], seq 1311510042, win 43690, options [mss 65495,sackOK,TS val 17272629 ecr 0,nop,wscale 7], length 0
05:28:20.964614 IP localhost.http-alt > localhost.52884: Flags [S.], seq 4132951617, ack 1311510043, win 43690, options [mss 65495,sackOK,TS val 17272629 ecr 17272629,nop,wscale 7], length 0
Finished first test
05:28:20.975631 IP localhost.http-alt > localhost.52884: Flags [R.], seq 4132951755, ack 1311510477, win 359, options [nop,nop,TS val 17272630 ecr 17272630], length 0
05:28:21.002507 IP localhost.52888 > localhost.http-alt: Flags [S], seq 588256837, win 43690, options [mss 65495,sackOK,TS val 17272633 ecr 0,nop,wscale 7], length 0
05:28:21.002550 IP localhost.http-alt > localhost.52888: Flags [S.], seq 3943166294, ack 588256838, win 43690, options [mss 65495,sackOK,TS val 17272633 ecr 17272633,nop,wscale 7], length 0

(notice the RST frame)
On the other hand, the activity on Kitura-NIO looks like this:

05:36:06.681791 IP localhost.60212 > localhost.http-alt: Flags [S], seq 3969418106, win 43690, options [mss 65495,sackOK,TS val 17319251 ecr 0,nop,wscale 7], length 0
05:36:06.681902 IP localhost.http-alt > localhost.60212: Flags [S.], seq 3552994157, ack 3969418107, win 43690, options [mss 65495,sackOK,TS val 17319251 ecr 17319251,nop,wscale 7], length 0
Finished first test
Finished second test
05:36:06.746010 IP localhost.http-alt > localhost.60212: Flags [F.], seq 3552994436, ack 3969418541, win 359, options [nop,nop,TS val 17319258 ecr 17319256], length 0
05:36:06.747444 IP localhost.60212 > localhost.http-alt: Flags [F.], seq 1, ack 1, win 359, options [nop,nop,TS val 17319258 ecr 17319256], length 0

Clearly, we aren't closing active connections in Kitura-NIO on Kitura.stop(). We just close the listening socket.

@pushkarnk
Copy link

I created Kitura/Kitura-NIO#216 to track the Kitura-NIO issue.

@pushkarnk
Copy link

I put a hack in Kitura-NIO's HTTPServer.stop() to close the one active connection and tcpdump logs this:

05:52:04.948371 IP localhost.40902 > localhost.http-alt: Flags [S], seq 1565419037, win 43690, options [mss 65495,sackOK,TS val 17415186 ecr 0,nop,wscale 7], length 0
05:52:04.948416 IP localhost.http-alt > localhost.40902: Flags [S.], seq 3234454903, ack 1565419038, win 43690, options [mss 65495,sackOK,TS val 17415186 ecr 17415186,nop,wscale 7], length 0
05:52:04.959630 IP localhost.http-alt > localhost.40902: Flags [F.], seq 3234455043, ack 1565419255, win 350, options [nop,nop,TS val 17415187 ecr 17415186], length 0
Finished first test
05:52:04.963260 IP localhost.40902 > localhost.http-alt: Flags [F.], seq 1, ack 1, win 350, options [nop,nop,TS val 17415187 ecr 17415187], length 0
05:52:04.963587 IP localhost.40904 > localhost.http-alt: Flags [S], seq 575297704, win 43690, options [mss 65495,sackOK,TS val 17415187 ecr 0,nop,wscale 7], length 0
05:52:04.963633 IP localhost.http-alt > localhost.40904: Flags [S.], seq 612301858, ack 575297705, win 43690, options [mss 65495,sackOK,TS val 17415187 ecr 17415187,nop,wscale 7], length 0
05:52:04.968882 IP localhost.http-alt > localhost.40904: Flags [F.], seq 612301998, ack 575297922, win 350, options [nop,nop,TS val 17415188 ecr 17415187], length 0
Finished second test
05:52:04.981074 IP localhost.40904 > localhost.http-alt: Flags [F.], seq 1, ack 1, win 350, options [nop,nop,TS val 17415189 ecr 17415188], length 0

Still, I wasn't able to reproduce the problem with Kitura-NIO.

@pushkarnk
Copy link

pushkarnk commented Jul 22, 2019

Assuming that Kitura-NIO closes active connections on HTTPServer.stop() this is the timeline of connection open and close:

test1 -> server1 SYN
server1 -> test1 SYN
connections opened for test1
server1 -> test1 FIN
test1 -> server1 FIN
connection closed
test2 -> server2 SYN
server2 -> test2 SYN
connections opened for test2
server2 -> test2 FIN
test2 -> server2 FIN
connection closed

For Kitura-net there's a passing and a failing case:
Passing case

test1 -> server1 SYN
server1 -> test1 SYN
connections opened for test1
server1 -> test1 RST
connection closed
test2 -> server2 SYN
server2 -> test2 SYN
connections opened for test2
server2 -> test2 FIN
test2 -> server2 FIN
connection closed

Failing case

test1 -> server1 SYN
server1 -> test1 SYN
connections opened for test1
server1 -> test1 RST
connection closed
test2 -> server2 SYN
server2 -> test2 SYN
connections opened for test2

@pushkarnk
Copy link

Why a Kitura-net based server sent an RST and why Kitura-NIO sent a FIN is something we have to comprehend to move the investigation further. It must be noted that Kitura-net uses libcurl for the TCP, Kitura-NIO doesn't.

An RST packet is typically sent in the middle of a data transfer to close the connection. This indicates that data transfer related to the second test had already begun on the first connection and it was terminated midway by the RST message.

Only using URLSession.shared results an RST packet. With an independent URLSession instance, I can see a FIN packet.

I suspect this issue is due to some complex interplay of URLSession - a libcurl enabled client using the MultiHandle API and Kitura-net - a libcurl enabled server that uses epoll.

@djones6
Copy link
Contributor Author

djones6 commented Jul 23, 2019

I don't think that libcurl is involved on the Kitura-net server side. The only time that Kitura-net uses it is in ClientRequest (and that's not in play here). For server requests, we receive bytes directly via BlueSocket and parse them using CHTTPParser.

If there's a difference in behaviour using Kitura-net vs Kitura-NIO as the server side, then my suspicions would be either something timing related, or some difference in socket options with BlueSocket vs. swift-nio.

@pushkarnk
Copy link

Thanks for that input @djones6

@pushkarnk
Copy link

With this change to abruptly shutdown all active connections during an HTTPServer.stop(), I see the problem reproducing with Kitura-NIO too. So, the reason of this issue now seems to localised to libcurl.

The first test established a connection with the first server instance and completed its transfer. The server was stopped, all connections are reset. The second test comes up and libcurl's multi handle reuses the first connection for it. Midway, the server's RST packet reaches the second test client but it has already finished a part of its transfer. From there, libcurl tries to recover from here, but fails.

Perhaps, if we implemented quiescing in both Kitura-NIO and Kitura-net, the failure may go away. But even in that case, we'd be connecting to the server of test 1, for test 2 and completing the test. This may or may not be acceptable.

It seems like it is the asynchronous nature of a HTTPServer.stop() that possibly leads to some chaos the connection reuse feature of libcurl's multi interface, causing this failure.

@pushkarnk
Copy link

More about connection reuse here: https://ec.haxx.se/libcurl-connectionreuse.html

@pushkarnk
Copy link

pushkarnk commented Jul 25, 2019

I put a hack in libcurl to forbid handle reuse here by data->set.forbid_reuse=1. With this the test consistently passes. This proves the hypothesis.

@pushkarnk
Copy link

There is a related CURLOPT_FORBID_REUSE which could be configured in a shared URLSession as a temporary workaround. But it will be difficult to make a solid case for this. Also, this issue is sensitive to a time window and must be dealt by libcurl.

@pushkarnk
Copy link

I can conclude that this is an issue caused by abrupt connection closures on the server side and libcurl's multi interface(with connection reuse enabled) unable to handle these abrupt closures. The solution on the server-side is a graceful shutdown of active connections (which apparently isn't implemented in Kitura-net). The client-side solution must probably be a fix to libcurl (but proving the problem using a C test program may need more effort).

@djones6
Copy link
Contributor Author

djones6 commented Jul 31, 2019

Closing as Kitura/Kitura-net#306 is remaining work needed to resolve this

@djones6 djones6 closed this as completed Jul 31, 2019
@djones6
Copy link
Contributor Author

djones6 commented Aug 1, 2019

For reference, Kitura/KituraKit#50 (specifically Kitura/KituraKit@c1f4d1d) refactors the KituraKit tests into a separate, long-lived TestServer process, which will effectively hide this issue.

@pushkarnk it would be worth posting your recreate code to Kitura/Kitura-net#306 so that we can recreate the failure while working on the equivalent fix there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants