Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slic pong frame #3273

Closed
bernardnormier opened this issue May 27, 2023 · 5 comments · Fixed by icerpc/icerpc-slice#3
Closed

Slic pong frame #3273

bernardnormier opened this issue May 27, 2023 · 5 comments · Fixed by icerpc/icerpc-slice#3
Assignees
Labels
bug Something isn't working slic Slic Transport
Milestone

Comments

@bernardnormier
Copy link
Member

Slic sends a pong frame to acknowledge the receipt of a ping frame.

There is no equivalent frame in QUIC: QUIC has ping frames but no pong frame. See https://www.rfc-editor.org/rfc/rfc9000.html#frame-ping.

Then, with Slic, only the client-side of a connection sends ping frames:

_enableIdleTimeoutAndKeepAlive(idleTimeout, IsServer ? null : KeepAlive);

and in turns, only the server-side sends pong frame.

Note: It's not clear if that's an implementation detail of this Slic implementation or a rule of the Slic protocol.

Now, we receive Ping frames here:

There are several issues with this code.

  1. We don't appear to check anywhere this Ping frame is expected. Is it ok for a client-connection to receive a Ping frame?
  2. Presumably, the sender of a Ping frame expects to get its own Pong frame, but we actually don't send a Pong frame for each Ping frame; if a Pong frame is already being sent, we don't send anything back.
  3. Even if the Slic protocol says "you can't send a Ping frame until after you've received a Pong frame for the previous Ping frame you sent", there is a race condition in this code. Say a client sends Ping, gets Pong then sends the next Ping very quickly. It's possible the pongTask for the first Ping to not be completed when the second Ping is processed.
  4. I am pretty sure this ReadFrameAsync method executes in the critical "read loop" where we need to be really careful about not doing too much work.
    Here, we lock a mutex and call SendPongFrameAsync:
    https://github.com/icerpc/icerpc-csharp/blob/9e443a67bef5e32804023b403881646b7b0ba125/src/IceRpc/Transports/Slic/Internal/SlicConnection.cs#LL1189C20-L1189C38

There is no Task.Yield so the sending could actually occur synchronously. We should not do that in the "read loop" task.
It would make sense to use a Task.Run to schedule the sending of the Pong frame vs sending it from the "read loop" task.

Then, since Ping/Pong is Slic-specific (doesn't come from QUIC), why not use the simpler Ping & no-Pong approach we use for ice? Is the goal to use Ping/Pong someday for measure latency?

@bernardnormier bernardnormier added the bug Something isn't working label May 27, 2023
@bernardnormier bernardnormier added this to the 0.1 milestone May 27, 2023
@bentoi
Copy link
Contributor

bentoi commented May 29, 2023

Note: It's not clear if that's an implementation detail of this Slic implementation or a rule of the Slic protocol.
We don't appear to check anywhere this Ping frame is expected. Is it ok for a client-connection to receive a Ping frame?

I don't have a preference for one or the other.

  1. Presumably, the sender of a Ping frame expects to get its own Pong frame, but we actually don't send a Pong frame for each Ping frame; if a Pong frame is already being sent, we don't send anything back.

The choice I made here was to check for bi-directional activity and this doesn't necessarily imply that a pong frame should be tied to a ping frame.

  1. Even if the Slic protocol says "you can't send a Ping frame until after you've received a Pong frame for the previous Ping frame you sent",

As of today, the Slic protocol doesn't say anything 🙂.

  1. I am pretty sure this ReadFrameAsync method executes in the critical "read loop" where we need to be really careful about not doing too much work

I would open another issue for looking into Slic performance improvements.

  1. Then, since Ping/Pong is Slic-specific (doesn't come from QUIC), why not use the simpler Ping & no-Pong approach we use for ice? Is the goal to use Ping/Pong someday for measure latency?

No real reason. If I recall correctly, I choose ping/pong because HTTP/2 uses Ping/Ping+Ack. Using the ice approach is fine with me.

@bernardnormier
Copy link
Member Author

bernardnormier commented Jun 4, 2023

this doesn't necessarily imply that a pong frame should be tied to a ping frame.

And yet:

I choose ping/pong because HTTP/2 uses Ping/Ping+Ack.

And the description for FrameType::Pong is "Acknowledges the receipt of a Ping frame.".

For me, two designs make sense:

  • a pong is truly an acknowledgement for a ping, with PingAsync awaiting the corresponding Pong frame. This also requires some ID (like a varuint62 counter) in the Ping frame and the corresponding Pong (PingAck) frame,
    or
  • we just send one-way pings in both directions without any coupling (no PingAck/Pong); that's what we do for ice

Currently with Slic, SendPingAsync doesn't await anything for the peer. And Pong is definitely not an ack for a Ping frame.

@bernardnormier
Copy link
Member Author

7. I am pretty sure this ReadFrameAsync method executes in the critical "read loop" where we need to be really careful about not doing too much work

I would open another issue for looking into Slic performance improvements.

This is not directly a performance issue but a design issue. Which activities are ok in the "read loop" of Slic/IceProtocolConnection?

Ok and necessary:

  • reading from the duplex connection into a PipeWriter
  • decoding frame headers
  • blocking reads for a while on purpose to put back pressure on the caller

Not acceptable:

  • calling "synchronously" application code (since it can be a CPU intensive call that takes a while to complete or yield)

It's not clear to me where sending the Pong frame should be.
It should indeed complete quickly (not CPU intensive), but it seems gratuitous to execute this code in the read loop.

See:

while (!cancellationToken.IsCancellationRequested)

private Task ReadFrameAsync(FrameType type, int size, ulong? streamId, CancellationToken cancellationToken)

@bernardnormier bernardnormier added the slic Slic Transport label Jun 5, 2023
@bentoi bentoi self-assigned this Jun 6, 2023
@bentoi
Copy link
Contributor

bentoi commented Jun 7, 2023

a pong is truly an acknowledgement for a ping, with PingAsync awaiting the corresponding Pong frame

Implementing this proposal is probably best. It also allows to determine latency which can be useful for a dynamic flow control algorithm (see #3304).

This is not directly a performance issue but a design issue. Which activities are ok in the "read loop" of Slic/IceProtocolConnection?

To me, it's an implementation detail issue not a protocol design issue (which is what this issue is about). Opened #3318.

@bentoi
Copy link
Contributor

bentoi commented Jun 9, 2023

If I recall correctly, I choose ping/pong because HTTP/2 uses Ping/Ping+Ack. Using the ice approach is fine with me.

It's actually designed after the WebSocket Ping/Pong frames: https://www.rfc-editor.org/rfc/rfc6455#section-5.5.2

I suggest that we implement Slic pings along the lines of the ping frame of the HTTP/2 protocol:

  • a Ping and Pong frame with a 64 bits opaque payload
  • the receiver of the Ping frame sends back a Pong frame with the same payload

Our Slic implementation will send a new ping frame if there's no pending ping waiting for a pong frame and if writes are idle since (idleTimeout / 2). Pings are only sent by the client side of the connection.

For now, we could just use an incrementing counter for the payload or even a constant long=0 payload since we don't send a new ping frame before getting the pong frame for the previous ping.

The 64 bits payload can be used later by the Slic implementation to distinguish different uses of the Slic ping/pong mechanism.

For instance, the .NET HTTP/2 stream dynamic window size is computed from the min RTT value. This value is obtained by performing a bunch of pings on the HTTP/2 connection. It's important here to be able to figure out the pings used for idle timeout and the pings used for the RTT. The .NET HTTP/2 implementation uses negative values for RTT, positive ones for idle timeout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working slic Slic Transport
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants