POC: Windows registered IO #918


Commits on Nov 14, 2020

  1. POC: Windows registered IO

    This change demonstrates how quinn could make use of Windows
    registered IO (RIO), or other kinds of completion-based IO.
    
    Upfront note: The change is incomplete, buggy, and will leak all memory.
    Don't even think about using it as is. This is just a quick hack to get
    some ideas about integration and about achievable performance.
    
    Integrating with registered IO requires the following changes to get
    things working:
    1. The endpoint is now running on its own dedicated IO thread instead of
      running on the shared tokio runtime. This allows it to use any platform
      specific IO primitives it needs. In this case we are using RIO
      and a custom eventloop which waits until new IO is possible using
      a Windows ManualResetEvent. Waiting via IOCP, or letting the thread
      busy-spin, is also possible.
      The actual loop, which is implemented in `EndpointDriver::run`, is not
      that different from the existing `EndpointDriver::poll` method.
    2. Reading and writing data with RIO is submission+completion oriented,
      and requires buffers to be registered with kernel space for the complete
      lifetime of the socket. In order to accommodate those requirements,
      a buffer pool is allocated when the socket + endpoint are created, and
      reused throughout the lifetime of the endpoint. The endpoint will make
      sure the maximum possible number of concurrent receive operations is
      scheduled. Transmit operations get scheduled whenever data to transmit
      is available and TX buffers are available.
    3. Since the rest of quinn is not aware of pinned buffers, and requires
      `Vec` to transmit outgoing buffers and `BytesMut` to decode incoming
      datagrams, all datagrams are copied once from the IO buffers to those
      higher level buffers. This could theoretically be optimized.
    4. Since the endpoint is no longer an async task, it can't receive instructions
      from connections anymore using an async channel. This adds a custom
      channel implementation for this purpose, which consists of a trivial
      synchronized queue and a wakeup of the endpoint eventloop.
    5. The endpoint can't use `tokio::spawn` anymore to spawn new connections,
      since it is not running inside a tokio context.
      Therefore a runtime handle needs to be explicitly propagated.
    6. The socket needs to be created with the `WSA_FLAG_REGISTERED_IO` flag. Therefore
      UDP sockets created via `std::net::UdpSocket` unfortunately can't be trivially
      forwarded. It is debatable whether this means the quinn library
      should be responsible for creating all sockets, or whether it should still
      accept external sockets but explicitly require that those have been configured
      with all the necessary flags.
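
The channel described in point 4 can be sketched roughly as follows. This is
not the code from this change: `EndpointEvent`, `Sender`, `Receiver`, and
`run_endpoint_demo` are hypothetical names, and a `std::sync::Condvar` stands
in for the wakeup of the endpoint eventloop so that the sketch runs on any
platform.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical instruction sent from a connection to the endpoint thread.
enum EndpointEvent {
    Transmit(Vec<u8>),
    Shutdown,
}

/// State shared between connection tasks and the endpoint IO thread.
struct Shared {
    queue: Mutex<VecDeque<EndpointEvent>>,
    // Assumption: a Condvar replaces the OS-level wakeup used by the eventloop.
    wakeup: Condvar,
}

#[derive(Clone)]
struct Sender(Arc<Shared>);

struct Receiver(Arc<Shared>);

fn channel() -> (Sender, Receiver) {
    let shared = Arc::new(Shared {
        queue: Mutex::new(VecDeque::new()),
        wakeup: Condvar::new(),
    });
    (Sender(shared.clone()), Receiver(shared))
}

impl Sender {
    /// Enqueue an event and wake the endpoint loop. Callable from any thread
    /// or async task, since it only holds the lock for the push.
    fn send(&self, event: EndpointEvent) {
        self.0.queue.lock().unwrap().push_back(event);
        self.0.wakeup.notify_one();
    }
}

impl Receiver {
    /// Block until at least one event is queued, then take all queued events.
    fn wait_and_drain(&self) -> Vec<EndpointEvent> {
        let mut queue = self.0.queue.lock().unwrap();
        while queue.is_empty() {
            queue = self.0.wakeup.wait(queue).unwrap();
        }
        queue.drain(..).collect()
    }
}

/// Toy endpoint loop on a dedicated thread. Returns the number of bytes it
/// was asked to transmit before it saw `Shutdown`.
fn run_endpoint_demo() -> usize {
    let (tx, rx) = channel();
    let endpoint = thread::spawn(move || {
        let mut bytes = 0;
        loop {
            for event in rx.wait_and_drain() {
                match event {
                    EndpointEvent::Transmit(data) => bytes += data.len(),
                    EndpointEvent::Shutdown => return bytes,
                }
            }
        }
    });
    tx.send(EndpointEvent::Transmit(vec![0; 1200]));
    tx.send(EndpointEvent::Transmit(vec![0; 300]));
    tx.send(EndpointEvent::Shutdown);
    endpoint.join().unwrap()
}
```

Calling `run_endpoint_demo()` returns 1500. In a real eventloop the wait would
be combined with waiting for IO completions, so one wakeup primitive would
have to cover both sources.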
    
    Most of the points outlined here would also be required to support io_uring
    or AF_XDP with buffers pre-registered with the kernel, or just
    `sendmsg`/`sendmmsg` using `MSG_ZEROCOPY`, which has similar
    requirements.
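
The buffer handling shared by all of these approaches (points 2 and 3) can be
sketched as a fixed pool that is allocated once and indexed by slot. This is
only an illustration under stated assumptions: `BufferPool` and `pool_demo`
are hypothetical names, no kernel registration takes place, and a plain write
into the slice stands in for a completed receive.

```rust
/// Fixed pool of equally sized IO buffers, allocated once and then reused.
struct BufferPool {
    // One allocation that never moves. With RIO or io_uring this is the
    // region that would be registered with the kernel (not done here).
    storage: Box<[u8]>,
    buf_size: usize,
    // Slots that are not part of an in-flight operation.
    free: Vec<usize>,
}

impl BufferPool {
    fn new(count: usize, buf_size: usize) -> Self {
        BufferPool {
            storage: vec![0u8; count * buf_size].into_boxed_slice(),
            buf_size,
            free: (0..count).rev().collect(),
        }
    }

    /// Reserve a slot for a new operation. `None` means every buffer is in flight.
    fn acquire(&mut self) -> Option<usize> {
        self.free.pop()
    }

    /// Return a slot to the pool once its operation has completed.
    fn release(&mut self, index: usize) {
        debug_assert!(!self.free.contains(&index));
        self.free.push(index);
    }

    /// Mutable view of one slot, e.g. to fill a datagram before transmitting.
    fn buffer_mut(&mut self, index: usize) -> &mut [u8] {
        let start = index * self.buf_size;
        &mut self.storage[start..start + self.buf_size]
    }

    /// Copy a received datagram out of the pooled buffer into an owned `Vec`.
    /// This is the single copy mentioned in point 3.
    fn copy_out(&self, index: usize, len: usize) -> Vec<u8> {
        assert!(len <= self.buf_size);
        let start = index * self.buf_size;
        self.storage[start..start + len].to_vec()
    }
}

/// Walk through one receive: reserve, "receive", copy out, release.
/// Returns the copied datagram and the number of free slots afterwards.
fn pool_demo() -> (Vec<u8>, usize) {
    let mut pool = BufferPool::new(2, 1500);
    let a = pool.acquire().expect("pool starts with two free slots");
    let _b = pool.acquire().expect("pool starts with two free slots");
    // Both slots are in flight, so no further operation can be scheduled.
    assert!(pool.acquire().is_none());
    // Assumption: this write stands in for the kernel completing a receive.
    pool.buffer_mut(a)[..5].copy_from_slice(b"hello");
    let datagram = pool.copy_out(a, 5);
    pool.release(a);
    (datagram, pool.free.len())
}
```

Handing out slot indices instead of references keeps the pool free of borrows
while operations are in flight, which is what a submission+completion API
needs.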
    
    Performance with this approach varies. The benchmarks indicate a
    throughput somewhere between 180 MB/s and 330 MB/s. Once a benchmark
    run has started, it consistently reports either the low or the high value.
    Some comments in the msquic repository indicate that this might be
    due to receive side scaling (RSS). Maybe something can be improved here by making sure
    the endpoint IO thread runs on the ideal core.
    
    A follow-up POC which could be built, but isn't part of this demo,
    is to also move the `Connection` handling onto the new dedicated
    IO thread.
    Matthias247 committed Nov 14, 2020
    4b622fb