Completion-based I/O #915

Open
Ralith opened this issue Nov 13, 2020 · 10 comments
Labels: enhancement (New feature or request)

Comments

@Ralith (Collaborator) commented Nov 13, 2020

Modern high-performance I/O APIs (e.g. io_uring and I/O completion ports) are completion-oriented, unlike the traditional readiness-oriented epoll paradigm. Advantages include fewer syscalls and no copying. On Windows in particular, readiness-oriented I/O is poorly supported.
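
For readers unfamiliar with the model, here is a minimal sketch of a completion-style read using the io-uring crate (roughly its README example; the crate and exact calls are an illustration here, not anything Quinn uses today). Instead of waiting for readiness and then calling read(2), we submit an operation together with a buffer and later harvest a completion entry that tells us it finished:

```rust
use std::os::unix::io::AsRawFd;
use io_uring::{opcode, types, IoUring};

fn main() -> std::io::Result<()> {
    let mut ring = IoUring::new(8)?;
    let file = std::fs::File::open("/etc/hostname")?;
    let mut buf = vec![0u8; 1024];

    // Describe the read up front; the kernel uses `buf` until the completion arrives.
    let read_e =
        opcode::Read::new(types::Fd(file.as_raw_fd()), buf.as_mut_ptr(), buf.len() as _)
            .build()
            .user_data(0x42);

    unsafe {
        ring.submission().push(&read_e).expect("submission queue is full");
    }
    ring.submit_and_wait(1)?; // one syscall: submit and wait for the completion

    let cqe = ring.completion().next().expect("completion queue is empty");
    assert_eq!(cqe.user_data(), 0x42);
    let n = cqe.result(); // bytes read, or a negative errno on failure
    println!("read {} bytes: {:?}", n, &buf[..n.max(0) as usize]);
    Ok(())
}
```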

@Matthias247 has reported that a prototype variant of Quinn modified to use Registered I/O (RIO) on Windows performs significantly (~20%?) better than our Linux backend, which uses sendmmsg and recvmmsg for efficient batching, and drastically better than the fallback backend on Windows. Due to major changes in tokio/mio's Windows support, we should re-evaluate the latter result after moving to tokio 0.3.

On Linux, io_uring is only available on recent kernels. Other platforms (e.g. macOS, BSDs) may not offer completion-based I/O at all. Retaining a readiness-oriented fallback will therefore remain necessary for some time. However, the performance benefits seem to justify making completion-based I/O the first-class target.

A complicating factor is that tokio itself is, currently, 100% readiness-oriented. On Linux, we may be able to bridge this gap gracefully due to io_uring/epoll interop, but it's unclear if something similar is possible on Windows. If not, a background thread may be necessary.
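
If that interop pans out, one possible shape for the bridge is sketched below (Linux only; the io-uring crate's register_eventfd plus a plain poll(2) stand in for epoll or tokio's AsyncFd, so treat the exact calls as assumptions): register an eventfd with the ring, let the readiness-based reactor wait on it, and drain completions when it fires.

```rust
use std::os::unix::io::RawFd;
use io_uring::{opcode, IoUring};

fn main() -> std::io::Result<()> {
    let mut ring = IoUring::new(8)?;

    // Eventfd that the kernel signals whenever a completion is posted to the ring.
    let efd: RawFd = unsafe { libc::eventfd(0, libc::EFD_CLOEXEC) };
    assert!(efd >= 0);
    ring.submitter().register_eventfd(efd)?;

    // Submit a no-op just so something completes.
    unsafe {
        ring.submission()
            .push(&opcode::Nop::new().build().user_data(1))
            .expect("submission queue is full");
    }
    ring.submit()?;

    // A readiness-based reactor (epoll, or tokio's AsyncFd around `efd`) would wait
    // here; plain poll(2) stands in for it in this sketch.
    let mut pfd = libc::pollfd { fd: efd, events: libc::POLLIN, revents: 0 };
    let _ = unsafe { libc::poll(&mut pfd, 1, -1) };

    // Once the eventfd fires, drain the completion queue without blocking.
    for cqe in ring.completion() {
        println!("completed: user_data={} result={}", cqe.user_data(), cqe.result());
    }
    Ok(())
}
```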

@Matthias247 (Contributor):

I pushed my POC for RIO and completion-based I/O to #918

This might help in getting an idea of what is necessary. I think in a model where Quinn owns all I/O and the associated thread, it's not terribly complicated.
The fact that datagrams are received and transmitted in an all-or-nothing fashion, and that we mostly have to deal with a single socket rather than lots of them, makes this easier than, e.g., implementing support for completion-oriented TCP. But obviously one still has to be a bit careful around ownership.
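
To make the ownership point concrete, here is a toy sketch of the buffer-passing convention that completion APIs tend to force: the operation takes the buffer by value and only hands it back with the result, so the buffer can't be dropped while the kernel is still using it. The names (CompletionSocket, BufResult, BlockingUdp) are illustrative only, and a blocking std socket stands in for the real asynchronous operation:

```rust
use std::net::UdpSocket;

// The result and the buffer always come back together.
type BufResult<T> = (std::io::Result<T>, Vec<u8>);

trait CompletionSocket {
    // Ownership of `buf` moves into the operation and is returned only once
    // the (conceptually pending) operation has finished with it.
    fn recv_owned(&self, buf: Vec<u8>) -> BufResult<usize>;
}

// Toy implementation backed by a blocking std socket, just to make the flow concrete.
struct BlockingUdp(UdpSocket);

impl CompletionSocket for BlockingUdp {
    fn recv_owned(&self, mut buf: Vec<u8>) -> BufResult<usize> {
        let res = self.0.recv(&mut buf);
        (res, buf) // the buffer returns to the caller whether or not the recv succeeded
    }
}

fn main() -> std::io::Result<()> {
    let a = UdpSocket::bind("127.0.0.1:0")?;
    let b = UdpSocket::bind("127.0.0.1:0")?;
    a.send_to(b"hello", b.local_addr()?)?;

    let sock = BlockingUdp(b);
    let (res, buf) = sock.recv_owned(vec![0u8; 1500]);
    let n = res?;
    println!("received {} bytes: {:?}", n, &buf[..n]);
    Ok(())
}
```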

@KirillLykov commented May 26, 2022

I wonder if there are any plans to use io_uring? Have you considered integrating DataDog's glommio crate? Nowadays another way to proceed might be AF_XDP, which might be simpler(?) in some ways.

@Ralith (Collaborator, Author) commented May 27, 2022

See also the discussion at #1319. This issue was originally opened before GSO/GRO support was implemented, and it's unclear whether io_uring will be a net benefit on Linux, though we'd be happy to mentor someone who wants to experiment.
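
(For context on why GSO changes the calculus: with UDP_SEGMENT set, one send carries many datagrams that the kernel splits, so much of the per-packet syscall cost io_uring would amortize is already gone. A minimal, Linux-only sketch using constants from the libc crate and an arbitrary segment size:)

```rust
use std::net::UdpSocket;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("127.0.0.1:0")?;

    // Ask the kernel to split each send into 1200-byte datagrams
    // (the segment size here is arbitrary, for illustration only).
    let segment_size: libc::c_int = 1200;
    let rc = unsafe {
        libc::setsockopt(
            socket.as_raw_fd(),
            libc::SOL_UDP,
            libc::UDP_SEGMENT,
            &segment_size as *const _ as *const libc::c_void,
            std::mem::size_of_val(&segment_size) as libc::socklen_t,
        )
    };
    if rc != 0 {
        return Err(std::io::Error::last_os_error());
    }

    // A single send of, say, 12_000 bytes now becomes ten datagrams on the wire,
    // so the per-packet syscall count is already largely amortized away.
    Ok(())
}
```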

@KirillLykov:

> See also the discussion at #1319. This issue was originally opened before GSO/GRO support was implemented, and it's unclear whether io_uring will be a net benefit on Linux, though we'd be happy to mentor someone who wants to experiment.

It sounds super interesting to check out, though I'm not sure I'll manage it yet. I would start by creating a hack to plug in monoio and checking the performance metrics. So the first question is whether there are any existing benchmarks I can use to measure performance, and the second is where to start.

@Ralith (Collaborator, Author) commented May 27, 2022

See the bench and perf directories for some benchmarking tools. As discussed in #1319, tokio-uring might be a more appropriate place to start. You could look at @Matthias247's prototype above for inspiration, or try modifying the endpoint driver in quinn directly, or try building directly on top of quinn-proto to avoid being influenced by the existing architecture.
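
For a starting point, this is roughly what a tokio-uring UDP round trip looks like per its docs (treat the exact signatures, e.g. whether bind is async, as assumptions). Note how buffers are moved into each operation and handed back with the completion rather than borrowed across a readiness check:

```rust
use tokio_uring::net::UdpSocket;

fn main() -> std::io::Result<()> {
    tokio_uring::start(async {
        // Two sockets on fixed ports so we can address them without extra lookups.
        let a = UdpSocket::bind("127.0.0.1:7000".parse().unwrap()).await?;
        let b = UdpSocket::bind("127.0.0.1:7001".parse().unwrap()).await?;

        // send_to takes the buffer by value and hands it back with the completion.
        let (res, _buf) = a
            .send_to(b"ping".to_vec(), "127.0.0.1:7001".parse().unwrap())
            .await;
        res?;

        // recv_from likewise returns (result, buffer); no readiness polling involved.
        let (res, buf) = b.recv_from(vec![0u8; 1500]).await;
        let (n, from) = res?;
        println!("got {} bytes from {}: {:?}", n, from, &buf[..n]);
        Ok(())
    })
}
```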

@djc (Member) commented Jun 27, 2022

@sowhu do you have more context on the talk? Not sure which Stephen you're referring to.

@sowhu commented Jun 27, 2022

Sorry guys. I just realized I posted on the wrong issue. Please just disregard my previous two comments. My mistake.

@KirillLykov:

For reference, @djc, regarding io_uring libraries in Rust: if the plots published by monoio are still up to date and the benchmark scenarios are fair, it looks like monoio is the fastest choice (see their published plots).

@Ralith (Collaborator, Author) commented Jun 27, 2022

Those benchmarks don't seem to involve tokio-uring, just regular epoll-based tokio.

@Icelk commented Feb 4, 2023

> Those benchmarks don't seem to involve tokio-uring, just regular epoll-based tokio.

I did some benchmarks with tokio-uring (which is single-threaded). It scores 370k/s, while a single-threaded monoio scores 270k/s. Monoio can scale to multiple threads, but so can tokio-uring (if you just spawn multiple executors!).

TL;DR: tokio-uring seems to be the way to go.

I would really like to use QUIC with io_uring! Thanks.
