I'm trying out io_uring and am testing different ways of submitting requests. My test is a simple webserver-like application that accepts multiple sockets and then alternately reads from and writes to each socket. Everything runs on a single application thread with a single ring.
In general, using IOSQE_ASYNC does not seem to make much sense for network reads, because it often does strictly more work than the default path. On the other hand, for a single-threaded server, much CPU time is spent inside the kernel TCP stack, so IOSQE_ASYNC could help by freeing the application thread for other work while the kernel worker threads do the heavy lifting.
Looking into the performance with Linux 5.19.11, I noticed that the flamegraph shows lots of time spent allocating buffers from the provided buffers:
[flamegraph image]
Zooming in on io_read:
[flamegraph image]
This is with a maximum of 128 concurrent reads. In that scenario the number of concurrent io-wq workers gets quite high (maybe even one per request?), so if there's a mutex in the buffer-selection path, it cannot work well when many or all of the sockets are readable at the same time.
Is this contention expected and should be documented?
I would not recommend using provided buffers with IOSQE_ASYNC; as you have noticed, they need to serialize on the ring mutex. This is generally not a concern, but it certainly becomes one if you have a lot of io-wq activity due to marking the SQEs async. You'll be better off setting aside some threads in userspace, each with its own ring, and using provided buffers with those.
In general, IOSQE_ASYNC isn't very efficient and should be avoided for most use cases.