perf(rpc): io_uring integration & redesign #477

Open · 21 commits to be merged into main

@InKryption InKryption (Contributor) commented Jan 3, 2025

  • get receive working
  • get send working
  • error handling:
    • eliminate explicit TODO panics
    • improve first_error error handling if practical

Follow-up work (not part of this PR): #517

@InKryption InKryption self-assigned this Jan 3, 2025
@0xNineteen 0xNineteen changed the title RPC server: io_uring upgrade WIP perf(rpc): io_uring WIP Jan 3, 2025
@InKryption InKryption force-pushed the ink/rpc-server-optimize branch 6 times, most recently from 4f8ccba to 6b56293 Compare January 14, 2025 20:36
@InKryption InKryption force-pushed the ink/rpc-server-optimize branch 6 times, most recently from 9f42c0c to 22c4f6d Compare January 20, 2025 19:09
@InKryption InKryption changed the title perf(rpc): io_uring WIP perf(rpc): io_uring integration & redesign Jan 20, 2025
@InKryption InKryption force-pushed the ink/rpc-server-optimize branch from 72c0b7f to 945988d Compare January 20, 2025 19:25
@InKryption InKryption marked this pull request as ready for review January 21, 2025 10:33
@InKryption InKryption force-pushed the ink/rpc-server-optimize branch 2 times, most recently from 922a7bb to 8fdab03 Compare January 23, 2025 05:24
@0xNineteen 0xNineteen requested a review from kprotty January 23, 2025 16:38
@InKryption InKryption force-pushed the ink/rpc-server-optimize branch 7 times, most recently from 6195797 to ecfe590 Compare January 24, 2025 21:37
* Handle potential failure or cancellation of `accept_multishot` by
  re-queueing it when a CQE arrives without the `IORING_CQE_F_MORE` flag.
* Revise and simplify the queueing logic for the `accept_multishot` SQE.
* Resolve the EINTR TODO panics by returning a catch-all error value
  that marks the condition as a bad but non-critical error.
* Update the TODO about `a: ?noreturn` with `if (a) |*b|`, noting that
  it is fixed in 0.14; resolve it once we update to 0.14.
* Unify the EAGAIN panic message.
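
A minimal sketch of the re-queue decision from the first bullet, in C rather than the PR's Zig. The struct and function names (`sketch_cqe`, `accept_needs_requeue`) are hypothetical stand-ins, not the PR's code; only the flag value mirrors `IORING_CQE_F_MORE` from `<linux/io_uring.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit as IORING_CQE_F_MORE in <linux/io_uring.h>; redefined here so the
 * sketch compiles without kernel headers. */
#define SKETCH_CQE_F_MORE (1u << 1)

/* Hypothetical stand-in for struct io_uring_cqe (same three leading fields). */
struct sketch_cqe {
    uint64_t user_data;
    int32_t res;
    uint32_t flags;
};

/* A multishot accept keeps producing CQEs only while the kernel sets the MORE
 * flag. Once a CQE arrives without it (on success or on failure), the request
 * has ended and has to be queued again. */
static bool accept_needs_requeue(const struct sketch_cqe *cqe) {
    assert(cqe != NULL);
    return (cqe->flags & SKETCH_CQE_F_MORE) == 0;
}
```

The check is independent of `res`: a failed accept can still carry the flag, in which case the multishot request remains armed.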
On macOS, with the basic WorkPool, this means we now need to manually
set the accepted socket's flags to the right values, i.e. blocking, as
opposed to the server socket's non-blocking mode.
It also means we have to handle EAGAIN a bit differently in the io_uring
backend, but that's a fine tradeoff.
Also re-organize some methods based on that change.
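
For illustration, forcing an accepted socket into blocking mode can be sketched with plain POSIX `fcntl`. This is C, not the PR's Zig, and the helper names `set_blocking` and `accept_blocking` are assumptions made for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set or clear O_NONBLOCK on fd. Returns 0 on success, -1 on failure
 * (errno is set by fcntl). */
static int set_blocking(int fd, bool blocking) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    int wanted = blocking ? (flags & ~O_NONBLOCK) : (flags | O_NONBLOCK);
    if (wanted == flags) return 0; /* already in the requested mode */
    return fcntl(fd, F_SETFL, wanted) == -1 ? -1 : 0;
}

/* Accept one connection and force it into blocking mode, whatever mode the
 * listener itself is in. Returns the new fd, or -1 on failure. */
static int accept_blocking(int listener_fd) {
    int conn = accept(listener_fd, NULL, NULL);
    if (conn == -1) return -1;
    if (set_blocking(conn, true) == -1) {
        close(conn);
        return -1;
    }
    /* Post-condition: the accepted socket no longer carries O_NONBLOCK. */
    assert((fcntl(conn, F_GETFL, 0) & O_NONBLOCK) == 0);
    return conn;
}
```

Setting the mode explicitly after `accept` avoids relying on whether the platform copies the listener's status flags to the new socket.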
Do not exit on *any* error that is specific to a single connection;
simply free that connection and continue to the next CQE.
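
The "free and continue" policy can be sketched as a loop over a batch of completions. Everything here (`sketch_conn`, `sketch_completion`, `process_batch`) is a hypothetical C model for illustration, not the PR's Zig implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection state; a real backend tracks far more. */
struct sketch_conn {
    bool open;
};

/* Hypothetical pairing of a completion result with its connection. */
struct sketch_completion {
    int32_t res; /* negative errno on failure, as in io_uring CQEs */
    struct sketch_conn *conn;
};

/* Walk a batch of completions. A failed completion only closes its own
 * connection; the loop then moves on. Returns how many were dropped. */
static size_t process_batch(const struct sketch_completion *batch, size_t len) {
    size_t dropped = 0;
    for (size_t i = 0; i < len; i++) {
        assert(batch[i].conn != NULL);
        if (batch[i].res < 0) {
            batch[i].conn->open = false; /* stand-in for freeing the connection */
            dropped++;
            continue;
        }
        /* Success path: real code would advance the recv/send state here. */
    }
    return dropped;
}
```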

Specifically, in the case of `error.SubmissionQueueFull`, instead of
failing immediately, we first flush the submission queue and then try
to submit again; if that fails a second time despite the flush, we
panic, since this indicates something is *very* wrong.
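
The flush-then-retry logic above can be modelled with a toy queue. This C sketch uses a hypothetical `fake_sq` type that only counts entries; it is not the io_uring API. With liburing, the rough equivalents would be `io_uring_get_sqe` returning NULL when the queue is full and `io_uring_submit` flushing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a submission queue: entries are only counted. */
struct fake_sq {
    unsigned capacity;
    unsigned pending; /* queued but not yet handed to the kernel */
    unsigned flushed; /* total entries handed over so far */
};

/* Returns false when the queue is full (the "SubmissionQueueFull" case). */
static bool fake_sq_push(struct fake_sq *sq) {
    assert(sq->pending <= sq->capacity);
    if (sq->pending == sq->capacity) return false;
    sq->pending += 1;
    return true;
}

/* Hand all pending entries over, which frees every slot. */
static void fake_sq_flush(struct fake_sq *sq) {
    sq->flushed += sq->pending;
    sq->pending = 0;
}

/* Try to queue; on a full queue, flush once and retry. A second failure
 * cannot happen with a sane queue, so it is treated as fatal. */
static void push_with_retry(struct fake_sq *sq) {
    if (fake_sq_push(sq)) return;
    fake_sq_flush(sq);
    if (!fake_sq_push(sq)) {
        fprintf(stderr, "submission queue still full after flush\n");
        abort();
    }
}
```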

This also eliminates the `pending_cqes_buf`, since there is no
situation in which `consumeOurCqe` returns an error and we resume
work afterwards: either we process all the received CQEs, or we hard
exit. This was already essentially the case before; now it's more
obvious.

For the main submit, we now wait for at least 1 connection, but we
also add a timeout SQE so the wait terminates if we receive neither a
connection nor the completion of another task within 1 second; this
alleviates the busy loop that was running before.
Also slightly refactor error sets.
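
To illustrate the idea of a bounded blocking wait replacing a busy loop, here is a C sketch that uses `poll(2)` as a stand-in for the io_uring wait plus timeout SQE. It is a different mechanism with the same shape: sleep until there is work or the timeout elapses. `serve_once` is a hypothetical name.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* One iteration of a serve loop: sleep for up to timeout_ms, then consume one
 * byte if the fd became readable. Returns 1 if a byte was handled, 0 on
 * timeout, -1 on error or end of stream. */
static int serve_once(int fd, int timeout_ms) {
    /* A negative timeout would make poll wait forever, defeating the bound. */
    assert(timeout_ms >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0) return -1;
    if (n == 0) return 0; /* timed out: the caller can check its exit flag */
    char byte;
    ssize_t got = read(fd, &byte, 1);
    return got == 1 ? 1 : -1;
}
```

A caller would invoke this in a loop with a 1000 ms timeout, so an idle server wakes about once per second instead of spinning.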
Instead of checking whether we need to set a flag to re-queue the
multishot accept, we now just pass in the server context on init and
queue it there, which makes sense now that the context and work pool
are separate.
@InKryption InKryption force-pushed the ink/rpc-server-optimize branch from 1f6c7bc to 2ee76dc Compare February 4, 2025 17:45
@InKryption InKryption force-pushed the ink/rpc-server-optimize branch from 2ee76dc to 85136f9 Compare February 4, 2025 17:57
src/rpc/server/lib.zig (outdated, resolved)
src/rpc/server/connection.zig (outdated, resolved)
src/rpc/server/connection.zig (outdated, resolved)
src/rpc/server/linux_io_uring.zig (outdated, resolved)
@InKryption InKryption force-pushed the ink/rpc-server-optimize branch from 706f136 to eb95eb0 Compare February 5, 2025 22:02
@InKryption InKryption force-pushed the ink/rpc-server-optimize branch from eb95eb0 to 1e042dd Compare February 5, 2025 22:06
0xNineteen previously approved these changes Feb 6, 2025

@0xNineteen 0xNineteen (Contributor) left a comment

nice :shipit:

* Move more specific functions into the only files where they're used.
* Move the `serve*` functions out of `Context`, making them free
  functions which just accept the context and work pool.
* Remove `acceptAndServeConnection`; it was originally required to
  structure the unit test nicely and used to be more integrated, but
  it no longer makes sense as a concept.
* Inline `handleRequest` into the basic backend.
* Move the `acceptHandled` function into the basic backend, make it
  guarantee the specified `sync` behavior, and inline `have_accept4`.
* Appropriately re-export the relevant parts of the server API.
* Add top-level doc comments.
Projects
Status: 👀 In review
5 participants