I'm guessing I'm fundamentally doing this wrong or in a poor way. Currently I'm just trying to get a simple HTTP server built on top of sockets and zig-aio. The performance tends to be quite sporadic and, overall, really poor.
On top of that, multiple threads seem to make the performance worse, or at a minimum don't really change it.
I have an 8-core machine, so I leave half of the cores for the server and half for the benchmarking tool.
I have been benchmarking this super slim server impl with wrk: wrk -t 4 -c 32 -d 10 http://127.0.0.1:8000. I also use -c 512 to do a high connection test.
All tests were run on macOS.
Overall, I'm mostly curious whether I'm using the library in a fundamentally wrong way. Do you have any general suggestions on the most performant way to do something like this? I understand that my current impl is missing tons of advanced scheduler features (like a work-stealing queue), but given the simple requests and the roughly even distribution of connections, I wouldn't expect that to be a big deal.
I get the feeling that the issues I'm hitting are fundamental even to the single-threaded case. The only time I saw really good perf numbers was when running the handlers via a coro.ThreadPool, but the good perf would fall apart when the handlers block, because the thread pool would block fully instead of yielding to the scheduler.
Thanks for your work on this library; I totally understand if something like this is out of scope currently. I'm personally hoping that I'm doing something silly in the code below and that fixing it will give a crazy perf boost.
const std = @import("std");
const aio = @import("aio");
const coro = @import("coro");
const log = std.log.scoped(.coro_aio);

pub const aio_options: aio.Options = .{
    .debug = false, // set to true to enable debug logs
};

pub const coro_options: coro.Options = .{
    .debug = false, // set to true to enable debug logs
};

pub const std_options: std.Options = .{
    .log_level = .err,
};

pub fn main() !void {
    var gpa: std.heap.GeneralPurposeAllocator(.{}) = .{};
    defer _ = gpa.deinit();

    var socket: std.posix.socket_t = undefined;
    try coro.io.single(aio.Socket{
        .domain = std.posix.AF.INET,
        .flags = std.posix.SOCK.STREAM | std.posix.SOCK.CLOEXEC,
        .protocol = std.posix.IPPROTO.TCP,
        .out_socket = &socket,
    });

    const address = std.net.Address.initIp4(.{ 0, 0, 0, 0 }, 8000);
    try std.posix.setsockopt(socket, std.posix.SOL.SOCKET, std.posix.SO.REUSEADDR, &std.mem.toBytes(@as(c_int, 1)));
    if (@hasDecl(std.posix.SO, "REUSEPORT")) {
        try std.posix.setsockopt(socket, std.posix.SOL.SOCKET, std.posix.SO.REUSEPORT, &std.mem.toBytes(@as(c_int, 1)));
    }
    try std.posix.bind(socket, &address.any, address.getOsSockLen());
    try std.posix.listen(socket, 128);

    var threads = try gpa.allocator().alloc(std.Thread, 4);
    for (0..threads.len) |i| {
        threads[i] = try std.Thread.spawn(.{}, server_thread, .{ gpa.allocator(), socket, i });
    }
    for (0..threads.len) |i| {
        threads[i].join();
    }
}

fn server_thread(allocator: std.mem.Allocator, socket: std.posix.socket_t, thread_id: usize) !void {
    log.info("Launching Server Thread {}\n", .{thread_id});
    var scheduler = try coro.Scheduler.init(allocator, .{});
    defer scheduler.deinit();
    var tasks = std.ArrayList(HandlerTask).init(allocator);
    var have_tasks: coro.ResetEvent = .{};
    _ = try scheduler.spawn(accept_requests, .{ &scheduler, socket, &tasks, &have_tasks, thread_id }, .{});
    _ = try scheduler.spawn(clean_up_tasks, .{ &tasks, &have_tasks }, .{});
    try scheduler.run(.wait);
}

fn accept_requests(scheduler: *coro.Scheduler, socket: std.posix.socket_t, tasks: *std.ArrayList(HandlerTask), have_tasks: *coro.ResetEvent, thread_id: usize) !void {
    while (true) {
        log.info("Loop accept\n", .{});
        var client_socket: std.posix.socket_t = undefined;
        try coro.io.single(aio.Accept{ .socket = socket, .out_socket = &client_socket });
        const task = try scheduler.spawn(handler, .{ client_socket, thread_id }, .{});
        try tasks.append(task);
        if (!have_tasks.is_set) {
            have_tasks.set();
        }
    }
}

// Is this actually needed? Is there a better way to do this?
// Can tasks clean up after themselves?
fn clean_up_tasks(tasks: *std.ArrayList(HandlerTask), have_tasks: *coro.ResetEvent) !void {
    try have_tasks.wait();
    while (true) {
        if (tasks.items.len == 0) {
            have_tasks.reset();
            try have_tasks.wait();
        }
        var i: usize = 0;
        while (i < tasks.items.len) {
            // Ensure we break for the scheduler to run.
            try coro.io.single(aio.Nop{ .ident = 0 });
            if (tasks.items[i].isComplete()) {
                log.debug("Cleaning up a task\n", .{});
                // This will deinit the function and clean up resources.
                const task = tasks.swapRemove(i);
                task.complete(.wait);
            } else {
                i += 1;
            }
        }
    }
}

const HandlerTask = coro.Task.Generic(void);

fn handler(socket: std.posix.socket_t, thread_id: usize) void {
    log.info("Starting new handler on {}\n", .{thread_id});
    defer log.info("Closing handler\n", .{});
    // I should do a proper check for keepalive here?
    // And http headers in general I guess.
    var buf: [1024]u8 = undefined;
    var len: usize = 0;
    while (true) {
        coro.io.single(aio.Recv{ .socket = socket, .buffer = &buf, .out_read = &len }) catch break;
        log.debug("request:\n{s}\n\n", .{buf[0..len]});
        // This is the fake costly part, sleep for a bit (pretend this is some server web request)
        // Sleep 20ms
        // coro.io.single(aio.Timeout{ .ns = 20 * 1_000 * 1_000 }) catch break;
        const response = "HTTP/1.1 200 OK\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Length: 13\r\n\r\nHello, World!";
        coro.io.single(aio.Send{ .socket = socket, .buffer = response }) catch break;
    }
    coro.io.single(aio.CloseSocket{ .socket = socket }) catch return;
}
On macOS the posix backend is used, which doesn't have that great perf. You'd want to test on Linux with io_uring, which this library mainly targets; iocp may work okay as well. As for clean_up_tasks, I think it would be a good idea to add the ability to detach tasks so that it isn't needed at all. In fact, you may want to replace the nop in clean_up_tasks with a delay (and move it from the inner while loop to the outer one), as that loop will otherwise be very busy and cause lots of context switches.
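A minimal sketch of that suggestion, reusing the aio.Timeout operation that is already commented out in handler (the 1 ms delay value here is an arbitrary assumption):

fn clean_up_tasks(tasks: *std.ArrayList(HandlerTask), have_tasks: *coro.ResetEvent) !void {
    try have_tasks.wait();
    while (true) {
        if (tasks.items.len == 0) {
            have_tasks.reset();
            try have_tasks.wait();
        }
        // Delay once per sweep (outer loop) instead of issuing a nop per task,
        // so the cleanup coroutine doesn't spin and cause constant context switches.
        try coro.io.single(aio.Timeout{ .ns = 1 * 1_000 * 1_000 });
        var i: usize = 0;
        while (i < tasks.items.len) {
            if (tasks.items[i].isComplete()) {
                const task = tasks.swapRemove(i);
                task.complete(.wait);
            } else {
                i += 1;
            }
        }
    }
}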