Run libuv loop on a background thread #37

Merged: 1 commit, Oct 11, 2024

Frixuu commented Sep 29, 2024

This implementation creates two threads:

  • Thread A runs the libuv loop in the Default mode (so it does not saturate the CPU).
    • On an incoming TCP connection, thread A's event loop is ticked on thread A.
    • This is because AFAIK libuv is not thread-safe, but arbitrary events/timers can do work on libuv handles.
    • Between TCP events, the garbage collector is turned off. If it were left on, an idle thread A could prevent it from stopping the world.
  • Thread B runs a regular haxe.Timer.
    • On each timer interval, a non-libuv TCP socket is opened, connected to the server, and immediately closed.
    • This ensures A's events (and the garbage collector) run even if the Weblink server has no traffic.
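
The thread B side of this could be sketched roughly as follows. This is an illustration, not the code from this PR: the poke helper, port 2000 and the 500 ms interval are all assumptions made up for the example, and it relies on Thread.createWithEventLoop (Haxe 4.2+) so that haxe.Timer has an event loop to run on.

```haxe
import haxe.Exception;
import haxe.Timer;
import sys.net.Host;
import sys.net.Socket;
import sys.thread.Thread;

// Hypothetical helper (not from this PR): open a plain, non-libuv TCP
// connection to the server and close it right away, so that the libuv loop
// on thread A receives an event and returns to HL code for a moment.
function poke(port:Int) {
	final socket = new Socket();
	try {
		socket.connect(new Host("127.0.0.1"), port);
	} catch (e:Exception) {
		// The server may not be listening yet; the next tick retries.
	}
	socket.close();
}

function main() {
	// "Thread B": owns an event loop so that haxe.Timer can run on it.
	Thread.createWithEventLoop(() -> {
		final timer = new Timer(500); // assumed interval, in milliseconds
		timer.run = () -> poke(2000); // assumed server port
	});

	// Placeholder for the rest of the application, so that the process
	// stays alive long enough for the timer to tick a few times.
	Sys.sleep(5.0);
}
```

A failed connection attempt is deliberately ignored here, since the only purpose of the poke is to generate an event on the listener when it is up.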

Closes #28. Supersedes #36.

Frixuu commented Oct 4, 2024

Some further notes from when I was tinkering with this:

  • haxe.thread.Condition was broken on HL 1.14.0 (current stable) and below.
    • Fortunately it ended up not being necessary for this to work.
  • The GC issue came up when I was debugging a hanging test suite.
    • With no tasks to run, the loop on thread A sits in epoll_wait.
    • If it never interacts with HL APIs, the main thread spin-waits forever because it can't finish the stop-the-world (STW) phase.
    • During the STW phase, thread B may itself be stopped. This means thread A can't be woken up (by Weblink).

Frixuu force-pushed the do-not-saturate branch 2 times, most recently from 3af8c33 to c6148ea (October 5, 2024 11:31)
Frixuu marked this pull request as ready for review (October 5, 2024 12:01)
Frixuu changed the title from "Exploratory: Run libuv loop on a background thread" to "Run libuv loop on a background thread" (Oct 5, 2024)

PXshadow commented Oct 7, 2024

> Thread B runs a regular haxe.Timer.
> On timer interval, a non-libuv TCP socket is being opened, connected to the server, and immediately closed.
> This is so A's events (and the garbage collector) are being run even if the Weblink server has no traffic.

Walk me through why it is important that A's events (and the garbage collector) run even if there is no traffic. Is there a memory leak somewhere, and is this a temporary fix for it?

I don't know what you are trying to solve for.

if (data == null) { // EOF
	request = null;
	stream.close();
	client.close();
	Gc.enable(false);

I don't understand why disabling the GC after an EOF is a good idea; it will simply be re-enabled when another client sends data to the server.

Frixuu commented Oct 7, 2024

> Walk me through why it is important that A's events (and the garbage collector) run even if there is no traffic. Is there a memory leak somewhere, and is this a temporary fix for it?

Let's forget for a moment that Weblink exists. We create a TCP listener using HL's libuv bindings:

import hl.uv.Loop;
import hl.uv.Tcp;
import sys.net.Host;
import sys.thread.Thread;

function main() {

	Thread.create(() -> {
		final loop = @:privateAccess Loop.default_loop();
		final socket = new Tcp(loop);
		socket.bind(new Host("0.0.0.0"), 2000);
		socket.listen(100, () -> {});
		loop.run(Default);
	});

	Sys.sleep(1.0);
	Sys.println("Hello!");
}

As you might expect, this application runs for a second, prints "Hello!" and exits. Pretty self-explanatory.

Now, before we quit, let's force the garbage collector to run:

	Sys.sleep(1.0);
+	hl.Gc.major();
	Sys.println("Hello!");

The application no longer exits! It also does not print to the console. Moreover, you may notice that the main thread uses an entire CPU core:
[Screenshot of htop]

You may want to attach a debugger to the main thread. Ideally, if you have a build of HL with some extra info, you might see:

0x00007b47a50580c9 in gc_stop_world (b=true) at src/gc.c:362
362				while( t->gc_blocking == 0 ) {}; // spinwait
(gdb) bt
#0  0x00007b47a50580c9 in gc_stop_world (b=true) at src/gc.c:362
#1  0x00007b47a5059b15 in gc_major () at src/gc.c:889
#2  0x00007b47a5059c75 in hl_gc_major () at src/gc.c:913

The garbage collector, triggered on the main thread, is waiting for some other thread to be stopped. It's a spinwait, so it should really happen any moment now! But where is that other thread, actually?

(gdb) bt
#0  0x00007b47a4de9750 in epoll_pwait () from /usr/lib/libc.so.6
#1  0x00007b47a4fe012a in ?? () from /usr/lib/libuv.so.1
#2  0x00007b47a4fc809f in uv_run () from /usr/lib/libuv.so.1

It's somewhere in native code, waiting for IO. Turns out stopping a thread that does not want to be stopped is a pretty difficult problem! Let's see if providing said IO will help.

curl http://localhost:2000

It did not! The curl request hangs, just like our server.

But, you might say, we technically did exit the native code for a moment! socket.listen takes a closure for an argument, so surely we must have notified the HL virtual machine we want to get off that wild ride, no?

Close, but not exactly.

The way HashLink notifies the GC that it is allowed to stop a thread is by calling the hl_blocking(bool) guard. The guard is part of many standard library APIs, sure, but an empty function would not trigger any of them.

Examples of operations that would do that are printing to standard output or trying to allocate a new object. You can try it out by adding a simple call like this (unless you enable the static analyzer and its local_dce module):

-		socket.listen(100, () -> {});
+		socket.listen(100, () -> {
+			final _:Dynamic = {};
+		});

The application still initially hangs, but unblocks when you try to make a request:

curl http://localhost:2000

curl: (56) Recv failure: Connection reset by peer

Let's go back to Weblink.

As you've seen in my first implementation of this PR, I decided to disable the GC between TCP events and re-enable it as soon as such an event is raised. I agree it was not an optimal solution, and I've since changed it to call hl.Gc.blocking instead. However, I hope I've made it clear that some form of garbage collector manipulation is necessary here.

As an example, take this comment about saving the socket handlers for later:

  • Someone is the only client connected to the server.
  • You fetch this client's DB data in the background.
  • You schedule an event to respond to them with that data (libuv is not multithread-friendly).
  • GC hits.
  • All threads, except the listener one, stop. This means the "thread B" stops too.
  • You are not able to respond until someone else connects and unblocks the HL runtime for you.
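
For reference, the hl.Gc.blocking approach mentioned above might look roughly like this when applied to the toy listener from earlier. This is an untested sketch under assumptions, not the code merged in this PR: it assumes that a thread inside a blocking section must not allocate, so the callback leaves the blocking section before doing any HL work and re-enters it before returning to libuv.

```haxe
import hl.Gc;
import hl.uv.Loop;
import hl.uv.Tcp;
import sys.net.Host;
import sys.thread.Thread;

function main() {
	Thread.create(() -> {
		final loop = @:privateAccess Loop.default_loop();
		final socket = new Tcp(loop);
		socket.bind(new Host("0.0.0.0"), 2000);
		socket.listen(100, () -> {
			// Back in HL code: leave the blocking section so that
			// allocating is allowed and the GC waits for this thread.
			Gc.blocking(false);
			// ... accept the client, read the request, and so on ...
			// About to return to libuv: the GC may proceed without us.
			Gc.blocking(true);
		});

		// Mark the thread as blocked before it parks in epoll_wait,
		// so a collection on another thread does not spin-wait on it.
		Gc.blocking(true);
		loop.run(Default);
		Gc.blocking(false);
	});

	Sys.sleep(1.0);
	Gc.major();
	Sys.println("Hello!");
}
```

The calls are kept balanced on purpose: every Gc.blocking(false) in the callback is matched by a Gc.blocking(true) before control returns to the loop, so the thread is always marked as blocked while it waits for IO.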

PXshadow commented

Great writeup, thanks for the detailed response. I see now that, given how threads behave, GC manipulation is necessary.

PXshadow merged commit 5c78b7c into PXshadow:master on Oct 11, 2024
3 checks passed

Successfully merging this pull request may close these issues.

Busy waiting (100%) cpu usage