feat: Add loom Support. #159

Open
wants to merge 4 commits into
base: master

He-Pin
Contributor

@He-Pin He-Pin commented Jan 11, 2025

Motivation:
Add Loom support.
refs: #90 , again

Modification:

  1. Add handlerExecutor() and some helper methods for virtual threads.
  2. Add documentation.

Result:
Virtual threads supported.

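wrk is needed to run the benchmark:
./mill --no-build-lock benchmark.runBenchmarks

Change in requests/sec, virtual threads vs. platform threads (full output below):

  • miniAppWithSleep 100ms: ↑2300% (38.36 → 935.74)
  • todoDb: ~↓7% (81929.40 → 76072.80)
  • staticFiles: ~↓8% (81766.27 → 75488.32)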

Results with the same number (4) of carrier/platform threads:

[1] staticFilesWithLoom result with (platform threads):
[1] Running 30s test @ http://localhost:8080/
[1]   4 threads and 100 connections
[1]   Thread Stats   Avg      Stdev     Max   +/- Stdev
[1]     Latency     1.23ms  436.88us  28.64ms   98.91%
[1]     Req/Sec    20.54k     1.08k   22.69k    93.44%
[1]   2461250 requests in 30.10s, 342.70MB read
[1] Requests/sec:  81766.27
[1] Transfer/sec:     11.38MB
[1] 
[1] staticFilesWithLoom result with (virtual threads):
[1] Running 30s test @ http://localhost:8080/
[1]   4 threads and 100 connections
[1]   Thread Stats   Avg      Stdev     Max   +/- Stdev
[1]     Latency     4.41ms   14.69ms 157.91ms   95.15%
[1]     Req/Sec    19.00k     4.29k   22.09k    88.63%
[1]   2266289 requests in 30.02s, 315.55MB read
[1] Requests/sec:  75488.32
[1] Transfer/sec:     10.51MB
[1] 
[1] todoDbWithLoom result with (platform threads):
[1] Running 30s test @ http://localhost:8080/
[1]   4 threads and 100 connections
[1]   Thread Stats   Avg      Stdev     Max   +/- Stdev
[1]     Latency     1.22ms  172.42us  11.67ms   96.46%
[1]     Req/Sec    20.62k     1.29k   42.72k    93.01%
[1]   2466143 requests in 30.10s, 395.12MB read
[1]   Non-2xx or 3xx responses: 2466143
[1] Requests/sec:  81929.40
[1] Transfer/sec:     13.13MB
[1] 
[1] todoDbWithLoom result with (virtual threads):
[1] Running 30s test @ http://localhost:8080/
[1]   4 threads and 100 connections
[1]   Thread Stats   Avg      Stdev     Max   +/- Stdev
[1]     Latency     3.97ms   13.20ms 160.29ms   95.28%
[1]     Req/Sec    19.21k     3.68k   22.32k    89.41%
[1]   2284539 requests in 30.03s, 366.02MB read
[1]   Non-2xx or 3xx responses: 2284539
[1] Requests/sec:  76072.80
[1] Transfer/sec:     12.19MB
[1] 
[1] minimalApplicationWithLoom result with (platform threads):
[1] Running 30s test @ http://localhost:8080/
[1]   4 threads and 100 connections
[1]   Thread Stats   Avg      Stdev     Max   +/- Stdev
[1]     Latency     1.05s   575.57ms   1.99s    57.89%
[1]     Req/Sec    10.05      3.94    30.00     81.66%
[1]   1152 requests in 30.03s, 172.12KB read
[1]   Socket errors: connect 0, read 0, write 0, timeout 1076
[1] Requests/sec:     38.36
[1] Transfer/sec:      5.73KB
[1] 
[1] 
[1] minimalApplicationWithLoom result with (virtual threads):
[1] Running 30s test @ http://localhost:8080/
[1]   4 threads and 100 connections
[1]   Thread Stats   Avg      Stdev     Max   +/- Stdev
[1]     Latency   106.11ms    2.74ms 126.59ms   77.47%
[1]     Req/Sec   239.19     36.74   252.00     92.68%
[1]   28100 requests in 30.03s, 4.10MB read
[1] Requests/sec:    935.74
[1] Transfer/sec:    139.81KB
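
As a quick cross-check, the relative change in throughput can be recomputed from the Requests/sec lines in the output above. The snippet below is only an illustrative sketch; the object and method names are made up for this comment and are not part of the PR.

```scala
// Relative throughput change, computed from the Requests/sec lines above.
// ThroughputChange and percentChange are hypothetical names for illustration.
object ThroughputChange {
  // Percentage change going from the platform-thread run to the virtual-thread run.
  def percentChange(platform: Double, virtual: Double): Double =
    (virtual - platform) / platform * 100.0

  // (benchmark, platform req/s, virtual req/s), copied from the wrk output.
  val runs: Seq[(String, Double, Double)] = Seq(
    ("staticFiles", 81766.27, 75488.32),
    ("todoDb", 81929.40, 76072.80),
    ("minimalApplication (100 ms sleep)", 38.36, 935.74)
  )

  def main(args: Array[String]): Unit =
    runs.foreach { case (name, platform, virtual) =>
      // Left-align the name, then print the signed change with one decimal.
      println(f"$name%-35s ${percentChange(platform, virtual)}%+.1f%%")
    }
}
```

A negative value means the virtual-thread run served fewer requests per second than the platform-thread run.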


Some design choices:

  1. MethodHandle/reflection is used so that the code still compiles on Java 8.
  2. MethodHandle is also used to name the virtual threads, which needs access to java.lang internals. JPMS code could be added to open the package by default, but that is overkill, so it is better to require an explicit --add-opens java.base/java.lang=ALL-UNNAMED.
  3. A virtualize or screen method creates a virtual thread executor from a platform thread pool. This is useful especially if you want to limit the underlying queue size, since the FJP (ForkJoinPool) queue is unbounded.
  4. Users can also override handlerExecutor() directly.
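
To illustrate the first point, here is a minimal sketch of a reflective lookup that compiles on Java 8 and only finds virtual threads at runtime. It uses plain reflection rather than MethodHandle, and VirtualThreadSupport and newVirtualThreadExecutor are hypothetical names, not the helpers in this PR.

```scala
import java.util.concurrent.{Callable, ExecutorService, Executors}
import scala.util.Try

// Hypothetical helper for illustration; not the Util code from this PR.
object VirtualThreadSupport {
  // Looks up Executors.newVirtualThreadPerTaskExecutor() reflectively, so this
  // file still compiles on a JDK that lacks the method (for example Java 8).
  // Returns None when the method is missing or the call fails.
  def newVirtualThreadExecutor(): Option[ExecutorService] =
    Try {
      val method = classOf[Executors].getMethod("newVirtualThreadPerTaskExecutor")
      method.invoke(null).asInstanceOf[ExecutorService]
    }.toOption

  def main(args: Array[String]): Unit = {
    // Fall back to a small fixed platform-thread pool on older JDKs.
    val executor = newVirtualThreadExecutor().getOrElse(Executors.newFixedThreadPool(4))
    val future = executor.submit(new Callable[String] {
      def call(): String = Thread.currentThread().toString
    })
    println(s"task ran on: ${future.get()}")
    executor.shutdown()
  }
}
```

Because this only calls a public static method, it does not need --add-opens; that flag matters for the parts that reach into non-public java.lang members.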

@lolgab
Member

lolgab commented Jan 11, 2025

In Mill, use T.env to always get the up-to-date environment. Otherwise you get the environment as it was when the server was instantiated, which is usually not what you want.

@He-Pin
Contributor Author

He-Pin commented Jan 12, 2025

@lihaoyi I think this PR is ready now. It seems virtual threads only improve performance under heavy blocking.

@lihaoyi
Member

lihaoyi commented Jan 13, 2025

@He-Pin looks good overall. Please summarize your benchmark results in a short table to include in the documentation, with a link back to this PR so people can view the full details.

@He-Pin
Contributor Author

He-Pin commented Jan 13, 2025

@lihaoyi I have updated with a summary table and added another note about classloading

@lihaoyi
Member

lihaoyi commented Jan 13, 2025

Thanks @He-Pin, I'll leave the PR open a few more days to see if anyone else wants to comment before merging

@He-Pin
Contributor Author

He-Pin commented Jan 13, 2025

@sorumehta Hi, this PR is ready. Would you like to take a look? Thanks.

@samikrc

samikrc commented Jan 13, 2025

@He-Pin Quick question: once this is merged, do virtual threads become the default execution method? Or does this happen only under certain conditions?

@He-Pin
Contributor Author

He-Pin commented Jan 13, 2025

@samikrc thanks for your interest.

Short answer: No.

Virtual threads are only enabled when:

  1. You run on JDK 21 (or another JDK where virtual threads are enabled), and
  2. You pass --add-opens java.base/java.lang=ALL-UNNAMED and -Dcask.virtual-thread.enabled=true, OR
  3. You override handlerExecutor() and supply an executor backed by virtual threads, e.g. Executors.newVirtualThreadPerTaskExecutor(). You can replace the default scheduler with cask.internal.Util.createVirtualThreadExecutor if you like.
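
The opt-in rule can be sketched roughly as follows. This is not the code in the PR: HandlerExecutorSketch and its members are hypothetical names, and only the property name cask.virtual-thread.enabled comes from the list above.

```scala
import java.util.concurrent.{Executor, Executors}

// Hypothetical stand-in for the opt-in logic; not the PR's implementation.
object HandlerExecutorSketch {
  // Property name taken from the list above.
  val VirtualThreadEnabled = "cask.virtual-thread.enabled"

  // Boolean.getBoolean returns true only when the system property is set and
  // equals "true" (case-insensitive); unset or malformed values give false.
  def virtualThreadsRequested: Boolean =
    java.lang.Boolean.getBoolean(VirtualThreadEnabled)

  // Some(executor) when virtual threads are requested and the running JDK
  // provides them; None means "keep the server's default threading".
  def handlerExecutor(): Option[Executor] =
    if (!virtualThreadsRequested) None
    else
      try {
        val m = classOf[Executors].getMethod("newVirtualThreadPerTaskExecutor")
        Some(m.invoke(null).asInstanceOf[Executor])
      } catch {
        // Method missing (older JDK) or the call itself failed.
        case _: ReflectiveOperationException => None
      }

  def main(args: Array[String]): Unit = {
    println(s"without the property: ${handlerExecutor()}")
    System.setProperty(VirtualThreadEnabled, "true")
    println(s"with the property, virtual threads available: ${handlerExecutor().isDefined}")
  }
}
```

Returning None when the property is unset keeps the existing threading model as the default, which matches the "Short answer: No" above.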

This is because, with the current JDK implementation, virtual threads can deadlock if they are not configured properly. We hit this in production when a virtual thread and platform threads both tried to load the same class, ByteBuf. Class loading happens in a native (C++) frame where a mutex is held, so the virtual thread cannot be unmounted; with no carrier thread left to run the class loading, the system deadlocked.

The JVM team at Alibaba helped us with a tweaked AJDK that adds one more carrier thread for this case, and they backported the JDK 24 features into AJDK 21 too.

With that, virtual threads reduced our young GC (ygc) by 50% and CPU usage by 10 percentage points under heavy load. Our system is already fully async, so these are the only improvements we see.

For anyone who wants to use virtual threads in production, I would suggest:

  1. First fix the code that pins virtual threads.
  2. Do heavy load testing, including some bursts.
  3. Configure a larger max pool size if needed.
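
As one example of the first suggestion: on JDK 21, a virtual thread that blocks inside a synchronized block stays pinned to its carrier thread, and a common fix is to guard that section with a ReentrantLock instead. The sketch below assumes that scenario; PinningSketch is a hypothetical name, and the demo uses ordinary platform threads so that it runs on any JDK. For the other two points, JDK 21 has the system properties jdk.tracePinnedThreads (to report pinned blocking) and jdk.virtualThreadScheduler.maxPoolSize (to raise the carrier pool limit); check your JDK version's documentation before relying on them.

```scala
import java.util.concurrent.locks.ReentrantLock

// Hypothetical example for illustration only.
object PinningSketch {
  private val lock = new ReentrantLock()
  private var counter = 0

  // A ReentrantLock-guarded section: a virtual thread waiting here can be
  // unmounted from its carrier, unlike one blocked inside `synchronized`
  // on JDK 21.
  def incrementWithBlockingWork(): Int = {
    lock.lock()
    try {
      Thread.sleep(10) // stands in for blocking I/O done while holding the lock
      counter += 1
      counter
    } finally lock.unlock()
  }

  def main(args: Array[String]): Unit = {
    // Platform threads are used here so the sketch also runs on older JDKs.
    val threads = (1 to 8).map { _ =>
      new Thread(new Runnable { def run(): Unit = incrementWithBlockingWork() })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    println(s"counter = $counter")
  }
}
```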

@sideeffffect sideeffffect left a comment

Happy to see this PR 😍 Was curious and checked out the code. I've noticed one or two things.

Comment on lines +67 to +68
globalLogger.exception(e)
throw new UnsupportedOperationException("Failed to create newThreadPerTaskExecutor.", e)

exception is both logged and re-thrown.

case NonFatal(e) =>
  globalLogger.exception(e)
  //--add-opens java.base/java.lang=ALL-UNNAMED
  throw new UnsupportedOperationException("Failed to create virtual thread factory.", e)

exception is both logged and re-thrown.
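
One way to address this, sketched under the assumption that the caller is the right place to report the failure: wrap and rethrow without logging, keeping the original exception as the cause. RethrowSketch and lookupFactory are hypothetical stand-ins, not the PR's code.

```scala
import scala.util.control.NonFatal

// Hypothetical example for illustration only.
object RethrowSketch {
  // Stand-in for the reflective factory lookup; it always fails here.
  private def lookupFactory(): AnyRef =
    throw new IllegalAccessException("java.lang is not opened to this module")

  // Wrap and rethrow without logging. The original exception stays attached
  // as the cause, so whoever handles the failure can log it exactly once.
  def createVirtualThreadFactory(): AnyRef =
    try lookupFactory()
    catch {
      case NonFatal(e) =>
        throw new UnsupportedOperationException(
          "Failed to create virtual thread factory. " +
            "Try --add-opens java.base/java.lang=ALL-UNNAMED", e)
    }

  def main(args: Array[String]): Unit =
    try {
      createVirtualThreadFactory()
      ()
    } catch {
      case e: UnsupportedOperationException =>
        // Single place where the failure is reported.
        System.err.println(s"${e.getMessage} (cause: ${e.getCause})")
    }
}
```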

Comment on lines +58 to +67
protected def handlerExecutor(): Executor = {
  if (enableVirtualThread) {
    Util.createDefaultCaskVirtualThreadExecutor.orNull
  } else null
}

protected def enableVirtualThread: Boolean = {
  val enableVirtualThread = System.getProperty(Main.VIRTUAL_THREAD_ENABLED)
  enableVirtualThread != null && enableVirtualThread.toBoolean
}

Why are these two methods instead of just one?

protected def handlerExecutor(): Option[Executor] = {

Contributor Author

Yes. This way it can support our box; otherwise I would need some kind of conditional compilation, with an additional source folder that is only included when compiling with Java 21.

But the current approach is better because the user can enable it with just a property; we use this approach in production too.

5 participants