8335480: Only deoptimize threads if needed when closing shared arena #20158
Conversation
👋 Welcome back jvernee! A progress list of the required criteria for merging this PR into
@JornVernee This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 177 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the
@JornVernee The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.
Webrevs
I decided to add this to the PR for completeness, so that we don't go and deoptimize frames that are not using scoped accesses at all.
Nice work! Thinking a bit about how to improve the benchmark, and given the semantics of Arena.close(), there is a trick you can use. There are two kinds of memory segments: the ones that are only visible from Java, and the ones that are visible from outside Java as well. For example, a memory segment created from an mmap, or a memory segment whose address has been passed to C code, is visible from outside Java; for those you have no choice but to wait in Arena.close() until all threads have answered the handshakes. For all the other memory segments, because they are only visible from Java, their memory can be reclaimed asynchronously, i.e. the last thread of the handshakes can free the corresponding memory segments, so the thread that calls Arena.close() is free to run even if the memory is not yet reclaimed. From my armchair, that seems an awful lot of engineering, so it may not be worth it, but now you know :)
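For illustration only, here is a toy sketch of the "last thread frees the memory" idea described above, using a plain per-access reference count instead of the VM's thread-local handshakes. All names (AsyncArenaSketch, tryPin, closeAsync, freeMemory) are made up for this sketch and are not part of the actual proposal or the JDK:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model: every thread currently inside a scoped access holds a "pin" on the
// arena. closeAsync() returns immediately; whichever thread drops the last pin
// (a reader, or closeAsync() itself) actually frees the memory.
final class AsyncArenaSketch {
    private final AtomicInteger pins = new AtomicInteger(1); // 1 = the arena's own pin
    private volatile boolean closed;

    // Called before a scoped access; returns false once the arena is closing/closed.
    boolean tryPin() {
        int p;
        do {
            p = pins.get();
            if (p == 0 || closed) return false; // already freed, or close in progress
        } while (!pins.compareAndSet(p, p + 1));
        return true;
    }

    // Called after a scoped access; the last unpin frees the memory.
    void unpin() {
        if (pins.decrementAndGet() == 0) {
            freeMemory();
        }
    }

    // Non-blocking close: mark closed and drop the arena's own pin.
    void closeAsync() {
        closed = true;
        unpin();
    }

    private void freeMemory() {
        // hypothetical: release the backing native memory exactly once
    }
}
```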
That is something we considered in the past as well (I think Maurizio even had a prototype at some point). The issue is that close should be deterministic, i.e. after the call to
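For reference, a minimal example of the deterministic-close property being described, using the existing shared-arena API (the class name and printed message are illustrative):

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class DeterministicCloseExample {
    public static void main(String[] args) throws InterruptedException {
        Arena arena = Arena.ofShared();
        MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT);

        // Once close() has returned, the memory is freed and *every* access,
        // from any thread, is guaranteed to fail rather than touch freed memory.
        arena.close();

        Thread reader = new Thread(() -> {
            try {
                segment.get(ValueLayout.JAVA_INT, 0);
            } catch (IllegalStateException e) {
                System.out.println("access after close rejected: " + e.getMessage());
            }
        });
        reader.start();
        reader.join();
    }
}
```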
Looks good.
@dougxc might want to have a look at Graal support for this one.
Yes, I conservatively implemented
Knowing that all the segments are freed during close() is something you may want. Note that the semantics of ofSharedAsyncFree() would be different from ofAuto(): ofAuto() relies on the GC to free a segment, so the delay before a segment is freed is not time-bounded; if the application has enough memory, the memory of the segment may never be reclaimed. With ofSharedAsyncFree(), the segments are freed by the last thread, so while this mechanism is not deterministic, it is time-bounded.
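A small sketch contrasting the two existing deallocation models mentioned here (ofSharedAsyncFree() is only a hypothetical name from this discussion, so it is not shown):

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

public class DeallocationModels {
    public static void main(String[] args) {
        // Automatic arena: no close() at all; the memory is reclaimed some time
        // after the segment (and arena) become unreachable and the GC runs.
        // If the application never needs a GC, reclamation may never happen.
        MemorySegment auto = Arena.ofAuto().allocate(1 << 20);

        // Shared arena: close() is explicit and deterministic, but has to
        // handshake with all threads before the memory can be freed.
        try (Arena shared = Arena.ofShared()) {
            MemorySegment s = shared.allocate(1 << 20);
            // ... use s from any thread ...
        } // memory is guaranteed to be freed once close() returns here
    }
}
```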
Hi Jorn,

Many thanks for working on this! I have one problem with the benchmark: I think it is not measuring the whole setup in a way that matches our workload. The basic problem is that we don't want to deoptimize threads which are not related to MemorySegments, so the throughput of those threads should not be affected. For threads currently in a memory-segment read it should have a bit of an effect, but they should recover fast.

The given benchmark only measures the following: it starts many threads; in each it opens a shared memory segment, does some work, and closes it. So it measures the throughput of the whole "create shared / work on it / close shared" workload. Actually, the problems we see in Lucene are more that we have many threads working on shared memory segments or on other tasks not related to memory segments at all, while a few threads are concurrently closing and opening new arenas. With more threads concurrently closing the arenas, the throughput on the other threads also degrades.

So IMHO, the benchmark should be improved to have a few threads (configurable) that open/close memory segments, a set of other threads that do other work, and finally a set of threads reading from the memory segments opened by the first threads (see the sketch below). The test case you wrote is a better fit for the above workload; maybe the benchmark should be set up more like the test. If you have a benchmark with that workload it should better show an improvement. The current benchmark has the problem that it measures the whole open/work/close on shared segments, and closing a shared segment is always heavy, because it has to trigger and wait for the thread-local handshake.

Why is the test preventing inlining of the inner read method?

I may be able to benchmark a Lucene workload with a custom JDK build next week. It might be an idea to use the "wrong" DaCapo benchmark (downgrade to an older version from before it fixed dacapobench/dacapobench#264, specifically dacapobench/dacapobench@76588b2).

Uwe
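Something along these lines might capture that workload shape; a rough JMH sketch (class name, thread counts, and sizes are illustrative, not the actual benchmark in this PR), using JMH's @Group/@GroupThreads to mix closer, reader, and unrelated-work threads:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.Throughput)
@State(Scope.Group)
@Fork(1)
public class MixedCloseSketch {

    Arena arena;
    MemorySegment segment;

    @Setup
    public void setup() {
        arena = Arena.ofShared();
        segment = arena.allocate(16 * 1024);
    }

    @TearDown
    public void tearDown() {
        arena.close();
    }

    // A few threads repeatedly open and close their own shared arenas;
    // this is what triggers the thread-local handshakes.
    @Benchmark
    @Group("mixed")
    @GroupThreads(2)
    public void openClose() {
        try (Arena a = Arena.ofShared()) {
            a.allocate(64);
        }
    }

    // Many threads read from a long-lived shared segment; ideally their
    // throughput recovers quickly after each close happening elsewhere.
    @Benchmark
    @Group("mixed")
    @GroupThreads(4)
    public long read() {
        long sum = 0;
        for (long i = 0; i < segment.byteSize(); i += 8) {
            sum += segment.get(ValueLayout.JAVA_LONG, i);
        }
        return sum;
    }

    // Threads doing work unrelated to memory segments; ideally unaffected.
    @Benchmark
    @Group("mixed")
    @GroupThreads(2)
    public long unrelated() {
        long acc = 0;
        for (int i = 0; i < 1_000; i++) {
            acc += Long.rotateLeft(acc ^ i, 7);
        }
        return acc;
    }
}
```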
That's a great suggestion! In our case we just want the index files closed as soon as possible, but not on the next GC (which would be horrible and bring us back to the times of DirectByteBuffer, where some buffers that were no longer in use were never closed). The problem with GC is that the Arena/MemorySegments and so on are tiny objects which live for a very long time, especially when they have been in use for quite some time (like an index segment of a Lucene index). So basically we would like to have: close the arena as soon as possible, but don't wait for it. Of course, for testing purposes in Lucene we could use

Uwe
// }
//
// The safepoint at which we're stopped may be in between the liveness check
// and actual memory access, but is itself 'outside' of @Scoped code
What is @Scoped code? I don't see that annotation mentioned here: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/ScopedValue.html
This is the whole magic around the shared arena. It is not public API; it is internal to the HotSpot VM:
jdk/src/java.base/share/classes/jdk/internal/misc/X-ScopedMemoryAccess.java.template
Lines 117 to 119 in a96de6d
@Target({ElementType.METHOD, ElementType.CONSTRUCTOR})
@Retention(RetentionPolicy.RUNTIME)
@interface Scoped { }

jdk/src/hotspot/share/prims/scopedMemoryAccess.cpp
Lines 143 to 149 in a96de6d
/*
 * This function performs a thread-local handshake against all threads running at the time
 * the given session (deopt) was closed. If the handshake for a given thread is processed while
 * one or more threads is found inside a scoped method (that is, a method inside the ScopedMemoryAccess
 * class annotated with the '@Scoped' annotation), and whose local variables mention the session being
 * closed (deopt), this method returns false, signalling that the session cannot be closed safely.
 */
What is @Scoped code? I don't see that annotation mentioned here: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/ScopedValue.html
This has nothing to do with scoped values; instead, this is an annotation declared in jdk.internal.misc.ScopedMemoryAccess that is known to the VM.
Basically, when the VM starts a thread-local handshake and a thread is inside a @Scoped method, it will deoptimize the top-most frame of all those threads so they can do the "isAlive" check.
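To illustrate why the deoptimization is needed at all, here is a self-contained model of the window the quoted code comment refers to (plain Java standing in for the real @Scoped accessors and native memory; names are made up for the sketch):

```java
// Models a @Scoped accessor: a liveness check followed by the raw access.
// In compiled code, a thread can be stopped at a safepoint after the check
// has passed but before the access has run. Closing the arena at that point,
// without deoptimizing such frames, would let the access hit freed memory.
class ScopedAccessModel {
    private volatile boolean alive = true;
    private long[] memory = new long[16]; // stands in for native memory

    long scopedRead(int index) {
        if (!alive) throw new IllegalStateException("already closed");
        // <-- a safepoint here is "outside" the check, but still inside the frame
        return memory[index];
    }

    void close() {
        alive = false;
        // the real implementation handshakes with all threads and deoptimizes
        // frames like the one above before actually freeing the memory
        memory = null;
    }
}
```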
Note that this has nothing to do with implicit conversion, as the memory segment var handle is called by our implementation with the correct type (a long). This is likely an issue with bounds check elimination in "long loops".
@JornVernee Syntax:
User names can only be used for users in the census associated with this repository. For other contributors you need to supply the full name and email address.
src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethod.java
/contributor add Carlo Refice [email protected]
@JornVernee
As discussed offline, JVMCI/Graal changes will be handled by a follow-up PR.
/contributor remove Carlo Refice [email protected]
@JornVernee
Thanks for the discussion and changes in this PR - it's super helpful (in terms of what we can do to work around the issue for now), as well as a great improvement for the future.
Is there an issue where I can follow this? [EDIT: oh! it's JDK-8290892]
Changes in scopedMemoryAccess and benchmark look good
I am fine with compiler and CI changes - it is just marking nmethod as having scoped access.
Looks good.
/integrate
Going to push as commit 7bf5313.
Your commit was automatically rebased without conflicts.
@JornVernee Pushed as commit 7bf5313. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
This issue is tracked here:
This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent of this was to improve the performance of shared arena closure in cases where a lot of threads are accessing and closing shared arenas at the same time (see attached benchmark), but unfortunately even disabling deoptimization altogether does not have a great effect on that benchmark.
Nevertheless, I think the extra logging/testing/benchmark code, and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So, I've created this PR out of them.
In this PR:

- We now only deoptimize threads that are not inside an @Scoped method, but are inside a compiled frame that has a scoped access somewhere inside of it.
- The check for whether a frame needs to be deoptimized (for_scope_method) is separated from the code that checks for a reference to the arena being closed (is_accessing_session), and logging code has been added to the former. That also required changing vframe code to accept an outputStream* rather than always printing to tty.
- A new test (TestConcurrentClose) tries to close many shared arenas at the same time, in order to stress that use case.
- A new benchmark (ConcurrentClose) stresses the cases where many threads are accessing and closing shared arenas.

I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case balloons up in time spent quickly when there are more than 4 threads:

(I manually added the Threads column, btw.)

Testing: tier 1-4
Progress
Issue
Reviewers
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158
$ git checkout pull/20158
Update a local copy of the PR:
$ git checkout pull/20158
$ git pull https://git.openjdk.org/jdk.git pull/20158/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 20158
View PR using the GUI difftool:
$ git pr show -t 20158
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20158.diff
Webrev
Link to Webrev Comment