
8335480: Only deoptimize threads if needed when closing shared arena #20158

Closed
wants to merge 16 commits into master from JornVernee:FasterSharedClose

Conversation

@JornVernee (Member) commented Jul 12, 2024

This PR limits the number of cases in which we deoptimize frames when closing a shared Arena. The initial intent was to improve the performance of shared arena closure in cases where many threads are accessing and closing shared arenas at the same time (see the attached benchmark). Unfortunately, even disabling deoptimization altogether does not have a great effect on that benchmark.

Nevertheless, I think the extra logging/testing/benchmark code and comments I've written, together with reducing the number of cases where we deoptimize (which makes it clearer exactly why we need to deoptimize in the first place), will be useful going forward. So I've created this PR out of them.

In this PR:

  • Deoptimization is now only done in cases where it's needed, instead of always: namely, when a thread is not inside an @Scoped method, but is inside a compiled frame that has a scoped access somewhere inside it.
  • I've separated the stack-walking code (for_scope_method) from the code that checks for a reference to the arena being closed (is_accessing_session), and added logging code to the former. That also required changing the vframe code to accept an outputStream* rather than always printing to tty.
  • Added a new test (TestConcurrentClose) that tries to close many shared arenas at the same time, in order to stress that use case.
  • Added a new benchmark (ConcurrentClose) that stresses the case where many threads are accessing and closing shared arenas (a sketch of this access/close pattern follows below).
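
To make the scenario concrete, here is a minimal, self-contained sketch of that access/close pattern (illustrative code only; this is not the actual TestConcurrentClose or ConcurrentClose source, and all names are made up):

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// One thread closes a shared arena while another races to read from a
// segment backed by it. A reader either completes its access or sees an
// IllegalStateException ("already closed") -- but must never observe
// freed memory, which is what the handshake/deoptimization machinery
// enforces.
public class SharedCloseSketch {
    public static void main(String[] args) throws InterruptedException {
        Arena arena = Arena.ofShared();
        MemorySegment segment = arena.allocate(1024, 8);

        Thread reader = new Thread(() -> {
            try {
                long sum = 0;
                for (long i = 0; i < 128; i++) {
                    sum += segment.get(ValueLayout.JAVA_LONG, i * 8);
                }
            } catch (IllegalStateException e) {
                // The arena was closed mid-loop; the access fails cleanly
                // instead of touching freed memory.
            }
        });
        reader.start();

        arena.close();   // handshakes all threads; may deoptimize frames
        reader.join();
    }
}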

I've done several benchmark runs with different numbers of threads. The confined case stays much more consistent, while the shared case quickly balloons in time spent when there are more than 4 threads:

Benchmark                     Threads   Mode  Cnt     Score     Error  Units
ConcurrentClose.sharedAccess       32   avgt   10  9017.397 ± 202.870  us/op
ConcurrentClose.sharedAccess       24   avgt   10  5178.214 ± 164.922  us/op
ConcurrentClose.sharedAccess       16   avgt   10  2224.420 ± 165.754  us/op
ConcurrentClose.sharedAccess        8   avgt   10   593.828 ±   8.321  us/op
ConcurrentClose.sharedAccess        7   avgt   10   470.700 ±  22.511  us/op
ConcurrentClose.sharedAccess        6   avgt   10   386.697 ±  59.170  us/op
ConcurrentClose.sharedAccess        5   avgt   10   291.157 ±   7.023  us/op
ConcurrentClose.sharedAccess        4   avgt   10   209.178 ±   5.802  us/op
ConcurrentClose.sharedAccess        1   avgt   10    52.042 ±   0.630  us/op
ConcurrentClose.confinedAccess     32   avgt   10    25.517 ±   1.069  us/op
ConcurrentClose.confinedAccess      1   avgt   10    12.398 ±   0.098  us/op

(I manually added the Threads column, by the way.)
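
For reference, the thread counts above map to JMH's -t option; a run along the following lines (the exact micro test selector may differ) reproduces one row:

$ make test TEST="micro:ConcurrentClose" MICRO="OPTIONS=-t 8"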

Testing: tier 1-4


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8335480: Only deoptimize threads if needed when closing shared arena (Enhancement - P3)


Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20158/head:pull/20158
$ git checkout pull/20158

Update a local copy of the PR:
$ git checkout pull/20158
$ git pull https://git.openjdk.org/jdk.git pull/20158/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 20158

View PR using the GUI difftool:
$ git pr show -t 20158

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20158.diff

Webrev

Link to Webrev Comment

@bridgekeeper (bot) commented Jul 12, 2024

👋 Welcome back jvernee! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk (bot) commented Jul 12, 2024

@JornVernee This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8335480: Only deoptimize threads if needed when closing shared arena

Reviewed-by: mcimadamore, kvn, uschindler, vlivanov, eosterlund

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 177 new commits pushed to the master branch:

  • bcb5e69: 8335921: Fix HotSpot VM build without JVMTI
  • 10186ff: 8336300: DateFormatSymbols#getInstanceRef returns non-cached instance
  • 7ec55df: 8336638: Parallel: Remove redundant mangle in PSScavenge::invoke
  • 6df7acb: 8299080: Wrong default value of snippet lang attribute
  • 8713628: 8334217: [AIX] Misleading error messages after JDK-8320005
  • 67979eb: 8334781: JFR crash: assert(((((JfrTraceIdBits::load(klass)) & ((JfrTraceIdEpoch::this_epoch_method_and_class_bits()))) != 0))) failed: invariant
  • d41d2a7: 8334502: gtest/GTestWrapper.java fails on armhf due to LogDecorations.iso8601_utctime_test
  • 59843f4: 8336040: Missing closing anchor element in Docs.gmk
  • 70f3e99: 8336463: Parallel: Add PSOldGen::expand_and_allocate
  • b9b0b85: 8336375: Crash on paste to JShell
  • ... and 167 more: https://git.openjdk.org/jdk/compare/dd74e7f8c1570ed34c89f4aca184f5668e4471db...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk (bot) commented Jul 12, 2024

@JornVernee The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@JornVernee JornVernee marked this pull request as ready for review July 12, 2024 14:37
@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 12, 2024
@mlbridge (bot) commented Jul 12, 2024

@JornVernee (Member, Author) commented Jul 12, 2024

This could be narrowed down further by tracking, for each compiled method, whether it has an (inlined) call to an @Scoped method, but I've left that out for now.

I decided to add this to the PR for completeness, so that we don't go and deoptimize frames that are not using scoped accesses at all.

@forax (Member) commented Jul 13, 2024

Nice work!

Thinking a bit about how to improve the benchmark, and given the semantics of Arena.close(), there is a trick you can use. There are two kinds of memory segments: those that are visible only from Java, and those that are also visible from outside Java. For example, a memory segment created from an mmap, or a memory segment whose address has been sent to C code, is visible from outside Java; for those, you have no choice but to wait in Arena.close() until all threads have answered the handshakes. All the other memory segments are visible only from Java, so their memory can be reclaimed asynchronously: the last thread of the handshakes can free the corresponding memory segments, and the thread that calls Arena.close() is free to run even if the memory is not yet reclaimed.

From my armchair, that seems like an awful lot of engineering, so it may not be worth it, but now you know :)

@JornVernee (Member, Author) commented Jul 13, 2024

That is something we considered in the past as well (I think Maurizio even had a prototype at some point). The issue is that close should be deterministic; that is, after the call to close() returns, all memory should be freed. That is an essential property for applications that have most of their virtual address space tied up, and then want to release and immediately re-use a big chunk of it. If it's not important that memory is freed deterministically, but memory should still be accessible from multiple threads, an automatic arena might be a better choice.
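
To illustrate the distinction with the standard java.lang.foreign API (the class and method names below are made up for the example):

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

class CloseSemantics {
    static void deterministic() {
        try (Arena shared = Arena.ofShared()) {
            MemorySegment seg = shared.allocate(1 << 20);
            // ... use seg from any thread ...
        }   // once close() returns, the memory has been freed, so the
            // virtual address space can be re-used immediately
    }

    static void gcManaged() {
        Arena auto = Arena.ofAuto();
        MemorySegment seg = auto.allocate(1 << 20);
        // no close(): an automatic arena cannot be closed explicitly;
        // the memory is reclaimed by the GC at some unspecified later
        // time (possibly never, if the application is under no memory
        // pressure)
    }
}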

@fisk (Contributor) left a comment:

Looks good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 13, 2024
@fisk (Contributor) commented Jul 13, 2024

@dougxc might want to have a look at Graal support for this one.

@JornVernee (Member, Author) commented Jul 13, 2024

@dougxc might want to have a look at Graal support for this one.

Yes, I conservatively implemented has_scoped_access() for Graal (see jvmciRuntime.cpp changes). It won't regress anything, but there's still an opportunity for improvement.

@forax (Member) commented Jul 13, 2024

Knowing that all the segments are freed during close() is something you may want. But having the execution time of close() be linear in the number of threads is also problematic. Maybe it means that we need another kind of Arena that works like ofShared() but allows the freeing to be done asynchronously (ofSharedAsyncFree?).

Note that the semantics of ofSharedAsyncFree() would be different from ofAuto(): ofAuto() relies on the GC to free a segment, so the delay before a segment is freed is not time-bounded; if the application has enough memory, the memory of the segment may never be reclaimed. With ofSharedAsyncFree(), the segments are freed by the last thread, so while this mechanism is not deterministic, it is time-bounded.
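
ofSharedAsyncFree() is purely hypothetical (it does not exist in the JDK). Just to make the proposed "last thread frees" semantics concrete, here is a toy reference-counting sketch; all names are invented, and a real implementation would additionally have to prevent re-acquiring a count that has already dropped to zero:

import java.util.concurrent.atomic.AtomicInteger;

// close() just drops the owner's reference and returns immediately;
// whichever thread releases the count last actually frees the memory.
// Freeing is therefore not deterministic, but it is time-bounded: it
// happens as soon as the in-flight accesses end.
class AsyncFreeScope {
    private final AtomicInteger refs = new AtomicInteger(1); // owner's ref
    private volatile boolean closed = false;

    boolean tryAcquire() {                 // called before each access
        if (closed) return false;
        refs.incrementAndGet();
        if (closed) { release(); return false; }  // lost the race
        return true;
    }

    void release() {                       // called after each access
        if (refs.decrementAndGet() == 0) {
            freeMemory();                  // the last thread out frees
        }
    }

    void close() {                         // returns without waiting
        closed = true;
        release();                         // drop the owner's reference
    }

    private void freeMemory() {
        // stand-in for freeing the backing native memory
    }
}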

@uschindler (Member) commented:
Hi Jorn,

Many thanks for working on this!

I have one problem with the benchmark: I think it is not measuring the setup in a way that matches our workload. The basic problem is that we don't want to deoptimize threads that are not related to MemorySegments, so the throughput of those threads should not be affected. For threads currently in a memory-segment read it may have a bit of an effect, but they should recover fast.

The given benchmark only measures the following: it starts many threads; in each, it opens a shared memory segment, does some work, and closes it. So it measures the throughput of the whole "create shared / work on it / close shared" workload. The problems we actually see in Lucene are more that we have many threads working on shared memory segments, or on other tasks not related to memory segments at all, while a few threads are concurrently closing and opening new arenas. With more threads concurrently closing arenas, the throughput of the other threads degrades as well.

So IMHO the benchmark should be improved to have a few (configurable) threads that open/close memory segments, a set of other threads that do unrelated work, and finally a set of threads reading from the memory segments opened by the first group. The test case you wrote fits that workload better; maybe the benchmark should be set up more like the test. A benchmark with that workload should show the improvement more clearly (see the sketch after this comment).

The current benchmark has the problem that it measures the whole open/work/close cycle on shared segments, and closing a shared segment is always heavy, because it has to trigger and wait for the thread-local handshake.

Why is the test preventing inlining of the inner read method?

I may be able to benchmark a Lucene workload with a custom JDK build next week. It might be an idea to use the "wrong" DaCapo benchmark (downgrade to an older version from before dacapobench/dacapobench#264 was fixed, specifically dacapobench/dacapobench@76588b2).

Uwe
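
One way to structure such an asymmetric benchmark is with JMH's @Group/@GroupThreads. The sketch below is illustrative only (made-up class and method names, not the PR's ConcurrentClose benchmark): a few threads churn shared arenas while many others do unrelated work, so the cost that closing imposes on the unrelated threads becomes visible.

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
public class AsymmetricCloseBench {

    @Benchmark
    @Group("mixed")
    @GroupThreads(2)               // a few threads open/close shared arenas
    public void openAndClose() {
        try (Arena arena = Arena.ofShared()) {
            MemorySegment seg = arena.allocate(64, 8);
            seg.set(ValueLayout.JAVA_LONG, 0, 42L);
        }
    }

    @Benchmark
    @Group("mixed")
    @GroupThreads(14)              // many threads do work unrelated to segments
    public long unrelatedWork() {
        long acc = 0;
        for (int i = 0; i < 1_000; i++) {
            acc += Long.rotateLeft(acc ^ i, 7);   // plain CPU work
        }
        return acc;
    }
}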

@uschindler (Member) commented Jul 14, 2024

Maybe it means that we need another kind of Arena that works like ofShared() but allows the freeing to be done asynchronously (ofSharedAsyncFree?). [...] With ofSharedAsyncFree(), the segments are freed by the last thread, so while this mechanism is not deterministic, it is time-bounded.

That's a great suggestion! In our case we just want the index files closed as soon as possible, not at the next GC (which would be horrible and would bring us back to the days of DirectByteBuffer, where some buffers that were no longer in use were never closed). The problem with GC is that the Arena/MemorySegment objects and so on are tiny objects that live for a very long time, especially when they have been in use for quite some time (like an index segment of a Lucene index).

So basically we would like to have: close the arena as soon as possible, but don't wait for it.

Of course, for testing purposes in Lucene we could use ofShared() (to make sure all mmapped files are freed, especially on Windows, as soon as the index is closed), but in production environments we could offer the option to use delayed close to improve throughput.

Uwe

// }
//
// The safepoint at which we're stopped may be in between the liveness check
// and actual memory access, but is itself 'outside' of @Scoped code
On this code comment, a reviewer (Member) asked:

what is @Scoped code? I don't see that annotation mentioned here: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/ScopedValue.html

Another reviewer (Member) replied:

This is the whole magic around the shared arena. It is not public API and is internal to HotSpot/the VM:

@Target({ElementType.METHOD, ElementType.CONSTRUCTOR})
@Retention(RetentionPolicy.RUNTIME)
@interface Scoped { }

/*
 * This function performs a thread-local handshake against all threads running at the time
 * the given session (deopt) was closed. If the handshake for a given thread is processed while
 * one or more threads is found inside a scoped method (that is, a method inside the ScopedMemoryAccess
 * class annotated with the '@Scoped' annotation), and whose local variables mention the session being
 * closed (deopt), this method returns false, signalling that the session cannot be closed safely.
 */

A Contributor added:

This has nothing to do with scoped values; it is an annotation declared in jdk.internal.misc.ScopedMemoryAccess that is known to the VM.

@uschindler (Member) commented Jul 15, 2024:

Basically, if a thread is inside a @Scoped method when the VM starts a thread-local handshake, the VM will deoptimize the top-most frame of all such threads so they can do the "isAlive" check.
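
The constraint described in the code comment above (a safepoint may land between the liveness check and the actual memory access) can be modeled in plain Java. This toy class (not JDK code; all names invented) shows the race that the handshake machinery must exclude for @Scoped methods:

import java.util.concurrent.atomic.AtomicBoolean;

// If close() can complete between the isAlive check and the read, the
// reader touches freed memory. For @Scoped methods the VM makes the
// check-then-access effectively atomic with respect to close() via
// thread-local handshakes (deoptimizing compiled frames where needed).
class CheckThenAccessRace {
    private final AtomicBoolean alive = new AtomicBoolean(true);
    private long[] memory = new long[8];   // stand-in for native memory

    long read(int i) {
        if (!alive.get()) throw new IllegalStateException("closed");
        // <-- a concurrent close() completing here would leave 'memory' freed
        return memory[i];
    }

    void close() {
        alive.set(false);
        memory = null;                     // stand-in for free()
    }
}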

@mcimadamore (Contributor) commented, quoting an earlier suggestion:
Even if the int vs long issue is fixed for this case, I think we should recommend calling withInvokeExactBehavior() after creating any VarHandle, so all the auto-conversions are treated as runtime errors.

This is what I do with my students (when using compareAndSet) and it makes this kind of perf issue easy to find and easy to fix.

Note that this has nothing to do with implicit conversion, as the memory segment var handle is called by our implementation with the correct type (a long). This is likely an issue with bound-check elimination in "long loops".
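
The withInvokeExactBehavior() suggestion quoted above is easy to demonstrate with a memory-segment VarHandle; in this self-contained example (illustrative names), exact-invoke behavior turns the silent int-to-long widening of the offset into a call-site error:

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

class ExactInvokeDemo {
    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(16, 8);
            VarHandle vh = ValueLayout.JAVA_LONG.varHandle();

            vh.set(seg, 0, 42L);        // ok: int offset 0 silently widened

            VarHandle exact = vh.withInvokeExactBehavior();
            exact.set(seg, 0L, 42L);    // ok: types match exactly
            // exact.set(seg, 0, 42L);  // would throw WrongMethodTypeException:
            //                          // int offset where a long is required
        }
    }
}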

@JornVernee (Member, Author) commented Jul 16, 2024:
Added JVMCI/Graal support, courtesy of @c-refice

/contributor add @c-refice

@openjdk (bot) commented Jul 16, 2024

@JornVernee @c-refice was not found in the census.

Syntax: /contributor (add|remove) [@user | openjdk-user | Full Name <email@address>]. For example:

  • /contributor add @openjdk-bot
  • /contributor add duke
  • /contributor add J. Duke <[email protected]>

User names can only be used for users in the census associated with this repository. For other contributors you need to supply the full name and email address.

@JornVernee (Member, Author) commented Jul 16, 2024:

/contributor add Carlo Refice [email protected]

@openjdk (bot) commented Jul 16, 2024

@JornVernee
Contributor Carlo Refice <[email protected]> successfully added.

@JornVernee (Member, Author) commented:

As discussed offline, JVMCI/Graal changes will be handled by a followup PR.

@JornVernee (Member, Author) commented Jul 16, 2024:

/contributor remove Carlo Refice [email protected]

@openjdk (bot) commented Jul 16, 2024

@JornVernee
Contributor Carlo Refice <[email protected]> successfully removed.

@ChrisHegarty (Member) commented:

Thanks for the discussion and changes in this PR - it's super helpful (in terms of what we can do to work around the issue), as well as a great improvement for the future.

@ChrisHegarty (Member) commented Jul 17, 2024

Effectively, once all the issues surrounding reachability fences are addressed, we should be able to achieve numbers similar to the above even in the case of shared close.

Is there an issue where I can follow this? [ EDIT: oh! it's JDK-8290892 ]

@mcimadamore (Contributor) left a comment:

Changes in scopedMemoryAccess and benchmark look good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 17, 2024
@vnkozlov (Contributor) left a comment:

I am fine with the compiler and CI changes - it is just marking the nmethod as having scoped access.

@iwanowww (Contributor) left a comment:

Looks good.

@JornVernee (Member, Author) commented:
/integrate

@openjdk (bot) commented Jul 18, 2024

Going to push as commit 7bf5313.
Since your change was applied there have been 181 commits pushed to the master branch:

  • 1b83bd9: 8336661: Parallel: Remove stacks_empty assert in PSScavenge::invoke
  • 72297d2: 8317720: RISC-V: Implement Adler32 intrinsic
  • 21a6cf8: 8336587: failure_handler lldb command times out on macosx-aarch64 core file
  • 78cc0f9: 8336091: Fix HTML warnings in the generated HTML files
  • bcb5e69: 8335921: Fix HotSpot VM build without JVMTI
  • 10186ff: 8336300: DateFormatSymbols#getInstanceRef returns non-cached instance
  • 7ec55df: 8336638: Parallel: Remove redundant mangle in PSScavenge::invoke
  • 6df7acb: 8299080: Wrong default value of snippet lang attribute
  • 8713628: 8334217: [AIX] Misleading error messages after JDK-8320005
  • 67979eb: 8334781: JFR crash: assert(((((JfrTraceIdBits::load(klass)) & ((JfrTraceIdEpoch::this_epoch_method_and_class_bits()))) != 0))) failed: invariant
  • ... and 171 more: https://git.openjdk.org/jdk/compare/dd74e7f8c1570ed34c89f4aca184f5668e4471db...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jul 18, 2024
@openjdk openjdk bot closed this Jul 18, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 18, 2024
@openjdk (bot) commented Jul 18, 2024

@JornVernee Pushed as commit 7bf5313.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@JornVernee JornVernee deleted the FasterSharedClose branch July 18, 2024 11:04
@mcimadamore (Contributor) commented:

  • there is some issue involving segment access with an int induction variable, which we should investigate separately

This issue is tracked here:
https://bugs.openjdk.org/browse/JDK-8336759
