jdk_foreign_0_FAILED - java/foreign/TestHandshake.java #13211
A slightly different error at JDK16 0.27 release build https://openj9-jenkins.osuosl.org/job/Test_openjdk16_j9_sanity.openjdk_x86-64_windows_Release/6/consoleFull
Another one at JDK 16 0.27 release build https://openj9-jenkins.osuosl.org/job/Test_openjdk16_j9_sanity.openjdk_ppc64le_linux_Release/6/consoleFull
@EricYangIBM This may be related to #13234 |
I believe the issue is that https://github.com/ibmruntimes/openj9-openjdk-jdk16/blob/openj9/test/jdk/java/foreign/TestHandshake.java#L242 keeps throwing and looping without ever successfully closing the segment, because there is always another thread accessing that segment. I'm not sure why it only happens for the SegmentMismatchAccessor variation; I think it is because ScopedMemoryAccess.vectorizedMismatch takes longer and therefore leaves fewer opportunities to close the segment. Would having closeScope0 interrupt threads that are accessing the scope during close with an exception increase the success rate of close? I get the feeling that this is an issue with the test, since it retries until close succeeds: there is a chance that every close attempt lands during a segment access. Also, I am intermittently getting this failure locally. Do the release builds run the test over multiple iterations? |
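The retry-until-success pattern described in this comment can be modelled in a few lines. This is only a toy sketch under stated assumptions, not the code in TestHandshake.java or ScopedMemoryAccess: `ToyScope`, its counter, and `closeWithRetries` are hypothetical names, and the real close inspects thread stacks rather than keeping a counter. The retry bound is also an addition so that the sketch always terminates.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a shared scope (hypothetical; not the real implementation). */
final class ToyScope {
    private static final int CLOSED = -1;
    // >= 0: number of accessors currently inside the scope; -1: closed.
    private final AtomicInteger state = new AtomicInteger(0);

    /** An accessor enters the scope; fails once the scope is closed. */
    boolean acquire() {
        while (true) {
            int s = state.get();
            if (s == CLOSED) {
                return false;
            }
            if (state.compareAndSet(s, s + 1)) {
                return true;
            }
        }
    }

    /** An accessor leaves the scope. */
    void release() {
        state.decrementAndGet();
    }

    /** Succeeds only if no accessor is inside at this instant. */
    boolean tryClose() {
        return state.compareAndSet(0, CLOSED);
    }

    /** "Try until success" closer, bounded here so the sketch terminates. */
    int closeWithRetries(int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryClose()) {
                return attempt; // number of attempts that were needed
            }
        }
        return -1; // never found a gap between accesses
    }
}
```

In this model, an accessor that stays inside the scope for a long time (as SegmentMismatchAccessor appears to) makes every `tryClose` attempt fail, no matter how many retries are allowed.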
Release builds don't do anything special, they just run the test. |
Can you set |
Testing that right now, but I think it will. https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder/17169/console shows SegmentMismatchAccessor iterations that successfully close the segment (e.g. at 348 ms, 5049 ms for iteration 0 and 1 for the last grinder iteration). If the timeout is increased I think there will be more successful SegmentMismatchAccessor iterations. |
I think the problem is that SegmentMismatchAccessor takes too long (it compares two 1,000,000-byte native memory segments), which prevents the main thread from obtaining the lock. In https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder/17198/consoleFull you can see that an accessor (thread) often holds the lock for 2000 ms. In the HotSpot implementation, closeScope interrupts accessor threads so the closer thread can immediately acquire the lock. Our implementation doesn't do this, so we get the long hold times. |
Update: Replacing https://github.com/ibmruntimes/openj9-openjdk-jdk16/blob/openj9/test/jdk/java/foreign/TestHandshake.java#L76 with
    while (!accessExecutor.isTerminated()) {
        Thread.sleep(5000);
    }
eventually results in the test passing (locally on x86-64 linux). I will open a PR to exclude this test and an issue to add interrupts to closeScope0. |
Test depends on RI behaviour not implemented. Issue: eclipse-openj9/openj9#13211 Signed-off-by: Eric Yang <[email protected]>
The
fyi @llxia |
Seems the test was excluded after the build which failed. |
The test in question has been excluded,
Moving the issue to 0.29. |
In JDK22, the definition has changed to native void closeScope0(MemorySessionImpl session, ScopedAccessError error); The return type has changed from boolean to void, and a second parameter is re-introduced: "ScopedAccessError error". Previously, closeScope0 would be retried until it returned true, which indicated that the scope/session was not found on any thread's stack. The new behaviour sets an asynchronous exception for all threads that have the scope/session on their stack. The second parameter, error, is the asynchronous exception that is thrown. Before throwing the error, the thread verifies whether the scope/session is still on its stack. If not, the asynchronous exception is cleared and not thrown. Related: eclipse-openj9#13211 (comment) Signed-off-by: Babneet Singh <[email protected]>
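The JDK22 behaviour described in this commit message can be illustrated with a small single-threaded model. It is a sketch under assumptions, not the VM code: `ToyThread`, `ToyCloser`, `scopesOnStack`, and `processPending` are hypothetical names, a plain `RuntimeException` stands in for `ScopedAccessError`, and a `Deque` stands in for a real stack walk.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a VM thread. */
final class ToyThread {
    // Scopes the thread is currently operating on (models the stack walk).
    final Deque<Object> scopesOnStack = new ArrayDeque<>();
    Object pendingScope;            // scope the pending error refers to
    RuntimeException pendingError;  // stand-in for the async ScopedAccessError

    /** Called when the thread reaches its next async check point. */
    void processPending() {
        RuntimeException error = pendingError;
        Object scope = pendingScope;
        // The pending error is always cleared, whether or not it is thrown.
        pendingError = null;
        pendingScope = null;
        // Throw only if the scope is still on this thread's stack.
        if (error != null && scopesOnStack.contains(scope)) {
            throw error;
        }
    }
}

/** Hypothetical model of the void closeScope0(session, error) semantics. */
final class ToyCloser {
    static void closeScope0(List<ToyThread> threads, Object scope, RuntimeException error) {
        for (ToyThread t : threads) {
            // Only threads that have the scope on their stack get the error.
            if (t.scopesOnStack.contains(scope)) {
                t.pendingScope = scope;
                t.pendingError = error;
            }
        }
    }
}
```

A thread that leaves the scope before reaching its check point has the error cleared without it being thrown, which matches the last two sentences of the commit message.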
There are gaps where async exceptions are not processed in time (e.g. JIT compiled code in a loop). Threads will wait in closeScope0 until J9VMThread->scopedError (async exception) is transferred to J9VMThread->currentException. The wait prevents a MemorySession from being closed until no more operations are being performed on it. Related: eclipse-openj9#13211 Signed-off-by: Babneet Singh <[email protected]>
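The wait described in this commit message can be sketched in Java as a handoff between a closer and a target thread. This is an illustration only: the actual change lives in the VM and uses the J9VMThread fields named above, while `HandoffSketch`, `asyncCheck`, `closeAndWait`, and the timeout are assumptions made for this sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical Java model of the scopedError -> currentException handoff. */
final class HandoffSketch {
    final AtomicReference<Throwable> scopedError = new AtomicReference<>();
    final AtomicReference<Throwable> currentException = new AtomicReference<>();
    private final CountDownLatch transferred = new CountDownLatch(1);

    /** Target thread: called whenever it reaches an async check point. */
    void asyncCheck() {
        Throwable e = scopedError.getAndSet(null);
        if (e != null) {
            currentException.set(e);
            transferred.countDown(); // tell the closer the error was picked up
        }
    }

    /**
     * Closer: post the error, then wait until the target has picked it up.
     * The timeout is an addition so the sketch cannot hang forever.
     */
    boolean closeAndWait(Throwable error, long timeoutMs) {
        scopedError.set(error);
        try {
            return transferred.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

If the target thread is slow to reach a check point (the "JIT compiled code in a loop" case), the closer simply stays blocked in `closeAndWait` for longer instead of closing the session underneath a running operation.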
TestHandshake has been fixed by - eclipse-openj9/openj9#19167 - eclipse-openj9/openj9#19412 Related: eclipse-openj9/openj9#13211 Signed-off-by: Babneet Singh <[email protected]>
There are still failures on jdk17 https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder/40267/
To address the synchronization issues, RI introduced the new async handshake approach only in JDK22: https://bugs.openjdk.org/browse/JDK-8310644. JDK17 still uses the old impl; the async handshake approach will need to be manually backported to JDK17 and JDK21 to address the issues reported in #13211 (comment). |
I don't think we should bother fixing jdk17 since it's an incubator. Pls exclude the test there again. |
The new async handshake approach to fix synchronization issues was introduced only in JDK22. JDK17 and JDK21 still use the old implementation, which has synchronization issues. Thus, TestHandshake is being excluded in JDK17 and JDK21. Related: eclipse-openj9/openj9#13211 Signed-off-by: Babneet Singh <[email protected]>
The new async handshake approach to fix synchronization issues was introduced only in JDK22. See https://bugs.openjdk.org/browse/JDK-8310644 for more details on the async handshake approach. JDK17 and JDK21 still use the old implementation, which has synchronization issues. Thus, TestHandshake is being excluded in JDK17 and JDK21. Related: eclipse-openj9/openj9#13211 Signed-off-by: Babneet Singh <[email protected]>
Discussed with @tajila offline. We should also exclude it for JDK21. We will re-enable it once the async handshake approach is ported to JDK21 by the RI. Opened adoptium/aqa-tests#5279 to exclude TestHandshake in JDK17 and JDK21. |
Acknowledging this may never happen, since it's preview in jdk21. |
My understanding is that the problem is not totally resolved and there is still work to be done: an interface between the JIT and VM to determine if a method has been inlined. We won't close this but move it forward. |
A 50x jdk22 grinder of java/foreign/TestHandshake.java on Windows passed, which is better than past results #13211 (comment) |
JDK22 x86-64_windows(
50x grinder - 3/50 failed
Failure link
https://openj9-jenkins.osuosl.org/job/Test_openjdk16_j9_sanity.openjdk_ppc64_aix_Nightly/55/consoleFull
Rerun in Grinder
Optional info
Failure output (captured from console output)