OpenJDK crash in Compiled_method=UnicodeCasingTest.test #17515
@0xdaryl fyi |
5x x 5 grinder https://openj9-jenkins.osuosl.org/job/Grinder/2480/ |
@zl-wang : please assign for investigation |
@bhavanisn please take a look |
The crash was not duplicated in the grinder. |
@pshipton As the crash is not reproducible, is there a way to get the core dump files? In parallel, I am also trying to reproduce the issue. |
The core file is found in openjdk_test_output.tar.gz at |
I see it now. I ran |
https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.openjdk_ppc64le_linux_Release/62 - cent7-ppcle-3
|
There is no breakthrough yet, but I am posting this data to see whether anything looks familiar from earlier crashes.
Next: I am currently tracing how the values in those registers propagate. It's a long chase; I will post when I have solid information. |
Out of the multiple chases on this core file, here is one: two consecutive instructions that load into the same register
|
r23 is expected to be an array object: the crashing load is trying to load the array length. is the first store to frame offset 0x390 a stack slot initialization (for GC ... if it remained garbage, it could crash the GC) or does it originate from bytecode? at least, you can look into that for better understanding later. listing all 0x390 accesses is not enough ... you need to see where else the r23 value is established (either loaded into r23 or calculated into r23). hopefully, the two ld into r23 from offset 0x390 are the only places establishing the r23 value. then, we can assert that the NULL r23 came from the first store (to 0x390). this at last can help us say what the problem is. |
if you can look back at much earlier trees (ILs), i believe you will see the NULLChk, which must have been optimized out. which optimization removed it? that can point to the problem too. |
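For readers less familiar with what the NULLChk protects: at the Java level, reading an array's length through a null reference must raise a NullPointerException rather than touch memory. The sketch below only illustrates that language-level guarantee. The class and method names are made up for illustration, and it does not reproduce the crash or the compiled code from this issue.

```java
class NullCheckSketch {
    // Returns the array length, or -1 when the implicit null check fires.
    // Name and return convention are hypothetical, chosen for this illustration.
    static int lengthOrMinusOne(char[] value) {
        try {
            // The length read is only legal after 'value' has been null-checked.
            return value.length;
        } catch (NullPointerException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthOrMinusOne(new char[] {'a', 'b'})); // prints 2
        System.out.println(lengthOrMinusOne(null));                  // prints -1
    }
}
```

If compiled code reaches the length load with a null reference and no check in front of it, the load faults instead of throwing, which is consistent with the crash described above.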
@bhavanisn @zl-wang Any new news on this issue? |
it is not re-producible (in 300 runs of grinder) … only investigating based on the core file. i think it is justified to defer it to 0.41. |
The trace file and instruction listing are large, so the analysis will take time. It looks like we might be missing the store of the initial value of gr23 (which holds the java/lang/String type) to the stack slot at 0x390. In the instructions it is only seen to be initialized to Place where JIT compilation logs are stored: |
Small update on investigation: Below is the related IL: Crashing global register (gr23) initialization is loading from
Temp slot 108 Initialization:
After
At this point the load from
I am talking to the optimizer team (@hzongaro) to investigate further, for which a trace file with If this test fails during a test run, it would be useful to have the trace collected with this option enabled. Is there a way to do that?
|
As we do not have enough information to debug and it is not reproducible locally or on grinder, can we close this and reopen it if it recurs? |
https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk11_j9_sanity.openjdk_ppc64le_linux_testList_2/29 - p10rhel005
|
Reopening as there is another crash, although it's a bit different. |
No luck reproducing the failure with In the meantime, a fix won't make it into the 0.46 release, so I will move this out to 0.48. |
I forced Before General Store Sinking:
After General Store Sinking:
From the trace information for General Store Sinking for these blocks:
So it looks like the store to However, the anchored reference to |
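The hazard discussed in this comment, a store moved below a load that stays where it was, can be illustrated with a toy model. This is a minimal sketch, assuming a single temp slot and a made-up op encoding (`store:<value>` and `load`). It is not the OMR IL and not the General Store Sinking implementation, and the `sink` helper is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StoreSinkingSketch {
    // Executes toy ops in order against one temp slot and records what each "load" observes.
    static List<String> run(List<String> ops) {
        String slot = null; // the slot starts out null, like a zero-initialized stack slot
        List<String> observed = new ArrayList<>();
        for (String op : ops) {
            if (op.startsWith("store:")) {
                slot = op.substring("store:".length());
            } else {
                observed.add(slot);
            }
        }
        return observed;
    }

    // Naively moves the op at index 'from' to just after the op originally at index 'to'
    // (from < to), without checking whether it is moved past a load of the same slot.
    static List<String> sink(List<String> ops, int from, int to) {
        List<String> out = new ArrayList<>(ops);
        String moved = out.remove(from);
        out.add(to, moved);
        return out;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("store:abc", "load", "load");
        List<String> after = sink(before, 0, 1);
        System.out.println(run(before)); // prints [abc, abc]
        System.out.println(run(after));  // prints [null, abc]
    }
}
```

In the toy run, the load left above the sunk store observes the slot's initial null, which is the kind of stale read a leftover anchored reference could produce.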
Internal build
|
https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.openjdk_ppc64le_linux_Release_testList_1/15/ - ubu24-ppc64le-2
|
https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.openjdk_ppc64le_linux_Nightly_testList_1/91 - ubu22-ppc64le-3
|
Several months ago, @vijaysun-omr and I discussed my observation that trees were left behind that contained references to stores that had been sunk past those references. A later pass of Dead Trees Elimination would ordinarily remove those trees but failed to in this case, because that optimization decided to stop prematurely:
At the time he asked how Dead Store Elimination handles similar situations as it might remove a store that in turn allows another store to be removed. I've finally come back to this issue, and it appears that a similar problem must have been encountered in Dead Store Elimination about ten years ago. That optimization contains logic that looks for "unsafe" references that appear in trees beneath I will look at refactoring the code that deals with unsafe references from Dead Store Elimination so that it can be used by General Store Sinking as well. |
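The cascading effect mentioned here, where removing one store makes another removable, can be sketched with a toy, flow-insensitive model. Everything in it is an assumption made for illustration: stores are encoded as `{target, source}` pairs, a store counts as dead when nothing reads its target, and the class and method names are hypothetical. It is not how Dead Store Elimination is implemented in OMR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DeadStoreCascadeSketch {
    // Sample program: a = 1; b = a; c = 1; with only c live afterwards.
    static List<String[]> example() {
        return Arrays.asList(
            new String[] {"a", "1"},
            new String[] {"b", "a"},
            new String[] {"c", "1"});
    }

    // One pass: keep a store only if its target is read by some store or is live-out.
    static List<String[]> onePass(List<String[]> stores, Set<String> liveOut) {
        Set<String> read = new HashSet<>(liveOut);
        for (String[] s : stores) {
            read.add(s[1]);
        }
        List<String[]> kept = new ArrayList<>();
        for (String[] s : stores) {
            if (read.contains(s[0])) {
                kept.add(s);
            }
        }
        return kept;
    }

    // Repeats the pass until nothing more is removed.
    static List<String[]> toFixpoint(List<String[]> stores, Set<String> liveOut) {
        List<String[]> current = stores;
        while (true) {
            List<String[]> next = onePass(current, liveOut);
            if (next.size() == current.size()) {
                return next;
            }
            current = next;
        }
    }

    public static void main(String[] args) {
        Set<String> liveOut = Collections.singleton("c");
        System.out.println(onePass(example(), liveOut).size());    // prints 2
        System.out.println(toFixpoint(example(), liveOut).size()); // prints 1
    }
}
```

A single pass removes `b = a` but keeps `a = 1`, because `a` was still read when the pass started. Only a second pass sees that `a` is now unused, which is why stopping early can leave removable code behind.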
I managed to create a reduced test case that reproduces this problem, with a heavy dose of compiler options to set up the scenario in the IL:
Update: Removed the |
I'll need to ensure the fix handles stores that have a non-zero reference count correctly. This will need to move out to 0.51. |
https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.openjdk_ppc64le_linux_Nightly/541 - cent7-ppcle-1
jdk_lang_1
java/lang/String/UnicodeCasingTest.java
https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk11_j9_sanity.openjdk_ppc64le_linux_Nightly/541/openjdk_test_output.tar.gz