Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8349105: Pagination: exception initializing in a background thread #1698

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

andy-goryachev-oracle
Copy link
Contributor

@andy-goryachev-oracle andy-goryachev-oracle commented Feb 6, 2025

Root Cause

Animation gets started in a background thread, which causes the animation handler to run in the FX application thread, thus creating simultaneous access to the control's fields (list of children in this case).

Solution

Postpone the animation unless running in the FX application thread. There is no functional difference if the component is created/used in the FX application thread.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8349105: Pagination: exception initializing in a background thread (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jfx.git pull/1698/head:pull/1698
$ git checkout pull/1698

Update a local copy of the PR:
$ git checkout pull/1698
$ git pull https://git.openjdk.org/jfx.git pull/1698/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1698

View PR using the GUI difftool:
$ git pr show -t 1698

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jfx/pull/1698.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 6, 2025

👋 Welcome back angorya! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Feb 6, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@andy-goryachev-oracle andy-goryachev-oracle marked this pull request as ready for review February 6, 2025 20:55
@openjdk openjdk bot added the rfr Ready for review label Feb 6, 2025
@mlbridge
Copy link

mlbridge bot commented Feb 6, 2025

Webrevs

@kevinrushforth
Copy link
Member

/reviewers 2

@kevinrushforth kevinrushforth self-requested a review February 6, 2025 23:11
@openjdk
Copy link

openjdk bot commented Feb 6, 2025

@kevinrushforth
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

Copy link
Member

@kevinrushforth kevinrushforth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test runs out of heap space for me on Windows with or without your fix using the default heap size. This may point to a leak, although bumping up the memory causes it to pass with your fix.

The fix looks good, although the test doesn't catch the bug most of the time (even when bumping up the heap).

@andy-goryachev-oracle
Copy link
Contributor Author

The test runs out of heap space for me on Windows

This may be a memory leak, or it could simply be expected - after all, it is a stress test and we do create heavy objects in a tight loop.

I think it's worth investigating in the context of this ticket. Thank you!

@kevinrushforth
Copy link
Member

This may be a memory leak, or it could simply be expected - after all, it is a stress test and we do create heavy objects in a tight loop.

I think it's worth investigating in the context of this ticket. Thank you!

Yep, it does needs to be investigated for this PR, since we can't enable a test that will cause OOM when running the tests normally.

I'll be interested to know what you find.

@andy-goryachev-oracle
Copy link
Contributor Author

I am unable to reproduce OOM on Windows 11 with the fix (I did see the OOM earlier when it was failing before the fix).

I've tried the unmodified test, as well as the extended version with the DURATION doubled to 10 seconds and added @RepeatedTest(value = 5) to the pagination() test. Same extended test passes just fine on macOS.

A cursory look at the Pagination/PaginationSkin code reveals no obvious candidates for memory leaks. Maybe try running it again, now that it has the text layout fixes as well?

@kevinrushforth
Copy link
Member

I still see the OOM. Are you running it from gradle? Here is the command line that will reproduce it, after first building with: "gradle sdk shims":

gradle --info -PTEST_ONLY=true -PFULL_TEST=true -PUSE_ROBOT=true \
    :systemTests:test --tests NodeInitializationStressTest.pagination

@kevinrushforth
Copy link
Member

I also see the OOM on Mac.

@andy-goryachev-oracle
Copy link
Contributor Author

I see the problem with gradle (I was running it in Eclipse). We apparently set -Xmx512m which is simply inadequate. We should at least double that, or even go to 2GB.
Where is it being set?

@kevinrushforth
Copy link
Member

I see the problem with gradle (I was running it in Eclipse). We apparently set -Xmx512m which is simply inadequate. We should at least double that, or even go to 2GB.

Where is it being set?

The default is set by the gradle installation itself, which is not controlled by us. I know you can bump the memory with a gradle option on the command line (which we do for our CI builds, but we just set it to 512Mb).

We can trivially change our CI test script, but I'd be hesitant to require every developer to set that on the command line. A better solution might be to set the max heap to 1Gb in the system tests project in build.gradle.

@andy-goryachev-oracle
Copy link
Contributor Author

A better solution might be to set the max heap to 1Gb in the system tests project in build.gradle.

created https://bugs.openjdk.org/browse/JDK-8349679

@kevinrushforth
Copy link
Member

A quick test suggests that there is still something up with Pagination that increasing the memory won't fully resolve. Even bumping it up to 2Gb fails sometimes on my Mac (e.g., if I increase the duration to 10 sec). I added -verbose:gc and the memory shoots up to almost 2Gb and doesn't really ever go down. None of the other tests come even close to hitting a memory problem, and when a GC happens, the memory in use drops down to about 12-15 Mbytes (not surprising given that the entire test suite runs with the default heap size if pagination is disabled).

More study is needed, so let's pick this up next week.

@andy-goryachev-oracle
Copy link
Contributor Author

it looks like the amount of garbage generated during this test exceeds the ability of gc to collect it.

adding System.gc() to the test allows it to run even after -Xmx500m, though I still would like to increase the heap size in JDK-8349679

Copy link
Member

@kevinrushforth kevinrushforth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the explicit call to System.gc() this now passes for me. I was also able to reproduce the failure without your fix using the latest version of the test, and confirm that it passes with your fix.

I think the frequency of calling System.gc() is probably too high. Perhaps calling it every 10 or 100 times through the loop would suffice (and then it would be a better stress test of object allocation, since System.gc() effectively introduces delay)? That might be difficult to do without modifying the test method, so we could consider this for a follow-up.

Somewhat related, I'm convinced that there is a leak somewhere. If I run just the pagination test for 100 seconds rather than 5, it's easy to see that the memory keeps growing -- with or without the fix for this bug. We should file a follow-up bug for this.

@andy-goryachev-oracle
Copy link
Contributor Author

You are probably right about System.gc() being called too often. I would say though that the purpose of this test is to uncover initialization issues rather than finding other problems with the controls, so it's probably ok as is.

For the OOME - there seems to be no leak, since the memory usage drops down to initial values either using System.gc() in the loop, or when running in the IDE with the large heap: the memory fills up rather quickly, then drops again down to low value when one hits "Run GC" in the VirtualVM for example.

(we also have no leak detected in the skin test and no memory issues logged against Pagination in JBS)

@kevinrushforth
Copy link
Member

kevinrushforth commented Feb 10, 2025

You are probably right about System.gc() being called too often. I would say though that the purpose of this test is to uncover initialization issues rather than finding other problems with the controls, so it's probably ok as is.

it still might be worth a follow-up bug. As it is, calling System.gc() each time through the loop means that you are creating far fewer objects than you would in a tight loop (by an order of magnitude based on a few quick tests I did).

For the OOME - there seems to be no leak, since the memory usage drops down to initial values either using System.gc() in the loop, or when running in the IDE with the large heap: the memory fills up rather quickly, then drops again down to low value when one hits "Run GC" in the VirtualVM for example.

When I run it with -verbose:gc I see the memory continually increasing, although slowly. I'll try it with Visual VM and see if I spot anything. In any case, if there is a leak here, it isn't a large one (and isn't directly related to this bug), so would be something to look into as a follow-up.

@andy-goryachev-oracle
Copy link
Contributor Author

Created https://bugs.openjdk.org/browse/JDK-8349750 placeholder for collecting follow-up issues.

Copy link
Member

@kevinrushforth kevinrushforth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed JDK-8349756 to track the leak. I think the call to System.gc() is effective only because it slows down the tight loop.

Comment on lines 414 to 417
// this test generates a lot of garbage quickly
if (nextBoolean(0.1)) {
System.gc();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While investigation the Pagination leak I also took a closer look at this System.gc(). The reason this prevents the OOM has nothing to do with the GC per se, but rather it slows the loops down so much that they don't generate nearly as much garbage in the 5 seconds the test runs in. You can verify that yourself by replacing the System.gc() with sleep(10) or similar (depending on the speed of your system), and get the same effect.

I don't think there is a good reason to call System.gc() at all in this loop. Once the leak is fixed, it should run fine at full speed.

@andy-goryachev-oracle
Copy link
Contributor Author

Please review #1705 before this one.

@openjdk
Copy link

openjdk bot commented Feb 11, 2025

⚠️ @andy-goryachev-oracle This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request in integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

@andy-goryachev-oracle
Copy link
Contributor Author

For testing purposes, merged this PR with the memory leak fix #1705 . Once the latter is integrated, unrelated changes will disappear.

@kevinrushforth kevinrushforth self-requested a review February 12, 2025 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rfr Ready for review
Development

Successfully merging this pull request may close these issues.

2 participants