Add memory reporting for UI tests and exit for E4Testable on OOM #2433

Open
wants to merge 1 commit into
base: master

iloveeclipse
Member

Maybe this could help in understanding the OOM errors on Jenkins.

See #2432

Contributor

github-actions bot commented Oct 21, 2024

Test Results

 1 821 files  ±0   1 821 suites  ±0   2h 9m 16s ⏱️ + 12m 59s
 7 712 tests ±0   7 484 ✅ +1  228 💤 ±0  0 ❌  - 1 
24 297 runs  ±0  23 550 ✅ +1  747 💤 ±0  0 ❌  - 1 

Results for commit 820b875. ± Comparison against base commit 0a3c1fd.

♻️ This comment has been updated with latest results.

@iloveeclipse
Member Author

Looks like the OOMs are now happening later, after UiTestSuite has been executed:

Tests run: 1659, Failures: 7, Errors: 44, Skipped: 196


Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Active Thread: Equinox Container: 97a6556b-edd7-45ed-ab09-86bf747df888"
java.lang.OutOfMemoryError: Java heap space
Exception in thread "Worker-47" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Worker-30" java.lang.OutOfMemoryError: Java heap space

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Worker-48"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Worker-49"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Worker-50"

The last memory output looks like this:

#################################################
org.eclipse.ui.tests.preferences.ViewerItemsLimitTest
########### Memory usage reported by JVM ########
   1.073.741.824 bytes max heap
   1.034.944.512 bytes heap allocated
     659.199.560 bytes free heap
     375.744.952 bytes used heap
#################################################

The only test after that was org.eclipse.ui.tests.stress.OpenCloseTest, which doesn't look suspicious at first glance.
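For readers who want to reproduce a report in the shape shown above, here is a minimal sketch built on the standard `Runtime` API. It is not the code from this PR (the diff is not shown in this conversation); the class name `MemoryReport` and the test name passed in `main` are hypothetical. It assumes that "heap allocated" corresponds to `totalMemory()`, that "used heap" is `totalMemory() - freeMemory()`, and that the dotted grouping comes from a German-style locale.

```java
import java.util.Locale;

// Hypothetical helper, not the PR's implementation: formats a memory report
// in the same shape as the log excerpt above.
final class MemoryReport {
    private static final String RULE = "#################################################";

    // Builds the report from explicit values so the arithmetic is easy to check.
    static String format(String testName, long max, long total, long free) {
        long used = total - free; // assumption: used heap = allocated heap - free heap
        StringBuilder sb = new StringBuilder();
        sb.append(RULE).append('\n');
        sb.append(testName).append('\n');
        sb.append("########### Memory usage reported by JVM ########").append('\n');
        sb.append(line(max, "max heap"));
        sb.append(line(total, "heap allocated"));
        sb.append(line(free, "free heap"));
        sb.append(line(used, "used heap"));
        sb.append(RULE);
        return sb.toString();
    }

    // Right-aligns the byte count in 16 columns; Locale.GERMANY groups digits with '.'.
    private static String line(long bytes, String label) {
        return String.format(Locale.GERMANY, "%,16d bytes %s", bytes, label) + "\n";
    }

    // Snapshot of the current JVM; the three values are read one after another,
    // so they can be slightly inconsistent under concurrent allocation.
    static String current(String testName) {
        Runtime rt = Runtime.getRuntime();
        return format(testName, rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
    }

    public static void main(String[] args) {
        System.out.println(current("example.HypotheticalTest"));
    }
}
```

With the numbers from the log, 1.034.944.512 allocated minus 659.199.560 free gives the reported 375.744.952 bytes used, which is only about a third of the 1 GB maximum.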

@merks
Contributor

merks commented Oct 21, 2024

Thank you for investing time to track down this problem.

@iloveeclipse
Member Author

The last build had no OOMs, but multiple tests failed due to eclipse-platform/eclipse.platform#1592, see

https://ci.eclipse.org/platform/job/eclipse.platform.ui/job/PR-2433/2/#showFailuresLink

I guess the OOM problem could be related to, or fixed by, eclipse-jdt/eclipse.jdt.core#3126. I saw that eclipse-jdt/eclipse.jdt.core@9c11818 broke a lot of things in our company's internal tests, including endless loops and test crashes. However, our tests rely a lot on JDT, while the platform UI tests only use JDT indirectly in one or two tests...

But that could be just a coincidence; I will retrigger the tests once again.

@iloveeclipse
Member Author

> But that could be just a coincidence; I will retrigger the tests once again.

See https://ci.eclipse.org/platform/job/eclipse.platform.ui/job/PR-2433/3/
No OOMs, but a lot of test failures related to eclipse-platform/eclipse.platform#1592.

@iloveeclipse
Member Author

Hmm, interestingly there are OOM errors on #2438.

@akurtakov
Member

Neither the test execution order nor the bundle build time is fixed, which makes it almost impossible to tell why the OOM happens on Jenkins, IMO. My way of working towards this is to reduce tests to the minimum setup needed and to reduce the usage of older versions of libs, hoping that at some point this will pay off not only in easier-to-understand tests but also in less stress on the build machines.

@iloveeclipse
Member Author

> Neither the test execution order nor the bundle build time is fixed, which makes it almost impossible to tell why the OOM happens on Jenkins, IMO.

Not sure why you think the test execution order is not fixed. It seems to be defined by the UiTestSuite; at least that is what I observe in the log.

> My way of working towards this is to reduce tests to the minimum setup needed and to reduce the usage of older versions of libs, hoping that at some point this will pay off not only in easier-to-understand tests but also in less stress on the build machines.

I don't think this is applicable here. So far it looks like the OOMs appear at the end of the suite, and none of the measurements printed so far showed any excessive memory use at all.

So either we have something that hits memory at the end of the tests (what???), or the JVM is too lazy to run GC in time and crashes with OOMs just because GC has no free thread/CPU core to do the work. The latter would match the observation that we have a lot of blocked threads and therefore also a lot of failures due to eclipse-platform/eclipse.platform#1592. So far I have seen no OOMs after adding explicit gc() calls on teardown in this PR.

If so, adding explicit gc() calls could stabilize test execution on the poor VMs we have.
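To make the "explicit gc() on teardown" idea concrete, here is a minimal sketch under stated assumptions. It is not the code from this PR; the class `GcOnTeardown` and its method names are hypothetical. Note that `System.gc()` is only a request to the JVM, so the effect described above is an observation on this CI setup, not a guarantee.

```java
// Hypothetical helper, not the PR's implementation.
final class GcOnTeardown {

    // Used heap as seen by the JVM: allocated heap minus free heap.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Requests a collection and returns the used heap afterwards.
    // System.gc() is a hint; the JVM may ignore it or may collect only partially.
    static long collectAndMeasure() {
        System.gc();
        return usedHeap();
    }

    // In a JUnit 4 test class this would typically be called from a method
    // annotated with @After; here it is a plain main so the sketch runs
    // without any test framework on the classpath.
    public static void main(String[] args) {
        long before = usedHeap();
        long after = collectAndMeasure();
        System.out.println("used heap before gc request: " + before + " bytes");
        System.out.println("used heap after gc request:  " + after + " bytes");
    }
}
```

If the JVM honours the request, a collection runs at a known point between tests rather than whenever a collector thread next gets CPU time, which is the scenario suspected above.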

@akurtakov
Member

akurtakov commented Oct 22, 2024

> > Neither the test execution order nor the bundle build time is fixed, which makes it almost impossible to tell why the OOM happens on Jenkins, IMO.

> Not sure why you think the test execution order is not fixed. It seems to be defined by the UiTestSuite; at least that is what I observe in the log.

"From version 4.11, JUnit will by default use a deterministic, but not predictable, order." (from https://github.com/junit-team/junit4/wiki/test-execution-order). So as long as nothing changes, the order stays the same, but whenever there is some change, the order can turn out totally different (in my experience).

@iloveeclipse
Copy link
Member Author

> "From version 4.11, JUnit will by default use a deterministic, but not predictable, order." (from https://github.com/junit-team/junit4/wiki/test-execution-order). So as long as nothing changes, the order stays the same, but whenever there is some change, the order can turn out totally different (in my experience).

Sure, but this is about test methods within a test class; I was talking about test classes.
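As an illustration of what "deterministic, but not predictable" means for test methods, here is a small sketch that sorts method names by their `String.hashCode()`, with ties broken alphabetically. To my understanding this mirrors how JUnit 4's default method sorter works, but treat that as an assumption rather than a reference; the class `HashOrderDemo` and the method names are hypothetical. It says nothing about the order of test classes in a suite, which is the point made in the reply above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo, not JUnit code: orders names by hash code, then by name.
final class HashOrderDemo {

    // Assumed to resemble JUnit 4's default ordering of test methods.
    static final Comparator<String> BY_NAME_HASH = (a, b) -> {
        int ha = a.hashCode();
        int hb = b.hashCode();
        if (ha != hb) {
            return ha < hb ? -1 : 1;
        }
        return a.compareTo(b); // tie-break keeps the order total
    };

    // Returns a sorted copy; the input list is left untouched.
    static List<String> order(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(BY_NAME_HASH);
        return copy;
    }

    public static void main(String[] args) {
        List<String> declared = Arrays.asList("testOpen", "testClose", "testResize");
        // Same result on every run, but not the declaration or alphabetical order
        // in general; adding or renaming a method can move it anywhere.
        System.out.println(order(declared));
    }
}
```

The result depends only on the set of names, so it is stable across runs, yet a renamed or newly added method can land anywhere in the sequence.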

iloveeclipse marked this pull request as ready for review October 22, 2024 09:03
@iloveeclipse
Member Author

Still failing with OOMs, and still no sign of any memory issues right up to the end.
https://ci.eclipse.org/platform/job/eclipse.platform.ui/job/PR-2433/6/consoleFull

So it could be a memory spike on shutdown (why?) or in the tycho/surefire post-processing code.

@laeubi: were there any recent updates to surefire that could be related? I remember we had a memory leak in the past after a surefire update, so maybe something similar happened again?

To make sure it is not related to tests failing because of undeleted files, I will wait for the SDK build with the fix for eclipse-platform/eclipse.platform#1593.

Also exit E4Testable on OOM.

This is supposed to work around and help to understand the OOM errors on Jenkins.

See eclipse-platform#2432
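
The commit message above mentions exiting on OOM. Since the diff itself is not part of this conversation, the following is only a generic sketch of that technique, not the actual E4Testable change: an uncaught-exception handler that terminates the process when an `OutOfMemoryError` appears in the cause chain. The class `ExitOnOom`, the exit code, and the injectable exit action are all hypothetical.

```java
import java.util.function.IntConsumer;

// Hypothetical handler, not the E4Testable change from this PR.
final class ExitOnOom implements Thread.UncaughtExceptionHandler {
    // Arbitrary illustrative value; the real exit code is not given in the source.
    static final int HYPOTHETICAL_EXIT_CODE = 1;

    private final IntConsumer exitAction;

    // The exit action is injected so the handler can be exercised without
    // actually terminating the JVM.
    ExitOnOom(IntConsumer exitAction) {
        this.exitAction = exitAction;
    }

    // Walks the cause chain (bounded, to stay safe on cyclic chains).
    static boolean isOom(Throwable t) {
        Throwable current = t;
        for (int depth = 0; current != null && depth < 32; depth++) {
            if (current instanceof OutOfMemoryError) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        if (isOom(error)) {
            exitAction.accept(HYPOTHETICAL_EXIT_CODE);
        } else {
            error.printStackTrace();
        }
    }

    // Installs the handler for all threads; halt() skips shutdown hooks,
    // which may themselves fail when the heap is exhausted.
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(
                new ExitOnOom(code -> Runtime.getRuntime().halt(code)));
    }
}
```

Failing fast like this keeps a test run from lingering with dead worker threads after the first OOM, at the cost of losing whatever the remaining tests would have reported.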
@iloveeclipse
Member Author

[ERROR] Failed to execute goal org.eclipse.tycho:tycho-surefire-plugin:4.0.9:test (default-test) on project org.eclipse.ui.tests: An unexpected error occurred while launching the test runtime (process returned error code 13). The process logfile /home/jenkins/agent/workspace/eclipse.platform.ui_PR-2433/tests/org.eclipse.ui.tests/target/work/data/.metadata/.log might contain further details. Command-line used to launch the sub-process was /opt/tools/java/openjdk/jdk-17/latest/bin/java -Dosgi.noShutdown=false -Dosgi.os=linux -Dosgi.ws=gtk -Dosgi.arch=x86_64 --add-modules=ALL-SYSTEM -Dosgi.clean=true -ea -jar /home/jenkins/agent/workspace/eclipse.platform.ui_PR-2433/.m2/repository/p2/osgi/bundle/org.eclipse.equinox.launcher/1.6.900.v20240613-2009/org.eclipse.equinox.launcher-1.6.900.v20240613-2009.jar -data /home/jenkins/agent/workspace/eclipse.platform.ui_PR-2433/tests/org.eclipse.ui.tests/target/work/data -install /home/jenkins/agent/workspace/eclipse.platform.ui_PR-2433/tests/org.eclipse.ui.tests/target/work -configuration /home/jenkins/agent/workspace/eclipse.platform.ui_PR-2433/tests/org.eclipse.ui.tests/target/work/configuration -application org.eclipse.tycho.surefire.osgibooter.uitest -testproperties /home/jenkins/agent/workspace/eclipse.platform.ui_PR-2433/tests/org.eclipse.ui.tests/target/surefire.properties in working directory /home/jenkins/agent/workspace/eclipse.platform.ui_PR-2433/tests/org.eclipse.ui.tests

We should probably simply increase the heap size for tests from the default (1/4 of RAM == 1 GB) to at least 2 GB.
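
The arithmetic behind that suggestion can be sketched as follows. This assumes the usual JVM default of a maximum heap of one quarter of the available memory, and it infers (rather than knows) that the build agent exposes about 4 GiB to the JVM, since a quarter of that is exactly the 1.073.741.824 bytes reported earlier. The class `HeapSizing` and its methods are hypothetical.

```java
// Hypothetical helper for checking the heap limit a test JVM actually runs with.
final class HeapSizing {
    static final long GIB = 1L << 30;

    // Assumed default: maximum heap = 1/4 of the memory visible to the JVM.
    static long defaultMaxHeap(long visibleMemoryBytes) {
        return visibleMemoryBytes / 4;
    }

    // True if the given maximum heap reaches the desired target.
    static boolean meetsTarget(long maxHeapBytes, long targetBytes) {
        return maxHeapBytes >= targetBytes;
    }

    public static void main(String[] args) {
        // maxMemory() can report slightly less than the configured limit,
        // depending on the garbage collector in use.
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("max heap: " + max + " bytes");
        System.out.println("at least 2 GiB: " + meetsTarget(max, 2 * GIB));
    }
}
```

The limit itself would be raised with the standard `-Xmx2g` JVM option, for example through the `argLine` parameter of tycho-surefire; whether that is the right place in this build is not confirmed in this conversation.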

3 participants