MINOR Capture heap dump after OOM on CI #19031

Open
wants to merge 40 commits into
base: trunk
Changes from 10 commits
Commits
4baadb0
ignore the GC pause thing for now
mumrah Feb 25, 2025
d1e4af0
set the jvm arg in the right place
mumrah Feb 26, 2025
641c0df
increase worker heap size
mumrah Feb 26, 2025
76147b9
reduce heap to try to get a GC pause
mumrah Feb 26, 2025
95240c1
speed up testing a bit for now
mumrah Feb 26, 2025
a22eb2a
disable caching
mumrah Feb 26, 2025
f4121f1
disable cache
mumrah Feb 26, 2025
d6e1e56
increase JUnit heap
mumrah Feb 26, 2025
fe16ad0
empty
mumrah Feb 26, 2025
75dfd65
only run flaky 17
mumrah Feb 26, 2025
ce1d399
really disable the build cache
mumrah Feb 26, 2025
59f5244
try getting a heap dump
mumrah Feb 26, 2025
3c3bbdb
empty
mumrah Feb 26, 2025
7a056c9
empty
mumrah Feb 27, 2025
3d7b910
empty
mumrah Feb 27, 2025
ac2357b
force an oom
mumrah Feb 27, 2025
5b5669b
checkstyle
mumrah Feb 27, 2025
b1d6a6e
checkstyle
mumrah Feb 27, 2025
9eddadd
always archive heap dump
mumrah Feb 27, 2025
3812530
dont retain heap dumps very long
mumrah Feb 27, 2025
dec43a7
find
mumrah Feb 27, 2025
8f977fc
wip
mumrah Feb 27, 2025
fc49738
remove forced OOM
mumrah Feb 27, 2025
98131ff
Merge remote-tracking branch 'origin/trunk' into tmp-oom-in-flaky-test
mumrah Feb 27, 2025
c86117b
wip
mumrah Feb 27, 2025
d44ea8c
empty
mumrah Feb 27, 2025
32b6815
back to 2g heap
mumrah Feb 27, 2025
46a8dad
add a bunch of runs
mumrah Feb 28, 2025
5d0dceb
fix conflict
mumrah Feb 28, 2025
dfe417d
keep trying
mumrah Feb 28, 2025
ef377e5
still trying
mumrah Feb 28, 2025
dfa1359
lower count
mumrah Feb 28, 2025
b8d1c46
remove ls
mumrah Mar 3, 2025
3d1bf90
empty
mumrah Mar 3, 2025
8f6c75c
empty
mumrah Mar 3, 2025
6829676
empty
mumrah Mar 3, 2025
9dcbe73
revert some things
mumrah Mar 3, 2025
603d6a0
empty
mumrah Mar 3, 2025
b8f66a4
Merge remote-tracking branch 'origin/trunk' into tmp-oom-in-flaky-test
mumrah Mar 3, 2025
ef070f3
increase retention
mumrah Mar 3, 2025
16 changes: 7 additions & 9 deletions .github/workflows/build.yml
@@ -168,19 +168,17 @@ jobs:
  fi

  test:
- needs: [configure, validate, load-catalog]
+ needs: [configure, load-catalog]
  if: ${{ ! needs.configure.outputs.is-draft }}
  runs-on: ubuntu-latest
  strategy:
  fail-fast: false
  matrix:
  # If we change these, make sure to adjust ci-complete.yml
- java: [ 23, 17 ]
- run-flaky: [ true, false ]
- run-new: [ true, false ]
- exclude:
- - run-flaky: true
- run-new: true
+ java: [ 17 ]
+ run-flaky: [ true ]
+ run-new: [ false ]
+
  env:
  job-variation: ${{ matrix.java }}-${{ matrix.run-flaky == true && 'flaky' || 'noflaky' }}-${{ matrix.run-new == true && 'new' || 'nonew' }}
  name: JUnit tests Java ${{ matrix.java }}${{ matrix.run-flaky == true && ' (flaky)' || '' }}${{ matrix.run-new == true && ' (new)' || '' }}
@@ -195,8 +193,8 @@ jobs:
  uses: ./.github/actions/setup-gradle
  with:
  java-version: ${{ matrix.java }}
- gradle-cache-read-only: ${{ !inputs.is-trunk }}
- gradle-cache-write-only: ${{ inputs.is-trunk }}
+ gradle-cache-read-only: false
+ gradle-cache-write-only: false
  develocity-access-key: ${{ secrets.DEVELOCITY_ACCESS_KEY }}

  # If the load-catalog job failed, we won't be able to download the artifact. Since we don't want this to fail
4 changes: 2 additions & 2 deletions build.gradle
@@ -54,7 +54,7 @@ ext {
  buildVersionFileName = "kafka-version.properties"

  defaultMaxHeapSize = "2g"
- defaultJvmArgs = ["-Xss4m", "-XX:+UseParallelGC"]
+ defaultJvmArgs = ["-Xss4m", "-XX:+UseParallelGC", "-XX:-UseGCOverheadLimit"]
Member Author

@ijuma WDYT about disabling this feature? From what I can tell, this will prevent a long GC pause from triggering an OOM. Instead, the build would likely just time out (which it's doing anyway, with the OOM happening in the Gradle worker).

Member

As you said, the build is unlikely to succeed in either case. The GC overhead thing at least gives a hint that there is a memory leak or the heap is too small. Isn't that better than a timeout with no information?
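
For context on the flag being discussed: with `-XX:+UseGCOverheadLimit` (the HotSpot default), the JVM throws `OutOfMemoryError: GC overhead limit exceeded` when it spends nearly all of its time collecting while reclaiming very little heap. The following is a minimal sketch, assuming a HotSpot JVM, of how one could confirm at runtime whether the limit is active in a given process. The class name `GcOverheadCheck` is hypothetical and not part of this PR.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration only; not part of this PR.
public class GcOverheadCheck {

    /** Returns the current value of a HotSpot VM option as a string ("true"/"false" for boolean flags). */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IllegalArgumentException if the option does not exist on this JVM.
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("UseGCOverheadLimit=" + vmOption("UseGCOverheadLimit"));
        System.out.println("HeapDumpOnOutOfMemoryError=" + vmOption("HeapDumpOnOutOfMemoryError"));
    }
}
```

Run inside a test worker, this would be expected to report `UseGCOverheadLimit=false` once the `defaultJvmArgs` change in this diff takes effect; on a JVM started without the flag it should report the default instead.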


  // "JEP 403: Strongly Encapsulate JDK Internals" causes some tests to fail when they try
  // to access internals (often via mocking libraries). We use `--add-opens` as a workaround
@@ -525,7 +525,7 @@ subprojects {
  maxParallelForks = maxTestForks
  ignoreFailures = userIgnoreFailures

- maxHeapSize = defaultMaxHeapSize
+ maxHeapSize = "3g"
  jvmArgs = defaultJvmArgs

  // KAFKA-17433 Used by deflake.yml github action to repeat individual tests
2 changes: 1 addition & 1 deletion gradle.properties
@@ -28,5 +28,5 @@ scalaVersion=2.13.15
  # Adding swaggerVersion in gradle.properties to have a single version in place for swagger
  swaggerVersion=2.2.25
  task=build
- org.gradle.jvmargs=-Xmx4g -Xss4m -XX:+UseParallelGC
+ org.gradle.jvmargs=-Xmx4g -Xss4m -XX:+UseParallelGC -XX:-UseGCOverheadLimit
  org.gradle.parallel=true
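
The two Gradle files above configure different JVMs: `org.gradle.jvmargs` in gradle.properties applies to the Gradle daemon, while `maxHeapSize` and `jvmArgs` in the build.gradle test configuration apply to the forked test worker processes. Below is a small sketch, not part of this PR, of how one might print the settings a given JVM actually received, which can help confirm that a flag was set in the intended place. The class name `JvmSettingsReport` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper for illustration only; not part of this PR.
public class JvmSettingsReport {

    /** JVM flags passed to the current process, e.g. -Xss4m or -XX:-UseGCOverheadLimit. */
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Maximum heap the JVM will attempt to use, in mebibytes. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapMiB());
        inputArguments().forEach(arg -> System.out.println("jvm arg: " + arg));
    }
}
```

The reported maximum heap is typically somewhat below the configured `-Xmx` value, so it is better read as a sanity check than as an exact match.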