
Add additional details on Gradle Check failures autocut issues #13950

Closed
prudhvigodithi opened this issue Jun 3, 2024 · 29 comments
Assignees
Labels
Build, enhancement

Comments

@prudhvigodithi
Member

prudhvigodithi commented Jun 3, 2024

Is your feature request related to a problem? Please describe

Coming from #11217 and #3713, we now have failure analytics for OpenSearch Gradle Check failures. For more details, check the Developer Guide. The next step is to highlight flaky failures as GitHub issues so they can be prioritized and fixed. Additionally, when a test fails on a new PR but is not related to the PR's code changes, this process will help make the contributor aware of the existing flaky test.

Describe the solution you'd like

From the OpenSearch Gradle Check Metrics dashboard, create an issue (with a specific format) for failing tests that are part of post-merge actions. Tests that fail in post-merge actions, which are executed after the PR is merged, are certainly flaky, since they had just passed for the PR to be merged (a minimal sketch of such a metrics query follows the screenshot below).
[Screenshot: 2024-06-03 11:50 AM]
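For illustration, a minimal Python sketch of the kind of metrics query this automation could run. The endpoint, index name, and field names (`invoke_type`, `test_status`, `test_class`, `build_start_time`) are assumptions about the metrics schema, not the actual ones:

```python
# Minimal sketch: aggregate failed tests from post-merge Gradle check runs.
# Index name and field names are assumed, not the real metrics schema.
import requests

METRICS_ENDPOINT = "https://metrics.example.com"  # placeholder, not the real cluster URL

query = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"term": {"invoke_type": "Post Merge Action"}},      # only post-merge runs
                {"term": {"test_status": "FAILED"}},                  # only failed tests
                {"range": {"build_start_time": {"gte": "now-30d"}}},  # last 30 days
            ]
        }
    },
    "aggs": {
        "failing_tests": {"terms": {"field": "test_class", "size": 100}}
    },
}

resp = requests.post(f"{METRICS_ENDPOINT}/gradle-check/_search", json=query, timeout=30)
resp.raise_for_status()
for bucket in resp.json()["aggregations"]["failing_tests"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```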

Related component

Build

Describe alternatives you've considered

Today the Gradle check failure issues are created from https://github.com/opensearch-project/OpenSearch/blob/main/.github/workflows/gradle-check.yml#L161-L168, which sometimes fails to execute (see https://github.com/opensearch-project/OpenSearch/actions/runs/9320653340/job/25657907035). The alternative is to clean up those issues and disable that functionality in gradle-check.yml.

Additional context

Coming from @dblock's #11217 (comment), we should also (see the sketch after this list):

  1. For any failed test, look up an existing flaky test issue; if it exists, comment on it.
  2. For any new failure, highlight it in comments with something like "new flaky test? please check and open one manually".
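A rough sketch of what those two steps could look like against the GitHub REST API; the helper name, input shape, and token handling are illustrative assumptions, not the actual implementation:

```python
# Sketch: comment on an existing flaky-test issue if one exists, otherwise nudge
# the contributor on the PR. Helper name and inputs are hypothetical.
import os
import requests

GITHUB_API = "https://api.github.com"
REPO = "opensearch-project/OpenSearch"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
           "Accept": "application/vnd.github+json"}


def handle_failed_test(test_class: str, pr_number: int) -> None:
    # Step 1: look for an existing open flaky-test issue mentioning this test class.
    search = requests.get(
        f"{GITHUB_API}/search/issues",
        params={"q": f'repo:{REPO} is:issue is:open label:">test-failure" "{test_class}"'},
        headers=HEADERS,
        timeout=30,
    )
    search.raise_for_status()
    items = search.json().get("items", [])

    if items:
        # Existing flaky-test issue found: comment on it, linking the failing PR.
        issue_number = items[0]["number"]
        body = f"Gradle check on #{pr_number} hit this flaky test again."
        comment_url = f"{GITHUB_API}/repos/{REPO}/issues/{issue_number}/comments"
    else:
        # Step 2: no known issue; ask the contributor to verify and open one manually.
        body = (f"`{test_class}` failed but no flaky-test issue exists. "
                "New flaky test? Please check and open one manually.")
        comment_url = f"{GITHUB_API}/repos/{REPO}/issues/{pr_number}/comments"

    requests.post(comment_url, json={"body": body}, headers=HEADERS, timeout=30).raise_for_status()
```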
@prudhvigodithi added the enhancement and untriaged labels on Jun 3, 2024
@github-actions bot added the Build label on Jun 3, 2024
@prudhvigodithi changed the title from "[Feature Request] Create a meaningful GitHub issue for flaky Gradle Check failures based on the data in metrics dashboard" to "Create a meaningful GitHub issue for flaky Gradle Check failures based on the data in metrics dashboard" on Jun 3, 2024
@prudhvigodithi moved this from Backlog to In Progress in OpenSearch Engineering Effectiveness on Jun 3, 2024
@prudhvigodithi self-assigned this on Jun 3, 2024
@msfroh
Collaborator

msfroh commented Jun 3, 2024

I added a comment on #11217 (comment), highlighting that we need to avoid tainted data from tests broken by changes in an open PR, but that get fixed before the PR is merged. (Short story: I added a commit to fix 1 test and broke over 1000 other tests. Nobody else saw it though, because my mistake was limited to my open PR.)

Maybe a good heuristic could look at tests that fail across multiple PRs in some time window? Alternatively, if we have a job that just runs Gradle check continuously (without changing any code), it could be a good canary to collect "true" failures.
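One possible way to express the "fails across multiple PRs in a time window" heuristic as an OpenSearch aggregation. This is only a sketch: the index and field names are assumed, and the 2-PR / 7-day thresholds are arbitrary:

```python
# Sketch: find tests that failed on at least 2 distinct PRs in the last week.
# Index and field names are assumptions about the metrics schema.
import requests

query = {
    "size": 0,
    "query": {"bool": {"filter": [
        {"term": {"test_status": "FAILED"}},
        {"range": {"build_start_time": {"gte": "now-7d"}}},
    ]}},
    "aggs": {
        "by_test": {
            "terms": {"field": "test_name", "size": 500},
            "aggs": {
                "distinct_prs": {"cardinality": {"field": "pull_request"}},
                # Keep only buckets where the test failed on 2 or more different PRs.
                "multi_pr_only": {
                    "bucket_selector": {
                        "buckets_path": {"prs": "distinct_prs"},
                        "script": "params.prs >= 2",
                    }
                },
            },
        }
    },
}

resp = requests.post("https://metrics.example.com/gradle-check/_search", json=query, timeout=30)
resp.raise_for_status()
suspects = [b["key"] for b in resp.json()["aggregations"]["by_test"]["buckets"]]
print(suspects)
```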

@prudhvigodithi
Member Author

prudhvigodithi commented Jun 3, 2024

Short story: I added a commit to fix 1 test and broke over 1000 other tests. Nobody else saw it though, because my mistake was limited to my open PR.

Thanks @msfroh. Regarding "I added a commit to fix 1 test and broke over 1000 other tests": are you referring to after the PR was merged, when it broke 1000 other tests? AFAIK it should be caught while the PR is still open, because today we run Gradle check against the commit that is part of the open PR and again against the merged commit (after the PR is merged, as part of the post-merge action).

@msfroh
Collaborator

msfroh commented Jun 4, 2024

are you referring to after the PR was merged, when it broke 1000 other tests? AFAIK it should be caught while the PR is still open, because today we run Gradle check against the commit that is part of the open PR

Correct -- it was caught while the PR was still open. I noticed that those 1000+ test failures showed up on the dashboard (and made it look like ClientYamlTestSuiteIT is the biggest source of failures in the past week).

@peternied changed the title from "Create a meaningful GitHub issue for flaky Gradle Check failures based on the data in metrics dashboard" to "Add additional details on Gradle Check failures autocut issues" on Jun 5, 2024
@peternied
Member

[Triage - attendees 1 2 3 4 5 6 7]
@prudhvigodithi Thanks for creating this issue

@prudhvigodithi
Member Author

Maybe a good heuristic could look at tests that fail across multiple PRs in some time window? Alternatively, if we have a job that just runs Gradle check continuously (without changing any code), it could be a good canary to collect "true" failures.

Hey @msfroh, thanks for your input. The idea is to create an issue only when a test fails in a post-merge action (after the PR is merged), not for failed tests that are part of an open PR; so in your case, the automation will not create issues for tests caught while the PR is open. Once the PR is merged by the maintainers after the Gradle check is green, one more Gradle check is triggered against the merge commit, so whatever tests fail on that commit are considered flaky, as they had just shown green for the PR to get merged. The metrics project is collecting this data and will use it for issue creation.
[Screenshot: 2024-06-05 10:00 AM]
Thank you

@prudhvigodithi
Member Author

Hey @reta, coming from #14012 I can see the org.opensearch.cache.store.disk.EhCacheDiskCacheTests.testComputeIfAbsentConcurrently failure in the Metrics Dashboard. I don't see this test failing as part of the post-merge actions, but it is seen failing on an open PR #13772. Is there a better mechanism to identify flaky tests apart from those failing in post-merge actions? Or is it based on the judgement of the PR creator when the modified code and the failing tests are not correlated?
Thanks

@reta
Collaborator

reta commented Jun 5, 2024

Or is it based on the judgement of the PR creator when the modified code and the failing tests are not correlated?

Hey @prudhvigodithi, this is a fair point. I will close #14012 for now, since the code in that PR is indeed unmerged and has no correlation with the failing test. Thanks for bringing it up!

@prudhvigodithi
Member Author

prudhvigodithi commented Jun 5, 2024

Thanks @reta, but there is a chance that it got lucky and simply has not failed in post-merge actions yet :) We should also have a mechanism to flag these types of flaky tests that show up only on open PRs.
@getsaurabh02

@prudhvigodithi
Member Author

prudhvigodithi commented Jun 7, 2024

I will start with automation that creates issues at the class level in the following format. I have noticed that one class can have multiple failing tests, so rather than creating an issue for each failing test, we can group at the class level. For example, the class IndicesRequestCacheIT has multiple failing tests across different PRs. The automation, on a scheduled cron, will refresh the list by updating the issue body. The GitHub issue will be created as follows; the GitHub UI will take care of linking the PRs with the issue once the issue mentions them (a rough sketch of this grouping appears at the end of this comment):

Title:

[AUTOCUT] The Gradle check encountered flaky failures with IndicesRequestCacheIT.

Noticed that IndicesRequestCacheIT has flaky tests that failed during post-merge actions.

| Git Reference | Merged Pull Request | Build Details | Test Name |
|---|---|---|---|
| e9b6a8d | 13590 | 40040 | org.opensearch.indices.IndicesRequestCacheIT.testDynamicStalenessThresholdUpdate {p0={"opensearch.experimental.feature.pluggable.caching.enabled":"true"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"search.concurrent_segment_search.enabled":"true"}} |
| de13aca | 13920 | 39716 | org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"search.concurrent_segment_search.enabled":"true"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"search.concurrent_segment_search.enabled":"false"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"opensearch.experimental.feature.pluggable.caching.enabled":"true"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"opensearch.experimental.feature.pluggable.caching.enabled":"false"}} |
| b06d0b9 | 14062 | 40165 | org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"search.concurrent_segment_search.enabled":"true"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"opensearch.experimental.feature.pluggable.caching.enabled":"true"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"opensearch.experimental.feature.pluggable.caching.enabled":"false"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.testCacheCleanupWithDefaultSettings {p0={"search.concurrent_segment_search.enabled":"false"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.testCacheCleanupOnEqualStalenessAndThreshold {p0={"search.concurrent_segment_search.enabled":"true"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.classMethod |

The other pull requests, besides those involved in post-merge actions, that contain failing tests with the IndicesRequestCacheIT class are:

For more details on the failed tests refer to OpenSearch Gradle Check Metrics dashboard.

@getsaurabh02 @dblock @reta @msfroh @andrross
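For reference, a rough Python sketch of the class-level grouping and issue-body generation described above; the function name and the shape of the failure records are hypothetical:

```python
# Sketch: group post-merge failures by test class and build one issue body per class.
from collections import defaultdict


def build_issue_bodies(failures):
    """failures: iterable of dicts with keys git_sha, pr_number, build_id, test_name, test_class."""
    by_class = defaultdict(list)
    for f in failures:
        by_class[f["test_class"]].append(f)

    issues = {}
    for test_class, rows in by_class.items():
        title = f"[AUTOCUT] Gradle check flaky failure: {test_class}"
        lines = [
            f"Noticed that {test_class} has flaky tests that failed during post-merge actions.",
            "",
            "| Git Reference | Merged Pull Request | Build Details | Test Name |",
            "|---|---|---|---|",
        ]
        for r in rows:
            lines.append(f"| {r['git_sha']} | #{r['pr_number']} | {r['build_id']} | {r['test_name']} |")
        issues[title] = "\n".join(lines)
    return issues
```

Grouping this way keeps the issue count down: one IndicesRequestCacheIT issue instead of one issue per failing test method.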

@reta
Collaborator

reta commented Jun 7, 2024

The GitHub Issue will be created as follows (the Github UI will take care of linking the PR's with the issue once the issue mentions the PR):

Thanks @prudhvigodithi , it looks awesome to start with

@prudhvigodithi
Member Author

prudhvigodithi commented Jun 11, 2024

Hey, just an update on this: I have created the issues in my fork repo with the automation:
https://github.com/prudhvigodithi/OpenSearch/issues/25
prudhvigodithi#24
I have PR opensearch-project/opensearch-build-libraries#436 open and under review, which adds the library that can run this automation from our Jenkins by querying the metrics cluster.

Also, I noticed the label used today is >test-failure, coming from the template. Just curious whether the > is an error or was added on purpose? @andrross @dblock @reta @peternied
Moving forward we can use something like test-failure.
Thank you
@getsaurabh02

@reta
Collaborator

reta commented Jun 11, 2024

Also, I noticed the label used today is >test-failure, coming from the template. Just curious whether the > is an error or was added on purpose? @andrross @dblock @reta @peternied

I know we use > but have no context as to why (my guess is to highlight such labels as more important than others).

@andrross
Member

Also, I noticed the label used today is >test-failure, coming from the template. Just curious whether the > is an error or was added on purpose? @andrross @dblock @reta @peternied

I know we use > but have no context as to why (my guess is to highlight such labels as more important than others).

That answer is probably lost to history. I don't think it has any special meaning as far as I know.

@prudhvigodithi pinned this issue on Jun 12, 2024
@prudhvigodithi unpinned this issue on Jun 12, 2024
@prudhvigodithi
Member Author

Thanks @reta and @andrross. In that case, yeah, we can continue to use >test-failure; during issue creation the automation will also add the >test-failure label.

@prudhvigodithi
Member Author

prudhvigodithi commented Jun 13, 2024

The automation flagged and created the following issues (46 of them), identifying flaky tests from the past month.

#14332
#14331
#14330
#14330
#14328
#14327
#14326
#14325
#14324
#14323
#14322
#14321
#14320
#14319
#14318
#14317
#14316
#14315
#14314
#14313
#14312
#14311
#14310
#14309
#14308
#14307
#14306
#14305
#14304
#14303
#14302
#14301
#14300
#14299
#14298
#14297
#14296
#14295
#14294
#14293
#14292
#14291
#14290
#14289
#14288
#14287

Adding @andrross @reta @dblock @msfroh @getsaurabh02, please take a look.

@prudhvigodithi
Member Author

I just created a PR #14334 to remove the issue creation from the gradle check workflow.

@prudhvigodithi
Member Author

On subsequent runs the automation won't create new issues; it updates the existing issue body if an issue for that flaky test already exists. FYI, I re-ran the job to validate this: https://build.ci.opensearch.org/job/gradle-check-flaky-test-detector/4/console. A sketch of this upsert behavior is below.
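A minimal sketch of that create-or-update behavior against the GitHub REST API; the title-based matching and the label set are assumptions based on the format proposed earlier in this thread:

```python
# Sketch: update an existing autocut issue if one matches the title, otherwise create it.
import os
import requests

GITHUB_API = "https://api.github.com"
REPO = "opensearch-project/OpenSearch"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
           "Accept": "application/vnd.github+json"}


def upsert_autocut_issue(title: str, body: str) -> None:
    search = requests.get(
        f"{GITHUB_API}/search/issues",
        params={"q": f'repo:{REPO} is:issue in:title "{title}"'},
        headers=HEADERS,
        timeout=30,
    )
    search.raise_for_status()
    items = search.json().get("items", [])

    if items:
        # Refresh the body of the existing autocut issue instead of opening a new one.
        number = items[0]["number"]
        requests.patch(
            f"{GITHUB_API}/repos/{REPO}/issues/{number}",
            json={"body": body},
            headers=HEADERS,
            timeout=30,
        ).raise_for_status()
    else:
        requests.post(
            f"{GITHUB_API}/repos/{REPO}/issues",
            json={"title": title, "body": body, "labels": [">test-failure", "flaky-test"]},
            headers=HEADERS,
            timeout=30,
        ).raise_for_status()
```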

@prudhvigodithi
Member Author

Hey, FYI, I have added a new visualization to the OpenSearch Gradle Check Metrics dashboard to track the trend of these flaky issues.

[Screenshot: 2024-06-13 3:58 PM]

@reta
Collaborator

reta commented Jun 14, 2024

Adding @andrross @reta @dblock @msfroh @getsaurabh02, please take a look.

It looks pretty cool, thanks @prudhvigodithi! Just one question: I noticed the flaky-test label is not there (but there is a new one, >test-failure); is that on purpose? Thanks!

@dblock
Member

dblock commented Jun 14, 2024

Should we find a way to link/close the existing manually created flaky test issues and assign the automatically cut ones to the same devs?

@prudhvigodithi
Member Author

It looks pretty cool, thanks @prudhvigodithi! Just one question: I noticed the flaky-test label is not there (but there is a new one, >test-failure); is that on purpose? Thanks!

Hey @reta, I just referred to past issue #14255 and added the >test-failure label. I can also add flaky-test to the automation, and it will update the existing open issues as well. Please let me know: do we need both flaky-test and >test-failure?

@prudhvigodithi
Member Author

Should we find a way to link/close the existing manually created flaky test issues and assign the automatically cut ones to the same devs?

Sure @dblock, we should do that.

@reta
Collaborator

reta commented Jun 14, 2024

Please let me know: do we need both flaky-test and >test-failure?

That would help to preserve any existing boards / filters, thanks @prudhvigodithi

@prudhvigodithi
Member Author

Can I get the following backport PRs approved, please?
#14352
#14351
#14350
#14349

@dblock
Member

dblock commented Jun 17, 2024

@prudhvigodithi done

@prudhvigodithi
Member Author

Thanks @dblock.

@prudhvigodithi
Member Author

Hey @andrross @dblock @reta, since we have the automation in place, please let me know if we can close this issue.

For:

Should we find a way to link/close the existing manually created flaky test issues and assign the automatically cut ones to the same devs?

There are 145 manually created issues for flaky tests; we should go over them and close each one if there is already an automation-created issue for the same flaky test (a rough sketch of that cleanup is at the end of this comment).

@getsaurabh02
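A rough sketch of that cleanup against the GitHub REST API; matching manual issues to autocut ones by test class is an assumed heuristic, not the actual implementation:

```python
# Sketch: for a given autocut issue, find manually created flaky-test issues that mention
# the same test class, link them to the autocut issue, and close them.
import os
import requests

GITHUB_API = "https://api.github.com"
REPO = "opensearch-project/OpenSearch"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
           "Accept": "application/vnd.github+json"}


def close_manual_duplicates(test_class: str, autocut_number: int) -> None:
    search = requests.get(
        f"{GITHUB_API}/search/issues",
        params={"q": f'repo:{REPO} is:issue is:open label:flaky-test "{test_class}"'},
        headers=HEADERS,
        timeout=30,
    )
    search.raise_for_status()
    for item in search.json().get("items", []):
        number = item["number"]
        if number == autocut_number:
            continue  # skip the autocut issue itself
        # Leave a pointer to the autocut issue, then close the manual one.
        requests.post(
            f"{GITHUB_API}/repos/{REPO}/issues/{number}/comments",
            json={"body": f"Tracked by the autocut issue #{autocut_number}; closing this one."},
            headers=HEADERS,
            timeout=30,
        ).raise_for_status()
        requests.patch(
            f"{GITHUB_API}/repos/{REPO}/issues/{number}",
            json={"state": "closed"},
            headers=HEADERS,
            timeout=30,
        ).raise_for_status()
```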

@prudhvigodithi moved this from In Progress to On Hold in OpenSearch Engineering Effectiveness on Jun 17, 2024
@reta
Collaborator

reta commented Jun 17, 2024

Hey @andrross @dblock @reta, since we have the automation in place, please let me know if we can close this issue.

I think we could close this issue indeed, thanks @prudhvigodithi

@andrross
Member

Closing. Thanks @prudhvigodithi!
