
Add additional details on Gradle Check failures autocut issues #13950

Closed
prudhvigodithi opened this issue Jun 3, 2024 · 29 comments
Assignees
Labels
Build, enhancement

Comments

@prudhvigodithi
Member

prudhvigodithi commented Jun 3, 2024

Is your feature request related to a problem? Please describe

Coming from #11217 and #3713, we now have failure analytics for OpenSearch Gradle Check failures. For more details, check the Developer Guide. The next step is to highlight flaky failures as GitHub issues so they can be prioritized and fixed. Additionally, when a test fails on a new PR but is not related to the PR's code changes, this process will help make the contributor aware of the existing flaky test.

Describe the solution you'd like

From the OpenSearch Gradle Check Metrics dashboard, create an issue (with a specific format) for failing tests that are part of post-merge actions. Tests that fail in post-merge actions, which are executed after the PR is merged, are certainly flaky, since they had just passed for the PR to be merged (a minimal sketch of such a metrics query follows the screenshot below).
[Screenshot: 2024-06-03 11:50 AM]
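For illustration, a minimal Python sketch of the kind of metrics query this automation could run. The endpoint, index name, and field names (`invoke_type`, `test_status`, `test_class`, `build_start_time`) are assumptions about the metrics schema, not the actual ones:

```python
# Minimal sketch: aggregate failed tests from post-merge Gradle check runs.
# Index name and field names are assumed, not the real metrics schema.
import requests

METRICS_ENDPOINT = "https://metrics.example.com"  # placeholder, not the real cluster URL

query = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"term": {"invoke_type": "Post Merge Action"}},      # only post-merge runs
                {"term": {"test_status": "FAILED"}},                  # only failed tests
                {"range": {"build_start_time": {"gte": "now-30d"}}},  # last 30 days
            ]
        }
    },
    "aggs": {
        "failing_tests": {"terms": {"field": "test_class", "size": 100}}
    },
}

resp = requests.post(f"{METRICS_ENDPOINT}/gradle-check/_search", json=query, timeout=30)
resp.raise_for_status()
for bucket in resp.json()["aggregations"]["failing_tests"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```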

Related component

Build

Describe alternatives you've considered

Today the Gradle check failure issues are created from https://github.com/opensearch-project/OpenSearch/blob/main/.github/workflows/gradle-check.yml#L161-L168, which sometimes fails to execute (see https://github.com/opensearch-project/OpenSearch/actions/runs/9320653340/job/25657907035). The alternative is to clean up those issues and disable that functionality in gradle-check.yml.

Additional context

Coming from @dblock's #11217 (comment), we should also (see the sketch after this list):

  1. For any failed test, look up an existing flaky test issue; if it exists, comment on it.
  2. For any new failure, highlight it in comments with something like "new flaky test? please check and open one manually".
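A rough sketch of what those two steps could look like against the GitHub REST API; the helper name, input shape, and token handling are illustrative assumptions, not the actual implementation:

```python
# Sketch: comment on an existing flaky-test issue if one exists, otherwise nudge
# the contributor on the PR. Helper name and inputs are hypothetical.
import os
import requests

GITHUB_API = "https://api.github.com"
REPO = "opensearch-project/OpenSearch"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
           "Accept": "application/vnd.github+json"}


def handle_failed_test(test_class: str, pr_number: int) -> None:
    # Step 1: look for an existing open flaky-test issue mentioning this test class.
    search = requests.get(
        f"{GITHUB_API}/search/issues",
        params={"q": f'repo:{REPO} is:issue is:open label:">test-failure" "{test_class}"'},
        headers=HEADERS,
        timeout=30,
    )
    search.raise_for_status()
    items = search.json().get("items", [])

    if items:
        # Existing flaky-test issue found: comment on it, linking the failing PR.
        issue_number = items[0]["number"]
        body = f"Gradle check on #{pr_number} hit this flaky test again."
        comment_url = f"{GITHUB_API}/repos/{REPO}/issues/{issue_number}/comments"
    else:
        # Step 2: no known issue; ask the contributor to verify and open one manually.
        body = (f"`{test_class}` failed but no flaky-test issue exists. "
                "New flaky test? Please check and open one manually.")
        comment_url = f"{GITHUB_API}/repos/{REPO}/issues/{pr_number}/comments"

    requests.post(comment_url, json={"body": body}, headers=HEADERS, timeout=30).raise_for_status()
```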
@prudhvigodithi added the enhancement and untriaged labels on Jun 3, 2024
@github-actions bot added the Build label on Jun 3, 2024
@prudhvigodithi changed the title from "[Feature Request] Create a meaningful GitHub issue for flaky Gradle Check failures based on the data in metrics dashboard" to "Create a meaningful GitHub issue for flaky Gradle Check failures based on the data in metrics dashboard" on Jun 3, 2024
@prudhvigodithi moved this from Backlog to In Progress in OpenSearch Engineering Effectiveness on Jun 3, 2024
@prudhvigodithi self-assigned this on Jun 3, 2024
@msfroh
Collaborator

msfroh commented Jun 3, 2024

I added a comment on #11217 (comment), highlighting that we need to avoid tainted data from tests broken by changes in an open PR, but that get fixed before the PR is merged. (Short story: I added a commit to fix 1 test and broke over 1000 other tests. Nobody else saw it though, because my mistake was limited to my open PR.)

Maybe a good heuristic could look at tests that fail across multiple PRs in some time window? Alternatively, if we have a job that just runs Gradle check continuously (without changing any code), it could be a good canary to collect "true" failures.
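One possible way to express the "fails across multiple PRs in a time window" heuristic as an OpenSearch aggregation. This is only a sketch: the index and field names are assumed, and the 2-PR / 7-day thresholds are arbitrary:

```python
# Sketch: find tests that failed on at least 2 distinct PRs in the last week.
# Index and field names are assumptions about the metrics schema.
import requests

query = {
    "size": 0,
    "query": {"bool": {"filter": [
        {"term": {"test_status": "FAILED"}},
        {"range": {"build_start_time": {"gte": "now-7d"}}},
    ]}},
    "aggs": {
        "by_test": {
            "terms": {"field": "test_name", "size": 500},
            "aggs": {
                "distinct_prs": {"cardinality": {"field": "pull_request"}},
                # Keep only buckets where the test failed on 2 or more different PRs.
                "multi_pr_only": {
                    "bucket_selector": {
                        "buckets_path": {"prs": "distinct_prs"},
                        "script": "params.prs >= 2",
                    }
                },
            },
        }
    },
}

resp = requests.post("https://metrics.example.com/gradle-check/_search", json=query, timeout=30)
resp.raise_for_status()
suspects = [b["key"] for b in resp.json()["aggregations"]["by_test"]["buckets"]]
print(suspects)
```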

@prudhvigodithi
Member Author

prudhvigodithi commented Jun 3, 2024

Short story: I added a commit to fix 1 test and broke over 1000 other tests. Nobody else saw it though, because my mistake was limited to my open PR.

Thanks @msfroh. Regarding "I added a commit to fix 1 test and broke over 1000 other tests": are you referring to after the PR was merged, when it broke 1000 other tests? AFAIK it should be caught while the PR is still open, because today we run Gradle check against the commit that is part of the open PR and again against the merged commit (after the PR is merged, as part of the post-merge action).

@msfroh
Collaborator

msfroh commented Jun 4, 2024

are you referring to after the PR was merged, when it broke 1000 other tests? AFAIK it should be caught while the PR is still open, because today we run Gradle check against the commit that is part of the open PR

Correct -- it was caught while the PR was still open. I noticed that those 1000+ test failures showed up on the dashboard (and made it look like ClientYamlTestSuiteIT is the biggest source of failures in the past week).

@peternied changed the title from "Create a meaningful GitHub issue for flaky Gradle Check failures based on the data in metrics dashboard" to "Add additional details on Gradle Check failures autocut issues" on Jun 5, 2024
@peternied
Member

[Triage - attendees 1 2 3 4 5 6 7]
@prudhvigodithi Thanks for creating this issue

@prudhvigodithi
Member Author

Maybe a good heuristic could look at tests that fail across multiple PRs in some time window? Alternatively, if we have a job that just runs Gradle check continuously (without changing any code), it could be a good canary to collect "true" failures.

Hey @msfroh, thanks for your input. The idea is to create an issue only when a test fails in a post-merge action (after the PR is merged), not for failed tests that are part of an open PR; so in your case, the automation will not create issues for tests caught while the PR is open. Once the PR is merged by the maintainers after the Gradle check is green, one more Gradle check is triggered against the merge commit, so whatever tests fail on that commit are considered flaky, as they had just shown green for the PR to get merged. The metrics project is collecting this data and will use it for issue creation.
[Screenshot: 2024-06-05 10:00 AM]
Thank you

@prudhvigodithi
Member Author

Hey @reta, coming from #14012 I can see the org.opensearch.cache.store.disk.EhCacheDiskCacheTests.testComputeIfAbsentConcurrently failure in the Metrics Dashboard. I don't see this test failing as part of the post-merge actions, but it is seen failing on an open PR #13772. Is there a better mechanism to identify flaky tests apart from those failing in post-merge actions? Or is it based on the judgement of the PR creator when the modified code and the failing tests are not correlated?
Thanks

@reta
Collaborator

reta commented Jun 5, 2024

Or is it based on the judgement of the PR creator when the modified code and the failing tests are not correlated?

Hey @prudhvigodithi, this is a fair point. I will close #14012 for now, since the code in that PR is indeed unmerged and has no correlation with the failing test. Thanks for bringing it up!

@prudhvigodithi
Member Author

prudhvigodithi commented Jun 5, 2024

Thanks @reta, but there is a chance that it got lucky and simply has not failed in post-merge actions yet :) We should also have a mechanism to flag these types of flaky tests that show up only on open PRs.
@getsaurabh02

@prudhvigodithi
Member Author

prudhvigodithi commented Jun 7, 2024

I will start with automation that creates issues at the class level in the following format. I have noticed that one class can have multiple failing tests, so rather than creating an issue for each failing test, we can group at the class level. For example, the class IndicesRequestCacheIT has multiple failing tests across different PRs. The automation, on a scheduled cron, will refresh the list by updating the issue body. The GitHub issue will be created as follows; the GitHub UI will take care of linking the PRs with the issue once the issue mentions them (a rough sketch of this grouping appears at the end of this comment):

Title:

[AUTOCUT] The Gradle check encountered flaky failures with IndicesRequestCacheIT.

Noticed that IndicesRequestCacheIT has flaky tests that failed during post-merge actions.

| Git Reference | Merged Pull Request | Build Details | Test Name |
|---|---|---|---|
| e9b6a8d | 13590 | 40040 | org.opensearch.indices.IndicesRequestCacheIT.testDynamicStalenessThresholdUpdate {p0={"opensearch.experimental.feature.pluggable.caching.enabled":"true"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"search.concurrent_segment_search.enabled":"true"}} |
| de13aca | 13920 | 39716 | org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"search.concurrent_segment_search.enabled":"true"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"search.concurrent_segment_search.enabled":"false"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"opensearch.experimental.feature.pluggable.caching.enabled":"true"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"opensearch.experimental.feature.pluggable.caching.enabled":"false"}} |
| b06d0b9 | 14062 | 40165 | org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"search.concurrent_segment_search.enabled":"true"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"opensearch.experimental.feature.pluggable.caching.enabled":"true"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.testDeleteAndCreateSameIndexShardOnSameNode {p0={"opensearch.experimental.feature.pluggable.caching.enabled":"false"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.testCacheCleanupWithDefaultSettings {p0={"search.concurrent_segment_search.enabled":"false"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.testCacheCleanupOnEqualStalenessAndThreshold {p0={"search.concurrent_segment_search.enabled":"true"}} |
| | | | org.opensearch.indices.IndicesRequestCacheIT.classMethod |

The other pull requests, besides those involved in post-merge actions, that contain failing tests with the IndicesRequestCacheIT class are:

For more details on the failed tests refer to OpenSearch Gradle Check Metrics dashboard.

@getsaurabh02 @dblock @reta @msfroh @andrross
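For reference, a rough Python sketch of the class-level grouping and issue-body generation described above; the function name and the shape of the failure records are hypothetical:

```python
# Sketch: group post-merge failures by test class and build one issue body per class.
from collections import defaultdict


def build_issue_bodies(failures):
    """failures: iterable of dicts with keys git_sha, pr_number, build_id, test_name, test_class."""
    by_class = defaultdict(list)
    for f in failures:
        by_class[f["test_class"]].append(f)

    issues = {}
    for test_class, rows in by_class.items():
        title = f"[AUTOCUT] Gradle check flaky failure: {test_class}"
        lines = [
            f"Noticed that {test_class} has flaky tests that failed during post-merge actions.",
            "",
            "| Git Reference | Merged Pull Request | Build Details | Test Name |",
            "|---|---|---|---|",
        ]
        for r in rows:
            lines.append(f"| {r['git_sha']} | #{r['pr_number']} | {r['build_id']} | {r['test_name']} |")
        issues[title] = "\n".join(lines)
    return issues
```

Grouping this way keeps the issue count down: one IndicesRequestCacheIT issue instead of one issue per failing test method.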

@reta
Collaborator

reta commented Jun 7, 2024

The GitHub Issue will be created as follows (the Github UI will take care of linking the PR's with the issue once the issue mentions the PR):

Thanks @prudhvigodithi , it looks awesome to start with

@prudhvigodithi
Member Author

prudhvigodithi commented Jun 11, 2024

Hey, just an update on this: I have created the issues in my fork repo with the automation:
https://github.com/prudhvigodithi/OpenSearch/issues/25
prudhvigodithi#24
I have PR opensearch-project/opensearch-build-libraries#436 open and under review, which adds the library that can run this automation from our Jenkins by querying the metrics cluster.

Also, I noticed the label used today is >test-failure, coming from the template. Just curious whether the > is an error or was added on purpose? @andrross @dblock @reta @peternied
Moving forward we can use something like test-failure.
Thank you
@getsaurabh02

@reta
Collaborator

reta commented Jun 11, 2024

Also, I noticed the label used today is >test-failure, coming from the template. Just curious whether the > is an error or was added on purpose? @andrross @dblock @reta @peternied

I know we use > but have no context as to why (my guess is to highlight such labels as more important than others).

@andrross
Member

Also, I noticed the label used today is >test-failure, coming from the template. Just curious whether the > is an error or was added on purpose? @andrross @dblock @reta @peternied

I know we use > but have no context as to why (my guess is to highlight such labels as more important than others).

That answer is probably lost to history. I don't think it has any special meaning as far as I know.

@prudhvigodithi pinned this issue on Jun 12, 2024
@prudhvigodithi unpinned this issue on Jun 12, 2024
@prudhvigodithi
Member Author

Thanks @reta and @andrross. In that case, yeah, we can continue to use >test-failure; during issue creation the automation will also add the >test-failure label.

@prudhvigodithi
Member Author

prudhvigodithi commented Jun 13, 2024

The automation flagged and created the following issues (46 of them), identifying flaky tests from the past month.

#14332
#14331
#14330
#14330
#14328
#14327
#14326
#14325
#14324
#14323
#14322
#14321
#14320
#14319
#14318
#14317
#14316
#14315
#14314
#14313
#14312
#14311
#14310
#14309
#14308
#14307
#14306
#14305
#14304
#14303
#14302
#14301
#14300
#14299
#14298
#14297
#14296
#14295
#14294
#14293
#14292
#14291
#14290
#14289
#14288
#14287

Adding @andrross @reta @dblock @msfroh @getsaurabh02, please take a look.

@prudhvigodithi
Member Author

I just created a PR #14334 to remove the issue creation from the gradle check workflow.

@prudhvigodithi
Member Author

On subsequent runs the automation won't create new issues; it updates the existing issue body if an issue for that flaky test already exists. FYI, I re-ran the job to validate this: https://build.ci.opensearch.org/job/gradle-check-flaky-test-detector/4/console. A sketch of this upsert behavior is below.
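A minimal sketch of that create-or-update behavior against the GitHub REST API; the title-based matching and the label set are assumptions based on the format proposed earlier in this thread:

```python
# Sketch: update an existing autocut issue if one matches the title, otherwise create it.
import os
import requests

GITHUB_API = "https://api.github.com"
REPO = "opensearch-project/OpenSearch"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
           "Accept": "application/vnd.github+json"}


def upsert_autocut_issue(title: str, body: str) -> None:
    search = requests.get(
        f"{GITHUB_API}/search/issues",
        params={"q": f'repo:{REPO} is:issue in:title "{title}"'},
        headers=HEADERS,
        timeout=30,
    )
    search.raise_for_status()
    items = search.json().get("items", [])

    if items:
        # Refresh the body of the existing autocut issue instead of opening a new one.
        number = items[0]["number"]
        requests.patch(
            f"{GITHUB_API}/repos/{REPO}/issues/{number}",
            json={"body": body},
            headers=HEADERS,
            timeout=30,
        ).raise_for_status()
    else:
        requests.post(
            f"{GITHUB_API}/repos/{REPO}/issues",
            json={"title": title, "body": body, "labels": [">test-failure", "flaky-test"]},
            headers=HEADERS,
            timeout=30,
        ).raise_for_status()
```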

@prudhvigodithi
Member Author

Hey, FYI, I have added a new visualization to the OpenSearch Gradle Check Metrics dashboard to track the trend of these flaky issues.

[Screenshot: 2024-06-13 3:58 PM]

@reta
Collaborator

reta commented Jun 14, 2024

Adding @andrross @reta @dblock @msfroh @getsaurabh02, please take a look.

It looks pretty cool, thanks @prudhvigodithi! Just one question: I noticed the flaky-test label is not there (but there is a new one, >test-failure); is that on purpose? Thanks!

@dblock
Member

dblock commented Jun 14, 2024

Should we find a way to link/close the existing manually created flaky test issues and assign the automatically cut ones to the same devs?

@prudhvigodithi
Member Author

It looks pretty cool, thanks @prudhvigodithi! Just one question: I noticed the flaky-test label is not there (but there is a new one, >test-failure); is that on purpose? Thanks!

Hey @reta, I just referred to past issue #14255 and added the >test-failure label. I can also add flaky-test to the automation, and it will update the existing open issues as well. Please let me know: do we need both flaky-test and >test-failure?

@prudhvigodithi
Member Author

Should we find a way to link/close the existing manually created flaky test issues and assign the automatically cut ones to the same devs?

Sure @dblock, we should do that.

@reta
Collaborator

reta commented Jun 14, 2024

Please let me know: do we need both flaky-test and >test-failure?

That would help to preserve any existing boards / filters, thanks @prudhvigodithi

@prudhvigodithi
Member Author

Can I get the following backport PRs approved, please?
#14352
#14351
#14350
#14349

@dblock
Member

dblock commented Jun 17, 2024

@prudhvigodithi done

@prudhvigodithi
Member Author

Thanks @dblock.

@prudhvigodithi
Member Author

Hey @andrross @dblock @reta, since we have the automation in place, please let me know if we can close this issue.

For:

Should we find a way to link/close the existing manually created flaky test issues and assign the automatically cut ones to the same devs?

There are 145 manually created issues for flaky tests; we should go over them and close each one if there is already an automation-created issue for the same flaky test (a rough sketch of that cleanup is at the end of this comment).

@getsaurabh02
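A rough sketch of that cleanup against the GitHub REST API; matching manual issues to autocut ones by test class is an assumed heuristic, not the actual implementation:

```python
# Sketch: for a given autocut issue, find manually created flaky-test issues that mention
# the same test class, link them to the autocut issue, and close them.
import os
import requests

GITHUB_API = "https://api.github.com"
REPO = "opensearch-project/OpenSearch"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
           "Accept": "application/vnd.github+json"}


def close_manual_duplicates(test_class: str, autocut_number: int) -> None:
    search = requests.get(
        f"{GITHUB_API}/search/issues",
        params={"q": f'repo:{REPO} is:issue is:open label:flaky-test "{test_class}"'},
        headers=HEADERS,
        timeout=30,
    )
    search.raise_for_status()
    for item in search.json().get("items", []):
        number = item["number"]
        if number == autocut_number:
            continue  # skip the autocut issue itself
        # Leave a pointer to the autocut issue, then close the manual one.
        requests.post(
            f"{GITHUB_API}/repos/{REPO}/issues/{number}/comments",
            json={"body": f"Tracked by the autocut issue #{autocut_number}; closing this one."},
            headers=HEADERS,
            timeout=30,
        ).raise_for_status()
        requests.patch(
            f"{GITHUB_API}/repos/{REPO}/issues/{number}",
            json={"state": "closed"},
            headers=HEADERS,
            timeout=30,
        ).raise_for_status()
```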

@prudhvigodithi moved this from In Progress to On Hold in OpenSearch Engineering Effectiveness on Jun 17, 2024
@reta
Collaborator

reta commented Jun 17, 2024

Hey @andrross @dblock @reta, since we have the automation in place, please let me know if we can close this issue.

I think we could close this issue indeed, thanks @prudhvigodithi

@andrross
Member

Closing. Thanks @prudhvigodithi!
