Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/dns: update TestDNSResolver_ExponentialBackoff to not return error before last resolution attempt #8061

Merged
merged 1 commit into from
Feb 4, 2025

Conversation

purnesh42H
Copy link
Contributor

@purnesh42H purnesh42H commented Feb 3, 2025

Fixes: #7760

The TestDNSResolver_ExponentialBackoff test verifies that the DNS resolver correctly implements exponential backoff when encountering errors. It simulates a failing resolver and forces it to retry 10 times. The test overrides the standard time.After function, used for backoff delays, with a custom implementation overrideTimeAfterFuncWithChannel which uses two channels: durChan to receive the intended backoff duration from the resolver, and timeChan to signal the resolver that the waiting period is over. In each iteration of the test loop (representing a retry attempt), the test first reads from durChan to verify if the back off happened. Because of the test setup, this read usually gets the backoff duration from the previous resolution attempt (including the first attempt from resolver build). The test then immediately unblocks the resolver by sending a signal to timeChan.

The problem arose in the last iteration. The loop would exit without reading the last backoff duration from durChan, and right after, the test would set an atomic bool flag returnNilErr to true to make the resolver's update function succeed. However, a race condition existed: if the resolver happened to start its 10th and final resolution attempt before this flag was set to true, that attempt would fail, trigger a backoff and consequently put a value into durChan. The test expected this final attempt to always succeed because of the flag being set right after exiting the loop. The test then checked if durChan was empty, and if it wasn't (meaning an unexpected backoff occurred), the test would flake. The fix resolves this by ensuring that the flag to force a successful update is set before the last resolution attempt begins. This guarantees that the last attempt will always succeed, eliminating the possibility of an unwanted backoff and ensuring that durChan remains empty, thus preventing the test from flaking.

Forge run without fix: https://fusion2.corp.google.com/invocations/4f183c2c-f5cf-4fee-a18d-d5c4da2479bc/targets/%2F%2Fthird_party%2Fgolang%2Fgrpc%2Finternal%2Fresolver%2Fdns:dns_test/tests

Forge run with fix: https://fusion2.corp.google.com/invocations/3bab8892-5a85-4758-bc3f-8437191bffbe/targets/%2F%2Fthird_party%2Fgolang%2Fgrpc%2Finternal%2Fresolver%2Fdns:dns_test/log

RELEASE NOTES: None

Copy link

codecov bot commented Feb 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.44%. Comparing base (7dbf12e) to head (8abc4f3).
Report is 3 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #8061   +/-   ##
=======================================
  Coverage   82.43%   82.44%           
=======================================
  Files         388      388           
  Lines       39062    39062           
=======================================
+ Hits        32201    32204    +3     
+ Misses       5546     5541    -5     
- Partials     1315     1317    +2     

see 22 files with indirect coverage changes

@purnesh42H purnesh42H added Type: Testing Area: Testing Includes tests and testing utilities that we have for unit and e2e tests within our repo. labels Feb 3, 2025
@purnesh42H purnesh42H added this to the 1.71 Release milestone Feb 3, 2025
@purnesh42H purnesh42H force-pushed the dns-test-exp-back-off-flake branch from 1ba51b7 to 8abc4f3 Compare February 3, 2025 06:43
@easwars easwars assigned purnesh42H and unassigned easwars Feb 3, 2025
@purnesh42H purnesh42H merged commit 947e2a4 into grpc:master Feb 4, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Testing Includes tests and testing utilities that we have for unit and e2e tests within our repo. Type: Testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Flaky test]: Test/DNSResolver_ExponentialBackoff
3 participants