Infra Flake Detection Built on Wrong Assumptions #447
Comments
Explanation of a possible solution I'm interested in exploring:

Basically, an infra flake happens when a handful of tests fail unexpectedly at around the same time. The issue lies in the word "unexpectedly". If a test fails once every 5 runs, no single failure of that test could be considered unexpected. I feel we can only deduce what is "unexpected" by looking at a single test's history. That history is a time series, and I am thinking of fitting an autoregressive or moving average model to capture the fact that more recent failures make another failure more likely for a given test. This way we can quantify unexpectedness, at least roughly.

If we have a baseline for when a test failure is "unexpected", all that would be left is to analyze how well the baseline works and to find a way to identify several unexpected failures happening at once. All of the above has a decent chance of failing completely, though; this is a tough dataset.

One issue that keeps coming up when thinking about how to classify infra flakes is that it is hard to decide whether a test fails as a direct result of another test failing or whether both fail as a result of an infra flake. The distinction is tough to draw, and may not be possible with this type of dataset. I'm going to ignore this problem for now.
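To make the "recent failures make a failure more likely" idea concrete, here is a minimal sketch. It uses an exponentially weighted moving average of one test's pass/fail history as a simple stand-in, not a fitted autoregressive or moving average model. The function name, the `alpha` weight, and the history format (a list of booleans, oldest first) are assumptions for illustration only.

```python
def ewma_failure_probability(history, alpha=0.3, initial=0.0):
    """Estimate how likely the next run is to fail, weighting recent runs more.

    history: list of bools for one test, oldest first, True = failed.
    alpha:   hypothetical smoothing weight in (0, 1]; higher = more recency bias.
    """
    estimate = initial
    for failed in history:
        outcome = 1.0 if failed else 0.0
        # Blend the newest outcome into the running estimate.
        estimate = alpha * outcome + (1.0 - alpha) * estimate
    return estimate


def unexpectedness(history, alpha=0.3):
    """Score in [0, 1]: how surprising a failure on the next run would be."""
    return 1.0 - ewma_failure_probability(history, alpha=alpha)


# A failure long ago matters less than a failure on the previous run.
old_failure = [True, False, False]
recent_failure = [False, False, True]
print(unexpectedness(old_failure, alpha=0.5))
print(unexpectedness(recent_failure, alpha=0.5))
```

A failure on a test with a high score would count as "unexpected"; where to put the cutoff is exactly the baseline question raised above.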
Also, a time series model may not be strictly necessary. I think we could get decent results by simply using # failures / # attempts per test as a first metric. I do expect to end up trying both, starting with the simple model and building on it. The time series model has the potential to be a lot stronger, so I'll have to compare the two at the end.
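The simple metric could look something like the sketch below. The function names and the 0.1 threshold are placeholders rather than anything decided in this issue; the history is assumed to be a list of booleans where `True` means the run failed.

```python
def failure_rate(history):
    """Return # failures / # attempts for one test (0.0 if it never ran)."""
    if not history:
        return 0.0
    return sum(history) / len(history)


def failure_is_unexpected(history, threshold=0.1):
    """Treat a new failure as unexpected when the test rarely failed before.

    threshold is a hypothetical cutoff that would need tuning on real data.
    """
    return failure_rate(history) < threshold


flaky_test = [True, False, False, False, False]  # fails once every 5 runs
stable_test = [False] * 20                       # has never failed

print(failure_rate(flaky_test), failure_is_unexpected(flaky_test))
print(failure_rate(stable_test), failure_is_unexpected(stable_test))
```

Unlike a time series model, this ignores how recent the failures were, which is the main thing the comparison would need to measure.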
And FWIW, I don't believe any left-right pattern is necessary for an infra flake. Infra flakes seem to happen because infrastructure is flaky in a dynamic way: a test can pass, then fail an hour later because of it. It is also the case that infrastructure can have an issue for just one hour, failing many tests, and then everything is fine the next time the tests run.
Is your feature request related to a problem? Please describe.
Currently, we try to detect "infra flakes" by looking for a waterfall pattern. This might not be as well motivated as we originally thought. The tests are ordered by most recent failure, so a waterfall pattern can only occur at the beginning, and an infra flake in the middle would look like random tests failing.
Describe the solution you'd like
An updated model that uses statistical techniques to find infra flakes and does not rely on the order in which the tests appear.
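As a rough sketch of what "not relying on test order" could mean: for each run, count how many tests failed despite a low prior failure rate, and flag the run when that count is high. The counting is a sum over tests, so the row order of the grid has no effect. The data layout, function name, and both thresholds are assumptions for illustration, not an agreed design.

```python
def flag_suspect_runs(results, rate_threshold=0.2, min_unexpected=3):
    """Return indices of runs where many tests failed unexpectedly at once.

    results: dict of test name -> list of bools (True = failed), one entry
             per run, oldest first, aligned across tests (hypothetical format).
    """
    if not results:
        return []
    n_runs = min(len(history) for history in results.values())
    flagged = []
    for run in range(1, n_runs):  # run 0 has no prior history to compare with
        unexpected = 0
        for history in results.values():
            prior = history[:run]
            prior_rate = sum(prior) / len(prior)
            # A failure counts as unexpected if the test rarely failed before.
            if history[run] and prior_rate < rate_threshold:
                unexpected += 1
        if unexpected >= min_unexpected:
            flagged.append(run)
    return flagged


grid = {
    "test_a": [False, False, False, True, False],
    "test_b": [False, False, False, True, False],
    "test_c": [False, False, False, True, False],
    "test_d": [True, False, True, True, True],  # chronically failing test
}
print(flag_suspect_runs(grid))
```

This does not address the open question of whether one test's failure caused another's; it only looks for simultaneous unexpected failures.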
Additional context
A detailed explanation of infra flakes is here: #1