Infra Flake Detection Built on Wrong Assumptions #447

antter · 2021-11-16T17:08:00Z

Is your feature request related to a problem? Please describe.
Currently, we try to detect "infra flakes" by looking for a waterfall pattern. This might not be as well-motivated as we originally thought. The tests are ordered by most recent failure, which means a waterfall pattern can only occur at the beginning, and an infra flake in the middle would look like random tests failing.

Describe the solution you'd like
An updated model that uses statistical techniques to find infra flakes and does not rely on the order in which the tests arrive.

Additional context
A detailed explanation of infra flakes is here: #1

antter · 2021-11-17T19:47:39Z

Explanation of a possible solution I'm interested in exploring:

Basically, an infra flake happens when a handful of tests fail unexpectedly at around the same time. The issue here lies in the word "unexpectedly". If a test fails once every 5 runs, no single failure could be considered "unexpected". I feel we can only deduce what is "unexpected" from a single test's history. That history is a time series, and I am thinking of fitting an autoregressive or moving-average model to capture the fact that more recent failures make another failure more likely for that test. This way we can, to some extent, quantify unexpectedness.
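A minimal sketch of how per-test unexpectedness could be scored from a single test's history. It uses an exponentially weighted moving average of past failures as a simple stand-in for a fitted autoregressive or moving-average model, not an implementation of either. The function name `failure_surprise` and the parameters `alpha`, `prior`, and `eps` are assumptions made for illustration only.

```python
import math

def failure_surprise(history, alpha=0.3, prior=0.5, eps=1e-3):
    """Score how unexpected each failure is for ONE test.

    history: list of 0/1 results, oldest first (1 = failed run).
    Returns one score per run: -log(p_fail) for failures, 0.0 for passes,
    where p_fail is estimated from earlier runs only.
    All parameter values here are illustrative assumptions.
    """
    p_fail = prior  # estimate before any run has been observed
    scores = []
    for failed in history:
        # Clamp so that log() stays finite for very stable tests.
        p = min(max(p_fail, eps), 1.0 - eps)
        scores.append(-math.log(p) if failed else 0.0)
        # Recent runs get more weight than older ones.
        p_fail = alpha * failed + (1.0 - alpha) * p_fail
    return scores

stable = [0] * 20 + [1]            # never fails, then fails once
flaky = [1, 0, 0, 0, 0] * 4 + [1]  # fails about once every 5 runs

print(failure_surprise(stable)[-1] > failure_surprise(flaky)[-1])
```

With these histories the final failure of the stable test scores higher than the final failure of the flaky test, which is the behavior the "unexpected" notion asks for.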

If we have some sort of baseline for when a test failing is "unexpected", then all that would be left would be to do some analysis to see how well this baseline works, and find a way to identify several unexpected failures happening at once.
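A rough sketch of the "several unexpected failures at once" step, assuming results are available as a mapping from test name to a list of 0/1 outcomes aligned by run. The baseline here is a plain leave-one-out failure rate, chosen only to keep the example short. `flag_suspect_runs`, `max_rate`, and `min_tests` are hypothetical names and thresholds, not values taken from the project.

```python
def flag_suspect_runs(results, max_rate=0.1, min_tests=3):
    """Find runs where several normally reliable tests fail together.

    results: dict of test name -> list of 0/1 outcomes (1 = failed),
             all lists the same length and aligned by run index.
    A failure counts as unexpected when the test's failure rate over
    its OTHER runs is at most max_rate (an assumed threshold).
    Returns a list of (run_index, [unexpected test names]).
    """
    n_runs = len(next(iter(results.values())))
    suspect = []
    for run in range(n_runs):
        unexpected = []
        for name, history in results.items():
            if not history[run]:
                continue
            others = history[:run] + history[run + 1:]
            rate = sum(others) / len(others) if others else 0.0
            if rate <= max_rate:
                unexpected.append(name)
        if len(unexpected) >= min_tests:
            suspect.append((run, sorted(unexpected)))
    return suspect

results = {
    "a": [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "b": [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "c": [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "d": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # chronically flaky
}
print(flag_suspect_runs(results))
```

The result depends only on which tests fail in the same run, not on the order in which the tests are listed, so it does not need a waterfall shape to fire.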

All of the above has a decent chance of totally failing, though; this is a tough dataset.

One issue that keeps coming up while pondering how to classify infra flakes is that it is hard to decide whether a test fails as a direct result of another test failing or whether both tests fail as a result of an infra flake. The distinction is tough, and maybe not possible with this type of dataset. I'm going to ignore this problem for now.

antter · 2021-11-17T20:55:54Z

Also, it may not be all that necessary to build any type of time series model. I think we could get decent results by simply taking # failures / # attempts as a first metric. But I do think I'll end up trying both, starting with the simple model. The time series model definitely has the potential to be a lot stronger, so I'll have to do some sort of comparison at the end.
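For reference, the simple metric could look something like the following. `failure_rate` is a hypothetical helper name, and the 0/1 encoding of a test's history is an assumption.

```python
def failure_rate(history):
    """Return # failures / # attempts for one test.

    history: list of 0/1 results (1 = failed run).
    Returns None when the test has never run, since the ratio is
    undefined in that case.
    """
    if not history:
        return None
    return sum(history) / len(history)

# A test that fails once every 5 runs.
print(failure_rate([0, 0, 0, 0, 1]))  # 0.2
```

A failure of a test with a low rate would then be treated as more unexpected than a failure of a test with a high rate. Unlike a time series model, this ignores how recent the failures were.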

antter · 2021-11-17T21:49:02Z

And FWIW, I don't believe any left-right pattern is necessary for an infra flake. It seems to happen occasionally because infrastructure is flaky in a dynamic way, so a test can pass and then fail an hour later because of it. However, it can also be the case that infrastructure has an issue for just one hour, failing many tests, and then everything is fine the next time the tests run.

antter self-assigned this Nov 16, 2021

MichaelClifford assigned Shreyanand Nov 17, 2021

This was referenced Nov 17, 2021

first pass at exploring infra flakes #448

Open

Analyze log ouputs of failed tests, especially those that fail in a column unexpectedly #449

Open
