Improvements to the daily checker buildfarm #57

Open
Crola1702 opened this issue Jun 14, 2024 · 6 comments
@Crola1702
Contributor

Crola1702 commented Jun 14, 2024

Description

Throughout the green buildfarm subproject development, we've found some issues with the current daily checker workflow, particularly with its prioritization method.

Some considerations we should have for an improved daily checker workflow:

Prioritization:

  • Consecutive test failures:
    • Errors occurring three times in a row will be prioritized
    • Errors that have appeared at least three times in a 2-week window will also be prioritized
  • Flaky test categorization:
    • Below ~7%, or errors that happened only once: not reported
    • 7%-20%: reported and announced in the weekly meetings. Considered known issues
    • 20%-100%: reported and a developer assigned to take a look
  • Keep track of disabled (skipped?) tests
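The prioritization and flakiness rules above could be sketched as a small helper; the function names, data shapes, and exact thresholds here are illustrative assumptions, not the actual checker code:

```python
# Sketch of the prioritization rules above. Names, thresholds, and
# data shapes are illustrative assumptions, not the real checker code.

def categorize_flakiness(failure_rate: float) -> str:
    """Map a test's failure rate (0.0-1.0) to a reporting category."""
    if failure_rate < 0.07:
        return "not_reported"   # ~7%, or errors that happened only once
    if failure_rate < 0.20:
        return "known_issue"    # announced in the weekly meetings
    return "assigned"           # a developer takes a look


def is_prioritized(recent_results: list, window_failures: int) -> bool:
    """Prioritize errors failing 3 times in a row, or at least
    3 times within the 2-week window.

    recent_results: most-recent-first list of pass (True) / fail (False).
    """
    three_in_a_row = len(recent_results) >= 3 and not any(recent_results[:3])
    return three_in_a_row or window_failures >= 3
```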

Check buildfarm script

  • Modification to report tests failing 3 times in a row
  • Gazebo jobs should trigger a warning if failing 2 times in a row, but debbuilders should not
  • Warnings should be addressed, so we should regularly monitor jobs failing 5 times in a row
  • We should monitor the output of a single SQL script that gives us the jobs that don't have any successful builds:
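As a sketch of what such a query could look like, here is a minimal example against a hypothetical `builds(job_name, result)` table; the real buildfarm schema, result values, and job names will differ:

```python
# Sketch of a "jobs without any success" query, using a hypothetical
# builds(job_name, result) table. The real buildfarm schema differs.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE builds (job_name TEXT, result TEXT);
    INSERT INTO builds VALUES
        ('gz-sim-main', 'FAILURE'),
        ('gz-sim-main', 'FAILURE'),
        ('rolling-deb', 'SUCCESS'),
        ('rolling-deb', 'FAILURE');
""")

# Jobs that have at least one build entry but no successful ones.
no_success = conn.execute("""
    SELECT job_name
    FROM builds
    GROUP BY job_name
    HAVING SUM(result = 'SUCCESS') = 0
""").fetchall()

print(no_success)  # [('gz-sim-main',)]
```

In SQLite the boolean expression `result = 'SUCCESS'` evaluates to 0 or 1, so summing it counts successful builds per job.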

This is an example of what new reports should look like:

Sample report:

Buildfarmer log

Probably skip for Saturday and Sunday (show a big log diff on Monday)

New X items to investigate (+/- Y): ?? No new issues!

Show new reports that didn’t exist yesterday

Build regressions:

Show only consistent regressions
For build regressions, failing just 1 time in a row should be enough

  • Issue in job : failed X times in a row

  • Issue in job : happened Y times in the last 2 weeks (flaky)

Ignore the ClosedChannelException ones

Test regressions:

  • Issue in job : failed X times in a row
  • Issue in job : happened Y times in the last 2 weeks (flaky)

Warnings:

  • Job contains warnings (+/- X)

Continue investigating: X items (+/- Y):

Show reports that still exist from yesterday

Build regressions:

Show only consistent regressions
For build regressions, failing just 1 time in a row should be enough

  • Issue in job : failed X times in a row
  • Issue in job : happened Y times in the last 2 weeks (flaky)

Test regressions:

  • Issue in job : failed X times in a row
  • Issue in job : happened Y times in the last 2 weeks (flaky)

Warnings:

  • Job contains warnings

Old issues:

Show known issues

Jobs to check:

  • Job hasn’t passed in days

Reported issues

Integration with gh cli

  • Issue hasn’t been updated in days
  • Issue hasn’t happened in days. Should check!

Disabled/Skipped tests:

  • Total: (+/- X)
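The split between the "New X items" and "Continue investigating" sections in the sample above amounts to a set diff against yesterday's report; a minimal sketch, with plain-string item identifiers as an assumption:

```python
# Sketch of splitting today's report items into "New X items to
# investigate" and "Continue investigating" by diffing against
# yesterday's report. Item identifiers as strings are an assumption.

def split_report(today: set, yesterday: set):
    new = today - yesterday              # new reports since yesterday
    ongoing = today & yesterday          # reports that still exist
    delta = len(today) - len(yesterday)  # the "(+/- Y)" value
    return new, ongoing, delta
```

For example, with `today = {"a", "b"}` and `yesterday = {"b", "c"}`, this yields `new = {"a"}`, `ongoing = {"b"}`, and `delta = 0`.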
@Blast545
Contributor

Thanks for tracking everything we discussed; I think there's nothing missing there.

In terms of the categorization, I think the wording New X items to investigate (+/- Y): ?? No new issues! can be confusing for the buildfarmer investigating issues and we can have a problem if we ignore issues just because those are reported.

Comparing against our current implementation:

### Build regressions
      * Builds failing today
### Test regressions 
      * Builds with regressions today

I think we could have something along the lines of:


## Higher priority
### Jobs with two consecutive build regressions (X items)
      * link to jobs  + sorting by number of days
### Build regressions in latest build of each job (Y items)
      * Builds 
### Jobs without any success and at least 1 entry
      * Link to job
### Jobs not green at least 3 times in a row
      * Builds

## Prioritized test regressions
### Tests failing 3 times in a row
      * Test name and flakiness per build
### Tests failing 3 times in a 2 week window
      * Test name and flakiness per build

## Test regressions all
### Test regressions without an issue
      * Test name and flakiness per build
### Keeping track
      * Issues

We should do our best to keep the "Higher priority" and the "Prioritized test regressions" sections as clean / organized as possible, to make it possible for people without any buildfarmer workload to browse them.

WDYT? @Crola1702
cc: @claraberendsen

@Crola1702
Contributor Author

In terms of the categorization, I think the wording New X items to investigate (+/- Y): ?? No new issues! can be confusing for the buildfarmer investigating issues and we can have a problem if we ignore issues just because those are reported.

My idea in adding the "New X items" was to not let new problems enter the buildfarm jobs. I don't think we'll ignore issues just because they are reported; that's the reason behind the Reported issues section (show which issues we should update/re-check to keep them up to date. Also, now that I think about it, we can probably add an "All Reported issues" section, like a daily or weekly report of what's being reported).

If I were checking the report, I would check the number diff, see Reported issues (+10), then update/close them with the new information.

### Build regressions in latest build of each job (Y items)

I'm not sure what you mean by "in the latest build of each job".

### Jobs without any success and at least 1 entry

I don't think this is something we should add as higher priority.

As I've said before, IMO, jobs that haven't ever passed are closer to "maintenance" tasks (keep the buildfarm green) than to priority tasks (report new regressions to dev teams).

Also, there are multiple jobs that haven't passed in a long time. Additionally, when new releases land, new jobs copy the state of their parents (e.g., Jazzy release from Rolling release, or gz-sim- from gz-sim-main), and it would add more verbosity to this output. I don't think that amount of verbosity should go in the higher priority items.

### Jobs not green at least 3 times in a row

This should be divided into build regressions and test regressions. I see some cases with an order like (BR, TR, TR or TR, BR, TR) that doesn't seem important to investigate. I'd rather prioritize unstable builds 3 times in a row (warnings or test regressions), while build failures are already prioritized above (Jobs with two consecutive build regressions).

### Tests failing 3 times in a row

Covered in my comment above


In general, I think it would be valuable to add the Higher Priority section, having new build regressions and adding consistent test regressions there. Jobs without any success would go in the "Old issues" section (maybe rename it to "Maintenance"?), as they're not a priority on a daily basis.

I think it is worth keeping the New X items to investigate (+/- Y): ?? No new issues! line for the reasons I mentioned above, and renaming the "Continue Investigating" section to "Investigation priorities", with the Higher Priority section in there.

@Crola1702
Contributor Author

Report format:

# Urgent investigations
	New items will have a **NEW** suffix added
	
	## Build regressions (all) (+/- X)
		* (known build regressions are ignored)
		
	## Not reported consistent Test regressions (3+ consecutive times) (+/- X)
		* [TREAT WARNINGS AS TEST REGRESSIONS]
		* (known test regressions are ignored)
	
	## Not reported flaky test regressions (3+ times in a 2 week window) (+/- X)
		* (known test regressions are ignored)

# Maintenance
	
	## Jobs that have failed for {x (number of builds < [x]), all time}
		* "All time" first
		* Sorted by number of fails
	
	## Reported issues
		* Issue hasn’t been updated in days. Should check!
		* Issue hasn’t happened in days. Should close!
		* Issue doesn't have an assignee (Next iteration 2)
		
	## Disabled tests: (number)
		* Which (Iteration 2)

# Pending investigations
	
	## Build regressions Known
		* All build regressions that don't fit the constraints above
	
	## Test regressions All not reported
		* All test regressions that don't fit the constraints above

	## Test regressions reported
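The **NEW** suffix and the "(+/- X)" section counters in the format above could be rendered along these lines; this is purely an illustrative sketch, not the real report generator:

```python
# Sketch of the "**NEW** suffix" and "(+/- X)" rules from the report
# format above. Purely illustrative, not the real report generator.

def render_item(item: str, yesterday: set) -> str:
    """Render a bullet, marking items absent from yesterday's report."""
    suffix = " **NEW**" if item not in yesterday else ""
    return f"* {item}{suffix}"

def section_header(title: str, today_count: int, yesterday_count: int) -> str:
    """Render a section title with its signed day-over-day delta."""
    delta = today_count - yesterday_count
    return f"## {title} ({delta:+d})"
```

For example, an item seen for the first time today renders as `* job-a failed **NEW**`, and a section that grew by two renders as `## Build regressions (all) (+2)`.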

Features priority (for first iteration)

  1. Urgent investigations
  2. Maintenance (jobs that have failed for…)
  3. Pending Test regressions (check_buildfarm output)

@nuclearsandwich
Member

  • (known build regressions are ignored)

I think it's great to report the breaking news in order to make sure that new issues get the most eyeballs while they're fresh. Going along with @Blast545's concern in #62, I think rather than "ignoring" known regressions, listing them in an appendix (and double points for using markdown anchors to link to that appendix) will help keep those from falling off.

@Crola1702
Contributor Author

@Blast545 #64 is what I've been working on for formatting.

Ideas about formatting of next sections include:

  • Pretty titles: Some capitalization and spacing to make titles more readable
  • Format as tables with internal lists and summaries: There are really long outputs, and these can make the report pretty much unreadable.
  • Group consistent test regressions: Consistent test regressions can be grouped by job name and age, assuming that two regressions happening in the same job with the same number of consecutive failures are part of the same issue.
  • Group flaky test regressions: Flaky test regressions can be grouped by similar flakiness amounts; that is, if two regressions are happening in the same set of jobs, they might be the same issue (as they're flaky we can't be sure, but it's the most probable explanation)
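The two grouping ideas above could look roughly like this; the tuple shapes and names are assumptions for illustration, not the formatter's actual data model:

```python
# Sketch of the grouping ideas above: consistent regressions grouped by
# (job, consecutive-failure count), flaky ones by the set of jobs they
# appear in. Data shapes are assumptions, not the real formatter.
from collections import defaultdict

def group_consistent(regressions):
    """regressions: iterable of (test_name, job_name, consecutive_failures)."""
    groups = defaultdict(list)
    for test, job, streak in regressions:
        # Same job and same failure streak -> likely the same issue.
        groups[(job, streak)].append(test)
    return dict(groups)

def group_flaky(regressions):
    """regressions: iterable of (test_name, frozenset of affected jobs)."""
    groups = defaultdict(list)
    for test, jobs in regressions:
        # Same set of affected jobs -> probably (not certainly) related.
        groups[jobs].append(test)
    return dict(groups)
```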

@Blast545
Contributor

Blast545 commented Jul 4, 2024

Group flaky test regressions

We already had that, right? Did the previous report take too much time to run? I wonder if we can re-use part of that code in the new formatter.
