Improvements to the daily checker buildfarm #57

Open
Crola1702 opened this issue Jun 14, 2024 · 6 comments
@Crola1702
Contributor

Crola1702 commented Jun 14, 2024

Description

Throughout the green buildfarm subproject development, we've found some issues with the current daily checker workflow, particularly with its prioritization method.

Some considerations we should have for an improved daily checker workflow:

Prioritization:

  • Consecutive test failures:
    • Errors occurring three times in a row will be prioritized
    • Errors that have appeared at least three times in a 2-week window will also be prioritized
  • Flaky test categorization:
    • Below ~7%, or errors that happened only once: not reported
    • 7%-20%: reported and announced in the weekly meetings. Considered known issues
    • 20%-100%: reported and a developer assigned to take a look
  • Keep track of disabled (skipped?) tests
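The prioritization and flakiness rules above could be sketched as a small helper; the function names, data shapes, and exact thresholds here are illustrative assumptions, not the actual checker code:

```python
# Sketch of the prioritization rules above. Names, thresholds, and
# data shapes are illustrative assumptions, not the real checker code.

def categorize_flakiness(failure_rate: float) -> str:
    """Map a test's failure rate (0.0-1.0) to a reporting category."""
    if failure_rate < 0.07:
        return "not_reported"   # ~7%, or errors that happened only once
    if failure_rate < 0.20:
        return "known_issue"    # announced in the weekly meetings
    return "assigned"           # a developer takes a look


def is_prioritized(recent_results: list, window_failures: int) -> bool:
    """Prioritize errors failing 3 times in a row, or at least
    3 times within the 2-week window.

    recent_results: most-recent-first list of pass (True) / fail (False).
    """
    three_in_a_row = len(recent_results) >= 3 and not any(recent_results[:3])
    return three_in_a_row or window_failures >= 3
```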

Check buildfarm script

  • Modification to report tests failing 3 times in a row
  • Gazebo jobs should trigger a warning if failing 2 times in a row, but debbuilders should not
  • Warnings should be addressed, so we should regularly monitor jobs failing 5 times in a row
  • We should monitor the output of a single SQL script that gives us the jobs that don't have any successful builds:
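As a sketch of what such a query could look like, here is a minimal example against a hypothetical `builds(job_name, result)` table; the real buildfarm schema, result values, and job names will differ:

```python
# Sketch of a "jobs without any success" query, using a hypothetical
# builds(job_name, result) table. The real buildfarm schema differs.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE builds (job_name TEXT, result TEXT);
    INSERT INTO builds VALUES
        ('gz-sim-main', 'FAILURE'),
        ('gz-sim-main', 'FAILURE'),
        ('rolling-deb', 'SUCCESS'),
        ('rolling-deb', 'FAILURE');
""")

# Jobs that have at least one build entry but no successful ones.
no_success = conn.execute("""
    SELECT job_name
    FROM builds
    GROUP BY job_name
    HAVING SUM(result = 'SUCCESS') = 0
""").fetchall()

print(no_success)  # [('gz-sim-main',)]
```

In SQLite the boolean expression `result = 'SUCCESS'` evaluates to 0 or 1, so summing it counts successful builds per job.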

This is an example of what new reports should look like:

Sample report:

Buildfarmer log

Probably skip for Saturday and Sunday (show a big log diff on Monday)

New X items to investigate (+/- Y): ?? No new issues!

Show new reports that didn’t exist yesterday

Build regressions:

Show only consistent regressions
For build regressions, failing just 1 time in a row should be enough

  • Issue in job : failed X times in a row

  • Issue in job : happened Y times in the last 2 weeks (flaky)

Ignore the ClosedChannelException ones

Test regressions:

  • Issue in job : failed X times in a row
  • Issue in job : happened Y times in the last 2 weeks (flaky)

Warnings:

  • Job contains warnings (+/- X)

Continue investigating: X items (+/- Y):

Show reports that still exist from yesterday

Build regressions:

Show only consistent regressions
For build regressions, failing just 1 time in a row should be enough

  • Issue in job : failed X times in a row
  • Issue in job : happened Y times in the last 2 weeks (flaky)

Test regressions:

  • Issue in job : failed X times in a row
  • Issue in job : happened Y times in the last 2 weeks (flaky)

Warnings:

  • Job contains warnings

Old issues:

Show known issues

Jobs to check:

  • Job hasn’t passed in days

Reported issues

Integration with gh cli

  • Issue hasn’t been updated in days
  • Issue hasn’t happened in days. Should check!

Disabled/Skipped tests:

  • Total: (+/- X)
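The split between the "New X items" and "Continue investigating" sections in the sample above amounts to a set diff against yesterday's report; a minimal sketch, with plain-string item identifiers as an assumption:

```python
# Sketch of splitting today's report items into "New X items to
# investigate" and "Continue investigating" by diffing against
# yesterday's report. Item identifiers as strings are an assumption.

def split_report(today: set, yesterday: set):
    new = today - yesterday              # new reports since yesterday
    ongoing = today & yesterday          # reports that still exist
    delta = len(today) - len(yesterday)  # the "(+/- Y)" value
    return new, ongoing, delta
```

For example, with `today = {"a", "b"}` and `yesterday = {"b", "c"}`, this yields `new = {"a"}`, `ongoing = {"b"}`, and `delta = 0`.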
@Blast545
Contributor

Thanks for tracking everything we discussed; I think there's nothing missing there.

In terms of the categorization, I think the wording New X items to investigate (+/- Y): ?? No new issues! can be confusing for the buildfarmer investigating issues and we can have a problem if we ignore issues just because those are reported.

Comparing against our current implementation:

### Build regressions
      * Builds failing today
### Test regressions 
      * Builds with regressions today

I think we could have something along the lines of:


## Higher priority
### Jobs with two consecutive build regressions (X items)
      * link to jobs  + sorting by number of days
### Build regressions in latest build of each job (Y items)
      * Builds 
### Jobs without any success and at least 1 entry
      * Link to job
### Jobs not green at least 3 times in a row
      * Builds

## Prioritized test regressions
### Tests failing 3 times in a row
      * Test name and flakiness per build
### Tests failing 3 times in a 2 week window
      * Test name and flakiness per build

## Test regressions all
### Test regressions without an issue
      * Test name and flakiness per build
### Keeping track
      * Issues

We should do our best to keep the "Higher priority" and the "Prioritized test regressions" sections as clean / organized as possible, to make it possible for people without any buildfarmer workload to browse them.

WDYT? @Crola1702
cc: @claraberendsen

@Crola1702
Contributor Author

In terms of the categorization, I think the wording New X items to investigate (+/- Y): ?? No new issues! can be confusing for the buildfarmer investigating issues and we can have a problem if we ignore issues just because those are reported.

My idea in adding the "New X items" was to not let new problems enter the buildfarm jobs. I don't think we'll ignore issues just because they are reported; that's the reason behind the Reported issues section (show which issues we should update/re-check to keep them up to date. Also, now that I think about it, we can probably add an "All Reported issues" section, like a daily or weekly report of what's being reported).

If I were checking the report, I would check the number diff, see Reported issues (+10), then update/close them with the new information.

### Build regressions in latest build of each job (Y items)

I'm not sure what you mean by "in the latest build of each job".

### Jobs without any success and at least 1 entry

I don't think this is something we should add as higher priority.

As I've said before, IMO, jobs that haven't ever passed are closer to "maintenance" tasks (keep the buildfarm green) than to priority tasks (report new regressions to dev teams).

Also, there are multiple jobs that haven't passed in a long time. Additionally, when new releases land, new jobs copy the state of their parents (e.g., Jazzy release from Rolling release, or gz-sim- from gz-sim-main), and it would add more verbosity to this output. I don't think that amount of verbosity should go in the higher priority items.

### Jobs not green at least 3 times in a row

This should be divided into build regressions and test regressions. I see some cases with an order like (BR, TR, TR or TR, BR, TR) that doesn't seem important to investigate. I'd rather prioritize unstable builds 3 times in a row (warnings or test regressions), while build failures are already prioritized above (Jobs with two consecutive build regressions).

### Tests failing 3 times in a row

Covered in my comment above


In general, I think it would be valuable to add the Higher Priority section, having new build regressions and adding consistent test regressions there. Jobs without any success would go in the "Old issues" section (maybe rename it to "Maintenance"?), as they're not a priority on a daily basis.

I think it is worth keeping the New X items to investigate (+/- Y): ?? No new issues! line for the reasons I mentioned above, and renaming the "Continue Investigating" section to "Investigation priorities", with the Higher Priority section in there.

@Crola1702
Contributor Author

Report format:

# Urgent investigations
	New items will have a **NEW** suffix added
	
	## Build regressions (all) (+/- X)
		* (known build regressions are ignored)
		
	## Not reported consistent Test regressions (3+ consecutive times) (+/- X)
		* [TREAT WARNINGS AS TEST REGRESSIONS]
		* (known test regressions are ignored)
	
	## Not reported flaky test regressions (3+ times in a 2 week window) (+/- X)
		* (known test regressions are ignored)

# Maintenance
	
	## Jobs that have failed for {x (number of builds < [x]), all time}
		* "All time" first
		* Sorted by number of fails
	
	## Reported issues
		* Issue hasn’t been updated in days. Should check!
		* Issue hasn’t happened in days. Should close!
		* Issue doesn't have an assignee (Next iteration 2)
		
	## Disabled tests: (number)
		* Which (Iteration 2)

# Pending investigations
	
	## Build regressions Known
		* All build regressions that don't fit the constraints above
	
	## Test regressions All not reported
		* All test regressions that don't fit the constraints above

	## Test regressions reported
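The **NEW** suffix and the "(+/- X)" section counters in the format above could be rendered along these lines; this is purely an illustrative sketch, not the real report generator:

```python
# Sketch of the "**NEW** suffix" and "(+/- X)" rules from the report
# format above. Purely illustrative, not the real report generator.

def render_item(item: str, yesterday: set) -> str:
    """Render a bullet, marking items absent from yesterday's report."""
    suffix = " **NEW**" if item not in yesterday else ""
    return f"* {item}{suffix}"

def section_header(title: str, today_count: int, yesterday_count: int) -> str:
    """Render a section title with its signed day-over-day delta."""
    delta = today_count - yesterday_count
    return f"## {title} ({delta:+d})"
```

For example, an item seen for the first time today renders as `* job-a failed **NEW**`, and a section that grew by two renders as `## Build regressions (all) (+2)`.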

Features priority (for first iteration)

  1. Urgent investigations
  2. Maintenance (jobs that have failed for…)
  3. Pending Test regressions (check_buildfarm output)

@nuclearsandwich
Member

  • (known build regressions are ignored)

I think it's great to report the breaking news in order to make sure that new issues get the most eyeballs while they're fresh. Going along with @Blast545's concern in #62, I think rather than "ignoring" known regressions, listing them in an appendix (and double points for using markdown anchors to link to that appendix) will help keep those from falling off.

@Crola1702
Contributor Author

@Blast545 #64 is what I've been working on for formatting.

Ideas about formatting of next sections include:

  • Pretty titles: Some capitalization and spacing to make titles more readable
  • Format as tables with internal lists and summaries: There are really long outputs, and these can make the report pretty much unreadable.
  • Group consistent test regressions: Consistent test regressions can be grouped by job name and age, assuming that two regressions happening in the same job with the same number of consecutive failures are part of the same issue.
  • Group flaky test regressions: Flaky test regressions can be grouped by similar flakiness amounts; that is, if two regressions are happening in the same set of jobs, they might be the same issue (as they're flaky we can't be sure, but it's the most probable explanation)
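The two grouping ideas above could look roughly like this; the tuple shapes and names are assumptions for illustration, not the formatter's actual data model:

```python
# Sketch of the grouping ideas above: consistent regressions grouped by
# (job, consecutive-failure count), flaky ones by the set of jobs they
# appear in. Data shapes are assumptions, not the real formatter.
from collections import defaultdict

def group_consistent(regressions):
    """regressions: iterable of (test_name, job_name, consecutive_failures)."""
    groups = defaultdict(list)
    for test, job, streak in regressions:
        # Same job and same failure streak -> likely the same issue.
        groups[(job, streak)].append(test)
    return dict(groups)

def group_flaky(regressions):
    """regressions: iterable of (test_name, frozenset of affected jobs)."""
    groups = defaultdict(list)
    for test, jobs in regressions:
        # Same set of affected jobs -> probably (not certainly) related.
        groups[jobs].append(test)
    return dict(groups)
```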

@Blast545
Contributor

Blast545 commented Jul 4, 2024

Group flaky test regressions

We already had that, right? Did the previous report take too much time to run? I wonder if we can re-use part of that code in the new formatter.
