ci: Remember failed workflows/td files and run them first in CI #31263
base: main
Conversation
Force-pushed from b09a6c1 to cf37060
# raise
# We could also keep running, but then runtime is still
# slow when a test fails, and the annotation only shows up
# after the test finished:
exceptions.append(e)
This part is what I'm unsure about in this PR. We only collect the test failures at the end of the test run, and only then mark the test as failed and write the annotation. Changing that would be a really major change.
So we currently have two options:
- Stop the run as soon as the first part (workflow, td/slt file) fails, and immediately get feedback on that one, but none on the remaining parts.
- Keep running all parts, and get feedback on all of them later.
I'm currently going with the second approach, but that means you only notice an early failure if you check the logs manually.
So if we want faster feedback while still running all parts after a failure, I don't see a way around changing the annotation logic so annotations can already be written during the run. Opinions?
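To make the trade-off concrete, here is a minimal sketch of the two options (hypothetical function names, not the actual runner code):

def run_fail_fast(parts, run_part):
    # Option 1: stop at the first failing part. Feedback is immediate,
    # but the remaining parts never run.
    for part in parts:
        run_part(part)  # raises on failure, aborting the loop

def run_collect_failures(parts, run_part, annotate):
    # Option 2: keep running all parts and collect the failures; the
    # annotations are only written after every part has finished.
    exceptions = []
    for part in parts:
        try:
            run_part(part)
        except Exception as e:
            exceptions.append((part, e))
    for part, e in exceptions:
        annotate(f"{part} failed: {e}")
    if exceptions:
        raise RuntimeError(f"{len(exceptions)} part(s) failed")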
I think it's worth running all tests, to avoid death by a thousand cuts.
If someone cancels a workflow, can we still collect any failures that happened so far?
Yes.
Okay, perfect. I already tend to watch CI for early failures and can cancel as necessary, so this seems like the best option. Nice!
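For reference, one way the "collect failures so far" behavior can survive a cancellation (a hypothetical sketch, not necessarily what this PR implements) is to flush the already collected exceptions from a signal handler before exiting:

import signal
import sys

def install_cancel_handler(exceptions, annotate):
    # Hypothetical: when the CI runner cancels the job (typically via SIGTERM),
    # still write annotations for the failures collected so far.
    def handler(signum, frame):
        for part, e in exceptions:
            annotate(f"{part} failed: {e}")
        sys.exit(1)
    signal.signal(signal.SIGTERM, handler)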
Force-pushed from cf37060 to 7558f85
I can't really comment on the code quality of all the runner stuff, but the policy described seems really sensible to me. Thank you for the quick turnaround!
We can give merging this a try and see if it works well; I'll merge it tomorrow if there are no complaints. Another test run in the meantime:
to be able to run them first in the next run on a PR
Force-pushed from 7558f85 to a7e3622
We have three priorities to determine the order of parts (workflows, td/slt files):
When fetching the order fails (15-second timeout), we just run in the same order as before.
This uses the Materialize Production Analytics database, so it's also another nice use case for dogfooding.
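A minimal sketch of that fallback behavior (the helper name, timeout handling, and query are assumptions, not the actual CI code): fetch the set of recently failed parts, and if that fails or times out, keep the original order.

def fetch_recently_failed_parts(timeout_seconds: float = 15.0) -> set[str]:
    # Placeholder for a query against the production analytics database that
    # returns the parts (workflows, td/slt files) which recently failed on
    # this PR; the 15-second timeout would apply to this call. Hypothetical.
    return set()

def prioritized_order(parts: list[str]) -> list[str]:
    try:
        failed = fetch_recently_failed_parts()
    except Exception:
        # Fetching the order failed or timed out: run in the same order as before.
        return list(parts)
    # Previously failed parts go first; sorted() is stable, so the original
    # order is preserved within each group.
    return sorted(parts, key=lambda p: p not in failed)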
Fixes: https://github.com/MaterializeInc/database-issues/issues/8935
Verification runs:
https://buildkite.com/materialize/test/builds/98151
https://buildkite.com/materialize/nightly/builds/11053
https://buildkite.com/materialize/nightly/builds/11051
https://buildkite.com/materialize/release-qualification/builds/739
Checklist
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.