Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement repair queue #692

Merged
merged 5 commits into from
Jan 25, 2024
Merged

Implement repair queue #692

merged 5 commits into from
Jan 25, 2024

Conversation

morimoto-cybozu
Copy link
Contributor

Signed-off-by: morimoto-cybozu [email protected]

@morimoto-cybozu morimoto-cybozu self-assigned this Dec 25, 2023
@morimoto-cybozu morimoto-cybozu force-pushed the repair-queue branch 13 times, most recently from 7f482c6 to e1841a9 Compare December 26, 2023 12:45
@morimoto-cybozu morimoto-cybozu changed the title Extend cluster configuration for repair operations Implement repair queue Dec 26, 2023
@morimoto-cybozu morimoto-cybozu marked this pull request as ready for review December 26, 2023 13:18
zeroalphat
zeroalphat previously approved these changes Jan 15, 2024
Copy link
Contributor

@zeroalphat zeroalphat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

zeroalphat
zeroalphat previously approved these changes Jan 17, 2024
Copy link
Contributor

@zeroalphat zeroalphat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@masa213f masa213f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry for the delayed review.

phase.go Outdated Show resolved Hide resolved
pkg/ckecli/cmd/repair_queue_list.go Outdated Show resolved Hide resolved
docs/repair.md Outdated Show resolved Hide resolved
op/repair_drain_start.go Outdated Show resolved Hide resolved
"address": entry.Address,
log.FnError: err,
})
entry.Status = cke.RepairStatusProcessing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the reboot queue, the "queued" status is set when drain is failed.
Why is processing status used in repaire queue?

https://github.com/cybozu-go/cke/blob/v1.27.3/op/reboot.go#L552

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@masa213f
I introduced the processing status for the repair queue to separate (a) the status of a queue entry and (b) the status of a step in an entry.
It would be confusing to manage both (a) and (b) in one field.

Each repair queue entry is designed to have multiple repair steps.
Currently we have three types of step statuses, waiting, draining, and watching.
When processing an entry, we should manage the loop of waiting, draining, and watching.
Furthermore, we may add new step statuses to extend the repair step's ability.
It would be hard to manage the transition diagram of the mixture of the entry statuses and the step statuses while increasing the types of step statuses.

I prefer to separate the entry statuses and the step statuses for simplicity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the description. I understood.

When a node's repair waits for the drain in step 2, the CKE will execute the other node's repair.
And then, more entries than "max_concurrent_repairs" will be in the "processing" status.

op/status.go Outdated Show resolved Hide resolved
masa213f
masa213f previously approved these changes Jan 24, 2024
Signed-off-by: morimoto-cybozu <[email protected]>
Signed-off-by: morimoto-cybozu <[email protected]>
Signed-off-by: morimoto-cybozu <[email protected]>
Copy link
Contributor

@masa213f masa213f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@YZ775 YZ775 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@zeroalphat zeroalphat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@morimoto-cybozu morimoto-cybozu merged commit ad3cd0f into main Jan 25, 2024
7 checks passed
@morimoto-cybozu morimoto-cybozu deleted the repair-queue branch January 25, 2024 06:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants