Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suppress needless introspection re-attempts #413

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

juliakreger
Copy link

Introspection has undergone a number of improvements since the early
days, including better alignment with Ironic itself, and ultimately fixes
in underlying virtual machine configuration for when this tooling is
used with VMs to help prevent some issues which do crop up when using VMs
to pretend to be baremetal, specifically the enforcement of Spanning Tree.

When this legacy retry code gets invoked, a legitimate error has
occured and needs to be investigated. We don't need this code thrasing
the environment attempting to force the node through the process, as
that just complicates troubleshooting.

This change adds inline notes, and explicitly excludes the retry if the
version of OSP is >=16. If nodes have failed, the playbook now also fails
the playbook run to enable the environment to be investigated as it failed
as opposed to an artificially changed state.

Introspection has undergone a number of improvements since the early
days, including better alignment with Ironic itself, and ultimately fixes
in underlying virtual machine configuration for when this tooling is
used with VMs to help prevent some issues which do crop up when using VMs
to pretend to be baremetal, specifically the enforcement of Spanning Tree.

When this legacy retry code gets invoked, a legitimate error has
occured and needs to be investigated. We don't need this code thrasing
the environment attempting to force the node through the process, as
that just complicates troubleshooting.

This change adds inline notes, and explicitly excludes the retry if the
version of OSP is >=16. If nodes have failed, the playbook now also fails
the playbook run to enable the environment to be investigated as it failed
as opposed to an artificially changed state.
@fhubik
Copy link
Contributor

fhubik commented Jun 16, 2022

Right, we are aware of problematic introspection playbook which is being brought up from historical OSP7 times, where it compensated for buggy ironic state at that time and without the retries one basically could not pass introspection stage.

Otoh as introspection stage is critical stage common to all OSP deployments, past or current, afaik we can not affort such change introducing a risk to all CI deployments simultaneously. We plan to wait with such changes (also this one specifically, meaning introspection playbook refactoring) until IR gets osp-branched (which is WIP currently). Then we can finally draw a line between "legacy IR" and current IR state and start fixing all the techdebts in the code (without putting historical parts of CI at risk).

@juliakreger
Copy link
Author

IMHO, it is less of a critical stage component than most think. Kind of now only required in un-shipped OSP code for unrelated to deployment activities and for highly specific cases/data/tests (which have paths to navigate if data is not offered/discovered) in already shipped versions. Plus if we operate with the philosophy of fail fast, we will be able to better understand what is occurring leading to less wasted time when the jobs are reconfigured.

I think it is fine to wait, as long as we're aware and pay down this technical debt before we iterate again because it will save us time with each further iteration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants