Creating ClusterCatalog hangs for a while, without any signal for what's going on. #432
I believe this is a result of synchronous image pulling. The fix should be to design and implement asynchronous image pulling. The reconciler could then kick off a background goroutine to pull an image and immediately update status to say "Pulling" (or something similar). Later, when the image pull completes (or fails), we could signal the reconciler to reconcile again, at which point it would find the result and continue unpacking and processing it.
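A minimal Go sketch of the async-pull idea, under stated assumptions. `AsyncPuller`, `Ensure`, `pullFn` and `notify` are hypothetical names and not part of catalogd. `pullFn` stands in for a real registry client. `notify` stands in for whatever mechanism re-triggers the reconciler, such as enqueueing a request. `Ensure` never blocks: it either returns a finished result or starts a pull and reports that none is available yet.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// PullResult records the outcome of a finished pull (hypothetical type).
type PullResult struct {
	Digest string
	Err    error
}

// AsyncPuller is a hypothetical background image puller used for illustration.
type AsyncPuller struct {
	mu       sync.Mutex
	inFlight map[string]bool
	results  map[string]PullResult
	pullFn   func(ref string) (string, error) // stand-in for a registry client
	notify   func(ref string)                 // e.g. enqueue a reconcile request
}

func NewAsyncPuller(pullFn func(string) (string, error), notify func(string)) *AsyncPuller {
	return &AsyncPuller{
		inFlight: map[string]bool{},
		results:  map[string]PullResult{},
		pullFn:   pullFn,
		notify:   notify,
	}
}

// Ensure returns the cached result for ref if one exists. Otherwise it starts
// a background pull (at most one per ref) and returns done == false at once.
func (p *AsyncPuller) Ensure(ref string) (res PullResult, done bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if r, ok := p.results[ref]; ok {
		return r, true
	}
	if !p.inFlight[ref] {
		p.inFlight[ref] = true
		go p.pull(ref)
	}
	return PullResult{}, false
}

// pull runs in its own goroutine, stores the outcome, then signals the controller.
// A real puller would also need to expire failed results so pulls can be retried.
func (p *AsyncPuller) pull(ref string) {
	digest, err := p.pullFn(ref)
	p.mu.Lock()
	delete(p.inFlight, ref)
	p.results[ref] = PullResult{Digest: digest, Err: err}
	p.mu.Unlock()
	p.notify(ref) // called outside the lock
}

func main() {
	reconcileRequests := make(chan string, 1)
	p := NewAsyncPuller(
		func(ref string) (string, error) {
			time.Sleep(10 * time.Millisecond) // simulate a slow registry
			return "sha256:abc", nil
		},
		func(ref string) { reconcileRequests <- ref },
	)
	if _, done := p.Ensure("example.com/catalog:latest"); !done {
		fmt.Println("status: Pulling") // reconciler returns right away
	}
	ref := <-reconcileRequests // fires once the background pull finishes
	res, _ := p.Ensure(ref)
	fmt.Println("pulled", res.Digest)
}
```

The reconciler stays non-blocking. On every pass it asks the puller for the current state, and the `notify` callback plays the role a watch event would.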
@joelanford you mentioned this in the call, but I don't think I picked up your points that well: what was the reason you gave for not doing a state machine? What you're saying here seems much easier to do if we just …
This is a level-based design too.
… sounds like you're suggesting an edge-based design.
Maybe I'm misinterpreting what you mean by a state machine design, but I've seen controllers implemented as state machines that assume an object's status is correct and then execute a state's logic based on that assumption. If the status is incorrect, the logic that runs is also incorrect or incomplete, and we might never successfully run other parts of the state machine that we assume will always run. A controller should instead read the actual, current state of the world and then do as much as possible to drive it toward the desired state.
Not quite. Think of the async image puller much like the apiserver, the client we use to fetch from it, and the watches we register with it.
So whenever we reconcile, no matter why, we look at the current state (what is in our apiserver cache and what is in our image cache) and take action based on what we see there.
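A level-based reconcile loop, as described above, might look roughly like the following Go sketch. `Catalog`, `ImageCache`, `PullState` and `fakeCache` are hypothetical simplifications, not catalogd types. The key point is that `reconcile` never reads the previous `Phase`. It derives both the next action and the new status from what the image cache reports right now, so a stale or wrong status cannot lead it astray.

```go
package main

import "fmt"

// Catalog is a hypothetical, trimmed-down stand-in for a ClusterCatalog.
type Catalog struct {
	ImageRef string // spec: what the user asked for
	Phase    string // status: recomputed from observations on every reconcile
	Message  string
	Digest   string
}

// PullState is what a hypothetical async image cache reports for a ref.
type PullState int

const (
	NotStarted PullState = iota
	InProgress
	Succeeded
	Failed
)

// ImageCache is an assumed interface, not a real catalogd API.
type ImageCache interface {
	State(ref string) (state PullState, digest string, err error)
	StartPull(ref string) // must not block
}

// reconcile ignores the previous Phase entirely and acts only on the
// current state observed in the image cache.
func reconcile(c *Catalog, images ImageCache) {
	state, digest, err := images.State(c.ImageRef)
	switch state {
	case NotStarted:
		images.StartPull(c.ImageRef)
		c.Phase, c.Message = "Pulling", "pull started"
	case InProgress:
		c.Phase, c.Message = "Pulling", "pull in progress"
	case Failed:
		images.StartPull(c.ImageRef) // retry; a real controller would back off
		c.Phase, c.Message = "Pulling", fmt.Sprintf("retrying after error: %v", err)
	case Succeeded:
		// Unpacking is omitted for brevity.
		c.Phase, c.Message, c.Digest = "Unpacked", "", digest
	}
}

// fakeCache is an in-memory ImageCache used only for this demo.
type fakeCache struct {
	states  map[string]PullState
	started int
}

func (f *fakeCache) State(ref string) (PullState, string, error) {
	if f.states[ref] == Succeeded {
		return Succeeded, "sha256:abc", nil
	}
	return f.states[ref], "", nil
}

func (f *fakeCache) StartPull(ref string) {
	f.started++
	f.states[ref] = InProgress
}

func main() {
	cache := &fakeCache{states: map[string]PullState{}}
	// Deliberately stale status: the object claims to be Unpacked already.
	c := &Catalog{ImageRef: "example.com/catalog:latest", Phase: "Unpacked"}
	reconcile(c, cache)
	fmt.Println(c.Phase, "-", c.Message) // Pulling - pull started
	cache.states[c.ImageRef] = Succeeded // the background pull finished
	reconcile(c, cache)
	fmt.Println(c.Phase, c.Digest) // Unpacked sha256:abc
}
```

Reconciling again while the pull is in flight is harmless: it only refreshes the status and does not start a second pull.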
I think I see what you mean; let me find some time to prototype this. I'll update here once the prototype is ready.
I created a ClusterCatalog with a private image from redhat.registry.io, but it hung for a while without any signal about what was going on.
Notice the last-unpacked time (1s) versus the age (14m).
When I inspected the logs, I saw that the unpacker ran into intermittent issues before it was able to pull the image successfully. For the first 14m, however, there was no signal on the CR about what was going on. The unpacker's struggles were visible only in the controller logs.
We should restructure our controller as a state machine, so that the unpacking operation transitions through distinct states instead of running as a single, all-or-nothing blocking operation.
The states could be something like:
Initiated -> Pulling -> Unpacking -> Unpacked
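The proposed states can be sketched as a small Go phase type with an explicit transition table. The phase names come from the proposal above. The `Phase` type, the `next` table and the `Advance` helper are hypothetical illustrations. The table is strictly linear and does not model failures or retries, which a real design would need given the intermittent pull errors described above.

```go
package main

import "fmt"

// Phase values are taken from the proposal; everything else is illustrative.
type Phase string

const (
	Initiated Phase = "Initiated"
	Pulling   Phase = "Pulling"
	Unpacking Phase = "Unpacking"
	Unpacked  Phase = "Unpacked"
)

// next maps each phase to its successor. Unpacked is terminal, so it has no entry.
var next = map[Phase]Phase{
	Initiated: Pulling,
	Pulling:   Unpacking,
	Unpacking: Unpacked,
}

// Advance returns the phase after p, or an error if p is terminal or unknown.
func Advance(p Phase) (Phase, error) {
	n, ok := next[p]
	if !ok {
		return p, fmt.Errorf("no transition out of phase %q", p)
	}
	return n, nil
}

func main() {
	// Walks the chain and prints Initiated, Pulling, Unpacking, Unpacked on separate lines.
	p := Initiated
	for {
		fmt.Println(p)
		n, err := Advance(p)
		if err != nil {
			break
		}
		p = n
	}
}
```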