Workers stuck due to nonexistent migrations #1228

Open
grzleadams opened this issue Mar 21, 2024 · 2 comments

@grzleadams

Version
Pulp Operator: 1.0.0-beta.3
(The following are both important to note, as I'll describe later.)
Affected node: pulp-minimal:stable@0f66fa60566b
Unaffected node: pulp-minimal:stable@4b30cd0edbb7

Describe the bug
I'm not sure if this belongs in the operator, oci-images, or another repo, so let me know if I should open it somewhere else.
We deployed Pulp in our K8s cluster and everything was fine. A few days later, I bumped the worker replicas, but the new pods just sat there with the following in their logs:

error: Failed to initialize NSS library
Database migration in progress. Waiting...

After a while (and stumbling across this issue), it occurred to me that the new pods were probably on nodes that hadn't previously pulled the minimal image, and sure enough, the image digests on the two nodes didn't match. The API pod was on the "old" image and the new pods were on the "new" one. So, unless I'm misunderstanding what's happening here, these workers will never come up until the API pod is refreshed (either with imagePullPolicy set to Always, or with the "unaffected" nodes cordoned, so that it comes up on the newer image).

To Reproduce
Steps to reproduce the behavior:

  1. Deploy Pulp operator and a Pulp instance.
  2. Wait a few days (or until there's a new re-tag of pulp-minimal:latest).
  3. Cordon nodes currently running workers.
  4. Scale workers up and ensure they come up on "fresh" nodes with the newer image.
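
For step 4, a minimal sketch of bumping the worker count through the Pulp custom resource might look like the fragment below. It assumes the workers are scaled via `spec.worker.replicas` in the Pulp CR. The replica count is an arbitrary illustration, and cordoning the nodes (step 3) is done separately, for example with `kubectl cordon`.

```yaml
# Sketch only: fragment of a Pulp custom resource, not a complete manifest.
# Assumes worker scaling is controlled by spec.worker.replicas.
spec:
  worker:
    replicas: 4  # illustrative value; any increase that lands pods on "fresh" nodes
```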

Expected behavior
I'm not sure... it's easy enough for the user to be more explicit with the images they want to use. On the other hand, that involves providing a bunch of values in the Pulp spec that can easily be mixed up. It seems like the Operator should recognize that a "new" image exists and update all the pods (in rolling fashion, of course), to ensure the migrations actually happen.
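
As a rough sketch of what "being more explicit" could look like, the fragment below pins the images in the Pulp spec instead of relying on a moving tag. `image` and `image_version` are the fields the operator is described as watching in this thread. `image_web` and `image_web_version` are assumed to be the matching fields for the web image, and the registry paths and `<pinned-tag>` placeholders are hypothetical.

```yaml
# Sketch only: fragment of a Pulp custom resource, not a complete manifest.
# Registry paths and tags are placeholders; substitute an immutable tag you trust.
spec:
  image: quay.io/pulp/pulp-minimal        # assumed registry path
  image_version: "<pinned-tag>"           # placeholder, not a real tag
  image_web: quay.io/pulp/pulp-web        # assumed field name and registry path
  image_web_version: "<pinned-tag>"       # placeholder, not a real tag
```

If the tag really is immutable, every node should resolve the same image regardless of pull policy, at the cost of editing the spec for each upgrade.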

Additional context
None... but I wanted to say thanks for all the work that goes into Pulp!

@grzleadams
Author

grzleadams commented Mar 21, 2024

I did just notice that the docs say:

WARN: defining a different image than the one used by API pods can cause unexpected behaviors!

So maybe the key would be to move away from tagging latest for the image, since users can unintentionally fall into the situation that the docs warn about?

@git-hyagi
Collaborator

Hi @grzleadams,

Thank you for the detailed description, and sorry for the late response!

> It seems like the Operator should recognize that a "new" image exists and update all the pods (in rolling fashion, of course), to ensure the migrations actually happen.

The operator currently checks whether spec.{image,image_version} has changed and, if so, triggers a reconciliation that updates the Deployments with the new image. I think the problem is that, by default, we leave ImagePullPolicy as IfNotPresent, so a node that already has the tag cached keeps using the older image. Changing it to Always should avoid running a "cached"/"old" image version that has the same tag.

spec:
  image_pull_policy: Always

> So maybe the key would be to move away from tagging latest for the image, since users can unintentionally fall into the situation that the docs warn about?

Hmm... that is a good point. Another idea would be to tie the pulp-minimal image version to the pulp-operator version: instead of letting users choose which version of Pulp to install, installing pulp-operator v1 would install pulpcore v1. But this would bring other issues, for example in air-gapped environments where users point to custom registries, or in a QA environment that points to custom images.
