Version
Pulp Operator: 1.0.0-beta.3
(The following are both important to note, as I'll describe later.)
Affected node: pulp-minimal:stable@0f66fa60566b
Unaffected node: pulp-minimal:stable@4b30cd0edbb7
Describe the bug
I'm not sure if this belongs in the operator, oci-images, or another repo, so let me know if I should open it somewhere else.
We deployed a Pulp instance in our K8s cluster and everything was fine. A few days later, I went to bump the worker replicas, but the new pods just sat with the following in their logs:
```
error: Failed to initialize NSS library
Database migration in progress. Waiting...
```
After a while (and stumbling across this issue), it occurred to me that the new pods were probably on nodes that hadn't previously pulled the minimal image, and sure enough the image digests on the two nodes didn't match. The API pod was on the "old" image... the new pods were on the "new" one. So, unless I'm misunderstanding what's happening here, these workers will never come up until the API pod is refreshed (either with imagePullPolicy set to Always, or with the "unaffected" nodes cordoned so that it lands on the newer image).
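For anyone hitting the same thing, this is roughly how I compared what the pods had actually resolved (the namespace is a placeholder; adjust to wherever your Pulp instance runs):

```
# list each pod together with the image ID its container actually resolved
kubectl get pods -n pulp \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].imageID}{"\n"}{end}'
```

The stuck workers all reported a different imageID than the API pod, even though every Deployment referenced the same tag.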
To Reproduce
Steps to reproduce the behavior:
1. Deploy Pulp operator and a Pulp instance.
2. Wait a few days (or until there's a new re-tag of pulp-minimal:latest).
3. Cordon nodes currently running workers.
4. Scale workers up and ensure they come up on "fresh" nodes with the newer image.
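In kubectl terms, steps 3–4 were roughly the following (node and CR names are placeholders, and the worker replica field path is my reading of the Pulp CR, so treat this as a sketch):

```
# keep new worker pods off the nodes that already cached the old digest
kubectl cordon <node-already-running-pulp>

# bump the worker replica count on the Pulp custom resource
kubectl patch pulp <pulp-cr-name> -n pulp --type merge \
  -p '{"spec":{"worker":{"replicas":3}}}'
```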
Expected behavior
I'm not sure... it's easy enough for the user to be more explicit with the images they want to use. On the other hand, that involves providing a bunch of values in the Pulp spec that can easily be mixed up. It seems like the Operator should recognize that a "new" image exists and update all the pods (in rolling fashion, of course), to ensure the migrations actually happen.
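To make "more explicit" concrete, I mean something like pinning the images in the Pulp spec. The image/image_version field names are the ones the operator exposes; the web fields and the tags below are just illustrative assumptions:

```yaml
spec:
  # pin a specific, immutable release instead of a moving tag like stable/latest
  image: quay.io/pulp/pulp-minimal
  image_version: "<specific-release-tag>"
  # assumed companion fields for the web image, shown only to illustrate how
  # many values have to be kept in sync by hand
  image_web: quay.io/pulp/pulp-web
  image_web_version: "<specific-release-tag>"
```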
Additional context
None... but I wanted to say thanks for all the work that goes into Pulp!
> WARN: defining a different image than the one used by API pods can cause unexpected behaviors!
So maybe the key would be to move away from tagging latest for the image, since users can unintentionally fall into the situation that the docs warn about?
Thank you for the detailed description, and sorry for the late response!
> It seems like the Operator should recognize that a "new" image exists and update all the pods (in rolling fashion, of course), to ensure the migrations actually happen.
The operator's current logic checks whether spec.{image,image_version} changed and triggers a reconciliation to update the Deployments with the new image. I think the problem is that, by default, we leave the ImagePullPolicy as IfNotPresent. Setting it to Always should avoid the mismatch between the "cached"/"old" image and a newer one published under the same tag:
```yaml
spec:
  image_pull_policy: Always
```
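If you prefer a one-off change instead of editing the full CR, something like this should work (CR name and namespace are placeholders):

```
kubectl patch pulp <pulp-cr-name> -n pulp --type merge \
  -p '{"spec":{"image_pull_policy":"Always"}}'
```

Since this changes the pod template, the Deployments should roll automatically and the replacement pods will pull whatever digest the tag currently points at.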
> So maybe the key would be to move away from tagging latest for the image, since users can unintentionally fall into the situation that the docs warn about?
Hum... that is a good point. Another idea would be to couple the pulp-minimal image to the pulp-operator image, so that instead of letting users choose which version of Pulp to install, installing pulp-operator v1 would always install pulpcore v1. But that would bring other issues, for example in air-gapped environments where users point to custom registries, or in QA environments pointing to custom images.