flux_bootstrap_git removal dangling CRDs #498

Closed
ldunkum opened this issue Jun 15, 2023 · 12 comments
Labels
bug Something isn't working

Comments

@ldunkum

ldunkum commented Jun 15, 2023

We had to taint and reapply the flux_bootstrap_git resource in one of our environments where we use ImagePolicies. The ImagePolicy was stuck in the Terminating state, and the ImagePolicy, ImageRepository and Receiver CRDs were still present.

Provider version: 0.25.3

Output of terraform apply:

```text
flux_bootstrap_git.main: Destroying... [id=flux-system]
flux_bootstrap_git.main: Still destroying... [id=flux-system, 10s elapsed]
flux_bootstrap_git.main: Still destroying... [id=flux-system, 20s elapsed]
flux_bootstrap_git.main: Still destroying... [id=flux-system, 30s elapsed]
flux_bootstrap_git.main: Destruction complete after 40s
flux_bootstrap_git.main: Creating...
flux_bootstrap_git.main: Still creating... [10s elapsed]
flux_bootstrap_git.main: Still creating... [20s elapsed]
flux_bootstrap_git.main: Still creating... [30s elapsed]
flux_bootstrap_git.main: Still creating... [40s elapsed]
flux_bootstrap_git.main: Still creating... [50s elapsed]
flux_bootstrap_git.main: Still creating... [1m0s elapsed]
flux_bootstrap_git.main: Still creating... [1m10s elapsed]
flux_bootstrap_git.main: Still creating... [1m20s elapsed]

Error: Bootstrap run error

  with flux_bootstrap_git.main,
  on flux.tf line 46, in resource "flux_bootstrap_git" "main":
  46: resource "flux_bootstrap_git" "main" {

timeout waiting for:
[CustomResourceDefinition/imagepolicies.image.toolkit.fluxcd.io status:
'Terminating',
CustomResourceDefinition/receivers.notification.toolkit.fluxcd.io status:
'Terminating',
CustomResourceDefinition/imagerepositories.image.toolkit.fluxcd.io status:
'Terminating', Namespace/flux-system status: 'Terminating']
```

Using flux uninstall resolved the problem and allowed us to reapply the resource.

I found this issue in the image-automation-controller repo which seems to be related.

Edit: For what it's worth, it just failed in another environment as well, this time also with the Providers CRD.

@ldunkum ldunkum changed the title flux_bootstrap_git removal dangling CRDs flux_bootstrap_git removal dangling CRDs Jun 15, 2023
@alextricity25

I'm seeing the same issue when deleting imagepolicies and imagerepositories CRDs. We use Pulumi to install the flux2-2.12.3 helm chart, and we also use Pulumi to uninstall the chart when we need to clean up our ephemeral environments.
Pulumi fails when attempting to destroy these CRDs.

```text
DEFAULT 2024-02-27T19:56:26.185784Z - kubernetes:image.toolkit.fluxcd.io/v1beta2:ImageRepository xxxx-docker-image-repository deleting (0s)
DEFAULT 2024-02-27T19:56:26.185790Z @ destroying...
DEFAULT 2024-02-27T19:56:26.185812Z - kubernetes:image.toolkit.fluxcd.io/v1beta2:ImageRepository xxxx-docker-image-repository deleting (300s) error: 'xxxx-docker' timed out waiting to be Ready
```

@swade1987
Member

Hello @ldunkum ,

I hope you're doing well! I'm the newest contributor to this repository, and I'm currently in the process of issue grooming to ensure that all concerns are addressed promptly and efficiently.

I noticed this issue you reported and wanted to check in with you to see if it's still affecting your work. Your feedback is invaluable to us, and any additional insights or updates you can share would be greatly appreciated to help us understand and solve the problem more effectively.

If this issue has been resolved, could you please share how it was fixed? This information could be incredibly helpful to others in the community facing similar problems. It would also allow us to close this issue with a clear resolution.
In case the issue is still open and troubling you, let's work together to find a solution. Your satisfaction and the smooth functioning of our project are our top priorities.

Thank you for your time and contributions to our community. Looking forward to your response!

Best regards,

Steve


If the issue still persists, could you try upgrading to the latest version of the Flux provider and report back?

@ldunkum
Author

ldunkum commented Mar 29, 2024

Hi Steve,

thanks for your reply, I'm on vacation at the moment and can't access my work computer to check.
I'll report back in the second week of April.

Cheers

@swade1987 swade1987 added the request for feedback Feedback is requested from users label Apr 8, 2024
@ldunkum
Author

ldunkum commented Apr 10, 2024

I just tried this with the newest provider version (v1.2.3) and still encountered an error, although it might be a different one. On the first try everything seemed fine: I had forgotten to set up RBAC on the cluster and couldn't access it, but tainting and re-applying flux_bootstrap_git apparently worked.
Afterwards, I finished setting up the cluster. These are the important steps from that point on:

  • created a cluster and installed Flux with the flux_bootstrap_git resource
  • tainted the flux_bootstrap_git resource and applied
  • this left some dangling CRDs, but only those that already had existing resources (HelmRelease & HelmRepository)
  • ran flux uninstall; this removed all CRDs, but the namespace was still stuck in the Terminating state
  • didn't see any finalizers and forcefully removed the namespace
  • installed Flux with the flux_bootstrap_git resource
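The comment doesn't say how the namespace was "forcefully removed". A common approach is to clear `spec.finalizers` and submit the object to the namespace's `finalize` subresource. The following is only a sketch of that approach. `strip_namespace_finalizers` is a hypothetical helper that builds the request body from the JSON printed by `kubectl get namespace flux-system -o json`.

```python
import copy
import json


def strip_namespace_finalizers(namespace: dict) -> dict:
    """Return a copy of a Namespace object with spec.finalizers emptied.

    The result is the body the namespace "finalize" subresource
    (/api/v1/namespaces/<name>/finalize) expects; the input is not modified.
    """
    body = copy.deepcopy(namespace)
    body.setdefault("spec", {})["finalizers"] = []
    return body


# Example input shaped like `kubectl get namespace flux-system -o json` (trimmed).
ns = {
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {"name": "flux-system"},
    "spec": {"finalizers": ["kubernetes"]},
    "status": {"phase": "Terminating"},
}
print(json.dumps(strip_namespace_finalizers(ns)["spec"]))  # {"finalizers": []}
```

You could wrap this in a hypothetical script (say `strip_ns.py`) that reads the namespace JSON from stdin and prints the result. That script would then sit in a pipeline such as `kubectl get namespace flux-system -o json | python3 strip_ns.py | kubectl replace --raw "/api/v1/namespaces/flux-system/finalize" -f -`. This bypasses the namespace controller's cleanup, so treat it as a last resort that can leave orphaned objects behind.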
[Screenshot: flux_bootstrap_git_dangling_resources]

So the original error might be fixed, but the behaviour still differs from flux uninstall.

@swade1987
Member

@ldunkum, thanks for this. I am trying to work through your flow here: why did you need to taint flux_bootstrap_git?

Also, could you please provide the flux_bootstrap_git resource configuration and the output of running tree in the repository it syncs, so I can see what's running in there?

@ldunkum
Author

ldunkum commented Apr 10, 2024

Well, in this case it was just to check whether the error still exists. Originally, I think it had to do with the error described in #499.

Here's our flux_bootstrap_git config:

```hcl
provider "flux" {
  kubernetes = {...}

  git = {
    url = "ssh://git@github.com/${var.repository_name}.git"

    ssh = {
      username    = "git"
      private_key = tls_private_key.this.private_key_pem
    }
  }
}

resource "flux_bootstrap_git" "main" {
  depends_on = [github_repository_deploy_key.main]

  version          = var.flux_version
  path             = "clusters/${var.tenant_name}"
  components       = var.flux_components
  components_extra = var.flux_components_extra

  # Patch every Deployment in the Flux manifests: raise the memory request
  # and tolerate nodes tainted with arch=arm64:NoSchedule.
  kustomization_override = <<EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - target:
      kind: Deployment
    patch: |
      - op: add
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: 128Mi
      - op: add
        path: /spec/template/spec/tolerations
        value:
          - key: arch
            operator: Equal
            value: arm64
            effect: NoSchedule
EOF
}
```
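The `kustomization_override` above uses JSON Patch (RFC 6902) `add` operations. On an object, `add` creates a missing member or replaces an existing one, but every parent on the path must already exist. The memory patch therefore presumably relies on the controller containers already declaring `resources.requests` (assumed here). Below is a minimal Python sketch of `add` semantics, not Kustomize's implementation. It is applied to a hypothetical, trimmed Deployment with placeholder values.

```python
import copy
from typing import Any


def decode_pointer(path: str) -> list[str]:
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if path == "":
        return []
    if not path.startswith("/"):
        raise ValueError(f"invalid JSON pointer: {path!r}")
    # "~1" is decoded before "~0", as RFC 6901 requires.
    return [t.replace("~1", "/").replace("~0", "~") for t in path[1:].split("/")]


def apply_add(doc: Any, path: str, value: Any) -> Any:
    """Apply one RFC 6902 "add" operation, returning a new document."""
    tokens = decode_pointer(path)
    if not tokens:
        return copy.deepcopy(value)  # "add" at the root replaces the whole document
    result = copy.deepcopy(doc)
    parent = result
    for token in tokens[:-1]:
        # Parents must already exist: a missing key raises KeyError.
        parent = parent[int(token)] if isinstance(parent, list) else parent[token]
    last = tokens[-1]
    if isinstance(parent, list):
        index = len(parent) if last == "-" else int(last)
        if not 0 <= index <= len(parent):
            raise IndexError(f"index {index} out of range for {path!r}")
        parent.insert(index, copy.deepcopy(value))
    else:
        # On an object, "add" creates the member or replaces its current value.
        parent[last] = copy.deepcopy(value)
    return result


# Hypothetical, trimmed controller Deployment; the values are placeholders.
deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "manager", "resources": {"requests": {"cpu": "100m", "memory": "64Mi"}}},
    ]}}},
}

patched = apply_add(deployment, "/spec/template/spec/containers/0/resources/requests/memory", "128Mi")
patched = apply_add(patched, "/spec/template/spec/tolerations", [
    {"key": "arch", "operator": "Equal", "value": "arm64", "effect": "NoSchedule"},
])
print(patched["spec"]["template"]["spec"]["containers"][0]["resources"]["requests"])
# {'cpu': '100m', 'memory': '128Mi'}
```

Note also that the second patch replaces any tolerations a Deployment already had rather than appending to them. Appending would need a path ending in `/tolerations/-` on an existing list.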

The repository is quite large, as it contains all base configurations and overlays: around 700 directories and 1,200 files. Here's the relevant part that flux_bootstrap_git syncs to:

```text
├── test-cluster
│   ├── other-kustomizations.yaml
│   ├── flux-system
│   │   ├── gotk-components.yaml
│   │   ├── gotk-sync.yaml
│   │   └── kustomization.yaml
│   ├── kustomization.yaml
```

@swade1987
Member

So what we are saying here is that if, for some reason, we taint the flux_bootstrap_git resource and then attempt to run terraform apply, we get issues. Is that the correct assumption?

@ldunkum
Author

ldunkum commented Apr 11, 2024

Exactly! Again, I'm not sure whether the original error still occurs, but taint & reapply does not give the same result as flux uninstall & flux install, and I had to resort to manual steps to remove the Flux artifacts.

@swade1987
Member

@ldunkum, that is expected, as flux uninstall is really terraform destroy in the Terraform world.

@ldunkum
Author

ldunkum commented Apr 15, 2024

@swade1987 I'm sorry, I don't really know what you mean. When we taint the resource, it is destroyed and recreated, so this should be the same as terraform destroy followed by terraform apply.

@stefanprodan stefanprodan added bug Something isn't working and removed request for feedback Feedback is requested from users labels Apr 15, 2024
@stefanprodan
Member

stefanprodan commented Apr 15, 2024

Yes, this is a bug: the provider doesn't run the uninstall the way the CLI does. I'll fix it.
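Here is a toy Python model of why a teardown can leave CRDs stuck in `Terminating`. It also shows why, in the earlier report, only CRDs that already had resources were affected. It assumes two standard Kubernetes behaviours. First, a custom resource marked for deletion is removed only once its finalizers are cleared. Second, a deleted CRD stays `Terminating` while instances remain. The Flux finalizer name `finalizers.fluxcd.io` is also an assumption for illustration. This is not the provider's or the CLI's code. The "strip finalizers first" ordering shows what a clean teardown needs; it does not describe the fix in #657.

```python
from dataclasses import dataclass, field

FLUX_FINALIZER = "finalizers.fluxcd.io"  # assumed Flux finalizer name, for illustration


@dataclass
class CustomResource:
    kind: str
    name: str
    finalizers: list[str] = field(default_factory=lambda: [FLUX_FINALIZER])
    deleting: bool = False


@dataclass
class ToyCluster:
    """Toy model of finalizer-blocked deletion; not a real Kubernetes client."""
    crd_kinds: set[str]
    resources: list[CustomResource]
    controllers_running: bool = True
    deleted_crds: set[str] = field(default_factory=set)

    def stop_controllers(self) -> None:
        self.controllers_running = False

    def strip_finalizers(self) -> None:
        for r in self.resources:
            r.finalizers.clear()

    def delete_crds(self) -> None:
        # Deleting a CRD marks every instance of that kind for deletion.
        self.deleted_crds |= self.crd_kinds
        for r in self.resources:
            r.deleting = True
        self.reconcile()

    def reconcile(self) -> None:
        if self.controllers_running:
            # A live controller handles the deletion and drops its finalizer.
            for r in self.resources:
                if r.deleting:
                    r.finalizers.clear()
        # An object marked for deletion disappears once no finalizers remain.
        self.resources = [r for r in self.resources if not (r.deleting and not r.finalizers)]

    def terminating_crds(self) -> set[str]:
        # A deleted CRD stays Terminating while any instance of its kind remains.
        return self.deleted_crds & {r.kind for r in self.resources}


def naive_teardown(cluster: ToyCluster) -> set[str]:
    cluster.stop_controllers()  # controllers go first ...
    cluster.delete_crds()       # ... so nothing clears the finalizers
    return cluster.terminating_crds()


def cleanup_teardown(cluster: ToyCluster) -> set[str]:
    cluster.strip_finalizers()  # clear finalizers before removing anything
    cluster.stop_controllers()
    cluster.delete_crds()
    return cluster.terminating_crds()


def sample_cluster() -> ToyCluster:
    # Hypothetical cluster: only HelmRelease and HelmRepository have instances.
    return ToyCluster(
        crd_kinds={"HelmRelease", "HelmRepository", "Alert"},
        resources=[CustomResource("HelmRelease", "podinfo"),
                   CustomResource("HelmRepository", "podinfo")],
    )


print(sorted(naive_teardown(sample_cluster())))    # ['HelmRelease', 'HelmRepository']
print(sorted(cleanup_teardown(sample_cluster())))  # []
```

In the naive order, the `Alert` CRD has no instances and is removed cleanly. Only the CRDs with finalizer-bearing resources stay behind.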

@stefanprodan stefanprodan self-assigned this Apr 15, 2024
@stefanprodan
Member

OK, this was already fixed in #657.
