Support cluster api operator #221

Open
abrahamhwj opened this issue May 27, 2024 · 17 comments
Labels
enhancement (New feature or request), kind/feature

Comments

@abrahamhwj

Describe the solution you'd like
Support the Cluster API Operator: install the PVE provider through the InfrastructureProvider CRD, without the clusterctl tool.
If this is already supported, please update the documentation to explain how to do it.
Currently the Cluster API Operator docs link to the PVE provider docs, but those docs only cover clusterctl.

abrahamhwj added the enhancement and kind/feature labels on May 27, 2024
@mcbenjemaa
Member

Thanks for raising this.
If you want, you can work on it.

@isZumpo
Contributor

isZumpo commented May 30, 2024

I use the cluster API operator to spin up proxmox as my InfrastructureProvider. Is there anything in particular that you are wondering about?

@pborn-ionos
Contributor

@isZumpo would you mind raising a PR to add this to our documentation? I suppose that's what the OP is wondering about.

@abrahamhwj
Author

> I use the cluster API operator to spin up proxmox as my InfrastructureProvider. Is there anything in particular that you are wondering about?

If possible, I would like to manage the creation of the cluster through the Cluster API Operator instead of using clusterctl. I would appreciate it if some assistance could be provided. However, I've just started using PVE, so I think maybe I need to operate according to the usage.md to familiarize myself with the technical principles.

@isZumpo
Contributor

isZumpo commented Jun 3, 2024

> @isZumpo would you mind raising a PR to add this to our documentation? I suppose that's what the OP is wondering about.

Sure, let's see if we can put something together for that. I suppose it is best to start here in the thread and then, based on how it goes for @abrahamhwj, write some documentation about it :)

> > I use the cluster API operator to spin up proxmox as my InfrastructureProvider. Is there anything in particular that you are wondering about?
>
> If possible, I would like to manage the creation of the cluster through the Cluster API Operator instead of using clusterctl. I would appreciate it if some assistance could be provided. However, I've just started using PVE, so I think maybe I need to operate according to the usage.md to familiarize myself with the technical principles.

Sure, I highly recommend using the Cluster API Operator. It is very nice to have everything as YAML files in your GitOps repository rather than having to run clusterctl commands. I deploy the operator with its Helm chart through Argo CD. Here is the whole thing:

Chart.yaml

....
dependencies:
- name: cluster-api-operator
  version: 0.10.1
  repository: https://kubernetes-sigs.github.io/cluster-api-operator

values.yaml

cluster-api-operator:
  core: "cluster-api:v1.7.1"
  controlPlane: "kubeadm:v1.4.2"
  bootstrap: "kubeadm:v1.4.2"
  manager:
    featureGates:
      kubeadm:
        EXP_CLUSTER_RESOURCE_SET: true
        ClusterTopology: true
      core:
        ClusterTopology: true

templates/proxmox-infrastructure

apiVersion: v1
kind: Namespace
metadata:
  name: proxmox-infrastructure-system
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: proxmox-variables
  namespace: proxmox-infrastructure-system
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: akeyless-secret-store
  target:
    name: proxmox-variables
    creationPolicy: Owner
  dataFrom:
  - extract:
      key: proxmox-variables
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
  name: proxmox
  namespace: proxmox-infrastructure-system
spec:
  version: v0.4.0
  configSecret:
    name: proxmox-variables
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: IPAMProvider
metadata:
  name: in-cluster
  namespace: proxmox-infrastructure-system
spec:
  version: v0.1.0

In my setup, the External Secrets Operator generates the secret named proxmox-variables, which contains the variables required to set up the Proxmox provider. If you don't use External Secrets, you can create the secret manually instead. In the end it should contain:

PROXMOX_URL: "https://pve.example:8006"                       # The Proxmox VE host
PROXMOX_TOKEN: "root@pam!capi"                                # The Proxmox VE TokenID for authentication
PROXMOX_SECRET: "REDACTED"                                    # The secret associated with the TokenID
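
If the secret is created by hand, a plain Kubernetes Secret manifest along these lines should work. This is a minimal sketch: the name and namespace are taken from the configSecret of the InfrastructureProvider above, the values are the same placeholders as in the listing, and using stringData instead of base64-encoded data is only a convenience choice.

```yaml
# Hand-written replacement for the ExternalSecret-generated secret.
# Name and namespace must match spec.configSecret of the InfrastructureProvider.
apiVersion: v1
kind: Secret
metadata:
  name: proxmox-variables
  namespace: proxmox-infrastructure-system
type: Opaque
stringData:
  PROXMOX_URL: "https://pve.example:8006"   # placeholder: the Proxmox VE host
  PROXMOX_TOKEN: "root@pam!capi"            # placeholder: the Proxmox VE TokenID
  PROXMOX_SECRET: "REDACTED"                # placeholder: the secret for that TokenID
```

If this manifest lives in a GitOps repository, the real token secret should not be committed in plain text.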

My setup also includes the IPAMProvider; I had issues running without it.

With this setup in place, you should be able to deploy your cluster objects.

@abrahamhwj
Author

@isZumpo Thank you very much for your guidance.
However, I am currently encountering an issue.
After creating a cluster, the virtual machines are created, but the cluster seems to be stuck in the initialization phase. The cluster status is as follows:
image

None of capmox-controller-manager, kubeadm-control-plane-controller-manager, capi-kubeadm-bootstrap-controller-manager, or ipam-in-cluster-controller-manager showed any error logs.

Do you have any suggestions?

@isZumpo
Contributor

isZumpo commented Jun 4, 2024

> @isZumpo Thank you very much for your guidance.
> However, I am currently encountering an issue.
> After creating a cluster, the virtual machines are created, but the cluster seems to be stuck in the initialization phase. The cluster status is as follows:
> image
>
> None of capmox-controller-manager, kubeadm-control-plane-controller-manager, capi-kubeadm-bootstrap-controller-manager, or ipam-in-cluster-controller-manager showed any error logs.
>
> Do you have any suggestions?

Try taking a look at the logs of the managers you mentioned. I have found the logs of capmox-controller-manager especially valuable.

@abrahamhwj
Author

@isZumpo
Logs capi-kubeadm-control-plane-system/capi-kubeadm-control-plane-controller-manager:
image
"Failed to watch *v1beta1.MachinePool": I did not create a MachinePool resource, so I ignored this error.
"Could not connect to workload cluster to fetch status": this appears before cluster initialization, so I think this error is normal?

Logs capmox-system/capmox-controller-manager:
image
Logs capi-ipam-in-cluster-system/capi-ipam-in-cluster-controller-manager:
image
Logs capi-kubeadm-bootstrap-system/capi-kubeadm-bootstrap-controller-manager:
image
cloud-init appears to be functioning normally, but the IP address and DNS configuration of the VM are not taking effect.
image

@65278
Collaborator

65278 commented Jun 10, 2024

> @isZumpo Thank you very much for your guidance. However, I am currently encountering an issue. After creating a cluster, the virtual machines are created, but the cluster seems to be stuck in the initialization phase. The cluster status is as follows: image
>
> None of capmox-controller-manager, kubeadm-control-plane-controller-manager, capi-kubeadm-bootstrap-controller-manager, or ipam-in-cluster-controller-manager showed any error logs.
>
> Do you have any suggestions?

Since the control plane is waiting for KubeAdmInit, it's likely that your virtual machines have no networking (at least towards Cluster API). capi-kubeadm-control-plane-controller-manager tells you: Get "https://192.168.3.220:6443/api/v1?timeout=10s": dial tcp 192.168.3.220:6443: connect: no route to host.
Please add a route from your Cluster API host to the subnet containing 192.168.3.220, otherwise KubeAdmInit can't finish.
In general, Cluster API cannot deploy a cluster without having a route to that cluster.

@abrahamhwj
Author

@65278
Once the IP 192.168.3.220 is configured, it should be reachable from the VM where Cluster API runs, since everything is behind the same router and in the same subnet, as follows:
PVE host: 192.168.3.200
Cluster API host: 192.168.3.201
VIP: 192.168.3.220
VM: 192.168.3.221~230
Gateway: 192.168.3.1
Prefix: 24
From the status of the VMs, it seems that the network configuration of the VMs was not correctly initialized by Cloud-Init. The VMs were not configured with IP addresses, but I don't know what caused this issue and didn't see any related error logs.
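
For reference, the address plan above would map onto a ProxmoxCluster roughly like this. It is a sketch, not a tested manifest: the field names follow the provider's v1alpha1 cluster template, the resource name is hypothetical, and the DNS server is an assumption (the thread does not state one, so the gateway is used as a placeholder).

```yaml
# Sketch only: values come from the address plan in this comment,
# except where marked as an assumption.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxCluster
metadata:
  name: example-cluster          # hypothetical name
spec:
  controlPlaneEndpoint:
    host: 192.168.3.220          # the VIP
    port: 6443
  ipv4Config:
    addresses:
      - 192.168.3.221-192.168.3.230   # the VM range
    prefix: 24
    gateway: 192.168.3.1
  dnsServers:
    - 192.168.3.1                # assumption: no DNS server is given in the thread
```

With a plan like this, the VIP and the node range sit in the same /24 as the Cluster API host (192.168.3.201), so no extra route should be needed once the VMs actually receive their addresses.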

@65278
Collaborator

65278 commented Jun 11, 2024

That's always the most difficult part to debug. cloud-init does write error messages to the console, but they are not very specific.
Apart from that, you could preload your template rootfs with a passwd entry for root and log in from the console, then try netplan apply and see what error messages pop up. In general, we only support netplan API v2 with passthrough. Simple configurations for cloud-init may work, but we haven't tried them at all.
One further thing to check is whether your Proxmox network bridge is actually up and connected to the right interface.

@abrahamhwj
Author

@65278 Thank you for your reply. I attempted to manually configure the IP and account password via CLI commands on the PVE Host, and it successfully allowed me to log in. After configuring the address, I was able to ping it from the host where the cluster API resides, which suggests that the network configuration is likely correct.
As for the netplan API v2, I haven't had experience with it before, so I may need to familiarize myself with it first to be certain.

@65278
Collaborator

65278 commented Jun 12, 2024

Make a template that has netplan installed, and cloud-init should do the right thing: https://cloudinit.readthedocs.io/en/latest/reference/network-config-format-v2.html#networking-config-version-2
We have an open issue about supporting more cloud-init network renderers (Talos, for example, is incompatible): #94. We have no opportunity to test this at the moment.
You are welcome to contribute a working cloud-init setup that does not rely on the netplan renderer.
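
For orientation, a static-address network config in the version 2 format linked above looks roughly like the following. This is an illustrative sketch, not the exact document CAPMOX renders: the interface ID, MAC address, and DNS server are hypothetical, and the addresses are borrowed from the address plan earlier in this thread.

```yaml
# cloud-init network config, version 2 (netplan-style).
# Hypothetical values: interface ID, MAC address, DNS server.
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      match:
        macaddress: "bc:24:11:00:00:01"   # hypothetical MAC of the VM's NIC
      dhcp4: false
      addresses:
        - 192.168.3.221/24
      routes:
        - to: 0.0.0.0/0
          via: 192.168.3.1
      nameservers:
        addresses:
          - 192.168.3.1                   # assumption: gateway doubles as DNS
```

A template without netplan installed has no renderer for this format, which is consistent with a VM that boots but never gets its IP and DNS settings.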

@abrahamhwj
Author

abrahamhwj commented Jun 15, 2024

> That's always the most difficult part to debug. cloud-init does write error messages to the console, but they are not very specific. Apart from that, you could preload your template rootfs with a passwd entry for root and log in from the console, then try netplan apply and see what error messages pop up. In general, we only support netplan API v2 with passthrough. Simple configurations for cloud-init may work, but we haven't tried them at all. One further thing to check is whether your Proxmox network bridge is actually up and connected to the right interface.

I reviewed some of CAPMOX's code and documentation on how Cloud-init works. Based on troubleshooting my test environment, the reason could be as follows:

  1. The CD-ROM injected by CAPMOX is at '/dev/sr0'. When PVE enables Cloud-init, the CD-ROM it injects is at '/dev/sr1'.
  2. During system startup, Cloud-init always reads from '/dev/sr1' first. As a result, the configuration injected by CAPMOX is never applied by Cloud-init. CAPMOX therefore reports the node and cluster status as READY, but in reality there is no effective configuration on the virtual machine.

Environment:

  • PVE: 8.2.2
  • OS: Ubuntu Server 20.04 LTS

I am very grateful for the CAPMOX project and everyone's enthusiastic responses.
I have learned a lot about Cluster API, PVE, and Cloud-init.
Although I would love to contribute, I am just an ordinary user. I can do some testing or walk through some simple code, but I don't have much experience in code development.

If you have any test suggestions, you can let me know and I will be happy to try them.

@mcbenjemaa
Member

You need to make sure that your VM template doesn't have the Cloud-init drive provided by Proxmox; otherwise, it will override the CAPMOX config.
There is no need to set up the Cloud-init drive in advance.
Just use an empty CD-ROM at ide0, and CAPMOX will do the job.

@abrahamhwj
Author

> You need to make sure that your VM template doesn't have the Cloud-init drive provided by Proxmox; otherwise, it will override the CAPMOX config. There is no need to set up the Cloud-init drive in advance. Just use an empty CD-ROM at ide0, and CAPMOX will do the job.

Thank you for your response.

Should the virtual machine template be preconfigured with the K8s deployment environment, for example with containerd, kubeadm, kubectl, and kubelet installed? I couldn't find the related scripts.

If these are not prepared, cloud-init initialization will fail and reconciliation will stop.

@mcbenjemaa
Member

@abrahamhwj Yes, you will need to build a VM template first,
as stated in our docs: https://github.com/ionos-cloud/cluster-api-provider-proxmox/blob/main/docs/Usage.md#dependencies
