Allow disabling internalIP routing via arg #370

Open
wants to merge 2 commits into main

Conversation

@sepich commented Oct 28, 2023

Thanks for Kilo!
This is somewhat related to #127.
We use cloud instances with Kilo, where nodes come and go, and there is no way to set an annotation on them before Kilo starts.
The default behavior stays as it is now; this adds a CLI arg to disable routing internal IPs over WireGuard entirely.
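For context, the per-node alternative would be to annotate every node before Kilo starts on it, which is exactly what we cannot do with ephemeral cloud instances. A rough sketch of that workflow (the kilo.squat.ai/force-internal-ip annotation and the empty value are my assumption here):

    # Per-node workaround that the new flag avoids; annotation name and value are an assumption.
    # It would have to run against every new node *before* Kilo starts on it.
    kubectl annotate node "$NODE_NAME" kilo.squat.ai/force-internal-ip="" --overwrite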

@squat (Owner) commented Oct 28, 2023

Hi @sepich :) thanks for the PR and for the additional cleanup! Setting the CLI flag on the DaemonSet to entirely disable routing packets over internal IPs effectively turns the mesh into a full mesh, where all nodes are connected only via WireGuard tunnels using their public IPs.

If that's the case, why not change the Kilo topology to use a full mesh?

@sepich (Author) commented Oct 28, 2023

Sorry if I'm missing something. We have nodes that have both internal and external IPs at the same time. We do not need nodes in different locations to know each other's internal IPs; we want a full mesh over external IPs, and we start Kilo like this:

            - --kubeconfig=/etc/kubernetes/kubeconfig
            - --hostname=$(NODE_NAME)
            - --cni=false
            - --compatibility=cilium
            - --local=false
            - --subnet=...
            - --mesh-granularity=full
            - --route-internal-ip=false
            - --ipam=cilium-cluster-pool
            # don't disturb pods on redeploy
            - --clean-up=false
            - --clean-up-interface=false

Without --route-internal-ip=false I still see internal IPs in wg show and in ip r, which complicates things. Just the Kilo subnet and podCIDRs are enough.
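To be concrete, these are the kinds of checks I mean (commands only; the exact peers and routes obviously differ per cluster, and kilo0 is just the default interface name):

    # Illustrative inspection of a node after Kilo starts:
    wg show kilo0 allowed-ips    # peers' allowed IPs; the nodes' internal IPs show up here
    ip route show dev kilo0      # routes via the WireGuard interface, again including internal IPs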

@squat (Owner) commented Oct 29, 2023

Ah, I see! You really just want to have no knowledge of internal IPs at all. Hmm. Does the discovery of internal IPs break things for your network? Maybe internal IPs are reused and that causes clashes in the routes or WireGuard config? Why is having unused internal IPs in your routes bad? Basically, I want to figure out what the benefit of disabling the discovery/routing of internal IPs is and whether it is worth the complexity of adding another flag for configuring Kilo. The fewer configuration options the better :)

@sepich (Author) commented Oct 30, 2023

> if it is worth the complexity of adding another flag for configuring Kilo

We have several locations, and in each location there is an internal subnet where nodes can communicate with each other. So there are 3 IPs on each node: private, public, and Kilo. On top of kilo0 we run Cilium in "direct routing mode" (no encapsulation). We can specifically decide which interface to use for what, e.g. setting --node-ip={{kilo0.ip}} for the kubelet, or running static etcd manifests on the internal IPs.
Now imagine what happens when all internal IPs get announced via kilo0: traffic between etcd nodes that have a fast internal interconnect would be encapsulated into the VPN. That seems like unneeded complexity, and when you run something in prod you want it to be as simple as possible. I'd definitely choose one more CLI arg over troubleshooting all this routing/firewall mess.
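As an illustration, the kubelet pinning mentioned above looks roughly like this (a sketch of our bootstrap, not a general recipe; the remaining kubelet flags are omitted):

    # Pin the kubelet's node IP to the Kilo interface address (kilo0 is the default interface):
    KILO_IP=$(ip -4 -o addr show dev kilo0 | awk '{print $4}' | cut -d/ -f1)
    kubelet --node-ip="${KILO_IP}" ...   # other kubelet flags omitted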

And again, maybe I misunderstand some of your concepts. But unfortunately I have to +1 #342, as the current docs are just marketing material. There are no technical details, and I had to read the code to be able to configure the current setup. It would be nice to understand the design decisions in each case, like "announce internal IPs".

@squat (Owner) commented Oct 31, 2023

> Traffic between etcd nodes that have a fast internal interconnect would be encapsulated into the VPN.

Thanks for explaining. Yes, that's indeed a really good thing to avoid! Normally this would be avoided by putting all of the nodes in the same "location" so that Kilo knows about packets to and from those addresses but doesn't tunnel them over the VPN.

Is there a reason why you can't rely on Kilo topologies to automatically place your cloud nodes into the same Kilo location and in this way avoid encapsulating traffic to internal IPs within a location? You mentioned that your cloud nodes come up and down and that you can't annotate them beforehand; however, what about simply relying on the Kubernetes-standard topology labels that are set by the cloud-controller-manager [0]? If you are not running a cloud-controller-manager on your cloud nodes, then that explains why you don't have access to topology labels and why Kilo can't use them to determine the topology of the mesh.
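For reference, a quick way to see whether those standard labels are present on your nodes would be something like this (purely illustrative):

    # Check for the topology labels that a cloud-controller-manager would set:
    kubectl get nodes -L topology.kubernetes.io/region -L topology.kubernetes.io/zone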

Sorry for insisting further; I'm hoping to really understand the details of your deployment to see how to best improve Kilo for you and everyone else.

[0] https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesioregion

@sepich (Author) commented Oct 31, 2023

> Is there a reason why you can't rely on Kilo topologies

Sorry, but what difference should it make? I've set kilo.squat.ai/location to the same value on nodes in the same location, and I still see Kilo create internal-IP routes over WireGuard on those nodes.
I use --mesh-granularity=full as I don't want a single leader to be elected; I need a full mesh.
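Concretely, what I set looks like this (the location value "dc1" is only an example):

    # Annotate all nodes of one location with the same value; "dc1" is a placeholder:
    kubectl annotate node "$NODE_NAME" kilo.squat.ai/location=dc1 --overwrite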

@squat (Owner) commented Oct 31, 2023

The difference is that when you rely on the standard topology labels you don't need to manually annotate nodes to specify a location, which sounded like an annoying source of toil for your cluster.

Ideally, if you let the cloud-controller-manager set the topology labels, then your Kilo topology can be configured automatically and internal IPs can safely be known to Kilo, because packets within a location won't be tunneled over the VPN.

However, it sounds like you have an additional constraint: you don't want to rely on a single leader acting as the gateway for a group of nodes. But if all nodes within a location could act as gateways for their own traffic to other locations, then that would be satisfactory for you. Is that right?

In that case, it sounds like the ideal scenario would be to merge #328, which allows all nodes in a location to be gateways while traffic internal to a location is not encapsulated. This eliminates network bandwidth bottlenecks and alleviates concerns about downtime between a leader becoming unresponsive and another node being elected leader.

Until that is merged, I agree that having an option to never tunnel traffic to internal IPs would be helpful.
