Merge pull request #3738 from telepresenceio/thallgren/client-only-nat
Introduce Virtual Network Address Translation (VNAT)
thallgren authored Dec 6, 2024
2 parents a7b9dd2 + 00d3b1e commit 45d59af
Showing 40 changed files with 1,142 additions and 349 deletions.
26 changes: 26 additions & 0 deletions CHANGELOG.yml
@@ -36,6 +36,21 @@ items:
- version: 2.21.0
date: TBD
notes:
- type: feature
title: Automatic subnet conflict avoidance
body: >-
Telepresence not only detects when the cluster's subnets conflict with subnets on the workstation, but
also avoids such conflicts by performing network address translation, placing a conflicting subnet in a
virtual subnet.
docs: https://telepresence.io/docs/reference/vpn
- type: feature
title: Virtual Network Address Translation (VNAT)
body: >-
It is now possible to use a virtual subnet without routing the affected IPs to a specific workload. A new
`telepresence connect --vnat CIDR` flag was added that will perform virtual network address translation of
cluster IPs. This flag is very similar to the `--proxy-via CIDR=WORKLOAD` introduced in 2.19, but without
the need to specify a workload.
docs: https://telepresence.io/docs/reference/vpn
- type: feature
title: Intercepts targeting a specific container
body: >-
@@ -131,6 +146,17 @@ items:
To achieve this, Telepresence temporarily adds the necessary network to the containerized daemon. This allows the new
container to join the same network. Additionally, Telepresence starts extra socat containers to handle port mapping,
ensuring that the desired ports are exposed to the local environment.
- type: feature
title: Prevent recursion in the Telepresence Virtual Network Interface (VIF)
body: >-
Network problems may arise when running Kubernetes locally (e.g., Docker Desktop, Kind, Minikube, k3s),
because the VIF on the host is also accessible from the cluster's nodes. A request that isn't handled by a
cluster resource might be routed back into the VIF and cause a recursion.
These recursions can now be prevented by setting the client configuration property
`routing.recursionBlockDuration` so that new connection attempts are temporarily blocked for a specific
IP:PORT pair immediately after an initial attempt, thereby effectively ending the recursion.
docs: https://telepresence.io/docs/howtos/cluster-in-vm
- type: feature
title: Allow Helm chart to be included as a sub-chart
body: >-
2 changes: 1 addition & 1 deletion cmd/traffic/cmd/manager/config/config.go
@@ -45,7 +45,7 @@ func (c *config) Run(ctx context.Context) error {
dlog.Infof(ctx, "Started watcher for ConfigMap %s", cfgConfigMapName)
defer dlog.Infof(ctx, "Ended watcher for ConfigMap %s", cfgConfigMapName)

// The Watch will perform a http GET call to the kubernetes API server, and that connection will not remain open forever
// The WatchConfig will perform a http GET call to the kubernetes API server, and that connection will not remain open forever
// so when it closes, the watch must start over. This goes on until the context is cancelled.
api := k8sapi.GetK8sInterface(ctx).CoreV1()
for ctx.Err() == nil {
Binary file removed docs/images/vpn-proxy-via.jpg
Binary file not shown.
Binary file added docs/images/vpn-vnat.jpg
15 changes: 8 additions & 7 deletions docs/reference/config.md
@@ -51,7 +51,6 @@ Values for `client.cluster` controls aspects on how client's connection to the t
| `mappedNamespaces` | Namespaces that will be mapped by default. | [sequence][yaml-seq] of [strings][yaml-str] | `[]` |
| `connectFromRootDaeamon` | Make connections to the cluster directly from the root daemon. | [boolean][yaml-bool] | `true` |
| `agentPortForward` | Let telepresence-client use port-forwards directly to agents | [boolean][yaml-bool] | `true` |
| `virtualIPSubnet` | The CIDR to use when generating virtual IPs | [CIDR][cidr] | platform dependent |

### DNS

@@ -208,12 +207,14 @@ Then all of the `alsoProxySubnets` of `10.0.0.0/16` will be proxied, with the ex

These are the valid fields for the `client.routing` key:

| Field | Description | Type |
|---------------------------|----------------------------------------------------------------------------------------|-------------------------|
| `alsoProxySubnets` | Proxy these subnets in addition to the service and pod subnets | [CIDR][cidr] |
| `neverProxySubnets` | Do not proxy these subnets | [CIDR][cidr] |
| `allowConflictingSubnets` | Give Telepresence precedence when these subnets conflict with other network interfaces | [CIDR][cidr] |
| `recursionBlockDuration` | Prevent recursion in VIF for this duration after a connect | [duration][go-duration] |
| Field | Description | Type | Default |
|---------------------------|----------------------------------------------------------------------------------------|-------------------------|--------------------|
| `alsoProxySubnets` | Proxy these subnets in addition to the service and pod subnets | [CIDR][cidr] | |
| `neverProxySubnets` | Do not proxy these subnets | [CIDR][cidr] | |
| `allowConflictingSubnets` | Give Telepresence precedence when these subnets conflict with other network interfaces | [CIDR][cidr] | |
| `recursionBlockDuration` | Prevent recursion in VIF for this duration after a connect | [duration][go-duration] | |
| `virtualSubnet` | The CIDR to use when generating virtual IPs | [CIDR][cidr] | platform dependent |
| `autoResolveConflicts` | Auto resolve conflicts using a virtual subnet | [bool][yaml-bool] | true |


### Timeouts
185 changes: 129 additions & 56 deletions docs/reference/vpn.md
@@ -4,10 +4,15 @@ title: Telepresence and VPNs

# Telepresence and VPNs

It is often important to set up Kubernetes API server endpoints to be only accessible via a VPN.
In setups like these, users need to connect first to their VPN, and then use Telepresence to connect
to their cluster. As Telepresence uses many of the same underlying technologies that VPNs use,
the two can sometimes conflict. This page will help you identify and resolve such VPN conflicts.
Telepresence creates a virtual network interface (VIF) when it connects. This VIF is configured to route the cluster's
service subnet and pod subnets so that the user can access resources in the cluster. It's not uncommon for the
workstation where Telepresence runs to already have network interfaces that route overlapping subnets. Such
conflicts must be resolved deterministically.

Unless configured otherwise, Telepresence will resolve subnet conflicts by simply moving the cluster's subnet using
network address translation. For a majority of use-cases, this will be enough.

For more info, see the section on how to [avoid the conflict](#avoiding-the-conflict) below.

## VPN Configuration

@@ -39,7 +44,7 @@ cluster will place resources in. Let's imagine your cluster is configured to pla

![VPN Kubernetes config](../images/vpn-k8s-config.jpg)

## Telepresence conflicts
# Telepresence conflicts

When you run `telepresence connect` to connect to a cluster, it talks to the API server
to figure out what pod and service CIDRs it needs to map in your machine. If it detects
@@ -53,53 +58,31 @@ telepresence connect: error: connector.Connect: failed to connect to root daemon

Telepresence offers three different ways to resolve this:

- [Allow the conflict](#allowing-the-conflict) in a controlled manner
- [Avoid the conflict](#avoiding-the-conflict) using the `--proxy-via` connect flag
- [Allow the conflict](#allowing-the-conflict) in a controlled manner
- [Use docker](#using-docker) to make telepresence run in a container with its own network config

### Allowing the conflict

One way to resolve this, is to carefully consider what your network layout looks like, and
then allow Telepresence to override the conflicting subnets.
Telepresence is refusing to map them, because mapping them could render certain hosts that
are inside the VPN completely unreachable. However, you (or your network admin) know better
than anyone how hosts are spread out inside your VPN.
Even if the private route routes ALL of `10.0.0.0/8`, it's possible that hosts are only
being spun up in one of the subblocks of the `/8` space. Let's say, for example,
that you happen to know that all your hosts in the VPN are bunched up in the first
half of the space -- `10.0.0.0/9` (and that you know that any new hosts will
only be assigned IP addresses from the `/9` block). In this case you
can configure Telepresence to override the other half of this CIDR block, which is where the
services and pods happen to be.
To do this, all you have to do is configure the `client.routing.allowConflictingSubnets` flag
in the Telepresence helm chart. You can do this directly via `telepresence helm upgrade`:

```console
$ telepresence helm upgrade --set client.routing.allowConflictingSubnets="{10.128.0.0/9}"
```
## Avoiding the conflict

You can also choose to be more specific about this, and only allow the CIDRs that you KNOW
are in use by the cluster:

```console
$ telepresence helm upgrade --set client.routing.allowConflictingSubnets="{10.130.0.0/16,10.132.0.0/16}"
```

The end result of this (assuming an allow list of `/9`) will be a configuration like this:

![VPN Telepresence](../images/vpn-with-tele.jpg)
Telepresence can perform Virtual Network Address Translation (henceforth referred to as VNAT) of the cluster's subnets
when routing them from the workstation, thus moving those subnets so that conflicts are avoided. Unless configured not
to, Telepresence will use VNAT by default when it detects conflicts.

### Avoiding the conflict
VNAT is enabled by passing the `--vnat` flag (introduced in Telepresence 2.21) to `telepresence connect`. When using this
flag, Telepresence will take the following actions:

An alternative to allowing the conflict is to remap the cluster's CIDRs to virtual CIDRs
on the workstation by passing a `--proxy-via` flag to `telepresence connect`.
- The local DNS-server will translate any IP contained in a VNAT subnet to a virtual IP.
- All access to a virtual IP will be translated back to its original when routed to the cluster.
- The container environment retrieved when using `ingest` or `intercept` will be mangled, so that all IPs contained
in VNAT subnets are replaced with corresponding virtual IPs.
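
Conceptually, the translation works like the short sketch below. This is an illustrative Go sketch, not Telepresence's
actual implementation: each cluster IP seen in a DNS answer is given a stable virtual IP inside the configured virtual
subnet, and the reverse mapping is applied when traffic leaves the workstation. The subnet and addresses used here are
hypothetical.

```go
// Illustrative VNAT mapping sketch (assumptions only; not Telepresence code).
package main

import (
	"fmt"
	"net/netip"
)

type vnatTable struct {
	subnet  netip.Prefix              // the virtual subnet, e.g. routing.virtualSubnet
	next    netip.Addr                // next free virtual IP
	forward map[netip.Addr]netip.Addr // cluster IP -> virtual IP
	reverse map[netip.Addr]netip.Addr // virtual IP -> cluster IP
}

func newVnatTable(subnet netip.Prefix) *vnatTable {
	return &vnatTable{
		subnet:  subnet,
		next:    subnet.Addr().Next(), // skip the network address itself
		forward: map[netip.Addr]netip.Addr{},
		reverse: map[netip.Addr]netip.Addr{},
	}
}

// translate returns the virtual IP used locally for a cluster IP, allocating a new
// one on first use. A real implementation would also handle exhaustion and expiry.
func (t *vnatTable) translate(clusterIP netip.Addr) netip.Addr {
	if v, ok := t.forward[clusterIP]; ok {
		return v
	}
	v := t.next
	t.next = v.Next()
	t.forward[clusterIP] = v
	t.reverse[v] = clusterIP
	return v
}

func main() {
	t := newVnatTable(netip.MustParsePrefix("100.10.20.0/24")) // hypothetical virtual subnet
	clusterIP := netip.MustParseAddr("10.130.4.17")            // IP from a cluster DNS answer
	fmt.Println(clusterIP, "->", t.translate(clusterIP))       // prints 10.130.4.17 -> 100.10.20.1
}
```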

The `telepresence connect` flag `--proxy-via`, introduced in Telepresence 2.19, will allow the local DNS-server to translate cluster subnets to virtual subnets on the workstation, and the VIF to do the reverse translation. The syntax for this new flag, which can be repeated, is:
The `--vnat` flag can be repeated to make Telepresence translate more than one subnet.

```console
$ telepresence connect --proxy-via CIDR=WORKLOAD
$ telepresence connect --vnat CIDR
```
Cluster DNS responses matching CIDR to virtual IPs that are routed (with reverse translation) via WORKLOAD. The CIDR can also be a symbolic name that identifies a subnet or list of subnets:
The CIDR can also be a symbolic name that identifies a well-known subnet or list of subnets:

| Symbol | Meaning |
|-----------|-------------------------------------|
@@ -108,38 +91,128 @@ Cluster DNS responses matching CIDR to virtual IPs that are routed (with reverse
| `pods` | The cluster's pod subnets. |
| `all` | All of the above. |

The WORKLOAD is the deployment, replicaset, statefulset, or argo-rollout in the cluster whose agent will be used for targeting the routed subnets.

This is useful in two situations:
### Virtual Subnet Configuration

1. The cluster's subnets collide with subnets otherwise available on the workstation. This is common when using a VPN, in particular if the VPN has a small subnet mask, making the subnet itself very large. The new `--proxy-via` flag can be used as an alternative to [allowing the conflict](#allowing-the-conflict) to take place, give Telepresence precedence, and thus hide the corresponding subnets from the conflicting subnet. The `--proxy-via` will instead reroute the cluster's subnet and hence, avoid the conflict.
2. The cluster's DNS is configured with domains that resolve to loop-back addresses (this is sometimes the case when the cluster uses a mesh configured to listen to a loopback address and then reroute from there). A loop-back address is not useful on the client, but the `--proxy-via` can reroute the loop-back address to a virtual IP that the client can use.
Telepresence will use a special subnet when it generates the virtual IPs that are used locally. On a Linux or macOS
workstation, this subnet will be a class E subnet (not normally used for any other purposes). On Windows, class E is
not routed, so Telepresence will instead default to `211.55.48.0/20`.

Subnet proxying is done by the client's DNS-resolver which translates the IPs returned by the cluster's DNS resolver to a virtual IP (VIP) to use on the client. Telepresence's VIF will detect when the VIP is used, and translate it back to the loop-back address on the pod.
The default subnet used can be overridden in the client configuration.

In `config.yml` on the workstation:
```yaml
routing:
  virtualSubnet: 100.10.20.0/24
```
Or as a Helm chart value to be applied on all clients:
```yaml
client:
  routing:
    virtualSubnet: 100.10.20.0/24
```
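
If you override the default, the subnet you pick should not collide with anything already routed on the workstation. The
following is a minimal sketch, not part of Telepresence, that checks a hypothetical candidate `virtualSubnet` against the
addresses assigned to local interfaces:

```go
// Minimal overlap check for a candidate virtual subnet (illustrative only).
package main

import (
	"fmt"
	"net"
	"net/netip"
)

func main() {
	candidate := netip.MustParsePrefix("100.10.20.0/24") // hypothetical virtualSubnet override

	addrs, err := net.InterfaceAddrs()
	if err != nil {
		panic(err)
	}
	for _, a := range addrs {
		ipNet, ok := a.(*net.IPNet)
		if !ok {
			continue
		}
		p, err := netip.ParsePrefix(ipNet.String())
		if err != nil {
			continue
		}
		if p.Masked().Overlaps(candidate) {
			fmt.Printf("conflict: %s overlaps local route %s\n", candidate, p.Masked())
		}
	}
	fmt.Println("check complete")
}
```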
#### Example
Let's assume that we have a conflict between the cluster's subnets, all covered by the CIDR `10.124.0.0/9` and a VPN
using `10.0.0.0/9`. We avoid the conflict using:

```console
$ telepresence connect --vnat all
```

The cluster's subnets are now hidden behind a virtual subnet, and the resulting configuration will look like this:

#### Proxy-via and using IP-addresses directly
![VPN Telepresence](../images/vpn-vnat.jpg)

If the service is using IP-addresses instead of domain-names when connecting to other cluster resources, then such connections will fail when running locally. The `--proxy-via` relies on the local DNS-server to translate the cluster's DNS responses, so that the IP of an `A` or `AAAA` response is replaced with a virtual IP from the configured subnet. If connections are made using an IP instead of a domain-name, then no such lookup is made. Telepresence has no way of detecting the direct use of IP-addresses.
### Proxying via a specific workload

#### Virtual IP Configuration
Telepresence is capable of routing all traffic destined for a VNAT subnet via a specific workload. This is particularly
useful when the cluster's DNS is configured with domains that resolve to loopback addresses. This is sometimes the case
when the cluster uses a mesh configured to listen on a loopback address and then reroute from there.

Telepresence will use a special subnet when it generates the virtual IPs that are used locally. On a Linux or macOS workstation, this subnet will be
a class E subnet (not normally used for any other purposes). On Windows, the class E is not routed, and Telepresence will instead default to `211.55.48.0/20`.
The `--proxy-via` flag (introduced in Telepresence 2.19) is similar to `--vnat`, but the argument must be in the form
`CIDR=WORKLOAD`. When using this flag, all traffic to the given CIDR will be routed via the given workload.

The default can be changed using the configuration `cluster.virtualIPSubnet`.
The WORKLOAD is the deployment, replicaset, statefulset, or argo-rollout in the cluster whose traffic-agent will be used
for targeting the routed subnets.

#### Example

Let's assume that we have a conflict between the cluster's subnets, all covered by the CIDR `10.124.0.0/9` and a VPN using `10.0.0.0/9`. We avoid the conflict using:
Let's assume that we have a conflict between the cluster's subnets, all covered by the CIDR `10.124.0.0/9` and a VPN
using `10.0.0.0/9`. We avoid the conflict using:

```console
$ telepresence connect --proxy-via all=echo
```

The cluster's subnets are now hidden behind a virtual subnet, and the resulting configuration will look like this:
The cluster's subnets are now hidden behind a virtual subnet, and all traffic is routed to the echo workload.

### Caveats when using VNAT

Telepresence may not accurately detect cluster-side IP addresses being used by services running locally on a workstation
in certain scenarios. This limitation arises when local services obtain IP addresses from remote sources such as
databases or configmaps, or when IP addresses are sent to them in API calls.
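
The snippet below illustrates the caveat with hypothetical service code (the service name and IP are made up, and this is
not Telepresence itself): a connection made by hostname goes through the local DNS server and therefore receives a
translatable virtual IP, while a raw cluster IP read from a configmap or API payload is dialed as-is and is never
translated.

```go
// Hypothetical local service code showing why raw cluster IPs bypass VNAT.
package main

import (
	"fmt"
	"net"
)

func main() {
	// Name-based: the lookup goes through Telepresence's local DNS server, so the
	// answer is a virtual IP that the VIF can route back to the cluster.
	if conn, err := net.Dial("tcp", "my-svc.default.svc.cluster.local:8080"); err == nil {
		fmt.Println("dialed via DNS:", conn.RemoteAddr())
		conn.Close()
	}

	// IP-based: a raw cluster IP (here pretend it was read from a configmap or an
	// API payload) bypasses DNS entirely, so VNAT never sees it and the connection
	// will typically fail when running locally.
	rawIP := "10.130.4.17" // hypothetical value obtained at runtime
	if _, err := net.Dial("tcp", rawIP+":8080"); err != nil {
		fmt.Println("raw cluster IP not translated:", err)
	}
}
```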

### Disabling default VNAT

The default behavior of using VNAT to resolve conflicts can be disabled by adding the following to the client config.

In `config.yml` on the workstation:
```yaml
routing:
  autoResolveConflicts: false
```

Or as a Helm chart value to be applied on all clients:
```yaml
client:
  routing:
    autoResolveConflicts: false
```

Explicitly allowing all conflicts will also effectively prevent the default VNAT behavior.

## Allowing the conflict

A conflict can be resolved by carefully considering what your network layout looks like, and then allowing Telepresence
to override the conflicting subnets. Telepresence refuses to map them, because mapping them could render certain hosts
that are inside the VPN completely unreachable. However, you (or your network admin) know better than anyone how hosts
are spread out inside your VPN.

![VPN Telepresence](../images/vpn-proxy-via.jpg)
Even if the private route routes ALL of `10.0.0.0/8`, it's possible that hosts are only being spun up in one of the
sub-blocks of the `/8` space. Let's say, for example, that you happen to know that all your hosts in the VPN are bunched
up in the first half of the space -- `10.0.0.0/9` (and that you know that any new hosts will only be assigned IP
addresses from the `/9` block). In this case you can configure Telepresence to override the other half of this CIDR
block, which is where the services and pods happen to be.
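
As a sanity check, you can verify this assumption programmatically. The short Go sketch below is illustrative only and
uses two example cluster CIDRs (hypothetical values); it confirms that they sit inside the half you plan to hand over to
Telepresence and do not overlap the half where the VPN hosts live:

```go
// Illustrative containment/overlap check for the split of 10.0.0.0/8.
package main

import (
	"fmt"
	"net/netip"
)

func main() {
	allowed := netip.MustParsePrefix("10.128.0.0/9") // half handed over to Telepresence
	vpnHosts := netip.MustParsePrefix("10.0.0.0/9")  // half where the VPN hosts live

	// Example cluster CIDRs you believe are in use (hypothetical values).
	for _, c := range []string{"10.130.0.0/16", "10.132.0.0/16"} {
		p := netip.MustParsePrefix(c)
		inAllowed := allowed.Contains(p.Addr()) && p.Bits() >= allowed.Bits()
		fmt.Printf("%s inside %s: %v, overlaps VPN half %s: %v\n",
			p, allowed, inAllowed, vpnHosts, vpnHosts.Overlaps(p))
	}
}
```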

To do this, configure the `client.routing.allowConflictingSubnets` setting, either locally in the client `config.yml`
or for all clients via the Telepresence Helm chart:

In `config.yml` on the workstation:
```yaml
routing:
  allowConflictingSubnets: 10.128.0.0/9
```

Or as a Helm chart configuration value to be applied on all clients:
```yaml
client:
  routing:
    allowConflictingSubnets: 10.128.0.0/9
```

Or pass the Helm chart configuration using the `--set` flag:
```console
$ telepresence helm upgrade --set client.routing.allowConflictingSubnets="{10.128.0.0/9}"
```

The end result of this (assuming an allowlist of `/9`) will be a configuration like this:

![VPN Telepresence](../images/vpn-with-tele.jpg)

### Using docker

Use `telepresence connect --docker` to make the Telepresence daemon containerized, which means that it has its own network configuration and therefore no conflict with a VPN. Read more about docker [here](docker-run.md).
Use `telepresence connect --docker` to make the Telepresence daemon containerized, which means that it has its own
network configuration and therefore no conflict with a VPN. Read more about docker [here](docker-run.md).