From 4c0e78ae56dc70eca301cde804ed0e05cd81a2ac Mon Sep 17 00:00:00 2001 From: Hongliang Liu Date: Fri, 8 Mar 2024 12:20:06 +0800 Subject: [PATCH] Update OVS pipeline document Signed-off-by: Hongliang Liu --- docs/design/ovs-pipeline.md | 111 ++++++++++++++++-------------------- 1 file changed, 48 insertions(+), 63 deletions(-) diff --git a/docs/design/ovs-pipeline.md b/docs/design/ovs-pipeline.md index 12f4bb4d77a..be0f26fc199 100644 --- a/docs/design/ovs-pipeline.md +++ b/docs/design/ovs-pipeline.md @@ -70,7 +70,7 @@ The document references version v1.15 of Antrea. the appropriate remote gateway. This enables each vSwitch to act as a "proxy" for the local gateway when receiving tunnelled traffic and directly take care of the packet forwarding. At the moment, we use a hard-coded value of `aa:bb:cc:dd:ee:ff`. -- *Virtual Service IP*: a virtual IP address used as source IP address for hair-pin Service connections through Antrea +- *Virtual Service IP*: a virtual IP address used as source IP address for hairpin Service connections through Antrea gateway port. At the moment, we use a hard-coded value of `169.254.0.253`. - *Virtual NodePort DNAT IP*: a virtual IP address used as DNAT IP address for NodePort Service connections through Antrea gateway port. At the moment, we use a hard-coded value of `169.254.0.252`. @@ -1328,7 +1328,7 @@ of the connection. This flow is required to handle the following cases when Antr the destination MAC, reply traffic from the backend will go directly to the originating Pod without going first through the gateway and kube-proxy. This means that the reply traffic will arrive at the originating Pod with the incorrect source IP (it will be set to the backend's IP instead of the Service IP). -- When hair-pinning is involved, i.e. connections between 2 local Pods, for which NAT is performed. One example is a +- When hairpin is involved, i.e. connections between 2 local Pods, for which NAT is performed. 
One example is a Pod accessing a NodePort Service for which externalTrafficPolicy is set to Local using the local Node's IP address, as there will be no SNAT for such traffic. Another example could be hostPort support, depending on how the feature is implemented. @@ -1368,7 +1368,7 @@ MAC address of the local Antrea gateway. Flow 10 matches packets from Service connections that are originated from the local Antrea gateway and destined to the external network. This is accomplished by matching `RewriteMACRegMark`, `FromGatewayRegMark` and `ServiceCTMark`. The destination MAC address is then set to that of the local Antrea gateway. Additionally, `ToGatewayRegMark`, which will be -used with `FromGatewayRegMark` together to identify hair-pinning connections in table [SNATMark], is loaded. Finally, +used with `FromGatewayRegMark` together to identify hairpin connections in table [SNATMark], is loaded. Finally, the packets are forwarded to table [L3DecTTL]. Flow 11 is the table-miss flow, matching packets originated from local Pods and destined to the external network, and @@ -1458,30 +1458,28 @@ If you dump the flows for this table, you may see the following: 5. table=SNATMark, priority=0 actions=goto_table:SNAT ``` -Flow 1 matches the first packet of connections, with `FromGatewayRegMark` and `ToGatewayRegMark`, which means that -both input and output ports are the local Antrea gateway port. Such hair-pinning connection will be SNAT'd with the -*Virtual Service IP* in table [SNAT]. Before forwarding the packets to table [SNAT], `ConnSNATCTMark`, indicating that -the connection requires SNAT, and `HairpinCTMark` indicating that this is a hair-pinning connection, are persisted. They -will be consumed in table [SNAT]. - -Flow 2 matches the first packet of connections, with `FromGatewayRegMark` and `ToTunnelRegMark`, which means that -the input port is the local Antrea gateway and output port is a tunnel. 
Such connection should be SNAT'd with the IP -address of the local Antrea gateway in table [SNAT]. Before forwarding the packets to table [SNAT], -`ToExternalAddressRegMark` and `NotDSRServiceRegMark` are loaded, indicating that the packets are destined to a -Service's external IP, like NodePort, LoadBalancerIP or ExternalIP, but it is not DSR mode. `ConnSNATCTMark` is also -persisted. - -Flow 3-4 match packets whose source and destination are the same local Pod. Such hair-pin connection should be SNAT'd - with the IP address of the local Antrea gateway. - - Match condition `ct_state=+new+trk` is the same as flow 1. - - Match condition `nw_src=` and `nw_dst=` are to match packets whose source and - destination are both the IP address of a local Pod. - - Action `ct` is the same as flow 1. -- Flow 5 is the auto-generated flow. +Flow 1 matches the first packet of hairpin Service connections, identified by `FromGatewayRegMark` and `ToGatewayRegMark`, +indicating that both input and output ports of the connections are the local Antrea gateway port. Such hairpin +connections will undergo SNAT with the *Virtual Service IP* in table [SNAT]. Before forwarding the packets to table +[SNAT], `ConnSNATCTMark`, indicating that the connection requires SNAT, and `HairpinCTMark`, indicating that this is +a hairpin connection, are persisted to mark the connections. These two ct marks will be consumed in table [SNAT]. + +Flow 2 matches the first packet of Service connections requiring SNAT, identified by `FromGatewayRegMark` and +`ToTunnelRegMark`, indicating that the input port is the local Antrea gateway and the output port is a tunnel. Such +connections will undergo SNAT with the IP address of the local Antrea gateway in table [SNAT]. Before forwarding the +packets to table [SNAT], `ToExternalAddressRegMark` and `NotDSRServiceRegMark` are loaded, indicating that the packets +are destined to a Service's external IP, like NodePort, LoadBalancerIP or ExternalIP, but not in DSR mode. 
+Additionally, `ConnSNATCTMark`, indicating that the connection requires SNAT, is persisted to mark the connections. + +Flows 3-4 match the first packet of hairpin Service connections, identified by the same source and destination IP +addresses. Such hairpin connections will undergo SNAT with the IP address of the local Antrea gateway in table [SNAT]. +Similar to flow 1, `ConnSNATCTMark` and `HairpinCTMark` are persisted to mark the connections. + +Flow 5 is the table-miss flow. ### SNAT -This table performs SNAT for connections requiring SNAT within the OVS pipeline. +This table performs SNAT for connections requiring SNAT within the pipeline. If you dump the flows for this table, you should see the following: @@ -1493,41 +1491,28 @@ If you dump the flows for this table, you should see the following: 5. table=SNAT, priority=0 actions=goto_table:L2ForwardingCalc ``` -- Flow 1 is to match packets from hair-pin connections initiated through the the local Antrea gateway port. Such connection - should be SNAT'd with the virtual Service IP. - - Match condition `ct_state=+new+trk` is to match the first packet from connections tracked in `CtZone`. - - Match condition `ct_mark=0x40/0x40` is to match `HairpinCTMark` in `CtZone`, indicating that this is hair-pin connection. - - Match condition `reg0=0x2/0xf` is to match `FromGatewayRegMark`, indicating that packets from connections initiated - through the the local Antrea gateway port - - Action `ct` is applied to matched packets with the commit parameter to perform SNAT and persist some ct marks in - `SNATCtZone`. - - Field `commit` means to commit connection to the connection tracking module. - - Field `table=L2ForwardingCalc` is the table where packets will be forwarded. - - Field `zone=65521` is to commit connection to `SNATCtZone`. - - Field `nat(src=169.254.0.253)` is to perform SNAT with virtual Service IP `169.254.0.253`. - - Field `exec` is to persist some ct marks in `SNATCtZone`. 
- - Action `set_field:0x10/0x10->ct_mark` is to load `ServiceCTMark` in `SNATCtZone`, indicating that this is a - Service connection. - - Action `set_field:0x40/0x40->ct_mark` is to load `HairpinCTMark` in `SNATCtZone`, indicating that this is a - hair-pin connection. -- Flow 2 is to match packets from hair-pin connections initiated through a local Pod. Such connection should be SNAT'd - with the IP address of the local Antrea gateway. - - Match conditions `ct_state=+new+trk` and `ct_mark=0x40/0x40` are the same as flow 1. - - Match condition `reg0=0x3/0xf` is to match `FromLocalRegMark`, indicating that packets from connections initiated - through a local Pod. - - Action `ct` is the same as flow 1 except that `nat(src=10.10.0.1)` is used instead of `nat(src=169.254.0.253)` since - the connection should be SNAT'd with the IP address of the local Antrea gateway. -- Flow 3 is to match the subsequent request packets of connection whose first request packet has been committed in - `SNATCtZone`, then invoke `ct` action on the packets again to recover "tracked" state in `SNATCtZone`. - - Match condition `ct_state=-new-rpl+trk` is to match request "tracked" packets, but not new (the first packet. - - Match condition `ct_mark=0x20/0x20` is to match `ConnSNATCTMark`, indicating that the connection requires SNAT. - - Action `ct` is applied to matched packets to recover "tracked" state in `SNATCtZone`. -- Flow 4 is to match the first packet of connections (non-hairpin) destined to external Service IP initiated through the - Antrea gateway, and the Endpoint is a remote Pod, then perform SNAT in `SNATCtZone` with the Antrea gateway IP. - - Match conditions `ct_state=+new+trk` and `ct_mark=0x20/0x20` are the same as flow 3. - - Match condition `reg0=0x2/0xf` is the same as flow 2. - - Action `ct` is the same as flow 2 except that `HairpinCTMark` is not loaded since this is not a hair-pin connection. -- Flow 5 is the auto-generated flow. 
+Flow 1 matches the first packet of hairpin Service connections through the local Antrea gateway, identified by +`HairpinCTMark` and `FromGatewayRegMark`. It performs SNAT with the *Virtual Service IP* `169.254.0.253` and forwards +the SNAT'd packets to table [L2ForwardingCalc]. It's worth noting that before SNAT, the "tracked" state of packets is +associated with `CtZone`. After SNAT, their "tracked" state is associated with `SNATCtZone`, and `ServiceCTMark` and +`HairpinCTMark` persisted in `CtZone` are not accessible anymore. As a result, `ServiceCTMark` and `HairpinCTMark` +need to be persisted once again, but this time they are persisted in `SNATCtZone` for subsequent tables to consume. + +Flow 2 matches the first packet of hairpin Service connections originated from local Pods, identified by `HairpinCTMark` +and `FromLocalRegMark`. It performs SNAT with the IP address of the local Antrea gateway and forwards the SNAT'd packets +to table [L2ForwardingCalc]. Similar to flow 1, `ServiceCTMark` and `HairpinCTMark` are persisted in `SNATCtZone`. + +Flow 3 matches the subsequent request packets of connections whose first request packet has already undergone SNAT, and then +invokes the `ct` action on the packets again to restore the "tracked" state in `SNATCtZone`. The packets with appropriate +"tracked" state are forwarded to table [L2ForwardingCalc]. + +Flow 4 matches the first packet of Service connections requiring SNAT, identified by `ConnSNATCTMark` and +`FromGatewayRegMark`, indicating the connection is destined to an external Service IP initiated through the +Antrea gateway and the Endpoint is a remote Pod. It performs SNAT with the IP address of the local Antrea gateway and +forwards the SNAT'd packets to table [L2ForwardingCalc]. Similar to flows 1 and 2, `ServiceCTMark` is persisted in +`SNATCtZone`. + +Flow 5 is the table-miss flow. ### L2ForwardingCalc @@ -1591,9 +1576,9 @@ If you dump the flows for this table, you should see the following: Antrea gateway. 
- Match condition `reg0=0x10/0xf0` is to match `ToTunnelRegMark`, indicating that packets are destined to tunnel. - Match condition `reg0=0x40/0xf0` is to match `ToUplinkRegMark`, indicating that packets are destined to uplink. -- Flow 5 is to match packets from hair-pin connections and forward them to table [ConntrackCommit] directly to bypass +- Flow 5 is to match packets from hairpin connections and forward them to table [ConntrackCommit] directly to bypass all tables for ingress security. - - Match condition `ct_mark=0x40/0x40` is to match `HairpinCTMark`, indicating that packets are from hair-pin connections. + - Match condition `ct_mark=0x40/0x40` is to match `HairpinCTMark`, indicating that packets are from hairpin connections. ### AntreaPolicyIngressRule @@ -1769,7 +1754,7 @@ Flow 2 is table-miss flow. This is the final table in the pipeline, responsible for handling the output of packets from OVS. It addresses the following cases: -1. Output packets from hair-pin connections to the ingress port where the packets are received. +1. Output packets from hairpin connections to the ingress port where the packets are received. 2. Output packets to an OVS port. 3. Output packets to controller (Antrea Agent). 4. Drop packets. @@ -1784,7 +1769,7 @@ If you dump the flows for this table, you should see the following: 5. table=Output, priority=0 actions=drop ``` -Flow 1 for case 1. It matches packets from hair-pin connections by matching `HairpinCTMark`. +Flow 1 for case 1. It matches packets from hairpin connections by matching `HairpinCTMark`. Flow 2 for case 2. It matches packets by matching `OutputToOFPortRegMark` and outputs them to the OVS port specified by the value stored in `TargetOFPortField`.