Update OVS pipeline document
Signed-off-by: Hongliang Liu <[email protected]>
hongliangl committed Mar 8, 2024
1 parent 200ef55 commit 4c0e78a
Showing 1 changed file with 48 additions and 63 deletions.
111 changes: 48 additions & 63 deletions docs/design/ovs-pipeline.md
@@ -70,7 +70,7 @@ The document references version v1.15 of Antrea.
the appropriate remote gateway. This enables each vSwitch to act as a "proxy" for the local gateway when receiving
tunnelled traffic and directly take care of the packet forwarding. At the moment, we use a hard-coded value of
`aa:bb:cc:dd:ee:ff`.
- *Virtual Service IP*: a virtual IP address used as source IP address for hair-pin Service connections through Antrea
- *Virtual Service IP*: a virtual IP address used as source IP address for hairpin Service connections through Antrea
gateway port. At the moment, we use a hard-coded value of `169.254.0.253`.
- *Virtual NodePort DNAT IP*: a virtual IP address used as DNAT IP address for NodePort Service connections through
Antrea gateway port. At the moment, we use a hard-coded value of `169.254.0.252`.
@@ -1328,7 +1328,7 @@ of the connection. This flow is required to handle the following cases when Antr
the destination MAC, reply traffic from the backend will go directly to the originating Pod without going first
through the gateway and kube-proxy. This means that the reply traffic will arrive at the originating Pod with the
incorrect source IP (it will be set to the backend's IP instead of the Service IP).
- When hair-pinning is involved, i.e. connections between 2 local Pods, for which NAT is performed. One example is a
- When hairpin is involved, i.e. connections between 2 local Pods, for which NAT is performed. One example is a
Pod accessing a NodePort Service for which externalTrafficPolicy is set to Local using the local Node's IP address,
as there will be no SNAT for such traffic. Another example could be hostPort support, depending on how the feature
is implemented.
@@ -1368,7 +1368,7 @@ MAC address of the local Antrea gateway.
Flow 10 matches packets from Service connections that are originated from the local Antrea gateway and destined to the
external network. This is accomplished by matching `RewriteMACRegMark`, `FromGatewayRegMark` and `ServiceCTMark`. The
destination MAC address is then set to that of the local Antrea gateway. Additionally, `ToGatewayRegMark`, which will be
used with `FromGatewayRegMark` together to identify hair-pinning connections in table [SNATMark], is loaded. Finally,
used with `FromGatewayRegMark` together to identify hairpin connections in table [SNATMark], is loaded. Finally,
the packets are forwarded to table [L3DecTTL].

Flow 11 is the table-miss flow, matching packets originated from local Pods and destined to the external network, and
@@ -1458,30 +1458,28 @@ If you dump the flows for this table, you may see the following:
5. table=SNATMark, priority=0 actions=goto_table:SNAT
```

Flow 1 matches the first packet of connections, with `FromGatewayRegMark` and `ToGatewayRegMark`, which means that
both input and output ports are the local Antrea gateway port. Such hair-pinning connection will be SNAT'd with the
*Virtual Service IP* in table [SNAT]. Before forwarding the packets to table [SNAT], `ConnSNATCTMark`, indicating that
the connection requires SNAT, and `HairpinCTMark` indicating that this is a hair-pinning connection, are persisted. They
will be consumed in table [SNAT].

Flow 2 matches the first packet of connections, with `FromGatewayRegMark` and `ToTunnelRegMark`, which means that
the input port is the local Antrea gateway and output port is a tunnel. Such connection should be SNAT'd with the IP
address of the local Antrea gateway in table [SNAT]. Before forwarding the packets to table [SNAT],
`ToExternalAddressRegMark` and `NotDSRServiceRegMark` are loaded, indicating that the packets are destined to a
Service's external IP, like NodePort, LoadBalancerIP or ExternalIP, but it is not DSR mode. `ConnSNATCTMark` is also
persisted.

Flow 3-4 match packets whose source and destination are the same local Pod. Such hair-pin connection should be SNAT'd
with the IP address of the local Antrea gateway.
- Match condition `ct_state=+new+trk` is the same as flow 1.
- Match condition `nw_src=<POD_IP_ADDRESS>` and `nw_dst=<POD_IP_ADDRESS>` are to match packets whose source and
destination are both the IP address of a local Pod.
- Action `ct` is the same as flow 1.
- Flow 5 is the auto-generated flow.
Flow 1 matches the first packet of hairpin Service connections, identified by `FromGatewayRegMark` and `ToGatewayRegMark`,
indicating that both input and output ports of the connections are the local Antrea gateway port. Such hairpin
connections will undergo SNAT with the *Virtual Service IP* in table [SNAT]. Before forwarding the packets to table
[SNAT], `ConnSNATCTMark`, indicating that the connection requires SNAT, and `HairpinCTMark` indicating that this is
a hairpin connection, are persisted to mark the connections. These two ct marks will be consumed in table [SNAT].

Flow 2 matches the first packet of Service connections requiring SNAT, identified by `FromGatewayRegMark` and
`ToTunnelRegMark`, indicating that the input port is the local Antrea gateway and output port is a tunnel. Such
connections will undergo SNAT with the IP address of the local Antrea gateway in table [SNAT]. Before forwarding the
packets to table [SNAT], `ToExternalAddressRegMark` and `NotDSRServiceRegMark` are loaded, indicating that the packets
are destined to a Service's external IP, like NodePort, LoadBalancerIP or ExternalIP, and that DSR mode is not used.
Additionally, `ConnSNATCTMark`, indicating that the connection requires SNAT, is persisted to mark the connections.

Flows 3-4 match the first packet of hairpin Service connections, identified by the same source and destination IP
addresses. Such hairpin connections will undergo SNAT with the IP address of the local Antrea gateway in table [SNAT].
Similar to flow 1, `ConnSNATCTMark` and `HairpinCTMark` are persisted to mark the connections.

Flow 5 is the table-miss flow.
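
The decisions made by table [SNATMark] can be summarized in a small model. The following is only a sketch, not Antrea code: the `Packet` class, the `snat_mark` function, and the string labels standing in for the marks are hypothetical, and the order in which the conditions are checked is an assumption, since flow priorities are not shown here.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the marks named in the text. They are plain strings,
# not the real register or ct_mark bit values.
FROM_GATEWAY = "FromGatewayRegMark"
TO_GATEWAY = "ToGatewayRegMark"
TO_TUNNEL = "ToTunnelRegMark"
TO_EXTERNAL_ADDRESS = "ToExternalAddressRegMark"
NOT_DSR = "NotDSRServiceRegMark"
CONN_SNAT = "ConnSNATCTMark"
HAIRPIN = "HairpinCTMark"


@dataclass
class Packet:
    """Minimal stand-in for the packet state this table looks at."""
    new_connection: bool
    src_ip: str
    dst_ip: str
    reg_marks: set = field(default_factory=set)
    ct_marks: set = field(default_factory=set)


def snat_mark(pkt: Packet, local_pod_ips: set) -> str:
    """Apply the marks described for flows 1-5 and return the flow that matched."""
    if pkt.new_connection:
        # Flow 1: gateway in, gateway out, so a hairpin Service connection.
        if {FROM_GATEWAY, TO_GATEWAY} <= pkt.reg_marks:
            pkt.ct_marks |= {CONN_SNAT, HAIRPIN}
            return "flow 1"
        # Flow 2: gateway in, tunnel out, so SNAT is required.
        if {FROM_GATEWAY, TO_TUNNEL} <= pkt.reg_marks:
            pkt.reg_marks |= {TO_EXTERNAL_ADDRESS, NOT_DSR}
            pkt.ct_marks.add(CONN_SNAT)
            return "flow 2"
        # Flows 3-4: source and destination are the same local Pod IP.
        if pkt.src_ip == pkt.dst_ip and pkt.src_ip in local_pod_ips:
            pkt.ct_marks |= {CONN_SNAT, HAIRPIN}
            return "flows 3-4"
    # Flow 5: table-miss, the packet moves on to table SNAT unchanged.
    return "flow 5"


pkt = Packet(True, "10.10.0.5", "10.10.0.5")
print(snat_mark(pkt, {"10.10.0.5"}), sorted(pkt.ct_marks))
```

The model keeps the register marks (per-packet) separate from the ct marks (per-connection), mirroring the distinction between "loaded" and "persisted" in the descriptions above.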

### SNAT

This table performs SNAT for connections requiring SNAT within the OVS pipeline.
This table performs SNAT for connections requiring SNAT within the pipeline.

If you dump the flows for this table, you should see the following:

@@ -1493,41 +1491,28 @@ If you dump the flows for this table, you should see the following:
5. table=SNAT, priority=0 actions=goto_table:L2ForwardingCalc
```

- Flow 1 is to match packets from hair-pin connections initiated through the local Antrea gateway port. Such connections
should be SNAT'd with the virtual Service IP.
- Match condition `ct_state=+new+trk` is to match the first packet from connections tracked in `CtZone`.
- Match condition `ct_mark=0x40/0x40` is to match `HairpinCTMark` in `CtZone`, indicating that this is hair-pin connection.
- Match condition `reg0=0x2/0xf` is to match `FromGatewayRegMark`, indicating that packets from connections initiated
through the local Antrea gateway port.
- Action `ct` is applied to matched packets with the commit parameter to perform SNAT and persist some ct marks in
`SNATCtZone`.
- Field `commit` means to commit connection to the connection tracking module.
- Field `table=L2ForwardingCalc` is the table where packets will be forwarded.
- Field `zone=65521` is to commit connection to `SNATCtZone`.
- Field `nat(src=169.254.0.253)` is to perform SNAT with virtual Service IP `169.254.0.253`.
- Field `exec` is to persist some ct marks in `SNATCtZone`.
- Action `set_field:0x10/0x10->ct_mark` is to load `ServiceCTMark` in `SNATCtZone`, indicating that this is a
Service connection.
- Action `set_field:0x40/0x40->ct_mark` is to load `HairpinCTMark` in `SNATCtZone`, indicating that this is a
hair-pin connection.
- Flow 2 is to match packets from hair-pin connections initiated through a local Pod. Such connection should be SNAT'd
with the IP address of the local Antrea gateway.
- Match conditions `ct_state=+new+trk` and `ct_mark=0x40/0x40` are the same as flow 1.
- Match condition `reg0=0x3/0xf` is to match `FromLocalRegMark`, indicating that packets from connections initiated
through a local Pod.
- Action `ct` is the same as flow 1 except that `nat(src=10.10.0.1)` is used instead of `nat(src=169.254.0.253)` since
the connection should be SNAT'd with the IP address of the local Antrea gateway.
- Flow 3 is to match the subsequent request packets of connection whose first request packet has been committed in
`SNATCtZone`, then invoke `ct` action on the packets again to recover "tracked" state in `SNATCtZone`.
- Match condition `ct_state=-new-rpl+trk` is to match request "tracked" packets that are not new (i.e., not the first packet).
- Match condition `ct_mark=0x20/0x20` is to match `ConnSNATCTMark`, indicating that the connection requires SNAT.
- Action `ct` is applied to matched packets to recover "tracked" state in `SNATCtZone`.
- Flow 4 is to match the first packet of connections (non-hairpin) destined to external Service IP initiated through the
Antrea gateway, and the Endpoint is a remote Pod, then perform SNAT in `SNATCtZone` with the Antrea gateway IP.
- Match conditions `ct_state=+new+trk` and `ct_mark=0x20/0x20` are the same as flow 3.
- Match condition `reg0=0x2/0xf` is the same as flow 2.
- Action `ct` is the same as flow 2 except that `HairpinCTMark` is not loaded since this is not a hair-pin connection.
- Flow 5 is the auto-generated flow.
Flow 1 matches the first packet of hairpin Service connections through the local Antrea gateway, identified by
`HairpinCTMark` and `FromGatewayRegMark`. It performs SNAT with the *Virtual Service IP* `169.254.0.253` and forwards
the SNAT'd packets to table [L2ForwardingCalc]. It's worth noting that before SNAT, the "tracked" state of packets is
associated with `CtZone`. After SNAT, their "tracked" state is associated with `SNATCtZone`, and `ServiceCTMark` and
`HairpinCTMark` persisted in `CtZone` are not accessible anymore. As a result, `ServiceCTMark` and `HairpinCTMark`
need to be persisted once again, but this time they are persisted in `SNATCtZone` for subsequent tables to consume.

Flow 2 matches the first packet of hairpin Service connections originating from local Pods, identified by `HairpinCTMark`
and `FromLocalRegMark`. It performs SNAT with the IP address of the local Antrea gateway and forwards the SNAT'd packets
to table [L2ForwardingCalc]. Similar to flow 1, `ServiceCTMark` and `HairpinCTMark` are persisted in `SNATCtZone`.

Flow 3 matches the subsequent request packets of connections whose first request packet has already been SNAT'd, and
then invokes the `ct` action on the packets again to restore the "tracked" state in `SNATCtZone`. The packets with the
appropriate "tracked" state are forwarded to table [L2ForwardingCalc].

Flow 4 matches the first packet of Service connections requiring SNAT, identified by `ConnSNATCTMark` and
`FromGatewayRegMark`, indicating the connection is destined to an external Service IP initiated through the
Antrea gateway and the Endpoint is a remote Pod. It performs SNAT with the IP address of the local Antrea gateway and
forwards the SNAT'd packets to table [L2ForwardingCalc]. Similar to flows 1 and 2, `ServiceCTMark` is persisted in
`SNATCtZone`.

Flow 5 is the table-miss flow.
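
To make the selection of the SNAT source IP concrete, here is a rough model of the five flows. It is a sketch under stated assumptions rather than the actual implementation: the mark values and the `169.254.0.253` address are the ones quoted in this document for v1.15, while `SnatDecision`, `snat_table`, the gateway IP argument, and the order of the checks (hairpin flows before flow 4) are assumptions made for illustration.

```python
from typing import NamedTuple, Optional

# Values quoted in the flow descriptions of this document.
SERVICE_CT_MARK = 0x10      # ct_mark=0x10/0x10
CONN_SNAT_CT_MARK = 0x20    # ct_mark=0x20/0x20
HAIRPIN_CT_MARK = 0x40      # ct_mark=0x40/0x40
FROM_GATEWAY = 0x2          # reg0=0x2/0xf
FROM_LOCAL = 0x3            # reg0=0x3/0xf
SOURCE_MASK = 0xF
VIRTUAL_SERVICE_IP = "169.254.0.253"


class SnatDecision(NamedTuple):
    """Hypothetical result type: which flow matched and what it does."""
    flow: int
    snat_ip: Optional[str]   # None: no new NAT, only restore the tracked state
    marks_in_snat_zone: int  # ct marks persisted again in SNATCtZone


def snat_table(is_new: bool, is_reply: bool, ct_mark: int, reg0: int,
               gateway_ip: str) -> Optional[SnatDecision]:
    """Return the decision for flows 1-4, or None for the table-miss flow 5."""
    source = reg0 & SOURCE_MASK
    both = SERVICE_CT_MARK | HAIRPIN_CT_MARK
    if is_new and ct_mark & HAIRPIN_CT_MARK:
        if source == FROM_GATEWAY:   # flow 1: hairpin through the gateway
            return SnatDecision(1, VIRTUAL_SERVICE_IP, both)
        if source == FROM_LOCAL:     # flow 2: hairpin from a local Pod
            return SnatDecision(2, gateway_ip, both)
    if not is_new and not is_reply and ct_mark & CONN_SNAT_CT_MARK:
        return SnatDecision(3, None, 0)  # flow 3: later request packets
    if is_new and ct_mark & CONN_SNAT_CT_MARK and source == FROM_GATEWAY:
        return SnatDecision(4, gateway_ip, SERVICE_CT_MARK)  # flow 4
    return None                      # flow 5: table-miss


print(snat_table(True, False, 0x60, 0x2, "10.10.0.1"))
```

Because a hairpin connection through the gateway carries both `ConnSNATCTMark` and `HairpinCTMark`, it could match the conditions of both flow 1 and flow 4; the sketch checks the hairpin case first so that the *Virtual Service IP* is used for it.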

### L2ForwardingCalc

@@ -1591,9 +1576,9 @@ If you dump the flows for this table, you should see the following:
Antrea gateway.
- Match condition `reg0=0x10/0xf0` is to match `ToTunnelRegMark`, indicating that packets are destined to tunnel.
- Match condition `reg0=0x40/0xf0` is to match `ToUplinkRegMark`, indicating that packets are destined to uplink.
- Flow 5 is to match packets from hair-pin connections and forward them to table [ConntrackCommit] directly to bypass
- Flow 5 is to match packets from hairpin connections and forward them to table [ConntrackCommit] directly to bypass
all tables for ingress security.
- Match condition `ct_mark=0x40/0x40` is to match `HairpinCTMark`, indicating that packets are from hair-pin connections.
- Match condition `ct_mark=0x40/0x40` is to match `HairpinCTMark`, indicating that packets are from hairpin connections.
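
Match conditions written as `value/mask`, such as `reg0=0x10/0xf0` or `ct_mark=0x40/0x40`, compare only the bits selected by the mask. A minimal sketch of that comparison follows; the helper name `masked_match` and the sample register contents are hypothetical and chosen only for illustration.

```python
def masked_match(field_value: int, match_value: int, mask: int) -> bool:
    """Return True when field_value equals match_value on the bits set in mask."""
    return (field_value & mask) == (match_value & mask)


# Hypothetical reg0 contents: FromGatewayRegMark (0x2) in bits 0-3 and
# ToTunnelRegMark (0x10) in bits 4-7.
reg0 = 0x12
from_gateway = masked_match(reg0, 0x2, 0xF)     # reg0=0x2/0xf
to_tunnel = masked_match(reg0, 0x10, 0xF0)      # reg0=0x10/0xf0
to_uplink = masked_match(reg0, 0x40, 0xF0)      # reg0=0x40/0xf0

# Hypothetical ct_mark with ServiceCTMark (0x10) and HairpinCTMark (0x40) set.
ct_mark = 0x50
is_hairpin = masked_match(ct_mark, 0x40, 0x40)  # ct_mark=0x40/0x40

print(from_gateway, to_tunnel, to_uplink, is_hairpin)
```

Masking is what allows several independent marks to share a single register or ct mark field: each flow inspects only its own bits and ignores the rest.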

### AntreaPolicyIngressRule

@@ -1769,7 +1754,7 @@ Flow 2 is table-miss flow.
This is the final table in the pipeline, responsible for handling the output of packets from OVS. It addresses the
following cases:

1. Output packets from hair-pin connections to the ingress port where the packets are received.
1. Output packets from hairpin connections to the ingress port where the packets are received.
2. Output packets to an OVS port.
3. Output packets to controller (Antrea Agent).
4. Drop packets.
@@ -1784,7 +1769,7 @@ If you dump the flows for this table, you should see the following:
5. table=Output, priority=0 actions=drop
```

Flow 1 for case 1. It matches packets from hair-pin connections by matching `HairpinCTMark`.
Flow 1 for case 1. It matches packets from hairpin connections by matching `HairpinCTMark`.

Flow 2 for case 2. It matches packets by matching `OutputToOFPortRegMark` and outputs them to the OVS port specified by
the value stored in `TargetOFPortField`.