From fe8a6a53e5ef657a50ff91a7741e772845d95387 Mon Sep 17 00:00:00 2001 From: Hongliang Liu <75655411+hongliangl@users.noreply.github.com> Date: Sat, 9 Mar 2024 22:02:23 +0800 Subject: [PATCH] Update ovs-pipeline.md --- docs/design/ovs-pipeline.md | 364 ++++++++++++++++++------------------ 1 file changed, 182 insertions(+), 182 deletions(-) diff --git a/docs/design/ovs-pipeline.md b/docs/design/ovs-pipeline.md index ba15c294136..c3f5675b907 100644 --- a/docs/design/ovs-pipeline.md +++ b/docs/design/ovs-pipeline.md @@ -30,9 +30,9 @@ The document references version v1.15 of Antrea. to `ClientIP` (default is `None`). See [Kubernetes Service](https://kubernetes.io/docs/concepts/services-networking/service/) for more information about session affinity. -### Openflow +### OpenFlow -- *table-miss flow*: a "catch-all" flow in a OpenFlow table, which is used if no other flow is matched. If the table-miss +- *table-miss flow*: a "catch-all" flow in an OpenFlow table, which is used if no other flow is matched. If the table-miss flow does not exist, by default packets unmatched by flow entries are dropped (discarded). - *action `conjunction`*: an efficient way in OVS to implement conjunctive matches, that is a match for which we have multiple fields, each one with a set of acceptable values. See [OVS @@ -41,18 +41,18 @@ The document references version v1.15 of Antrea. the switch". That is, if a flow uses this action, then the packets in the flow go through the switch in the same way that they would if OpenFlow was not configured on the switch. Antrea uses this action to process ARP traffic as a regular learning L2 switch would. -- *action `group`*: an action which is used to process forwarding decisions on multiple OVS ports. Examples include: +- *action `group`*: an action that is used to process forwarding decisions on multiple OVS ports. Examples include: load-balancing, multicast, and active/standby. See [OVS group action](https://docs.openvswitch.org/en/latest/ref/ovs-actions.7/#the-group-action)for more information. - *action `IN_PORT`*: an action to output packets to the port on which it was received. This is the only standard way to output the packet to the input port. -- *action `ct`*: an action to commit connections to connection tracking module, which can be used by OVS to match on +- *action `ct`*: an action to commit connections to connection tracking module, which can be used by OVS to match the state of a TCP, UDP, ICMP, etc., connection. See the [OVS Conntrack tutorial](https://docs.openvswitch.org/en/latest/tutorials/ovs-conntrack/) for more information. - *reg mark*: a reg mark is a value stored in an OVS register, serving to convey information for a packet across the pipeline. Explore all reg marks used in the pipeline in the [Registers](#registers) section. - *ct mark*: a ct mark is a value stored in the OVS connection tracking mark, serving to convey information for a - connection throughout its lifecycle across the pipeline. Explore all ct mark used in the pipeline in the [Ct + connection throughout its lifecycle across the pipeline. Explore all ct marks used in the pipeline in the [Ct Marks](#ct-marks) section. - *ct label*: it is similar to *ct label*, serving to convey information for a connection throughout its lifecycle across the pipeline. Explore all ct labels used in the pipeline in the [Ct Labels](#ct-labels) section. @@ -62,22 +62,22 @@ The document references version v1.15 of Antrea. 
### Misc -- *dmac table*: a traditional L2 switch has a "dmac" table which maps learned destination MAC address to the appropriate - egress port. It is often the same physical table as the "smac" table (which matches on the source MAC address and - initiate MAC learning if the address is unknown). -- *Global Virtual MAC*: a virtual MAC address which is used as the destination MAC for all tunnelled traffic across all +- *dmac table*: a traditional L2 switch has a "dmac" table that maps the learned destination MAC address to the appropriate + egress port. It is often the same physical table as the "smac" table (which matches the source MAC address and + initiates MAC learning if the address is unknown). +- *Global Virtual MAC*: a virtual MAC address that is used as the destination MAC for all tunneled traffic across all Nodes. This simplifies networking by enabling all Nodes to use this MAC address instead of the actual MAC address of the appropriate remote gateway. This enables each vSwitch to act as a "proxy" for the local gateway when receiving - tunnelled traffic and directly take care of the packet forwarding. At the moment, we use a hard-coded value of + tunneled traffic and directly take care of the packet forwarding. At the moment, we use a hard-coded value of `aa:bb:cc:dd:ee:ff`. -- *Virtual Service IP*: a virtual IP address used as source IP address for hairpin Service connections through Antrea +- *Virtual Service IP*: a virtual IP address used as the source IP address for hairpin Service connections through the Antrea gateway port. At the moment, we use a hard-coded value of `169.254.0.253`. -- *Virtual NodePort DNAT IP*: a virtual IP address used as DNAT IP address for NodePort Service connections through +- *Virtual NodePort DNAT IP*: a virtual IP address used as a DNAT IP address for NodePort Service connections through Antrea gateway port. At the moment, we use a hard-coded value of `169.254.0.252`. ## Dumping the Flows / Groups -This guide includes a representative flow dump for every table in the pipeline, in order to illustrate the function of +This guide includes a representative flow dump for every table in the pipeline, to illustrate the function of each table. If you have a cluster running Antrea, you can dump the flows or groups on a given Node as follows: ```bash @@ -121,9 +121,9 @@ to the registers we use. | | | | 0x3 | ToLocalRegMark | Packet destination is local Pod port. | | | | | 0x4 | ToUplinkRegMark | Packet destination is uplink port. | | | | | 0x5 | ToBridgeRegMark | Packet destination is local bridge port. | -| | bit 9 | | 0b0 | NotRewriteMACRegMark | Packet's source / destination MAC address does not need to be rewritten. | -| | | | 0b1 | RewriteMACRegMark | Packet's source / destination MAC address needs to be rewritten. | -| | bit 10 | | 0b1 | APDenyRegMark | Packet denied (Drop / Reject) by Antrea NetworkPolicy. | +| | bit 9 | | 0b0 | NotRewriteMACRegMark | Packet's source/destination MAC address does not need to be rewritten. | +| | | | 0b1 | RewriteMACRegMark | Packet's source/destination MAC address needs to be rewritten. | +| | bit 10 | | 0b1 | APDenyRegMark | Packet denied (Drop/Reject) by Antrea NetworkPolicy. | | | bits 11-12 | APDispositionField | 0b00 | DispositionAllowRegMark | Indicates Antrea NetworkPolicy disposition: allow. | | | | | 0b01 | DispositionDropRegMark | Indicates Antrea NetworkPolicy disposition: drop. | | | | | 0b11 | DispositionPassRegMark | Indicates Antrea NetworkPolicy disposition: pass. 
|
@@ -136,24 +136,24 @@ to the registers we use.
| | bits 25-32 | PacketInOperationField | | | Field to store NetworkPolicy packetIn operation. |
| NXM_NX_REG1 | bits 0-31 | TargetOFPortField | | | Egress OVS port of packet. |
| NXM_NX_REG2 | bits 0-31 | SwapField | | | Swap values in flow fields in OpenFlow actions. |
-| | | PacketInTableField | | | OVS table where it was decided to send packet to controller (Antrea Agent). |
+| | | PacketInTableField | | | OVS table where it was decided to send packets to the controller (Antrea Agent). |
| NXM_NX_REG3 | bits 0-31 | EndpointIPField | | | Field to store IPv4 address of selected Service Endpoint. |
| | | APConjIDField | | | Field to store Conjunction ID for Antrea Policy. |
-| NXM_NX_REG4 | bits 0-15 | EndpointPortField | | | Field store TCP / UDP / SCTP port of a Service's selected Endpoint. |
+| NXM_NX_REG4 | bits 0-15 | EndpointPortField | | | Field to store TCP/UDP/SCTP port of a Service's selected Endpoint. |
| | bits 16-18 | ServiceEPStateField | 0b001 | EpToSelectRegMark | Packet needs to do Service Endpoint selection. |
| | bits 16-18 | ServiceEPStateField | 0b010 | EpSelectedRegMark | Packet has done Service Endpoint selection. |
| | bits 16-18 | ServiceEPStateField | 0b011 | EpToLearnRegMark | Packet has done Service Endpoint selection and the selected Endpoint needs to be cached. |
| | bits 0-18 | EpUnionField | | | The union value of EndpointPortField and ServiceEPStateField. |
-| | bit 19 | | 0b1 | ToNodePortAddressRegMark | Packet is destined to a Service of type NodePort. |
+| | bit 19 | | 0b1 | ToNodePortAddressRegMark | Packet is destined for a Service of type NodePort. |
| | bit 20 | | 0b1 | AntreaFlexibleIPAMRegMark | Packet is from local Antrea IPAM Pod. |
| | bit 20 | | 0b0 | NotAntreaFlexibleIPAMRegMark | Packet is not from local Antrea IPAM Pod. |
-| | bit 21 | | 0b1 | ToExternalAddressRegMark | Packet is destined to a Service's external IP. |
+| | bit 21 | | 0b1 | ToExternalAddressRegMark | Packet is destined for a Service's external IP. |
| | bits 22-23 | TrafficControlActionField | 0b01 | TrafficControlMirrorRegMark | Indicates packet needs to be mirrored (used by TrafficControl). |
| | | | 0b10 | TrafficControlRedirectRegMark | Indicates packet needs to be redirected (used by TrafficControl). |
-| | bit 24 | | 0b1 | NestedServiceRegMark | Packet is destined to a Service which is using other other Service as Endpoints. |
-| | bit 25 | | 0b1 | DSRServiceRegMark | Packet is destined to a Service working in DSR mode. |
-| | | | 0b0 | NotDSRServiceRegMark | Packet is destined to a Service working not in DSR mode. |
-| | bit 26 | | 0b1 | RemoteEndpointRegMark | Packet is destined to a Service selecting a remote non-hostNetwork Endpoint. |
+| | bit 24 | | 0b1 | NestedServiceRegMark | Packet is destined for a Service that is using other Services as Endpoints. |
+| | bit 25 | | 0b1 | DSRServiceRegMark | Packet is destined for a Service working in DSR mode. |
+| | | | 0b0 | NotDSRServiceRegMark | Packet is destined for a Service not working in DSR mode. |
+| | bit 26 | | 0b1 | RemoteEndpointRegMark | Packet is destined for a Service selecting a remote non-hostNetwork Endpoint. |
| | bit 27 | | 0b1 | FromExternalRegMark | Packet is from Antrea gateway, but its source IP is not the gateway IP. |
| NXM_NX_REG5 | bits 0-31 | TFEgressConjIDField | | | Egress conjunction ID hit by TraceFlow packet. |
| NXM_NX_REG6 | bits 0-31 | TFIngressConjIDField | | | Ingress conjunction ID hit by TraceFlow packet.
| @@ -174,8 +174,8 @@ we assign friendly names to the bits we use. | Field Range | Field Name | Ct Mark Value | Ct Mark Name | Description | |-------------|-----------------------|---------------|--------------------|-----------------------------------------------------------------| -| bits 0-3 | ConnSourceCTMarkField | 0b0010 | FromGatewayCTMark | Connection source is Antrea gateway port. | -| | | 0b0101 | FromBridgeCTMark | Connection source is local bridge port. | +| bits 0-3 | ConnSourceCTMarkField | 0b0010 | FromGatewayCTMark | Connection source is the Antrea gateway port. | +| | | 0b0101 | FromBridgeCTMark | Connection source is the local bridge port. | | bit 4 | | 0b1 | ServiceCTMark | Connection is for Service. | | | | 0b0 | NotServiceCTMark | Connection is not for Service. | | bit 5 | | 0b1 | ConnSNATCTMark | SNAT'd connection for Service. | @@ -245,7 +245,7 @@ spec: port: 3306 ``` -This Kubernetes NetworkPolicy is applied to Pods with label the `app: web`in the `default` Namespace. For these +This Kubernetes NetworkPolicy is applied to Pods with label the `app: web` in the `default` Namespace. For these Pods, it only allows TCP traffic on port 80 from Pods with label `app: client` and to Pods with label `app: db`. Because Antrea will only install OVS flows for this Kubernetes NetworkPolicy on Nodes that have Pods selected by the policy, we have scheduled an `app: web` Pod on the current Node. It received IP address `10.10.0.19` from the Antrea CNI, so you @@ -255,11 +255,11 @@ will see the IP address showed up in the OVS flows. Like Kubernetes NetworkPolicy, several tables of the pipeline are dedicated to [Kubernetes Service] (https://kubernetes.io/docs/concepts/services-networking/service/) implementation (table [NodePortMark], -[SessionAffinity], [ServiceLB] and [EndpointDNAT]). +[SessionAffinity], [ServiceLB], and [EndpointDNAT]). By enabling `proxyAll`, ClusterIP, NodePort, LoadBalancer, and ExternalIP are all supported. Otherwise, only in-cluster -ClusterIP is supported. For the present document, we will use Kubernetes Service examples below. These Services select -Pods with label `app:web` as Endpoints. +ClusterIP is supported. For the present document, we will use the sample Kubernetes Services below. These Services select +Pods with label `app: web` as Endpoints. ### ClusterIP without Endpoint @@ -366,7 +366,7 @@ spec: ### LoadBalancer -A sample LoadBalancer Service with an ingress IP `192.168.77.150`, obtained from ingress controller. +A sample LoadBalancer Service with the ingress IP `192.168.77.150`, obtained from an ingress controller. ```yaml apiVersion: v1 @@ -389,7 +389,7 @@ status: ### LoadBalancer with Session Affinity -A sample LoadBalancer Service with an ingress IP `192.168.77.151`, obtained from ingress controller, configured with +A sample LoadBalancer Service with the ingress IP `192.168.77.151`, obtained from an ingress controller, configured with session affinity. ```yaml @@ -417,7 +417,7 @@ status: ### Service with ExternalIP -A sample Service with externalIPs set to `192.168.77.200`. +A sample Service with the external IP `192.168.77.200`. ```yaml apiVersion: v1 @@ -437,7 +437,7 @@ spec: ### Service with ExternalIP and Session Affinity -A sample Service with externalIPs set to `192.168.77.201` configured with session affinity. +A sample Service with the external IP `192.168.77.201` configured with session affinity. 
```yaml apiVersion: v1 @@ -503,7 +503,7 @@ spec: - action: Drop ``` -This ACNP is applied to all Pods with `app: web` label in all Namespaces. For these Pods, it only allows TCP traffic on +This ACNP is applied to all Pods with label `app: web` in all Namespaces. For these Pods, it only allows TCP traffic on port 80 from Pods with label `app: client` and to Pods with label `app: db`. Similar to Kubernetes NetworkPolicy, Antrea will only install OVS flows for this policy on Nodes that have Pods selected by the policy. We still use `app: web` Pods mentioned above as the target of this policy. @@ -574,9 +574,12 @@ spec: ## TrafficControl Implementation [TrafficControl](../traffic-control.md) is a CRD API that manages and manipulates the transmission of Pod traffic. -Antrea creates additional dedicated table [TrafficControl] to implement TrafficControl. Use the provided TrafficControls +Antrea creates a dedicated table [TrafficControl] to implement `TrafficControl`. Use the provided TrafficControls as examples throughout the remainder of this document. +This is a TrafficControl applied to Pods with label `app: web`. For these Pods, all traffic from them, both +ingress and egress, will be redirected to the port `antrea-tc-tap0`, and returned via the port `antrea-tc-tap1`. + ```yaml apiVersion: crd.antrea.io/v1alpha2 kind: TrafficControl @@ -597,8 +600,8 @@ spec: name: antrea-tc-tap1 ``` -This TrafficControl is applied to Pods with `app: app` label. For these Pods, it redirects all traffic to the port -`antrea-tc-tap0` and all traffic will be returned from the port `antrea-tc-tap1`. +This is a TrafficControl applied to Pods with label `app: db`. For these Pods, all traffic, both +ingress and egress, will be mirrored to the port `antrea-tc-tap2`. ```yaml apiVersion: crd.antrea.io/v1alpha2 @@ -617,9 +620,6 @@ spec: name: antrea-tc-tap2 ``` -This TrafficControl is applied to Pods with `app: db` label. For these Pods, it mirrors all traffic to the port -`antrea-tc-tap2`. - ## Egress Implementation Table [EgressMark] is dedicated to the implementation of `Egress`. @@ -723,7 +723,7 @@ Flow 1 is designed for case 1, matching ARP request packets for the MAC address that both the source hardware address and the source MAC address in the ARP reply packet are set with the *Global Virtual MAC* `aa:bb:cc:dd:ee:ff`, not the actual MAC address of the remote Antrea gateway. This ensures that once the traffic is received by the remote OVS bridge, it can be directly forwarded to the appropriate Pod without actually going through -the gateway. The *Global Virtual MAC* is used as the destination MAC address for all the traffic being tunnelled or routed. +the gateway. The *Global Virtual MAC* is used as the destination MAC address for all the traffic being tunneled or routed. This flow serves as the "ARP responder" for the peer Node whose local Pod subnet is `10.10.1.0/24`. If we were to look at the routing table for the local Node, we would find the following "onlink" route: @@ -744,11 +744,11 @@ Flow 3 is the table-miss flow, which should be never used since ARP packets will ### Classifier -This table is designed to determine the "category" of packets by matching on the ingress port of the packets. It +This table is designed to determine the "category" of packets by matching the ingress port of the packets. It addresses specific cases: -1. Packets originated from the local Antrea gateway, requiring IP spoof legitimacy verification. -2. Packets originated from external network through Antrea gateway port. +1. 
Packets originating from the local Antrea gateway, requiring IP spoof legitimacy verification. +2. Packets originating from the external network through the Antrea gateway port. 3. Packets through an overlay tunnel. 4. Packets through a TrafficControl return port defined in a user-provided TrafficControl CR (for feature TrafficControl). 5. Packets through Antrea layer 7 tap port (for feature L7NetworkPolicy). @@ -767,13 +767,13 @@ If you dump the flows of this table, you may see the following: 8. table=Classifier, priority=0 actions=drop ``` -Flow 1 is designed for case 1, matching the source IP address `10.10.0.1` to ensure that the packets are originated from -the local Antrea gateway. `FromGatewayRegMark` is loaded to mark packet source, which is consumed in tables +Flow 1 is designed for case 1, matching the source IP address `10.10.0.1` to ensure that the packets are originating from +the local Antrea gateway. `FromGatewayRegMark` is loaded to mark the packet source, which is consumed in tables [L3Forwarding], [L3DecTTL], [SNATMark] and [SNAT]. -Flow 2 is designed for case 2, matching packets originated from the external network through Antrea gateway port and -forwarding them to table [SpoofGuard]. Since packets originated from the local Antrea gateway are matched by flow 1, -flow 2 can only match packets originated from the external network. The following reg marks are loaded: +Flow 2 is designed for case 2, matching packets originating from the external network through Antrea gateway port and +forwarding them to table [SpoofGuard]. Since packets originating from the local Antrea gateway are matched by flow 1, +flow 2 can only match packets originating from the external network. The following reg marks are loaded: - `FromGatewayRegMark`, the same as flow 1. - `FromExternalRegMark`, indicating that the packets are from the external network, not the local Node. @@ -785,22 +785,22 @@ packets from the tunnel should be seamlessly forwarded to table [UnSNAT]. The fo - `FromTunnelRegMark`, to mark packet source, consumed in table [L3Forwarding]. - `RewriteMACRegMark`, indicating that the source and destination MAC addresses of the packets should be rewritten, - consumed in table [L3Forwarding]. + and consumed in table [L3Forwarding]. Flow 4 is for case 4, matching packets from a TrafficControl return port and forwarding them to table [L3Forwarding] -to decide egress port. It's important to note that both the source and destination MAC addresses of the packets have been -set to the expected state before redirecting the packets to TrafficControl target port in table [Output]. The only -purpose of forwarding the packets to table [L3Forwarding] is to load tunnel destination IP for packets destined to remote -Nodes. This ensures that the returned packets destined to remote Nodes are forwarded through the tunnel. -`FromTCReturnRegMark`, which is used in table [TrafficControl], is loaded to mark packet source. +to decide the egress port. It's important to note that both the source and destination MAC addresses of the packets have been +set to the expected state before redirecting the packets to the TrafficControl target port in table [Output]. The only +purpose of forwarding the packets to table [L3Forwarding] is to load tunnel destination IP for packets destined for remote +Nodes. This ensures that the returned packets destined for remote Nodes are forwarded through the tunnel. +`FromTCReturnRegMark`, which is used in table [TrafficControl], is loaded to mark the packet source. 
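
If you want to check which OpenFlow port a flow like flow 4 is actually matching on, you can look up the `ofport`
assigned to the TrafficControl return port (for example `antrea-tc-tap1` from the sample TrafficControl above). This is
only an illustrative sketch, assuming a running Antrea cluster; replace `<antrea-agent-pod-name>` with the name of the
antrea-agent Pod on the Node you are inspecting:

```bash
# Show all ports on br-int together with their OpenFlow port numbers.
kubectl exec -n kube-system <antrea-agent-pod-name> -c antrea-ovs -- ovs-ofctl show br-int

# Or query a single interface, e.g. the TrafficControl return port, directly.
kubectl exec -n kube-system <antrea-agent-pod-name> -c antrea-ovs -- ovs-vsctl get Interface antrea-tc-tap1 ofport
```
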
-Flow 5 is for case 5. It matches packets from Antrea layer 7 tap port and forwards them to table [L3Forwarding] to decide -egress port. Like flow 4, the only purpose of forwarding the packets to table [L3Forwarding] is to load tunnel -destination IP for packets destined to remote Nodes. `FromTCReturnRegMark`, which is used in table [TrafficControl], -is also loaded to mark packet source. +Flow 5 is for case 5. It matches packets from the Antrea layer 7 tap port and forwards them to table [L3Forwarding] to decide +the egress port. Like flow 4, the only purpose of forwarding the packets to table [L3Forwarding] is to load tunnel +destination IP for packets destined for remote Nodes. `FromTCReturnRegMark`, which is used in table [TrafficControl], +is also loaded to mark the packet source. -Flows 6-7 are for case 6, matching packets from local Pods and forward them to table [SpoofGuard] to do legitimacy -verification. `FromLocalRegMark`, which is used in tables [L3Forwarding] and [SNATMark], is loaded to mark packet source. +Flows 6-7 are for case 6, matching packets from local Pods and forwarding them to table [SpoofGuard] to do legitimacy +verification. `FromLocalRegMark`, which is used in tables [L3Forwarding] and [SNATMark], is loaded to mark the packet source. Flow 8 is the table-miss flow to drop packets that are not matched by flows 1-7. @@ -823,18 +823,18 @@ If you dump the flows for this table, you may see the following: 5. table=SpoofGuard, priority=0 actions=drop ``` -Flow 1 is for case 1, matching packets received from the local Antrea gateway port without checking source IP and MAC +Flow 1 is for case 1, matching packets received from the local Antrea gateway port without checking the source IP and MAC address. There are some cases where the source IP of the packets through the local Antrea gateway port is not the local Antrea gateway IP: -- When Antrea is deployed with kube-proxy, and `AntreaProxy` is not enabled, packets from local Pods destined to Services +- When Antrea is deployed with kube-proxy, and `AntreaProxy` is not enabled, packets from local Pods destined for Services will first go through the gateway, get load-balanced by the kube-proxy data path (DNAT) then re-enter through the gateway. Then the packets are received on the gateway port with a source IP belonging to a local Pod. - When Antrea is deployed without kube-proxy, and both `AntreaProxy` and `proxyAll` are enabled, packets from the external - network destined to Services will be routed to OVS through the gateway without + network destined for Services will be routed to OVS through the gateway without changing source IP. -- When Antrea is deployed with kube-proxy, and `AntreaProxy` is enabled, packets from the external network destined to - Services will get load-balanced by the kube-proxy data path (DNAT), then routed to OVS through the gateway without SNAT. +- When Antrea is deployed with kube-proxy, and `AntreaProxy` is enabled, packets from the external network destined for + Services will get load-balanced by the kube-proxy data path (DNAT) and then routed to OVS through the gateway without SNAT. Flows 2-4 are for case 2, matching legitimate IP packets from local Pods. @@ -860,14 +860,14 @@ Flow 1 matches reply packets from SNAT'd Service connections with the *Virtual S action `ct` on them. This can be achieved by matching the *Virtual Service IP*. Flow 2 matches packets from SNAT'd Service connections by matching the local Antrea gateway IP `10.10.0.1` and invokes -action `ct` on them. 
However, this flow also matches request packets destined to the local Antrea gateway IP from +action `ct` on them. However, this flow also matches request packets destined for the local Antrea gateway IP from local Pods by accident. However, this is harmless since such connections were not committed with `SNATCtZone`, and therefore, connection tracking fields for the packets are unset values. Flow 3 is the table-miss flow. For reply packets from SNAT'd connections, after invoking action `ct`, the destination IP of the packets will be -restored to the original IP before SNAT, and the SNAT IP is stored in connection tracking field `ct_nw_dst`. +restored to the original IP before SNAT, and the SNAT IP is stored in the connection tracking field `ct_nw_dst`. ### ConntrackZone @@ -887,13 +887,13 @@ If you dump the flows for this table, you may see the following: Flow 1 invokes `ct` action on packets from all connections, and the packets are then forwarded to table [ConntrackStateTable] with the "tracked" state associated with `CtZone`. For request / reply packets from DNAT'd Service -connections, the destination / source IP of the packets will be restored to the original IP before DNAT. +connections, the destination/source IP of the packets will be restored to the original IP before DNAT. -Flow 2 is table-miss flow that should remain unused. +Flow 2 is the table-miss flow that should remain unused. ### ConntrackState -This table handles packets from the connections that have "tracked" state associated with `CtZone`. It addresses +This table handles packets from the connections that have a "tracked" state associated with `CtZone`. It addresses specific cases: 1. Dropping invalid packets reported by conntrack. @@ -927,7 +927,7 @@ Flow 4 is the table-miss flow for case 3, matching packets from all new connecti ### PreRoutingClassifier -This table handles the first packet from uncommitted Service connections prior to Service Endpoint selection. It +This table handles the first packet from uncommitted Service connections before Service Endpoint selection. It sequentially resubmits the packets to tables [NodePortMark] and [SessionAffinity] to do some pre-processing, including the loading of specific reg marks. Subsequently, it forwards the packets to table [ServiceLB] to perform Service Endpoint selection. @@ -939,14 +939,14 @@ If you dump the flows for this table, you should see the following: 2. table=PreRoutingClassifier, priority=0 actions=goto_table:NodePortMark ``` -Flow 1 sequentially resubmits packets to tables [NodePortMark], [SessionAffinity] and [ServiceLB]. Note that packets +Flow 1 sequentially resubmits packets to tables [NodePortMark], [SessionAffinity], and [ServiceLB]. Note that packets are forwarded to table [ServiceLB] finally. In tables [NodePortMark] and [SessionAffinity], only reg marks are loaded. Flow 2 is the table-miss flow that should remain unused. ### NodePortMark -This table is design to potentially mark packets destined to NodePort Services. It is only created when `proxyAll` is +This table is designed to potentially mark packets destined for NodePort Services. It is only created when `proxyAll` is enabled. If you dump the flows for this table, you may see the following: @@ -958,17 +958,17 @@ If you dump the flows for this table, you may see the following: 4. table=NodePortMark, priority=0 actions=goto_table:SessionAffinity ``` -Flows 1-2 match packets destined to the local Node from local Pods. 
`NodePortRegMark` is loaded, indicating that the
-packets are potentially destined to NodePort Services.
+Flows 1-2 match packets destined for the local Node from local Pods. `NodePortRegMark` is loaded, indicating that the
+packets are potentially destined for NodePort Services.

-Flow 3 match packets destined to the *Virtual NodePort DNAT IP*. Packets destined to NodePort Services from the local
-Node or the external network are DNAT'd to the *Virtual NodePort DNAT IP* by iptables before entering the pipeline.
+Flow 3 matches packets destined for the *Virtual NodePort DNAT IP*. Packets destined for NodePort Services from the local
+Node or the external network are DNAT'd to the *Virtual NodePort DNAT IP* by iptables before entering the pipeline.

Flow 4 is the table-miss flow.

Note that packets of NodePort Services have not been identified in this table by matching destination IP address. The
identification of NodePort Services will be done finally in table [ServiceLB] by matching `NodePortRegMark` and the
-specific destination port of a NodePort.
+specific destination port of a NodePort.

### SessionAffinity

@@ -989,15 +989,15 @@ Affinity], to implement Service session affinity. To implement session affinity,
- The hard timeout of the learned flow should be equal to the value of
  `service.spec.sessionAffinityConfig.clientIP.timeoutSeconds` defined in the Service. This means that during the hard
  timeout, this flow is present in the pipeline, and the session affinity of the Service takes effect during the timeout.
-- Source IP address, destination IP address, destination port and transparent protocol are used to match packets of
-  connections sourced from the same client and destined to the Service during the timeout.
+- Source IP address, destination IP address, destination port, and transport protocol are used to match packets of
+  connections sourced from the same client and destined for the Service during the timeout.
- Endpoint IP address and Endpoint port are loaded into `EndpointIPField` and `EndpointPortField` respectively.
- `EpSelectedRegMark` is loaded, indicating that the Service Endpoint selection is done, and then the packets will be
  only matched by the last flow in table [ServiceLB].
- `RewriteMACRegMark`, which will be consumed in table [L3Forwarding], is loaded here, indicating that the source and
  destination MAC addresses of the packets should be rewritten.

-Flow 2 is the table-miss flow to match the first packet of connections destined to Services. The loading of
+Flow 2 is the table-miss flow to match the first packet of connections destined for Services. The loading of
`EpToSelectRegMark`, to be consumed in table [ServiceLB], indicating that the packet needs to do Service Endpoint
selection.

@@ -1042,42 +1042,42 @@ If you dump the flows for this table, you may see the following:
14. table=ServiceLB, priority=0 actions=goto_table:EndpointDNAT
```

-Flow 1 or flow 2 is designed for case 1, matching the first packet of connections destined to a ClusterIP Service. This
-is achieved by matching `EpToSelectRegMark`, which is loaded in table [SessionAffinity], clusterIP and port. The target
+Flow 1 or flow 2 is designed for case 1, matching the first packet of connections destined for a ClusterIP Service. This
+is achieved by matching `EpToSelectRegMark`, which is loaded in table [SessionAffinity], clusterIP, and port. The target
of the packet is an OVS group where the Endpoint will be selected.
Before forwarding the packet to the OVS group, `RewriteMACRegMark`, which will be consumed in table [L3Forwarding], is loaded, indicating that the source and destination MAC addresses of the packets should be rewritten. `EpSelectedRegMark`, which will be consumed in table [EndpointDNAT], is also loaded, indicating that the Endpoint is selected. Note that the Service Endpoint selection is not completed yet, -as it will be done in the target OVS group. The action is set here for the purpose of supporting more Endpoints in an +as it will be done in the target OVS group. The action is set here to support more Endpoints in an OVS group. Refer to PR [#2101](https://github.com/antrea-io/antrea/pull/2101) for more information. -Flow 3 is the initial process for case 2, matching the first packet of connections destined to a ClusterIP Service configured +Flow 3 is the initial process for case 2, matching the first packet of connections destined for a ClusterIP Service configured with session affinity. This is achieved by matching the conditions similar to flow 1. Like flow 1, the target of the -flow is also an OVS group and `RewriteMACRegMark` is loaded. The difference is that `EpToLearnRegMark` is loaded, rather +flow is also an OVS group, and `RewriteMACRegMark` is loaded. The difference is that `EpToLearnRegMark` is loaded, rather than `EpSelectedRegMark`, indicating that the selected Endpoint needs to be cached. Flow 4 is the final process for case 2, matching the packet previously matched by flow 2, sent back from the target OVS group after selecting an Endpoint. Then a learned flow will be generated in table [SessionAffinity] to match the packets of the subsequent connections from the same client IP, ensuring that the packets are always forwarded to the same Endpoint -selected by the first time. `EpSelectedRegMark`, which will be used in table [EndpointDNAT], is loaded, indicating that +selected the first time. `EpSelectedRegMark`, which will be used in table [EndpointDNAT], is loaded, indicating that Service Endpoint selection has been done. -Flow 5 is for case 3, matching the first packet of connections destined to a NodePort Service. This is achieved by matching -`EpToSelectRegMark` loaded in table [SessionAffinity], `NodePortRegMark` loaded in table [NodePortMark] and NodePort port. +Flow 5 is for case 3, matching the first packet of connections destined for a NodePort Service. This is achieved by matching +`EpToSelectRegMark` loaded in table [SessionAffinity], `NodePortRegMark` loaded in table [NodePortMark], and NodePort port. Similar to flows 1-2, `RewriteMACRegMark` and `EpSelectedRegMark` are also loaded. -Flows 6-7 are for case 4, processing the first packet of connections destined to a NodePort Service configured with +Flows 6-7 are for case 4, processing the first packet of connections destined for a NodePort Service configured with session affinity, similar to flows 3-4. -Flow 8 is for case 5, processing the first packet of connections destined to a LoadBalancer Service, similar to flow 1. +Flow 8 is for case 5, processing the first packet of connections destined for a LoadBalancer Service, similar to flow 1. -Flows 9-10 are for case 6, processing the first packet of connections destined to a LoadBalancer Service configured with +Flows 9-10 are for case 6, processing the first packet of connections destined for a LoadBalancer Service configured with session affinity, similar to flows 3-4. 
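
To see how these Service cases map to the flows and groups actually installed on a Node, you can dump this table and
then the OVS group referenced by a flow's `group` action. This is only an illustrative sketch, assuming a running Antrea
cluster; `<antrea-agent-pod-name>` and `<group-id>` are placeholders to fill in, and older OVS releases may require the
table number instead of the table name:

```bash
# Dump the ServiceLB flows.
kubectl exec -n kube-system <antrea-agent-pod-name> -c antrea-ovs -- \
  ovs-ofctl dump-flows br-int table=ServiceLB --no-stats

# Dump the group selected by one of the flows above to see its Endpoint buckets.
kubectl exec -n kube-system <antrea-agent-pod-name> -c antrea-ovs -- \
  ovs-ofctl dump-groups br-int <group-id>
```
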
-Flow 11 is for case 7, processing the first packet of connections destined to a Service with an external IP, similar to +Flow 11 is for case 7, processing the first packet of connections destined for a Service with an external IP, similar to flow 1. -Flows 12-13 are for case 8, processing the first packet of connections destined to a Service configured with session +Flows 12-13 are for case 8, processing the first packet of connections destined for a Service configured with session affinity with an external IP, similar to flows 3-4. Flow 14 is the table-miss flow. @@ -1103,14 +1103,14 @@ Endpoints share the same target group as group 9. Group 2 is the destination of packets matched by flow 2, designed for a Service with Endpoints. The group has 2 buckets, indicating the availability of 2 selectable Endpoints. Each bucket has an equal chance of being chosen since they have -the same weights. For every bucket, Endpoint IP and Endpoint port are loaded into `EndpointIPField` and +the same weights. For every bucket, the Endpoint IP and Endpoint port are loaded into `EndpointIPField` and `EndpointPortField`, respectively. These loaded values are consumed in table [EndpointDNAT], the destination of the -packets, where DNAT is performed. It's worth noting that `RemoteEndpointRegMark` is loaded for remote Endpoint, +packets, where DNAT is performed. It's worth noting that `RemoteEndpointRegMark` is loaded for remote Endpoints, such as in bucket 1. Group 3 is the destination of packets matched by flow 3, designed for a Service that has Endpoints and is configured with session affinity. The group closely resembles group 2, except that the destination of the packets is table -[ServiceLB], rather than table [EndpointDNAT]. After sent back to table [ServiceLB], they will be matched by flow 4. +[ServiceLB], rather than table [EndpointDNAT]. After being sent back to table [ServiceLB], they will be matched by flow 4. ### EndpointDNAT @@ -1126,18 +1126,18 @@ If you dump the flows for this table, you may see the following:: 5. table=EndpointDNAT, priority=0 actions=goto_table:AntreaPolicyEgressRule ``` -Flow 1 is designed for Services without Endpoints. It identifies the first packet of connections destined to such Service -by matching `SvcNoEpRegMark`. Subsequently, the packet is forwarded to the Openflow controller (Antrea Agent) for +Flow 1 is designed for Services without Endpoints. It identifies the first packet of connections destined for such Service +by matching `SvcNoEpRegMark`. Subsequently, the packet is forwarded to the OpenFlow controller (Antrea Agent) for further processing. Flows 2-3 are designed for Services that have selected an Endpoint. These flows identify the first packet of connections -destined to such Services by matching `EndpointPortField`, which stores the Endpoint IP, and `EpUnionField` (a combination -of `EndpointPortField` storing the Endpoint port and `EpSelectedRegMark`). Then `ct` action is invoked on packet, +destined for such Services by matching `EndpointPortField`, which stores the Endpoint IP, and `EpUnionField` (a combination +of `EndpointPortField` storing the Endpoint port and `EpSelectedRegMark`). Then `ct` action is invoked on the packet, performing DNAT'd and forwarding it to table [ConntrackStateTable] with the "tracked" state associated with `CtZone`. 
It is worth noting that some bits of ct mark are persisted:

- `ServiceCTMark`, to be consumed in tables [L3Forwarding] and [ConntrackCommit], indicating that the current packet and
-  subsequent packets of the connection is for a Service.
+  subsequent packets of the connection are for a Service.
- The value of `PktSourceField` is persisted to `ConnSourceCTMarkField`, storing the source of the connection for the
  current packet and subsequent packets of the connection.
@@ -1145,7 +1145,7 @@ It is worth noting that some bits of ct mark are persisted:

This table is used to implement the egress rules across all Antrea-native NetworkPolicies, except for NetworkPolicies
that are created in the Baseline Tier. Antrea-native NetworkPolicies created in the Baseline Tier will be enforced after
-Kubernetes NetworkPolicies, and their egress rules are installed in tables [EgressDefault] and [EgressRule] respectively,
+Kubernetes NetworkPolicies, and their egress rules are installed in tables [EgressDefault] and [EgressRule], respectively,
i.e.

```text
@@ -1156,9 +1156,9 @@ Antrea-native NetworkPolicy other Tiers -> AntreaPolicyEgressRule

Antrea-native NetworkPolicy relies on the OVS built-in `conjunction` action to implement policies efficiently. This
enables us to do a conjunctive match across multiple dimensions (source IP, destination IP, port) efficiently without
-"exploding" the number of flows. For our use-case, we have at most 3 dimensions.
+"exploding" the number of flows. For our use case, we have at most 3 dimensions.

-The only requirements on `conj_id` is for it to be a unique 32-bit integer within the table. At the moment we use a
+The only requirement on `conj_id` is for it to be a unique 32-bit integer within the table. At the moment we use a
single custom allocator, which is common to all tables that can have NetworkPolicy flows installed
([AntreaPolicyEgressRule], [EgressRule], [EgressDefault], [AntreaPolicyIngressRule], [IngressRule] and [EgressDefault]).

@@ -1198,9 +1198,9 @@ flows are described as follows:
- Flow 6 is utilized to match packets meeting all the three dimensions of `conjunction` with `conj_id` 8 and forward
  them to table [EgressMetric], persisting `conj_id` to `EgressRuleCTLabel` that is consumed in table [EgressMetric].

-Flows 7-9, whose priorities are all 44899, are installed for the egress rule with an `Drop` action defined after rule
-`AllowToDb` in the sample policy, serving as a default rule. Unlike the default of Kubernetes NetworkPolicy,
-Antrea-native NetworkPolicy have no default rule, and all rules should be explicitly defined. Hence, they are evaluated
+Flows 7-9, whose priorities are all 44899, are installed for the egress rule with a `Drop` action defined after the rule
+`AllowToDb` in the sample policy, serving as a default rule. Unlike the default of Kubernetes NetworkPolicy,
+Antrea-native NetworkPolicy has no default rule, and all rules should be explicitly defined. Hence, they are evaluated
as-is, and there is no need for a table [AntreaPolicyEgressDefault]. These flows are described as follows:

- Flow 7 is utilized to match packets with the source IP address in set {10.10.0.24}, which is from the Pods selected
@@ -1213,7 +1213,7 @@ Flow 10 is the table-miss flow to forward packets, which are not matched by othe
### EgressRule

-For this table, you will need to keep mind the Kubernetes NetworkPolicy
+For this table, you will need to keep in mind the Kubernetes NetworkPolicy
[specification](#kubernetes-networkpolicy-implementation) that we are using.
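
The flows in this table rely on the same `conjunction` mechanism described above for table [AntreaPolicyEgressRule].
When debugging a policy, it can be convenient to pull out every flow that contributes to a single conjunction. This is
only an illustrative sketch, assuming a running Antrea cluster; `<antrea-agent-pod-name>` and `<conj-id>` are
placeholders (use a `conj_id` value taken from one of the dumps in this document):

```bash
# List the per-dimension conjunction() flows and the final conj_id flow for a
# given conjunction ID, across all tables of the pipeline.
CONJ_ID=<conj-id>
kubectl exec -n kube-system <antrea-agent-pod-name> -c antrea-ovs -- \
  ovs-ofctl dump-flows br-int --no-stats | grep -E "conj_id=${CONJ_ID},|conjunction\(${CONJ_ID},"
```
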
This table is used to implement the egress rules across all Kubernetes NetworkPolicies. If you dump the flows for this
table, you may see the following:
@@ -1227,7 +1227,7 @@ table, you may see the following:
5. table=EgressRule, priority=0 actions=goto_table:EgressDefaultRule
```

-Flows 1-4 are installed for egress rule in the sample Kubernetes NetworkPolicy. These flows are described as follows:
+Flows 1-4 are installed for the egress rule in the sample Kubernetes NetworkPolicy. These flows are described as follows:

- Flow 1 is utilized to match packets with the source IP address in set {10.10.0.24}, which has all IP addresses of the
  Pods selected by label `app: web`, constituting the first dimension for `conjunction` with `conj_id` 2.
@@ -1243,10 +1243,10 @@ Flow 5 is the table-miss flow to forward packets, which are not matched by other
### EgressDefault

This table complements table [EgressRule] for Kubernetes NetworkPolicy egress rule implementation. In Kubernetes, when a
-NetworkPolicy is applied to a set of Pods, the default behavior for these Pods become "deny" (it becomes an [isolated Pod](
+NetworkPolicy is applied to a set of Pods, the default behavior for these Pods becomes "deny" (it becomes an [isolated Pod](
https://kubernetes.io/docs/concepts/services-networking/network-policies/#isolated-and-non-isolated-pods)).
This table is in charge of dropping traffic originating from Pods to which a NetworkPolicy (with an egress rule) is
-applied, and which did not match any of the allow list rules.
+applied, and which did not match any of the allow list rules.

If you dump the flows for this table, you may see the following:
@@ -1255,10 +1255,10 @@ If you dump the flows for this table, you may see the following:
2. table=EgressDefaultRule, priority=0 actions=goto_table:EgressMetric
```

-Flow 1, based on our sample Kubernetes NetworkPolicy, is to drop traffic originated from 10.10.0.24, an IP address
+Flow 1, based on our sample Kubernetes NetworkPolicy, is to drop traffic originating from 10.10.0.24, an IP address
associated with Pods selected by label `app: web`.

-Flow 2 is table-miss flow to forwards packets to table [EgressMetric].
+Flow 2 is the table-miss flow to forward packets to table [EgressMetric].

This table is also used to implement Antrea-native NetworkPolicy egress rules that are created in the Baseline Tier.
Since the Baseline Tier is meant to be enforced after Kubernetes NetworkPolicies, the corresponding flows will be created
@@ -1291,7 +1291,7 @@ Flow 5 serves as the default drop rule for the sample Antrea-native NetworkPolic
the egress rule and loaded in table [AntreaPolicyEgressRule] flow 9.

It's worth noting that ct label is used in flows 1-4, while reg is used in flow 5. The distinction lies in the fact that
-value persisted in ct label can be read throughout the entire lifecycle of a connection, but reg mark is only valid for
+the value persisted in the ct label can be read throughout the entire lifecycle of a connection, but the reg mark is only valid for
table=L3Forwarding, priority=0 actions=set_field:0x20/0xf0->reg0,goto_table:L2ForwardingCalc ``` -Flow 1 matches packets destined to the local Antrea gateway IP, rewrites them destination MAC address to that of the -local Antrea gateway, loads `ToGatewayRegMark` and forwards them to table [L3DecTTL] to decrease TTL value. It's worth -noting that the action of rewriting destination MAC address is not necessary but not harmful for Pod-to-gateway request +Flow 1 matches packets destined for the local Antrea gateway IP, rewrites their destination MAC address to that of the +local Antrea gateway, loads `ToGatewayRegMark`, and forwards them to table [L3DecTTL] to decrease TTL value. It's worth +noting that the action of rewriting the destination MAC address is not necessary but not harmful for Pod-to-gateway request packets because the destination MAC address is already the local gateway MAC address. However, it is utilized by the feature AntreaIPAM, which is not enabled by default. Flow 2 matches reply packets with corresponding ct "tracked" states and `FromGatewayCTMark` from connections initiated through the local Antrea gateway. In other words, these are connections for which the first packet of the connection -(SYN packet for TCP) was received through the local Antrea gateway. It rewrites them destination MAC address to -that of the local Antrea gateway, loads `ToGatewayRegMark` and forwards them to table [L3DecTTL]. This ensures that +(SYN packet for TCP) was received through the local Antrea gateway. It rewrites the destination MAC address to +that of the local Antrea gateway, loads `ToGatewayRegMark`, and forwards them to table [L3DecTTL]. This ensures that reply packets can be forwarded back to the local Antrea gateway in subsequent tables, guaranteeing the availability of the connection. This flow is required to handle the following cases when AntreaProxy is not disabled: - Reply traffic for connections from a local Pod to a ClusterIP Service, which are handled by kube-proxy and go through - DNAT. In this case the destination IP address of the reply traffic is the Pod which initiated the connection to the + DNAT. In this case, the destination IP address of the reply traffic is the Pod which initiated the connection to the Service (no SNAT by kube-proxy). We need to make sure that these packets are sent back through the gateway so that the source IP can be rewritten to the ClusterIP ("undo" DNAT). If we do not use connection tracking and do not rewrite the destination MAC, reply traffic from the backend will go directly to the originating Pod without going first @@ -1342,51 +1342,51 @@ of the connection. This flow is required to handle the following cases when Antr is implemented. Flow 3 matches packets from intra-Node connections (excluding Service connections) and marked with -`NotRewriteMACRegMark` that indicates that the destination and source MACs of packets should not be overwritten and +`NotRewriteMACRegMark`, indicating that the destination and source MACs of packets should not be overwritten, and forwards them to table [L2ForwardingCalc] instead of table [L3DecTTL]. The deviation is due to local Pods connections not traversing any router device or undergoing NAT process. For packets from Service or inter-Node connections, `RewriteMACRegMark`, mutually exclusive with `NotRewriteMACRegMark`, is loaded. Therefore, the packets will not be matched by the flow. -Flow 4 is designed to match packets destined to remote Pod CIDR. 
This involves installing a separate flow for each remote +Flow 4 is designed to match packets destined for remote Pod CIDR. This involves installing a separate flow for each remote Node, with each flow matching the destination IP address of the packets against the Pod subnet for the respective Node. For the matched packets, the source MAC address is set to that of the local Antrea gateway MAC, and the destination -MAC address is set to the *Global Virtual MAC*. The Openflow `tun_dst` field is setting to the appropriate value (i.e. +MAC address is set to the *Global Virtual MAC*. The Openflow `tun_dst` field is set to the appropriate value (i.e. the IP address of the remote Node IP). Additionally, `ToTunnelRegMark` is loaded, signifying that the packets will be -forwarded to remote Nodes through a tunnel. The matched packets are then forwarded to table [L3DecTTL] to decrease TTL +forwarded to remote Nodes through a tunnel. The matched packets are then forwarded to table [L3DecTTL] to decrease the TTL value. -Flow 5-7 matches packets destined to local Pods and marked by `RewriteMACRegMark` that signifies that the packets may +Flow 5-7 matches packets destined for local Pods and marked by `RewriteMACRegMark` that signifies that the packets may originate from Service or inter-Node connections. For the matched packets, the source MAC address is set to that of the local Antrea gateway MAC, and the destination MAC address is set to the associated local Pod MAC address. The matched -packets are then forwarded to table [L3DecTTL] to decrease TTL value. +packets are then forwarded to table [L3DecTTL] to decrease the TTL value. -Flow 8 matches request packets originated from local Pods and destined to the external network, and then forwards them +Flow 8 matches request packets originating from local Pods and destined for the external network, and then forwards them to table [EgressMark] dedicated to feature Egress. In table [EgressMark], SNAT IPs for Egress are looked up for the packets. To match the expected packets, `FromLocalRegMark` is utilized to exclude packets that are not from local Pods. -Additionally, `NotAntreaFlexibleIPAMRegMark`, mutually exclusive with `NotRewriteMACRegMark` marking packets from Antrea -IPAM Pods, is also used since Egress can only be applied to Node IPAM Pods. +Additionally, `NotAntreaFlexibleIPAMRegMark`, mutually exclusive with `AntreaFlexibleIPAMRegMark` that is used to mark +packets from Antrea IPAM Pods, is used since Egress can only be applied to Node IPAM Pods. -Flow 9 matches request packets originated from remote Pods and destined to the external network, and then forwards them +Flow 9 matches request packets originating from remote Pods and destined for the external network, and then forwards them to table [EgressMark] dedicated to feature Egress. To match the expected packets, `FromTunnelRegMark` is utilized to exclude packets that are not from remote Pods through a tunnel. Considering that the packets from remote Pods traverse a tunnel, the destination MAC address of the packets, represented by the *Global Virtual MAC*, needs to be rewritten to MAC address of the local Antrea gateway. -Flow 10 matches packets from Service connections that are originated from the local Antrea gateway and destined to the -external network. This is accomplished by matching `RewriteMACRegMark`, `FromGatewayRegMark` and `ServiceCTMark`. The +Flow 10 matches packets from Service connections that are originating from the local Antrea gateway and destined for the +external network. 
This is accomplished by matching `RewriteMACRegMark`, `FromGatewayRegMark`, and `ServiceCTMark`. The destination MAC address is then set to that of the local Antrea gateway. Additionally, `ToGatewayRegMark`, which will be used with `FromGatewayRegMark` together to identify hairpin connections in table [SNATMark], is loaded. Finally, the packets are forwarded to table [L3DecTTL]. -Flow 11 is the table-miss flow, matching packets originated from local Pods and destined to the external network, and +Flow 11 is the table-miss flow, matching packets originating from local Pods and destined for the external network, and then forwarding them to table [L2ForwardingCalc]. `ToGatewayRegMark` is loaded as the matched packets traverse the local Antrea gateway. ### EgressMark -This table is dedicated to `Egress`. It includes flows to select the right SNAT IPs for egress traffic originated from -Pods and destined to the external network. +This table is dedicated to `Egress`. It includes flows to select the right SNAT IPs for egress traffic originating from +Pods and destined for the external network. If you dump the flows for this table, you may see the following: @@ -1401,31 +1401,31 @@ If you dump the flows for this table, you may see the following: 8. table=EgressMark, priority=0 actions=set_field:0x20/0xf0->reg0,goto_table:L2ForwardingCalc ``` -Flows 1-2 match packets originated from local Pods and destined to the transport IP of remote Nodes, and then forward +Flows 1-2 match packets originating from local Pods and destined for the transport IP of remote Nodes, and then forward them to table [L2ForwardingCalc] to skip Egress SNAT. `ToGatewayRegMark` is loaded, indicating that the output port of the packets is the local Antrea gateway. -Flow 3 matches packets originated from local Pods and destined to the Service CIDR, and then forwards them to table +Flow 3 matches packets originating from local Pods and destined for the Service CIDR, and then forwards them to table [L2ForwardingCalc] to skip Egress SNAT. Similar to flows 1-2, `ToGatewayRegMark` is also loaded. -Flow 4 match packets originated from local Pods selected by the sample Egress `egress-client`, whose SNAT IP is configured +Flow 4 match packets originating from local Pods selected by the sample Egress `egress-client`, whose SNAT IP is configured on a remote Node, which means that the matched packets should be forwarded to the remote Node through a tunnel. Before sending the packets to the tunnel, the source and destination MAC addresses are set to those of the local Antrea gateway and the `Global Virtual MAC`, respectively. Additionally, `ToTunnelRegMark`, indicating that the output port is a tunnel, and `EgressSNATRegMark`, indicating that packets should undergo SNAT on a remote Node, are loaded. Finally, the packets are forwarded to table [L2ForwardingCalc]. -Flow 5 matches the first packet of connections originated from remote Pods selected by the sample Egress `egress-web` +Flow 5 matches the first packet of connections originating from remote Pods selected by the sample Egress `egress-web` whose SNAT IP is configured on the local Node, and then loads an 8-bit ID allocated for the associated SNAT IP defined in the sample Egress to the `pkt_mark`, which will be identified by iptables on the local Node to perform SNAT with the SNAT IP. Subsequently, `ToGatewayRegMark`, indicating that the output port is the local Antrea gateway, is loaded. Finally, the packets are forwarded to table [L2ForwardingCalc]. 
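
The actual SNAT for these connections is not performed in OVS: flow 5 (and flow 6 below) only loads the allocated ID
into `pkt_mark`, and iptables on the Node that owns the SNAT IP translates the packets. As an illustrative check,
assuming a recent Antrea version (the exact chain and rule layout may differ), you can look for the mark-based SNAT
rule in the nat table on that Node:

```bash
# On the Node that holds the Egress SNAT IP, list the nat rules installed by
# Antrea and look for the SNAT rule matching the packet mark loaded in OVS.
iptables -t nat -S | grep -i antrea | grep -i snat
```
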
-Flow 6 matches the first packet of connections originated from local Pods selected by the sample Egress `egress-web`, +Flow 6 matches the first packet of connections originating from local Pods selected by the sample Egress `egress-web`, whose SNAT IP is configured on the local Node. Similar to flow 4, the 8-bit ID is loaded to `pkt_mark`, `ToGatewayRegMark` is loaded, and the packets are forwarded to table [L2ForwardingCalc] finally. -Flow 7 drops packets tunnelled from remote Nodes (identified with `FromTunnelRegMark`, indicating that the packets are +Flow 7 drops packets tunneled from remote Nodes (identified with `FromTunnelRegMark`, indicating that the packets are from remote Pods through a tunnel). The packets are not matched by any flows 1-6, which means that they are here unexpectedly and should be dropped. @@ -1444,13 +1444,13 @@ If you dump the flows for this table, you may see the following: 3. table=L3DecTTL, priority=0 actions=goto_table:SNATMark ``` -Flow 1 matches packets with `FromGatewayRegMark`, which means that these packets enter OVS pipeline from the local -Antrea gateway , as the host IP stack should have decremented the TTL already for such packets, TTL should not be +Flow 1 matches packets with `FromGatewayRegMark`, which means that these packets enter the OVS pipeline from the local +Antrea gateway, as the host IP stack should have decremented the TTL already for such packets, TTL should not be decremented again. Flow 2 is to decrement TTL for packets which are not matched by flow 1. -Flow 3 is table-miss flow that should remain unused. +Flow 3 is the table-miss flow that should remain unused. ### SNATMark @@ -1467,16 +1467,16 @@ If you dump the flows for this table, you may see the following: ``` Flow 1 matches the first packet of hairpin Service connections, identified by `FromGatewayRegMark` and `ToGatewayRegMark`, -indicating that both input and output ports of the connections are the local Antrea gateway port. Such hairpin +indicating that both the input and output ports of the connections are the local Antrea gateway port. Such hairpin connections will undergo SNAT with the *Virtual Service IP* in table [SNAT]. Before forwarding the packets to table [SNAT], `ConnSNATCTMark`, indicating that the connection requires SNAT, and `HairpinCTMark` indicating that this is a hairpin connection, are persisted to mark the connections. These two ct marks will be consumed in table [SNAT]. Flow 2 matches the first packet of Service connections requiring SNAT, identified by `FromGatewayRegMark` and -`ToTunnelRegMark`, indicating that the input port is the local Antrea gateway and output port is a tunnel. Such +`ToTunnelRegMark`, indicating that the input port is the local Antrea gateway and the output port is a tunnel. Such connections will undergo SNAT with the IP address of the local Antrea gateway in table [SNAT]. Before forwarding the packets to table [SNAT], `ToExternalAddressRegMark` and `NotDSRServiceRegMark` are loaded, indicating that the packets -are destined to a Service's external IP, like NodePort, LoadBalancerIP or ExternalIP, but it is not DSR mode. +are destined for a Service's external IP, like NodePort, LoadBalancerIP or ExternalIP, but it is not DSR mode. Additionally, `ConnSNATCTMark`, indicating that the connection requires SNAT, is persisted to mark the connections. 
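
The ct marks persisted by this table (and consumed later in table [SNAT]) live in the OVS connection tracking table and
can be inspected directly. This is only an illustrative sketch; the zone IDs below are the defaults Antrea uses for IPv4
(`CtZone` 65520 and `SNATCtZone` 65521) and may differ in your deployment:

```bash
# Dump conntrack entries committed in CtZone (Service DNAT tracking).
kubectl exec -n kube-system <antrea-agent-pod-name> -c antrea-ovs -- \
  ovs-appctl dpctl/dump-conntrack zone=65520

# Dump conntrack entries committed in SNATCtZone (gateway / hairpin SNAT).
kubectl exec -n kube-system <antrea-agent-pod-name> -c antrea-ovs -- \
  ovs-appctl dpctl/dump-conntrack zone=65521
```
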
 Flows 3-4 match the first packet of hairpin Service connections, identified by the same source and destination IP
@@ -1502,20 +1502,20 @@ If you dump the flows for this table, you should see the following:
 Flow 1 matches the first packet of hairpin Service connections through the local Antrea gateway, identified by
 `HairpinCTMark` and `FromGatewayRegMark`. It performs SNAT with the *Virtual Service IP* `169.254.0.253` and forwards
 the SNAT'd packets to table [L2ForwardingCalc]. It's worth noting that before SNAT, the "tracked" state of packets is
-associated with `CtZone`. After SNAT, their "track" state is associated with `SNATCtZone`, and `ServiceCTMark` and
+associated with `CtZone`. After SNAT, their "tracked" state is associated with `SNATCtZone`, so `ServiceCTMark` and
 `HairpinCTMark` persisted in `CtZone` are not accessible anymore. As a result, `ServiceCTMark` and `HairpinCTMark`
 need to be persisted once again, but this time they are persisted in `SNATCtZone` for subsequent tables to consume.

-Flow 2 matches the first packet of hairpin Service connection originated from local Pods, identified by `HairpinCTMark`
+Flow 2 matches the first packet of hairpin Service connections originating from local Pods, identified by `HairpinCTMark`
 and `FromLocalRegMark`. It performs SNAT with the IP address of the local Antrea gateway and forwards the SNAT'd packets
 to table [L2ForwardingCalc]. Similar to flow 1, `ServiceCTMark` and `HairpinCTMark` are persisted in `SNATCtZone`.

 Flow 3 matches the subsequent request packets of connections whose first request packet has undergone SNAT, and then
-invoke `ct` action on the packets again to restore tha "tracked" state in `SNATCtZone`. The packets with appropriate
+invokes the `ct` action on the packets again to restore the "tracked" state in `SNATCtZone`. The packets with the appropriate
 "tracked" state are forwarded to table [L2ForwardingCalc].

 Flow 4 matches the first packet of Service connections requiring SNAT, identified by `ConnSNATCTMark` and
-`FromGatewayRegMark`, indicating the connection is destined to an external Service IP initiated through the
+`FromGatewayRegMark`, indicating the connection is destined for an external Service IP initiated through the
 Antrea gateway and the Endpoint is a remote Pod. It performs SNAT with the IP address of the local Antrea gateway and
 forwards the SNAT'd packets to table [L2ForwardingCalc]. Similar to flows 1 and 2, `ServiceCTMark` is persisted in
 `SNATCtZone`.
@@ -1538,16 +1538,16 @@ If you dump the flows for this table, you may see the following:
 6. table=L2ForwardingCalc, priority=0 actions=goto_table:TrafficControl
 ```

-Flow 1 matches packets destined to the local Antrea gateway, identified by the destination MAC address being that of
+Flow 1 matches packets destined for the local Antrea gateway, identified by the destination MAC address being that of
 the local Antrea gateway. It loads `OutputToOFPortRegMark`, indicating that the packets should be output to an OVS port,
-and the port number of the local Antrea gateway to `TargetOFPortField`. Both of thees two values will be consumed in
+and the port number of the local Antrea gateway to `TargetOFPortField`. Both of these values will be consumed in
 table [Output].

-Flow 2 matches packets destined to a tunnel, identified by the destination MAC address being that of the *Global Virtual
+Flow 2 matches packets destined for a tunnel, identified by the destination MAC address being that of the *Global Virtual
 MAC*.
 Similar to flow 1, `OutputToOFPortRegMark` is loaded, and the port number of the tunnel is loaded to `TargetOFPortField`.

-Flows 3-5 matches packets destined to local Pods, identified by the destination MAC address being that of the local
+Flows 3-5 match packets destined for local Pods, identified by the destination MAC address being that of the local
 Pods. Similar to flow 1, `OutputToOFPortRegMark` is loaded, and the port numbers of the local Pods are loaded to
 `TargetOFPortField`.

@@ -1573,16 +1573,16 @@ are output to the port to which they are destined. To identify such packets, `Ou
 the packets should be output to an OVS port, and `FromTCReturnRegMark` loaded in table [Classifier], indicating that
 the packets are from a TrafficControl return port, are utilized.

-Flow 2 is installed for the sample TrafficControl `redirect-web-to-local`, which marks the packets destined to the Pods
-labelled by `app: web` with `TrafficControlRedirectRegMark`, indicating the packets should be redirected to a
+Flow 2 is installed for the sample TrafficControl `redirect-web-to-local`, which marks the packets destined for the Pods
+labeled by `app: web` with `TrafficControlRedirectRegMark`, indicating the packets should be redirected to a
 TrafficControl target port whose number is loaded to `TrafficControlTargetOFPortField`.

 Flow 3 is also installed for the sample TrafficControl `redirect-web-to-local`. Similar to flow 2,
 `TrafficControlRedirectRegMark` is loaded, and the number of the TrafficControl target port is loaded to
 `TrafficControlTargetOFPortField`.

-Flow 4 is installed for the sample TrafficControl `mirror-db-to-local`, which marks the packets destined to the Pods
-labelled by `app: db` with `TrafficControlMirrorRegMark`, indicating the packets should be mirrored to a
+Flow 4 is installed for the sample TrafficControl `mirror-db-to-local`, which marks the packets destined for the Pods
+labeled by `app: db` with `TrafficControlMirrorRegMark`, indicating the packets should be mirrored to a
 TrafficControl target port whose number is loaded to `TrafficControlTargetOFPortField`.

 Flow 5 is also installed for the sample TrafficControl `redirect-web-to-local`. Similar to flow 2,
@@ -1610,15 +1610,15 @@ If you dump the flows for this table, you should see the following:
 Flow 1 matches locally generated request packets, identified by `pkt_mark`, which is set by iptables in the host network
 namespace. It forwards the packets to table [ConntrackCommit] directly to bypass all tables for ingress security.

-Flow 2 matches packets destined to NodePort Services and forwards them to table [AntreaPolicyIngressRule] to enforce
+Flow 2 matches packets destined for NodePort Services and forwards them to table [AntreaPolicyIngressRule] to enforce
 Antrea-native NetworkPolicies applied to NodePort Services. Without this flow, if the selected Endpoint is not a local
-Pod, the packets might be matched by one in flows 3-5, skipping table [AntreaPolicyIngressRule].
+Pod, the packets might be matched by one of flows 3-5, skipping table [AntreaPolicyIngressRule].

-Flows 3-5 matches packets destined to the local Antrea gateway, tunnel, uplink port with `ToGatewayRegMark`,
+Flows 3-5 match packets destined for the local Antrea gateway, tunnel, or uplink port with `ToGatewayRegMark`,
 `ToTunnelRegMark` or `ToUplinkRegMark`, respectively, and forward them to table [IngressMetric] directly to bypass all
 tables for ingress security.
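+
+When it is not obvious which of these flows a given packet will match, you can trace a synthetic packet through the
+pipeline from the `antrea-ovs` container. The command below is a sketch: the Agent Pod name, the input port
+`antrea-gw0`, and the addresses are placeholders to be adapted to your cluster.
+
+```bash
+# Trace a TCP packet received on the Antrea gateway port and print the tables and flows it traverses.
+kubectl exec -n kube-system <antrea-agent-pod> -c antrea-ovs -- \
+  ovs-appctl ofproto/trace br-int \
+  'in_port=antrea-gw0,tcp,nw_src=192.168.77.1,nw_dst=10.10.0.5,tp_dst=80'
+```
+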
-Flow 5 matches packets from hairpin connections with `HairpinCTMark` and forward them to table [ConntrackCommit]
+Flow 5 matches packets from hairpin connections with `HairpinCTMark` and forwards them to table [ConntrackCommit]
 directly to bypass all tables for ingress security. Refer to this PR
 [#5687](https://github.com/antrea-io/antrea/pull/5687) for more information.

 Flow 6 is the table-miss flow.

 ### AntreaPolicyIngressRule

-This table is very similar to table [AntreaPolicyEgressRule], but implements the ingress rules of Antrea-native
-NetworkPolicies. Depending on the tier to which the policy belongs to, the rules will be installed in a table
+This table is very similar to table [AntreaPolicyEgressRule] but implements the ingress rules of Antrea-native
+NetworkPolicies. Depending on the tier to which the policy belongs, the rules will be installed in a table
 corresponding to that tier. The ingress table to tier mapping is as follows:

 ```text
@@ -1672,9 +1672,9 @@ These flows are described as follows:
 - Flow 6 is utilized to match packets meeting all three dimensions of `conjunction` with `conj_id` 6 and forward
   them to table [IngressMetric], persisting `conj_id` to `IngressRuleCTLabel`, which is consumed in table [IngressMetric].

-Flows 7-9, whose priorities are all 44899, are installed for the egress rule with an `Drop` action defined after rule
-`AllowFromClient` in the sample policy, serving as a default rule. Unlike the default of Kubernetes NetworkPolicy,
-Antrea-native NetworkPolicy have no default rule, and all rules should be explicitly defined. Hence, they are evaluated
+Flows 7-9, whose priorities are all 44899, are installed for the ingress rule with a `Drop` action defined after the rule
+`AllowFromClient` in the sample policy, serving as a default rule. Unlike Kubernetes NetworkPolicies,
+Antrea-native NetworkPolicies have no default rule, and all rules should be explicitly defined. Hence, they are evaluated
 as-is, and there is no need for a table [AntreaPolicyIngressDefault]. These flows are described as follows:

 - Flow 7 is utilized to match any packets, constituting the second dimension for `conjunction` with `conj_id` 4.
@@ -1687,8 +1687,8 @@ Flow 10 is the table-miss flow to forward packets, which are not matched by othe

 ### IngressRule

-This table is very similar to table [EgressRule], but implements ingress rules for Kubernetes NetworkPolicies. Once again,
-you will need to keep mind the Kubernetes NetworkPolicy [specification](#kubernetes-networkpolicy-implementation) that
+This table is very similar to table [EgressRule] but implements ingress rules for Kubernetes NetworkPolicies. Once again,
+you will need to keep in mind the Kubernetes NetworkPolicy [specification](#kubernetes-networkpolicy-implementation) that
 we are using.

 If you dump the flows for this table, you should see something like this:

@@ -1701,7 +1701,7 @@ If you dump the flows for this table, you should see something like this:
 5. table=IngressRule, priority=0 actions=goto_table:IngressDefaultRule
 ```

-Flows 1-4 are installed for ingress rule in the sample Kubernetes NetworkPolicy. These flows are described as follows:
+Flows 1-4 are installed for the ingress rule in the sample Kubernetes NetworkPolicy. These flows are described as follows:

 - Flow 1 is utilized to match packets with the source IP address in set {10.10.0.26}, which is from the Pods selected by
   label `app: client`, constituting the first dimension for `conjunction` with `conj_id` 3.
@@ -1718,9 +1718,9 @@ Flow 5 is the table-miss flow to forward packets, which are not matched by other

 This table is similar in its purpose to table [IngressDefault], and it complements table [IngressRule] for Kubernetes
 NetworkPolicy ingress rule implementation. In Kubernetes, when a NetworkPolicy is applied to a set of Pods, the default
-behavior for these Pods become "deny" (it becomes an [isolated
+behavior for these Pods becomes "deny" (each of these Pods becomes an [isolated
 Pod](https://kubernetes.io/docs/concepts/services-networking/network-policies/#isolated-and-non-isolated-pods)). This
-table is in charge of dropping traffic destined to Pods to which a NetworkPolicy (with an ingress rule) is applied,
+table is in charge of dropping traffic destined for Pods to which a NetworkPolicy (with an ingress rule) is applied,
 and which did not match any of the allow list rules.

 If you dump the flows for this table, you may see the following:

@@ -1730,10 +1730,10 @@ If you dump the flows for this table, you may see the following:
 2. table=IngressDefaultRule, priority=0 actions=goto_table:IngressMetric
 ```

-Flow 1, based on our sample Kubernetes NetworkPolicy, is to drop traffic destined to port 0x25, the port number
+Flow 1, based on our sample Kubernetes NetworkPolicy, is to drop traffic destined for port 0x25, the port number
 associated with Pods selected by label `app: web`.

-Flow 2 is table-miss flow to forwards packets to table [IngressMetric].
+Flow 2 is the table-miss flow to forward packets to table [IngressMetric].

 This table is also used to implement Antrea-native NetworkPolicy ingress rules that are created in the Baseline Tier.
 Since the Baseline Tier is meant to be enforced after Kubernetes NetworkPolicies, the corresponding flows will be created
@@ -1789,7 +1789,7 @@ following cases:

 1. Output packets from hairpin connections to the ingress port where the packets are received.
 2. Output packets to an OVS port.
-3. Output packets to controller (Antrea Agent).
+3. Output packets to the controller (Antrea Agent).
 4. Drop packets.

 If you dump the flows for this table, you should see the following:

@@ -1811,8 +1811,8 @@ port where they were received.

 Flow 2 for case 2. It matches packets with `OutputToOFPortRegMark` and outputs them to the OVS port specified by
 the value stored in `TargetOFPortField`.

-Flows 3-4 for case 3. They match packets by matching `OutputToControllerRegMark` and value stored in
-`PacketInOperationField`, then output them to controller (Antrea Agent) with corresponding user data.
+Flows 3-4 for case 3. They match packets with `OutputToControllerRegMark` and the value stored in
+`PacketInOperationField`, then output them to the controller (Antrea Agent) with corresponding user data.

 Flow 5 is the table-miss flow for case 4. It drops packets that do not match any of the flows in this table.

@@ -1850,4 +1850,4 @@ Flow 4 is the table-miss flow for case 4. It drops packets that do not match any
 [LoadBalancer]: #loadbalancer
 [LoadBalancer with Session Affinity]: #loadbalancer-with-session-affinity
 [Service with ExternalIP]: #service-with-externalip
-[Service with ExternalIP and Session Affinity]: #service-with-externalip-and-session-affinity
\ No newline at end of file
+[Service with ExternalIP and Session Affinity]: #service-with-externalip-and-session-affinity