diff --git a/docs/design/ovs-pipeline.md b/docs/design/ovs-pipeline.md index c3f5675b907..03be9d06bd2 100644 --- a/docs/design/ovs-pipeline.md +++ b/docs/design/ovs-pipeline.md @@ -50,15 +50,15 @@ The document references version v1.15 of Antrea. the state of a TCP, UDP, ICMP, etc., connection. See the [OVS Conntrack tutorial](https://docs.openvswitch.org/en/latest/tutorials/ovs-conntrack/) for more information. - *reg mark*: a reg mark is a value stored in an OVS register, serving to convey information for a packet across the - pipeline. Explore all reg marks used in the pipeline in the [Registers](#registers) section. + pipeline. Explore all reg marks used in the pipeline in the [OVS Registers] section. - *ct mark*: a ct mark is a value stored in the OVS connection tracking mark, serving to convey information for a connection throughout its lifecycle across the pipeline. Explore all ct marks used in the pipeline in the [Ct - Marks](#ct-marks) section. + Marks] section. - *ct label*: it is similar to *ct label*, serving to convey information for a connection throughout its lifecycle across - the pipeline. Explore all ct labels used in the pipeline in the [Ct Labels](#ct-labels) section. + the pipeline. Explore all ct labels used in the pipeline in the [Ct Labels] section. - *ct zone*: ct zone is simply used to isolate connection tracking rules. It is conceptually similar to the more generic Linux network namespaces, but ct zone is specific to conntrack and has less overhead. Explore all ct zones used in the - pipeline in the [Ct Zones](#ct-zones) section. + pipeline in the [Ct Zones] section. ### Misc @@ -121,9 +121,9 @@ to the registers we use. | | | | 0x3 | ToLocalRegMark | Packet destination is local Pod port. | | | | | 0x4 | ToUplinkRegMark | Packet destination is uplink port. | | | | | 0x5 | ToBridgeRegMark | Packet destination is local bridge port. 
| -| | bit 9 | | 0b0 | NotRewriteMACRegMark | Packet's source/destination MAC address does not need to be rewritten. | -| | | | 0b1 | RewriteMACRegMark | Packet's source/destination MAC address needs to be rewritten. | -| | bit 10 | | 0b1 | APDenyRegMark | Packet denied (Drop/Reject) by Antrea NetworkPolicy. | +| | bit 9 | | 0b0 | NotRewriteMACRegMark | Packet's source/destination MAC address does not need to be rewritten. | +| | | | 0b1 | RewriteMACRegMark | Packet's source/destination MAC address needs to be rewritten. | +| | bit 10 | | 0b1 | APDenyRegMark | Packet denied (Drop/Reject) by Antrea NetworkPolicy. | | | bits 11-12 | APDispositionField | 0b00 | DispositionAllowRegMark | Indicates Antrea NetworkPolicy disposition: allow. | | | | | 0b01 | DispositionDropRegMark | Indicates Antrea NetworkPolicy disposition: drop. | | | | | 0b11 | DispositionPassRegMark | Indicates Antrea NetworkPolicy disposition: pass. | @@ -136,24 +136,24 @@ to the registers we use. | | bits 25-32 | PacketInOperationField | | | Field to store NetworkPolicy packetIn operation. | | NXM_NX_REG1 | bits 0-31 | TargetOFPortField | | | Egress OVS port of packet. | | NXM_NX_REG2 | bits 0-31 | SwapField | | | Swap values in flow fields in OpenFlow actions. | -| | | PacketInTableField | | | OVS table where it was decided to send packets to the controller (Antrea Agent). | +| | | PacketInTableField | | | OVS table where it was decided to send packets to the controller (Antrea Agent). | | NXM_NX_REG3 | bits 0-31 | EndpointIPField | | | Field to store IPv4 address of selected Service Endpoint. | | | | APConjIDField | | | Field to store Conjunction ID for Antrea Policy. | -| NXM_NX_REG4 | bits 0-15 | EndpointPortField | | | Field store TCP/UDP/SCTP port of a Service's selected Endpoint. | +| NXM_NX_REG4 | bits 0-15 | EndpointPortField | | | Field to store TCP/UDP/SCTP port of a Service's selected Endpoint.
| | | bits 16-18 | ServiceEPStateField | 0b001 | EpToSelectRegMark | Packet needs to do Service Endpoint selection. | | | bits 16-18 | ServiceEPStateField | 0b010 | EpSelectedRegMark | Packet has done Service Endpoint selection. | | | bits 16-18 | ServiceEPStateField | 0b011 | EpToLearnRegMark | Packet has done Service Endpoint selection and the selected Endpoint needs to be cached. | | | bits 0-18 | EpUnionField | | | The union value of EndpointPortField and ServiceEPStateField. | -| | bit 19 | | 0b1 | ToNodePortAddressRegMark | Packet is destined for a Service of type NodePort. | | | bit 20 | | 0b1 | AntreaFlexibleIPAMRegMark | Packet is from local Antrea IPAM Pod. | | | bit 20 | | 0b0 | NotAntreaFlexibleIPAMRegMark | Packet is not from local Antrea IPAM Pod. | -| | bit 21 | | 0b1 | ToExternalAddressRegMark | Packet is destined for a Service's external IP. | | | bits 22-23 | TrafficControlActionField | 0b01 | TrafficControlMirrorRegMark | Indicates packet needs to be mirrored (used by TrafficControl). | | | | | 0b10 | TrafficControlRedirectRegMark | Indicates packet needs to be redirected (used by TrafficControl). | -| | bit 24 | | 0b1 | NestedServiceRegMark | Packet is destined for a Service that is using other other Services as Endpoints. | -| | bit 25 | | 0b1 | DSRServiceRegMark | Packet is destined for a Service working in DSR mode. | -| | | | 0b0 | NotDSRServiceRegMark | Packet is destined for a Service working not in DSR mode. | -| | bit 26 | | 0b1 | RemoteEndpointRegMark | Packet is destined for a Service selecting a remote non-hostNetwork Endpoint. | +| | bit 19 | | 0b1 | ToNodePortAddressRegMark | Packet is destined for a Service of type NodePort. | | | bit 20 | | 0b1 | AntreaFlexibleIPAMRegMark | Packet is from local Antrea IPAM Pod. | | | bit 20 | | 0b0 | NotAntreaFlexibleIPAMRegMark | Packet is not from local Antrea IPAM Pod. | +| | bit 21 | | 0b1 | ToExternalAddressRegMark | Packet is destined for a Service's external IP. | | | bits 22-23 | TrafficControlActionField | 0b01 | TrafficControlMirrorRegMark | Indicates packet needs to be mirrored (used by TrafficControl). | | | | | 0b10 | TrafficControlRedirectRegMark | Indicates packet needs to be redirected (used by TrafficControl). | +| | bit 24 | | 0b1 | NestedServiceRegMark | Packet is destined for a Service that is using other Services as Endpoints.
| +| | bit 25 | | 0b1 | DSRServiceRegMark | Packet is destined for a Service working in DSR mode. | +| | | | 0b0 | NotDSRServiceRegMark | Packet is destined for a Service not working in DSR mode. | +| | bit 26 | | 0b1 | RemoteEndpointRegMark | Packet is destined for a Service selecting a remote non-hostNetwork Endpoint. | | | bit 27 | | 0b1 | FromExternalRegMark | Packet is from Antrea gateway, but its source IP is not the gateway IP. | | NXM_NX_REG5 | bits 0-31 | TFEgressConjIDField | | | Egress conjunction ID hit by TraceFlow packet. | | NXM_NX_REG6 | bits 0-31 | TFIngressConjIDField | | | Ingress conjunction ID hit by TraceFlow packet. | @@ -174,8 +174,8 @@ we assign friendly names to the bits we use. | Field Range | Field Name | Ct Mark Value | Ct Mark Name | Description | |-------------|-----------------------|---------------|--------------------|-----------------------------------------------------------------| -| bits 0-3 | ConnSourceCTMarkField | 0b0010 | FromGatewayCTMark | Connection source is the Antrea gateway port. | -| | | 0b0101 | FromBridgeCTMark | Connection source is the local bridge port. | +| bits 0-3 | ConnSourceCTMarkField | 0b0010 | FromGatewayCTMark | Connection source is the Antrea gateway port. | +| | | 0b0101 | FromBridgeCTMark | Connection source is the local bridge port. | | bit 4 | | 0b1 | ServiceCTMark | Connection is for Service. | | | | 0b0 | NotServiceCTMark | Connection is not for Service. | | bit 5 | | 0b1 | ConnSNATCTMark | SNAT'd connection for Service. | @@ -254,7 +254,7 @@ will see the IP address showed up in the OVS flows. ## Kubernetes Service Implementation Like Kubernetes NetworkPolicy, several tables of the pipeline are dedicated to [Kubernetes -Service] (https://kubernetes.io/docs/concepts/services-networking/service/) implementation (table [NodePortMark], +Service](https://kubernetes.io/docs/concepts/services-networking/service/) implementation (table [NodePortMark], [SessionAffinity], [ServiceLB], and [EndpointDNAT]).
By enabling `proxyAll`, ClusterIP, NodePort, LoadBalancer, and ExternalIP are all supported. Otherwise, only in-cluster @@ -512,7 +512,7 @@ Pods mentioned above as the target of this policy. In addition to layer 3 and layer 4 policies mentioned above, [Antrea Layer 7 NetworkPolicy](../antrea-l7-network-policy.md) is also supported in Antrea. The main difference is that Antrea layer 7 NetworkPolicy uses layer 7 protocol to match -traffic, not layer 3 and layer 4 protocol. To achieve this, +traffic, not layer 3 and layer 4 protocol. Consider the following Antrea layer 7 NetworkPolicy in the Application tier as an example for the remainder of this document. @@ -622,11 +622,10 @@ spec: ## Egress Implementation -Table [EgressMark] is dedicated to the implementation of `Egress`. +Table [EgressMark] is dedicated to the implementation of `Egress`. Consider the following Egresses as examples for the remainder of this document. - ```yaml apiVersion: crd.antrea.io/v1beta1 kind: Egress @@ -811,7 +810,7 @@ specific cases: 1. Allowing packets from the local Antrea gateway, where checks are not currently performed. 2. Ensuring that the source IP and MAC addresses are correct, i.e., matching the values configured on the interface when - Antrea sets up networking for a Pod. + Antrea sets up networking for a Pod. If you dump the flows for this table, you may see the following: @@ -845,7 +844,7 @@ Flow 5 is the table-miss flow to drop IP spoofing packets. This table is used to perform `de-SNAT` on reply packets by invoking action `ct` on them. The packets are from SNAT'd Service connections that have been committed with `SNATCtZone` in table [SNAT]. After invoking action `ct`, the packets will be in a "tracked" state, restoring all [connection tracking -fields](https://www.openvswitch.org/support/dist-docs/ovs-fields.7.txt) (such as `ct_state`, `ct_mark`, `ct_label`, etc.) 
+fields](https://www.openvswitch.org/support/dist-docs/ovs-fields.7.txt) (such as `ct_state`, `ct_mark`, `ct_label`, etc.) to their original values. The packets with a "tracked" state are then forwarded to table [ConntrackZone]. If you dump the flows for this table, you may see the following: @@ -959,10 +958,10 @@ If you dump the flows for this table, you may see the following: ``` Flows 1-2 match packets destined for the local Node from local Pods. `NodePortRegMark` is loaded, indicating that the -packets are potentially destined for NodePort Services. +packets are potentially destined for NodePort Services. Flow 3 match packets destined for the *Virtual NodePort DNAT IP*. Packets destined for NodePort Services from the local -Node or the external network is DNAT'd to the *Virtual NodePort DNAT IP* by iptables before entering the pipeline. +Node or the external network are DNAT'd to the *Virtual NodePort DNAT IP* by iptables before entering the pipeline. Flow 4 is the table-miss flow. @@ -1214,7 +1213,7 @@ Flow 10 is the table-miss flow to forward packets, which are not matched by othe ### EgressRule For this table, you will need to keep in mind the Kubernetes NetworkPolicy -[specification](#kubernetes-networkpolicy-implementation) that we are using. +[specification](#kubernetes-networkpolicy-implementation) that we are using. This table is used to implement the egress rules across all Kubernetes NetworkPolicies. If you dump the flows for this table, you may see the following: @@ -1362,7 +1361,7 @@ local Antrea gateway MAC, and the destination MAC address is set to the associat packets are then forwarded to table [L3DecTTL] to decrease the TTL value. Flow 8 matches request packets originating from local Pods and destined for the external network, and then forwards them -to table [EgressMark] dedicated to feature Egress. In table [EgressMark], SNAT IPs for Egress are looked up for the packets. +to table [EgressMark] dedicated to feature Egress.
In table [EgressMark], SNAT IPs for Egress are looked up for the packets. To match the expected packets, `FromLocalRegMark` is utilized to exclude packets that are not from local Pods. Additionally, `NotAntreaFlexibleIPAMRegMark`, mutually exclusive with `AntreaFlexibleIPAMRegMark` that is used to mark packets from Antrea IPAM Pods, is used since Egress can only be applied to Node IPAM Pods. @@ -1374,7 +1373,7 @@ tunnel, the destination MAC address of the packets, represented by the *Global V MAC address of the local Antrea gateway. Flow 10 matches packets from Service connections that are originating from the local Antrea gateway and destined for the -external network. This is accomplished by matching `RewriteMACRegMark`, `FromGatewayRegMark`, and `ServiceCTMark`. The +external network. This is accomplished by matching `RewriteMACRegMark`, `FromGatewayRegMark`, and `ServiceCTMark`. The destination MAC address is then set to that of the local Antrea gateway. Additionally, `ToGatewayRegMark`, which will be used with `FromGatewayRegMark` together to identify hairpin connections in table [SNATMark], is loaded. Finally, the packets are forwarded to table [L3DecTTL]. @@ -1519,7 +1518,7 @@ Flow 4 matches the first packet of Service connections requiring SNAT, identifie Antrea gateway and the Endpoint is a remote Pod. It performs SNAT with the IP address of the local Antrea gateway and forwards the SNAT'd packets to table [L2ForwardingCalc]. Similar to other flow 1 or 2, `ServiceCTMark` is persisted in `SNATCtZone`. - + Flow 5 is the table-miss flow. ### L2ForwardingCalc @@ -1836,18 +1835,20 @@ Flow 4 is the table-miss flow for case 4. 
It drops packets that do not match any [SNAT]: #snat [L2ForwardingCalc]: #l2forwardingcalc [TrafficControl]: #trafficcontrol -[IngressSecurityClassifier]: #ingresssecurityclassifier [AntreaPolicyIngressRule]: #antreapolicyingressrule [IngressRule]: #ingressrule [IngressDefault]: #ingressdefault [IngressMetric]: #ingressmetric [Output]: #output +[OVS Registers]: #ovs-registers +[Ct Marks]: #ovs-ct-mark +[Ct Labels]: #ovs-ct-label +[Ct Zones]: #ovs-ct-zone [ClusterIP without Endpoint]: #clusterip-without-endpoint [ClusterIP]: #clusterip [ClusterIP with Session Affinity]: #clusterip-with-session-affinity [NodePort]: #nodeport [NodePort with Session Affinity]: #nodeport-with-session-affinity -[LoadBalancer]: #loadbalancer [LoadBalancer with Session Affinity]: #loadbalancer-with-session-affinity [Service with ExternalIP]: #service-with-externalip [Service with ExternalIP and Session Affinity]: #service-with-externalip-and-session-affinity