a
Signed-off-by: Hongliang Liu <[email protected]>
hongliangl committed Mar 9, 2024
1 parent fe8a6a5 commit d20b6d3
Showing 1 changed file with 32 additions and 31 deletions.
63 changes: 32 additions & 31 deletions docs/design/ovs-pipeline.md
@@ -50,15 +50,15 @@ The document references version v1.15 of Antrea.
the state of a TCP, UDP, ICMP, etc., connection. See the [OVS Conntrack
tutorial](https://docs.openvswitch.org/en/latest/tutorials/ovs-conntrack/) for more information.
- *reg mark*: a reg mark is a value stored in an OVS register, serving to convey information for a packet across the
pipeline. Explore all reg marks used in the pipeline in the [Registers](#registers) section.
pipeline. Explore all reg marks used in the pipeline in the [OVS Registers] section.
- *ct mark*: a ct mark is a value stored in the OVS connection tracking mark, serving to convey information for a
connection throughout its lifecycle across the pipeline. Explore all ct marks used in the pipeline in the [Ct
Marks](#ct-marks) section.
Marks] section.
- *ct label*: it is similar to *ct mark*, serving to convey information for a connection throughout its lifecycle across
the pipeline. Explore all ct labels used in the pipeline in the [Ct Labels](#ct-labels) section.
the pipeline. Explore all ct labels used in the pipeline in the [Ct Labels] section.
- *ct zone*: ct zone is simply used to isolate connection tracking rules. It is conceptually similar to the more generic
Linux network namespaces, but ct zone is specific to conntrack and has less overhead. Explore all ct zones used in the
pipeline in the [Ct Zones](#ct-zones) section.
pipeline in the [Ct Zones] section.

### Misc

@@ -121,9 +121,9 @@ to the registers we use.
| | | | 0x3 | ToLocalRegMark | Packet destination is local Pod port. |
| | | | 0x4 | ToUplinkRegMark | Packet destination is uplink port. |
| | | | 0x5 | ToBridgeRegMark | Packet destination is local bridge port. |
| | bit 9 | | 0b0 | NotRewriteMACRegMark | Packet's source/destination MAC address does not need to be rewritten. |
| | | | 0b1 | RewriteMACRegMark | Packet's source/destination MAC address needs to be rewritten. |
| | bit 10 | | 0b1 | APDenyRegMark | Packet denied (Drop/Reject) by Antrea NetworkPolicy. |
| | bit 9 | | 0b0 | NotRewriteMACRegMark | Packet's source/destination MAC address does not need to be rewritten. |
| | | | 0b1 | RewriteMACRegMark | Packet's source/destination MAC address needs to be rewritten. |
| | bit 10 | | 0b1 | APDenyRegMark | Packet denied (Drop/Reject) by Antrea NetworkPolicy. |
| | bits 11-12 | APDispositionField | 0b00 | DispositionAllowRegMark | Indicates Antrea NetworkPolicy disposition: allow. |
| | | | 0b01 | DispositionDropRegMark | Indicates Antrea NetworkPolicy disposition: drop. |
| | | | 0b11 | DispositionPassRegMark | Indicates Antrea NetworkPolicy disposition: pass. |
@@ -136,24 +136,24 @@ to the registers we use.
| | bits 25-32 | PacketInOperationField | | | Field to store NetworkPolicy packetIn operation. |
| NXM_NX_REG1 | bits 0-31 | TargetOFPortField | | | Egress OVS port of packet. |
| NXM_NX_REG2 | bits 0-31 | SwapField | | | Swap values in flow fields in OpenFlow actions. |
| | | PacketInTableField | | | OVS table where it was decided to send packets to the controller (Antrea Agent). |
| | | PacketInTableField | | | OVS table where it was decided to send packets to the controller (Antrea Agent). |
| NXM_NX_REG3 | bits 0-31 | EndpointIPField | | | Field to store IPv4 address of selected Service Endpoint. |
| | | APConjIDField | | | Field to store Conjunction ID for Antrea Policy. |
| NXM_NX_REG4 | bits 0-15 | EndpointPortField | | | Field to store TCP/UDP/SCTP port of a Service's selected Endpoint. |
| NXM_NX_REG4 | bits 0-15 | EndpointPortField | | | Field to store TCP/UDP/SCTP port of a Service's selected Endpoint. |
| | bits 16-18 | ServiceEPStateField | 0b001 | EpToSelectRegMark | Packet needs to do Service Endpoint selection. |
| | bits 16-18 | ServiceEPStateField | 0b010 | EpSelectedRegMark | Packet has done Service Endpoint selection. |
| | bits 16-18 | ServiceEPStateField | 0b011 | EpToLearnRegMark | Packet has done Service Endpoint selection and the selected Endpoint needs to be cached. |
| | bits 0-18 | EpUnionField | | | The union value of EndpointPortField and ServiceEPStateField. |
| | bit 19 | | 0b1 | ToNodePortAddressRegMark | Packet is destined for a Service of type NodePort. |
| | bit 19 | | 0b1 | ToNodePortAddressRegMark | Packet is destined for a Service of type NodePort. |
| | bit 20 | | 0b1 | AntreaFlexibleIPAMRegMark | Packet is from local Antrea IPAM Pod. |
| | bit 20 | | 0b0 | NotAntreaFlexibleIPAMRegMark | Packet is not from local Antrea IPAM Pod. |
| | bit 21 | | 0b1 | ToExternalAddressRegMark | Packet is destined for a Service's external IP. |
| | bit 21 | | 0b1 | ToExternalAddressRegMark | Packet is destined for a Service's external IP. |
| | bits 22-23 | TrafficControlActionField | 0b01 | TrafficControlMirrorRegMark | Indicates packet needs to be mirrored (used by TrafficControl). |
| | | | 0b10 | TrafficControlRedirectRegMark | Indicates packet needs to be redirected (used by TrafficControl). |
| | bit 24 | | 0b1 | NestedServiceRegMark | Packet is destined for a Service that is using other Services as Endpoints. |
| | bit 25 | | 0b1 | DSRServiceRegMark | Packet is destined for a Service working in DSR mode. |
| | | | 0b0 | NotDSRServiceRegMark | Packet is destined for a Service not working in DSR mode. |
| | bit 26 | | 0b1 | RemoteEndpointRegMark | Packet is destined for a Service selecting a remote non-hostNetwork Endpoint. |
| | bit 24 | | 0b1 | NestedServiceRegMark | Packet is destined for a Service that is using other Services as Endpoints. |
| | bit 25 | | 0b1 | DSRServiceRegMark | Packet is destined for a Service working in DSR mode. |
| | | | 0b0 | NotDSRServiceRegMark | Packet is destined for a Service not working in DSR mode. |
| | bit 26 | | 0b1 | RemoteEndpointRegMark | Packet is destined for a Service selecting a remote non-hostNetwork Endpoint. |
| | bit 27 | | 0b1 | FromExternalRegMark | Packet is from Antrea gateway, but its source IP is not the gateway IP. |
| NXM_NX_REG5 | bits 0-31 | TFEgressConjIDField | | | Egress conjunction ID hit by TraceFlow packet. |
| NXM_NX_REG6 | bits 0-31 | TFIngressConjIDField | | | Ingress conjunction ID hit by TraceFlow packet. |
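
The register rows above pack several fields into one 32-bit value. As a rough illustration only (not Antrea code), the following Python sketch decodes a few `NXM_NX_REG4` fields using the bit ranges and mark values listed in the table. The helper names `extract_field` and `decode_reg4` are hypothetical, and the sketch assumes bit 0 is the least significant bit.

```python
# Illustrative sketch: decode a few NXM_NX_REG4 fields from the table above.
# Assumption: bit 0 is the least significant bit of the 32-bit register.
# extract_field and decode_reg4 are hypothetical helpers, not Antrea APIs.


def extract_field(value: int, low: int, high: int) -> int:
    """Return bits low..high (inclusive) of value as an unsigned integer."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


# Mark values for ServiceEPStateField (bits 16-18), copied from the table.
EP_STATE_NAMES = {
    0b001: "EpToSelectRegMark",
    0b010: "EpSelectedRegMark",
    0b011: "EpToLearnRegMark",
}


def decode_reg4(reg4: int) -> dict:
    """Split a reg4 value into the fields named in the table."""
    state = extract_field(reg4, 16, 18)
    return {
        "EndpointPortField": extract_field(reg4, 0, 15),
        "ServiceEPStateField": EP_STATE_NAMES.get(state, "unknown"),
        "ToNodePortAddressRegMark": extract_field(reg4, 19, 19) == 1,
        "DSRServiceRegMark": extract_field(reg4, 25, 25) == 1,
    }


# Example value: Endpoint already selected (0b010) with Endpoint port 8080.
example = (0b010 << 16) | 8080
decoded = decode_reg4(example)
```

Because `EpUnionField` (bits 0-18) simply spans `EndpointPortField` and `ServiceEPStateField`, the same helper can read it with `extract_field(reg4, 0, 18)`.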
@@ -174,8 +174,8 @@ we assign friendly names to the bits we use.

| Field Range | Field Name | Ct Mark Value | Ct Mark Name | Description |
|-------------|-----------------------|---------------|--------------------|-----------------------------------------------------------------|
| bits 0-3 | ConnSourceCTMarkField | 0b0010 | FromGatewayCTMark | Connection source is the Antrea gateway port. |
| | | 0b0101 | FromBridgeCTMark | Connection source is the local bridge port. |
| bits 0-3 | ConnSourceCTMarkField | 0b0010 | FromGatewayCTMark | Connection source is the Antrea gateway port. |
| | | 0b0101 | FromBridgeCTMark | Connection source is the local bridge port. |
| bit 4 | | 0b1 | ServiceCTMark | Connection is for Service. |
| | | 0b0 | NotServiceCTMark | Connection is not for Service. |
| bit 5 | | 0b1 | ConnSNATCTMark | SNAT'd connection for Service. |
@@ -254,7 +254,7 @@ will see the IP address showed up in the OVS flows.
## Kubernetes Service Implementation

Like Kubernetes NetworkPolicy, several tables of the pipeline are dedicated to [Kubernetes
Service] (https://kubernetes.io/docs/concepts/services-networking/service/) implementation (table [NodePortMark],
Service](https://kubernetes.io/docs/concepts/services-networking/service/) implementation (table [NodePortMark],
[SessionAffinity], [ServiceLB], and [EndpointDNAT]).

By enabling `proxyAll`, ClusterIP, NodePort, LoadBalancer, and ExternalIP are all supported. Otherwise, only in-cluster
@@ -512,7 +512,7 @@ Pods mentioned above as the target of this policy.

In addition to layer 3 and layer 4 policies mentioned above, [Antrea Layer 7 NetworkPolicy](../antrea-l7-network-policy.md)
is also supported in Antrea. The main difference is that Antrea layer 7 NetworkPolicy uses layer 7 protocol to match
traffic, not layer 3 and layer 4 protocol. To achieve this,
traffic, not layer 3 and layer 4 protocol.

Consider the following Antrea layer 7 NetworkPolicy in the Application tier as an example for the remainder of this
document.
@@ -622,11 +622,10 @@ spec:

## Egress Implementation

Table [EgressMark] is dedicated to the implementation of `Egress`.
Table [EgressMark] is dedicated to the implementation of `Egress`.

Consider the following Egresses as examples for the remainder of this document.


```yaml
apiVersion: crd.antrea.io/v1beta1
kind: Egress
@@ -811,7 +810,7 @@ specific cases:

1. Allowing packets from the local Antrea gateway, where checks are not currently performed.
2. Ensuring that the source IP and MAC addresses are correct, i.e., matching the values configured on the interface when
Antrea sets up networking for a Pod.
Antrea sets up networking for a Pod.
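
The two cases above can be sketched as a small check. This is a simplified model in Python, not how the OVS flows are actually expressed: the port numbers, addresses, and the names `PodInterface`, `GATEWAY_PORT`, `POD_INTERFACES`, and `spoof_guard_allows` are all hypothetical, and only source IP and MAC matching is modeled.

```python
# Simplified model of the two cases listed above (hypothetical names and values).
from dataclasses import dataclass


@dataclass(frozen=True)
class PodInterface:
    ip: str
    mac: str


# Hypothetical OVS port number of the local Antrea gateway.
GATEWAY_PORT = 2

# Hypothetical mapping from OVS port to the values configured on the Pod
# interface when networking was set up for the Pod.
POD_INTERFACES = {
    4: PodInterface(ip="10.10.0.24", mac="aa:bb:cc:dd:ee:01"),
}


def spoof_guard_allows(in_port: int, src_ip: str, src_mac: str) -> bool:
    """Return True if the packet would pass the modeled checks."""
    # Case 1: packets from the local Antrea gateway are not checked.
    if in_port == GATEWAY_PORT:
        return True
    # Case 2: source IP and MAC must match the values configured for the port.
    iface = POD_INTERFACES.get(in_port)
    if iface is None:
        # No matching entry: treated like a table-miss drop in this model.
        return False
    return iface.ip == src_ip and iface.mac.lower() == src_mac.lower()
```

For instance, `spoof_guard_allows(4, "10.10.0.25", "aa:bb:cc:dd:ee:01")` is rejected in this model because the source IP does not match the value recorded for port 4.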

If you dump the flows for this table, you may see the following:

@@ -845,7 +844,7 @@ Flow 5 is the table-miss flow to drop IP spoofing packets.
This table is used to perform `de-SNAT` on reply packets by invoking action `ct` on them. The packets are from SNAT'd
Service connections that have been committed with `SNATCtZone` in table [SNAT]. After invoking action `ct`, the packets
will be in a "tracked" state, restoring all [connection tracking
fields](https://www.openvswitch.org/support/dist-docs/ovs-fields.7.txt) (such as `ct_state`, `ct_mark`, `ct_label`, etc.)
fields](https://www.openvswitch.org/support/dist-docs/ovs-fields.7.txt) (such as `ct_state`, `ct_mark`, `ct_label`, etc.)
to their original values. The packets with a "tracked" state are then forwarded to table [ConntrackZone].

If you dump the flows for this table, you may see the following:
@@ -959,10 +958,10 @@ If you dump the flows for this table, you may see the following:
```

Flows 1-2 match packets destined for the local Node from local Pods. `NodePortRegMark` is loaded, indicating that the
packets are potentially destined for NodePort Services.
packets are potentially destined for NodePort Services.

Flow 3 matches packets destined for the *Virtual NodePort DNAT IP*. Packets destined for NodePort Services from the local
Node or the external network are DNAT'd to the *Virtual NodePort DNAT IP* by iptables before entering the pipeline.
Node or the external network are DNAT'd to the *Virtual NodePort DNAT IP* by iptables before entering the pipeline.

Flow 4 is the table-miss flow.

@@ -1214,7 +1213,7 @@ Flow 10 is the table-miss flow to forward packets, which are not matched by othe
### EgressRule

For this table, you will need to keep in mind the Kubernetes NetworkPolicy
[specification](#kubernetes-networkpolicy-implementation) that we are using.
[specification](#kubernetes-networkpolicy-implementation) that we are using.

This table is used to implement the egress rules across all Kubernetes NetworkPolicies. If you dump the flows for this
table, you may see the following:
@@ -1362,7 +1361,7 @@ local Antrea gateway MAC, and the destination MAC address is set to the associat
packets are then forwarded to table [L3DecTTL] to decrease the TTL value.

Flow 8 matches request packets originating from local Pods and destined for the external network, and then forwards them
to table [EgressMark], which is dedicated to the Egress feature. In table [EgressMark], SNAT IPs for Egress are looked up for the packets.
to table [EgressMark], which is dedicated to the Egress feature. In table [EgressMark], SNAT IPs for Egress are looked up for the packets.
To match the expected packets, `FromLocalRegMark` is utilized to exclude packets that are not from local Pods.
Additionally, `NotAntreaFlexibleIPAMRegMark`, mutually exclusive with `AntreaFlexibleIPAMRegMark` that is used to mark
packets from Antrea IPAM Pods, is used since Egress can only be applied to Node IPAM Pods.
@@ -1374,7 +1373,7 @@ tunnel, the destination MAC address of the packets, represented by the *Global V
MAC address of the local Antrea gateway.

Flow 10 matches packets from Service connections that are originating from the local Antrea gateway and destined for the
external network. This is accomplished by matching `RewriteMACRegMark`, `FromGatewayRegMark`, and `ServiceCTMark`. The
external network. This is accomplished by matching `RewriteMACRegMark`, `FromGatewayRegMark`, and `ServiceCTMark`. The
destination MAC address is then set to that of the local Antrea gateway. Additionally, `ToGatewayRegMark`, which will be
used together with `FromGatewayRegMark` to identify hairpin connections in table [SNATMark], is loaded. Finally,
the packets are forwarded to table [L3DecTTL].
@@ -1519,7 +1518,7 @@ Flow 4 matches the first packet of Service connections requiring SNAT, identifie
Antrea gateway and the Endpoint is a remote Pod. It performs SNAT with the IP address of the local Antrea gateway and
forwards the SNAT'd packets to table [L2ForwardingCalc]. Similar to flow 1 or 2, `ServiceCTMark` is persisted in
`SNATCtZone`.

Flow 5 is the table-miss flow.

### L2ForwardingCalc
@@ -1836,18 +1835,20 @@ Flow 4 is the table-miss flow for case 4. It drops packets that do not match any
[SNAT]: #snat
[L2ForwardingCalc]: #l2forwardingcalc
[TrafficControl]: #trafficcontrol
[IngressSecurityClassifier]: #ingresssecurityclassifier
[AntreaPolicyIngressRule]: #antreapolicyingressrule
[IngressRule]: #ingressrule
[IngressDefault]: #ingressdefault
[IngressMetric]: #ingressmetric
[Output]: #output
[OVS Registers]: #ovs-registers
[Ct Marks]: #ovs-ct-mark
[Ct Labels]: #ovs-ct-label
[Ct Zones]: #ovs-ct-zone
[ClusterIP without Endpoint]: #clusterip-without-endpoint
[ClusterIP]: #clusterip
[ClusterIP with Session Affinity]: #clusterip-with-session-affinity
[NodePort]: #nodeport
[NodePort with Session Affinity]: #nodeport-with-session-affinity
[LoadBalancer]: #loadbalancer
[LoadBalancer with Session Affinity]: #loadbalancer-with-session-affinity
[Service with ExternalIP]: #service-with-externalip
[Service with ExternalIP and Session Affinity]: #service-with-externalip-and-session-affinity
