From dd4a3f8f90c73dfc0e1976cb05b3b2c1da453e7d Mon Sep 17 00:00:00 2001 From: Hongliang Liu Date: Fri, 8 Mar 2024 17:37:02 +0800 Subject: [PATCH] Update OVS pipeline document Signed-off-by: Hongliang Liu --- docs/design/ovs-pipeline.md | 225 +++++++++++++++++++++--------------- 1 file changed, 131 insertions(+), 94 deletions(-) diff --git a/docs/design/ovs-pipeline.md b/docs/design/ovs-pipeline.md index be0f26fc199..cb1ce80451e 100644 --- a/docs/design/ovs-pipeline.md +++ b/docs/design/ovs-pipeline.md @@ -108,61 +108,62 @@ where `` is the name of a table in the pipeline, and `` is We use some OVS registers to carry information throughout the pipeline. To enhance usability, we assign friendly names to the registers we use. -| Register | Field Range | Field Name | Reg Mark Value | Reg Mark Name | Description | -|---------------|-------------|---------------------------|----------------|---------------------------------|------------------------------------------------------------------------------------------| -| NXM_NX_REG0 | bits 0-3 | PktSourceField | 0x1 | FromTunnelRegMark | Packet source is tunnel port. | -| | | | 0x2 | FromGatewayRegMark | Packet source is Antrea gateway port. | -| | | | 0x3 | FromLocalRegMark | Packet source is local Pod port. | -| | | | 0x4 | FromUplinkRegMark | Packet source is uplink port. | -| | | | 0x5 | FromBridgeRegMark | Packet source is local bridge port. | -| | | | 0x6 | FromTCReturnRegMark | Packet source is TrafficControl return port. | -| | bits 4-7 | PktDestinationField | 0x1 | ToTunnelRegMark | Packet destination is tunnel port. | -| | | | 0x2 | ToGatewayRegMark | Packet destination is the local Antrea gateway port. | -| | | | 0x3 | ToLocalRegMark | Packet destination is local Pod port. | -| | | | 0x4 | ToUplinkRegMark | Packet destination is uplink port. | -| | | | 0x5 | ToBridgeRegMark | Packet destination is local bridge port. 
| -| | bit 9 | | 0b0 | NotRewriteMACRegMark | Packet's source / destination MAC address does not need to be rewritten. | -| | | | 0b1 | RewriteMACRegMark | Packet's source / destination MAC address needs to be rewritten. | -| | bit 10 | | 0b1 | APDenyRegMark | Packet denied (Drop / Reject) by Antrea NetworkPolicy. | -| | bits 11-12 | APDispositionField | 0b00 | DispositionAllowRegMark | Indicates Antrea NetworkPolicy disposition: allow. | -| | | | 0b01 | DispositionDropRegMark | Indicates Antrea NetworkPolicy disposition: drop. | -| | | | 0b11 | DispositionPassRegMark | Indicates Antrea NetworkPolicy disposition: pass. | -| | bit 13 | | 0b1 | GeneratedRejectPacketOutRegMark | Indicates packet is a generated reject response packet-out. | -| | bit 14 | | 0b1 | SvcNoEpRegMark | Indicates packet towards a Service without Endpoint. | -| | bit 19 | | 0b1 | RemoteSNATRegMark | Indicates packet needs SNAT on a remote Node. | -| | bit 22 | | 0b1 | L7NPRedirectRegMark | Indicates L7 Antrea NetworkPolicy disposition of redirect. | -| | bits 21-22 | OutputRegField | 0b01 | OutputToOFPortRegMark | Output packet to an OVS port. | -| | | | 0b10 | OutputToControllerRegMark | Send packet to Antrea Agent. | -| | bits 25-32 | PacketInOperationField | | | Field to store NetworkPolicy packetIn operation. | -| NXM_NX_REG1 | bits 0-31 | TargetOFPortField | | | Egress OVS port of packet. | -| NXM_NX_REG2 | bits 0-31 | SwapField | | | Swap values in flow fields in OpenFlow actions. | -| | | PacketInTableField | | | OVS table where it was decided to send packet to controller (Antrea Agent). | -| NXM_NX_REG3 | bits 0-31 | EndpointIPField | | | Field to store IPv4 address of selected Service Endpoint. | -| | | APConjIDField | | | Field to store Conjunction ID for Antrea Policy. | -| NXM_NX_REG4 | bits 0-15 | EndpointPortField | | | Field store TCP / UDP / SCTP port of a Service's selected Endpoint. 
| -| | bits 16-18 | ServiceEPStateField | 0b001 | EpToSelectRegMark | Packet needs to do Service Endpoint selection. | -| | bits 16-18 | ServiceEPStateField | 0b010 | EpSelectedRegMark | Packet has done Service Endpoint selection. | -| | bits 16-18 | ServiceEPStateField | 0b011 | EpToLearnRegMark | Packet has done Service Endpoint selection and the selected Endpoint needs to be cached. | -| | bits 0-18 | EpUnionField | | | The union value of EndpointPortField and ServiceEPStateField. | -| | bit 19 | | 0b1 | ToNodePortAddressRegMark | Packet is destined to a Service of type NodePort. | -| | bit 20 | | 0b1 | AntreaFlexibleIPAMRegMark | Packet is from local Antrea IPAM Pod. | -| | bit 20 | | 0b0 | NotAntreaFlexibleIPAMRegMark | Packet is not from local Antrea IPAM Pod. | -| | bit 21 | | 0b1 | ToExternalAddressRegMark | Packet is destined to a Service's external IP. | -| | bits 22-23 | TrafficControlActionField | 0b01 | TrafficControlMirrorRegMark | Indicates packet needs to be mirrored (used by TrafficControl). | -| | | | 0b10 | TrafficControlRedirectRegMark | Indicates packet needs to be redirected (used by TrafficControl). | -| | bit 24 | | 0b1 | NestedServiceRegMark | Packet is destined to a Service which is using other other Service as Endpoints. | -| | bit 25 | | 0b1 | DSRServiceRegMark | Packet is destined to a Service working in DSR mode. | -| | | | 0b0 | NotDSRServiceRegMark | Packet is destined to a Service working not in DSR mode. | -| | bit 26 | | 0b1 | RemoteEndpointRegMark | Packet is destined to a Service selecting a remote non-hostNetwork Endpoint. | -| | bit 27 | | 0b1 | FromExternalRegMark | Packet is from Antrea gateway, but its source IP is not the gateway IP. | -| NXM_NX_REG5 | bits 0-31 | TFEgressConjIDField | | | Egress conjunction ID hit by TraceFlow packet. | -| NXM_NX_REG6 | bits 0-31 | TFIngressConjIDField | | | Ingress conjunction ID hit by TraceFlow packet. 
| -| NXM_NX_REG7 | bits 0-31 | ServiceGroupIDField | | | GroupID corresponding to the Service. | -| NXM_NX_REG8 | bits 0-11 | VLANIDField | | | VLAN ID. | -| | bits 12-15 | CtZoneTypeField | 0b0001 | IPCtZoneTypeRegMark | Ct zone type is IPv4. | -| | | | 0b0011 | IPv6CtZoneTypeRegMark | Ct zone type is IPv6. | -| | bits 0-15 | CtZoneField | | | Ct zone ID is a combination of VLANIDField and CtZoneTypeField. | -| NXM_NX_XXREG3 | bits 0-127 | EndpointIP6Field | | | Field to store IPv6 address of selected Service Endpoint. | +| Register | Field Range | Field Name | Reg Mark Value | Reg Mark Name | Description | +|---------------|-------------|---------------------------------|----------------|---------------------------------|------------------------------------------------------------------------------------------------------| +| NXM_NX_REG0 | bits 0-3 | PktSourceField | 0x1 | FromTunnelRegMark | Packet source is tunnel port. | +| | | | 0x2 | FromGatewayRegMark | Packet source is Antrea gateway port. | +| | | | 0x3 | FromLocalRegMark | Packet source is local Pod port. | +| | | | 0x4 | FromUplinkRegMark | Packet source is uplink port. | +| | | | 0x5 | FromBridgeRegMark | Packet source is local bridge port. | +| | | | 0x6 | FromTCReturnRegMark | Packet source is TrafficControl return port. | +| | bits 4-7 | PktDestinationField | 0x1 | ToTunnelRegMark | Packet destination is tunnel port. | +| | | | 0x2 | ToGatewayRegMark | Packet destination is the local Antrea gateway port. | +| | | | 0x3 | ToLocalRegMark | Packet destination is local Pod port. | +| | | | 0x4 | ToUplinkRegMark | Packet destination is uplink port. | +| | | | 0x5 | ToBridgeRegMark | Packet destination is local bridge port. | +| | bit 9 | | 0b0 | NotRewriteMACRegMark | Packet's source / destination MAC address does not need to be rewritten. | +| | | | 0b1 | RewriteMACRegMark | Packet's source / destination MAC address needs to be rewritten. 
| +| | bit 10 | | 0b1 | APDenyRegMark | Packet denied (Drop / Reject) by Antrea NetworkPolicy. | +| | bits 11-12 | APDispositionField | 0b00 | DispositionAllowRegMark | Indicates Antrea NetworkPolicy disposition: allow. | +| | | | 0b01 | DispositionDropRegMark | Indicates Antrea NetworkPolicy disposition: drop. | +| | | | 0b11 | DispositionPassRegMark | Indicates Antrea NetworkPolicy disposition: pass. | +| | bit 13 | | 0b1 | GeneratedRejectPacketOutRegMark | Indicates packet is a generated reject response packet-out. | +| | bit 14 | | 0b1 | SvcNoEpRegMark | Indicates packet towards a Service without Endpoint. | +| | bit 19 | | 0b1 | RemoteSNATRegMark | Indicates packet needs SNAT on a remote Node. | +| | bit 22 | | 0b1 | L7NPRedirectRegMark | Indicates L7 Antrea NetworkPolicy disposition of redirect. | +| | bits 21-22 | OutputRegField | 0b01 | OutputToOFPortRegMark | Output packet to an OVS port. | +| | | | 0b10 | OutputToControllerRegMark | Send packet to Antrea Agent. | +| | bits 25-32 | PacketInOperationField | | | Field to store NetworkPolicy packetIn operation. | +| NXM_NX_REG1 | bits 0-31 | TargetOFPortField | | | Egress OVS port of packet. | +| NXM_NX_REG2 | bits 0-31 | SwapField | | | Swap values in flow fields in OpenFlow actions. | +| | | PacketInTableField | | | OVS table where it was decided to send packet to controller (Antrea Agent). | +| NXM_NX_REG3 | bits 0-31 | EndpointIPField | | | Field to store IPv4 address of selected Service Endpoint. | +| | | APConjIDField | | | Field to store Conjunction ID for Antrea Policy. | +| NXM_NX_REG4 | bits 0-15 | EndpointPortField | | | Field to store TCP / UDP / SCTP port of a Service's selected Endpoint. | +| | bits 16-18 | ServiceEPStateField | 0b001 | EpToSelectRegMark | Packet needs to do Service Endpoint selection. | +| | bits 16-18 | ServiceEPStateField | 0b010 | EpSelectedRegMark | Packet has done Service Endpoint selection. 
| +| | bits 16-18 | ServiceEPStateField | 0b011 | EpToLearnRegMark | Packet has done Service Endpoint selection and the selected Endpoint needs to be cached. | +| | bits 0-18 | EpUnionField | | | The union value of EndpointPortField and ServiceEPStateField. | +| | bit 19 | | 0b1 | ToNodePortAddressRegMark | Packet is destined to a Service of type NodePort. | +| | bit 20 | | 0b1 | AntreaFlexibleIPAMRegMark | Packet is from local Antrea IPAM Pod. | +| | bit 20 | | 0b0 | NotAntreaFlexibleIPAMRegMark | Packet is not from local Antrea IPAM Pod. | +| | bit 21 | | 0b1 | ToExternalAddressRegMark | Packet is destined to a Service's external IP. | +| | bits 22-23 | TrafficControlActionField | 0b01 | TrafficControlMirrorRegMark | Indicates packet needs to be mirrored (used by TrafficControl). | +| | | | 0b10 | TrafficControlRedirectRegMark | Indicates packet needs to be redirected (used by TrafficControl). | +| | bit 24 | | 0b1 | NestedServiceRegMark | Packet is destined to a Service which is using other Services as Endpoints. | +| | bit 25 | | 0b1 | DSRServiceRegMark | Packet is destined to a Service working in DSR mode. | +| | | | 0b0 | NotDSRServiceRegMark | Packet is destined to a Service not working in DSR mode. | +| | bit 26 | | 0b1 | RemoteEndpointRegMark | Packet is destined to a Service selecting a remote non-hostNetwork Endpoint. | +| | bit 27 | | 0b1 | FromExternalRegMark | Packet is from Antrea gateway, but its source IP is not the gateway IP. | +| NXM_NX_REG5 | bits 0-31 | TFEgressConjIDField | | | Egress conjunction ID hit by TraceFlow packet. | +| NXM_NX_REG6 | bits 0-31 | TFIngressConjIDField | | | Ingress conjunction ID hit by TraceFlow packet. | +| NXM_NX_REG7 | bits 0-31 | ServiceGroupIDField | | | GroupID corresponding to the Service. | +| NXM_NX_REG8 | bits 0-11 | VLANIDField | | | VLAN ID. | +| | bits 12-15 | CtZoneTypeField | 0b0001 | IPCtZoneTypeRegMark | Ct zone type is IPv4. | +| | | | 0b0011 | IPv6CtZoneTypeRegMark | Ct zone type is IPv6. 
| +| | bits 0-15 | CtZoneField | | | Ct zone ID is a combination of VLANIDField and CtZoneTypeField. | +| NXM_NX_REG9 | bits 0-31 | TrafficControlTargetOFPortField | | | Field to cache the OVS port to output packets to be mirrored or redirected (used by TrafficControl). | +| NXM_NX_XXREG3 | bits 0-127 | EndpointIP6Field | | | Field to store IPv6 address of selected Service Endpoint. | Note that reg marks that have overlapped bits will not be used at the same time, such as `SwapField` and `PacketInTableField`. @@ -570,7 +571,7 @@ spec: host: "*.bar.com" # not be considered. ``` -### TrafficControl Implementation +## TrafficControl Implementation [TrafficControl](../traffic-control.md) is a CRD API that manages and manipulates the transmission of Pod traffic. Antrea creates additional dedicated table [TrafficControl] to implement TrafficControl. Use the provided TrafficControls @@ -1516,60 +1517,96 @@ Flow 5 is the table-miss flow. ### L2ForwardingCalc -This is essentially the "dmac" table of the switch. We program one flow for each port (tunnel port, the local Antrea gateway -port, and local Pod ports). +This is essentially the "dmac" table of the switch. We program one flow for each port (tunnel port, the local Antrea +gateway port, and local Pod ports). If you dump the flows for this table, you may see the following: ```text -1. cookie=0x2010000000000, table=L2ForwardingCalc, priority=200,dl_dst=ba:5e:d1:55:aa:c0 actions=set_field:0x2->reg1,set_field:0x200000/0x600000->reg0,goto_table:IngressSecurityClassifier -2. cookie=0x2010000000000, table=L2ForwardingCalc, priority=200,dl_dst=aa:bb:cc:dd:ee:ff actions=set_field:0x1->reg1,set_field:0x200000/0x600000->reg0,goto_table:IngressSecurityClassifier -3. cookie=0x2010000000000, table=L2ForwardingCalc, priority=200,dl_dst=2e:ba:06:b2:44:91 actions=set_field:0x8->reg1,set_field:0x200000/0x600000->reg0,goto_table:IngressSecurityClassifier -4. 
cookie=0x2010000000000, table=L2ForwardingCalc, priority=200,dl_dst=c2:5a:5e:50:95:9b actions=set_field:0x9->reg1,set_field:0x200000/0x600000->reg0,goto_table:IngressSecurityClassifier -5. cookie=0x2000000000000, table=L2ForwardingCalc, priority=0 actions=goto_table:IngressSecurityClassifier +1. table=L2ForwardingCalc, priority=200,dl_dst=ba:5e:d1:55:aa:c0 actions=set_field:0x2->reg1,set_field:0x200000/0x600000->reg0,goto_table:TrafficControl +2. table=L2ForwardingCalc, priority=200,dl_dst=aa:bb:cc:dd:ee:ff actions=set_field:0x1->reg1,set_field:0x200000/0x600000->reg0,goto_table:TrafficControl +3. table=L2ForwardingCalc, priority=200,dl_dst=5e:b5:e3:a6:90:b7 actions=set_field:0x24->reg1,set_field:0x200000/0x600000->reg0,goto_table:TrafficControl +4. table=L2ForwardingCalc, priority=200,dl_dst=fa:b7:53:74:21:a6 actions=set_field:0x25->reg1,set_field:0x200000/0x600000->reg0,goto_table:TrafficControl +5. table=L2ForwardingCalc, priority=200,dl_dst=36:48:21:a2:9d:b4 actions=set_field:0x26->reg1,set_field:0x200000/0x600000->reg0,goto_table:TrafficControl +6. table=L2ForwardingCalc, priority=0 actions=goto_table:TrafficControl ``` -- Flow 1 is to match packets destined to the the local Antrea gateway. - - Match condition `dl_dst=ba:5e:d1:55:aa:c0` is to match packets destined to the the local Antrea gateway MAC address. - - Action `set_field:0x2->reg1` is to load output OVS port number to `TargetOFPortField`. - - Action `set_field:0x200000/0x600000->reg0` is to load `OutputToOFPortRegMark`, indicating that packets should output - to an OVS port. - - Action `goto_table:IngressSecurityClassifier` is to forward packets to table [IngressSecurityClassifier]. -- Flow 2 is to match packets destined to tunnel. - - Match condition `dl_dst=aa:bb:cc:dd:ee:ff` is to match packets destined to the Global Virtual MAC address, which is - used for tunnel traffic. - - Actions are the same as flow 1. -- Flows 3-4 are to match packets destined to local Pods. 
- - Match conditions `dl_dst=2e:ba:06:b2:44:91` and `dl_dst=c2:5a:5e:50:95:9b` are to match packets destined to the MAC - addresses of local Pods. - - Actions are the same as flow 1. -- Flow 4 is the auto-generated flow. - -In above flows 1-5, we load `OutputToOFPortRegMark` to indicate that there was a matching entry for the destination MAC -address and that the packet must be forwarded. We also use the `TargetOFPortField` to store the egress port for packet, -which will be used as a parameter to the `output` OpenFlow action in table [Output]. +Flow 1 matches packets destined to the local Antrea gateway, identified by the destination MAC address being that of +the local Antrea gateway. It loads `OutputToOFPortRegMark`, indicating that the packets should be output to an OVS port, +and the port number of the local Antrea gateway to `TargetOFPortField`. Both values are consumed in +table [Output]. + +Flow 2 matches packets destined to a tunnel, identified by the destination MAC address being that of the *Global Virtual +MAC*. Similar to flow 1, `OutputToOFPortRegMark` is loaded, and the port number of the tunnel is loaded to +`TargetOFPortField`. + +Flows 3-5 match packets destined to local Pods, identified by the destination MAC address being that of the local +Pods. Similar to flow 1, `OutputToOFPortRegMark` is loaded, and the port number of the destination Pod is loaded to +`TargetOFPortField`. + +Flow 6 is the table-miss flow. + +### TrafficControl + +This table is dedicated to `TrafficControl`. + +If you dump the flows for this table, you may see the following: + +```text +1. table=TrafficControl, priority=210,reg0=0x200006/0x60000f actions=goto_table:Output +2. table=TrafficControl, priority=200,reg1=0x25 actions=set_field:0x22->reg9,set_field:0x800000/0xc00000->reg4,goto_table:IngressSecurityClassifier +3. 
table=TrafficControl, priority=200,in_port="web-7975-274540" actions=set_field:0x22->reg9,set_field:0x800000/0xc00000->reg4,goto_table:IngressSecurityClassifier +4. table=TrafficControl, priority=200,reg1=0x26 actions=set_field:0x27->reg9,set_field:0x400000/0xc00000->reg4,goto_table:IngressSecurityClassifier +5. table=TrafficControl, priority=200,in_port="db-755c6-5080e3" actions=set_field:0x27->reg9,set_field:0x400000/0xc00000->reg4,goto_table:IngressSecurityClassifier +6. table=TrafficControl, priority=0 actions=goto_table:IngressSecurityClassifier +``` + +Flow 1 matches packets returned from TrafficControl return ports and forwards them to table [Output], where the packets +are output to the port to which they are destined. Such packets are identified by `OutputToOFPortRegMark`, indicating +that the packets should be output to an OVS port, and by `FromTCReturnRegMark`, which is loaded in table [Classifier] +and indicates that the packets are from a TrafficControl return port. + +Flow 2 is installed for the sample TrafficControl `redirect-web-to-local`, which marks the packets destined to the Pods +labelled by `app: web` with `TrafficControlRedirectRegMark`, indicating the packets should be redirected to a +TrafficControl target port whose number is loaded to `TrafficControlTargetOFPortField`. + +Flow 3 is also installed for the sample TrafficControl `redirect-web-to-local`. It matches packets by their input port +(`in_port`) instead of `TargetOFPortField`. Similar to flow 2, `TrafficControlRedirectRegMark` is loaded and the number +of the TrafficControl target port is loaded to `TrafficControlTargetOFPortField`. + +Flow 4 is installed for the sample TrafficControl `mirror-db-to-local`, which marks the packets destined to the Pods +labelled by `app: db` with `TrafficControlMirrorRegMark`, indicating the packets should be mirrored to a +TrafficControl target port whose number is loaded to `TrafficControlTargetOFPortField`. + +Flow 5 is also installed for the sample TrafficControl `mirror-db-to-local`. 
It matches packets by their input port (`in_port`) instead of `TargetOFPortField`. Similar to flow 4, +`TrafficControlMirrorRegMark` is loaded and the number of the TrafficControl target port is loaded to +`TrafficControlTargetOFPortField`. + +Flow 6 is the table-miss flow. ### IngressSecurityClassifier -This table is to classify packets before entering the tables for ingress security. +This table is to classify packets before they enter the tables for ingress security. If you dump the flows for this table, you should see the following: ```text 1. table=IngressSecurityClassifier, priority=210,pkt_mark=0x80000000/0x80000000,ct_state=-rpl+trk,ip actions=goto_table:ConntrackCommit -2. table=IngressSecurityClassifier, priority=200,reg0=0x20/0xf0 actions=goto_table:IngressMetric -3. table=IngressSecurityClassifier, priority=200,reg0=0x10/0xf0 actions=goto_table:IngressMetric -4. table=IngressSecurityClassifier, priority=200,reg0=0x40/0xf0 actions=goto_table:IngressMetric -5. table=IngressSecurityClassifier, priority=200,ct_mark=0x40/0x40 actions=goto_table:ConntrackCommit -6. table=IngressSecurityClassifier, priority=0 actions=goto_table:AntreaPolicyIngressRule +2. table=IngressSecurityClassifier, priority=201,reg4=0x80000/0x80000 actions=goto_table:AntreaPolicyIngressRule +3. table=IngressSecurityClassifier, priority=200,reg0=0x20/0xf0 actions=goto_table:IngressMetric +4. table=IngressSecurityClassifier, priority=200,reg0=0x10/0xf0 actions=goto_table:IngressMetric +5. table=IngressSecurityClassifier, priority=200,reg0=0x40/0xf0 actions=goto_table:IngressMetric +6. table=IngressSecurityClassifier, priority=200,ct_mark=0x40/0x40 actions=goto_table:ConntrackCommit +7. table=IngressSecurityClassifier, priority=0 actions=goto_table:AntreaPolicyIngressRule ``` -- Flow 1 is to match locally generated request packets and forward them to table [ConntrackCommit] directly to bypass - all tables for ingress security. 
- - Match condition `pkt_mark=0x80000000/0x80000000` is to match packets with iptables fwmark 0x80000000, which is set - by iptables rules in the host network namespace to mark locally generated packets. - - Match condition `ct_state=-rpl+trk` is to match request packets. -- Flow 2-4 are to match some packets destined to the local Antrea gateway, tunnel, uplink port by matching `ToGatewayRegMark`, +Flow 1 matches locally generated request packets, identified by `pkt_mark`, which is set by iptables in the host network +namespace. It forwards the packets to table [ConntrackCommit] directly to bypass all tables for ingress security. + +Flow 2 matches packets destined to NodePort Services and forwards them to table [AntreaPolicyIngressRule], so that +packets that traverse the Antrea gateway and are destined to NodePort Services are not matched by flow + +- Flows 3-5 are to match some packets destined to the local Antrea gateway, tunnel, uplink port by matching `ToGatewayRegMark`, `ToTunnelRegMark` or `ToUplinkRegMark` respectively and forward them to table [IngressMetric] directly to bypass tables for ingress security rules. - Match condition `reg0=0x20/0xf0` is to match `ToGatewayRegMark`, indicating that packets are destined to local