
Add packet capture feature #5821

Open: wants to merge 6 commits into main

hangyan
Member

@hangyan hangyan commented Dec 22, 2023

This commit adds the packet capture feature to Antrea.

Fix #5443

Feature Introduction

Traceflow works well for diagnosing network flows, but sometimes users want to look at the raw packets in a flow. Currently, Antrea lacks the ability to capture raw packets from live traffic.

CRD Example

apiVersion: crd.antrea.io/v1alpha1
kind: PacketCapture
metadata:
  name: tf-test
spec:
  timeout: 60             # a hard limit for the whole capture session
  type: FirstNCapture     # only this type is supported for now
  parameters:
    number: 15            # the number of packets to capture
  source:                 # same selector as Traceflow
    namespace: default
    pod: tcp-sts-0
  destination:
    namespace: default
    pod: tcp-sts-2
  packet:
    ipHeader:
      protocol: 6
    transportHeader:
      tcp:
        srcPort: 10000
        dstPort: 80
  fileServer:
    url: sftp://youtestdomain.com:22/root/test
  authentication:
    authType: "BasicAuthentication"
    authSecret:
      name: support-bundle-secret
      namespace: default

Notice

  1. PacketCapture uses an OVS register (REG) as the flow mark to identify the packets to capture.
  2. Compared to Traceflow/SupportBundle, the results need to be uploaded to an SFTP server (rather than being included in the CRD or downloaded directly). Since it is not very convenient for users to specify the URL, username, and password of an SFTP server as command-line arguments, antctl sub-command support for PacketSampling is not implemented yet.
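To illustrate what "using a register as the flow mark" in item 1 means, here is a small sketch of loading a value into a bit range of a 32-bit register and matching it with a mask. The field position (bits 28 to 31) and the `regField` type are hypothetical; Antrea's openflow package has its own register-field helpers (for example `LoadToRegField`, which appears in a snippet later in this thread), and the actual bits used by PacketCapture are not specified here.

```go
package main

import "fmt"

// regField is a hypothetical inclusive bit range [start, end] in a 32-bit
// OVS register, similar in spirit to an OpenFlow match such as reg0[28..31].
type regField struct{ start, end uint }

// mask returns the register bits covered by the field.
func (f regField) mask() uint32 {
	width := f.end - f.start + 1
	return (uint32(1)<<width - 1) << f.start
}

// load writes v into the field, leaving the other register bits untouched.
func (f regField) load(reg, v uint32) uint32 {
	return reg&^f.mask() | (v<<f.start)&f.mask()
}

// matches reports whether the field currently holds v (a masked register match).
func (f regField) matches(reg, v uint32) bool {
	return reg&f.mask() == (v<<f.start)&f.mask()
}

func main() {
	captureTag := regField{start: 28, end: 31} // hypothetical 4-bit tag field
	reg := uint32(0xff)                        // other bits already used by the pipeline
	reg = captureTag.load(reg, 3)
	fmt.Printf("%#x %v\n", reg, captureTag.matches(reg, 3)) // 0x300000ff true
}
```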

Verified cases

  1. pod -> pod
  2. IP -> pod / pod -> IP
  3. pod -> svc
  4. udp/icmp...

Known issues for now

  1. pod -> svc -> local pod does not work.

Links:

  1. Original proposal
  2. Add first N sampling to the live-traffic traceflow #5345

@hangyan
Member Author

hangyan commented Dec 22, 2023

@luolanzone Here is the current PR for packet sampling. If you need more information to help review it, let me know and I will update it ASAP. I'm currently working on the remaining tests, unit tests, and docs. I'd appreciate any input on the current implementation.

@luolanzone luolanzone added this to the Antrea v1.16 release milestone Jan 4, 2024
cmd/antrea-agent/agent.go (outdated)
build/charts/antrea/crds/packetsampling.yaml (outdated)
cmd/antrea-controller/controller.go (outdated)
go.mod (outdated)
pkg/agent/controller/packetsampling/packetin.go (outdated)
pkg/controller/packetsampling/controller.go (outdated)
pkg/controller/packetsampling/controller.go (outdated)
pkg/controller/packetsampling/controller.go (outdated)
build/yamls/antrea.yml (outdated)
pkg/agent/openflow/packetsampling.go (outdated)
@hangyan hangyan force-pushed the topic/yhang/packet-sampling branch 3 times, most recently from 274ebd6 to 110f542 on January 5, 2024 07:08
@hangyan
Member Author

hangyan commented Jan 5, 2024

@luolanzone Hi Lan, all updated. I will make sure golangci-fix passes and the import order is correct in the following commits. Please review again.

build/charts/antrea/conf/antrea-agent.conf (outdated)
build/charts/antrea/conf/antrea-controller.conf (outdated)
docs/packetsampling-guide.md (outdated)
docs/packetsampling-guide.md (outdated)
docs/packetsampling-guide.md (outdated)
go.mod (outdated)
pkg/agent/controller/packetsampling/packetin.go (outdated)
if !c.packetSamplingSynced() {
return errors.New("PacketSampling controller is not started")
}
oldPS, samplingState, shouldSkip, err := c.parsePacketIn(pktIn)
Contributor

What are your thoughts on the name?

pkg/agent/controller/packetsampling/packetin_test.go (outdated)
pkg/agent/openflow/client.go (outdated)
err = retry.RetryOnConflict(retry.DefaultRetry, func() error {
ps, err := c.packetSamplingInformer.Lister().Get(oldPS.Name)
if err != nil {
return fmt.Errorf("get packetsampling failed: %w", err)
Contributor

I saw you used both packetsampling and PacketSampling in the logs. Maybe stick to PacketSampling in logs and comments.

pkg/controller/packetsampling/controller.go (outdated)
pkg/agent/controller/packetsampling/packetin_test.go (outdated)
pkg/agent/controller/packetsampling/packetin_test.go (outdated)
pkg/features/antrea_features.go (outdated)
pkg/controller/packetsampling/controller.go (outdated)
pkg/controller/packetsampling/controller.go (outdated)
pkg/controller/packetsampling/controller.go (outdated)
@luolanzone
Contributor

@hangyan please add e2e tests for this new feature, similar to traceflow_test.go. Thanks.

@hangyan hangyan changed the title [WIP] Add packetsampling feature Add packetsampling feature Jan 31, 2024
@hangyan
Member Author

hangyan commented Jan 31, 2024

Hi @tnqn @wenyingd @gran-vmv, can you help review this PR too? I'm currently finishing the unit tests, and I think the main code is ready. Thanks!

@hangyan hangyan force-pushed the topic/yhang/packet-sampling branch 2 times, most recently from 9bf9515 to e6f0ee1 on February 6, 2024 11:20
hack/.notableofcontents (outdated)
build/charts/antrea/conf/antrea-agent.conf (outdated)
build/charts/antrea/conf/antrea-controller.conf (outdated)
build/charts/antrea/crds/packetsampling.yaml (outdated)
build/charts/antrea/conf/antrea-agent.conf (outdated)
pkg/agent/controller/packetsampling/packetin.go (outdated)
build/charts/antrea/conf/antrea-agent.conf (outdated)
pkg/agent/controller/packetsampling/packetin.go (outdated)
pkg/agent/controller/packetsampling/packetin.go (outdated)
Comment on lines 78 to 93
packetDirectoryUnix = "/tmp/packetsampling/packets"
packetDirectoryWindows = "C:\\packetsampling\\packets"
Contributor

Why don't we create the directory under TEMP on Windows?

}

packet.DestinationIP = net.ParseIP(dstSvc.Spec.ClusterIP)
if !packet.IsIPv6 {
Contributor

Why not move this check to the validation webhook, so that it runs when the CR is created?

Member Author

  1. Updated to use a Go function for the temp dir.
  2. Updated to use the webhook.

Contributor

+1, please use webhook to do the validation.
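A rough sketch of the kind of field-only checks that can live in a validation webhook, since they need no Kubernetes API calls. The struct shapes and rules below are simplified assumptions for illustration (for example requiring at least one Pod, and treating Pod and IP as mutually exclusive), not the actual PacketCapture API.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// endpoint is a hypothetical, simplified view of a PacketCapture source or destination.
type endpoint struct {
	Pod string
	IP  string
}

// captureSpec holds only the fields these checks need (hypothetical shape).
type captureSpec struct {
	Source      endpoint
	Destination endpoint
	IPv6        bool
}

// validateSpec performs field-only checks: no Kubernetes API calls, so it is
// cheap enough to run in an admission webhook when the CR is created.
func validateSpec(s captureSpec) error {
	if s.Source.Pod == "" && s.Destination.Pod == "" {
		return errors.New("at least one of source or destination Pod must be set")
	}
	checks := []struct {
		name string
		ep   endpoint
	}{{"source", s.Source}, {"destination", s.Destination}}
	for _, c := range checks {
		if c.ep.Pod != "" && c.ep.IP != "" {
			return fmt.Errorf("%s: Pod and IP are mutually exclusive", c.name)
		}
		if c.ep.IP == "" {
			continue
		}
		ip := net.ParseIP(c.ep.IP)
		if ip == nil {
			return fmt.Errorf("%s: invalid IP %q", c.name, c.ep.IP)
		}
		// The IP family of an explicit IP must match the requested family.
		if isIPv4 := ip.To4() != nil; isIPv4 == s.IPv6 {
			return fmt.Errorf("%s: IP %q does not match the requested IP family", c.name, c.ep.IP)
		}
	}
	return nil
}

func main() {
	ok := captureSpec{Source: endpoint{Pod: "frontend"}, Destination: endpoint{IP: "10.0.0.5"}}
	fmt.Println(validateSpec(ok)) // <nil>
	bad := ok
	bad.IPv6 = true
	fmt.Println(validateSpec(bad))
}
```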

Member Author

@hangyan hangyan Apr 2, 2024

Updated on 4.7

According to Quan's suggestions, the controller was removed. So I think the validation webhook can stay as it was? It only does some simple validation on the fields and does not call K8s APIs.

Also @tnqn, is it OK to keep the validation code in controller/<feature-name>/... when there is no controller?

Is it possible that the Pod/Service object exists during webhook validation but is deleted while the PacketSampling is running? We still need to add error checks in the code, right?

pkg/agent/openflow/pipeline.go (outdated)
pkg/agent/openflow/pipeline.go (outdated)
pkg/agent/openflow/pipeline.go (outdated)
pkg/agent/openflow/pipeline.go (outdated)
pkg/agent/openflow/pipeline.go (outdated)
Member

@tnqn tnqn left a comment

not finished yet

build/charts/antrea/conf/antrea-agent.conf (outdated)
build/charts/antrea/crds/packetsampling.yaml (outdated)
build/charts/antrea/templates/agent/clusterrole.yaml (outdated)
build/charts/antrea/templates/controller/clusterrole.yaml (outdated)
docs/packetsampling-guide.md (outdated)
pkg/agent/controller/packetsampling/packetin.go (outdated)
pkg/agent/controller/packetsampling/packetin.go (outdated)
pkg/agent/controller/packetsampling/packetin.go (outdated)
pkg/agent/controller/packetsampling/packetin.go (outdated)
pkg/agent/controller/packetsampling/packetin.go (outdated)
@luolanzone
Contributor

Hi @hangyan, I have moved this PR out of v2.0. Could you please create a separate PR for the API first? We can have a further discussion on the data path implementation. Thanks.

@hangyan
Member Author

hangyan commented Apr 23, 2024

Hi @hangyan I have moved this PR out of v2.0, could you please create a separate PR for API first. We can do a further discussion for the data path implementation. Thanks.

OK.

@hangyan hangyan force-pushed the topic/yhang/packet-sampling branch from ddfe4fb to 41cfb13 on April 23, 2024 07:55
@hangyan hangyan force-pushed the topic/yhang/packet-sampling branch 2 times, most recently from 07604b7 to 13d3d42 on September 14, 2024 08:19
@hangyan
Member Author

hangyan commented Sep 19, 2024

cc @jianjuns @antoninbas, please also continue to help review this. (There are some of Quan's comments left on the pipeline implementation, and I'm currently working on a fix. The other parts have been updated according to the API change.)

docs/feature-gates.md (outdated)
docs/packetcapture-guide.md (outdated)
docs/packetcapture-guide.md (outdated)
docs/packetcapture-guide.md (outdated)
docs/packetcapture-guide.md (outdated)
docs/packetcapture-guide.md (outdated)
docs/packetcapture-guide.md (outdated)
docs/packetcapture-guide.md (outdated)
docs/packetcapture-guide.md (outdated)
docs/packetcapture-guide.md (outdated)
@luolanzone luolanzone added the action/release-note Indicates a PR that should be included in release notes. label Oct 14, 2024
Contributor

@luolanzone luolanzone left a comment

not finished yet.

docs/feature-gates.md (outdated)
docs/feature-gates.md (outdated)
docs/packetcapture-guide.md (outdated)
docs/packetcapture-guide.md (outdated)
ci/kind/test-e2e-kind.sh (outdated)
// genEndpointMatchPackets generates match packets (with the destination Endpoint's IP/port info) in addition to the normal match packet.
// These match packets help the pipeline capture Pod -> Service traffic.
// TODO: 1. support named ports 2. dual-stack support
func (c *Controller) genEndpointMatchPackets(pc *crdv1alpha1.PacketCapture) ([]binding.Packet, error) {
Contributor

@luolanzone luolanzone Oct 14, 2024

I agree with not supporting Pod-to-Service at the moment. @jianjuns @antoninbas, please share your input if you have a different view. Thanks.

ofPort = uint32(podInterfaces[0].OFPort)
senderPacket = packet
klog.V(2).InfoS("PacketCapture sender packet", "packet", *packet)
if senderOnly && pc.Spec.Destination.Service != "" {
Contributor

It seems you missed another case: senderOnly && pc.Spec.Destination.IP != "". I also feel senderOnly is not a proper name; maybe emptyDestinationPod?
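One way to address the missed-case concern is to classify the source/destination combination once, with every case spelled out, instead of deriving several booleans such as `senderOnly`. The types, mode names, and the set of combinations below are hypothetical and simplified:

```go
package main

import "fmt"

// endpoint is a hypothetical, simplified view of a PacketCapture source/destination.
type endpoint struct {
	Pod, Service, IP string
}

type captureMode int

const (
	modeInvalid      captureMode = iota
	modePodToPod                 // both Pods are known
	modePodToIP                  // destination is an IP (no destination Pod)
	modePodToService             // destination is a Service (no destination Pod)
	modeIPToPod                  // only the destination Pod is known
)

// classify enumerates every supported combination explicitly, so a case such as
// "source Pod with destination IP" cannot be skipped silently.
func classify(src, dst endpoint) captureMode {
	switch {
	case src.Pod != "" && dst.Pod != "":
		return modePodToPod
	case src.Pod != "" && dst.IP != "":
		return modePodToIP
	case src.Pod != "" && dst.Service != "":
		return modePodToService
	case src.Pod == "" && dst.Pod != "":
		return modeIPToPod
	default:
		return modeInvalid
	}
}

func (m captureMode) String() string {
	return [...]string{"invalid", "pod->pod", "pod->ip", "pod->service", "ip->pod"}[m]
}

func main() {
	fmt.Println(classify(endpoint{Pod: "frontend"}, endpoint{IP: "10.0.0.5"})) // pod->ip
}
```

A `switch` over the mode then forces each flow-generation path to be written (or explicitly rejected) per case.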

Member Author

This was only for the Service case, to generate flows for Endpoints. Since we are planning to remove it, I will revisit this after that is settled.

@hangyan hangyan force-pushed the topic/yhang/packet-sampling branch 2 times, most recently from d10cfd0 to 3e56a16 on October 14, 2024 10:20
}
}

c.runningPacketCapturesMutex.Lock()
Contributor

There are multiple returns between c.runningPacketCapturesMutex.Lock() and c.runningPacketCapturesMutex.Unlock(). If the function returns before Unlock() is executed, the mutex stays locked and later callers will deadlock. Please wrap this section in a function and use defer to unlock:

func() {
    c.runningPacketCapturesMutex.Lock()
    defer c.runningPacketCapturesMutex.Unlock()
    ...
}()

Member Author

done

Comment on lines 338 to 352
pcState.shouldSyncPackets = len(podInterfaces) > 0
if !pcState.shouldSyncPackets {
return nil
}
Contributor

Does it mean you only capture the packets on the destination Node?

Member Author

Not exactly. We first select a target Pod, then use that Pod to select the target Node. The Pod could be either the source or the destination, so the Node could be either as well.
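The selection described above could be sketched like this. Which Pod is chosen when both are set is not stated in the thread, so preferring the source Pod is an assumption, as are the `podRef` type and the function names; the local-interface check mirrors the idea behind `len(podInterfaces) > 0` in the snippet.

```go
package main

import "fmt"

// podRef is a hypothetical namespace/name pair for a Pod in the PacketCapture spec.
type podRef struct{ Namespace, Name string }

// targetPod returns the Pod that decides which Node runs the capture.
// Assumption: the source Pod is preferred when both are set.
func targetPod(src, dst *podRef) *podRef {
	if src != nil {
		return src
	}
	return dst
}

// shouldCaptureOnThisNode: an agent proceeds only if the target Pod has an
// interface on its own Node (localPods lists the Pods local to this Node).
func shouldCaptureOnThisNode(localPods map[podRef]bool, src, dst *podRef) bool {
	t := targetPod(src, dst)
	return t != nil && localPods[*t]
}

func main() {
	local := map[podRef]bool{{"default", "backend"}: true}
	fmt.Println(shouldCaptureOnThisNode(local, nil, &podRef{"default", "backend"}))                            // true
	fmt.Println(shouldCaptureOnThisNode(local, &podRef{"default", "frontend"}, &podRef{"default", "backend"})) // false
}
```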

Comment on lines 361 to 387
var file afero.File
filePath := uidToPath(string(pc.UID))
if _, err := os.Stat(filePath); err == nil {
return fmt.Errorf("packet file already exists. this may be due to an unexpected termination")
} else if os.IsNotExist(err) {
file, err = defaultFS.Create(filePath)
if err != nil {
return fmt.Errorf("failed to create pcapng file: %w", err)
}
} else {
return fmt.Errorf("couldn't check if the file exists: %w", err)
}
writer, err := pcapgo.NewNgWriter(file, layers.LinkTypeEthernet)
if err != nil {
return fmt.Errorf("couldn't initialize pcap writer: %w", err)
}
Contributor

Probably wrap this into a function such as getFileAndWriter.

Member Author

done

}
packet.IsIPv6 = pc.Spec.Packet.IPFamily == v1.IPv6Protocol

if receiverOnly {
Contributor

receiverOnly = pc.Spec.Destination.Pod != nil && pc.Spec.Source.Pod == nil
When it's receiverOnly, why is the destination Pod in the spec ignored? Only the source IP and destination MAC are assigned here.

Member Author

We have an ofPort matcher in the flow that serves this purpose in this case:

			MatchDstMAC(packet.DestinationMAC).
			Action().LoadToRegField(TargetOFPortField, ofPort).

} else if inputProto.StrVal == "UDP" {
return protocol.Type_UDP
} else {
return protocol.Type_IPv6ICMP
Contributor

This is not correct. I don't see the protocol values being restricted in the manifest:

                   protocol:
                      x-kubernetes-int-or-string: true

I can see we set port as x-kubernetes-int-or-string in most cases, but we should limit the protocol to three string values, TCP, UDP, and ICMP, as you mentioned in the usage document.
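A sketch of the stricter mapping, rejecting unknown values instead of falling through to ICMP in a final `else`. The constants are the standard IANA protocol numbers; the function name is hypothetical. On the manifest side, the equivalent restriction would presumably be a string-typed field with `enum: [TCP, UDP, ICMP]` instead of `x-kubernetes-int-or-string`.

```go
package main

import "fmt"

// IANA protocol numbers for the protocols the usage document lists.
const (
	protoICMP   uint8 = 1
	protoTCP    uint8 = 6
	protoUDP    uint8 = 17
	protoICMPv6 uint8 = 58
)

// protocolNumber maps the allowed protocol strings to IP protocol numbers and
// rejects anything else, instead of silently defaulting to ICMP.
func protocolNumber(name string, isIPv6 bool) (uint8, error) {
	switch name {
	case "TCP":
		return protoTCP, nil
	case "UDP":
		return protoUDP, nil
	case "ICMP":
		if isIPv6 {
			return protoICMPv6, nil
		}
		return protoICMP, nil
	default:
		return 0, fmt.Errorf("unsupported protocol %q: must be one of TCP, UDP, ICMP", name)
	}
}

func main() {
	for _, p := range []string{"TCP", "ICMP", "SCTP"} {
		n, err := protocolNumber(p, false)
		fmt.Println(p, n, err)
	}
}
```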

Member Author

Yeah, I'm waiting for the updates from the API PR comments.

cc @antoninbas to confirm this.

authConfig := getDefaultFileServerAuth()
serverAuth, err := ftp.ParseBundleAuth(*authConfig, c.kubeClient)
if err != nil {
klog.ErrorS(err, "Failed to get authentication defined in the PacketCapture CR", "name", pc.Name, "authentication", authConfig)
Contributor

Is this log message correct? The authentication is not defined by the CR but comes from a predefined Secret, right?

Member Author

updated

@luolanzone
Contributor

@hangyan don't forget to resolve code conflicts in pkg/agent/supportbundlecollection/support_bundle_controller.go

Signed-off-by: Hang Yan <[email protected]>
@hangyan
Member Author

hangyan commented Oct 16, 2024

@hangyan don't forget to resolve code conflicts in pkg/agent/supportbundlecollection/support_bundle_controller.go

Done. All fixed. Please take a look again.

Signed-off-by: Hang Yan <[email protected]>
Signed-off-by: Hang Yan <[email protected]>
Signed-off-by: Hang Yan <[email protected]>
Signed-off-by: Hang Yan <[email protected]>
Contributor

@luolanzone luolanzone left a comment

Can you move the API-related changes into a dedicated commit, and add the details to the commit message of the data path change? This will help simplify code review. Please do the same if you plan to open a new PR for the new data path implementation.

packet file from the sftp server (or from local antrea-agent pod) and analyze its contents with network diagnostic tools
like Wireshark or tcpdump.

Currently we support max to `15` concurrrent PacketCapture session running at the same time.
Contributor

Suggested change
Currently we support max to `15` concurrrent PacketCapture session running at the same time.
Currently, we support a maximum of 15 concurrent sessions running at the same time.

dstPort: 8080 # Destination port needs to be set when the protocol is TCP/UDP.
status:
numCapturedPackets: 5
# path format: <pod-name>:<filepath>. If this file was uploaded to the target file server, filename format is <uid>.pcapng
Contributor

Suggested change
# path format: <pod-name>:<filepath>. If this file was uploaded to the target file server, filename format is <uid>.pcapng
# path format: <pod-name>:<filepath>. If this file was uploaded to the target file server, the filename format is <uid_of_packetcapture>.pcapng

Contributor

Should we say "The path of the packets file that is uploaded to the target file server, in the format of :. The PacketCapture CR UID is used as the file name".
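For reference, a value in the `<pod-name>:<filepath>` format from the status comment can be split safely at the first colon, because Pod names cannot contain `:`. This keeps file paths that contain a colon, such as Windows drive letters, intact. The function name and sample path are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// splitCapturePath splits a status path of the form "<pod-name>:<filepath>".
// Pod names are DNS subdomain names and cannot contain ':', so cutting at the
// first ':' keeps file paths that contain a colon (e.g. Windows drive letters) intact.
func splitCapturePath(s string) (string, string, error) {
	podName, filePath, found := strings.Cut(s, ":")
	if !found || podName == "" || filePath == "" {
		return "", "", fmt.Errorf("invalid capture path %q: want <pod-name>:<filepath>", s)
	}
	return podName, filePath, nil
}

func main() {
	pod, path, err := splitCapturePath(`antrea-agent-abcde:C:\tmp\example.pcapng`)
	fmt.Println(pod, path, err)
}
```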

Member Author

The CR above starts a new packet capture of TCP flows from a Pod named `frontend`
to the port 8080 of a Pod named `backend` using TCP protocol. It will capture the first 5 packets
that meet this criterion and upload them to the specified sftp server. Users can download the
packet file from the sftp server (or from local antrea-agent pod) and analyze its contents with network diagnostic tools
Contributor

We should provide the path info in the antrea-agent Pod. Also, "local antrea-agent pod" is unclear: which antrea-agent Pod will it be? The one on the Node where the target Pod is running?

@luolanzone luolanzone removed the action/release-note Indicates a PR that should be included in release notes. label Oct 30, 2024
@luolanzone luolanzone removed this from the Antrea v2.2 release milestone Oct 30, 2024
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] A new PacketSampling CRD
6 participants