
Measure performance of NSM cmd-forwarder-vpp and provide results #123

Open · denis-tingaikin opened this issue Apr 9, 2021 · 9 comments
Labels: bug, enhancement, performance

@denis-tingaikin (Member) commented Apr 9, 2021

Description

We need to measure the performance of NSM cmd-forwarder-vpp.

Steps

  1. Estimate the maximum number of connections that cmd-forwarder-vpp can support with the default configuration, measuring ping latency along the way (see the sketch after this list).
  2. Estimate performance for each combination of identical mechanisms for 1...N client/endpoint pairs, where N is the number from step 1. Mechanisms: memif2memif, kernel2kernel, memif2vxlan2memif, kernel2vxlan2kernel, kernel2wireguard2kernel, memif2wireguard2memif.
  3. Estimate performance for each combination of different mechanisms for 1...N client/endpoint pairs, where N is the number from step 1: memif2kernel, kernel2memif, memif2vxlan2kernel, kernel2vxlan2memif, memif2wireguard2kernel, kernel2wireguard2memif.
  4. Report each issue found during testing.
  5. Provide the benchmark results.
  6. Provide performance/latency/memory/CPU charts for each mechanism combination.
  7. Recommend ways to improve performance.
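A minimal sketch of the latency measurement from step 1, assuming each established connection exposes a pingable endpoint IP. The `pingOnce` helper and the address list are illustrative, not part of cmd-forwarder-vpp; the helper shells out to `ping` and parses the reported RTT:

```go
package main

import (
	"fmt"
	"os/exec"
	"regexp"
	"strconv"
)

// rttRe extracts the RTT in ms from a line like:
// "64 bytes from 172.16.1.2: icmp_seq=1 ttl=64 time=0.041 ms"
var rttRe = regexp.MustCompile(`time=([0-9.]+) ms`)

// pingOnce sends a single ICMP echo to addr and returns the RTT in milliseconds.
func pingOnce(addr string) (float64, error) {
	out, err := exec.Command("ping", "-c", "1", "-W", "1", addr).CombinedOutput()
	if err != nil {
		return 0, fmt.Errorf("ping %s failed: %w", addr, err)
	}
	m := rttRe.FindSubmatch(out)
	if m == nil {
		return 0, fmt.Errorf("no RTT in ping output for %s", addr)
	}
	return strconv.ParseFloat(string(m[1]), 64)
}

func main() {
	// Endpoint IPs of already-established connections (illustrative values).
	conns := []string{"172.16.1.2", "172.16.1.4", "172.16.1.6"}
	for _, addr := range conns {
		rtt, err := pingOnce(addr)
		if err != nil {
			fmt.Printf("%s: FAILED (%v)\n", addr, err)
			continue
		}
		fmt.Printf("%s: %.3f ms\n", addr, rtt)
	}
}
```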

Raw estimation

8d

denis-tingaikin added the enhancement label on Apr 9, 2021
@edwarnicke (Member) commented:
This is a good task for 'component performance'... but we will also want 'systemic' performance measurements as well :)

@Mixaster995 (Contributor) commented Jul 12, 2021

Ran some local testing and collected results.

Setup

N clients, 1 endpoint; each client makes 1 request and checks the connection.
On each new request, every client that has already established a connection also re-checks its connection (see the sketch below).
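A rough sketch of that loop, reusing the illustrative `pingOnce` helper from the sketch in the issue description; `establishConnection` is a hypothetical stand-in for the real NSM request path and simply returns the new connection's endpoint IP:

```go
// connectAndRecheck brings up clients one at a time; after each new
// connection it re-pings every connection established so far.
func connectAndRecheck(n int, establishConnection func(i int) (string, error)) error {
	var established []string
	for i := 0; i < n; i++ {
		addr, err := establishConnection(i)
		if err != nil {
			return fmt.Errorf("client %d failed to connect: %w", i, err)
		}
		established = append(established, addr)
		// Re-check all connections established so far, including the new one.
		for _, a := range established {
			rtt, err := pingOnce(a)
			if err != nil {
				return fmt.Errorf("re-check of %s failed after %d clients: %w", a, i+1, err)
			}
			fmt.Printf("clients=%d addr=%s rtt=%.3f ms\n", i+1, a, rtt)
		}
	}
	return nil
}
```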

Intermediate results

I measured the maximum number of connections for the KernelToKernel and MemifToMemif cases.
I didn't find any critical issues or performance problems; latency is quite low. In the kernelToKernel case, though, latency sometimes jumps quite high in single tests, and I haven't figured out the reason for this behaviour yet.
The Memif case is slower but more stable than Kernel, although it only works up to 20 connections on my local PC. Kernel sustains around 100-105 connections, but sometimes fails to connect.

Adding the results I've collected, plus a few charts.
Text results (csv): measurements.tar.gz
charts:
- memifToMemif20
- memifToMemif23
- memifToMemif25
- kernelToKernel150(1st-run-part-1)
- kernelToKernel150(1st-run-part-2)
- kernelToKernel150(2nd-run-part-1)
- kernelToKernel150(2nd-run-part-2)
- kernelToKernel150(3rd-run-part-1)
- kernelToKernel150(3rd-run-part-2)
- kernelToKernel150(4th-run-part-1)
- kernelToKernel150(4th-run-part-2)
- kernelToKernel150(5th-run-part-1)
- kernelToKernel150(5th-run-part-2)

Question

@edwarnicke, what do you think about testing this on a cloud cluster instead of the local environment? And if you agree, which cloud environment would you like me to start with?

@denis-tingaikin (Member, Author) commented Jul 12, 2021

@Mixaster995 Please unzip all the charts and attach them to the issue.

@denis-tingaikin (Member, Author) commented:
@Mixaster995 Do you have any investigation results related to the kernel interface instability?

@denis-tingaikin (Member, Author) commented:
@edwarnicke I think this testing will be more useful if we move to system-level testing. I also think we need to test performance on all our public clusters, starting with GKE. WDYT?

@Mixaster995 (Contributor) commented:
I made some changes to the testing and measured network bandwidth with the iperf3 utility. So far I've collected results for the "N kernel clients to a single kernel endpoint interface" case. Charts for these measurements are attached; a sketch of the measurement harness follows the chart list.

- kernel_to_kernel_5
- kernel_to_kernel_10
- kernel_to_kernel_15
- kernel_to_kernel_20
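A minimal sketch of how such bandwidth numbers could be collected, assuming iperf3 is installed and an iperf3 server is already listening at each endpoint address (addresses here are illustrative). It uses iperf3's `--json` output and reads `end.sum_received.bits_per_second`, the standard field for a TCP run:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// iperfResult holds just the fields we need from `iperf3 --json` output.
type iperfResult struct {
	End struct {
		SumReceived struct {
			BitsPerSecond float64 `json:"bits_per_second"`
		} `json:"sum_received"`
	} `json:"end"`
}

// measureBandwidth runs a 10-second iperf3 TCP test against server
// and returns the received throughput in Mbit/s.
func measureBandwidth(server string) (float64, error) {
	out, err := exec.Command("iperf3", "-c", server, "-t", "10", "--json").Output()
	if err != nil {
		return 0, fmt.Errorf("iperf3 against %s failed: %w", server, err)
	}
	var res iperfResult
	if err := json.Unmarshal(out, &res); err != nil {
		return 0, fmt.Errorf("parsing iperf3 output: %w", err)
	}
	return res.End.SumReceived.BitsPerSecond / 1e6, nil
}

func main() {
	// Endpoint addresses of the clients' connections (illustrative values).
	for _, server := range []string{"172.16.1.2", "172.16.1.4"} {
		mbps, err := measureBandwidth(server)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s: %.1f Mbit/s\n", server, mbps)
	}
}
```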

@edwarnicke (Member) commented:
I'd be curious to see some other combinations like memif to memif :)

@Mixaster995 (Contributor) commented:
- kernel2vxlan2kernel-2
- kernel2vxlan2kernel-3
- kernel2vxlan2kernel-5

- kernel2wireguard2kernel-2
- kernel2wireguard2kernel-3
- kernel2wireguard2kernel-4
- kernel2wireguard2kernel-5

@Mixaster995 (Contributor) commented:
Prepared a test stand for local testing in forwarder-vpp, but a weird bug prevented me from using it correctly. After a discussion with the team, I decided to switch to more "real" testing that doesn't hit that bug. The infrastructure is now very similar to ordinary deployments, but with sidecar containers on the NSCs and NSEs that start iperf and collect the resulting data. I'm currently working in this direction and deploying everything locally on kind. Once successful results are obtained, I'll move on to the other clouds (GKE, AKS, AWS, etc.). A sketch of the sidecar wiring is below.
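A minimal sketch of that sidecar wiring using the k8s.io/api/core/v1 types; the image name, mount paths, and helper name are illustrative assumptions, not the actual manifests used:

```go
package sidecar

import (
	corev1 "k8s.io/api/core/v1"
)

// iperfSidecar returns a container that can be appended to an NSC or NSE
// pod spec: it runs an iperf3 server and writes JSON results to a shared
// volume for later collection. Image and paths are illustrative.
func iperfSidecar() corev1.Container {
	return corev1.Container{
		Name:    "iperf-sidecar",
		Image:   "networkstatic/iperf3:latest",
		Command: []string{"iperf3"},
		Args:    []string{"-s", "--json", "--logfile", "/results/iperf.json"},
		VolumeMounts: []corev1.VolumeMount{
			{Name: "results", MountPath: "/results"},
		},
	}
}
```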

denis-tingaikin added the performance and bug labels on Sep 7, 2021