Forward memory explode after 24h of running #1134

Open
denis-tingaikin opened this issue Jul 8, 2024 · 10 comments
@denis-tingaikin
Member

denis-tingaikin commented Jul 8, 2024

Steps to reproduce

  1. Run the forwarder for 24-48 hours.

Actual: memory usage explodes and the forwarder restarts.
Expected: the forwarder should remain stable.

Details

image

Reproduced with releases: v1.12.0, v1.13.0, v1.13.1

forwarder-vpp_goroutineprofiles_20240706091528.tar.gz
forwarder-vpp_memprofiles_20240706091613.tar.gz

@denis-tingaikin denis-tingaikin self-assigned this Jul 8, 2024
@ljkiraly
Contributor

This time the problem occurred after 8 hours of testing.
~87 MB memory increase during 1 hour:
image002

Goroutine profiles:
forwarder-vpp_goroutineprofiles_20240711170319.tar.gz
Memory profiles:
forwarder-vpp_memprofiles_20240711170404.tar.gz

Preparing the logs.

@denis-tingaikin
Member Author

Yeah, it looks the same as after 24+ hours of running. Waiting for the logs.

@ljkiraly
Contributor

Forwarder logs for the latest occurrence:
forwarder-vpp-mem-explosion-logs-20240711-1.tar.gz

@denis-tingaikin
Member Author

Could we also get the mapping of pods to nodes?

@denis-tingaikin
Member Author

As far as I can see from the logs, forwarder-vpp-qqpfp is located on node n4, right?

@denis-tingaikin
Member Author

@ljkiraly I've prepared a fix that should improve the situation significantly: tinden/cmd-forwarder-vpp:v1.13.2-fix.1. If it works, we'll need to run a few more tests to arrive at a more effective solution.

@ljkiraly
Contributor

ljkiraly commented Jul 11, 2024

as I can see from logs forwarder-vpp-qqpfp is located on the node n4, right?

Yes.

forwarder-vpp-56726   1/1   Running   0   9h   10.0.10.106   pool1-ceph-seliics06744-n7
forwarder-vpp-72kmm   1/1   Running   0   9h   10.0.10.103   pool1-ceph-seliics06746-n5
forwarder-vpp-9k6w4   1/1   Running   0   9h   10.0.10.101   control-plane-control-seliics06748-n1
forwarder-vpp-llbts   1/1   Running   0   9h   10.0.10.102   control-plane-control-seliics06750-n3
forwarder-vpp-qqpfp   1/1   Running   0   9h   10.0.10.104   pool1-ceph-seliics06745-n4
forwarder-vpp-tnsd9   1/1   Running   0   9h   10.0.10.105   pool1-ceph-seliics06747-n6
forwarder-vpp-wsx8x   1/1   Running   0   9h   10.0.10.100   control-plane-control-seliics06749-n2
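
For anyone who needs to collect the same pod-to-node mapping on another cluster, here is a minimal sketch. The function name `pods_to_nodes` is hypothetical, and it assumes the forwarder pods are named `forwarder-vpp-*` as in the listing above; the namespace is whatever your deployment uses.

```shell
# Print one "pod node" pair per line for pods whose name starts with a prefix.
# Assumption: forwarder pods are named forwarder-vpp-*, as in the listing above.
pods_to_nodes() {
  namespace="$1"
  prefix="${2:-forwarder-vpp}"
  kubectl get pods -n "$namespace" --no-headers \
    -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName |
    awk -v p="$prefix" 'index($1, p) == 1 { print $1, $2 }'
}
```

Usage: `pods_to_nodes "$namespace"`. Plain `kubectl get pods -n "$namespace" -o wide` gives the same information with more columns.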

@ljkiraly
Contributor

@denis-tingaikin The test has now failed after 16 hours with tinden/cmd-forwarder-vpp:v1.13.2-fix.1.
This time no profile server was listening on the forwarder pod, so no profiles could be collected, but please find the logs attached.
forw-mem-expl-20240713
logs-2024-07-13-09-15-33.tar.gz

@denis-tingaikin
Member Author

denis-tingaikin commented Jul 16, 2024

@ljkiraly OK, we have some small differences here.

With the previous patch, memory grew by ~87 MB in 1 hour.
With the latest patch, memory grew by ~50 MB in 1 hour.

(It might be just luck, but let's keep an eye on it.)
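
To put the two figures side by side, here is a small arithmetic sketch. The helper name `growth_reduction_pct` is hypothetical, and both inputs are single one-hour samples, so the result is only a rough indication.

```shell
# Percentage reduction from an old hourly growth rate to a current one.
# Inputs are the two growth figures quoted above, in MB per hour.
growth_reduction_pct() {
  awk -v old="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (old - cur) / old * 100 }'
}

growth_reduction_pct 87 50   # prints 42.5
```

That is roughly a 42.5% lower growth rate, but the memory is still growing, so the leak itself is not yet fixed.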

I also prepared a new version of forwarder-vpp: tinden/cmd-forwarder-vpp:v1.13.2-fix.3 

The profile server is available.

Question: Is it possible to get streamed (followed) logs from the forwarders?

That is, run something like this for every forwarder before the test starts:

kubectl logs -f -n $namespace $forwarderName1 > fwd1.txt
kubectl logs -f -n $namespace $forwarderName2 > fwd2.txt

It could show more details.
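
With seven forwarders it may be easier to start the streams in a loop. A minimal sketch, assuming the pods are named `forwarder-vpp-*`; the function name `stream_forwarder_logs` and the output directory are hypothetical:

```shell
# Start one background `kubectl logs -f` per forwarder pod, one file per pod.
# Assumption: forwarder pod names start with "forwarder-vpp".
stream_forwarder_logs() {
  namespace="$1"
  outdir="${2:-.}"
  mkdir -p "$outdir"
  # `-o name` prints entries of the form pod/<name>.
  for pod in $(kubectl get pods -n "$namespace" -o name | grep '^pod/forwarder-vpp'); do
    name="${pod#pod/}"
    kubectl logs -f -n "$namespace" "$name" > "$outdir/$name.txt" 2>&1 &
  done
}
```

Usage: `stream_forwarder_logs "$namespace" ./fwd-logs` before the test starts, then stop the background streams afterwards (for example with `kill $(jobs -p)`). A stream ends when its container restarts, so each file should hold the output up to the crash.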

@ljkiraly
Contributor

Very promising result from the test started at 2024-07-16 16:17:59 with image tinden/cmd-forwarder-vpp:v1.13.2-fix.3.
The test is still running. Latest memory profiles:
forwarder-vpp_goroutineprofiles_20240719093900.tar.gz
Memory consumption in last 12 hours:
fix3-memory-usage
