High memory consumption for UDP associations on Linux (OpenWRT) #745
Comments
I ran the test tool for nearly 15 minutes and never saw memory usage exceed 7MiB. 400MiB is just impossible. |
The reason you did not exceed 7M is that you did not turn on UDP forwarding.
Turn on UDP forwarding, use the UDP test tool I gave you, and try it. |
Of course I enabled the UDP relay. Just read the code I provided; I only sent UDP DNS queries. |
That is, the way your test tool sends data is too simple to trigger the problem easily. I have given you the source code of my test tool. You can compile it yourself and try it. |
Did you not see my screenshot? Am I cheating? |
The |
|
openwrt is used as a router, dns.exe is running under my windows 11 host, |
My test code is exactly the same as yours. You didn't bind the outbound interface, which means that you will have to add route entries for your test name servers to make this test working. I didn't see any difference between So instead of modifying your |
The tcp connection is my other web browsing, I forward all the lan port connections. I can't understand the code, but I have provided you with a way to trigger, |
Since we are talking about a memory leak, it should happen no matter what destination or port you use, just by creating UDP associations rapidly in a short time. Your You cannot test it without a clean environment; maybe the problem you are facing is completely unrelated to UDP, which is just wasting our time. I am now going to run |
/etc/config/network
lan----windows
|
I have told you the test environment, and even the source code of the trigger. If you don't do it, you still question me. If you follow your own logic, you will not find the problem. Are the screenshots I gave you fake? I can trigger, but you can't trigger, is this my problem or your problem? |
I have reported several bugs to you, which one was not confirmed in the end? |
You are reporting bugs to an open-source project. I am not paid and I do not work on it full time. If you continue to act as though I must solve your problem or do my best to serve you, the conversation ends here. Remember, you are not a customer to me. We are just software developers. I won't follow your configuration because you are adding variables and your test method is just nonsense. And I can confirm that I can see nearly 400MiB RSS on R4S after opening 100 UDP clients sending queries continuously: But the memory consumption is stable and starts to drop (the clients are still here): The test is still running, and I cannot see any memory leak because memory usage is very stable. So if you are willing to go on and see why it takes 400MiB to maintain 65536 UDP associations, we can go on. But I think I have already reached a conclusion: there is no memory leak. |
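As a rough sanity check on the figures above, the average cost per association can be worked out directly. This is only a back-of-the-envelope sketch: it assumes the whole 400MiB is attributable to the 65536 associations and ignores the process baseline, and the function name is made up for illustration.

```python
MIB = 1024 * 1024

def bytes_per_association(total_rss_bytes: int, associations: int) -> float:
    """Average resident memory per association (upper bound, baseline not subtracted)."""
    if associations <= 0:
        raise ValueError("associations must be positive")
    return total_rss_bytes / associations

# Figures quoted in this thread: ~400MiB RSS, 65536 UDP associations.
per_assoc = bytes_per_association(400 * MIB, 65536)
print(f"{per_assoc:.0f} bytes (~{per_assoc / 1024:.2f} KiB) per association")
```

So the question is less whether memory is leaking and more why each association would hold roughly 6 KiB on average.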
If these dns requests have closed the socket, shouldn't the memory be freed? |
I have tested https://github.com/xjasonlyu/tun2socks , and the function is the same. After I concurrently make a dns request, the memory will fall back to more than 100M after 5 minutes. Rust should take up less memory. |
When I just started sslocal, it took up less than 10M of memory, but after I requested a lot of dns, the memory fell back to less than 10M |
I don't understand technology, I just ask my question, because the memory of the router is relatively small when it is thrown into openwrt, so I have such doubts. |
https://github.com/xjasonlyu/tun2socks This is the tun2socks I mean. Its peak is 1.3G, and it eventually falls back to 180M. The peak value of sslocal is 1.1G, and it only falls back to 450M in the end. |
I heard from others that rust will occupy less memory, but it can only fall back to 450M, so it will not continue. I want to know what he does with that memory, and why it is not returned to the system. |
This is not an easy job to find out. I am now working on reviewing the code. |
Ok, I'm not sure if this is a problem with the code, I'm just asking my own question. |
I don't know what's going on. I have been using golang's tun2socks before, but go takes up a lot of memory, so I tried to use sslocal, but the current performance seems to be worse. |
https://github.com/shadowsocks/shadowsocks-rust/actions/runs/1704368740 VmHWM: 1172244 kB I tested this and it did not solve the problem, but the speed of reclaiming memory seems to be accelerated, but it also stops when it returns to around 450M. |
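For anyone comparing numbers: `VmHWM` in `/proc/<pid>/status` is the peak resident set size, while `VmRSS` is the current one, so the two should be read together. Below is a small sketch for pulling both out of a status dump. The helper names are hypothetical, and the `VmRSS` value in the sample is invented to match the "around 450M" reported here, not a measured value.

```python
import re

def parse_vm_fields(status_text: str) -> dict:
    """Return the Vm* fields of a /proc/<pid>/status dump as {name: kB}."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"^(Vm\w+):\s+(\d+)\s+kB$", line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

def read_vm_fields(pid: int) -> dict:
    """Read and parse the status file of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_fields(handle.read())

# VmHWM is the value quoted in this thread; VmRSS is an illustrative placeholder.
sample = "Name:\tsslocal\nVmHWM:\t 1172244 kB\nVmRSS:\t  460800 kB\n"
fields = parse_vm_fields(sample)
print(f"peak {fields['VmHWM'] / 1024:.0f} MiB, current {fields['VmRSS'] / 1024:.0f} MiB")
```

Sampling `read_vm_fields(pid)` every few seconds after the test stops would show whether the resident size keeps falling or plateaus.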
It isn't a fix for this issue, just an optimization that I found while reviewing. |
It is recommended to add the s5 protocol to the tun mode. I have tested some new problems, but my comparison test is based on the transparent socks5 protocol. If you test with 10,000 tcp requests concurrently, and then perform speedtest speed test, the rate will be very low! But I use another golang tun2socks based on socks5 protocol but it won't. |
Just keep in mind that this is a shadowsocks Project. |
The performance issue may come from the new library |
- ref #745, significantly increase throughput, but still slower than system network stack
https://github.com/wenjiax/stress
Use stress to run 10,000 concurrent times for 2 minutes, then shut down, and then use www.speedtest.net to test the speed. The downlink speed is only 20Mbps at most, but before the concurrent speed test, the downlink speed can be above 250Mbps. |
After I use this project, there won't be any slowdown |
You don't need to test every commit I make. I will tell you when it is ready for testing. And you can stop mentioning tun2socks, which uses |
Except for Let me explain more about the current problems found in
But since my virtual machine only has 1 core, all the tasks now run in 1 thread, which means that they don't need to compete with each other for this lock. I don't want to conclude that If you have any thoughts about where the bottleneck of the program is, please comment. |
- ref #745 - reduce manager lock acquisition compete among TcpSockets and Interface::poll
@f4nff Please test the latest commit and see if it solves this issue in your environment. |
https://github.com/shadowsocks/shadowsocks-rust/actions/runs/1722149352 After the tcp concurrency test, the speed is measured, and the speed drops a lot. Just started the speed test with 200Mbps+, now only... |
This is the test locally in my virtual machine (x86_64):
Why is your environment so special? :( |
Two small suggestions you might consider: |
@f4nff Try shortening |
Actually speedtest is working quite well in my environment. You may test it again with the latest build (the commit you tested has bugs). Another test command line didn't show any results:
I modified
This |
It's not convenient for me to test now, I will retest later |
Did you get any results? @f4nff Would you consider using another OpenWRT distribution instead of FriendlyWRT? |
This version is obviously much faster than before, but after the concurrent UDP test it still occupies a lot of memory, which is not released. Generally speaking, it is still not as good as https://github.com/xjasonlyu/tun2socks |
I analyzed it, and what tun2socks-go uses is: The author is reluctant to add socks5 to the handshake protocol, so I can't give more exact comparison data now. |
I just couldn't reproduce it locally, because I got 500M on speedtest.net (the maximum bandwidth of my public network). So I still think something special is happening only in your environment. You can run iperf / iperf3 on the local network to avoid the bottleneck of your outbound network (300M) and reduce the variables in your environment. The current TCP stack still has some performance problems compared to the system's builtin one, but they only show up when approaching 1000M. I will continue working on this problem. And I don't think the protocol is the problem, because R4S should easily run encryption protocols when handling 1000M bandwidth. On the other hand, the memory consumption problem should already have a conclusion. BTW, do you have a software development background? If so, could you help find out exactly what differs between tun2socks and sslocal in your local environment? It is very hard to locate the exact problem if I cannot reproduce it locally. The optimization I just did is based on imagination and experience. |
I don't know how to develop, but I have been in charge of testing. I think you'd better add socks5 protocol, so that I can give you accurate test feedback. Besides, socks5 is used very frequently. |
Test equipment: r4s
Test system: openwrt
https://github.com/shadowsocks/shadowsocks-rust/suites/4952439929/artifacts/143776137
download: https://github.com/shadowsocks/shadowsocks-rust/actions/runs/1703977841
/etc/sysctl.d/11-nf-conntrack.conf
/root/socks/sslocal --protocol tun -s "[::1]:8388" -m "aes-256-gcm" -k "hello-kitty" --outbound-bind-interface lo --tun-interface-name tun1 -U --udp-timeout 60 --udp-max-associations 65535
dns.exe -sr gddx.txt -at google.com -sl 6
dns-test.zip
Run dns.exe for 30 seconds, then shut it down.
Then wait 5 minutes for the number of connections to drop.
At that point sslocal still occupies more than 400M of memory and does not release any more of it.
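For readers without the Windows tool, the general load pattern (many short-lived UDP clients, each sending one DNS query) can be approximated with a short script. This is a sketch under assumptions, not a port of dns.exe, whose exact behavior is not described here: `build_dns_query` and `burst` are hypothetical helpers, and the resolver address in the usage note is only an example.

```python
import socket
import struct

def build_dns_query(name: str, txid: int) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=A (1), QCLASS=IN (1).
    return header + qname + struct.pack("!HH", 1, 1)

def burst(server: str, name: str, count: int, port: int = 53) -> int:
    """Send `count` queries, each from a fresh socket; return how many were sent.

    A fresh socket gets a new ephemeral source port, so a UDP relay sees
    each query as coming from a new client address.
    """
    sent = 0
    for i in range(count):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(build_dns_query(name, i & 0xFFFF), (server, port))
            sent += 1
    return sent
```

Something like `burst("8.8.8.8", "google.com", 10000)` from a LAN host routed through the tun interface would be one way to create associations quickly; the replies are deliberately not read, since only association creation matters for this test.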