
High memory consumption for UDP associations on Linux (OpenWRT) #745

Closed
f4nff opened this issue Jan 16, 2022 · 61 comments


f4nff commented Jan 16, 2022

Test device: R4S
Test system: OpenWRT

https://github.com/shadowsocks/shadowsocks-rust/suites/4952439929/artifacts/143776137
Download: https://github.com/shadowsocks/shadowsocks-rust/actions/runs/1703977841

/etc/sysctl.d/11-nf-conntrack.conf

# Do not edit, changes to this file will be lost on upgrades
# /etc/sysctl.conf can be used to customize sysctl settings

net.netfilter.nf_conntrack_acct=1
net.netfilter.nf_conntrack_checksum=0
net.netfilter.nf_conntrack_max=1020000
net.netfilter.nf_conntrack_tcp_timeout_established=7440
net.netfilter.nf_conntrack_udp_timeout=60
net.netfilter.nf_conntrack_udp_timeout_stream=180

/root/socks/sslocal --protocol tun -s "[::1]:8388" -m "aes-256-gcm" -k "hello-kitty" --outbound-bind-interface lo --tun-interface-name tun1 -U --udp-timeout 60 --udp-max-associations 65535

dns.exe -sr gddx.txt -at google.com -sl 6

dns-test.zip

Run dns.exe to test for about 30 seconds, then shut it down.

Then wait about 5 minutes, until the number of connections has dropped.

[screenshot]

[screenshot]

As can be seen, sslocal still occupies more than 400 MB of memory and never releases it.


zonyitoo commented Jan 16, 2022

I ran the test tool for nearly 15 minutes and never saw memory usage above 7 MiB. 400 MiB seems impossible.


f4nff commented Jan 16, 2022

The reason you didn't exceed 7 MB is that you didn't turn on UDP forwarding.

-U --udp-timeout 60 --udp-max-associations 65535

Turn on UDP forwarding, use the UDP test tool I provided, and try again.

zonyitoo (Collaborator) commented:

Of course I enabled the UDP relay. Just read the code I provided; I only sent UDP DNS queries.


f4nff commented Jan 16, 2022

The way your test tool sends data is too simple to trigger the problem. I have given you the source code of my test tool; you can compile it yourself and try.


f4nff commented Jan 16, 2022

Didn't you see my screenshots? Am I making this up?

zonyitoo (Collaborator) commented:

dns.exe is not for OpenWRT, right? It is a 32-bit Windows (NT32) program. dns.go is the source code, right?


f4nff commented Jan 16, 2022

dns.go is the source code!


f4nff commented Jan 16, 2022

OpenWRT is used as the router; dns.exe runs on my Windows 11 host.

zonyitoo (Collaborator) commented:

My test code is exactly the same as yours. You didn't bind the outbound interface, which means you have to add route entries for your test name servers to make this test work.

I don't see any difference between dns.go and dns_pressure.rs.

So instead of modifying your dns.go, I will run dns_pressure on my R4S with your provided sysctl configuration and see what happens.

zonyitoo (Collaborator) commented:

BTW, your dns.go only opens UDP sockets for sending queries, so why are there so many TCP connections in your screenshots?

[screenshot]


f4nff commented Jan 16, 2022

The TCP connections are from my other web browsing; I forward all connections from the LAN port.
When you send the DNS requests, send them to 208.67.222.222:443, not 208.67.222.222:53.

I can't read the code, but I have given you a way to trigger the problem.


zonyitoo commented Jan 16, 2022

Since we are talking about a memory leak, it should happen regardless of the destination or port, as long as UDP associations are created rapidly in a short time.

Your dns.go does nothing but open a UDP socket to 208.67.222.222:443 and send a plain DNS query (without any encryption). So it is completely the same as dns_pressure.rs.

You cannot test this without a clean environment; maybe the problem you are facing is completely unrelated to UDP, which would just waste our time.

I am now going to run dns_pressure with nameserver 208.67.222.222:443, with sslocal running just like yours.
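
For reference, here is a minimal sketch of what such a pressure test amounts to (an illustration only, not the actual dns_pressure.rs or dns.go): open many short-lived UDP sockets and fire a plain DNS query at the fixed target, so the tun client has to create one UDP association per source port.

// Hypothetical pressure-test sketch in Rust (std only). Each socket binds a new
// local port, which forces the tun client to create a new UDP association.
use std::net::UdpSocket;
use std::thread;
use std::time::Duration;

fn main() {
    // Hand-built DNS query for "google.com", type A, class IN.
    let query: Vec<u8> = vec![
        0x12, 0x34,             // transaction ID
        0x01, 0x00,             // standard query, recursion desired
        0x00, 0x01, 0x00, 0x00, // 1 question, 0 answers
        0x00, 0x00, 0x00, 0x00, // 0 authority, 0 additional
        6, b'g', b'o', b'o', b'g', b'l', b'e',
        3, b'c', b'o', b'm', 0,
        0x00, 0x01,             // QTYPE = A
        0x00, 0x01,             // QCLASS = IN
    ];

    let mut handles = Vec::new();
    for _ in 0..100 {
        let q = query.clone();
        handles.push(thread::spawn(move || {
            for _ in 0..100 {
                // Fresh socket each time => fresh source port => new association.
                let sock = UdpSocket::bind("0.0.0.0:0").expect("bind failed");
                sock.send_to(&q, "208.67.222.222:443").expect("send failed");
                thread::sleep(Duration::from_millis(5));
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
}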


f4nff commented Jan 16, 2022

config interface 'tun1'
	option proto 'static'
	option ipaddr '10.255.0.1'
	option netmask '255.255.255.0'
	option device 'tun1'

config route
	option interface 'tun1'
	option target '0.0.0.0'
	option netmask '0.0.0.0'
	option gateway '10.255.0.1'
	option table '10'

config rule
	option in 'lan'
	option lookup '10'
	option priority '0'

(The above is /etc/config/network.)

/root/socks/sslocal --protocol tun -s "[::1]:8388" -m "aes-256-gcm" -k "hello-kitty" --outbound-bind-interface lo --tun-interface-name tun1 -U --udp-timeout 60 --udp-max-associations 65535

LAN → Windows host

dns.exe -sr gddx.txt -at google.com -sl 6


f4nff commented Jan 16, 2022

I have told you the test environment and even given you the source code of the trigger. If you won't reproduce it that way, you just keep questioning me instead.

If you only follow your own logic, you will not find the problem.

Are the screenshots I gave you fake?

I can trigger it, but you can't. Is that my problem or yours?


f4nff commented Jan 16, 2022

I have reported several bugs to you; which one wasn't confirmed in the end?

zonyitoo (Collaborator) commented:

You are reporting bugs to an open-source project. I am not paid and do not work on it full time. If you keep acting as if I must solve your problem or owe you my best service, the conversation ends here.

Remember, you are not my customer. We are just software developers.

I won't follow your configuration because you keep adding variables and your test method is just nonsense.

And I can confirm that I see nearly 400 MiB RSS on the R4S after opening 100 UDP clients sending queries continuously (:

[screenshot]

But the memory consumption is stable and starts to drop (while the clients are still running):

[screenshot]

The test is still running, and I cannot see any memory leak because memory usage is very stable.

So if you are willing to go on and look into why it takes 400 MiB to maintain 65,536 UDP associations, we can continue. But I think I can already draw a conclusion: there is no memory leak.
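
(Rough arithmetic for scale, not a measurement from this test: 400 MiB spread across 65,536 associations is 409,600 KiB / 65,536 ≈ 6.25 KiB per association, which would have to cover each association's sockets, buffers, and task state.)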


f4nff commented Jan 16, 2022

If these DNS requests have closed their sockets, shouldn't the memory be freed?


f4nff commented Jan 16, 2022

I have tested https://github.com/xjasonlyu/tun2socks, which has the same functionality. After I make DNS requests concurrently, its memory falls back to a bit over 100 MB within 5 minutes. Rust should take up less memory.

zonyitoo (Collaborator) commented:

Yes, they are freed after 60 seconds.

[screenshot]

And yes, it is about 100 MiB on my R4S.

[screenshot]

lsof shows that all the sockets are closed correctly.


f4nff commented Jan 16, 2022

When I first started sslocal, it used less than 10 MB of memory, so after I make a lot of DNS requests, shouldn't the memory fall back to less than 10 MB?


f4nff commented Jan 16, 2022

I don't have a technical background; I'm just asking my question. A router running OpenWRT has relatively little memory, which is why I have these doubts.


f4nff commented Jan 16, 2022

[screenshot]

[screenshot]


f4nff commented Jan 16, 2022

[screenshot]

https://github.com/xjasonlyu/tun2socks

This is tun2socks:
VmHWM: 1370344 kB
VmRSS: 183384 kB

It peaks at about 1.3 GB and eventually falls back to about 180 MB.

sslocal peaks at about 1.1 GB but only falls back to about 450 MB in the end.
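
(For reference: these numbers come from /proc/<pid>/status, where VmHWM is the peak resident set size the process has reached and VmRSS is its current resident set size.)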


f4nff commented Jan 16, 2022

I've heard that Rust should use less memory, but it only falls back to 450 MB and stays there. I want to know what it is doing with that memory and why it is not returned to the system.


zonyitoo commented Jan 16, 2022

This is not easy to track down. I am reviewing the code now.


f4nff commented Jan 16, 2022

OK. I'm not sure whether this is a problem in the code; I'm just raising my own question.

zonyitoo changed the title from "udp memory leak" to "High memory consumption for UDP associations on Linux (OpenWRT)" on Jan 16, 2022

f4nff commented Jan 16, 2022

I don't know what's going on. I had been using the Go tun2socks before, but Go uses a lot of memory, so I tried sslocal; the current behavior seems even worse.


f4nff commented Jan 16, 2022

[screenshot]

https://github.com/shadowsocks/shadowsocks-rust/actions/runs/1704368740

VmHWM: 1172244 kB
VmRSS: 447888 kB

I tested this build and it did not solve the problem. Memory seems to be reclaimed faster, but it still stops at around 450 MB.

zonyitoo (Collaborator) commented:

That commit wasn't meant to fix this issue; it's just an optimization I found while reviewing the code.


f4nff commented Jan 16, 2022

/root/socks/sslocal --protocol tun -s5 "[::1]:1080" --outbound-bind-interface lo --tun-interface-name tun1 -U --udp-timeout 60 --udp-max-associations 65535

I recommend adding SOCKS5 (-s5) as an upstream protocol for tun mode. I have run into some new problems in testing, but my comparison test is based on the plain SOCKS5 protocol.

If you run 10,000 concurrent TCP requests and then run a speedtest, the rate becomes very low! But another Go tun2socks based on the SOCKS5 protocol doesn't slow down.

zonyitoo (Collaborator) commented:

Just keep in mind that this is a shadowsocks project.

zonyitoo (Collaborator) commented:

The performance issue may come from the new library smoltcp, which I just introduced in the refactoring. Ignore it for now; I will make sure it is OK before publishing a new version.

zonyitoo added a commit that referenced this issue on Jan 17, 2022
- ref #745, significantly increase throughput, but still slower than the system network stack

f4nff commented Jan 17, 2022

https://github.com/wenjiax/stress

./stress -n -1 -c 10000 -m POST https://www.google.com/

Use stress to run 10,000 concurrent requests for 2 minutes, then shut it down, and then use www.speedtest.net to test the speed. The downlink speed is only 20 Mbps at most, while before the concurrency test the downlink speed could be above 250 Mbps.


f4nff commented Jan 17, 2022

tun2socks

After switching to this project, there is no such slowdown.


zonyitoo commented Jan 18, 2022

You don't need to test every commit I make; I will tell you when it is ready for testing. And you can stop mentioning tun2socks: it uses gVisor as its engine and network stack, which is a mature implementation published by Google. More optimization still has to be done on smoltcp in Rust.


zonyitoo commented Jan 18, 2022

Except for tun, all the other local client interfaces achieve the same throughput as a direct connection on a 1000M network. So the problem isn't the protocol or other parts of sslocal; it is the tun interface implementation.

Let me explain more about the current problems found in the tun local:

  1. smoltcp requires a virtual Interface structure to drive the state machines of all TCP sockets, so packet input/output has to be transferred to this virtual interface through channels (memory copies).
  2. smoltcp's virtual Interface manages all the allocated sockets in one single structure and drives their state machines every time poll is called. So to use it in a multi-threaded program, we have to protect this interface (or manager) with a Mutex. Every read and write has to take that lock first, and poll must be called repeatedly at a very short interval (milliseconds).

But since my virtual machine has only 1 core, all the tasks are running on 1 thread, which means they don't have to compete with each other for this lock. I don't want to conclude that smoltcp can only run at that lower throughput, so I will keep investigating.

If you have any thoughts about where the bottleneck of the program is, please comment.
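
To illustrate point 2 above, here is a hypothetical sketch of the contention pattern (not shadowsocks-rust's or smoltcp's actual code): the poll task and every socket read/write must take the same lock, so under load they serialize on it.

// Illustrative sketch only: one Mutex-protected "virtual interface" shared by
// a poll loop and the socket tasks, so every operation competes for one lock.
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

struct VirtualInterface {
    // In the real code this would own all smoltcp sockets and their state machines.
    polls: u64,
}

impl VirtualInterface {
    fn poll(&mut self) {
        // Drives every TCP state machine; must run every few milliseconds.
        self.polls += 1;
    }
}

fn main() {
    let iface = Arc::new(Mutex::new(VirtualInterface { polls: 0 }));

    // Poll task: grabs the lock on a very short interval.
    let poller = {
        let iface = Arc::clone(&iface);
        thread::spawn(move || {
            for _ in 0..1_000 {
                iface.lock().unwrap().poll();
                thread::sleep(Duration::from_millis(1));
            }
        })
    };

    // Socket task: every read/write also has to take the same lock,
    // so it competes with the poller (and other socket tasks) under load.
    let worker = {
        let iface = Arc::clone(&iface);
        thread::spawn(move || {
            for _ in 0..1_000 {
                let _guard = iface.lock().unwrap();
                // read or write through the interface here
            }
        })
    };

    poller.join().unwrap();
    worker.join().unwrap();
    println!("poll() was called {} times", iface.lock().unwrap().polls);
}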

zonyitoo added a commit that referenced this issue on Jan 19, 2022
- ref #745
- reduce manager lock acquisition contention between TcpSockets and Interface::poll
zonyitoo (Collaborator) commented:

@f4nff Please test the latest commit and see if it solves this issue in your environment.


f4nff commented Jan 20, 2022

https://github.com/shadowsocks/shadowsocks-rust/actions/runs/1722149352

Download test:
[screenshot]

After the TCP concurrency test, I measured the speed again and it has dropped a lot. The speed test started at 200 Mbps+, now only...

golang tun2socks:

[screenshot]


zonyitoo commented Jan 20, 2022

This is the test run locally in my virtual machine (x86_64):

Connecting to host 10.43.57.87, port 5201
[  4] local 10.255.0.1 port 34818 connected to 10.43.57.87 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  61.7 MBytes   517 Mbits/sec    0   88.4 KBytes
[  4]   1.00-2.00   sec  61.3 MBytes   514 Mbits/sec    0   88.4 KBytes
[  4]   2.00-3.00   sec  56.4 MBytes   473 Mbits/sec    0   88.4 KBytes
[  4]   3.00-4.00   sec  56.4 MBytes   473 Mbits/sec    0   88.4 KBytes
[  4]   4.00-5.00   sec  58.7 MBytes   493 Mbits/sec    0   88.4 KBytes
[  4]   5.00-6.00   sec  55.6 MBytes   466 Mbits/sec    0   88.4 KBytes
[  4]   6.00-7.00   sec  60.6 MBytes   509 Mbits/sec    0   88.4 KBytes
[  4]   7.00-8.00   sec  60.9 MBytes   510 Mbits/sec    0   88.4 KBytes
[  4]   8.00-9.00   sec  58.9 MBytes   494 Mbits/sec    0   88.4 KBytes
[  4]   9.00-10.00  sec  57.0 MBytes   478 Mbits/sec    0   88.4 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   587 MBytes   493 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   587 MBytes   492 Mbits/sec                  receiver

iperf Done.

Why is your environment so special? :(


dev4u commented Jan 20, 2022

1: ./stress -n -1 -c 10000 -m POST https://www.google.com/
2: speedtest

Two small suggestions worth considering:
1. Don't test against a domain behind a CDN, because where the request ends up is nondeterministic.
2. Generally, when stress-testing a tool's performance, evaluate it through a local port, because the network itself is a very unstable factor.


dev4u commented Jan 20, 2022

@f4nff Try shortening the net.netfilter.nf_conntrack_tcp_timeout_established=7440 setting.


zonyitoo commented Jan 20, 2022

[screenshot]

Actually, speedtest works quite well in my environment. You may test it again with the latest build (the commit you tested has bugs).

Your other test command line didn't show any results:

./stress -n -1 -c 10000 -m POST https://www.google.com/

I modified -n -1 to -n 10000 and got:


Summary:
  Total:		20.7056 secs
  ReqBeforeTotal:	0.0003 secs
  ResAfterTotal:	0.0005 secs
  Slowest:		20.0010 secs
  Fastest:		0.6379 secs
  Average:		11.9593 secs
  Requests/sec:		482.9617

Detailed Report:

  URL:  [POST] https://www.google.com/

	DNS+dialup:
  		Average:	1.0319 secs
  		Fastest:	0.0968 secs
  		Slowest:	13.9927 secs

	DNS-lookup:
  		Average:	0.0015 secs
  		Fastest:	0.0000 secs
  		Slowest:	0.0411 secs

	Request Before:
  		Average:	0.0000 secs
  		Fastest:	0.0000 secs
  		Slowest:	0.0000 secs

	Request Write:
  		Average:	0.0000 secs
  		Fastest:	0.0000 secs
  		Slowest:	0.0042 secs

	Response Wait:
  		Average:	0.0485 secs
  		Fastest:	0.0509 secs
  		Slowest:	1.2964 secs

	Response After:
  		Average:	0.0000 secs
  		Fastest:	0.0000 secs
  		Slowest:	0.0000 secs

	Response Read:
  		Average:	0.0073 secs
  		Fastest:	0.0000 secs
  		Slowest:	3.2125 secs

	Response Summary:
		Total data:	6820640 bytes
		Size/request:	682 bytes

	Status code distribution:
		[429]	2694 responses

	Error distribution:
		[21]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EgQNS14rGOe7po8GIhCxomn5220Cxw5qxslSMQiZMgFy": EOF
		[5]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGOO7po8GIhBHbdl6d0xmXstii_iO4l7xMgFy": EOF
		[1]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGOi7po8GIhCqpnY-DLpHBi22_lq7QGYRMgFy": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
		[6]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EgQNS14rGOq7po8GIhClY2GV2vvM_xTr_6_NRFDaMgFy": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
		[47]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGNy7po8GIhCkNUhcJiMCeV669nQq3hbuMgFy": EOF
		[17]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGN27po8GIhAkSq-WNYcKcRKOldMV23LbMgFy": EOF
		[1]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGOS7po8GIhBfzhN4e2NDh04i0WSE8DQXMgFy": read tcp 10.255.0.1:52560->142.250.204.68:443: read: connection reset by peer
		[1]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGOS7po8GIhBfzhN4e2NDh04i0WSE8DQXMgFy": read tcp 10.255.0.1:52807->142.250.204.68:443: read: connection reset by peer
		[28]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGOK7po8GIhAr13Oi2M0qIeiILougF--yMgFy": EOF
		[5]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGOm7po8GIhCqpTt_zBMwBeMIWjP3zeWxMgFy": EOF
		[2]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGN-7po8GIhBI46XyWbYJYkPmc4Lu2A9xMgFy": EOF
		[5]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EgQNS14rGOK7po8GIhBr79MmgyzxNgWWcK8_kzzCMgFy": EOF
		[17]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EgQNS14rGOi7po8GIhClAp-r8dzK12D8LpKGhxu5MgFy": EOF
		[1]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EgQNS14rGOm7po8GIhBi_SoJy-X8l_YTb1tJSUQoMgFy": EOF
		[1]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGOe7po8GIhDYSnV5sHZqK5I50fyMpYBgMgFy": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
		[1]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EgQNS14rGOe7po8GIhCxomn5220Cxw5qxslSMQiZMgFy": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
		[1]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGOS7po8GIhBfzhN4e2NDh04i0WSE8DQXMgFy": EOF
		[4]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EgQNS14rGOm7po8GIhBi_SoJy-X8l_YTb1tJSUQoMgFy": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
		[4]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGOq7po8GIhDB1sxSNWAOPxLu10drmY-qMgFy": EOF
		[1]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGOa7po8GIhAFLGbwdq0mEJMKKlNw6OPVMgFy": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
		[74]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGOq7po8GIhDB1sxSNWAOPxLu10drmY-qMgFy": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
		[48]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGOm7po8GIhCqpTt_zBMwBeMIWjP3zeWxMgFy": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
		[6823]	Post "https://www.google.com/": EOF
		[3]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGN67po8GIhAnazwRp5Ghcn6LC3G8a_aGMgFy": EOF
		[1]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGOC7po8GIhAwsxSKNciqO6mOOQQvF-3WMgFy": read tcp [2003:c8:b711:300:c18:348e:7db5:9999]:64652->[2404:6800:4005:813::2004]:443: read: connection reset by peer
		[69]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGOe7po8GIhDYSnV5sHZqK5I50fyMpYBgMgFy": EOF
		[118]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EhAkAxjAAAMCwwAAAAAAAAAAGOi7po8GIhCqpnY-DLpHBi22_lq7QGYRMgFy": EOF
		[1]	Post "https://www.google.com/sorry/index?continue=https://www.google.com/&q=EgQNS14rGOq7po8GIhClY2GV2vvM_xTr_6_NRFDaMgFy": EOF

Response time histogram:
  0.638 [1]	|
  2.574 [391]	|======
  4.510 [857]	|============
  6.447 [376]	|=====
  8.383 [1195]	|=================
  10.319 [2424]	|===================================
  12.256 [542]	|========
  14.192 [445]	|======
  16.128 [781]	|===========
  18.065 [245]	|====
  20.001 [2743]	|========================================

This stress tool doesn't seem to help. I would prefer iperf3.



f4nff commented Jan 21, 2022

sslocal

1: speedtest before the concurrency test
[screenshot]


f4nff commented Jan 21, 2022

It's not convenient for me to test right now; I will retest later.


zonyitoo commented Jan 25, 2022

Did you get any results, @f4nff? Would you consider using another OpenWRT distribution instead of FriendlyWRT?


f4nff commented Jan 29, 2022

[screenshot]

After the concurrency test, the speed is still reduced. When I use https://github.com/xjasonlyu/tun2socks for the TCP speed test, the speed can still reach more than 300 Mbps.


f4nff commented Jan 29, 2022

However, this version is obviously much faster than before. But after the concurrent UDP test, memory usage is still high and is not released. Generally speaking, it is still not as good as https://github.com/xjasonlyu/tun2socks.


f4nff commented Jan 29, 2022

[screenshot]


f4nff commented Jan 29, 2022

I analyzed it. tun2socks-go uses:
tun -device tun1 -proxy socks5://[::1]:1080 -loglevel silent -stats :9000
sslocal uses:
sslocal --protocol tun -s "[::1]:1081" -m "none" -k "" --outbound-bind-interface lo --tun-interface-name tun1 -U --udp-timeout 60 --udp-max-associations 65535
Besides the tun protocol stacks themselves, the upstream handshake protocols are also different. So there may be two reasons why sslocal slows down after the concurrent TCP speed test: either the tun protocol stack itself has a problem, or the difference is in the upstream handshake (-s "[::1]:1081" -m "none" -k ""): one side uses socks5 and the other uses ss://none.

The author is reluctant to add socks5 as a handshake protocol, so I can't give more exact comparison data right now.


zonyitoo commented Jan 29, 2022

I just couldn't reproduce it locally because I got 500 Mbps on speedtest.net (the maximum bandwidth of my public network). So I still think there is something special happening only in your environment. You can run iperf/iperf3 on the local network to avoid the bottleneck of your outbound network (300 Mbps) and reduce the variables in your environment.

The current TCP stack still has some performance problems compared to the system's built-in stack, but they only show up when approaching 1000 Mbps. I will continue working on this problem.

And I don't think the protocol is the problem, because the R4S can easily run the encryption protocols while handling 1000 Mbps of bandwidth.

On the other hand, the memory consumption question already has a conclusion.

BTW, do you have a software development background? If so, could you help find out what exactly differs between tun2socks and sslocal in your local environment? It is very hard to locate the exact problem if I can't reproduce it locally. The optimizations I have made so far are based on intuition and experience.
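
(One possible way to do that local test, assuming an iperf3 server is reachable on the LAN, e.g. the 10.43.57.87 host used in the earlier run: start iperf3 -s on the server and iperf3 -c 10.43.57.87 -t 10 on a client whose traffic is routed through tun1, then compare against a run with sslocal bypassed.)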

zonyitoo changed the title from "High memory consumption for UDP associations on Linux (OpenWRT)" to "local-tun TCP stack slow" on Jan 29, 2022
zonyitoo changed the title from "local-tun TCP stack slow" back to "High memory consumption for UDP associations on Linux (OpenWRT)" on Jan 29, 2022

f4nff commented Jan 30, 2022

I don't know how to develop, but I have worked in testing. I think you'd better add the SOCKS5 protocol so that I can give you accurate test feedback. Besides, SOCKS5 is used very frequently.
