Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

segfault when bind() fails on transparent connection #355

Closed
gilcu3 opened this issue Sep 21, 2022 · 7 comments
Closed

segfault when bind() fails on transparent connection #355

gilcu3 opened this issue Sep 21, 2022 · 7 comments

Comments

@gilcu3
Copy link

gilcu3 commented Sep 21, 2022

Once every few days my systemd sslh-select service fails with the following error:
sslh-select v1.22c

Sep 21 07:27:08 archlinux-4 sslh-select[105145]: too many open file descriptor to monitor them all -- dropping connection
Sep 21 07:27:08 archlinux-4 sslh-select[105145]: too many open file descriptor to monitor them all -- dropping connection
Sep 21 07:27:08 archlinux-4 sslh-select[105145]: *** bit out of range 0 - FD_SETSIZE on fd_set ***: terminated
Sep 21 07:27:08 archlinux-4 systemd[1]: sslh-select.service: Main process exited, code=dumped, status=6/ABRT
Sep 21 07:27:08 archlinux-4 systemd[1]: sslh-select.service: Failed with result 'core-dump'.
Sep 21 07:27:08 archlinux-4 systemd[1]: sslh-select.service: Consumed 1min 18.184s CPU time.

@yrutschle
Copy link
Owner

This would happen when there is a large number of concurrent connection, and that number goes above FD_SETSIZE (usually 1024). sslh-select will have no other choice than dropping connections in that case (sslh-ev should have no such limitation).

However, the segmentation fault that follows is not normal. I think I did test large number of connections. Could you have a look at the FAQ on how to gather information on segmentation faults and report back the stack trace and line where the error happens?

@gilcu3
Copy link
Author

gilcu3 commented Sep 22, 2022

After a long time trying to make valgring work, I finally managed to do it.
In archlinux the command needed (just for reference):
sudo DEBUGINFOD_URLS="https://debuginfod.archlinux.org" G_SLICE=always-malloc valgrind --leak-check=full ./sslh-select --config=/etc/sslh.cfg --foreground

Up to now, the logs show:

==1391== Memcheck, a memory error detector
==1391== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==1391== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==1391== Command: ./sslh-select --config=/etc/sslh.cfg --foreground
==1391== 
==1391== Conditional jump or move depends on uninitialised value(s)
==1391==    at 0x10E92F: watchers_add_read (sslh-select.c:67)
==1391==    by 0x10E92F: watchers_init (sslh-select.c:59)
==1391==    by 0x10E92F: main_loop (sslh-select.c:134)
==1391==    by 0x10DB6D: main (sslh-main.c:285)
==1391== 
matching [h2] with [bep/1.0]
Unknown ALPN name: h2
matching [http/1.1] with [bep/1.0]
Unknown ALPN name: http/1.1
matching [bep/1.0] with [bep/1.0]
matching [bep/1.0] with [bep/1.0]
common.c:281:bind:98:Address already in use
common.c:351:bind_peer:98:Address already in use
matching [h2] with [bep/1.0]
Unknown ALPN name: h2
matching [http/1.1] with [bep/1.0]
Unknown ALPN name: http/1.1
matching [bep/1.0] with [bep/1.0]
common.c:281:bind:98:Address already in use
common.c:351:bind_peer:98:Address already in use
matching [bep/1.0] with [bep/1.0]
common.c:281:bind:98:Address already in use
common.c:351:bind_peer:98:Address already in use
matching [bep/1.0] with [bep/1.0]
common.c:281:bind:98:Address already in use
common.c:351:bind_peer:98:Address already in use

I will update as soon as it crashes.

utoni added a commit to utoni/sslh that referenced this issue Sep 22, 2022
==1391== Conditional jump or move depends on uninitialised value(s)
==1391==    at 0x10E92F: watchers_add_read (sslh-select.c:67)
==1391==    by 0x10E92F: watchers_init (sslh-select.c:59)
==1391==    by 0x10E92F: main_loop (sslh-select.c:134)
==1391==    by 0x10DB6D: main (sslh-main.c:285)

Signed-off-by: Toni Uhlig <[email protected]>
yrutschle added a commit that referenced this issue Sep 22, 2022
Fixes unitialised memory access as seen in issue #355.
@gilcu3
Copy link
Author

gilcu3 commented Sep 23, 2022

I have not gotten any crashes till now, and I can confirm after #356 the warning at the beginning has disappeared.

I am starting to think that the crash and too many files open issue was related to #352, aka using transparent mode for a tls connection with alpn filter (or maybe specific to syncthing). Does it make sense?

@yrutschle
Copy link
Owner

Not really -- there was a bug whereby max_fd was not initialised, which means my comment above was wrong; utoni's patch will have fixed that, so I expect this issue is closed. I expect #352 is something else.

@gilcu3
Copy link
Author

gilcu3 commented Oct 2, 2022

Finally I got a crash:

[...]
matching [bep/1.0] with [bep/1.0]                                                                                      
common.c:281:bind:98:Address already in use                                                                            
common.c:351:bind_peer:98:Address already in use
matching [h2] with [bep/1.0]             
Unknown ALPN name: h2                                                                                                  
matching [http/1.1] with [bep/1.0]                                                                                     
Unknown ALPN name: http/1.1                         
matching [bep/1.0] with [bep/1.0]                                                                                      
common.c:281:bind:98:Address already in use                                                                            
common.c:351:bind_peer:98:Address already in use                                                                       
==55677== Invalid read of size 4                                                                                       
==55677==    at 0x10F905: active_queue (tcp-listener.c:73)                                                             
==55677==    by 0x10F905: cnx_write_process (tcp-listener.c:310)                                                       
==55677==    by 0x10EBBF: main_loop (sslh-select.c:172)                                                                
==55677==    by 0x10DB6D: main (sslh-main.c:285)   
==55677==  Address 0x20 is not stack'd, malloc'd or (recently) free'd                                                  
==55677==                                                                                                              
==55677==                                                                                                              
==55677== Process terminating with default action of signal 11 (SIGSEGV)                                               
==55677==  Access not within mapped region at address 0x20                                                             
==55677==    at 0x10F905: active_queue (tcp-listener.c:73)
==55677==    by 0x10F905: cnx_write_process (tcp-listener.c:310)                                                       
==55677==    by 0x10EBBF: main_loop (sslh-select.c:172)                                                                
==55677==    by 0x10DB6D: main (sslh-main.c:285)    
==55677==  If you believe this happened as a result of a stack                                                         
==55677==  overflow in your program's main thread (unlikely but                                                        
==55677==  possible), you can try to increase the size of the                                                          
==55677==  main thread stack using the --main-stacksize= flag.                                                         
==55677==  The main thread stack size used in this run was 8388608.                                                    
==55677==                  
==55677== HEAP SUMMARY:                                  
==55677==     in use at exit: 297,184 bytes in 1,171 blocks                                                            
==55677==   total heap usage: 11,809,789 allocs, 11,808,618 frees, 34,321,275,086 bytes allocated                      
==55677==                                               
==55677== 72 bytes in 3 blocks are possibly lost in loss record 28 of 101                                              
==55677==    at 0x4841888: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)                             
==55677==    by 0x4BFFB96: tsearch (tsearch.c:337)                                                                     
==55677==    by 0x4B3BE6A: __add_to_environ (setenv.c:233)
==55677==    by 0x484D37F: setenv (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)                             
==55677==    by 0x4AF39C1: setproctitle_init (in /usr/lib/libbsd.so.0.11.6)                                            
==55677==    by 0x10D7FD: main (sslh-main.c:224)                                                                       
==55677==                                                                                                              
==55677== 264 bytes in 1 blocks are definitely lost in loss record 81 of 101
==55677==    at 0x4841798: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)                                                                                                                                            [0/1926]
==55677==    by 0x49F68E0: libconfig_strvec_append (in /usr/lib/libconfig.so.11.1.0)                                   
==55677==    by 0x49F3561: ??? (in /usr/lib/libconfig.so.11.1.0)                                                       
==55677==    by 0x49F3724: config_read_file (in /usr/lib/libconfig.so.11.1.0)                                          
==55677==    by 0x115081: c2s_parse_file (sslh-conf.c:2068)                                                            
==55677==    by 0x115081: sslhcfg_cl_parse (sslh-conf.c:2228)                                                          
==55677==    by 0x10D81F: main (sslh-main.c:228)
==55677==                         
==55677== 1,695 (64 direct, 1,631 indirect) bytes in 1 blocks are definitely lost in loss record 94 of 101             
==55677==    at 0x4846A73: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)                             
==55677==    by 0x49F34CD: config_clear (in /usr/lib/libconfig.so.11.1.0)                                              
==55677==    by 0x49F3518: ??? (in /usr/lib/libconfig.so.11.1.0)                                                       
==55677==    by 0x49F3724: config_read_file (in /usr/lib/libconfig.so.11.1.0)                                          
==55677==    by 0x115081: c2s_parse_file (sslh-conf.c:2068)                                                            
==55677==    by 0x115081: sslhcfg_cl_parse (sslh-conf.c:2228)                                                          
==55677==    by 0x10D81F: main (sslh-main.c:228)
==55677==                                
==55677== 4,608 bytes in 2 blocks are possibly lost in loss record 98 of 101                                           
==55677==    at 0x4841888: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)                             
==55677==    by 0x400444B: malloc (rtld-malloc.h:56)
==55677==    by 0x400444B: _dlfo_mappings_segment_allocate (dl-find_object.c:217)                                      
==55677==    by 0x400444B: _dl_find_object_update_1 (dl-find_object.c:671)                                             
==55677==    by 0x400444B: _dl_find_object_update (dl-find_object.c:805)                                               
==55677==    by 0x400BF97: dl_open_worker_begin (dl-open.c:735)                                                        
==55677==    by 0x4C4DDCD: _dl_catch_exception (dl-error-skeleton.c:208)                                               
==55677==    by 0x400B4FA: dl_open_worker (dl-open.c:782)                                                              
==55677==    by 0x4C4DDCD: _dl_catch_exception (dl-error-skeleton.c:208)                                               
==55677==    by 0x400B8EC: _dl_open (dl-open.c:886)
==55677==    by 0x4C4E1B0: do_dlopen (dl-libc.c:95)                                                                    
==55677==    by 0x4C4DDCD: _dl_catch_exception (dl-error-skeleton.c:208)                                               
==55677==    by 0x4C4DE82: _dl_catch_error (dl-error-skeleton.c:227)                                                   
==55677==    by 0x4C4E120: dlerror_run (dl-libc.c:45)                                                                  
==55677==    by 0x4C4E2EE: __libc_dlopen_mode (dl-libc.c:162)                                                          
==55677==                                                 
==55677== 243,775 bytes in 971 blocks are definitely lost in loss record 101 of 101                                    
==55677==    at 0x4841798: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)                             
==55677==    by 0x10FE48: defer_write (common.c:380)
==55677==    by 0x11349D: probe_client_protocol (tcp-probe.c:47)                                                       
==55677==    by 0x10F347: probing_read_process (tcp-listener.c:271)                                                    
==55677==    by 0x10ECAD: main_loop (sslh-select.c:197)                                                                
==55677==    by 0x10DB6D: main (sslh-main.c:285)                                                                       
==55677==                                                                                                              
==55677== LEAK SUMMARY:    
==55677==    definitely lost: 244,103 bytes in 973 blocks
==55677==    indirectly lost: 1,631 bytes in 26 blocks                                                                 
==55677==      possibly lost: 4,680 bytes in 5 blocks                                                                  
==55677==    still reachable: 46,770 bytes in 167 blocks
==55677==         suppressed: 0 bytes in 0 blocks                                                                      
==55677== Reachable blocks (those to which a pointer was found) are not shown.                                         
==55677== To see them, rerun with: --leak-check=full --show-leak-kinds=all                                             
==55677==                                                 
==55677== For lists of detected and suppressed errors, rerun with: -s                                                  
==55677== ERROR SUMMARY: 6 errors from 6 contexts (suppressed: 0 from 0)                                               
==55677== could not unlink /tmp/vgdb-pipe-from-vgdb-to-55677-by-root-on-???                                            
==55677== could not unlink /tmp/vgdb-pipe-to-vgdb-from-55677-by-root-on-???                                            
==55677== could not unlink /tmp/vgdb-pipe-shared-mem-vgdb-55677-by-root-on-???                                         
fish: Job 1, 'sudo DEBUGINFOD_URLS="https://d…' terminated by signal SIGSEGV (Address boundary error)

@yrutschle yrutschle changed the title too many open file descriptor to monitor them all -- dropping connection segfault when bind() fails on transparent connection Oct 30, 2022
@yrutschle
Copy link
Owner

renamed the title to fit where we're at.

@yrutschle
Copy link
Owner

@gilcu3 the commit I just pushed closes an execution path that would result in a segfault. It seems "unlikely", but I guess over time it is "guaranteed to happen" :-) Please re-open if you get another crash!

yrutschle added a commit that referenced this issue Nov 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants