segfault when bind() fails on transparent connection #355

gilcu3 · 2022-09-21T08:09:49Z

Once every few days my systemd sslh-select service fails with the following error:
sslh-select v1.22c

Sep 21 07:27:08 archlinux-4 sslh-select[105145]: too many open file descriptor to monitor them all -- dropping connection
Sep 21 07:27:08 archlinux-4 sslh-select[105145]: too many open file descriptor to monitor them all -- dropping connection
Sep 21 07:27:08 archlinux-4 sslh-select[105145]: *** bit out of range 0 - FD_SETSIZE on fd_set ***: terminated
Sep 21 07:27:08 archlinux-4 systemd[1]: sslh-select.service: Main process exited, code=dumped, status=6/ABRT
Sep 21 07:27:08 archlinux-4 systemd[1]: sslh-select.service: Failed with result 'core-dump'.
Sep 21 07:27:08 archlinux-4 systemd[1]: sslh-select.service: Consumed 1min 18.184s CPU time.

The text was updated successfully, but these errors were encountered:

yrutschle · 2022-09-21T18:39:30Z

This would happen when there is a large number of concurrent connection, and that number goes above FD_SETSIZE (usually 1024). sslh-select will have no other choice than dropping connections in that case (sslh-ev should have no such limitation).

However, the segmentation fault that follows is not normal. I think I did test large number of connections. Could you have a look at the FAQ on how to gather information on segmentation faults and report back the stack trace and line where the error happens?

gilcu3 · 2022-09-22T07:54:34Z

After a long time trying to make valgring work, I finally managed to do it.
In archlinux the command needed (just for reference):
sudo DEBUGINFOD_URLS="https://debuginfod.archlinux.org" G_SLICE=always-malloc valgrind --leak-check=full ./sslh-select --config=/etc/sslh.cfg --foreground

Up to now, the logs show:

==1391== Memcheck, a memory error detector
==1391== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==1391== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==1391== Command: ./sslh-select --config=/etc/sslh.cfg --foreground
==1391== 
==1391== Conditional jump or move depends on uninitialised value(s)
==1391==    at 0x10E92F: watchers_add_read (sslh-select.c:67)
==1391==    by 0x10E92F: watchers_init (sslh-select.c:59)
==1391==    by 0x10E92F: main_loop (sslh-select.c:134)
==1391==    by 0x10DB6D: main (sslh-main.c:285)
==1391== 
matching [h2] with [bep/1.0]
Unknown ALPN name: h2
matching [http/1.1] with [bep/1.0]
Unknown ALPN name: http/1.1
matching [bep/1.0] with [bep/1.0]
matching [bep/1.0] with [bep/1.0]
common.c:281:bind:98:Address already in use
common.c:351:bind_peer:98:Address already in use
matching [h2] with [bep/1.0]
Unknown ALPN name: h2
matching [http/1.1] with [bep/1.0]
Unknown ALPN name: http/1.1
matching [bep/1.0] with [bep/1.0]
common.c:281:bind:98:Address already in use
common.c:351:bind_peer:98:Address already in use
matching [bep/1.0] with [bep/1.0]
common.c:281:bind:98:Address already in use
common.c:351:bind_peer:98:Address already in use
matching [bep/1.0] with [bep/1.0]
common.c:281:bind:98:Address already in use
common.c:351:bind_peer:98:Address already in use

I will update as soon as it crashes.

==1391== Conditional jump or move depends on uninitialised value(s) ==1391== at 0x10E92F: watchers_add_read (sslh-select.c:67) ==1391== by 0x10E92F: watchers_init (sslh-select.c:59) ==1391== by 0x10E92F: main_loop (sslh-select.c:134) ==1391== by 0x10DB6D: main (sslh-main.c:285) Signed-off-by: Toni Uhlig <[email protected]>

Fixes unitialised memory access as seen in issue #355.

gilcu3 · 2022-09-23T07:51:39Z

I have not gotten any crashes till now, and I can confirm after #356 the warning at the beginning has disappeared.

I am starting to think that the crash and too many files open issue was related to #352, aka using transparent mode for a tls connection with alpn filter (or maybe specific to syncthing). Does it make sense?

yrutschle · 2022-09-23T16:48:54Z

Not really -- there was a bug whereby max_fd was not initialised, which means my comment above was wrong; utoni's patch will have fixed that, so I expect this issue is closed. I expect #352 is something else.

gilcu3 · 2022-10-02T16:29:10Z

Finally I got a crash:

[...]
matching [bep/1.0] with [bep/1.0]                                                                                      
common.c:281:bind:98:Address already in use                                                                            
common.c:351:bind_peer:98:Address already in use
matching [h2] with [bep/1.0]             
Unknown ALPN name: h2                                                                                                  
matching [http/1.1] with [bep/1.0]                                                                                     
Unknown ALPN name: http/1.1                         
matching [bep/1.0] with [bep/1.0]                                                                                      
common.c:281:bind:98:Address already in use                                                                            
common.c:351:bind_peer:98:Address already in use                                                                       
==55677== Invalid read of size 4                                                                                       
==55677==    at 0x10F905: active_queue (tcp-listener.c:73)                                                             
==55677==    by 0x10F905: cnx_write_process (tcp-listener.c:310)                                                       
==55677==    by 0x10EBBF: main_loop (sslh-select.c:172)                                                                
==55677==    by 0x10DB6D: main (sslh-main.c:285)   
==55677==  Address 0x20 is not stack'd, malloc'd or (recently) free'd                                                  
==55677==                                                                                                              
==55677==                                                                                                              
==55677== Process terminating with default action of signal 11 (SIGSEGV)                                               
==55677==  Access not within mapped region at address 0x20                                                             
==55677==    at 0x10F905: active_queue (tcp-listener.c:73)
==55677==    by 0x10F905: cnx_write_process (tcp-listener.c:310)                                                       
==55677==    by 0x10EBBF: main_loop (sslh-select.c:172)                                                                
==55677==    by 0x10DB6D: main (sslh-main.c:285)    
==55677==  If you believe this happened as a result of a stack                                                         
==55677==  overflow in your program's main thread (unlikely but                                                        
==55677==  possible), you can try to increase the size of the                                                          
==55677==  main thread stack using the --main-stacksize= flag.                                                         
==55677==  The main thread stack size used in this run was 8388608.                                                    
==55677==                  
==55677== HEAP SUMMARY:                                  
==55677==     in use at exit: 297,184 bytes in 1,171 blocks                                                            
==55677==   total heap usage: 11,809,789 allocs, 11,808,618 frees, 34,321,275,086 bytes allocated                      
==55677==                                               
==55677== 72 bytes in 3 blocks are possibly lost in loss record 28 of 101                                              
==55677==    at 0x4841888: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)                             
==55677==    by 0x4BFFB96: tsearch (tsearch.c:337)                                                                     
==55677==    by 0x4B3BE6A: __add_to_environ (setenv.c:233)
==55677==    by 0x484D37F: setenv (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)                             
==55677==    by 0x4AF39C1: setproctitle_init (in /usr/lib/libbsd.so.0.11.6)                                            
==55677==    by 0x10D7FD: main (sslh-main.c:224)                                                                       
==55677==                                                                                                              
==55677== 264 bytes in 1 blocks are definitely lost in loss record 81 of 101
==55677==    at 0x4841798: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)                                                                                                                                            [0/1926]
==55677==    by 0x49F68E0: libconfig_strvec_append (in /usr/lib/libconfig.so.11.1.0)                                   
==55677==    by 0x49F3561: ??? (in /usr/lib/libconfig.so.11.1.0)                                                       
==55677==    by 0x49F3724: config_read_file (in /usr/lib/libconfig.so.11.1.0)                                          
==55677==    by 0x115081: c2s_parse_file (sslh-conf.c:2068)                                                            
==55677==    by 0x115081: sslhcfg_cl_parse (sslh-conf.c:2228)                                                          
==55677==    by 0x10D81F: main (sslh-main.c:228)
==55677==                         
==55677== 1,695 (64 direct, 1,631 indirect) bytes in 1 blocks are definitely lost in loss record 94 of 101             
==55677==    at 0x4846A73: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)                             
==55677==    by 0x49F34CD: config_clear (in /usr/lib/libconfig.so.11.1.0)                                              
==55677==    by 0x49F3518: ??? (in /usr/lib/libconfig.so.11.1.0)                                                       
==55677==    by 0x49F3724: config_read_file (in /usr/lib/libconfig.so.11.1.0)                                          
==55677==    by 0x115081: c2s_parse_file (sslh-conf.c:2068)                                                            
==55677==    by 0x115081: sslhcfg_cl_parse (sslh-conf.c:2228)                                                          
==55677==    by 0x10D81F: main (sslh-main.c:228)
==55677==                                
==55677== 4,608 bytes in 2 blocks are possibly lost in loss record 98 of 101                                           
==55677==    at 0x4841888: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)                             
==55677==    by 0x400444B: malloc (rtld-malloc.h:56)
==55677==    by 0x400444B: _dlfo_mappings_segment_allocate (dl-find_object.c:217)                                      
==55677==    by 0x400444B: _dl_find_object_update_1 (dl-find_object.c:671)                                             
==55677==    by 0x400444B: _dl_find_object_update (dl-find_object.c:805)                                               
==55677==    by 0x400BF97: dl_open_worker_begin (dl-open.c:735)                                                        
==55677==    by 0x4C4DDCD: _dl_catch_exception (dl-error-skeleton.c:208)                                               
==55677==    by 0x400B4FA: dl_open_worker (dl-open.c:782)                                                              
==55677==    by 0x4C4DDCD: _dl_catch_exception (dl-error-skeleton.c:208)                                               
==55677==    by 0x400B8EC: _dl_open (dl-open.c:886)
==55677==    by 0x4C4E1B0: do_dlopen (dl-libc.c:95)                                                                    
==55677==    by 0x4C4DDCD: _dl_catch_exception (dl-error-skeleton.c:208)                                               
==55677==    by 0x4C4DE82: _dl_catch_error (dl-error-skeleton.c:227)                                                   
==55677==    by 0x4C4E120: dlerror_run (dl-libc.c:45)                                                                  
==55677==    by 0x4C4E2EE: __libc_dlopen_mode (dl-libc.c:162)                                                          
==55677==                                                 
==55677== 243,775 bytes in 971 blocks are definitely lost in loss record 101 of 101                                    
==55677==    at 0x4841798: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)                             
==55677==    by 0x10FE48: defer_write (common.c:380)
==55677==    by 0x11349D: probe_client_protocol (tcp-probe.c:47)                                                       
==55677==    by 0x10F347: probing_read_process (tcp-listener.c:271)                                                    
==55677==    by 0x10ECAD: main_loop (sslh-select.c:197)                                                                
==55677==    by 0x10DB6D: main (sslh-main.c:285)                                                                       
==55677==                                                                                                              
==55677== LEAK SUMMARY:    
==55677==    definitely lost: 244,103 bytes in 973 blocks
==55677==    indirectly lost: 1,631 bytes in 26 blocks                                                                 
==55677==      possibly lost: 4,680 bytes in 5 blocks                                                                  
==55677==    still reachable: 46,770 bytes in 167 blocks
==55677==         suppressed: 0 bytes in 0 blocks                                                                      
==55677== Reachable blocks (those to which a pointer was found) are not shown.                                         
==55677== To see them, rerun with: --leak-check=full --show-leak-kinds=all                                             
==55677==                                                 
==55677== For lists of detected and suppressed errors, rerun with: -s                                                  
==55677== ERROR SUMMARY: 6 errors from 6 contexts (suppressed: 0 from 0)                                               
==55677== could not unlink /tmp/vgdb-pipe-from-vgdb-to-55677-by-root-on-???                                            
==55677== could not unlink /tmp/vgdb-pipe-to-vgdb-from-55677-by-root-on-???                                            
==55677== could not unlink /tmp/vgdb-pipe-shared-mem-vgdb-55677-by-root-on-???                                         
fish: Job 1, 'sudo DEBUGINFOD_URLS="https://d…' terminated by signal SIGSEGV (Address boundary error)

yrutschle · 2022-10-30T13:15:17Z

renamed the title to fit where we're at.

yrutschle · 2022-11-03T16:25:59Z

@gilcu3 the commit I just pushed closes an execution path that would result in a segfault. It seems "unlikely", but I guess over time it is "guaranteed to happen" :-) Please re-open if you get another crash!

…ctivity on both file descriptors (fix #355)

yrutschle added a commit that referenced this issue Sep 22, 2022

Merge pull request #356 from utoni/fix/watcher-unitialised-value

7e3f723

Fixes unitialised memory access as seen in issue #355.

yrutschle changed the title ~~too many open file descriptor to monitor them all -- dropping connection~~ segfault when bind() fails on transparent connection Oct 30, 2022

yrutschle closed this as completed in 02573eb Nov 3, 2022

yrutschle added a commit that referenced this issue Nov 9, 2022

defensive programming in case connections get tidied while there is a…

555717e

…ctivity on both file descriptors (fix #355)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

segfault when bind() fails on transparent connection #355

segfault when bind() fails on transparent connection #355

gilcu3 commented Sep 21, 2022 •

edited

Loading

yrutschle commented Sep 21, 2022

gilcu3 commented Sep 22, 2022 •

edited

Loading

gilcu3 commented Sep 23, 2022

yrutschle commented Sep 23, 2022

gilcu3 commented Oct 2, 2022

yrutschle commented Oct 30, 2022

yrutschle commented Nov 3, 2022

segfault when bind() fails on transparent connection #355

segfault when bind() fails on transparent connection #355

Comments

gilcu3 commented Sep 21, 2022 • edited Loading

yrutschle commented Sep 21, 2022

gilcu3 commented Sep 22, 2022 • edited Loading

gilcu3 commented Sep 23, 2022

yrutschle commented Sep 23, 2022

gilcu3 commented Oct 2, 2022

yrutschle commented Oct 30, 2022

yrutschle commented Nov 3, 2022

gilcu3 commented Sep 21, 2022 •

edited

Loading

gilcu3 commented Sep 22, 2022 •

edited

Loading