segmentation fault on latest release on high speed traffic ... #41

ulysse31 · 2024-07-26T09:08:56Z

Hello,

I'm using the SELKS project's Docker install, which is based on the Docker image jasonish/suricata:master-amd64.
I have two nodes running the same install (SELKS).
Today I updated both instances (all Docker containers, including suricata), and for some unknown reason the suricata container on one of the two started crashing in a loop (after running for about a minute).
At first I thought it was a SELKS issue, potentially related to rule generation, but even after wiping all containers / images / volumes / data, the suricata container still crash-loops with a segmentation fault:

[Fri Jul 26 10:44:31 2024] W#06-bond1[78735]: segfault at 0 ip 00000000009349a9 sp 00007f853fffc270 error 4 in suricata[4d4000+637000] likely on CPU 22 (core 14, socket 0)
[Fri Jul 26 10:44:31 2024] Code: 74 24 50 48 85 f6 74 0b ba 01 00 00 00 ff 15 76 8c 44 00 48 89 df e8 06 06 ba ff 0f 0b 0f 1f 40 00 48 83 ec 18 48 85 d2 74 38 <0f> b6 06 89 c1 83 e1 1f 41 b8 01 00 00 00 83 f9 1f 75 5b 48 83 fa
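
As a reading aid for the kernel line above, here is a minimal Python sketch that splits it into its fields. The regular expression and the helper name `decode_segfault` are illustrative assumptions, not part of Suricata or any existing tool. The meaning of the `error` value follows the x86 page-fault error bits (bit 0: page present, bit 1: write access, bit 2: user mode).

```python
import re

# Kernel segfault line copied from the dmesg output above.
LINE = ("W#06-bond1[78735]: segfault at 0 ip 00000000009349a9 "
        "sp 00007f853fffc270 error 4 in suricata[4d4000+637000]")

# Illustrative pattern (an assumption, not an official format description).
PATTERN = re.compile(
    r"(?P<thread>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    """Hypothetical helper: decode one kernel 'segfault at' line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    addr = int(m["addr"], 16)   # faulting address
    ip = int(m["ip"], 16)       # instruction pointer at the fault
    base = int(m["base"], 16)   # start of the mapping containing ip
    err = int(m["err"], 16)     # x86 page-fault error code
    return {
        "thread": m["thread"],
        "fault_addr": addr,
        "null_deref": addr == 0,
        "access": "write" if err & 2 else "read",
        "page_present": bool(err & 1),
        "user_mode": bool(err & 4),
        "offset_in_mapping": ip - base,
    }

info = decode_segfault(LINE)
print(info["thread"], info["access"], hex(info["offset_in_mapping"]))
```

For this line the result is a user-mode read of the unmapped address 0, which is consistent with a NULL pointer dereference in a worker thread. Whether a symbolizer needs the absolute `ip` or the offset into the mapping depends on how the binary was built, which the log alone does not tell.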

Attached is the output of docker logs suricata -f:

suricata_docker_output.txt

The last lines are:

Perf: af-packet: bond1: rx ring: block_size=32768 block_nr=2 frame_size=1600 frame_nr=40 [AFPComputeRingParams:source-af-packet.c:1598]
Perf: af-packet: bond1: rx ring: block_size=32768 block_nr=2 frame_size=1600 frame_nr=40 [AFPComputeRingParams:source-af-packet.c:1598]
Perf: af-packet: bond1: rx ring: block_size=32768 block_nr=2 frame_size=1600 frame_nr=40 [AFPComputeRingParams:source-af-packet.c:1598]
Perf: af-packet: bond1: rx ring: block_size=32768 block_nr=2 frame_size=1600 frame_nr=40 [AFPComputeRingParams:source-af-packet.c:1598]
Notice: threads: Threads created -> W: 64 FM: 1 FR: 1 Engine started. [TmThreadWaitOnThreadRunning:tm-threads.c:1905]

After that comes the segmentation fault in dmesg, and the container crashes and restarts in a loop.

The only difference between the two servers is that one listens on a bonding interface (bond1), while the other listens directly on a physical interface.
So from what I see, it is either related to the recent update of the suricata image (11 hours ago) or potentially a hardware issue. The latter seems unlikely, because there are no error messages on the host or on the switch.

Is it possible that the latest version has issues with bonded 10 Gbit interfaces?
Is there any additional debugging I can enable that would give more hints?
Thanks a lot.

ulysse31 · 2024-07-26T12:08:28Z

UPDATE:

I was thinking that it might be related to bonding.
But it also segfaults on the other server, the one listening directly on a physical interface:

[Fri Jul 26 06:24:41 2024] W#09-eno2np1[3532764]: segfault at 0 ip 00000000009349a9 sp 00007f1f5fffc270 error 4 in suricata[4d4000+637000] likely on CPU 4 (core 1, socket 0)
[Fri Jul 26 06:24:41 2024] Code: 74 24 50 48 85 f6 74 0b ba 01 00 00 00 ff 15 76 8c 44 00 48 89 df e8 06 06 ba ff 0f 0b 0f 1f 40 00 48 83 ec 18 48 85 d2 74 38 <0f> b6 06 89 c1 83 e1 1f 41 b8 01 00 00 00 83 f9 1f 75 5b 48 83 fa

This one is in the New York time zone (the other one is in the Paris time zone).
So it segfaults on both. The big difference is probably the bandwidth: one listens on a single 10 Gbps interface, the other on a bond of two 10 Gbps interfaces because of the traffic volume.
To rephrase: the latest version of the Suricata Docker image seems to segfault under high traffic (an average of 20 MBytes/s on bond1).
The one in New York is currently at around 2 to 3 MBytes/s (low activity, early morning).

ulysse31 · 2024-07-26T12:12:37Z

UPDATE2:

Confirmed once traffic picked up in New York:

[Fri Jul 26 08:10:34 2024] W#31-eno2np1[3671915]: segfault at 0 ip 00000000009349a9 sp 00007f31ad4f1270 error 4 in suricata[4d4000+637000] likely on CPU 6 (core 6, socket 0)
[Fri Jul 26 08:10:34 2024] Code: 74 24 50 48 85 f6 74 0b ba 01 00 00 00 ff 15 76 8c 44 00 48 89 df e8 06 06 ba ff 0f 0b 0f 1f 40 00 48 83 ec 18 48 85 d2 74 38 <0f> b6 06 89 c1 83 e1 1f 41 b8 01 00 00 00 83 f9 1f 75 5b 48 83 fa
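
The three kernel reports in this thread can be compared mechanically. The sketch below is only an illustration: the helper name `crash_signature` and the regular expression are assumptions, and comparing instruction pointers across hosts is only meaningful if both run the same Suricata binary (both were updated to the same image here).

```python
import re

# The three "segfault at" lines reported in this thread (timestamps removed).
REPORTS = [
    "W#06-bond1[78735]: segfault at 0 ip 00000000009349a9 "
    "sp 00007f853fffc270 error 4 in suricata[4d4000+637000]",
    "W#09-eno2np1[3532764]: segfault at 0 ip 00000000009349a9 "
    "sp 00007f1f5fffc270 error 4 in suricata[4d4000+637000]",
    "W#31-eno2np1[3671915]: segfault at 0 ip 00000000009349a9 "
    "sp 00007f31ad4f1270 error 4 in suricata[4d4000+637000]",
]

def crash_signature(line):
    """Hypothetical helper: (fault address, ip, error code, mapping)."""
    m = re.search(
        r"segfault at ([0-9a-f]+) ip ([0-9a-f]+) .* "
        r"error ([0-9a-f]+) in (\S+)",
        line,
    )
    if m is None:
        raise ValueError("not a segfault line")
    return m.groups()

# Thread name, PID and stack pointer differ per crash and are ignored.
signatures = {crash_signature(line) for line in REPORTS}
print(len(signatures), signatures)
```

All three lines reduce to a single signature: the same faulting address, the same instruction pointer and the same mapping on both hosts. The identical Code: bytes point the same way. That would fit one code path failing repeatedly rather than a hardware fault on one machine.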

It seems that the Suricata Docker image no longer copes with high traffic and crashes under it.

ulysse31 · 2024-07-26T12:48:03Z

UPDATE3:

I updated the title, since I can now confirm that the segmentation fault appears once traffic reaches a certain level on both of my test systems.
I've tried master-amd64, master-profiling and master; they all end up in the same segmentation fault crash loop under high traffic.

ulysse31 changed the title ~~segmentation fault on latest release of master-amd64~~ segmentation fault on latest release on high speed traffic ... Jul 26, 2024
