[HELP] STM32H7 hardfault in lpwork #13614

vladsomai · 2024-09-25T18:36:34Z

Description

Hello everyone,

We are using an STM32H7-based board and recently updated our NuttX version from 9.1.0 to the 12.6.0-RC1 tag.
We are using unbuffered TCP/IP networking to send data from the STM32H7 MCU to our main controller. When there is a lot of traffic between the two, we notice long delays in replies, and after a while the app crashes. On NuttX 9.1.0 we did not have this issue.
When we switch the NuttX 12.6.0 networking to buffered mode (CONFIG_NET_TCP_WRITE_BUFFERS), the network behaves better: delays in the replies are less frequent and the app does not crash.

What could be the cause of a hardfault in the lpwork when using unbuffered networking?

The network driver buffer configuration is set as follows: CONFIG_NET_RECV_BUFSIZE = 32768 (32 KiB)
The stack dump on hard fault looks like this when using unbuffered networking (we used arm-none-eabi-addr2line to resolve all the flash addresses):

0x0809b2bf
tcp_recvhandler
/Nuttx/net/tcp/tcp_recvfrom.c:500

0x080263a7
devif_conn_event
/Nuttx/net/devif/devif_callback.c:521

0x08025c87
tcp_callback
/Nuttx/net/tcp/tcp_callback.c:308

0x08027319
tcp_input
/Nuttx/net/tcp/tcp_input.c:1547

0x08026405
ipv4_in
/Nuttx/net/devif/ipv4_input.c:149

0x080264bf
ipv4_in
/Nuttx/net/devif/ipv4_input.c:153

0x080269cf
netdev_input
/Nuttx/net/netdev/netdev_input.c:91

0x08021dbd
stm32_receive
/Nuttx/arch/arm/src/chip/stm32_ethernet.c:1918

0x08022fe5
up_irq_save
/Nuttx/include/arch/armv7-m/irq.h:416

0x08023a71
nxtask_start
/Nuttx/sched/task/task_start.c:122

Verification

I have verified before submitting the report.

acassis · 2024-09-26T18:00:06Z

Hi @vladsomai, is it possible to reproduce this issue on a common STM32 board with Ethernet, such as nucleo-f746? Is it possible to reproduce it using an existing network test application from nuttx-apps?

If you can reproduce it, please submit a board config that we could use for testing.

If you can only reproduce it on your board, I suggest this test to discover when the issue was introduced: grab a release between 9.1 and 12.6 and copy your boards/arm/stm32h7/boardname there (and include the 3 entries in boards/Kconfig). Repeat this search until you find the release in which the issue was introduced, then do a quick git bisect to find the commit that introduced it.
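The release-by-release search described above is a binary search over an ordered list of versions. Below is a minimal C sketch of that idea, not tied to any NuttX or git tooling. The names first_bad_release and simulated_is_bad are hypothetical, and the simulated check only stands in for the real work of copying the board directory, building, and running the stress test on each candidate release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Caller-supplied check: true if the release at index idx shows the fault. */
typedef bool (*release_is_bad_fn)(size_t idx, void *ctx);

/* Stand-in check for illustration only: ctx points to the index of the
 * first bad release. A real check would build and stress-test the release.
 */
static bool simulated_is_bad(size_t idx, void *ctx)
{
  return idx >= *(const size_t *)ctx;
}

/* Return the index of the first bad release in an ordered list of n
 * releases. Assumes index 0 is known good, index n - 1 is known bad, and
 * every release after the first bad one is also bad.
 */
static size_t first_bad_release(size_t n, release_is_bad_fn is_bad, void *ctx)
{
  size_t good;
  size_t bad;

  assert(n >= 2 && is_bad != NULL);

  good = 0;
  bad = n - 1;

  while (bad - good > 1)
    {
      size_t mid = good + (bad - good) / 2;

      if (is_bad(mid, ctx))
        {
          bad = mid;   /* fault already present: look earlier */
        }
      else
        {
          good = mid;  /* still works: look later */
        }
    }

  return bad;
}
```

Each iteration halves the range, so a handful of ports is enough to narrow the regression to one release before handing the remaining commit range to git bisect.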

@wengzhe do you have some idea about this issue?

wengzhe · 2024-09-30T07:04:31Z

@wengzhe do you have some idea about this issue?

@acassis I'll try to take a look. We haven't tried unbuffered TCP with this much traffic before, because we always use the buffered one when we have a lot of data to send.

vladsomai · 2024-09-30T11:18:51Z

Hello @acassis, I just got a nucleo-h743zi board and I will come back soon with a config if I can reproduce it.
Trying multiple NuttX versions is time-consuming because migrating our app to a new version may take a couple of days.

@wengzhe, thank you for replying on this thread. We tested the buffered send, but we see a lot of "Spurious Retransmission" messages in the Wireshark trace, usually when the packets are larger than CONFIG_NET_ETH_PKTSIZE. These retransmissions hurt throughput badly.

So to summarize, these are the current issues we noticed when stressing the network:

Buffered send has Spurious Retransmissions, which reduce network throughput under high traffic with packets larger than CONFIG_NET_ETH_PKTSIZE.
Unbuffered send does not have Spurious Retransmissions when the STM32H7-based board sends the packets, but the app crashes (with the above stack trace) when a message is sent from the main controller to the STM32H7-based board while traffic is coming from the STM32H7 board.
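To make "packets larger than CONFIG_NET_ETH_PKTSIZE" concrete, the following C sketch estimates how many TCP segments one application message is split into. It is an illustration under stated assumptions, not NuttX code: it assumes the packet size counts the Ethernet, IPv4, and TCP headers along with the payload, and that no IP or TCP options are in use. The function names and header constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed header sizes: Ethernet II, IPv4 without options, TCP without
 * options.
 */
#define ETH_HDRLEN  14
#define IPV4_HDRLEN 20
#define TCP_HDRLEN  20

/* Largest TCP payload that fits in one packet buffer of pktsize bytes. */
static size_t tcp_payload_per_packet(size_t pktsize)
{
  assert(pktsize > ETH_HDRLEN + IPV4_HDRLEN + TCP_HDRLEN);
  return pktsize - (ETH_HDRLEN + IPV4_HDRLEN + TCP_HDRLEN);
}

/* Number of TCP segments needed to carry msglen bytes, rounded up. */
static size_t tcp_segments_needed(size_t msglen, size_t pktsize)
{
  size_t payload = tcp_payload_per_packet(pktsize);

  return (msglen + payload - 1) / payload;
}
```

Under these assumptions, a 4096-byte message with a 1514-byte packet size needs three segments, so a single lost or delayed ACK can affect several frames of the same message.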

Q1: Did you ever encounter Spurious Retransmissions when using buffered tcp?
Q2: Which board config are you using to test the buffered and unbuffered tcp? We would like to take a look in the config you work with and test the most.

wengzhe · 2024-09-30T15:00:37Z

Q1: Did you ever encounter Spurious Retransmissions when using buffered TCP?

If you see a "Spurious Retransmission", the previous ACK may have been dropped or delayed somewhere before reaching tcp_input, which would likely be a driver or checksum issue.
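One way to examine the checksum hypothesis is to recompute the checksum over bytes taken from a capture. The sketch below implements the generic one's-complement Internet checksum from RFC 1071. It is not the NuttX implementation, and inet_checksum is a hypothetical name. For a real TCP segment, the sum must also cover the TCP pseudo-header (source and destination address, protocol, TCP length), which this helper leaves to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement Internet checksum (RFC 1071) over len bytes, treating
 * the data as big-endian 16-bit words. Intended for packet-sized buffers;
 * the 32-bit accumulator is not guarded against very large inputs.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
  uint32_t sum = 0;
  size_t i;

  assert(data != NULL || len == 0);

  /* Add up all complete 16-bit words. */
  for (i = 0; i + 1 < len; i += 2)
    {
      sum += ((uint32_t)data[i] << 8) | data[i + 1];
    }

  /* An odd trailing byte is padded with a zero low byte. */
  if (i < len)
    {
      sum += (uint32_t)data[i] << 8;
    }

  /* Fold the carries back into the low 16 bits. */
  while (sum >> 16)
    {
      sum = (sum & 0xffff) + (sum >> 16);
    }

  return (uint16_t)~sum;
}
```

Running the function over a buffer that already contains a correct checksum field returns 0, so a nonzero result on captured data points to corruption somewhere between the wire and the stack.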

Q2: Which board config are you using to test the buffered and unbuffered tcp? We would like to take a look in the config you work with and test the most.

We don't always use the community configs (normally we use NuttX on our own product with its corresponding driver), but we have been running some tests on esp32c3-devkit:wifi these days (buffered only). There are also sim:tcpblaster and qemu-armv8a:netnsh, which are independent of any hardware.
