one more time: initialisation problems #164

Open
DrMarkusKrug opened this issue Jan 15, 2025 · 3 comments


@DrMarkusKrug

Dear all,
I have written before about strange problems with memory handling in the uros library. I managed a workaround a couple of months ago, but I have come across the same problem again.
So this should be the first post of a series where I want to solve the problem systematically (if I'm able to). Everybody is welcome to join and contribute to this series.

From my perspective there are at least two severe problems with the uros library:
1.) Wrong memory allocation/free treatment
2.) Manipulating the ISR settings of the application inside the library with no documentation about it.

Some words on the first problem. I'm working with an STM32L4 device that does not have much memory. I guess others who work on better-equipped STM32 devices might have the same problem but haven't noticed it so far, because the comparatively large memory hides the issue. However, I'm deeply concerned that the issue is quite serious (writing into memory that is already used by something else). So even if your STM32 uros application is running, it might be worth observing the stack usage of the tasks that deal with uros.
In addition, I noticed strange behaviour when rclc_support_init() fails. On my device it takes about 12 seconds until the function call returns. So there must be a lot of delay() calls somewhere in the uros library, but why?

I want to point to something else that makes me suspicious. In my application (currently not running because of uros initialization problems) I see the following in the memory map:
Screenshot from 2025-01-15 17-15-12

So these custom_xxx variables are consuming a lot of RAM, an unusually large amount. Whatever those data structures are doing, this is not really sensible for embedded systems. However, that does not explain the initialization trouble I have, because these variables are allocated statically and are therefore checked during the build process.
Finally, my libmicroros.a file is about 32 MByte. I have never seen a library file close to that size. Yes, I know that shouldn't be a big thing because the linker chooses only the needed parts. However, I ask myself why this library file is so unusually big.

Some words on the second problem. I was just trying to use a simple periodic interrupt in the dma_transport functions, because I think it might be a chance to work around the memory handling of the uros library by initializing uros outside of a task. To stay compatible with the uros-provided code, I added something like this to the transport functions:
image
You can see that I check if the kernel is active or not and use osDelay or a custom delay loop (that is using a periodic interrupt for each millisecond). However, this interrupt is never triggered. If I call this custom delay before doing anything with the uros library it works perfectly fine. So it looks to me that the uros library is manipulating the ISR settings of my application. Very disappointing I have to say.
From this screenshot:
image
you can see that the call stack contains no ISR at all. Nevertheless, one of the listed uros functions must be manipulating the ISR settings; otherwise the variable custom_delay_flag would be set to true in the TIM6 ISR (which is never triggered). So I hang in an endless loop. HAL_Delay() isn't a good alternative because it conflicts with osDelay(), but I will take a closer look at this tomorrow.

I will continue as soon as I can report something new.

Best Regards
Markus

@pablogs9
Member

pablogs9 commented Jan 15, 2025

Hello @DrMarkusKrug,

Some comments regarding your assertions:

> Wrong memory allocation/free treatment

micro-ROS is designed as a port of the ROS 2 stack to MCUs, as you might know. This means that above the RMW layer, we use ROS 2 packages. Below the RMW layer (the rmw package itself and the XRCE-DDS middleware), dynamic memory is avoided by default (unless you configure it otherwise). In summary, the memory allocation problem you mention may lie in a ROS 2 package, or it may stem from static memory allocation. As far as I know, there is no known memory error in the micro-ROS stack, and hopefully this also applies to this specific package, unless some STM32-specific code is interfering.

> However, I'm deeply concerned that the issue is quite serious (writing into memory spaces that are already used by something else). So it might be worth checking, even if your STM32 micro-ROS application is running, how the stack usage behaves for the tasks dealing with micro-ROS.

Usually, embedded development requires carefully monitoring the stack usage of your spawned tasks. micro-ROS is indeed stack-hungry. It wouldn’t be the first time I’ve seen stack smashing caused by a poorly dimensioned stack size in micro-ROS. Please take a look. Returning to the previous point, if there is “memory smashing” beyond stack overflows, it could be in a ROS 2 package (and would likely be reported on more platforms), or in a set of layers that are statically allocated.
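
As a rough illustration of what monitoring stack usage means in practice, here is a minimal, platform-independent sketch of the stack painting technique: fill the stack region with a known pattern before the task runs, then count how much of the pattern is still intact. The names `stack_paint`, `stack_unused_words`, and `STACK_FILL_WORD` are hypothetical and not part of micro-ROS or FreeRTOS. On FreeRTOS the same information is available through `uxTaskGetStackHighWaterMark()`, provided `INCLUDE_uxTaskGetStackHighWaterMark` is enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_WORD 0xA5A5A5A5u  /* hypothetical fill pattern */

/* Fill the whole stack region with the pattern before the task starts. */
void stack_paint(uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    for (size_t i = 0; i < words; ++i) {
        stack[i] = STACK_FILL_WORD;
    }
}

/* Count the words that still hold the pattern, scanning from the lowest
 * address. On a descending stack (as on Cortex-M) the lowest addresses are
 * consumed last, so the result is the smallest amount of free stack the
 * task has ever had, in 32-bit words. */
size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    size_t unused = 0;
    while (unused < words && stack[unused] == STACK_FILL_WORD) {
        ++unused;
    }
    return unused;
}
```

A result close to zero after running the initialization code would point to an undersized task stack rather than to a heap allocation bug.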

Beyond that, I have personally developed numerous micro-ROS applications on multiple platforms (though honestly, STM32 Cube IDE is not my favorite), ranging from very simple publishers to complex micro-ROS architectures. I haven’t encountered any major memory bugs in the micro-ROS software stack. That doesn’t mean the stack is bug-free, of course, but it does mean that in the years I’ve been working with micro-ROS, all known memory concerns have been addressed as far as we know.

> So it might be worth that even if your STM32 uros application is running to observe the stack usage of the tasks that deal with uros.

Beware of the tasks (plural) that deal with uros, because micro-ROS is a single-threaded library that is not thread-safe by default. So using micro-ROS from multiple tasks can be problematic.
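
One common way to keep a non-thread-safe library in a single task is to let every other task hand over its data through a queue, so that only one owner task ever calls the library. The sketch below shows just the FIFO data flow. All names (`pub_request_t`, `pub_queue_push`, and so on) are hypothetical, and the queue itself is not thread-safe: on FreeRTOS it would be replaced by the RTOS queue primitives (`xQueueSend()` / `xQueueReceive()`) or guarded by a critical section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PUB_QUEUE_DEPTH 8u          /* hypothetical queue depth */

typedef struct {
    uint8_t topic_id;               /* hypothetical: selects the publisher */
    float   value;                  /* hypothetical payload */
} pub_request_t;

typedef struct {
    pub_request_t slots[PUB_QUEUE_DEPTH];
    size_t head;                    /* index of the oldest queued request */
    size_t count;                   /* number of queued requests */
} pub_queue_t;

void pub_queue_init(pub_queue_t *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Called by producer tasks. Returns false when the queue is full. */
bool pub_queue_push(pub_queue_t *q, pub_request_t req)
{
    assert(q != NULL);
    if (q->count == PUB_QUEUE_DEPTH) {
        return false;
    }
    q->slots[(q->head + q->count) % PUB_QUEUE_DEPTH] = req;
    q->count++;
    return true;
}

/* Called only by the single task that owns all micro-ROS calls.
 * Returns false when nothing is queued. */
bool pub_queue_pop(pub_queue_t *q, pub_request_t *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0) {
        return false;
    }
    *out = q->slots[q->head];
    q->head = (q->head + 1) % PUB_QUEUE_DEPTH;
    q->count--;
    return true;
}
```

The owner task would pop requests in a loop and perform the actual publish call for each one, so requests from different tasks are serialized in arrival order.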

> Additionally, I noticed strange behavior when rclc_support_init() fails. It takes about 12 seconds for the function call to return on my device. So there must be a lot of delay() calls somewhere in the micro-ROS library, but why?

This isn’t actually strange behavior. Calling rclc_support_init() triggers the XRCE-DDS Client-to-Agent session initialization, which has a maximum number of attempts and a certain time window for each attempt:
https://github.com/eProsima/Micro-XRCE-DDS-Client/blob/0301e0dc2312a908b732c3d296d74563a385da35/CMakeLists.txt#L57-L58

By default, there are 10 attempts, each with a 1000 ms timeout, which is roughly 10 seconds total. This can be configured in the colcon.meta file. In fact, instructions on how to create applications that “reconnect” to the agent are documented here:
https://docs.vulcanexus.org/en/latest/rst/tutorials/micro/handle_reconnections/handle_reconnections.html

The delay followed by an error code usually means that your micro-ROS client is not reaching the micro-ROS Agent. 99% of the time this is because the transport is not connecting client and agent, which ends in an error code from rclc_support_init(), which cannot "init the micro-ROS support service (aka middleware, or client-agent connection)".
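
To make the timing concrete, the following is a simplified model of a bounded connection loop, not the actual XRCE-DDS client code. `connect_with_retries` and the two stand-in agent functions are hypothetical; the values 10 and 1000 ms in the usage note are the defaults quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: one connection attempt that blocks for at most
 * timeout_ms and returns true if the agent answered. */
typedef bool (*try_connect_fn)(uint32_t timeout_ms, void *ctx);

typedef struct {
    bool     connected;
    uint32_t attempts_used;
    uint32_t waited_ms;     /* upper bound on time spent in failed attempts */
} connect_result_t;

connect_result_t connect_with_retries(try_connect_fn try_connect, void *ctx,
                                      uint32_t max_attempts, uint32_t timeout_ms)
{
    assert(try_connect != NULL);
    connect_result_t r = { false, 0u, 0u };
    while (r.attempts_used < max_attempts) {
        r.attempts_used++;
        if (try_connect(timeout_ms, ctx)) {
            r.connected = true;
            break;
        }
        r.waited_ms += timeout_ms;  /* a failed attempt waits its full timeout */
    }
    return r;
}

/* Stand-in for an unreachable agent: every attempt times out. */
bool agent_unreachable(uint32_t timeout_ms, void *ctx)
{
    (void)timeout_ms;
    (void)ctx;
    return false;
}

/* Stand-in for an agent that answers after *ctx failed attempts. */
bool agent_answers_after(uint32_t timeout_ms, void *ctx)
{
    (void)timeout_ms;
    uint32_t *remaining_failures = ctx;
    if (*remaining_failures == 0u) {
        return true;
    }
    (*remaining_failures)--;
    return false;
}
```

With `max_attempts = 10` and `timeout_ms = 1000`, an unreachable agent leads to a failure result after roughly 10,000 ms of waiting, which is in the same range as the delay reported in this issue.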

> I want to point to something else that makes me suspicious. In my application (currently not running because of micro-ROS initialization problems), I see the following in the memory map:

Once again, there are static memory pools whose sizes are fully configurable in the colcon.meta file:
https://docs.vulcanexus.org/en/latest/rst/tutorials/micro/memory_management/memory_management.html#entity-creation

> So these custom_xxx variables are consuming a lot of RAM, unusually high.

You can reduce them to fit your platform’s constraints.

> Finally, my libmicroros.a file is about 32 MB. I’ve never seen a library file that large. Yes, I know it shouldn’t be a big issue because the linker only chooses the necessary parts, but I still wonder why this library file is so unusually big.

We’re aware of this, and it’s intentional. Unless your computer cannot store a 32 MB .a file, this shouldn’t be a practical issue. As you mentioned, the linker will include only the objects and functions that your application actually uses. Unfortunately, we don’t know which of the hundreds of ROS 2 message definitions and core packages a user’s application will need, so we chose to include most of them to cover most use cases in the most straightforward way. We made this decision fully aware that the ~30 MB static library will shrink to tens of kilobytes once the user’s application is linked. That is more than small enough to fit in modern MCU code storage, and it can even be built with [size optimization (-Os) to fit in smaller platforms](https://docs.vulcanexus.org/en/latest/rst/tutorials/micro/custom_platforms/custom_platforms.html).

> Very disappointing, I have to say.

Lastly, regarding your second issue: embedded development is often tricky, obscure, and highly platform-dependent. We on the micro-ROS team try to provide an open and free solution for as many relevant platforms as possible. However, these ports don’t cover every single use case for every single transport and configuration. Most of the time, they require fine-tuning, adaptations, and user-side configuration. Personally, I understand that it can be frustrating, but we cannot provide an out-of-the-box solution for everything, as our team is very, very resource-limited.

So one more time, I hope that some of my comments help, and if you find proper solutions, please do not hesitate to contribute back with bug fixes or documentation; that will help a lot.

@DrMarkusKrug
Author

Hello everyone, hello Pablo,

I might have sounded very angry in my last post; however, I'm not. I just want to report what happens on my computer with my application. I do this first of all to get help and hints, so thank you for all your comments and suggestions. I really appreciate them. Second, I want to document what happens to me for others who run into similar problems in the future. From the list of reported issues I get the impression that the problems I reported yesterday (memory management and interrupt issues) have happened to others as well, so I think it's worth finding the root cause.

Anyway, a short update from my debugging journey.
In the first screenshot you can see that the BASEPRI register on the right-hand side (colored orange) is 0x00 (all interrupt levels allowed). This is right before the call to rclc_support_init().

Image

When the function starts to send the first message (after calling a number of other functions) it looks like this:

Image
So at some point it was changed to 0x50, which means all interrupts between level 5 and 15 are blocked. The consequences are:
No SysTick interrupt (usually on level 15) and no DMA interrupt (usually between 5 and 15, but not used by microros anyway). If there is no SysTick update, the osDelay() function does not work. If there is no DMA interrupt, uart->gState will not change to HAL_UART_STATE_READY (that actually needs an additional ISR in the ST HAL software driver). Hence, we run into a deadlock.
Question: why is that? I can only speculate here. My first speculation is that it's because (to my knowledge) malloc/free is not reentrant by default, so the interrupt mask level is raised to guard against this. Second speculation: why is the level set to 0x50? Maybe because in the standard FreeRTOS configuration of STM32CubeIDE all FreeRTOS-aware interrupts run from this level downwards (that is, towards higher numbers). However, STM32CubeIDE/CubeMX lets you set an option that uses a reentrant version of these crucial functions (via a header that wraps the standard library functions). In fact, ST points you to this very insistently by issuing a warning every time you generate code. So, if my speculation is correct, there is no need to do this if you are using STM32CubeIDE/CubeMX.
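
The mapping from the BASEPRI value 0x50 to "levels 5 to 15 are blocked" can be sketched as follows, assuming 4 implemented priority bits as on the STM32L4 (Cortex-M4). The function names are hypothetical. As a candidate to check, and not something confirmed in this thread: the FreeRTOS Cortex-M ports implement critical sections by raising BASEPRI to `configMAX_SYSCALL_INTERRUPT_PRIORITY`, and with a library-level priority of 5 that value is 0x50.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NVIC_PRIO_BITS 4u   /* assumption: 4 implemented priority bits */

/* Priority level encoded in a BASEPRI value. Only the upper NVIC_PRIO_BITS
 * bits of the 8-bit field are implemented. */
uint8_t basepri_to_level(uint8_t basepri)
{
    return (uint8_t)(basepri >> (8u - NVIC_PRIO_BITS));
}

/* True if an interrupt with the given priority level is masked.
 * BASEPRI == 0 disables masking; otherwise every level that is numerically
 * greater than or equal to the threshold is blocked. */
bool irq_is_masked(uint8_t basepri, uint8_t irq_level)
{
    assert(irq_level < (1u << NVIC_PRIO_BITS));
    uint8_t threshold = basepri_to_level(basepri);
    return (threshold != 0u) && (irq_level >= threshold);
}
```

For the values in the screenshots, `basepri_to_level(0x50)` is 5, so a SysTick at level 15 is masked while an interrupt at level 4 still fires.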

At the moment I cannot find out which function is changing the BASEPRI register, because I built the library with the recommended docker commands. Can someone point me to the sources that are used for the library so that I can have a look at them?

But there is also 'good news'. Right before the buffer transmit I took the following screenshot. I'm not too sure about the frame format of XRCE messages (shouldn't they start with 0xFF?), but at least I can read some of the content, which seems to make sense (buf[12] onwards).

Image
I'll keep you updated.
Markus

@DrMarkusKrug
Author

DrMarkusKrug commented Jan 26, 2025

Dear all,

I would like to continue reporting on my debugging.

  1. I moved all node, publisher, and subscriber initialization routines into a function that is called before FreeRTOS starts and therefore runs outside the task heap memory. Additionally, I changed all the allocator routines to the standard malloc, free, and so on. At the moment I have no time to debug the exact reason why the task heap is corrupted by these microROS init functions. Even when I allowed the corresponding task to consume 12 kBytes, it didn't work. I couldn't go beyond 12 kBytes because more is not available on the microcontroller I'm using. However, I doubt that reserving more would really cure the problem. I didn't perform any measurements, but from the past I remember that the actual memory usage of these initialization routines is about 1.2 kBytes. So it's not the amount; it's the way the memory allocation works on the FreeRTOS task heap. Maybe it's a combined problem of FreeRTOS and microROS. By the way, I'm using the heap_4 scheme. So I have a workaround, although I'm still confused why this issue is not reported all the time. Maybe a corrupted task heap does not necessarily show its consequences immediately. Depending on what your other tasks are doing, it might go undetected for a while. In my case I have 4 publishers in 3 different tasks and 2 subscribers in 2 different tasks. Each of them performs a lot of computing steps and has a cycle time between 10 and 30 milliseconds.
  2. I'm using the serial transport with DMA. Here we come to the next problem. First I had to adjust the 'delay' functions used by the read/write functions, because they are also used during the initialization routines, where FreeRTOS is not yet available (e.g. see line 40 in the screenshot above). To do so, I had to raise the ISR priority levels used above the FreeRTOS level (usually 5).
  3. The more severe problem I found with the DMA transport functions is the buffer length. The write function limits itself to 42 bytes. I have absolutely no idea why this was decided. The corresponding DMA buffer is allocated with 512 bytes. So it might be because of the stream buffer, but I'm not sure. I init my publishers with 'best effort'. For example, I have an IMU message that consists of more than 320 bytes. It is now divided into 8 pieces. If you don't do anything about it, you end up with 8 delays of 1 millisecond each. I'm transmitting at 921,600 bit/s, so a 42-byte packet can be transmitted in about 450 µs. However, it takes more than twice that time. At the moment I'm working on a scheme that uses a mutex, semaphore, or flags to overcome this problem. It's not so easy because you don't know when the last data chunk of the currently published message has been transmitted (other than guessing from the chunk size).
  4. Because of the issue above, it takes quite long to publish data. In my application, other tasks publish in the meantime. So the DMA transport behaves 'unusually' because it simply sets an error for the second (third, fourth, ...) publish call while the current one is ongoing. So I have to coordinate this condition as well, but I haven't found a proper and efficient solution yet. In situations like this it is sometimes recommended to create a 'publishing task' that receives all the information via a message queue. Maybe I will give this idea a try.
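
The numbers in point 3 can be checked with a small calculation. This is only a sketch: the function names are hypothetical, and it assumes 8N1 framing (10 bits on the wire per byte), which the post does not state.

```c
#include <assert.h>
#include <stdint.h>

#define UART_BITS_PER_BYTE 10u  /* assumption: 8N1 (start + 8 data + stop) */

/* Number of chunks needed to send message_bytes in pieces of chunk_bytes. */
uint32_t chunk_count(uint32_t message_bytes, uint32_t chunk_bytes)
{
    assert(chunk_bytes > 0u);
    return (message_bytes + chunk_bytes - 1u) / chunk_bytes;
}

/* Pure wire time in microseconds for the given byte count, rounded down. */
uint32_t wire_time_us(uint32_t bytes, uint32_t baud)
{
    assert(baud > 0u);
    return (uint32_t)(((uint64_t)bytes * UART_BITS_PER_BYTE * 1000000u) / baud);
}
```

Under these assumptions a 320-byte message needs 8 chunks of 42 bytes, each chunk occupies the wire for about 455 µs, and the whole message for about 3.5 ms. Eight fixed 1 ms delays therefore cost more than twice the raw transmission time.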

As soon as I have some stable running code I will share it.
I'll keep you updated.
Markus
