-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make DMA SPI driver aware of CPU cache, fix data corruption and other SPI issues on STM32H7 #199
Conversation
targets/TARGET_Freescale/TARGET_MCUXpresso_MCUS/TARGET_MCU_K64F/spi_api.c
Show resolved
Hide resolved
targets/TARGET_STM/TARGET_STM32H7/STM32Cube_FW/STM32H7xx_HAL_Driver/stm32h7xx_hal_dma_ex.c
Show resolved
Hide resolved
hard stuff, hard to review.
from my experience, alignas works fine (just check the mapfile) and does not need linker script support. You'll need the linker script when put data in defined sections with attribute(section(".some_section")). There is a runtime function for getting aligned memory, this is working now in current gcc versions. But these memory blocks are not working with the Mbed heap tracing, because the memory pointer can be modified and the heap trace relies on some allocation internals. I have an open issue for this. |
Hey @JojoS62 , great to have you back! Thanks for looking at this PR, you're right it's a tough issue and took some thought to work out. What is this heap function to allocate aligned memory? I'll check it out! But, I still think I might stick with how I have it now unless there's an issue with the current implementation, because I like the fact that CacheAlignedBuffers can be allocated as globals or from the heap without worrying about special aligned allocations. |
@multiplemonomials this is the function: https://en.cppreference.com/w/c/memory/aligned_alloc |
@Ky-Ng @JojoS62 I've updated this MR to split CacheAlignedBuffer into two variants: StaticCacheAlignedBuffer, which allocates from fixed backing memory, and DynamicCacheAlignedBuffer, which allocates off the heap. This makes it possible to create buffers whose size is only known at runtime (which you couldn't do in the initial implementation). I'd rather not depend on the C library functionality at this point since it sounds like it isn't very well supported across platforms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just had a few questions on the use of static
and inline
functions. Other than that, the PR is amazing! Excited to use the STM32H7.
Thanks @multiplemonomials!
Summary of changes
I started this effort to track down some DMA SPI issues that were showing up in the test cases for Nucleo H743ZI. I thought there might just be some random bugs in the HAL driver, and I did in fact find two such issues. However, I also found a more serious problem: the DMA SPI code did not correctly manage the cache for Cortex-M7 devices which have a CPU cache.
For the initial DMA SPI implementation, I thought it would be enough to just invalidate the cache for the Rx buffer after the transfer is over. And this almost works -- it does ensure that the correct Rx data gets read out at the end of the transfer. However, if there are any other variables being stored in the same cache lines as the Rx data, these can actually be corrupted, because the variable could be up to date in the CPU cache but out of date in main memory, in which case the cache invalidation throws out the correct value. I actually found the issue because of a test case which declared an Rx buffer on the stack and was seeing stack corruption (!).
At its core, the issue is kinda a design problem. Mbed acts like you can just do a DMA SPI operation into any old memory buffer, but you really can't. The destination buffer has to be cache aligned in order for things to work, because we need to be able to safely cache invalidate the buffer.
So, I had to fix the design problem with an API change. I added a new
CacheAlignedBuffer<DataT, BufferSize>
template, and changed the SPI code to require that this structure be used when declaring rx buffers for asynchronous operations. On devices without a CPU cache (according to CMSIS definitions), the CacheAlignedBuffer template works basically like std::array, and just allocates an array of the specified type. On devices with a CPU cache, it allocates additional space (on the order of 64 bytes) to make sure that the buffer can be cache aligned. I thought about declaring CacheAlignedBuffer withalignas(32)
instead of doing the alignment "manually", but believe that needs linker script support to work, and also I don't know if it would work with malloc() allocated structures.I also had to make one more change, this time to the HAL API layer. So that we know if we need to invalidate the cache, we need some way to know if DMA actually got used for a transfer. The "DMA hint" is just that -- a hint. The HAL layer is free to always use interrupts or always use DMA if that's all that's supported. So, I updated the signature of
spi_master_transfer()
so that the HAL layer must return true if DMA actually got used for a transfer, and false otherwise. I then went through every HAL implementation which supports async SPI (8 of them, not counting copy pasted files) and had it return the right value based on what it supports.With these pieces in place, I could finally implement cache management in the SPI driver. It follows the three step process outlined here, and does:
I wanted to do this at the driver level, since the implementation is common to all ARM processors with a CPU cache, and also it's easy to screw up so I didn't want it to be up to the HAL vendors to get right. It does require some ifdefs and some new added variables for each SPI instance supporting async, but I think it's worth it so that we can get cache management right.
Finally, after all these fixes, my entire SPI test suite is passing on STM32H743!
Impact of changes
The API for SPI::transfer() and SPI::transfer_and_wait() is changing -- a
CacheAlignedBuffer
structure must be passed instead of a raw pointer for the Rx buffer. This is to support MCUs where the Rx buffer has to be cache aligned when receiving memory from a peripheral with DMA.Due to the relative obscurity and poor documentation of these APIs until recently, I doubt that a large number of applications were using them. However, applications that were using them will need updates.
Migration actions required
CacheAlignedBuffer
structures will need to be declared and used for the rx_buffers when callingSPI::transfer()
andSPI::transfer_and_wait()
.Documentation
Included in this PR!
Pull request type
Test results
Show test log