Fault tolerance issue in FAT12 #14

smithBraun · 2022-08-11T15:17:47Z

Background:
When a file is deleted, its FAT chain is freed from the end back to the start (in _fx_utility_FAT_flush, called by _fx_fault_tolerant_cleanup_FAT_chain).
After a power loss, the fault-tolerant module knows only the start of the FAT chain, so it walks the chain again to its end (which may now be shorter, if the deletion had already started before the power loss) and resumes deleting from end to beginning.

The bug:
In FAT12, a FAT entry may be split across two sectors. If the power loss occurs after one of these sectors has been written but before the other, then after power-up the entry read while walking the chain may point to the wrong place, which causes unrelated FAT entries to be freed.

smithBraun · 2022-08-14T03:58:46Z

Hi,
I understand that investigating and solving this issue may take a long time.
So I would appreciate it if you could let me know whether you agree that this is a real issue, update me when you have a possible solution, and share an early drop of it.

TiejunMS · 2022-08-15T01:06:33Z

@smithBraun, thanks for reporting the issue. We are working on reproducing it and will keep you updated.

smithBraun · 2022-08-15T05:15:22Z

Hi @TiejunMS,
Thank you.
I just want to mention that if one part of the FAT entry is 0 (no matter whether it is the part in the first sector or the part in the second), there is no issue.

smithBraun · 2022-08-16T05:07:43Z

A similar issue can happen when a FAT chain is written, i.e. when _fx_utility_FAT_flush is called by _fx_utility_FAT_entry_write (in the FX_FAULT_TOLERANT_STATE_SET_FAT_CHAIN state).

smithBraun · 2022-08-21T09:40:38Z

Hi @TiejunMS,
Any success reproducing it?

TiejunMS · 2022-08-26T06:53:14Z

@smithBraun, did you find this issue through analysis, or did you run into it in an application? Here is my analysis of this issue.

Let's say bytes per sector is 512 and sectors per cluster is 1. On FAT12, each sector can hold 341 whole FAT entries (512 × 8 / 12 ≈ 341.3). The original FAT chain of the file is as follows, with the FAT sector holding each entry in parentheses:
700(3)->400(2)->800(3)->END
The file's first FAT entry is in the third FAT sector; it points to an entry in the second FAT sector, which in turn points to one in the third sector.

When this file is deleted, fx_fault_tolerant_cleanup_FAT_chain.c caches all three FAT entries and deletes them from back to front. FAT entry 800 is deleted first. Because FAT entry 400 is in a different sector from 800, the change to entry 800 (from 800->END to 800->FREE) is flushed to disk. If power is lost before FAT entry 400 is deleted, the FAT chain looks like this:
700(3)->400(2)->800(3)->FREE

On the next power-on, nothing is done to FAT entry 800 because it is already free; only FAT entries 700 and 400 are deleted.

> after the power down when looking on the chain this entry may point on wrong place

I'm not sure what you mean by the entry pointing to the wrong place. Did you mean that FAT entry 400 still points to 800?

If this example does not match the issue you described, could you share the FAT chain and the point at which the power loss happens while the FAT chain is being deleted?

smithBraun · 2022-08-28T09:38:59Z

@TiejunMS, sorry for not being clear enough; I see that you misunderstood the bug I described.

> did you encounter this issue by analysis or run into this issue in application

I ran into this issue while running power-down tests on FileX.

> If this example is not suitable for the issue you described, could you share the FAT chain and where the power off happens during deleting the FAT chain?

Sure. Let's take your example of 512 bytes per sector and 1 sector per cluster. I have two chains:
FAT(0x155) == 0x014->FAT(0x014) == 0xfff->END
FAT(0x010) == 0xfff->END
Look at the entry at 0x155: a 512-byte sector holds 0x155 + 1/3 FAT entries, so when the entries are mapped to sectors this entry is split in two. Its 0x004 part (low 4 bits) is in sector 1 and its 0x010 part (high 8 bits) is in sector 2. Annotating each entry with the sector(s) that hold it:
(1,2) FAT(0x155) == 0x014 ->(1) FAT(0x014) == 0xfff->END
(1) FAT(0x010) == 0xfff->END
Now let's say deletion of the first chain begins, from back to front as you mentioned. Sector 1 is written first, so FAT entry 0x014 is freed but entry 0x155 is only partially updated:
(1,2) FAT(0x155) == 0x010 -> (1) FAT(0x010) == 0xfff->END
FAT(0x014) == 0x000
A power loss in this state causes corruption: when the FAT chain cleanup restarts, FAT entry 0x155 points to the wrong place, so FAT entry 0x010, which belongs to the second chain, gets freed.

You can simulate the power loss here:
https://github.com/azure-rtos/filex/blob/89976978ff0ae62588e1871ea82fe05c67614c85/common/src/fx_utility_FAT_flush.c#L154-L155
This is where the code handles a FAT entry that is split across two sectors, after the first part has already been written.

TiejunMS · 2022-08-29T09:32:43Z

@smithBraun, thanks for sharing the details! I can confirm this is an issue and will come up with a solution. I will keep you posted.

smithBraun · 2022-08-29T10:29:51Z

Great, thanks @TiejunMS.
I would be happy to get the fix as soon as it is implemented, rather than waiting for the official release, so that I can re-run my tests and make sure I cannot find any more corner cases.

smithBraun · 2022-09-12T12:02:36Z

Hi @TiejunMS,
Any updates on this issue?

TiejunMS · 2022-09-13T01:03:23Z

@smithBraun, the fix is a work in progress. Could you send an email to Azure RTOS support ([email protected])? Once it is ready for testing, I can share the source code with you.

TiejunMS self-assigned this Aug 15, 2022

eclipsewebmaster transferred this issue from another repository Jan 9, 2024
