Fault tolerance issue in FAT12 #14

Open
smithBraun opened this issue Aug 11, 2022 · 11 comments

@smithBraun

Background:
When a file is deleted, the FAT chain of the file is freed from the end to the start (in _fx_utility_FAT_flush, called by _fx_fault_tolerant_cleanup_FAT_chain).
Upon power down, the fault tolerant module knows only the beginning of the FAT chain, so it walks the chain again to its end (which may now be shorter, if deletion had already started before the power down) and continues deleting from the end to the beginning.
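To make the recovery argument concrete, here is a minimal C sketch of the cleanup described above, assuming a simplified in-memory FAT in which each entry can be written atomically. It is not FileX code: cleanup_chain, FAT_FREE, FAT_END, MAX_CHAIN and the `budget` parameter (which stands in for a power loss after a given number of entry writes) are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not FileX code: fat[i] is the next-cluster value of
 * cluster i, using the FAT12 markers for free (0x000) and end (0xFFF). */
#define FAT_FREE  0x000u
#define FAT_END   0xFFFu
#define MAX_CHAIN 64u

/* Re-walk the chain from `start` (the only thing assumed to survive a power
 * loss), then free the collected entries from the last back to the first.
 * At most `budget` entries are freed, standing in for a power loss after
 * `budget` writes. Returns the number of entries freed. */
static size_t cleanup_chain(uint16_t *fat, size_t fat_len,
                            uint16_t start, size_t budget)
{
    uint16_t chain[MAX_CHAIN];
    size_t len = 0;
    size_t freed = 0;
    uint16_t cur = start;

    assert(start < fat_len);

    /* Stop at END, at an entry an earlier run already freed, at an
     * out-of-range link, or when the cache is full. */
    while (len < MAX_CHAIN && cur < fat_len && fat[cur] != FAT_FREE) {
        chain[len++] = cur;
        if (fat[cur] == FAT_END) {
            break;
        }
        cur = fat[cur];
    }

    /* Free back to front, so the start entry stays valid until last. */
    while (len > 0 && freed < budget) {
        fat[chain[--len]] = FAT_FREE;
        freed++;
    }
    return freed;
}
```

Calling it once with a budget of 1 and then again from the same start entry frees the remaining entries, because an already-freed tail entry simply ends the re-walk. That property holds only while every entry write is atomic, which is exactly the assumption a FAT12 entry split across two sectors breaks.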

The bug:
In FAT12, a FAT entry may be split across two sectors. If the power down occurs between writing the first sector and the second, then after power up, when the chain is walked again, this entry may point to the wrong place, which causes unrelated entries to be erased.

@smithBraun
Author

Hi,
I understand that investigating and solving this issue may take a long time.
So I would appreciate it if you could let me know whether you agree that it is a real issue, and, once you have a possible solution, let me hear about it and get an early drop of it.

@TiejunMS
Contributor

@smithBraun , thanks for reporting the issue. We are working on reproducing the issue and will keep you updated.

@TiejunMS TiejunMS self-assigned this Aug 15, 2022
@smithBraun
Author

Hi @TiejunMS
Thank you.
I just want to mention that if one part of the FAT entry is 0 (no matter whether it is the part in the first sector or in the second), there won't be an issue.
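The observation above can be checked with a small model of a torn write, assuming that the value read back consists of the new bits from the sector that was written plus the old bits from the sector that was not. torn_value, torn_is_harmless and the first_mask parameter are hypothetical names; first_mask selects the bits stored in whichever sector reaches the disk first (for example 0x00F for entry 0x155, whose low nibble sits at the end of one sector).

```c
#include <assert.h>
#include <stdint.h>

/* Value read back when only the sector holding the `first_mask` bits was
 * written before the power loss: new bits there, old bits everywhere else. */
static uint16_t torn_value(uint16_t old_val, uint16_t new_val,
                           uint16_t first_mask)
{
    assert(old_val <= 0x0FFFu && new_val <= 0x0FFFu);
    return (uint16_t)((new_val & first_mask) |
                      (old_val & ~first_mask & 0x0FFFu));
}

/* A torn write is harmless when the intermediate value is one the recovery
 * code already expects: either the old value or the new one. */
static int torn_is_harmless(uint16_t old_val, uint16_t new_val,
                            uint16_t first_mask)
{
    uint16_t t = torn_value(old_val, new_val, first_mask);
    return t == old_val || t == new_val;
}
```

When freeing (new value 0), the torn value equals either the old value or 0 exactly when one of the two parts of the old value was already 0, whichever sector is written first. The same model with an old value of 0 and a non-zero new value shows that writing a new link into a free split entry can also leave a value that is neither old nor new.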

@smithBraun
Author

A similar issue can happen when a FAT chain is written, when _fx_utility_FAT_flush is called by _fx_utility_FAT_entry_write (in state FX_FAULT_TOLERANT_STATE_SET_FAT_CHAIN).

@smithBraun
Author

Hi @TiejunMS ,
Any success with reproducing?

@TiejunMS
Contributor

@smithBraun , did you find this issue through analysis, or did you run into it in an application? Here is my analysis of this issue.

Let's say there are 512 bytes per sector and 1 sector per cluster. On FAT12, each sector can hold 341 FAT entries (512 * 8 / 12, rounded down). The original FAT chain of the file is as below.
700(3)->400(2)->800(3)->END
The number in parentheses is the FAT sector holding each entry: the FAT entries of this file start in the third FAT sector, point to the second FAT sector, and then back to the third sector.
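The sector numbers in this example follow from FAT12's 1.5-byte entries. A minimal sketch, assuming the standard FAT12 layout in which entry n starts at byte n + n/2 of the FAT and sectors are counted from 0 (so 2 is the third FAT sector); all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of FAT12 entry n inside the FAT: each entry takes 1.5 bytes. */
static uint32_t fat12_byte_offset(uint32_t entry)
{
    return entry + entry / 2u;
}

/* Whole 12-bit entries that fit in one sector (bytes * 8 / 12). */
static uint32_t fat12_entries_per_sector(uint32_t bytes_per_sector)
{
    return bytes_per_sector * 2u / 3u;
}

/* 0-based FAT sector holding the first byte of the entry. */
static uint32_t fat12_entry_sector(uint32_t entry, uint32_t bytes_per_sector)
{
    assert(bytes_per_sector > 0u);
    return fat12_byte_offset(entry) / bytes_per_sector;
}

/* Non-zero if the entry's 12 bits straddle two sectors, i.e. its first
 * byte is the last byte of a sector. */
static int fat12_entry_straddles(uint32_t entry, uint32_t bytes_per_sector)
{
    assert(bytes_per_sector > 0u);
    return fat12_byte_offset(entry) % bytes_per_sector ==
           bytes_per_sector - 1u;
}
```

With 512-byte sectors, fat12_entry_sector gives 2, 1 and 2 for entries 700, 400 and 800, and fat12_entry_straddles is true for entry 341 (0x155), the first entry that does not fit entirely in the first sector.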

When this file is deleted, in fx_fault_tolerant_cleanup_FAT_chain.c, all three FAT entries are cached and deleted from back to front. FAT entry 800 is deleted first. Because FAT entry 400 is in a different sector from 800, the change to FAT entry 800 (from 800->END to 800->FREE) is flushed to disk. If the power off happens before FAT entry 400 is deleted, the FAT chain looks like this.
700(3)->400(2)->800(3)->FREE

On the next power on, we do nothing to FAT entry 800 because it is already freed. Only FAT entries 700 and 400 are deleted.

after the power down when looking on the chain this entry may point on wrong place

I'm not sure about the entry pointing to the wrong place. Did you mean that FAT entry 400 still points to 800?

If this example does not match the issue you described, could you share the FAT chain and the point where the power off happens while the FAT chain is being deleted?

@smithBraun
Author

@TiejunMS sorry for not being clear enough; I see you misunderstood the bug I described.

did you encounter this issue by analysis or run into this issue in application
I ran into this issue while running power-down tests on FileX.

If this example is not suitable for the issue you described, could you share the FAT chain and where the power off happens during deleting the FAT chain?
Sure, let's take your example of 512 bytes per sector and 1 sector per cluster. I have two chains:
FAT(0x155) == 0x014->FAT(0x014) == 0xfff->END
FAT(0x010) == 0xfff->END
Look at the entry at 0x155. A 512-byte sector holds 0x155 + 1/3 FAT entries, so when the entries are mapped to sectors, this entry is split in two: the 0x004 part (its low nibble) is in sector 1 and the 0x010 part (its high byte) is in sector 2:
(1,2) FAT(0x155) == 0x014 ->(1) FAT(0x014) == 0xfff->END
(1) FAT(0x010) == 0xfff->END
Now let's say the deletion of the first chain begins, from back to front as you mentioned. Sector 1 is updated first, so FAT entry 0x014 is freed, but entry 0x155 is only partially updated!
(1,2) FAT(0x155) == 0x010 -> (1) FAT(0x010) == 0xfff->END
FAT(0x014) == 0x000
A power down in this state causes corruption: the FAT chain cleanup restarts, FAT entry 0x155 now points to the wrong place, and so FAT entry 0x010 is freed.

You can simulate the power down at
https://github.com/azure-rtos/filex/blob/89976978ff0ae62588e1871ea82fe05c67614c85/common/src/fx_utility_FAT_flush.c#L154-L155
where the code detects that a FAT entry is split in two and the first part has already been written.
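The sequence above can be reproduced byte for byte. The sketch below assumes the standard FAT12 packing, a 2-sector FAT image held in memory, and a flush that writes the cached first sector and then loses power before the second. fat12_get, fat12_set, torn_delete_demo, BPS and FAT_SECTORS are hypothetical and do not model _fx_utility_FAT_flush itself. Sectors are 0-based in the code, so the reporter's sector 1 is sector 0 here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BPS         512u  /* bytes per sector, as in the example */
#define FAT_SECTORS 2u    /* only the first two FAT sectors are modelled */

/* Standard FAT12 packing: entry n lives at byte n + n/2; odd entries use the
 * high nibble of that byte plus the whole next byte. */
static uint16_t fat12_get(const uint8_t *fat, uint32_t n)
{
    uint32_t off = n + n / 2u;
    uint16_t pair = (uint16_t)(fat[off] | (fat[off + 1u] << 8));
    return (n & 1u) ? (uint16_t)(pair >> 4) : (uint16_t)(pair & 0x0FFFu);
}

static void fat12_set(uint8_t *fat, uint32_t n, uint16_t value)
{
    uint32_t off = n + n / 2u;
    assert(value <= 0x0FFFu);
    if (n & 1u) {
        fat[off] = (uint8_t)((fat[off] & 0x0Fu) | ((value & 0x0Fu) << 4));
        fat[off + 1u] = (uint8_t)(value >> 4);
    } else {
        fat[off] = (uint8_t)(value & 0xFFu);
        fat[off + 1u] = (uint8_t)((fat[off + 1u] & 0xF0u) | (value >> 8));
    }
}

/* Free chain 0x155 -> 0x014 -> END in a cached copy of the FAT, but let only
 * sector 0 reach `disk` before the power loss. Returns the value entry 0x155
 * reads back as afterwards. */
static uint16_t torn_delete_demo(uint8_t *disk)
{
    uint8_t cache[BPS * FAT_SECTORS];

    memcpy(cache, disk, sizeof cache);
    fat12_set(cache, 0x014u, 0x000u);  /* back to front: last entry first */
    fat12_set(cache, 0x155u, 0x000u);  /* straddles bytes 511 and 512 */

    memcpy(disk, cache, BPS);          /* first sector flushed ... */
    /* ... power lost here: the second sector is never written. */

    return fat12_get(disk, 0x155u);
}
```

Setting up the two chains and calling torn_delete_demo leaves entry 0x155 reading 0x010, the END entry of the unrelated chain, so a restarted back-to-front cleanup would free it.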

@TiejunMS
Contributor

@smithBraun , thanks for sharing the details! I confirm this is an issue and will come up with a solution. I will keep you posted.

@smithBraun
Author

Great, thanks @TiejunMS .
I would be happy to get the fix as soon as it is implemented, rather than waiting for the official release, so I can re-run my tests and make sure I can't find more corner cases.

@smithBraun
Author

Hi @TiejunMS ,
Any updates on this issue?

@TiejunMS
Contributor

@smithBraun , the fix is a work in progress. Could you send an email to Azure RTOS support ([email protected])? Once it is ready for testing, I can share the source code with you.

@eclipsewebmaster eclipsewebmaster transferred this issue from another repository Jan 9, 2024