Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

using rwsem lock as inode_lock to avoid deadlock #13736

Merged
merged 4 commits into from
Oct 1, 2024

Conversation

Donny9
Copy link
Contributor

@Donny9 Donny9 commented Sep 29, 2024

Summary

  1. using rwsem lock as inode_lock to avoid deadlock
    deadlock examples:
    When executing df -h on Core A to view mount information, this process will traverse inode nodes, thereby holding the inode_lock. Since the inode type of the mount point may be rpmsgfs, it will fetch staffs information from another Core B.
    Meanwhile, rcS on Core B needs to obtain file information from Core A, which will be achieved by fetching stat information through rpmsgfs. When this message arrives at Core A, a deadlock can occur between Core A's rptun ap and nsh task.
    However, both of these places involve read operations only, thus a reader-writer lock can be utilized to prevent such a deadlock.
  2. simply inode_lock code, remove return value check
  3. support recursive write for same process in sem_rw lock

Impact

Bug fix

Testing

Vela

@github-actions github-actions bot added Area: File System File System issues Area: OS Components OS Components issues Size: M The size of the change in this PR is medium labels Sep 29, 2024
@nuttxpr
Copy link

nuttxpr commented Sep 29, 2024

[Experimental Bot, please feedback here]

Let's analyze if this Pull Request (PR) description fulfills the NuttX requirements:

Strengths:

  • Addresses a Deadlock: The PR tackles a critical issue—a deadlock scenario. This is clearly a significant bug fix, directly enhancing the stability of NuttX.
  • Provides a Root Cause Explanation: The description explains the deadlock's origin, involving interactions between inode traversal, rpmsgfs, and Core A/B communication. This clarity is valuable for reviewers.
  • Solution Rationale: Justifies the use of a reader-writer (rwsem) lock as a solution. It correctly points out that the deadlock arises from concurrent read operations, making rwsem a suitable mitigation.
  • Mentions Additional Changes: Briefly notes code simplification (removing return value checks) and recursive write support. While needing more detail, it shows awareness of broader impacts.

Weaknesses:

  • Impact Section Lacking: The "Impact" section is overly brief. It correctly identifies the change as a bug fix but needs to elaborate on the potential ramifications as the PR description template requests.
  • Testing Details Insufficient: "Testing: Vela" is insufficient. The requirements demand:
    • Specifics about the testing environment: Which build hosts (OS, CPU, compiler) and targets (architecture, board, configuration) were used?
    • Testing logs: Provide "before" and "after" logs to demonstrate the deadlock's presence initially and its resolution after the change.
  • Code Simplification and Recursion Under-explained: The description mentions simplifying inode_lock code and adding recursive write support but lacks detail. These changes deserve elaboration:
    • Code Simplification: What specific code was simplified? Why was the return value check unnecessary?
    • Recursive Writes: Why was this support added? Does it introduce any potential risks?

Verdict & Recommendations:

This PR description is a good starting point but does not fully meet NuttX requirements.

Here's how to improve it:

  1. Expand the Impact Section: Address all the points outlined in the template, even if the answer is "NO" (with a brief explanation). For example, "Impact on build (will build process change)? NO – The changes are localized to locking mechanisms and do not affect the build process."
  2. Provide Thorough Testing Information:
    • List all build host and target environments used for validation.
    • Include relevant "before" and "after" testing logs. These logs are crucial for demonstrating the deadlock and its successful resolution.
  3. Explain Code Simplification: Briefly describe the removed code and why its return value check was redundant.
  4. Justify Recursive Write Support: Explain the rationale behind adding this feature and address any potential downsides or complexities it might introduce.

By addressing these points, you'll create a more robust and informative PR description, increasing the likelihood of a smoother review and acceptance.

fs/vfs/fs_pseudofile.c Outdated Show resolved Hide resolved
fs/shm/shmfs.c Show resolved Hide resolved
return ret;
}

inode_lock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inode_rlock

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some scenarios, inode_find and inode_remove are used in combination, such as in shm_open and shm_unlink. The remove operation requires a write lock. If we were to switch to a read lock in inode_find, then a lock would need to be added in inode_remove, and additional checks would be required elsewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's create a patch to change rwsem which promote rlock to wlock if the same thread already hold wlock.

fs/inode/fs_inodeaddref.c Outdated Show resolved Hide resolved
Example:
When executing "df -h" on Core A to view mount information, this
process will traverse inode nodes, thereby holding the inode_lock.
Since the inode type of the mount point may be rpmsgfs, it will fetch statfs
information from another Core B.

Meanwhile, rcS on Core B needs to obtain file information from Core A,
which will be achieved by fetching stat information through rpmsgfs.
When this message arrives at Core A, a deadlock can occur between Core A's
rptun ap and nsh task.

However, both of these places involve read operations only, thus a reader-writer lock
can be utilized to prevent such a deadlock.

Signed-off-by: dongjiuzhu1 <[email protected]>
@xiaoxiang781216 xiaoxiang781216 merged commit b2e69b8 into apache:master Oct 1, 2024
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: File System File System issues Area: OS Components OS Components issues Size: M The size of the change in this PR is medium
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants