fix bug in process closefile function which may cause metaserver crash #5

Open: wants to merge 1 commit into base: master

hmings888

In InodeFileStore.cpp:closeFile, only the entryID is used to search the global inode map. When a match is found, the iterator points to an "inode" that may be different from the "inode" passed to closeFile as an input parameter. In fact, the latter "inode" may live in the parent dir's inode map rather than in the global inode map. This mismatch makes the inode's reference count wrong on subsequent closeFile requests and eventually causes the metaserver to coredump. This patch adds a condition to make sure the "inode" found is the one that was passed in.
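The mismatch described above can be illustrated with a simplified sketch. The names below (`Inode`, `GlobalInodeStore`, `closeFileByIdOnly`, `closeFileChecked`) are hypothetical and are not the BeeGFS `InodeFileStore` API. The sketch assumes the global map owns one inode object per entryID and that a parent directory may hold a separate object with the same entryID; it contrasts a lookup that trusts the entryID alone with one that also checks that the found object is the caller's inode.

```cpp
#include <cassert>  // used by the usage checks
#include <map>
#include <memory>
#include <string>

// Hypothetical, simplified stand-in for a metadata inode (not BeeGFS's FileInode).
struct Inode {
    std::string entryID;
    unsigned refCount = 0;
};

// Hypothetical stand-in for the global inode map, keyed by entryID only.
class GlobalInodeStore {
public:
    // Create the inode on first use and take one reference.
    Inode* reference(const std::string& id) {
        auto& slot = inodes_[id];
        if (!slot) slot = std::make_unique<Inode>(Inode{id, 0});
        slot->refCount++;
        return slot.get();
    }

    // ID-only variant: releases whatever object is stored under the same
    // entryID, even if it is not the object the caller holds.
    bool closeFileByIdOnly(Inode* inode) {
        auto it = inodes_.find(inode->entryID);
        if (it == inodes_.end()) return false;
        release(it);
        return true;
    }

    // Checked variant: only releases if the found object IS the caller's inode.
    bool closeFileChecked(Inode* inode) {
        auto it = inodes_.find(inode->entryID);
        if (it == inodes_.end() || it->second.get() != inode) return false;
        release(it);
        return true;
    }

    unsigned refCountOf(const std::string& id) const {
        auto it = inodes_.find(id);
        return it == inodes_.end() ? 0 : it->second->refCount;
    }

private:
    // Drop one reference; erase (and free) the inode when it reaches zero.
    void release(std::map<std::string, std::unique_ptr<Inode>>::iterator it) {
        if (--it->second->refCount == 0) inodes_.erase(it);
    }

    std::map<std::string, std::unique_ptr<Inode>> inodes_;
};
```

In this model, closing a parent-directory copy (say `Inode parentCopy{"A", 1}`) through `closeFileByIdOnly` drops the global inode's count and can free it while its real holder still uses it, the kind of refcount corruption that could later surface as a crash. `closeFileChecked` rejects the mismatched object instead; the sketch does not model how the parent directory's copy would then be released.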

@iamjoemccormick
Member

Hi @hmings888, thanks for the PR!

One initial question: have you observed any meta crashes as a result of this, or did you just notice that a crash is theoretically possible based on code analysis? If you have steps to reproduce the issue, that would make it easier to analyze and test the proposed fix.

@iamjoemccormick added the bug (Something isn't working) label on Jul 12, 2024
@hmings888
Author

This caused metaserver segfaults many times, but all the generated coredump files were accidentally deleted :(
In fact, I don't fully know how to reproduce this problem with a simple test case, but it happens often when running a complicated application on BeeGFS. I dug into the code and applied the fix described above, and it does work :)

@iamjoemccormick
Member

Hello @hmings888,

I wanted to close the loop on this. There were a number of improvements in this area of the metadata service that went into both BeeGFS 7.4.4 and 7.4.5. While they don't directly make the change you're proposing here, I had a chat with the team, and there were concerns that this approach might have other side effects and that the changes already made might actually solve this issue. Could you try upgrading to BeeGFS 7.4.5 (without this patch) and let me know if you continue to see the issue?

@hmings888
Author

Got it, I'll try the new version later. Thank you for your reply.

Labels: bug (Something isn't working)
Projects: None yet
2 participants