dnode_is_dirty: use dn_dirty_txg to check dirtiness #15615
base: master
dn_dirty_txg is always set to the highest txg that has ever dirtied the dnode. It is set in dbuf_dirty() when a data or metadnode dbuf is dirtied, and never cleared.

[analysis of bug openzfs#15526 and fix openzfs#15571 below, for future readers]

The previous dirty check was:

    for (int i = 0; i < TXG_SIZE; i++) {
        if (multilist_link_active(&dn->dn_dirty_link[i]))
            [dnode is dirty]
    }

However, this check is not "is the dnode dirty?" but rather "is the dnode on a list?". There is a gap in dmu_objset_sync_dnodes() where the dnode is moved from os_dirty_dnodes to os_synced_dnodes before dnode_sync() is called to write out the dirty dbufs. So there is a moment when the dnode is not on a list, and the check fails.

It doesn't matter that the dirty check takes dn_mtx, because that lock isn't used for dn_dirty_link. The os_dirty_dnodes sublist lock is held in dmu_objset_sync_dnodes(), but taking it in the dirty check would mean possibly waiting until everything on that sublist has been synced.

The correct fix has to check something that positively asserts the dnode is dirty, rather than an implementation detail. dn_dirty_txg (via DNODE_IS_DIRTY()) is that: it's a normal bit of dnode state, protected by dn_mtx, and it unambiguously indicates whether or not there are changes pending.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
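To make the difference between the two checks concrete, here is a small single-threaded C model. Everything in it (model_dnode_t, model_dirty(), model_unlink() and the two checks) is hypothetical and far simpler than the real dnode_t. The list check only mimics the shape of the old multilist_link_active() loop, and the txg check only mimics DNODE_IS_DIRTY(), which, as far as we can tell, compares dn_dirty_txg against the pool's syncing txg. model_unlink() stands in for the moment in dmu_objset_sync_dnodes() when the dnode has left os_dirty_dnodes but dnode_sync() has not yet run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TXG_SIZE 4 /* hypothetical; mirrors the idea of TXG_SIZE */

/* Hypothetical, heavily simplified stand-in for dnode_t. */
typedef struct model_dnode {
	bool     on_list[MODEL_TXG_SIZE]; /* like dn_dirty_link[i] being active */
	uint64_t dirty_txg;               /* like dn_dirty_txg; never cleared */
} model_dnode_t;

/* Dirtying in txg: link onto that txg's list and record the txg. */
static void
model_dirty(model_dnode_t *dn, uint64_t txg)
{
	assert(txg != 0);
	dn->on_list[txg % MODEL_TXG_SIZE] = true;
	if (txg > dn->dirty_txg)
		dn->dirty_txg = txg;
}

/*
 * The gap: the dnode has been taken off the dirty list, but its dirty
 * data has not been written out yet.
 */
static void
model_unlink(model_dnode_t *dn, uint64_t txg)
{
	dn->on_list[txg % MODEL_TXG_SIZE] = false;
}

/* Old-style check: "is the dnode on a list?" */
static bool
model_is_on_list(const model_dnode_t *dn)
{
	for (int i = 0; i < MODEL_TXG_SIZE; i++) {
		if (dn->on_list[i])
			return (true);
	}
	return (false);
}

/*
 * New-style check in the spirit of DNODE_IS_DIRTY(): "was the dnode
 * changed in the syncing txg or later?"
 */
static bool
model_is_dirty(const model_dnode_t *dn, uint64_t syncing_txg)
{
	return (dn->dirty_txg >= syncing_txg);
}
```

With a dnode dirtied in txg 10, calling model_unlink(&dn, 10) makes model_is_on_list() report clean while model_is_dirty(&dn, 10) still reports dirty. Once txg 11 is syncing with no new changes, the state-based check reports clean as well.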
I like the cleanness of this approach, but as far as I can see, dn_dirty_txg is currently updated only when one of the dnode's dbufs is dirtied. At the same time, dnode_next_offset() seems to depend heavily on various structural data protected by dn_struct_rwlock, such as the number of blocks in the file, the block size, the number of indirection levels, etc. We must make sure that in every case where those can change and become stale, dn_dirty_txg is also bumped.
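One way to read this concern is as an invariant: every change to state that dnode_next_offset() relies on must also advance the dnode's dirty txg. A minimal sketch, assuming all structural changes are routed through a single choke-point helper. model_dnode_t, model_note_change() and the setters are hypothetical and do not exist in OpenZFS, and the model omits dn_struct_rwlock and dn_mtx, which real code would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not the real dnode_t. */
typedef struct model_dnode {
	uint64_t dirty_txg; /* stands in for dn_dirty_txg */
	uint8_t  nlevels;   /* examples of structural state that */
	uint32_t datablksz; /* dnode_next_offset() would read */
} model_dnode_t;

/* Single choke point: every structural change records its txg. */
static void
model_note_change(model_dnode_t *dn, uint64_t txg)
{
	assert(txg != 0);
	if (txg > dn->dirty_txg)
		dn->dirty_txg = txg;
}

static void
model_set_nlevels(model_dnode_t *dn, uint8_t nlevels, uint64_t txg)
{
	dn->nlevels = nlevels;
	model_note_change(dn, txg);
}

static void
model_set_blksz(model_dnode_t *dn, uint32_t blksz, uint64_t txg)
{
	dn->datablksz = blksz;
	model_note_change(dn, txg);
}

/* Same shape of check as in the PR: changes from syncing_txg onward. */
static bool
model_is_dirty(const model_dnode_t *dn, uint64_t syncing_txg)
{
	return (dn->dirty_txg >= syncing_txg);
}
```

The design point is that a setter which forgets to call model_note_change() is exactly the stale-structure case described above, so funnelling changes through one helper makes that omission harder to miss.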
While looking at this, I've had an optimization idea: if dnode_is_dirty() returned the txg in which the dnode was last dirtied, we could pass that value to txg_wait_synced() instead of 0, and so reduce the number of transactions that have to be synced before the dnode becomes clean and we can call dnode_next_offset() on it. There is a chance that the dnode was modified a while ago and its TXG is already in the process of syncing.
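A hypothetical model of that optimization: instead of a boolean, the dirty check returns the txg the caller needs to wait for (0 meaning clean), and the wait only runs until that txg has synced. model_pool_t, model_dnode_dirty_txg() and model_wait_synced() are invented for illustration. In particular, the model treats txg == 0 as "wait for the currently open txg", which is a simplification of what the real txg_wait_synced() does with 0.

```c
#include <assert.h>
#include <stdint.h>

typedef struct model_pool {
	uint64_t synced_txg; /* last txg that has completely synced */
	uint64_t open_txg;   /* txg currently accepting new changes */
} model_pool_t;

typedef struct model_dnode {
	uint64_t dirty_txg;  /* stands in for dn_dirty_txg */
} model_dnode_t;

/*
 * Hypothetical variant of dnode_is_dirty(): return the txg that must
 * finish syncing before the dnode is clean, or 0 if it is clean already.
 */
static uint64_t
model_dnode_dirty_txg(const model_dnode_t *dn, const model_pool_t *pool)
{
	return (dn->dirty_txg > pool->synced_txg ? dn->dirty_txg : 0);
}

/*
 * Hypothetical stand-in for txg_wait_synced(): "sync" txgs one at a
 * time until the target has synced, and report how many that took.
 * txg == 0 is simplified to "the currently open txg".
 */
static uint64_t
model_wait_synced(model_pool_t *pool, uint64_t txg)
{
	uint64_t target = (txg == 0) ? pool->open_txg : txg;
	uint64_t waited = 0;

	assert(target <= pool->open_txg);
	while (pool->synced_txg < target) {
		pool->synced_txg++;
		waited++;
	}
	return (waited);
}
```

With the dnode last dirtied in txg 101 and txgs 101 to 103 in flight, waiting on the returned txg syncs one txg, whereas waiting with 0 syncs all three. Because 0 means "clean" to the check but "wait for everything" to the wait, a caller should skip the wait entirely when the check returns 0.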
I agree, this would be a nice way to perform the dirty check. Of course the devil's in the details, as @amotin pointed out. Currently […]
Ahh, I think I see where I went wrong: I saw […] There is also […] I agree that now we've got a fix in place, it's worth taking the time to study all the cases and fix this up properly. So no hurry on this. Give me your thoughts on the above, and I'll come back to this as I have time (I kind of have to get back to my actual work for a little while haha).
Shouldn't correct code logically do the following?
If you have no critical section at all, the code will be racy forever. And if you only have a critical section for the test, the code will potentially be racy too, if the action is ever changed without checking whether the critical test alone is still acceptable for the modified code. Or am I missing something here?
Nope, you're exactly right. The bit of code that syncs things out does the right thing, in that it takes the list lock before moving things on and off the dirty list. The dirty check that caused all the trouble, however, does not take that lock first. It certainly could, but I don't want to do that because the list lock has a much wider scope than a single dnode, so it can cause a significant stall. So instead I'm fishing around for an equivalent item on the dnode itself, so we can take the dnode lock only to do the check (and of course take the dnode lock in the sync function to keep it correct for both observers). I feel like I didn't explain that so well though... 😅
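The approach described here, where both the check and the sync path use only the per-dnode lock, can be sketched as follows. This is a hypothetical single-process model: the mutex stands in for dn_mtx, and the synced_txg field, which the sync path would set only after the dnode's dirty data for that txg has been written out, is an assumption of this sketch rather than an existing dnode_t field.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; every field is protected by mtx. */
typedef struct model_dnode {
	pthread_mutex_t mtx;   /* stands in for dn_mtx */
	uint64_t dirty_txg;    /* highest txg that dirtied the dnode */
	uint64_t synced_txg;   /* hypothetical: highest txg fully synced */
} model_dnode_t;

/* Dirtying side: record the txg under the dnode lock. */
static void
model_dirty(model_dnode_t *dn, uint64_t txg)
{
	pthread_mutex_lock(&dn->mtx);
	if (txg > dn->dirty_txg)
		dn->dirty_txg = txg;
	pthread_mutex_unlock(&dn->mtx);
}

/*
 * Sync side: called only after the dnode's dirty data for txg has been
 * written out, so the check can never see "clean" too early, no matter
 * which lists the dnode is on in the meantime.
 */
static void
model_sync_done(model_dnode_t *dn, uint64_t txg)
{
	pthread_mutex_lock(&dn->mtx);
	assert(txg > dn->synced_txg);
	dn->synced_txg = txg;
	pthread_mutex_unlock(&dn->mtx);
}

/* Check: only the per-dnode lock, never the wider sublist lock. */
static bool
model_is_dirty(model_dnode_t *dn)
{
	pthread_mutex_lock(&dn->mtx);
	bool dirty = (dn->dirty_txg > dn->synced_txg);
	pthread_mutex_unlock(&dn->mtx);
	return (dirty);
}
```

Because the test and the state it reads share one narrow lock, a stall is limited to a single dnode rather than everything on the sublist.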
(Hi all, first time contributing here. I've spent a while reading the code and its history, trying to understand the precise causes of #15526. I have written a note with my findings here.)
Another approach is to count txgs in which the dnode is dirtied (similar to […] Even better is to make […] p.s. […]
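A minimal sketch of the txg-counting idea, assuming a per-dnode counter of txgs with pending changes: it is bumped the first time the dnode is dirtied in a given txg and dropped once that txg's changes for the dnode have been synced, so the dnode is dirty exactly when the counter is non-zero. All names here are hypothetical, locking is omitted, and the sketch is not derived from the linked branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TXG_SIZE 4 /* hypothetical; must be a power of two */
#define MODEL_TXG_MASK (MODEL_TXG_SIZE - 1)

typedef struct model_dnode {
	bool     dirtied_in[MODEL_TXG_SIZE]; /* per-txg "already counted" */
	unsigned dirty_count;                /* txgs with pending changes */
} model_dnode_t;

/* Count each txg once, however many times it dirties the dnode. */
static void
model_dirty(model_dnode_t *dn, uint64_t txg)
{
	int off = (int)(txg & MODEL_TXG_MASK);

	if (!dn->dirtied_in[off]) {
		dn->dirtied_in[off] = true;
		dn->dirty_count++;
	}
}

/* Called once the dnode's changes for txg have been written out. */
static void
model_sync_done(model_dnode_t *dn, uint64_t txg)
{
	int off = (int)(txg & MODEL_TXG_MASK);

	assert(dn->dirtied_in[off]);
	assert(dn->dirty_count > 0);
	dn->dirtied_in[off] = false;
	dn->dirty_count--;
}

static bool
model_is_dirty(const model_dnode_t *dn)
{
	return (dn->dirty_count > 0);
}
```

Reusing slots via the txg mask relies on the same assumption as the per-txg arrays on dnode_t: only a bounded number of txgs can be in flight at once, so a slot is always released before its txg number comes around again.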
@robn See #16025 for the first step and master...rrevans:zfs:find_dirty for the rest of the approach described in my comment above.
Motivation and Context
#15571 is a reliable fix for #15526, but it wasn't clear why it was necessary. This PR explains it, and offers the correct fix.
For the avoidance of doubt: #15571 fixes the problem. There's no new problem that this fixes. There's no hurry at all to ship this (assuming it's even right).
How Has This Been Tested?
Running a variant of the reproducer from #15526, with the below patch applied to substantially widen the gap:
Without the previous fix or this one, it's easy to hit over and over again. With this fix in place, it's silent.
Doing this also goes some way toward supporting the analysis.