zinject: add "probe" device injection type #16947
base: master
@robn when you're satisfied with your changes, would you mind squashing these commits?
@tonyhutter if/when #16950 lands, I'll rebase and then it will be just two logically-separate commits.
Have no objections once #16950 is merged.
The old kstat helper function was barely used, I suspect in part because it was very limited in the kinds of kstats it could gather. This adds new functions to replace it, one for each kind of thing that can have stats: global, pool and dataset. There are options in there to get a single stat value, or all values within a group. Most importantly, the interface is the same for both platforms. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
Removes other custom helpers and direct accesses to /proc. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
It's now a simple wrapper, so let's just call kstat directly. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
I'm about to add a new "type", and I need somewhere to put it! Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
I'm guessing the max 10s wait for suspend is too short. Bumped it to max 30s, why not.
I meant the CI timed out after 10 minutes. Something might be wrong with the test, so that it never completed.
Injecting a device probe failure is not possible by matching IO types, because probe IO goes to the label regions, which are explicitly excluded from injection. Even if it were possible, it would be awkward to do, because a probe is a sequence of reads and writes. This commit adds a new IO "type" to match for injection, which looks for the ZIO_FLAG_PROBE flag instead. Any probe IO will match the injection record and receive the wanted error. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
I'm glad you pushed; it made me take another look at it, and took a little while to figure out (at least, I think I figured it out).
Two observations:
So probably, it suspended before the
Last push tries a lot harder to control the timing, by pushing out the txg timeout, syncing the pool, doing the write (to cache) before setting the injections, and then forcing it out. It tests ok here; we'll see if it ends up being flaky or not.
[Sponsors: Klara, Inc., Wasabi Technology, Inc.]
Motivation and Context
`zinject` can't create the scenario of a drive faulting after a failed write. The write failure can be injected, which triggers a probe, but probe IO always goes to the label regions, which are explicitly excluded from device injections. So the probe succeeds, and the write is retried, triggering another probe, over and over, forever.

This adds a new `probe` pseudo-IO-type for device injection, which allows a probe failure to be simulated.

Description
First, we extend the device injection type to allow more than just the "fundamental" IO types. `zi_iotype` was always large enough; there just wasn't a way to do more than the regular `ZIO_TYPE_*` values. Now it has its own values, `ZINJECT_IOTYPE_*`, for injection IO types. For compatibility, the existing IO types are placed at the start so their numeric values match. `all` still matches all fundamental types, so this change should be UI and ABI-compatible.

Then, we add a new type, `ZINJECT_IOTYPE_PROBE`. This matches any and all ZIOs with the `ZIO_FLAG_PROBE` flag set, regardless of type or location on disk (including in the labels). The normal `_READ` and `_WRITE` injections won't match these, to preserve the existing behaviour.

Taken together, we can simulate a device fault on write with:
And indeed, this is exactly what the included test does.
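
The matching rules described above can be sketched in a few lines of standalone C. This is not the code from the PR: the `sketch_zio_t` struct, the `sketch_iotype_matches()` helper, the `in_label` field and the numeric values chosen for `ZIO_FLAG_PROBE`, `ZINJECT_IOTYPE_PROBE` and `ZINJECT_IOTYPE_ALL` are all hypothetical stand-ins, and only a subset of the IO types is modelled. It also assumes that `all` does not match probe IO, on the grounds that it never did before.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for a few fundamental ZIO types (illustrative values). */
typedef enum {
	ZIO_TYPE_NULL = 0,
	ZIO_TYPE_READ,
	ZIO_TYPE_WRITE,
	ZIO_TYPES
} zio_type_t;

/* Illustrative flag bit; not the real value of ZIO_FLAG_PROBE. */
#define	ZIO_FLAG_PROBE	(1u << 0)

/*
 * Injection IO types. The first entries reuse the ZIO_TYPE_* values so
 * that existing injection records keep their meaning. The PROBE and ALL
 * values below are hypothetical.
 */
typedef enum {
	ZINJECT_IOTYPE_NULL = ZIO_TYPE_NULL,
	ZINJECT_IOTYPE_READ = ZIO_TYPE_READ,
	ZINJECT_IOTYPE_WRITE = ZIO_TYPE_WRITE,
	ZINJECT_IOTYPE_PROBE = 0x100,
	ZINJECT_IOTYPE_ALL = 0x1ff
} zinject_iotype_t;

/* Hypothetical, minimal model of the ZIO fields that matter here. */
typedef struct {
	zio_type_t type;	/* fundamental IO type */
	uint32_t flags;		/* may include ZIO_FLAG_PROBE */
	bool in_label;		/* true if the IO targets a label region */
} sketch_zio_t;

/* Returns true if an injection record of type `want` applies to `zio`. */
static bool
sketch_iotype_matches(const sketch_zio_t *zio, zinject_iotype_t want)
{
	bool is_probe = (zio->flags & ZIO_FLAG_PROBE) != 0;

	/* The probe type looks only at the flag, even inside the labels. */
	if (want == ZINJECT_IOTYPE_PROBE)
		return (is_probe);

	/* Other types keep the old behaviour: no probe IO, no label IO. */
	if (is_probe || zio->in_label)
		return (false);

	/* Assumption: "all" covers every fundamental type, nothing more. */
	if (want == ZINJECT_IOTYPE_ALL)
		return (true);

	return ((int)want == (int)zio->type);
}
```

With this model, a write carrying `ZIO_FLAG_PROBE` and aimed at a label is matched only by a `ZINJECT_IOTYPE_PROBE` record, while an ordinary data write is matched by `ZINJECT_IOTYPE_WRITE` and `ZINJECT_IOTYPE_ALL` but not by the probe type. That separation is what lets a write injection and a probe injection be combined to fault the device.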
How Has This Been Tested?
New test included.
All `zinject`-using tests have passed with this change in place.

Types of changes
Checklist:
(Wow! Clean sweep!)