Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CA-392674: nbd_client_manager retry connect on nbd device busy #6021

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

liulinC
Copy link
Collaborator

@liulinC liulinC commented Sep 27, 2024

to connect to nbd devices, nbd_client_manager will

  1. protect the operation with /var/run/nonpersistent/nbd_client_manager file lock
  2. check whether nbd is being used by nbd-client -check
  3. load nbd kernel module by modprobe nbd
  4. call nbd-client to connect to nbd device

However, step 3 will trigger systemd-udevd run asyncly, which would open and lock the same nbd devices, run udev rules, etc. This introduce races with step 4, e.g. both process want to open and lock the nbd device.

Note: the file lock in step 1 does NOT resovle the issue here, as it only coordinate multiple nbd_client_manager processes.

To fix the issue,

  • we patch nbd-client to report the device busy from kernel to nbd_client_manager
  • nbd_client_manager should check nbd-client exit code, and retry on device busy

@MarkSymsCtx
Copy link
Contributor

However, step 3 will trigger systemd-udevd run asyncly, which would open and lock the same nbd devices, run udev rules, etc. This introduce races with step 4, e.g. both process want to open and lock the nbd device.

This would be better fixed by adding a rule to udev to ensure that system-udevd leaves the devices alone and thus doesn't result in them being found busy.

@liulinC
Copy link
Collaborator Author

liulinC commented Sep 27, 2024

However, step 3 will trigger systemd-udevd run asyncly, which would open and lock the same nbd devices, run udev rules, etc. This introduce races with step 4, e.g. both process want to open and lock the nbd device.

This would be better fixed by adding a rule to udev to ensure that system-udevd leaves the devices alone and thus doesn't result in them being found busy.

XS8 stream udev handle the nbd device as well. (and has this race issue as well).
I have already raise some PR to clean some unnecessary udev rules.
I doubt should we totally ignore nbd devices in udev rules. (as XS8 steam do take care of it)

BTW, udev rules is only run and executed with the device been opened and lock been hold by systemd-udevd(workers),
https://github.com/systemd/systemd/blob/main/src/udev/udev-worker.c#L184-L203
This means the udev rules can only narrow down the race time window, but can not resolve the issue.

So I prefer to resolve the races issues other than just totally ignore nbd devices in the udev rules.

to connect to nbd devices, nbd_client_manager will
1. protect the operation with
/var/run/nonpersistent/nbd_client_manager file lock
2. check whether nbd is being used by `nbd-client -check`
3. load nbd kernel module by `modprobe nbd`
4. call `nbd-client` to connect to nbd device

However, step 3 will trigger systemd-udevd run asyncly, which would
open and lock the same nbd devices, run udev rules, etc. This introduce
races with step 4, e.g. both process want to open and lock the nbd
device.

Note: the file lock in step 1 does NOT resovle the issue here,
as it only coordinate multiple nbd_client_manager processes.

To fix the issue,
- we patch nbd-client to report the device busy from kernel to nbd_client_manager
- nbd_client_manager should check nbd-client exit code, and retry on device busy

Signed-off-by: Lin Liu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants