CA-405381 Mpath Info Does Not Automatically Refresh in XC After Disabling and Enabling Multipath #730

Open
wants to merge 1 commit into
base: master

Conversation

LunfanZhang
Contributor

The current implementation skips refreshing the mpath status every time mpath is toggled. This happens because mpath is enabled or disabled in maintenance mode, and in maintenance mode the record returned by host = session.xenapi.host.get_record(hostref) always reports the host as disabled.

Customers are likely to encounter this issue sooner or later, which could result in an XSI ticket.

A better solution would be to check whether XAPI is actually responsive rather than relying on the "disabled" state reported by host = session.xenapi.host.get_record(hostref). This is because a "disabled" status for XAPI does not necessarily mean it cannot respond to requests.

storage bvt: 211238
storage bst - mpath : 211239

… After Disabling and Enabling Multipath

Signed-off-by: Lunfan Zhang <[email protected]>
Contributor

@MarkSymsCtx MarkSymsCtx left a comment


Interesting approach, nice.

@robhoes
Member

robhoes commented Jan 27, 2025

xapi-health-check just runs an xe command that makes a no-op API call and retries up to 20 times with 5 seconds sleep in between. So we need to be happy to have a 100 second delay in this context. Alternatively, just use the existing host.get_record call as a test for xapi liveness without checking the enabled field: if it just returns a record then it's all good.
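The alternative suggested in this comment could look roughly like the sketch below: treat any successful host.get_record call as proof of liveness and ignore the enabled field. The helper names xapi_is_alive and make_stub_session are hypothetical and not part of this PR, and the stub session only stands in for a real XenAPI session so the example runs on its own.

```python
from types import SimpleNamespace


def xapi_is_alive(session, hostref):
    """Return True if xapi answers host.get_record, whatever 'enabled' says."""
    try:
        # Only the fact that a record comes back matters here;
        # the 'enabled' field is deliberately not inspected.
        session.xenapi.host.get_record(hostref)
    except Exception:
        return False
    return True


def make_stub_session(get_record):
    # Hypothetical stand-in for a real XenAPI session, for illustration only.
    host = SimpleNamespace(get_record=get_record)
    return SimpleNamespace(xenapi=SimpleNamespace(host=host))


def _raise_unreachable(hostref):
    raise ConnectionError("xapi unreachable")


# A host in maintenance mode still returns a record, with enabled == False.
maintenance = make_stub_session(lambda hostref: {"enabled": False})
# A host whose xapi cannot be reached raises instead of returning a record.
down = make_stub_session(_raise_unreachable)
```

With these stubs, the maintenance-mode session counts as alive and the unreachable one does not, which is the distinction the current enabled-based check cannot make.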

@LunfanZhang
Contributor Author

LunfanZhang commented Feb 5, 2025

> xapi-health-check just runs an xe command that makes a no-op API call and retries up to 20 times with 5 seconds sleep in between. So we need to be happy to have a 100 second delay in this context. Alternatively, just use the existing host.get_record call as a test for xapi liveness without checking the enabled field: if it just returns a record then it's all good.

Using a single host.get_record call as a test for xapi liveness will miss the case where xapi happens to be unavailable for a very short period of time. A retry loop is better in that case, which is what xapi-health-check does. I think a delay of up to 100 seconds is acceptable here, as it is less than 2 minutes and better than failing immediately.
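The retry behaviour discussed here can be sketched as a small bounded loop. This is an illustration only, not the code of xapi-health-check or of this PR: wait_for_xapi and its parameters are hypothetical names, probe is any callable that returns True once xapi answers, and the defaults mirror the 20 attempts and 5-second sleep mentioned above.

```python
import time


def wait_for_xapi(probe, attempts=20, delay=5.0, sleep=time.sleep):
    """Call probe() until it returns True or the attempts run out.

    The sleep function is injectable so the loop can be exercised
    without actually waiting.
    """
    for attempt in range(attempts):
        if probe():
            return True
        # Sleep only between attempts, not after the final failure.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

With the defaults, the worst case stays within the roughly 100 seconds mentioned in the thread (plus the time spent in the probe calls themselves), while a brief xapi outage is absorbed instead of causing an immediate failure.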

@MarkSymsCtx
Contributor

In this particular case it doesn't actually matter how long it takes to be able to successfully communicate with xapi as this is a one-shot service triggered by udev events. Our requirement is that it manages to update xapi with the current state of the multipath system.

I do think we need a better way of being able to determine that xapi is in a state where we can reliably make requests to it, both database updates and host plugin calls. Without this clients are dancing on a knife edge of unreliability and will get sporadic and difficult to recover/triage issues.


3 participants