CP-48545: enable best effort NUMA affinity placement by default #3
Closed
Conversation
edwintorok force-pushed the private/edvint/numa branch from 9d81a1c to e7f1b9a on November 28, 2023 14:45
edwintorok force-pushed the private/edvint/numa branch 4 times, most recently from cf8ba1c to c6736de on January 9, 2024 11:17
For some workloads this is an improvement: it places each VM on a small number of NUMA nodes, so the VM performs local memory accesses where possible, which have lower latency.

For other workloads this is a regression: a VM can now only use the memory bandwidth of a single NUMA node, rather than that of all NUMA nodes. However, that bandwidth advantage only holds while the VM is the only one using the system; once multiple VMs are running, they all end up doing remote memory accesses fairly soon, overloading the QPI links. Users can still override this on a host-by-host basis if their workloads require the previous behaviour.

TODO: some actual benchmarks to support this theory

Signed-off-by: Edwin Török <[email protected]>
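The host-by-host override mentioned above would look roughly like the sketch below, assuming the policy is exposed as a `numa-affinity-policy` host parameter via the `xe` CLI; the parameter name and its values are illustrative rather than confirmed by this PR.

```
# Hypothetical per-host override, reverting to the previous behaviour:
xe host-param-set uuid=<host-uuid> numa-affinity-policy=any

# Hypothetical re-enable of best-effort placement on the same host:
xe host-param-set uuid=<host-uuid> numa-affinity-policy=best-effort
```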
edwintorok force-pushed the private/edvint/numa-default branch from 5c42609 to 82b9a39 on March 28, 2024 18:43
pytype_reporter extracted 50 problem reports from pytype output. You can check the results of the job here.
This requires more work on the Xen side first, and some more work on the Xapi side too, to handle concurrent reboots, etc.
edwintorok changed the title from CP-38020: enable best effort NUMA affinity placement by default to CP-48545: enable best effort NUMA affinity placement by default on Apr 15, 2024
edwintorok added a commit that referenced this pull request on Aug 28, 2024
Backport of 3b52b72

This enables PAM to be used in multithreaded mode (currently XAPI has a global lock around auth).

Using an off-CPU flamegraph I identified that concurrent PAM calls are slow due to a call to `sleep(1)`. `pam_authenticate` calls `crypt_r`, which calls `NSSLOW_Init`, which on first use tries to initialize the just-`dlopen`-ed library. If it encounters a race condition, it does a `sleep(1)`. This race condition can be reproduced quite reliably by performing a lot of PAM authentications from multiple threads in parallel. GDB can also be used to confirm this by putting a breakpoint on `sleep`:
```
#0  __sleep (seconds=seconds@entry=1) at ../sysdeps/unix/sysv/linux/sleep.c:42
#1  0x00007ffff1548e22 in freebl_RunLoaderOnce () at lowhash_vector.c:122
#2  0x00007ffff1548f31 in freebl_InitVector () at lowhash_vector.c:131
#3  NSSLOW_Init () at lowhash_vector.c:148
#4  0x00007ffff1b8f09a in __sha512_crypt_r (key=key@entry=0x7fffd8005a60 "pamtest-edvint", salt=0x7ffff31e17b8 "dIJbsXKc0",
#5  0x00007ffff1b8d070 in __crypt_r (key=key@entry=0x7fffd8005a60 "pamtest-edvint", salt=<optimized out>,
#6  0x00007ffff1dc9abc in verify_pwd_hash (p=p@entry=0x7fffd8005a60 "pamtest-edvint", hash=<optimized out>, nullok=nullok@entry=0) at passverify.c:111
#7  0x00007ffff1dc9139 in _unix_verify_password (pamh=pamh@entry=0x7fffd8002910, name=0x7fffd8002ab0 "pamtest-edvint", p=0x7fffd8005a60 "pamtest-edvint", ctrl=ctrl@entry=8389156) at support.c:777
#8  0x00007ffff1dc6556 in pam_sm_authenticate (pamh=0x7fffd8002910, flags=<optimized out>, argc=<optimized out>, argv=<optimized out>) at pam_unix_auth.c:178
#9  0x00007ffff7bcef1a in _pam_dispatch_aux (use_cached_chain=<optimized out>, resumed=<optimized out>, h=<optimized out>, flags=1, pamh=0x7fffd8002910) at pam_dispatch.c:110
#10 _pam_dispatch (pamh=pamh@entry=0x7fffd8002910, flags=1, choice=choice@entry=1) at pam_dispatch.c:426
#11 0x00007ffff7bce7e0 in pam_authenticate (pamh=0x7fffd8002910, flags=flags@entry=1) at pam_auth.c:34
#12 0x00000000005ae567 in XA_mh_authorize (username=username@entry=0x7fffd80028d0 "pamtest-edvint", password=password@entry=0x7fffd80028f0 "pamtest-edvint", error=error@entry=0x7ffff31e1be8) at xa_auth.c:83
#13 0x00000000005adf20 in stub_XA_mh_authorize (username=<optimized out>, password=<optimized out>) at xa_auth_stubs.c:42
```
Pairing `pam_start` and `pam_end` around each call doesn't help here, because `pam_end` `dlclose`s the library, so the next `pam_authenticate` has to go through the initialization code again. (This initialization would have belonged in `pam_start` rather than `pam_authenticate`, but there are several layers in between, including the call to `crypt_r`.) Upstream fixed this problem more than 5 years ago by switching to libxcrypt instead.

Signed-off-by: Edwin Török <[email protected]>
Signed-off-by: Christian Lindig <[email protected]>
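The multithreaded reproduction the message describes can be sketched as a small stress program. The following is a hypothetical reproducer, not part of this PR: the `login` service name is an assumption, the `pamtest-edvint` account is taken from the backtrace, and the build line presumes the libpam development headers are installed.

```c
/* pam_stress.c - hypothetical reproducer for the sleep(1) in NSSLOW_Init.
 * Build: gcc -o pam_stress pam_stress.c -lpam -lpthread */
#include <pthread.h>
#include <security/pam_appl.h>
#include <stdlib.h>
#include <string.h>

#define NTHREADS   16
#define ITERATIONS 100

static const char *user     = "pamtest-edvint"; /* test account from the backtrace */
static const char *password = "pamtest-edvint";

/* Minimal conversation function: answer every prompt with the password. */
static int conv(int num_msg, const struct pam_message **msg,
                struct pam_response **resp, void *appdata)
{
    (void)msg; (void)appdata;
    struct pam_response *replies = calloc(num_msg, sizeof *replies);
    if (replies == NULL)
        return PAM_CONV_ERR;
    for (int i = 0; i < num_msg; i++)
        replies[i].resp = strdup(password);
    *resp = replies;
    return PAM_SUCCESS;
}

static void *worker(void *arg)
{
    (void)arg;
    struct pam_conv pc = { conv, NULL };
    for (int i = 0; i < ITERATIONS; i++) {
        pam_handle_t *pamh = NULL;
        /* pam_end dlclose()s the module, so every pam_authenticate
         * re-runs the lazy initialization that can race into sleep(1). */
        if (pam_start("login", user, &pc, &pamh) != PAM_SUCCESS)
            continue;
        (void)pam_authenticate(pamh, 0);
        pam_end(pamh, PAM_SUCCESS);
    }
    return NULL;
}

int main(void)
{
    pthread_t tids[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    return 0;
}
```

Each worker pairs `pam_start`/`pam_end` per iteration, mirroring the dlopen/dlclose cycle the message describes; with enough threads, some iterations should stall for a full second inside `freebl_RunLoaderOnce`, observable with the `break sleep` GDB session shown above.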