client: fixed a bug where AMD CPUs were not correctly fingerprinting base speed #24415

Open. mvegter wants to merge 1 commit into main.

mvegter (Contributor) commented Nov 11, 2024

Relates to: #19468

Description

At one of our production sites we run a mixed server fleet: some nodes run Nomad on bare metal and others are part of the virtualised fleet. We noticed a large difference in the reported CPU speed between the two. The bare-metal systems were fingerprinted at the boost clock of 3.5 GHz, whereas the virtualised nodes were fingerprinted at 2 GHz, the QEMU default reported through dmidecode.

nomad version
Nomad v1.8.2
BuildDate 2024-07-16T08:50:09Z
Revision 7f0822c1e4f25907d9f60e2d595411950dd1bd28

Testing & Reproduction steps


grep -H . /sys/devices/system/cpu/cpu0/cpufreq/*
/sys/devices/system/cpu/cpu0/cpufreq/affected_cpus:0
/sys/devices/system/cpu/cpu0/cpufreq/bios_limit:2450000
/sys/devices/system/cpu/cpu0/cpufreq/cpb:1
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:2450000
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:3529052
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq:1500000
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency:0
/sys/devices/system/cpu/cpu0/cpufreq/freqdomain_cpus:0 128
/sys/devices/system/cpu/cpu0/cpufreq/related_cpus:0
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies:2450000 2000000 1500000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:conservative ondemand userspace powersave performance schedutil
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:3241647
/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:acpi-cpufreq
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:3529052
/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:1500000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:<unsupported>
grep -H . /sys/devices/system/cpu/cpu0/acpi_cppc/*
/sys/devices/system/cpu/cpu0/acpi_cppc/feedback_ctrs:ref:4782881970238275 del:6173496996912339
/sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf:255
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq:400
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf:109
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf:29
/sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq:2451
/sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf:177
/sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf:177
/sys/devices/system/cpu/cpu0/acpi_cppc/wraparound_time:18446744073709551615

A quick search through our inventory:

AMD EPYC 7313 16-Core Processor
/sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq:3001

AMD EPYC 7713 64-Core Processor
/sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq:2000

AMD EPYC 7742 64-Core Processor
/sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq:2251

AMD EPYC 7763 64-Core Processor
/sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq:2451
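
For illustration, the sysfs values above could be read roughly as follows. This is a minimal sketch and not the code in this PR: the helper names (parseMHz, parseKHzAsMHz, readNominalMHz) are hypothetical, and it assumes that acpi_cppc/nominal_freq is expressed in MHz, that cpufreq values are expressed in kHz, and that a missing or zero nominal_freq should be treated as "not available".

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// sysfsCPURoot is the standard Linux sysfs directory for per-CPU attributes.
const sysfsCPURoot = "/sys/devices/system/cpu"

// parseMHz parses the content of an acpi_cppc/nominal_freq file.
// Assumption: the value is already in MHz (e.g. "2451").
func parseMHz(raw string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
}

// parseKHzAsMHz parses a cpufreq value such as cpuinfo_max_freq.
// Assumption: the value is in kHz (e.g. "3529052"), so it is divided by 1000.
func parseKHzAsMHz(raw string) (uint64, error) {
	khz, err := strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
	if err != nil {
		return 0, err
	}
	return khz / 1000, nil
}

// readNominalMHz (hypothetical helper) returns the nominal frequency of the
// given core in MHz. It reports false when the file is missing, unreadable,
// unparsable, or zero, so the caller can fall back to another source.
func readNominalMHz(root string, core int) (uint64, bool) {
	path := filepath.Join(root, fmt.Sprintf("cpu%d", core), "acpi_cppc", "nominal_freq")
	raw, err := os.ReadFile(path)
	if err != nil {
		return 0, false
	}
	mhz, err := parseMHz(string(raw))
	if err != nil || mhz == 0 {
		return 0, false
	}
	return mhz, true
}

func main() {
	if mhz, ok := readNominalMHz(sysfsCPURoot, 0); ok {
		fmt.Printf("cpu0 nominal frequency: %d MHz\n", mhz)
	} else {
		fmt.Println("cpu0 nominal frequency not available via acpi_cppc")
	}
}
```

On the EPYC 7763 host shown above, this would read 2451 MHz from nominal_freq, whereas cpuinfo_max_freq (3529052 kHz) corresponds to the boost clock of roughly 3529 MHz.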


mvegter changed the title from "client: fixed a bug where AMD CPUs where not correctly fingerprinting base speeds" to "client: fixed a bug where AMD CPUs were not correctly fingerprinting base speed" on Nov 11, 2024
mvegter force-pushed the mvegter-fix-amd-fingerprinting branch from 775121b to 7158dce on November 11, 2024 15:27