[XRT-LITE] add ability to configure NPU power mode #851

Merged: 1 commit, Oct 19, 2024


@makslevental (Collaborator) commented Oct 19, 2024

Notes

  1. This needs sudo;
    • if you run the whole run_matmul_test.sh script under sudo and rely on environment variables, use sudo -E so they are preserved;
  2. The same thing can be done with xrt-smi, with something like
    sudo /opt/xilinx/xrt/bin/xrt-smi configure -d 0000:c5:00.1 --pmode turbo
    
    Still, it may be useful to expose this directly here so that xrt-smi isn't required in the environment.
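For reference, the xrt-smi route in note 2 could be wrapped in a small script. This is only a sketch: the binary path, device BDF, and flags are copied from the command above, while the helper names `build_pmode_command` and `set_pmode` are hypothetical and not part of this PR. The mode string is passed through unvalidated, since only `turbo` is confirmed here.

```python
from __future__ import annotations

import shlex
import subprocess

# Path taken verbatim from the command quoted in the notes above.
XRT_SMI = "/opt/xilinx/xrt/bin/xrt-smi"


def build_pmode_command(bdf: str, mode: str, use_sudo: bool = True) -> list[str]:
    """Build the xrt-smi invocation as an argument list; runs nothing."""
    cmd = [XRT_SMI, "configure", "-d", bdf, "--pmode", mode]
    return ["sudo", *cmd] if use_sudo else cmd


def set_pmode(bdf: str, mode: str) -> None:
    """Run the command; raises CalledProcessError if xrt-smi exits non-zero."""
    subprocess.run(build_pmode_command(bdf, mode), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe on any machine.
    print(shlex.join(build_pmode_command("0000:c5:00.1", "turbo")))
```

Building the argument list separately from running it keeps the privileged call in one place and makes the command easy to inspect before it is executed.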

Using the added test I got these results:

Default

-----------------------------------------------------------------------------------------------------------------
Benchmark                                                       Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------
BM_matmul_64x64_64xbf16_/process_time/real_time              1.61 ms        0.671 ms          456 items_per_second=620.987/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.56 ms        0.641 ms          456 items_per_second=643.073/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.59 ms        0.648 ms          456 items_per_second=630.323/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.62 ms        0.653 ms          456 items_per_second=616.069/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.59 ms        0.646 ms          456 items_per_second=629.755/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.57 ms        0.644 ms          456 items_per_second=635.695/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.58 ms        0.641 ms          456 items_per_second=633.842/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.57 ms        0.639 ms          456 items_per_second=636.084/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.59 ms        0.642 ms          456 items_per_second=630.571/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.58 ms        0.648 ms          456 items_per_second=633/s
BM_matmul_64x64_64xbf16_/process_time/real_time_mean         1.59 ms        0.648 ms           10 items_per_second=630.94/s
BM_matmul_64x64_64xbf16_/process_time/real_time_median       1.58 ms        0.645 ms           10 items_per_second=631.786/s
BM_matmul_64x64_64xbf16_/process_time/real_time_stddev      0.019 ms        0.009 ms           10 items_per_second=7.68176/s
BM_matmul_64x64_64xbf16_/process_time/real_time_cv           1.23 %          1.42 %            10 items_per_second=1.22%

Turbo

-----------------------------------------------------------------------------------------------------------------
Benchmark                                                       Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------
BM_matmul_64x64_64xbf16_/process_time/real_time              1.57 ms        0.652 ms          433 items_per_second=638.857/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.55 ms        0.651 ms          433 items_per_second=644.931/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.57 ms        0.650 ms          433 items_per_second=638.939/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.57 ms        0.644 ms          433 items_per_second=638.037/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.57 ms        0.664 ms          433 items_per_second=635.318/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.58 ms        0.663 ms          433 items_per_second=631.421/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.54 ms        0.648 ms          433 items_per_second=650.474/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.54 ms        0.646 ms          433 items_per_second=649.22/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.56 ms        0.669 ms          433 items_per_second=642.177/s
BM_matmul_64x64_64xbf16_/process_time/real_time              1.60 ms        0.660 ms          433 items_per_second=623.584/s
BM_matmul_64x64_64xbf16_/process_time/real_time_mean         1.56 ms        0.655 ms           10 items_per_second=639.296/s
BM_matmul_64x64_64xbf16_/process_time/real_time_median       1.57 ms        0.652 ms           10 items_per_second=638.898/s
BM_matmul_64x64_64xbf16_/process_time/real_time_stddev      0.020 ms        0.009 ms           10 items_per_second=8.09723/s
BM_matmul_64x64_64xbf16_/process_time/real_time_cv           1.27 %          1.31 %            10 items_per_second=1.27%

Higher items_per_second is better (it's a throughput counter).

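So for BM_matmul_64x64_64xbf16_/process_time/real_time_mean we get 630.94/s under default vs. 639.296/s under turbo. That's a gap of about 8.4/s (roughly 1.3%), which is about one standard deviation of the individual repetitions (stddev=7.68176 under default, 8.09723 under turbo), so it's basically the same. So I'm not sure what the effect should be 🤷.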
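One way to put a number on "basically the same" is to compare the gap against the standard error of the means rather than the per-repetition stddev. The sketch below does that with a Welch t statistic over the ten items_per_second values from each table above; the function name `welch_t` is hypothetical, and the choice of test is an assumption, not something the benchmark harness reports.

```python
import math
import statistics

# items_per_second for each of the 10 repetitions, copied from the tables above.
DEFAULT = [620.987, 643.073, 630.323, 616.069, 629.755,
           635.695, 633.842, 636.084, 630.571, 633.0]
TURBO = [638.857, 644.931, 638.939, 638.037, 635.318,
         631.421, 650.474, 649.22, 642.177, 623.584]


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / std_err


if __name__ == "__main__":
    diff = statistics.mean(TURBO) - statistics.mean(DEFAULT)
    print(f"default mean = {statistics.mean(DEFAULT):.3f}/s, "
          f"stddev = {statistics.stdev(DEFAULT):.3f}")
    print(f"turbo mean   = {statistics.mean(TURBO):.3f}/s, "
          f"stddev = {statistics.stdev(TURBO):.3f}")
    print(f"difference   = {diff:.3f}/s ({100 * diff / statistics.mean(DEFAULT):.2f}%)")
    print(f"Welch t      = {welch_t(DEFAULT, TURBO):.2f}")
```

On this data the statistic comes out at roughly 2.4, so the gap is larger relative to the standard error of the means than the raw stddev comparison suggests. Even so, a change of about 1.3% from ten repetitions per mode is weak evidence either way, and more repetitions would be needed to say anything firm.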

Note at least one of the things it's doing is enabling/disabling clock gating:

[13486.742867] amdxdna:aie2_pm_set_mode:90: amdxdna 0000:c5:00.1: Changing power mode from 0 to 4
[13486.742869] amdxdna:aie2_pm_clock_gating:27: amdxdna 0000:c5:00.1: Disable clock gating, 1 type(s)
...
[13493.313651] amdxdna:aie2_pm_set_mode:90: amdxdna 0000:c5:00.1: Changing power mode from 4 to 0
[13493.313653] amdxdna:aie2_pm_clock_gating:27: amdxdna 0000:c5:00.1: Enable clock gating, 1 type(s)

(via dmesg).
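To check this on another machine, the dmesg output can be scanned for the same two message types. A minimal sketch, assuming the amdxdna driver keeps the message wording shown above; the regexes and the `parse_transitions` helper are hypothetical, and the sample log is the excerpt quoted here.

```python
import re

# Excerpt copied from the dmesg output above (the elided "..." lines omitted).
DMESG = """\
[13486.742867] amdxdna:aie2_pm_set_mode:90: amdxdna 0000:c5:00.1: Changing power mode from 0 to 4
[13486.742869] amdxdna:aie2_pm_clock_gating:27: amdxdna 0000:c5:00.1: Disable clock gating, 1 type(s)
[13493.313651] amdxdna:aie2_pm_set_mode:90: amdxdna 0000:c5:00.1: Changing power mode from 4 to 0
[13493.313653] amdxdna:aie2_pm_clock_gating:27: amdxdna 0000:c5:00.1: Enable clock gating, 1 type(s)
"""

MODE_RE = re.compile(r"Changing power mode from (\d+) to (\d+)")
GATING_RE = re.compile(r"(Enable|Disable) clock gating")


def parse_transitions(log):
    """Pair each power-mode change with the clock-gating action that follows it."""
    events = []
    pending = None  # last mode change not yet matched to a gating message
    for line in log.splitlines():
        mode = MODE_RE.search(line)
        if mode:
            pending = (int(mode.group(1)), int(mode.group(2)))
            continue
        gating = GATING_RE.search(line)
        if gating and pending is not None:
            events.append((pending[0], pending[1], gating.group(1)))
            pending = None
    return events


if __name__ == "__main__":
    for old, new, action in parse_transitions(DMESG):
        print(f"mode {old} -> {new}: {action} clock gating")
```

In practice the input would come from `dmesg` itself rather than an embedded string.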

EDIT:

I did this test with a debug build - maybe in a release build there's a difference 🤷‍♂️

@jtuyls (Collaborator) left a comment:

Nice!

@makslevental force-pushed the makslevental/xrt-lite-power-mode branch from 070eaee to d827213, October 19, 2024 20:32
@makslevental force-pushed the makslevental/xrt-lite-power-mode branch from d827213 to f51c793, October 19, 2024 20:39
@makslevental enabled auto-merge (squash), October 19, 2024 20:52
@makslevental merged commit ef385bd into main, Oct 19, 2024
5 checks passed
@makslevental deleted the makslevental/xrt-lite-power-mode branch, October 19, 2024 20:57