Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve VectorUtil::xorBitCount perf on ARM #13545

Merged
merged 3 commits into from
Jul 8, 2024

Conversation

ChrisHegarty
Copy link
Contributor

@ChrisHegarty ChrisHegarty commented Jul 5, 2024

This commit improves the performance of VectorUtil::xorBitCount on ARM by ~4x.

This change is effectively a workaround for the lack of vectorization of Long::bitCount on ARM, see https://github.com/ChrisHegarty/hammingBench/. I'll get an issue filed against Hotspot for this. ( JDK bug tracking this issue: https://bugs.openjdk.org/browse/JDK-8336000 )

On x64 there is no issue, the long variant of xorBitCount outperforms the int variant by ~15%.

Before (measures throughput in seconds, so bigger numbers are better)

Benchmark                             (dims)     (nb)   Mode  Cnt   Score   Error  Units
HammingDistanceBenchmark.xorBitCount    1024  1000000  thrpt    5  29.128 ± 5.697  ops/s

After

Benchmark                             (dims)     (nb)   Mode  Cnt   Score   Error  Units
HammingDistanceBenchmark.xorBitCount    1024  1000000  thrpt    5  115.430 ± 3.086  ops/s

@ChrisHegarty ChrisHegarty requested a review from jimczi July 5, 2024 17:05
@ChrisHegarty ChrisHegarty merged commit 3304b60 into apache:main Jul 8, 2024
3 checks passed
ChrisHegarty added a commit that referenced this pull request Jul 8, 2024
This commit improves the performance of VectorUtil::xorBitCount on ARM by ~4x.

This change is effectively a workaround for the lack of vectorization of Long::bitCount on ARM.

On x64 there is no issue, the long variant of xorBitCount outperforms the int variant by ~15%.
@ChrisHegarty ChrisHegarty deleted the xorBitCount branch July 8, 2024 16:33
@uschindler
Copy link
Contributor

Hi,
in the backport to 9.x the benchmark file was wrongly merged. It landed in the test directory. In 9.x we have no benchmark-jmh module in Gradle, so the file should have been left out while cherry-picking.
Can you remove the file in a followup commit?

@uschindler
Copy link
Contributor

* For xorBitCount we stride over the values as either 64-bits (long) or 32-bits (int) at a time.
* On ARM Long::bitCount is not vectorized, and therefore produces less than optimal code, when
* compared to Integer::bitCount. While Long::bitCount is optimal on x64. TODO: include the
* OpenJDK JIRA url
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have the JIRA issue number already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uschindler
Copy link
Contributor

I reverted the addition of the file to 9.x branch: 86d080a

@ChrisHegarty
Copy link
Contributor Author

@uschindler Apologies, I didn't notice this when cherrypicking. Thanks for reverting (while I was sleeping ;-) )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants