Improve VectorUtil::xorBitCount perf on ARM #13545

ChrisHegarty · 2024-07-05T17:05:16Z

This commit improves the performance of VectorUtil::xorBitCount on ARM by ~4x.

This change is effectively a workaround for the lack of vectorization of Long::bitCount on ARM, see https://github.com/ChrisHegarty/hammingBench/. I'll get an issue filed against HotSpot for this. (JDK bug tracking this issue: https://bugs.openjdk.org/browse/JDK-8336000)

On x64 there is no issue: the long variant of xorBitCount outperforms the int variant by ~15%.
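To make the two strides concrete, here is a minimal sketch of an XOR bit count over two equal-length byte arrays, read either 32 bits or 64 bits at a time. It is an illustration under stated assumptions, not the code merged in this PR: the class name `XorBitCountSketch`, the method names, and the `os.arch` check used to pick a variant are all hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical sketch; names and the architecture check are assumptions, not Lucene's code.
final class XorBitCountSketch {
  // Views that read 4 or 8 bytes of a byte[] at once. Byte order is irrelevant for a bit count.
  private static final VarHandle INT_VIEW =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.nativeOrder());
  private static final VarHandle LONG_VIEW =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

  // Assumption for illustration: prefer the int stride on 64-bit ARM.
  private static final boolean STRIDE_AS_INT = "aarch64".equals(System.getProperty("os.arch"));

  /** Picks a stride based on the architecture check above. Arrays must have equal length. */
  static int xorBitCount(byte[] a, byte[] b) {
    return STRIDE_AS_INT ? xorBitCountInt(a, b) : xorBitCountLong(a, b);
  }

  /** Strides 32 bits at a time, then handles any remaining bytes one by one. */
  static int xorBitCountInt(byte[] a, byte[] b) {
    int distance = 0;
    int i = 0;
    // Round the length down to a multiple of 4 bytes.
    for (int upper = a.length & ~(Integer.BYTES - 1); i < upper; i += Integer.BYTES) {
      distance += Integer.bitCount((int) INT_VIEW.get(a, i) ^ (int) INT_VIEW.get(b, i));
    }
    for (; i < a.length; i++) {
      distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
    }
    return distance;
  }

  /** Strides 64 bits at a time, then handles any remaining bytes one by one. */
  static int xorBitCountLong(byte[] a, byte[] b) {
    int distance = 0;
    int i = 0;
    // Round the length down to a multiple of 8 bytes.
    for (int upper = a.length & ~(Long.BYTES - 1); i < upper; i += Long.BYTES) {
      distance += Long.bitCount((long) LONG_VIEW.get(a, i) ^ (long) LONG_VIEW.get(b, i));
    }
    for (; i < a.length; i++) {
      distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
    }
    return distance;
  }

  public static void main(String[] args) {
    byte[] a = {0x0F, 0, 0, 0, 0, 0, 0, 0, (byte) 0xFF, 1};
    byte[] b = new byte[a.length];
    System.out.println("int stride:  " + xorBitCountInt(a, b));
    System.out.println("long stride: " + xorBitCountLong(a, b));
  }
}
```

Both variants return the same Hamming distance; they differ only in which `bitCount` intrinsic the JIT sees in the hot loop, which is the part that matters for vectorization on ARM.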

Before (throughput in operations per second, so bigger numbers are better)

Benchmark                             (dims)     (nb)   Mode  Cnt   Score   Error  Units
HammingDistanceBenchmark.xorBitCount    1024  1000000  thrpt    5  29.128 ± 5.697  ops/s

After

Benchmark                             (dims)     (nb)   Mode  Cnt    Score   Error  Units
HammingDistanceBenchmark.xorBitCount    1024  1000000  thrpt    5  115.430 ± 3.086  ops/s

uschindler · 2024-07-08T18:02:10Z

Hi,
in the backport to 9.x the benchmark file was wrongly merged. It landed in the test directory. In 9.x we have no benchmark-jmh module in Gradle, so the file should have been left out while cherry-picking.
Can you remove the file in a follow-up commit?

uschindler · 2024-07-08T18:02:34Z

See: c8b4a76#diff-dd8d7417893f9b2fecaef29491b94d5daeaae6d496c4b21bb9633b4f7b060e59

uschindler · 2024-07-08T18:05:49Z

lucene/core/src/java/org/apache/lucene/util/VectorUtil.java

+   * For xorBitCount we stride over the values as either 64-bits (long) or 32-bits (int) at a time.
+   * On ARM Long::bitCount is not vectorized, and therefore produces less than optimal code, when
+   * compared to Integer::bitCount. While Long::bitCount is optimal on x64. TODO: include the
+   * OpenJDK JIRA url


Do you have the JIRA issue number already?

Oops. I just added it in 4baaeda

https://bugs.openjdk.org/browse/JDK-8336000

uschindler · 2024-07-08T22:52:29Z

I reverted the addition of the file to 9.x branch: 86d080a

ChrisHegarty · 2024-07-09T08:20:09Z

@uschindler Apologies, I didn't notice this when cherry-picking. Thanks for reverting (while I was sleeping ;-) )

ChrisHegarty added 2 commits July 5, 2024 17:45

Stride as int rather than long in xorBitCount

e7582f0

cleanup benchmark

fc34acd

ChrisHegarty requested a review from jimczi July 5, 2024 17:05

jimczi approved these changes Jul 8, 2024

Merge branch 'main' into xorBitCount

5277061

ChrisHegarty merged commit 3304b60 into apache:main Jul 8, 2024
3 checks passed

ChrisHegarty deleted the xorBitCount branch July 8, 2024 16:33

ChrisHegarty mentioned this pull request Jul 8, 2024

Optimize xorBitCount on ARM, ~4x faster elastic/elasticsearch#110599

Merged

uschindler reviewed Jul 8, 2024

asfgit pushed a commit that referenced this pull request Jul 8, 2024

Revert accidentally added file during cherry-pick/merge from main bra…

86d080a

…nch (#13545)
