Add non-temporal memcpy support for ARM #361

Merged
bmerry merged 8 commits into master from arm-sve-memcpy-nt
Nov 8, 2024

@bmerry bmerry commented Nov 7, 2024

It uses SVE, so it'll run on Neoverse CPUs like Grace, but doesn't support any Apple silicon. We're not using Apple for performance anyway (just to enable Apple users to develop) so at this stage I'm not proposing to implement alternative paths for non-SVE capable machines.

Update to use BOOST_TEST where possible instead of BOOST_CHECK_EQUAL or
BOOST_CHECK. BOOST_TEST is more "modern" but doesn't cope with
expressions that can't be formatted with ostream, so it's not applied
everywhere.

The main motivation for this is that BOOST_CHECK_EQUAL leads to useless
clang error messages when something goes wrong (it never reports the
assertion that is the problem). This has been showing up with
-Wsign-compare, and in some places the RHS of comparisons has been made
explicitly unsigned to address these errors.
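
One plausible way such a warning arises can be sketched without Boost. `check_equal` below is a hypothetical stand-in for the implementation behind a two-argument check macro, not Boost's actual code. Once both operands are passed into a template, the compiler no longer knows that the right-hand side was a non-negative literal, so an unsigned-versus-`int` comparison is flagged inside the template rather than at the assertion. Making the literal unsigned, as the commit message describes, avoids the mixed comparison.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the helper behind a two-argument check macro.
// By the time the comparison runs, the operands are just typed references.
template<typename L, typename R>
bool check_equal(const L &left, const R &right)
{
    // With L = std::size_t and R = int, -Wsign-compare is reported on this
    // line, not on the line in the test that made the call.
    return left == right;
}

bool size_is_three(const std::vector<int> &v)
{
    // check_equal(v.size(), 3) would instantiate the std::size_t / int case.
    // An unsigned literal keeps both operands unsigned.
    return check_equal(v.size(), 3u);
}
```

Compiling the commented-out form with `-Wsign-compare` is a quick way to see where the diagnostic points.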

Previously -Dauto_features was used, but with the addition of SVE, no
platform can have all features enabled. Instead, pass flags to
force-enable features that we want to ensure are enabled on a
per-platform basis. These are abstracted into a helper script to avoid
repeating them for each usage. This script also takes care of -Dwerror
and --native.

The implementation is very specific to x86, and naming it appropriately
will make it easier to write an ARM version.

It's not actually used yet.

This is probably not optimal, but will provide something to test with.

It hurts performance and I'm confident it isn't going to be needed in
any practical situation. Instead, update the documentation to indicate
the corner case where the user might need to update code.

- Align the source address
- Unroll the loop
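
The two changes listed above can be illustrated with a portable sketch. This is not the code from this PR: `kVec`, `copy_vec` and `memcpy_sketch` are hypothetical names, the fixed 16-byte block stands in for an SVE vector (whose real length is only known at run time), and `std::memcpy` stands in for the vector load and non-temporal store so that the example runs on any machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for one vector register's worth of bytes.
constexpr std::size_t kVec = 16;

// Stand-in for "vector load from src, non-temporal store to dest".
inline void copy_vec(unsigned char *dest, const unsigned char *src)
{
    std::memcpy(dest, src, kVec);
}

// Precondition: dest and src are valid, non-overlapping buffers of n bytes.
void *memcpy_sketch(void *dest_, const void *src_, std::size_t n)
{
    auto *dest = static_cast<unsigned char *>(dest_);
    const auto *src = static_cast<const unsigned char *>(src_);

    // Step 1: copy a short head so that the *source* address becomes aligned.
    std::size_t head = (kVec - reinterpret_cast<std::uintptr_t>(src) % kVec) % kVec;
    if (head > n)
        head = n;
    std::memcpy(dest, src, head);
    dest += head;
    src += head;
    n -= head;
    assert(n == 0 || reinterpret_cast<std::uintptr_t>(src) % kVec == 0);

    // Step 2: main loop, unrolled four times to reduce loop overhead.
    while (n >= 4 * kVec)
    {
        copy_vec(dest, src);
        copy_vec(dest + kVec, src + kVec);
        copy_vec(dest + 2 * kVec, src + 2 * kVec);
        copy_vec(dest + 3 * kVec, src + 3 * kVec);
        dest += 4 * kVec;
        src += 4 * kVec;
        n -= 4 * kVec;
    }

    // Step 3: copy whatever is left over.
    std::memcpy(dest, src, n);
    return dest_;
}
```

The unroll factor of four is an arbitrary choice for illustration; the right value depends on the target CPU and would need measuring.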

This code path was only compiled if SVE is enabled and the ifunc
attribute is not. Normally !ifunc is tested by the macOS builds, but SVE
is disabled in that case.
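
For readers unfamiliar with the non-ifunc case, a generic fallback resolves the implementation once at run time through a function pointer instead of letting the dynamic linker do it. The sketch below assumes Linux on AArch64 for the SVE check (`getauxval` with `HWCAP_SVE`) and reports no SVE elsewhere. All function names are hypothetical, and `copy_sve_stand_in` is a placeholder rather than a real SVE routine.

```cpp
#include <cstddef>
#include <cstring>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   // getauxval, AT_HWCAP
#include <asm/hwcap.h>  // HWCAP_SVE
#endif

using copy_fn = void *(*)(void *, const void *, std::size_t);

// Portable fallback used when no specialised path is available.
void *copy_generic(void *dest, const void *src, std::size_t n)
{
    return std::memcpy(dest, src, n);
}

// Placeholder for an SVE-specific routine (hypothetical; just forwards here).
void *copy_sve_stand_in(void *dest, const void *src, std::size_t n)
{
    return std::memcpy(dest, src, n);
}

// Run-time feature check; always false on platforms this sketch does not cover.
bool cpu_has_sve()
{
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_SVE)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;
#endif
}

copy_fn resolve_copy()
{
    return cpu_has_sve() ? copy_sve_stand_in : copy_generic;
}

// Without ifunc: pick the implementation on first call, then reuse it.
void *copy_dispatch(void *dest, const void *src, std::size_t n)
{
    static const copy_fn impl = resolve_copy();
    return impl(dest, src, n);
}
```

A path like `copy_dispatch` is only exercised when a build has both the specialised routine and no ifunc support, which is why a CI matrix can miss it.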

@james-smith-za james-smith-za left a comment

I don't have any value to add here.

Sounds like a sensible idea to me.

@bmerry bmerry merged commit ad1e8c4 into master Nov 8, 2024
70 of 72 checks passed
@bmerry bmerry deleted the arm-sve-memcpy-nt branch November 8, 2024 06:45