Introduce safe SIMD API, port some simple vectorized operations to use it #604

Open: wants to merge 5 commits into main

@robertknight (Owner) commented Feb 25, 2025

This PR is the first step toward addressing #549. Once complete, that work will significantly reduce the amount of unsafe code in rten.

It introduces a new API for the rten-simd portable SIMD library in rten_simd::safe, which allows defining SIMD operations using safe code, and ports some of the simpler vectorized operations in rten-vecmath (Sum, SumSquare, SumSquareSub, MinMax, Normalize) to use it.

The basic idea behind the API design is to separate SIMD data types (which can always be constructed) from operations (which require hardware support), and to enforce that the types implementing the operations can only be constructed if the ISA is supported. This is similar to pulp, which appears to be a Rust-ified version of Highway. One important difference is that the SIMD traits are designed to make it easier to reuse code for vectorized operations across different data types. To reduce the need for unsafe code to load and store SIMD vectors, the API provides iterators and functional utilities (map, fold etc.) on slices, which handle the loads and stores internally.
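To make the split between data and operations concrete, here is a minimal sketch of the pattern. It is not the actual rten_simd::safe API: `F32x4`, `FloatOps`, `GenericOps`, `load_padded`, `fold_slice` and `sum_square` are all hypothetical names, and the "vector" is a plain array so the example runs on any target.

```rust
// Illustrative sketch only: every name here is hypothetical and is not
// taken from the `rten_simd::safe` API.

/// SIMD data type. It is plain data, so constructing one is always safe.
/// A real implementation would wrap an intrinsic type such as `__m256`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x4([f32; 4]);

/// Operations on vectors. A value of an implementing type acts as proof
/// that the instructions it uses are available on the current CPU.
pub trait FloatOps: Copy {
    const LANES: usize;
    type Vec: Copy;

    fn splat(self, x: f32) -> Self::Vec;
    /// Load up to `LANES` elements, padding missing lanes with zero.
    fn load_padded(self, xs: &[f32]) -> Self::Vec;
    fn add(self, a: Self::Vec, b: Self::Vec) -> Self::Vec;
    fn mul(self, a: Self::Vec, b: Self::Vec) -> Self::Vec;
    fn reduce_sum(self, v: Self::Vec) -> f32;
}

/// Fallback "ISA". The private field means it can only be built via `new`.
#[derive(Clone, Copy)]
pub struct GenericOps {
    _private: (),
}

impl GenericOps {
    /// Always succeeds, since the fallback needs no special instructions.
    /// An AVX2 equivalent would return `Some` only after a runtime check
    /// such as `is_x86_feature_detected!("avx2")`.
    pub fn new() -> Option<Self> {
        Some(GenericOps { _private: () })
    }
}

impl FloatOps for GenericOps {
    const LANES: usize = 4;
    type Vec = F32x4;

    fn splat(self, x: f32) -> F32x4 {
        F32x4([x; 4])
    }
    fn load_padded(self, xs: &[f32]) -> F32x4 {
        let mut lanes = [0.0; 4];
        let n = xs.len().min(4);
        lanes[..n].copy_from_slice(&xs[..n]);
        F32x4(lanes)
    }
    fn add(self, a: F32x4, b: F32x4) -> F32x4 {
        F32x4(std::array::from_fn(|i| a.0[i] + b.0[i]))
    }
    fn mul(self, a: F32x4, b: F32x4) -> F32x4 {
        F32x4(std::array::from_fn(|i| a.0[i] * b.0[i]))
    }
    fn reduce_sum(self, v: F32x4) -> f32 {
        v.0.iter().sum()
    }
}

/// Slice utility that hides the loads, so callers need no unsafe code.
pub fn fold_slice<O: FloatOps>(
    ops: O,
    xs: &[f32],
    init: O::Vec,
    f: impl Fn(O::Vec, O::Vec) -> O::Vec,
) -> O::Vec {
    xs.chunks(O::LANES)
        .fold(init, |acc, chunk| f(acc, ops.load_padded(chunk)))
}

/// A vectorized operation written once, generic over the ISA.
pub fn sum_square<O: FloatOps>(ops: O, xs: &[f32]) -> f32 {
    let acc = fold_slice(ops, xs, ops.splat(0.0), |acc, x| ops.add(acc, ops.mul(x, x)));
    ops.reduce_sum(acc)
}
```

A caller would obtain the operations value once, for example with `GenericOps::new()`, and pass it to `sum_square`. Zero padding is only a valid way to handle the tail here because zero lanes do not change a sum of squares; other operations would need a different strategy.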

Design notes

Unified versus separate types for data and operations

An earlier version of this PR used a different design where operations were implemented as methods of the SIMD data types, similar to std::simd. This has several advantages compared to what is implemented here:

  • Operations like add, sub, mul etc. can be provided as std::ops trait implementations, so you can write expressions like a * b + c.
  • Operations are methods on the SIMD vector so they can be chained. Chaining operations is less ergonomic with the design here (ops.mul(x, ops.splat(2)) versus ops.splat(2).mul(x) or ops.splat(2) * x).

With such a design, any operation that constructs a SIMD vector needs to somehow enforce that the instruction set is supported. This requires wrapping the built-in SIMD types (__m256 etc.) in a newtype and then requiring some kind of proof that the ISA is supported in order to construct an instance of the newtype. I did this earlier by moving all constructing operations to a separate trait. However, this made the overall set of traits more complex.
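For comparison, the unified design can be sketched roughly as follows. This is an assumption-laden illustration, not the earlier version of this PR: instead of a separate trait for constructing operations, it passes a hypothetical `IsaToken` proof value to each constructor, and `F32x4` wraps a plain array rather than an intrinsic type such as `__m256`.

```rust
// Illustrative sketch only: `IsaToken`, `F32x4` and `scale_and_shift` are
// hypothetical names, not part of rten-simd.

pub mod unified {
    use std::ops::{Add, Mul};

    /// Proof that the target ISA is available. The private field means
    /// code outside this module can only obtain one via `detect`.
    #[derive(Clone, Copy)]
    pub struct IsaToken {
        _private: (),
    }

    impl IsaToken {
        /// A real implementation would perform a runtime check here,
        /// e.g. `is_x86_feature_detected!("avx2")`.
        pub fn detect() -> Option<Self> {
            Some(IsaToken { _private: () })
        }
    }

    /// Newtype around the underlying vector. The field is private, so
    /// every constructor has to go through a function taking a token.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct F32x4([f32; 4]);

    impl F32x4 {
        pub fn splat(_isa: IsaToken, x: f32) -> Self {
            F32x4([x; 4])
        }
        pub fn from_array(_isa: IsaToken, xs: [f32; 4]) -> Self {
            F32x4(xs)
        }
        pub fn to_array(self) -> [f32; 4] {
            self.0
        }
    }

    // Holding an `F32x4` implies a token existed, so the operators
    // themselves need no extra proof argument.
    impl Add for F32x4 {
        type Output = Self;
        fn add(self, rhs: Self) -> Self {
            F32x4(std::array::from_fn(|i| self.0[i] + rhs.0[i]))
        }
    }

    impl Mul for F32x4 {
        type Output = Self;
        fn mul(self, rhs: Self) -> Self {
            F32x4(std::array::from_fn(|i| self.0[i] * rhs.0[i]))
        }
    }
}

/// Operator syntax works, but each construction needs the token.
pub fn scale_and_shift(
    isa: unified::IsaToken,
    xs: [f32; 4],
    scale: f32,
    shift: f32,
) -> [f32; 4] {
    use unified::F32x4;
    let x = F32x4::from_array(isa, xs);
    (x * F32x4::splat(isa, scale) + F32x4::splat(isa, shift)).to_array()
}
```

The arithmetic reads naturally as `x * a + b`, but the proof requirement moves into every constructor, which is the extra complexity described above.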

Support for scalable vectors

One of the reasons that Highway separates operations and data is to support scalable vectors, whose size is unknown at compile time. In C++, such compiler builtin types cannot be wrapped in classes in order to implement the operations as methods. In Rust it isn't completely settled yet what restrictions SVE types will have, but it seems that the newtype pattern is expected to work. If so, then both designs would work with all vector types.

TODO:

  • Review and revise documentation
  • Compare performance vs main on Arm
  • Compare performance vs main on x64

@robertknight force-pushed the safe-simd-api branch 6 times, most recently from 2e5ea49 to 267c56b on March 1, 2025 06:01
@robertknight marked this pull request as ready for review on March 1, 2025 06:08
Initial commit of a new API for rten-simd which allows defining vectorized
operations with little or no unsafe code.

This includes:

 - The core traits needed for defining vectorized operations

 - Implementations for Arm Neon, wasm32, AVX2 and generic

 - Tools for applying vectorized operations to slices