In xtensor, taking a mean over an axis that is not the contiguous one appears to be slower than optimal. I have provided benchmarks below using a more optimized approach. Instead of striding through memory, it computes the mean in "groups" along the reduction axis, which coalesces memory accesses and improves cache hit rates. Would there be a way to implement this in xtensor to get the factor-of-2 speedup?
See reference implementation here: https://github.com/spectre-ns/xtensor-benchmark/blob/bb2404641cfd632c459d4e91c3881ebd601b2a62/include/reduction.hpp#L14
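To make the idea concrete without depending on xtensor, here is a minimal sketch of both access patterns for a row-major 2-D array reduced over axis 0. It is not the linked reference implementation: the function names `mean_axis0_strided` and `mean_axis0_grouped`, the flat `std::vector<double>` storage, and the default group width of 64 columns are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical baseline: for each output column, walk down the rows.
// Consecutive reads are `cols` elements apart, so each one may touch
// a different cache line.
std::vector<double> mean_axis0_strided(const std::vector<double>& data,
                                       std::size_t rows, std::size_t cols)
{
    assert(rows > 0 && data.size() == rows * cols);
    std::vector<double> out(cols, 0.0);
    for (std::size_t c = 0; c < cols; ++c)
    {
        double sum = 0.0;
        for (std::size_t r = 0; r < rows; ++r)
            sum += data[r * cols + c];  // stride of `cols` elements
        out[c] = sum / static_cast<double>(rows);
    }
    return out;
}

// Hypothetical grouped variant: handle `block` adjacent columns at a time,
// accumulating row by row so the inner loop reads and writes with unit stride.
std::vector<double> mean_axis0_grouped(const std::vector<double>& data,
                                       std::size_t rows, std::size_t cols,
                                       std::size_t block = 64)
{
    assert(rows > 0 && block > 0 && data.size() == rows * cols);
    std::vector<double> out(cols, 0.0);
    for (std::size_t c0 = 0; c0 < cols; c0 += block)
    {
        const std::size_t c1 = std::min(cols, c0 + block);
        for (std::size_t r = 0; r < rows; ++r)
        {
            const double* row = data.data() + r * cols;
            for (std::size_t c = c0; c < c1; ++c)
                out[c] += row[c];  // contiguous within the group
        }
        for (std::size_t c = c0; c < c1; ++c)
            out[c] /= static_cast<double>(rows);
    }
    return out;
}
```

Both functions return the same means; only the order of memory accesses differs. Whether the grouped version is faster, and by how much, depends on the array shape, the group width, and the hardware, so it should be benchmarked rather than assumed.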