It would be useful to have strided versions of a subset of the _gr_vec functions, allowing efficient application to columns and diagonals of matrices, reversed vectors, odd-even subvectors, etc. For example, one could have an assignment function like this:
_gr_vec_set_strided(res, x, xstride, y, ystride, len, ctx)
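To make the idea concrete, here is a minimal sketch on plain `long` arrays rather than generic ring elements. The name `vec_set_strided` and its parameter order (a stride for both destination and source) are assumptions made for illustration; this is not an existing FLINT function, and a real `_gr_vec` version would also have to scale strides by the element size from `ctx`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of FLINT.
   Copies len entries: dst[i * dstride] = src[i * sstride].
   Strides are signed element counts, so -1 walks backwards. */
void vec_set_strided(long *dst, ptrdiff_t dstride,
                     const long *src, ptrdiff_t sstride, long len)
{
    long i;

    assert(len >= 0);
    for (i = 0; i < len; i++)
        dst[i * dstride] = src[i * sstride];
}
```

With a row-major 3 x 3 matrix `m`, column 1 would be read with `vec_set_strided(out, 1, m + 1, 3, 3)`, the main diagonal with a source stride of 4, and a reversed row by starting at its last entry with a source stride of -1.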
Potentially, this could replace a number of specialized vector functions, such as those mixing vectors and scalars. For example, instead of
_gr_vec_add_scalar(res, x, len, y, ctx)
one could do
_gr_vec_add_strided(res, x, 1, y, 0, len, ctx)
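The stride-0 trick can be sketched as follows, again on plain `long` arrays. The function `vec_add_strided` is a hypothetical stand-in that mirrors the argument order above minus `ctx`; it is an assumption for illustration, not FLINT code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of FLINT.
   res[i] = x[i * xstride] + y[i * ystride] for 0 <= i < len.
   A stride of 0 re-reads the same element every iteration,
   which turns that operand into a broadcast scalar. */
void vec_add_strided(long *res,
                     const long *x, ptrdiff_t xstride,
                     const long *y, ptrdiff_t ystride, long len)
{
    long i;

    assert(len >= 0);
    for (i = 0; i < len; i++)
        res[i] = x[i * xstride] + y[i * ystride];
}
```

Passing `&s` with stride 0 for `y` adds the scalar `s` to every entry of `x`, so a separate vector-plus-scalar routine is not strictly needed in this model.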
Likewise for _gr_vec_dot / _gr_vec_dot_rev. Note that a _gr_vec_dot_strided could be used for basecase matrix multiplication without the need for a temporary transpose (though the temporary transpose is better for cache efficiency with large matrices anyway). The same applies to other accumulating operations such as sum and max. Currently arb_dot, acb_dot and _fmpz_vec_dot_general have this functionality.
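As a rough illustration of the matrix multiplication remark, the sketch below uses a strided dot product over `long` entries to multiply row-major matrices without transposing the right-hand factor. Both `vec_dot_strided` and `mat_mul_basecase` are hypothetical names invented for this example, and overflow is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of FLINT.
   Returns the sum over i of x[i * xstride] * y[i * ystride]. */
long vec_dot_strided(const long *x, ptrdiff_t xstride,
                     const long *y, ptrdiff_t ystride, long len)
{
    long i, s = 0;

    assert(len >= 0);
    for (i = 0; i < len; i++)
        s += x[i * xstride] * y[i * ystride];
    return s;
}

/* c = a * b where a is n x k, b is k x m, c is n x m, all row-major.
   Row i of a has stride 1; column j of b has stride m, so no
   temporary transpose of b is required. */
void mat_mul_basecase(long *c, const long *a, const long *b,
                      long n, long k, long m)
{
    long i, j;

    for (i = 0; i < n; i++)
        for (j = 0; j < m; j++)
            c[i * m + j] = vec_dot_strided(a + i * k, 1, b + j, m, k);
}
```

The same function covers the reversed dot product: pointing `y` at its last entry and passing a stride of -1 pairs `x[i]` with `y[len - 1 - i]`.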
For non-accumulating (entrywise) operations, we mostly care about this for machine types and some near-machine types like fmpz where the overhead of calling elementwise operations in a loop can be noticeable.
Unfortunately, strided functions may not be able to replace non-strided ones entirely for such types: there are more parameters to pass around, making function calls slightly slower, and some loops will run slower with a runtime stride instead of a fixed compile-time stride like 1 or -1. One can have branches for special strides, but then the branches themselves add an O(1) penalty to every call.
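The branching approach might look like the sketch below, shown for negation on `long` arrays. The function `vec_neg_strided` is hypothetical, and whether the extra comparisons pay off for a given type and length would have to be measured; nothing here is benchmarked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of FLINT.
   res[i] = -x[i * xstride], with dedicated loops for strides 1 and -1
   so the compiler sees a fixed offset there. The stride tests run once
   per call, which is the O(1) penalty for short vectors. */
void vec_neg_strided(long *res, const long *x, ptrdiff_t xstride, long len)
{
    long i;

    assert(len >= 0);
    if (xstride == 1)
    {
        for (i = 0; i < len; i++)
            res[i] = -x[i];
    }
    else if (xstride == -1)
    {
        for (i = 0; i < len; i++)
            res[i] = -x[-i];
    }
    else
    {
        /* Generic path: index computed from the runtime stride. */
        for (i = 0; i < len; i++)
            res[i] = -x[i * xstride];
    }
}
```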