Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🛫 Optimize vec_transform again #103

Merged
merged 1 commit into from
Feb 23, 2025
Merged

Conversation

Korijn
Copy link
Contributor

@Korijn Korijn commented Feb 23, 2025

Follow-up to #102

I like optimizing too much I think. 🤪

After comparing many different implementations, this ultimately turned out to be the fastest. Even though theoretically speaking it a 3x3 matrix multiplication should be faster than a full 4x4 matrix multiplication, in practice it is not - but only for batches. I suspect there are hidden optimizations within Numpy (BLAS etc). 🤡

@panxinmiao With projection=False, vec_transform is now always faster than apply_matrix (both in single vector as in batch case).

# vector batch=1, timeit number=100000
vec_transform (projection=True) 0.402374500001315
vec_transform (projection=False) 0.25878190000366885
apply_matrix 0.28508210000291

# vector batch=100000, timeit number=10
vec_transform (batch, projection=True) 0.03246779998880811
vec_transform (batch, projection=False) 0.02070159999129828
apply_matrix (batch) 0.022843300001113676

Benchmarked with the script below:

import numpy as np
import pylinalg as la
import timeit


def vec_transform(
    vectors, matrix, /, *, w=1, projection=True, out=None, dtype=None
) -> np.ndarray:
    matrix = np.asarray(matrix, dtype='f8')

    if projection:
        vectors = la.vec_homogeneous(vectors, w=w, dtype='f8')
        if vectors.ndim == 1:
            vectors = matrix @ vectors
            vectors[:-1] /= vectors[-1]
            vectors = vectors[:-1]
        elif vectors.ndim == 2:
            vectors = (matrix @ vectors.T).T
            vectors = vectors[..., :-1] / vectors[..., -1, None]
        else:
            raise ValueError("vectors must be a 1D or 2D array")
    else:
        if vectors.ndim == 1:
            vectors = matrix[:-1, :-1] @ vectors + matrix[:-1, -1]
        elif vectors.ndim == 2:
            vectors = la.vec_homogeneous(vectors, w=w, dtype='f8')
            vectors = (matrix @ vectors.T).T
            vectors = vectors[..., :-1]
        else:
            raise ValueError("vectors must be a 1D or 2D array")

    if out is None:
        out = vectors
        if dtype is not None:
            out = out.astype(dtype, copy=False)
    else:
        out[:] = vectors

    return out


def apply_matrix(v, m):
    v = la.vec_homogeneous(v, dtype='f8')
    vv = m @ v.T
    return vv.T[..., :-1]


v = np.array([1, 2, 3])
m = la.mat_compose(np.array([1, 2, 3]), la.quat_from_euler((1, 2, 3)), np.array([1, 2, 1]))

print(vec_transform(v, m))
print(vec_transform(v, m, projection=False))
print(apply_matrix(v, m))

v_batch = np.stack([v] * 100000)

print(vec_transform(v_batch, m)[:5])
print(vec_transform(v_batch, m, projection=False)[:5])
print(apply_matrix(v_batch, m)[:5])

print("vec_transform (projection=True)", timeit.timeit(lambda: vec_transform(v, m), number=100000))
print("vec_transform (projection=False)", timeit.timeit(lambda: vec_transform(v, m, projection=False), number=100000))
print("apply_matrix", timeit.timeit(lambda: apply_matrix(v, m), number=100000))
print("vec_transform (batch, projection=True)", timeit.timeit(lambda: vec_transform(v_batch, m), number=10))
print("vec_transform (batch, projection=False)", timeit.timeit(lambda: vec_transform(v_batch, m, projection=False), number=10))
print("apply_matrix (batch)", timeit.timeit(lambda: apply_matrix(v_batch, m), number=10)

@panxinmiao
Copy link
Contributor

Great, ultimate performance optimization :)

@Korijn Korijn merged commit e436128 into main Feb 23, 2025
10 checks passed
@Korijn Korijn deleted the optimize-vec-transform-more branch February 23, 2025 09:55
@@ -105,17 +105,28 @@ def vec_transform(
transformed vectors
"""

matrix = np.asarray(matrix)
matrix = np.asarray(matrix, dtype="f8")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious: is there a reason for using "f8" instead of float?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! They have the same effect, but "f8" is a local string literal and float needs to be looked up from globals. It improves performance (significantly).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants