[multibody] Experiment using M frame for Inverse Dynamics, for timings/sanity check #22253

sherm1 · 2024-12-03T02:03:21Z

WIP, not intended to merge, don't review

This branch will be an experiment to integrate Alejandro's M-frame inverse dynamics prototype into Drake to see what speedups we can get in real life switching from W to M.


sherm1 · 2024-12-05T21:49:41Z

Preliminary performance results for M-frame Inverse Dynamics. TL;DR: switching from W to M frame resulted in a 24% speedup for ID. Combined with previous changes, ID is now nearly 2X faster than when we started (21.4 μs down to 11.6 μs). Details:

| | This PR ID-M | Master ID-W | Before speedup work |
|---|---|---|---|
| time (μs) | 11.6 | 15.2 | 21.4 |
| speedup | 24% | 26% | -- |
| total | 46% | 26% | -- |

Cassie benchmark times on my old Puget Xeon, g++ 11.4
(ID-W is the World frame version in master, ID-M is the new M-frame method)
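
As a sanity check on the ID-M column, here is a minimal sketch of how the percentages follow from the measured times. The helper name `PercentReduction` is made up for illustration and is not part of Drake or cassiebench.

```cpp
#include <cmath>

// Hypothetical helper: percent reduction in run time going from `before_us`
// to `after_us`, rounded to the nearest whole percent.
inline long PercentReduction(double before_us, double after_us) {
  return std::lround(100.0 * (before_us - after_us) / before_us);
}

// Times in microseconds, copied from the table above.
constexpr double kIdM = 11.6;       // this PR, M frame
constexpr double kIdW = 15.2;       // master, World frame
constexpr double kBaseline = 21.4;  // before the speedup work
```

With these numbers, `PercentReduction(kIdW, kIdM)` gives the 24% step from W to M, and `PercentReduction(kBaseline, kIdM)` gives the 46% total. The ratio `kBaseline / kIdM` is about 1.84.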

I'm still studying this to see where we can squeeze out more speed.
cc @amcastro-tri

sherm1 · 2024-12-06T01:31:56Z

Since we're trying to match Pinocchio timings (we think about 2 μs for Cassie-size ID), there are some further considerations to make an apples-to-apples comparison. We've been including position & velocity kinematics in ID timings. Possibly Pinocchio is leaving kinematics fixed and just measuring the ID time alone. Also, the above timings were with gcc 11.4, which does a poor optimizing job compared to clang 14.0.0. And the Pinocchio timings were presumably run on a faster machine than my 7-year-old Puget. Let's see how those factors affect things. TL;DR: this gets us to about 2 μs for ID alone, and we're within 2X even with kinematics included.

| | P+V+ID-W | P+V+ID-M | Just ID-M |
|---|---|---|---|
| time (μs) | 4.95 | 3.74 | 2.14 |

Timed on my laptop: Xeon W-11855M CPU @ 3.20GHz, using clang 14.0.0
(ID-W is the World frame version in master, ID-M is the new M-frame method)
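
To make the "with kinematics" versus "ID alone" distinction concrete, here is a rough sketch of a timing helper in standard C++. It is not the cassiebench harness. The name `MeanMicrosPerCall` is hypothetical, and a real benchmark would also need warm-up and protection against the optimizer removing the work.

```cpp
#include <chrono>

// Hypothetical helper: runs `func` `iterations` times and returns the mean
// wall-clock time per call in microseconds.
template <typename Func>
double MeanMicrosPerCall(Func&& func, int iterations) {
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) func();
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::micro> elapsed = stop - start;
  return elapsed.count() / iterations;
}
```

For a P+V+ID measurement, the callable would invalidate the cached kinematics before each inverse dynamics call so that they are recomputed every iteration. For an ID-only measurement, it would leave the kinematics cached and call only inverse dynamics.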

sherm1 · 2025-01-13T23:44:58Z

Minor update: After profiling, I've been experimenting with SIMD implementations for operations that stand out:

- Symmetric 3x3 matrix times 3-vector
- Cross product wXr
- Double cross product wXwXr
- Re-express spatial vector

Although all of these can be done with only a few packed floating point operations, only the last one was better than optimized C++ (according to llvm-mca in Godbolt). That's because of the many instructions required to fill and reorder the 4-element ymm registers prior to executing the packed fp. For short functions the loss of inlining is also likely a problem though I couldn't analyze that in Godbolt.
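
For reference, the four operations listed above look roughly like this in plain scalar C++. This is only an illustrative baseline, not Drake's code. The type names, the six-entry symmetric storage, and the row-major rotation layout are all assumptions.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Rot3 = std::array<double, 9>;  // assumed row-major 3x3 rotation matrix

// Symmetric 3x3 matrix stored as its six unique entries (assumed layout).
struct Sym3 {
  double xx, yy, zz;  // diagonal
  double xy, xz, yz;  // off-diagonal
};

// Symmetric matrix times 3-vector.
inline Vec3 SymTimesVec(const Sym3& S, const Vec3& v) {
  return {S.xx * v[0] + S.xy * v[1] + S.xz * v[2],
          S.xy * v[0] + S.yy * v[1] + S.yz * v[2],
          S.xz * v[0] + S.yz * v[1] + S.zz * v[2]};
}

// Cross product w x r.
inline Vec3 Cross(const Vec3& w, const Vec3& r) {
  return {w[1] * r[2] - w[2] * r[1],
          w[2] * r[0] - w[0] * r[2],
          w[0] * r[1] - w[1] * r[0]};
}

// Double cross product w x (w x r).
inline Vec3 DoubleCross(const Vec3& w, const Vec3& r) {
  return Cross(w, Cross(w, r));
}

// General 3x3 matrix times 3-vector, used for re-expression below.
inline Vec3 Rotate(const Rot3& R, const Vec3& v) {
  return {R[0] * v[0] + R[1] * v[1] + R[2] * v[2],
          R[3] * v[0] + R[4] * v[1] + R[5] * v[2],
          R[6] * v[0] + R[7] * v[1] + R[8] * v[2]};
}

// Spatial vector: a rotational part and a translational part.
struct Spatial {
  Vec3 rot;
  Vec3 trans;
};

// Re-express a spatial vector given in frame B in frame A: the same rotation
// is applied to both halves.
inline Spatial ReExpress(const Rot3& R_AB, const Spatial& V_B) {
  return {Rotate(R_AB, V_B.rot), Rotate(R_AB, V_B.trans)};
}
```

Re-expression is the largest of the four: two full matrix-vector products that share the same rotation matrix.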

Cassiebench timing with the re-express spatial vector SIMD only provided a 2% speedup overall so it's not worth the extra complexity. My conclusion is that we can only get real SIMD speedups with more substantial operations. I'm not seeing good candidates in kinematics and ID, but will revisit this when I get to forward dynamics.

Interestingly (to me anyway) the compilers (g++ 11.4, clang 14) managed to do a little 2-wide SIMD when working with 3-vectors, using a double wide xmm operation followed by a scalar operation. This required much less shuffling so the overall performance was better than I could get after the contortions required to pack the 4-wide registers. This suggests to me that it will be futile to attempt to exploit the 8-wide zmm SIMD instructions in AVX512 for small data structures -- they will certainly be useful for large operations though.
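
To show the shape of that 2-wide pattern, here is a hand-written imitation using SSE2 intrinsics for a 3-vector `a*x + y`. It is not the compilers' actual output. The function name `Axpy3` is made up for illustration, and the code falls back to a scalar loop when SSE2 is not available.

```cpp
#include <array>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

using Vec3 = std::array<double, 3>;

// Computes a*x + y for 3-vectors (hypothetical example function).
inline Vec3 Axpy3(double a, const Vec3& x, const Vec3& y) {
  Vec3 out;
#if defined(__SSE2__)
  const __m128d a2 = _mm_set1_pd(a);
  // Elements 0 and 1: one packed multiply and one packed add in xmm registers.
  const __m128d lo = _mm_add_pd(_mm_mul_pd(a2, _mm_loadu_pd(x.data())),
                                _mm_loadu_pd(y.data()));
  _mm_storeu_pd(out.data(), lo);
  // Element 2: scalar multiply and add using only the low lane.
  const __m128d hi = _mm_add_sd(_mm_mul_sd(a2, _mm_load_sd(x.data() + 2)),
                                _mm_load_sd(y.data() + 2));
  _mm_store_sd(out.data() + 2, hi);
#else
  // Portable scalar fallback.
  for (int i = 0; i < 3; ++i) out[i] = a * x[i] + y[i];
#endif
  return out;
}
```

The unaligned loads read the first two elements straight from memory and the third is handled on its own, so nothing has to be shuffled into place first.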

Moving on now ...

sherm1 added status: do not merge status: do not review type: performance labels Dec 3, 2024

sherm1 force-pushed the better_inverse_dynamics branch 3 times, most recently from a730a7d to 4eab60a Compare December 5, 2024 01:04

sherm1 mentioned this pull request Dec 9, 2024

[multibody] Some speedups for inertia * vector #22287

Merged

sherm1 force-pushed the better_inverse_dynamics branch 2 times, most recently from a1754dd to 5adfeef Compare December 16, 2024 20:02

sherm1 force-pushed the better_inverse_dynamics branch 2 times, most recently from fe6adf8 to 18faf6a Compare December 20, 2024 01:23

sherm1 force-pushed the better_inverse_dynamics branch 4 times, most recently from 8df8522 to 0cda315 Compare January 13, 2025 22:26

sherm1 force-pushed the better_inverse_dynamics branch 2 times, most recently from 2b82aa2 to e7c3d87 Compare January 15, 2025 00:51

sherm1 added 2 commits January 15, 2025 15:39

Inverse Dynamics 2 (in M) proto + cassiebench test

a59758d

Don't put NaN inertias in tests.

012cc0f

sherm1 force-pushed the better_inverse_dynamics branch from e7c3d87 to 012cc0f Compare January 15, 2025 23:48

Working on M frame methods for curvilinear mobilizer.

fcbae08
