[libc++] Take advantage of trivial relocation in std::vector::erase #116268

ldionne · 2024-11-14T18:28:59Z

In vector::erase(iter) and vector::erase(iter, iter), we can take
advantage of a type being trivially relocatable to open up a gap
in the vector and then relocate the tail of the vector into that gap.

The benefit is that relocating an object is often more efficient than
move-assigning and then destroying the original object. For types that
can be relocated trivially but that are complicated enough for the
compiler not to optimize by itself (like std::string), this provides
around a 2x performance speedup in vector::erase (see below).

This optimization requires stopping the usage of Clang's __is_trivially_relocatable
builtin, which doesn't currently honour assignment operators like
is_trivially_copyable does and can lead us to perform incorrect
optimizations.

It is also worth noting that __uninitialized_allocator_relocate has to
be modified so that we can relocate into an overlapping range. This
has an unfortunate impact on its exception safety guarantees, which
needs to be investigated further.

Previous implementation

--------------------------------------------------------------------------------------
Benchmark                                            Time             CPU   Iterations
--------------------------------------------------------------------------------------
BM_erase_iter_in_middle/vector_int/1024           24.9 ns         24.9 ns     28042962
BM_erase_iter_in_middle/vector_int/4096            107 ns          107 ns      6590592
BM_erase_iter_in_middle/vector_int/10240           271 ns          265 ns      2733478
BM_erase_iter_in_middle/vector_string/1024         349 ns          349 ns      2005886
BM_erase_iter_in_middle/vector_string/4096        1410 ns         1406 ns       498355
BM_erase_iter_in_middle/vector_string/10240       3449 ns         3449 ns       201989
BM_erase_iter_at_start/vector_int/1024            47.1 ns         47.1 ns     14836261
BM_erase_iter_at_start/vector_int/4096             204 ns          204 ns      3430414
BM_erase_iter_at_start/vector_int/10240            504 ns          504 ns      1391373
BM_erase_iter_at_start/vector_string/1024          684 ns          684 ns      1025160
BM_erase_iter_at_start/vector_string/4096         2855 ns         2806 ns       254080
BM_erase_iter_at_start/vector_string/10240        7060 ns         7060 ns        94134

New implementation

--------------------------------------------------------------------------------------
Benchmark                                            Time             CPU   Iterations
--------------------------------------------------------------------------------------
BM_erase_iter_in_middle/vector_int/1024           26.0 ns         25.9 ns     27127367
BM_erase_iter_in_middle/vector_int/4096            105 ns          105 ns      6515204
BM_erase_iter_in_middle/vector_int/10240           259 ns          258 ns      2800795
BM_erase_iter_in_middle/vector_string/1024         148 ns          147 ns      4725706
BM_erase_iter_in_middle/vector_string/4096         608 ns          606 ns      1168205
BM_erase_iter_in_middle/vector_string/10240       1523 ns         1520 ns       459909
BM_erase_iter_at_start/vector_int/1024            47.1 ns         47.1 ns     14762513
BM_erase_iter_at_start/vector_int/4096             205 ns          205 ns      3403130
BM_erase_iter_at_start/vector_int/10240            507 ns          507 ns      1382716
BM_erase_iter_at_start/vector_string/1024          300 ns          300 ns      2327546
BM_erase_iter_at_start/vector_string/4096         1205 ns         1205 ns       580855
BM_erase_iter_at_start/vector_string/10240        4296 ns         4296 ns       162956

In particular, test everything with both a normal and a min_allocator, add tests for a few corner cases and add tests with types that are trivially relocatable. Also add tests that count the number of assignments performed by vector::erase, since that is mandated by the Standard. This patch is a preparation for optimizing vector::erase.

In vector::erase(iter) and vector::erase(iter, iter), we can take advantage of a type being trivially relocatable to open up a gap in the vector and then relocate the tail of the vector into that gap. The benefit is that relocating an object is often more efficient than move-assigning and then destroying the original object. For types that can be relocated trivially but that are complicated enough for the compiler not to optimize by itself (like std::string), this provides around a 2x performance speedup in vector::erase (see below). This optimization requires stopping the usage of Clang's __is_trivially_relocatable builtin, which doesn't currently honour assignment operators like is_trivially_copyable does and can lead us to perform incorrect optimizations. It is also worth noting that __uninitialized_allocator_relocate has to be modified so that we can relocate into an overlapping range. This has an unfortunate impact on its exception safety guarantees, which needs to be investigated further. Previous implementation -------------------------------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------------------------------- BM_erase_iter_in_middle/vector_int/1024 24.9 ns 24.9 ns 28042962 BM_erase_iter_in_middle/vector_int/4096 107 ns 107 ns 6590592 BM_erase_iter_in_middle/vector_int/10240 271 ns 265 ns 2733478 BM_erase_iter_in_middle/vector_string/1024 349 ns 349 ns 2005886 BM_erase_iter_in_middle/vector_string/4096 1410 ns 1406 ns 498355 BM_erase_iter_in_middle/vector_string/10240 3449 ns 3449 ns 201989 BM_erase_iter_at_start/vector_int/1024 47.1 ns 47.1 ns 14836261 BM_erase_iter_at_start/vector_int/4096 204 ns 204 ns 3430414 BM_erase_iter_at_start/vector_int/10240 504 ns 504 ns 1391373 BM_erase_iter_at_start/vector_string/1024 684 ns 684 ns 1025160 BM_erase_iter_at_start/vector_string/4096 2855 ns 2806 ns 254080 BM_erase_iter_at_start/vector_string/10240 7060 ns 7060 ns 94134 New implementation -------------------------------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------------------------------- BM_erase_iter_in_middle/vector_int/1024 26.0 ns 25.9 ns 27127367 BM_erase_iter_in_middle/vector_int/4096 105 ns 105 ns 6515204 BM_erase_iter_in_middle/vector_int/10240 259 ns 258 ns 2800795 BM_erase_iter_in_middle/vector_string/1024 148 ns 147 ns 4725706 BM_erase_iter_in_middle/vector_string/4096 608 ns 606 ns 1168205 BM_erase_iter_in_middle/vector_string/10240 1523 ns 1520 ns 459909 BM_erase_iter_at_start/vector_int/1024 47.1 ns 47.1 ns 14762513 BM_erase_iter_at_start/vector_int/4096 205 ns 205 ns 3403130 BM_erase_iter_at_start/vector_int/10240 507 ns 507 ns 1382716 BM_erase_iter_at_start/vector_string/1024 300 ns 300 ns 2327546 BM_erase_iter_at_start/vector_string/4096 1205 ns 1205 ns 580855 BM_erase_iter_at_start/vector_string/10240 4296 ns 4296 ns 162956

ldionne · 2024-11-14T18:29:47Z

Note that this is stacked on top of #116265.

ldionne · 2024-11-14T18:31:32Z

libcxx/include/__type_traits/is_trivially_relocatable.h

-#else
+//
+// Note that we don't use Clang's __is_trivially_relocatable builtin because it doesn't honor the presence
+// of non-trivial special members like assignment operators, or even a copy constructor, making it possible


In particular, this change is required or else we fail the test I added in erase_iter.pass.cpp which counts the number of assignments. Because of this, and since the semantics of what it means to be trivially relocatable are still being actively debated, I think it makes sense for us to err on the safer side and use is_trivially_copyable for now.

TrackedAssignment has a trivial destructor and a trivial copy constructor, so it's trivially relocatable by destroying it and copy-constructing a new instance. Is the library code not permitted to do that? If not, would it make sense to have separate internal traits for "trivially assignment-relocatable" versus "trivially construction-relocatable" that only detect the case where you could relocate specifically using assignment versus specifically using construction?

Sorry for the delay, the last days have been busy. I believe what you are suggesting is essentially what http://wg21.link/P2687R9 introduced with the replaceable concept -- or were you aware of that concept but you had something else in mind?

Since I wrote this patch, my understanding has evolved and so did P2687, so I created this draft PR to discuss with the P2687 authors. That PR implements the currently proposed semantics for P2687 (although with a IMO better library API).

ldionne added 2 commits November 14, 2024 19:07

ldionne added the performance label Nov 14, 2024

ldionne commented Nov 14, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[libc++] Take advantage of trivial relocation in std::vector::erase #116268

[libc++] Take advantage of trivial relocation in std::vector::erase #116268

ldionne commented Nov 14, 2024

ldionne commented Nov 14, 2024

ldionne Nov 14, 2024

zygoloid Nov 14, 2024

ldionne Nov 19, 2024

[libc++] Take advantage of trivial relocation in std::vector::erase #116268

Are you sure you want to change the base?

[libc++] Take advantage of trivial relocation in std::vector::erase #116268

Conversation

ldionne commented Nov 14, 2024

ldionne commented Nov 14, 2024

ldionne Nov 14, 2024

Choose a reason for hiding this comment

zygoloid Nov 14, 2024

Choose a reason for hiding this comment

ldionne Nov 19, 2024

Choose a reason for hiding this comment