Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::sort(par_unseq ,...) leaks memory when called repeatedly #1533

Closed
oschonrock opened this issue Oct 23, 2024 · 4 comments
Closed

std::sort(par_unseq ,...) leaks memory when called repeatedly #1533

oschonrock opened this issue Oct 23, 2024 · 4 comments
Labels

Comments

@oschonrock
Copy link

oschonrock commented Oct 23, 2024

Summary

When std::sort(std::execution::par_unseq,...) is called repeatedly it leaks memory. Ultimately it will exhaust the machine's memory and the kernel will kill the process.

Single threaded sort does not leak.

sample code with precise instructions to reproduce is here:
https://github.com/oschonrock/tbbleak

tested on Ubuntu 24.04: exact details at repo above
discovered using libtbb-dev:amd64 2021.11.0-2ubuntu2 package from ubuntu repos

also confirmed by compiling the head of https://github.com/oneapi-src/oneTBB, ie this commit
42b833f

compiled with gcc-13 and clang-18 (both from ubuntu 24.04)

Observed Behavior

std::sort(par_unseq,.... ) leaks as shown by output from above demo code

Expected Behavior

std::sort(par_unseq,.... ) does not leak

Maybe Related

A very similar issue was reported here in June 2024:
https://community.intel.com/t5/Intel-oneAPI-Threading-Building/std-sort-std-execution-par-unseq-has-a-memory-leak-on-Linux/m-p/1582773

and it was solved here:

uxlfoundation/oneDPL#1589

but on the oneDPL codebase.

Valgrind

Running:

/build.sh && valgrind ./build/gcc/relwithdebinfo/sort_leak  # see tbbleak repo above

shows some leaks. If the call to std::sort(par_unseq,..) is commented out, the leaks go away

ASAN

When Address sanitizer is run as follows:

./build.sh debug && ./build/gcc/debug/sort_leak  # see other repo above

we get some runtime errors complaining about misaligned addresses
(these are not reported when std::sort(par_unseq,...) is commented out)

/usr/include/c++/13/pstl/parallel_backend_tbb.h:515:7: runtime error: member access within misaligned address 0x76af91a090e0 for type 'struct __task', which requires 64 byte alignment
0x76af91a090e0: note: pointer points here
 00 00 00 00  f8 98 bd 34 b3 59 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00

... 

/usr/include/c++/13/pstl/parallel_backend_tbb.h:675:7: runtime error: reference binding to misaligned address 0x76af91a090e0 for type 'struct __as_base ', which requires 64 byte alignment
0x76af91a090e0: note: pointer points here
 00 00 00 00  28 99 bd 34 b3 59 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
@oschonrock oschonrock added the bug label Oct 23, 2024
@kboyarinov
Copy link
Contributor

@oschonrock thanks for reporting that.
As you mentioned, this issue was already fixed in the oneDPL codebase and it was caused by incorrect usage of oneTBB as part of a backend in oneDPL. We cannot fix it in oneTBB since this code does not belong to oneTBB.

If you are using standard parallel algorithms from oneDPL library, consider simply updating the oneDPL to the latest available version. If you are using parallel algorithms directly from the libstdc++ or libc++ implementations, it would be complicated to receive a fix since it was not yet upstreamed. You can either apply the patch from uxlfoundation/oneDPL#1589 to your local STL files or submit an issue directly to the STL vendor to speed up the upstreaming from oneDPL to STL.

@oschonrock
Copy link
Author

Thanks for you quick response. I have to be honest. From a user perspective it is extremely confusing who owns what code here, and how they interact. oneTBB, oneDPL. libstdc++ / gcc etc

I am using libstdc++ , ie the standard library that is distributed with gcc. And simply calling:

std::sort(std::execution::par-unseq,...

This has never worked without installing an external library from Intel called "TBB" and linking against it. I believe gcc's libstdc++ STL implementation just relies on it. The mechanism of installing "TBB" has changed over the years. It used to require some separate download direct from Intel, but these days it is as simple as

sudo apt install libtbb-dev

And that is how we have run this for years.

It was not clear to me where the patch from oneDPL fits. From what you are saying it is in the "interface glue code" between libstd++ and TBB, but under control of libstdc++?

I went hunting through my current installation(s) of libstdc++, indeed found the parallel_backend_tbb.h and manually and very carefully applied that patch from oneDPL (https://github.com/oneapi-src/oneDPL/pull/1589/files) - the patch won't apply automatically due to different assert macros etc:

This is the resulting diff

--- /usr/include/c++/13/pstl/parallel_backend_tbb.h.orig 2024-10-23 15:56:29.035725777 +0000
+++ /usr/include/c++/13/pstl/parallel_backend_tbb.h     2024-10-23 16:12:13.932914691 +0000
@@ -511,7 +511,7 @@
     friend class __func_task<_Func>;
 };
 
-#else  // TBB_INTERFACE_VERSION <= 12000
+#else  // TBB_INTERFACE_VERSION > 12000
 class __task : public tbb::detail::d1::task
 {
   protected:
@@ -646,10 +646,15 @@
 
         _PSTL_ASSERT(__parent != nullptr);
         _PSTL_ASSERT(__parent->_M_refcount.load(std::memory_order_relaxed) > 0);
-        if (--__parent->_M_refcount == 0)
+        auto __refcount = --__parent->_M_refcount;
+
+        // Placing the deallocation after the refcount decrement allows another thread to proceed with tree
+        // folding concurrently with this task cleanup.
+        __alloc.deallocate(this, *__ed);
+
+        if (__refcount == 0)
         {
             _PSTL_ASSERT(__next == nullptr);
-            __alloc.deallocate(this, *__ed);
             return __parent;
         }
 

I reran my tests on this repo: https://github.com/oschonrock/tbbleak and it does indeed solve the problem.

It also works with the TBB version in the ubuntu repos ie 2021.11.0-2ubuntu2

The above patch does apply automatically now. And it also applies to

/usr/include/c++/14/pstl/parallel_backend_tbb.h

which I believe is the latest available from ubuntu repos.

The libstdc++ mirror repo seems to show that this remains unpatched in the current master of libstd++

https://github.com/gcc-mirror/gcc/blob/4b0f238855f8fa79acf7cca84b523ca8513bf68d/libstdc%2B%2B-v3/include/pstl/parallel_backend_tbb.h#L659

I will figure out how to file a bug on that, or confirm one is filed and try to get this corrected.

Many thanks for your help.

@oschonrock
Copy link
Author

bug submitted on libstdc++

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117276

@kboyarinov
Copy link
Contributor

Thank you once again for working on resolving this issue. I am closing this for now. Feel free to reopen the issue or open a separate one in case of any questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants