Add sub-RFC for increased availability of NUMA API #1545

115 changes: 115 additions & 0 deletions rfcs/proposed/numa_support/increase-numa-support-availability.org
@@ -0,0 +1,115 @@
# -*- fill-column: 80; -*-

#+title: Improve predictability of API for NUMA support

*Note:* This is a sub-RFC of https://github.com/oneapi-src/oneTBB/pull/1535,
specifically of its section on "Increased availability of NUMA support".

* Introduction
oneTBB has a soft dependency on several variants of ~tbbbind~, which are loaded
by the library as part of its initialization stage. In turn, each ~tbbbind~ has
a hard dependency, i.e., relies on load-time linking, on a concrete version of
the HWLOC library [1, 2]. The soft dependency of oneTBB on ~tbbbind~ allows the
library to continue its execution even if the system loader is unable to resolve
the hard dependency on HWLOC for ~tbbbind~. In this case, the HW topology is not
discovered and the machine is treated as if all CPU cores were uniform, which is
the default TBB behavior when NUMA constraints are not used. Thus, the following
code returns meaningless values, which oneTBB then simply ignores:

#+begin_src C++
std::vector<oneapi::tbb::numa_node_id> numa_nodes = oneapi::tbb::info::numa_nodes();
std::vector<oneapi::tbb::core_type_id> core_types = oneapi::tbb::info::core_types();
#+end_src

No error is reported either, so client code that uses the NUMA support
facilities may continue running on the assumption that they work as intended.
This behavior is not readily noticeable by developers who use oneTBB, and it is
the main problem with the current design.
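
Until the library reports the failure itself, client code could add a defensive
check of its own. The following is a minimal sketch, not part of this RFC. It
assumes that in the fallback case ~numa_nodes()~ yields a single entry equal to
~oneapi::tbb::task_arena::automatic~ (-1). The helper name
~looks_like_topology_fallback~ is hypothetical, and a plain ~int~ stands in for
~oneapi::tbb::numa_node_id~ so that the sketch builds without oneTBB.

```cpp
#include <cassert>
#include <vector>

// Stand-in for oneapi::tbb::numa_node_id so the sketch builds without oneTBB.
using numa_node_id = int;

// Assumption: mirrors oneapi::tbb::task_arena::automatic, which is -1.
constexpr numa_node_id automatic_id = -1;

// Hypothetical helper: returns true when the list looks like the value
// reported when the topology could not be parsed (a single "automatic" entry).
inline bool looks_like_topology_fallback(const std::vector<numa_node_id>& nodes) {
    return nodes.size() == 1 && nodes.front() == automatic_id;
}
```

With the real library, the argument would be the result of
~oneapi::tbb::info::numa_nodes()~. A positive result only suggests that the
topology was not discovered; it does not say why.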

Having a dependency on a shared HWLOC library has a number of advantages:
1. Code reuse, with all of its positive consequences. This is the primary
purpose of shared libraries.
2. Sharing the HWLOC context between its clients. This avoids performing the
same operations repeatedly with identical results.
3. A drop-in replacement. Users are able to use their own version of HWLOC
without recompiling oneTBB.

The only disadvantage of depending on the HWLOC library dynamically is that
developers who use oneTBB's NUMA support API need to make sure the library is
available and can be found by oneTBB. Depending on the distribution model of a
developer's code, this is achieved either by:
1. Asking the end user to have the necessary version of the dependency
pre-installed.
2. Bundling the necessary HWLOC version together with the other pieces of a
product release.

However, the requirement to fulfill one of the above steps before the NUMA API
starts paying off may be considered an inconvenience and, more importantly, it
is not always obvious that one of these steps is needed, especially because
oneTBB stays silent when the HWLOC library cannot be found in the environment.

This proposal aims to reduce the impact of that disadvantage by linking HWLOC
statically into one of the ~tbbbind~ libraries distributed with oneTBB, while
keeping the possibility to supply another version of the HWLOC library if users
see the need.

[1] [[https://www.open-mpi.org/projects/hwloc/][HWLOC project main page]]

[2] [[https://github.com/open-mpi/hwloc][HWLOC project repository on GitHub]]

* Proposal
1. Introduce a new variant of the ~tbbbind~ library, named ~tbbbind_static~,
which is statically linked with the HWLOC library and distributed alongside
the other ~tbbbind~ variants.
2. Add loading of ~tbbbind_static~ as the last attempt to resolve the dependency
on the functionality provided by the ~tbbbind~ layer.
3. Update the oneTBB documentation, in particular [[https://oneapi-src.github.io/oneTBB/search.html?q=tbb%3A%3Ainfo][these documentation pages]], to
include steps for determining which variant of ~tbbbind~ is being used.
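
The loading order in item 2 can be sketched as follows. This is an illustration
under stated assumptions, not the oneTBB implementation: the function names are
hypothetical, the actual loading mechanism is abstracted behind a caller-supplied
~try_load~ callback, and the names of the dynamically linked variants are
assumed for illustration only. Only ~tbbbind_static~ comes from this proposal.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: returns the first candidate that try_load accepts, or
// an empty string when none loads (the uniform-topology fallback then applies).
inline std::string pick_binding_library(
    const std::vector<std::string>& candidates,
    const std::function<bool(const std::string&)>& try_load) {
    for (const std::string& name : candidates) {
        if (try_load(name)) {
            return name;
        }
    }
    return std::string();
}

// Illustrative order: dynamically linked variants first, so a user-provided
// HWLOC is preferred; the statically linked variant is the last attempt.
// The first three names are assumptions, not taken from this RFC.
inline std::vector<std::string> default_candidates() {
    return {"tbbbind_2_5", "tbbbind_2_0", "tbbbind", "tbbbind_static"};
}
```

Keeping ~tbbbind_static~ last preserves the drop-in replacement property: it is
reached only when none of the variants that depend on a shared HWLOC can be
loaded.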

** Advantages
The proposed behavior provides a mechanism for resolving the dependency on the
HWLOC library when it cannot be found in the environment, while still
preferring a user-provided version of HWLOC.

As a result, the problematic use of the oneTBB API mentioned above should work
as expected, returning the enumerated list of actual NUMA nodes and core types
on the system the code is running on. This holds provided that the loaded HWLOC
library works on that system and that the application properly distributes all
oneTBB binaries and sets up the environment so that the necessary variant of
the ~tbbbind~ library can be found and loaded.

** Disadvantages
1. There will be one more variant of the ~tbbbind~ binary to ship in oneTBB
distribution packages.
2. The behavior is still silent by default if the user fails to set up the
environment correctly with their own version of the HWLOC library. However,
setting the ~TBB_VERSION=1~ environment variable helps to identify such an
environment setup issue quickly.
3. When the ~tbbbind_static~ library is used, the statically linked HWLOC does
not share its context with dynamically loaded HWLOC instances.

* Alternative handling of inability to parse system topology
An alternative behavior when the HWLOC library cannot be found is to be more
explicit about the missing component: either issue a warning, or refuse to
work until one of the ~tbbbind~ variants can be loaded (e.g., throw an
exception).
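
The difference between the current behavior and the two alternatives can be
sketched as a policy switch. This is only an illustration: the enumeration, the
function name, and the messages are hypothetical, and ~std::runtime_error~
stands in for whatever exception type the library would actually introduce.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Hypothetical policies for the case when the system topology is not parsed.
enum class missing_topology_policy { silent, warn, fail };

// Returns true when NUMA-aware behavior can be relied upon. When it cannot,
// the outcome depends on the policy: return false quietly, return false after
// writing a warning to `log`, or throw.
inline bool report_topology_status(bool topology_parsed,
                                   missing_topology_policy policy,
                                   std::ostream& log) {
    if (topology_parsed) {
        return true;
    }
    switch (policy) {
    case missing_topology_policy::silent:
        break;  // current behavior: nothing is reported
    case missing_topology_policy::warn:
        log << "warning: system topology was not parsed; "
               "NUMA constraints will be ignored\n";
        break;
    case missing_topology_policy::fail:
        throw std::runtime_error("system topology was not parsed");
    }
    return false;
}
```

Passing the stream explicitly makes the weakness of the warning variant
visible: if the caller's stream is closed or discarded, the message is lost.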

Compared with the proposed approach, these alternatives have the following
advantages and disadvantages.
** Common Advantages
- Explicitly tell the user that the functionality being used is not going to
work, instead of staying silent.
- Do not require an additional variant of the ~tbbbind~ library to be
distributed along with the others.

** Common Disadvantages
- Require an additional step on the user's side to resolve the problem. In
other words, they do not provide a complete solution to the problem.

** Disadvantages of Issuing a Warning
- The warning may still not be visible, especially if standard streams are
closed.

** Disadvantages of Throwing an Exception
- May break existing code that does not expect an exception to be thrown.
- Requires the introduction of an additional exception hierarchy.