Added language standard parallelism to dot product #251

Draft
wants to merge 6 commits into
base: main
from

amklinv-nnl
Contributor

@mhoemmen @crtrott Please review at your convenience and I'll do the same for the other functions.

Contributor

@mhoemmen mhoemmen left a comment

Thanks for this contribution! :-) Please see comments; thanks!

@@ -13,7 +13,7 @@
 // Make mdspan less verbose
 using std::experimental::mdspan;
 using std::experimental::extents;
-using std::experimental::dynamic_extent;
+using std::dynamic_extent;
Contributor

This change suggests that CI doesn't actually build the examples. Should we consider fixing that?

This change might actually need to depend on the C++ version, as std::dynamic_extent entered the Standard in C++20 with span. We'll have to revisit the examples anyway, because of the recent change to use macros to specify the namespaces. (The macros let users control them, so that they can but don't need to go into std.)

Contributor Author

CI builds the examples; I just wasn't building them on my PC.

I was unaware of the macros change. Can you give me an example?

Contributor

MDSPAN_IMPL_STANDARD_NAMESPACE and MDSPAN_IMPL_PROPOSED_NAMESPACE are the two macros. They can be defined by users, but they also get default definitions, e.g., here in include/experimental/mdspan. The library assumes that MDSPAN_IMPL_PROPOSED_NAMESPACE is nested inside MDSPAN_IMPL_STANDARD_NAMESPACE.

Using these macros might require updating the version of the reference mdspan implementation that github's CI pulls in.
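As a rough illustration of how an example could name the macros instead of hard-coding `std::` or `std::experimental::`, here is a minimal sketch. The default values (`demo_std`, `experimental`), the stand-in declarations, and the names `proposed_feature_tag` and `is_dynamic` are assumptions made so the snippet compiles without the mdspan headers; the real defaults live in the mdspan headers themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Assumed defaults, mimicking what a header might define when the user
// has not defined the macros. The real values come from the mdspan headers.
#ifndef MDSPAN_IMPL_STANDARD_NAMESPACE
#  define MDSPAN_IMPL_STANDARD_NAMESPACE demo_std
#endif
#ifndef MDSPAN_IMPL_PROPOSED_NAMESPACE
#  define MDSPAN_IMPL_PROPOSED_NAMESPACE experimental
#endif

// Stand-in declarations; the proposed namespace is nested inside the
// standard one, as the library assumes.
namespace MDSPAN_IMPL_STANDARD_NAMESPACE {
inline constexpr std::size_t dynamic_extent =
    std::numeric_limits<std::size_t>::max();
namespace MDSPAN_IMPL_PROPOSED_NAMESPACE {
struct proposed_feature_tag {};  // hypothetical not-yet-standard feature
}  // namespace MDSPAN_IMPL_PROPOSED_NAMESPACE
}  // namespace MDSPAN_IMPL_STANDARD_NAMESPACE

// Example code refers to the macros, so it follows whatever the user chose.
using MDSPAN_IMPL_STANDARD_NAMESPACE::dynamic_extent;
using MDSPAN_IMPL_STANDARD_NAMESPACE::MDSPAN_IMPL_PROPOSED_NAMESPACE::proposed_feature_tag;

// Hypothetical helper that uses the imported name.
inline bool is_dynamic(std::size_t extent) { return extent == dynamic_extent; }
```

With this pattern the `using` lines stay the same whether the names end up in `std`, `std::experimental`, or a user-chosen namespace.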

 using std::experimental::submdspan;
-using std::experimental::full_extent;
+using std::full_extent;
Contributor

Please see above comment on std::full_extent; thanks!

@@ -43,6 +43,7 @@
 #ifndef LINALG_INCLUDE_EXPERIMENTAL___P1673_BITS_BLAS1_DOT_HPP_
 #define LINALG_INCLUDE_EXPERIMENTAL___P1673_BITS_BLAS1_DOT_HPP_

+#include <ranges>
Contributor

<ranges> is a C++20 include. Would you consider protecting both the include and the use of iota_view below with the appropriate feature test macro, and providing a fall-back implementation? It's OK if the fall-back is not parallel.

Contributor Author

Sure! Can you point me to an example of using a feature test macro?

Contributor

Sure! This web page gives a good summary.

// Ensure that the feature test macro __cpp_lib_ranges is available;
// <version> also defines this macro, but that is a C++20 header.
#include <algorithm>

#if defined(__cpp_lib_ranges)
#  include <ranges>
#endif

void some_function() {
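#if defined(__cpp_lib_ranges)
  // ... code using views::iota ...
#else
  // ... fall-back code ...
#endif
}

Here one macro suffices, because views::iota is part of the C++20 ranges library that __cpp_lib_ranges covers. (The similarly named __cpp_lib_ranges_iota refers to the C++23 algorithm ranges::iota, not to the view.) In general, the point of testing a second, feature-specific macro is that a feature may come after its header, so some compiler versions may have the header but not the feature.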
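To make the fall-back pattern concrete, here is a small sketch of a serial index loop written both ways. It keys the include and the use on `__cpp_lib_ranges`, on the understanding that this macro already covers `views::iota`; the function name `indexed_dot` and the use of `std::vector` instead of mdspan are assumptions for illustration only.

```cpp
#include <algorithm>  // defines __cpp_lib_ranges when the library has ranges
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__cpp_lib_ranges)
#  include <ranges>
#endif

// Hypothetical helper: dot product written as a loop over indices.
inline double indexed_dot(const std::vector<double>& x,
                          const std::vector<double>& y) {
  assert(x.size() == y.size());
  double result = 0.0;
#if defined(__cpp_lib_ranges)
  // Ranges path: iterate over the half-open index range [0, x.size()).
  for (std::size_t k : std::views::iota(std::size_t{0}, x.size())) {
    result += x[k] * y[k];
  }
#else
  // Fall-back path for pre-C++20 builds: plain counted loop.
  for (std::size_t k = 0; k < x.size(); ++k) {
    result += x[k] * y[k];
  }
#endif
  return result;
}
```

Both branches compute the same value, so a C++17 build loses nothing but the ranges syntax.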

include/experimental/__p1673_bits/blas1_dot.hpp (outdated)
init += v1(k) * v2(k);
}
return init;
using scalar_type = std::common_type_t<ElementType1, ElementType2, Scalar>;
Contributor

This is not necessarily the right type. For example, if operator* returns a higher-precision type than its inputs (as might make sense for custom integer or fixed-point real types, for example), then what you want here is the common type of Scalar and the result of operator*. However, it turns out that you don't need scalar_type here; please see the comment below.
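The difference shows up even with built-in types, since multiplying two `short` values yields an `int`. The following sketch contrasts the two type computations; the alias names and `accumulate_product` are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Common type of the inputs and Scalar, as in the line under review.
template <class E1, class E2, class Scalar>
using naive_scalar_t = std::common_type_t<E1, E2, Scalar>;

// Common type of Scalar and whatever operator* actually returns.
template <class E1, class E2, class Scalar>
using product_aware_scalar_t = std::common_type_t<
    decltype(std::declval<E1>() * std::declval<E2>()), Scalar>;

// short * short promotes to int, so the two aliases disagree.
static_assert(std::is_same_v<naive_scalar_t<short, short, short>, short>);
static_assert(std::is_same_v<product_aware_scalar_t<short, short, short>, int>);

// Hypothetical helper: accumulate one product in the product-aware type.
template <class E1, class E2, class Scalar>
auto accumulate_product(Scalar init, E1 a, E2 b) {
  using result_t = product_aware_scalar_t<E1, E2, Scalar>;
  return static_cast<result_t>(init) + a * b;
}
```

With `short` inputs of 300 each, the product 90000 fits the product-aware type but would not fit `short`.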

@@ -12,11 +12,12 @@ inline namespace __p1673_version_0 {
 namespace linalg {
 namespace impl {
 // the execution policy used for default serial inline implementations
-struct inline_exec_t {};
+using inline_exec_t = std::execution::sequenced_policy;
Contributor

Please see note above about the "inline" executor.

Any Standard Algorithm overload that takes an ExecutionPolicy template parameter is a "parallel algorithm" (see [algorithms.parallel] 2). That changes what the algorithm assumes about the behavior of element access functions.

Therefore, it's really important that the "inline" executor not use any parallel algorithm, even if ExecutionPolicy is sequenced_policy.
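One way to keep that separation is to give the "inline" path its own tag type and a hand-written loop, so that it never reaches a policy-taking overload. The sketch below assumes `std::vector` inputs and a hypothetical stand-in tag `some_policy_t`; to stay free of `<execution>`, the policy path calls the non-policy `std::transform_reduce` overload, whereas a real implementation would pass the policy through.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Dedicated tag for the "inline" path, as in the code before this change.
struct inline_exec_t {};
// Hypothetical stand-in for a real execution policy type.
struct some_policy_t {};

// "Inline" path: a plain loop, never a Standard Algorithm policy overload.
inline double dot(inline_exec_t, const std::vector<double>& x,
                  const std::vector<double>& y, double init) {
  assert(x.size() == y.size());
  for (std::size_t k = 0; k < x.size(); ++k) {
    init += x[k] * y[k];
  }
  return init;
}

// Policy path: delegates to a Standard Algorithm. A real implementation
// would pass the execution policy as the first argument here.
inline double dot(some_policy_t, const std::vector<double>& x,
                  const std::vector<double>& y, double init) {
  assert(x.size() == y.size());
  return std::transform_reduce(x.begin(), x.end(), y.begin(), init);
}
```

Because `inline_exec_t` is a distinct type rather than an alias of `sequenced_policy`, overload resolution cannot route it into the algorithm-based path by accident.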


 // The execution policy used when no execution policy is provided
 // It must be remapped to some other execution policy, which the default mapper does
-struct default_exec_t {};
+using default_exec_t = std::execution::parallel_policy;
Contributor

The code comment says "It must be remapped to some other execution policy, which the default mapper does", so it's important that we not circumvent the default mapper. Would you consider reverting this change?

Contributor Author

If I remember correctly, the execpolicy_mapper did not like this default. I can restore it and show you the error it gave me.
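For readers unfamiliar with the remapping idea, here is a minimal sketch of how a placeholder policy can be routed through a mapper before dispatch. Everything in namespace `sketch` is hypothetical, including `vendor_exec_t`, `path`, and `run`; it is loosely modeled on the tag names in the diff and is not the library's actual `execpolicy_mapper`.

```cpp
#include <cassert>

namespace sketch {

struct inline_exec_t {};   // serial, inline implementation
struct default_exec_t {};  // "no policy given"; must be remapped before use
struct vendor_exec_t {};   // hypothetical back-end policy

enum class path { inline_loop, vendor };

// Generic mapping: pass any other policy through unchanged.
template <class Policy>
constexpr Policy execpolicy_mapper(Policy p) { return p; }

// Default mapping used in this sketch: the placeholder becomes "inline".
// A build with a back end enabled could define this mapping differently.
constexpr inline_exec_t execpolicy_mapper(default_exec_t) { return {}; }

// Implementations exist only for real policies, never for default_exec_t.
constexpr path run(inline_exec_t) { return path::inline_loop; }
constexpr path run(vendor_exec_t) { return path::vendor; }

// Every entry point maps the policy first, then dispatches.
template <class Policy>
constexpr path dot(Policy p) { return run(execpolicy_mapper(p)); }

// No policy given: start from the placeholder and let the mapper decide.
constexpr path dot() { return dot(default_exec_t{}); }

static_assert(dot() == path::inline_loop);
static_assert(dot(vendor_exec_t{}) == path::vendor);

}  // namespace sketch
```

Aliasing the placeholder to a concrete policy type would skip the mapping step, which is the concern raised above.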

@@ -87,7 +87,7 @@ TEST_F(blas2_signed_float_fixture, kokkos_matrix_frob_norm_trivial_empty)
 {
 std::vector<value_type> v;

-constexpr auto de = std::experimental::dynamic_extent;
+constexpr auto de = std::dynamic_extent;
Contributor

Does CI not actually test this test? (Please see comment above about std::dynamic_extent.) If so, then should we consider using the namespace macros instead of std explicitly?

Comment on lines +78 to +83
#ifdef LINALG_ENABLE_TBB
const auto dotResultPar = dot (std::execution::par, x, y);
#else
const auto dotResultPar = dot (std::execution::seq, x, y);
#endif // LINALG_ENABLE_TBB
static_assert( std::is_same_v<std::remove_const_t<decltype(dotResultPar)>, scalar_t> );
Contributor

Suggested change
-#ifdef LINALG_ENABLE_TBB
-const auto dotResultPar = dot (std::execution::par, x, y);
-#else
-const auto dotResultPar = dot (std::execution::seq, x, y);
-#endif // LINALG_ENABLE_TBB
-static_assert( std::is_same_v<std::remove_const_t<decltype(dotResultPar)>, scalar_t> );
+#ifdef LINALG_ENABLE_TBB
+auto dotResultPar = dot (std::execution::par, x, y);
+#else
+auto dotResultPar = dot (std::execution::seq, x, y);
+#endif // LINALG_ENABLE_TBB
+static_assert( std::is_same_v<decltype(dotResultPar), scalar_t> );
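The suggestion works because `decltype` of a variable name reports the declared type, including `const`. A small sketch of the difference, with `compute` as a hypothetical stand-in for the `dot` call:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for dot(policy, x, y).
inline double compute() { return 1.5; }

inline void check_deduced_types() {
  const auto with_const = compute();
  auto without_const = compute();

  // decltype of a const variable keeps the const qualifier...
  static_assert(std::is_same_v<decltype(with_const), const double>);
  // ...so the original test had to strip it before comparing.
  static_assert(
      std::is_same_v<std::remove_const_t<decltype(with_const)>, double>);
  // Without const on the variable, the comparison is direct.
  static_assert(std::is_same_v<decltype(without_const), double>);

  assert(with_const == without_const);
}
```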

@@ -130,6 +130,15 @@ if(LINALG_ENABLE_KOKKOS)
 find_package(KokkosKernels REQUIRED)
 endif()

+find_package(TBB)
Contributor

More recent versions of GCC (for instance) shouldn't require TBB for std::execution::par to work. Furthermore, nvc++ comes with its own, non-TBB implementation of std::execution::par. Would you consider gating this on compiler version, instead of requiring a third-party library? That could even be done in the code -- the code would just need to test the appropriate compiler version to ensure that the C++ algorithms are available (see https://en.cppreference.com/w/cpp/compiler_support/17 and search for "Parallel algorithms and execution policies").
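As one possible way to do the check in code, the sketch below uses the library feature test macro `__cpp_lib_parallel_algorithm` rather than a compiler version number; that substitution is an assumption on my part, not what the comment prescribes. The names `policy_choice` and `preferred_policy` are hypothetical, and the sketch deliberately avoids including `<execution>` so that it builds everywhere.

```cpp
#include <algorithm>  // defines __cpp_lib_parallel_algorithm when supported
#include <cassert>

// Hypothetical label for which policy a test would pick.
enum class policy_choice { parallel, sequential };

// Pick the parallel policy only when the library reports that the C++17
// parallel algorithms are available; otherwise fall back to sequential.
constexpr policy_choice preferred_policy() {
#if defined(__cpp_lib_parallel_algorithm)
  return policy_choice::parallel;
#else
  return policy_choice::sequential;
#endif
}
```

A test could branch on this in place of `LINALG_ENABLE_TBB`. Whether a given standard library defines the macro without a third-party back end installed varies, so this would need checking on each supported toolchain.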

amklinv-nnl and others added 3 commits May 31, 2023 19:06
plus<void> deduces argument and return types, letting the two input types differ.

Co-authored-by: Mark Hoemmen
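To illustrate the commit message, here is a sketch of a dot product whose two inputs have different element types. It uses `std::vector` and the serial `std::transform_reduce` overload as stand-ins for the mdspan-based code in the PR; `mixed_dot` is a hypothetical name.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// std::plus<> and std::multiplies<> (the void specializations) deduce
// their argument and return types per call, so int * double works
// without naming a single common element type up front.
inline double mixed_dot(const std::vector<int>& x,
                        const std::vector<double>& y, double init) {
  assert(x.size() == y.size());
  return std::transform_reduce(x.begin(), x.end(), y.begin(), init,
                               std::plus<>{}, std::multiplies<>{});
}
```

With `std::plus<double>` instead, each product would first be converted to `double`, which is harmless here but can lose information for other type combinations.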
@@ -31,11 +31,11 @@ int main(int argc, char* argv[]) {
 mdspan<double, extents<std::size_t, dynamic_extent>> y(y_vec.data(),N);
 for(int i=0; i<A.extent(0); i++)
 for(int j=0; j<A.extent(1); j++)
-A(i,j) = 100.0*i+j;
+A[i,j] = 100.0*i+j;
Contributor

This will only compile if the C++23 feature "multidimensional subscript operator" (P2128R6) is available. As a result, would you consider one of the following changes?

  1. Protect the example so it only compiles if the feature test macro __cpp_multidimensional_subscript is defined, OR
  2. Add #define MDSPAN_USE_PAREN_OPERATOR 1 to the example before including any mdspan headers (see top of https://godbolt.org/z/Yrr8oe9sE for an example of the relevant macros), and use parentheses instead of brackets (e.g., A(i,j))
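The first option can be sketched without mdspan at all. Below, `matrix_view` is a hypothetical stand-in for a rank-2 mdspan that always offers `operator()` and offers the multidimensional `operator[]` only when the language feature is present; `fill_example` then picks whichever syntax is available.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rank-2 mdspan over row-major storage.
struct matrix_view {
  double* data;
  std::size_t rows;
  std::size_t cols;

  double& operator()(std::size_t i, std::size_t j) const {
    return data[i * cols + j];
  }
#if defined(__cpp_multidimensional_subscript)
  // Only declared when the C++23 language feature (P2128) is available.
  double& operator[](std::size_t i, std::size_t j) const {
    return data[i * cols + j];
  }
#endif
};

// Fill A(i,j) = 100*i + j using whichever access syntax compiles.
inline void fill_example(matrix_view A) {
  for (std::size_t i = 0; i < A.rows; ++i) {
    for (std::size_t j = 0; j < A.cols; ++j) {
      const double value =
          100.0 * static_cast<double>(i) + static_cast<double>(j);
#if defined(__cpp_multidimensional_subscript)
      A[i, j] = value;
#else
      A(i, j) = value;
#endif
    }
  }
}
```

The guarded branch is never seen by a pre-C++23 compiler, so the example builds in both modes.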

@@ -132,7 +132,7 @@ void add_rank_2(
 using size_type = std::common_type_t<SizeType_x, SizeType_y, SizeType_z>;
 for (size_type j = 0; j < x.extent(1); ++j) {
 for (size_type i = 0; i < x.extent(0); ++i) {
-z(i,j) = x(i,j) + y(i,j);
+z[i,j] = x[i,j] + y[i,j];
Contributor

@mhoemmen mhoemmen Jun 7, 2023

The actual implementation is a C++17 back-port, so we unfortunately have to roll with whatever operator (parentheses or brackets) the user has available. If you really do want to change the implementation, then we might have to go with something like the following.

Suggested change
-z[i,j] = x[i,j] + y[i,j];
+#if (MDSPAN_USE_PAREN_OPERATOR > 0)
+z(i,j) = x(i,j) + y(i,j);
+#else
+z[i,j] = x[i,j] + y[i,j];
+#endif

or at least

Suggested change
-z[i,j] = x[i,j] + y[i,j];
+#if defined(__cpp_multidimensional_subscript)
+z[i,j] = x[i,j] + y[i,j];
+#else
+z(i,j) = x(i,j) + y(i,j);
+#endif

It might be best just to leave the implementation alone; we might want to come up with a better way to do this.

Contributor

@mhoemmen mhoemmen left a comment

I would recommend, at least for now, not changing the implementation to use the multidimensional subscript operator. Doing this would hinder back-porting to C++17, which is an important requirement of the reference implementation.

@amklinv-nnl amklinv-nnl marked this pull request as draft June 12, 2023 21:30