diff --git a/docs/source/ProgrammingGuide/Atomic-Operations.md b/docs/source/ProgrammingGuide/Atomic-Operations.md
index 6929d4217..ebfa22a97 100644
--- a/docs/source/ProgrammingGuide/Atomic-Operations.md
+++ b/docs/source/ProgrammingGuide/Atomic-Operations.md
@@ -82,7 +82,7 @@ void compute_force(View neighbours, View values) {
 }
 ```
 
-There are also atomic operations which return the old or the new value. They follow the [`atomic_fetch_[op]`](../API/core/atomics/atomic_fetch_op) and [`atomic_[op]_fetch`](../API/core/atomics/atomic_op_fetch.md) naming scheme. For example if one would want to find all the indices of negative values in an array and store them in a list this would be the algorithm:
+There are also atomic operations that return the old or the new value. They follow the [`atomic_fetch_[op]`](../API/core/atomics/atomic_fetch_op) and [`atomic_[op]_fetch`](../API/core/atomics/atomic_op_fetch) naming schemes. For example, to find all the indices of negative values in an array and store them in a list, the algorithm would be:
 ```c++
 void find_indicies(View indicies, View values) {
   View count("Count");
diff --git a/docs/source/ProgrammingGuide/HierarchicalParallelism.md b/docs/source/ProgrammingGuide/HierarchicalParallelism.md
index 60ddc8bb1..8607ed35d 100644
--- a/docs/source/ProgrammingGuide/HierarchicalParallelism.md
+++ b/docs/source/ProgrammingGuide/HierarchicalParallelism.md
@@ -263,7 +263,7 @@ The third pattern is [`parallel_scan()`](../API/core/parallel-dispatch/parallel_
 
 #### 8.4.1.1 Team Barriers
 
-In instances where one loop operation might need to be sequenced with a different loop operation, such as filling of arrays as a preparation stage for following computations on that data, it is important to be able to control threads in time; this can be done through the use of barriers. In nested loops, the outside loop ( [`TeamPolicy<> ()`](../API/core/policies/TeamPolicy) ) has a built-in (implicit) team barrier; inner loops ( [`TeamThreadRange ()`](../API/core/policies/TeamThreadRange.md) ) do not. This latter condition is often referred to as a 'non-blocking' condition. When necessary, an explicit barrier can be introduced to synchronize team threads; an example is shown in the previous example.
+In instances where one loop operation must be sequenced with another, such as filling arrays in preparation for subsequent computations on that data, it is important to be able to coordinate threads in time; this can be done with barriers. In nested loops, the outer loop ( [`TeamPolicy<>()`](../API/core/policies/TeamPolicy) ) has a built-in (implicit) team barrier; inner loops ( [`TeamThreadRange()`](../API/core/policies/TeamThreadRange) ) do not, and are therefore often described as 'non-blocking'. When necessary, an explicit barrier can be introduced to synchronize team threads, as shown in the previous example.
 
 ### 8.4.2 Vector loops
 
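For reference, the index-collection algorithm described in the Atomic-Operations hunk above looks roughly like the following when written out. This is an illustrative sketch, not the snippet from the docs: the concrete `Kokkos::View` types, the element type `double`, and the name `find_negative_indices` are all assumptions.

```c++
#include <Kokkos_Core.hpp>

// Hypothetical sketch: the docs' snippet leaves the View types unspecified,
// so concrete ones are assumed here.
void find_negative_indices(Kokkos::View<int*> indices,
                           Kokkos::View<const double*> values) {
  Kokkos::View<int> count("Count");  // rank-0 counter shared by all threads
  Kokkos::parallel_for(
      "FindNegatives", values.extent(0), KOKKOS_LAMBDA(const int i) {
        if (values(i) < 0.0) {
          // atomic_fetch_add returns the counter's value *before* the
          // increment, so each thread claims a unique slot in `indices`.
          const int slot = Kokkos::atomic_fetch_add(&count(), 1);
          indices(slot) = i;
        }
      });
}
```

Because `atomic_fetch_add` returns the pre-increment value, no two threads receive the same slot, so the list is filled without races; the order of the collected indices is, however, nondeterministic.
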
diff --git a/docs/source/usecases/MDRangePolicy.md b/docs/source/usecases/MDRangePolicy.md
index 6293baa21..743ebe539 100644
--- a/docs/source/usecases/MDRangePolicy.md
+++ b/docs/source/usecases/MDRangePolicy.md
@@ -88,7 +88,7 @@ Kokkos::parallel_for("for_all_cells",
 
 If the number of cells is large enough to merit parallelization, that is the overhead for parallel dispatch plus computation time is less than total serial execution time, then the simple implementation above will result in improved performance.
 
-There is more parallelism to exploit, particularly within the for loops over fields `F` and points `P`. One way to accomplish this would involve taking the product of the three iteration ranges, `C*F*P`, and performing a [`parallel_for`](../API/core/parallel-dispatch/parallel_for) over that product. However, this would require extraction routines to map between indices from the flattened iteration range, `C*F*P`, and the multidimensional indices required by data structures in this example. In addition, to achieve performance portability the mapping between the 1-D product iteration range and multidimensional 3-D indices would require architecture-awareness, akin to the notion of [`LayoutLeft`](../API/core/view/layoutLeft.md) and [`LayoutRight`](../API/core/view/layoutRight) used in Kokkos to establish data access patterns.
+There is more parallelism to exploit, particularly within the for loops over fields `F` and points `P`. One way to accomplish this would be to take the product of the three iteration ranges, `C*F*P`, and perform a [`parallel_for`](../API/core/parallel-dispatch/parallel_for) over that product. However, this would require extraction routines to map between indices in the flattened iteration range, `C*F*P`, and the multidimensional indices required by the data structures in this example. In addition, to achieve performance portability, the mapping between the 1-D product iteration range and the 3-D multidimensional indices would require architecture-awareness, akin to the notion of [`LayoutLeft`](../API/core/view/layoutLeft) and [`LayoutRight`](../API/core/view/layoutRight) used in Kokkos to establish data access patterns.
 
-The [`MDRangePolicy`](../API/core/policies/MDRangePolicy) provides a natural way to accomplish the goal of parallelize over all three iteration ranges without requiring manually computing the product of the iteration ranges and mapping between 1-D and 3-D multidimensional indices. The [`MDRangePolicy`](../API/core/policies/MDRangePolicy) is suitable for use with tightly-nested for loops and provides a method to expose additional parallelism in computations beyond simply parallelize in a single dimension, as was shown in the first implementation using the [`RangePolicy`](../API/core/policies/RangePolicy).
+The [`MDRangePolicy`](../API/core/policies/MDRangePolicy) provides a natural way to parallelize over all three iteration ranges without manually computing the product of the iteration ranges or mapping between 1-D and 3-D indices. The [`MDRangePolicy`](../API/core/policies/MDRangePolicy) is suitable for tightly-nested for loops and exposes additional parallelism beyond simply parallelizing in a single dimension, as was done in the first implementation using the [`RangePolicy`](../API/core/policies/RangePolicy).
 
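A minimal sketch of what an `MDRangePolicy` version of such a kernel could look like, assuming a rank-3 `Kokkos::View` indexed by (cell, field, point); the function name, view names, and the scaling operation are illustrative assumptions rather than the use case's actual code:

```c++
#include <Kokkos_Core.hpp>

// Hypothetical rank-3 kernel: the per-cell scaling is a stand-in for the
// use case's real computation.
void scale_fields(Kokkos::View<double***> result,       // (C, F, P)
                  Kokkos::View<const double*> alpha) {  // per-cell factor
  const int C = result.extent(0);
  const int F = result.extent(1);
  const int P = result.extent(2);
  Kokkos::parallel_for(
      "for_all_cells_md",
      Kokkos::MDRangePolicy<Kokkos::Rank<3>>({0, 0, 0}, {C, F, P}),
      KOKKOS_LAMBDA(const int c, const int f, const int p) {
        // Kokkos hands the functor the 3-D index triple directly and picks
        // an architecture-appropriate tiled iteration order, so no manual
        // 1-D <-> 3-D index mapping is required.
        result(c, f, p) *= alpha(c);
      });
}
```

The begin/end arrays replace the manual `C*F*P` flattening, and the traversal over the three dimensions is handled by the policy; tiling can additionally be tuned via an optional tile-size argument to the policy constructor.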