Merge pull request #435 from antoinemeyer5/fix-last-warnings
#433 : Fix last warnings to build warnings-free
fnrizzi authored Jul 6, 2023
2 parents fd433b3 + eba6554 commit 04f40b3
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/source/ProgrammingGuide/Atomic-Operations.md
@@ -82,7 +82,7 @@ void compute_force(View<int**> neighbours, View<double*> values) {
}
```

- There are also atomic operations which return the old or the new value. They follow the [`atomic_fetch_[op]`](../API/core/atomics/atomic_fetch_op) and [`atomic_[op]_fetch`](../API/core/atomics/atomic_op_fetch.md) naming scheme. For example if one would want to find all the indices of negative values in an array and store them in a list this would be the algorithm:
+ There are also atomic operations which return the old or the new value. They follow the [`atomic_fetch_[op]`](../API/core/atomics/atomic_fetch_op) and [`atomic_[op]_fetch`](../API/core/atomics/atomic_op_fetch) naming scheme. For example if one would want to find all the indices of negative values in an array and store them in a list this would be the algorithm:
```c++
void find_indicies(View<int*> indicies, View<double*> values) {
View<int> count("Count");
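  // Not part of this commit: a hedged sketch of how a kernel like this could
  // continue (assuming the guide's implied `using namespace Kokkos`), using
  // atomic_fetch_add so every thread that finds a negative value reserves a
  // unique slot in `indicies`. The body below is illustrative, not the file's
  // elided code.
  parallel_for("FindIndices", values.extent(0), KOKKOS_LAMBDA(const int i) {
    if (values(i) < 0.0) {
      // atomic_fetch_add returns the *old* value, i.e. the reserved position
      const int index = atomic_fetch_add(&count(), 1);
      indicies(index) = i;
    }
  });
}
```
The fetch variant is what makes this safe without extra synchronization: every thread sees a distinct old value of `count` and therefore writes to a distinct entry of `indicies`.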
2 changes: 1 addition & 1 deletion docs/source/ProgrammingGuide/HierarchicalParallelism.md
@@ -263,7 +263,7 @@ The third pattern is [`parallel_scan()`](../API/core/parallel-dispatch/parallel_
#### 8.4.1.1 Team Barriers
- In instances where one loop operation might need to be sequenced with a different loop operation, such as filling of arrays as a preparation stage for following computations on that data, it is important to be able to control threads in time; this can be done through the use of barriers. In nested loops, the outside loop ( [`TeamPolicy<> ()`](../API/core/policies/TeamPolicy) ) has a built-in (implicit) team barrier; inner loops ( [`TeamThreadRange ()`](../API/core/policies/TeamThreadRange.md) ) do not. This latter condition is often referred to as a 'non-blocking' condition. When necessary, an explicit barrier can be introduced to synchronize team threads; an example is shown in the previous example.
+ In instances where one loop operation might need to be sequenced with a different loop operation, such as filling of arrays as a preparation stage for following computations on that data, it is important to be able to control threads in time; this can be done through the use of barriers. In nested loops, the outside loop ( [`TeamPolicy<> ()`](../API/core/policies/TeamPolicy) ) has a built-in (implicit) team barrier; inner loops ( [`TeamThreadRange ()`](../API/core/policies/TeamThreadRange) ) do not. This latter condition is often referred to as a 'non-blocking' condition. When necessary, an explicit barrier can be introduced to synchronize team threads; an example is shown in the previous example.
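Not part of this commit, but a minimal sketch of the pattern the paragraph above describes: two `TeamThreadRange` loops inside one `TeamPolicy` kernel, separated by an explicit `team_barrier()` so the first loop's writes are finished before the second loop reads them. All names, extents and the kernel body below are illustrative assumptions, not code from the guide.
```c++
#include <Kokkos_Core.hpp>

// Sketch only: fill per-team data in one inner loop, then consume it in a
// second inner loop after an explicit barrier.
void fill_then_use() {
  const int league_size = 64, team_work = 32;            // placeholder sizes
  Kokkos::View<double**> data("data", league_size, team_work);
  Kokkos::View<double**> results("results", league_size, team_work);

  using policy = Kokkos::TeamPolicy<>;
  Kokkos::parallel_for("fill_then_use", policy(league_size, Kokkos::AUTO),
    KOKKOS_LAMBDA(const policy::member_type& team) {
      const int r = team.league_rank();
      // Inner loop 1: the team's threads fill the team's row of `data`.
      Kokkos::parallel_for(Kokkos::TeamThreadRange(team, team_work),
        [&](const int i) { data(r, i) = static_cast<double>(r + i); });
      // TeamThreadRange has no implicit barrier; synchronize before reading
      // entries that other threads of the team may still be writing.
      team.team_barrier();
      // Inner loop 2: each thread reads an entry written by another thread.
      Kokkos::parallel_for(Kokkos::TeamThreadRange(team, team_work),
        [&](const int i) { results(r, i) = 2.0 * data(r, team_work - 1 - i); });
    });
}
```
Removing the `team_barrier()` call would let a thread read a `data` entry that another thread of the same team has not written yet, which is exactly the hazard the paragraph warns about.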
### 8.4.2 Vector loops
2 changes: 1 addition & 1 deletion docs/source/usecases/MDRangePolicy.md
@@ -88,7 +88,7 @@ Kokkos::parallel_for("for_all_cells",
If the number of cells is large enough to merit parallelization, that is the overhead for parallel dispatch plus computation time is less than total serial execution time, then the simple implementation above will result in improved performance.
- There is more parallelism to exploit, particularly within the for loops over fields `F` and points `P`. One way to accomplish this would involve taking the product of the three iteration ranges, `C*F*P`, and performing a [`parallel_for`](../API/core/parallel-dispatch/parallel_for) over that product. However, this would require extraction routines to map between indices from the flattened iteration range, `C*F*P`, and the multidimensional indices required by data structures in this example. In addition, to achieve performance portability the mapping between the 1-D product iteration range and multidimensional 3-D indices would require architecture-awareness, akin to the notion of [`LayoutLeft`](../API/core/view/layoutLeft.md) and [`LayoutRight`](../API/core/view/layoutRight) used in Kokkos to establish data access patterns.
+ There is more parallelism to exploit, particularly within the for loops over fields `F` and points `P`. One way to accomplish this would involve taking the product of the three iteration ranges, `C*F*P`, and performing a [`parallel_for`](../API/core/parallel-dispatch/parallel_for) over that product. However, this would require extraction routines to map between indices from the flattened iteration range, `C*F*P`, and the multidimensional indices required by data structures in this example. In addition, to achieve performance portability the mapping between the 1-D product iteration range and multidimensional 3-D indices would require architecture-awareness, akin to the notion of [`LayoutLeft`](../API/core/view/layoutLeft) and [`LayoutRight`](../API/core/view/layoutRight) used in Kokkos to establish data access patterns.
The [`MDRangePolicy`](../API/core/policies/MDRangePolicy) provides a natural way to accomplish the goal of parallelize over all three iteration ranges without requiring manually computing the product of the iteration ranges and mapping between 1-D and 3-D multidimensional indices. The [`MDRangePolicy`](../API/core/policies/MDRangePolicy) is suitable for use with tightly-nested for loops and provides a method to expose additional parallelism in computations beyond simply parallelize in a single dimension, as was shown in the first implementation using the [`RangePolicy`](../API/core/policies/RangePolicy).
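Not part of this commit: a hedged sketch of the rank-3 dispatch the paragraph describes, letting `MDRangePolicy` walk the `C`, `F`, `P` ranges directly instead of hand-flattening `C*F*P`. The view names and the kernel body are illustrative assumptions, not the use case's actual code.
```c++
#include <Kokkos_Core.hpp>

// Sketch only: parallelize over cells (C), fields (F) and points (P) at once.
void for_all_cells_fields_points(Kokkos::View<double***> result,   // (C, F, P)
                                 Kokkos::View<double***> input,    // (C, F, P)
                                 Kokkos::View<double*>   weights) { // (P)
  const int C = static_cast<int>(result.extent(0));
  const int F = static_cast<int>(result.extent(1));
  const int P = static_cast<int>(result.extent(2));

  Kokkos::parallel_for("for_all_cells_fields_points",
    Kokkos::MDRangePolicy<Kokkos::Rank<3>>({0, 0, 0}, {C, F, P}),
    KOKKOS_LAMBDA(const int c, const int f, const int p) {
      // No manual flattening or index extraction: the policy hands the kernel
      // its 3-D indices directly and picks an iteration order for the backend.
      result(c, f, p) = input(c, f, p) * weights(p);
    });
}
```
A third initializer list of tile sizes can also be passed to `MDRangePolicy` to tune the blocking per backend, which is where the `LayoutLeft`/`LayoutRight`-style architecture-awareness mentioned above comes in.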
