next #19
Conversation
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files ignored due to path filters (2)
- `mkn.yaml` is excluded by `!**/*.yaml`
- `res/mkn/hipcc.yaml` is excluded by `!**/*.yaml`
Files selected for processing (9)
- inc/mkn/gpu.hpp (1 hunks)
- inc/mkn/gpu/alloc.hpp (1 hunks)
- inc/mkn/gpu/cuda.hpp (2 hunks)
- inc/mkn/gpu/def.hpp (1 hunks)
- inc/mkn/gpu/defines.hpp (1 hunks)
- inc/mkn/gpu/device.hpp (1 hunks)
- inc/mkn/gpu/rocm.hpp (3 hunks)
- test/any/add.cpp (1 hunks)
- test/any/managed.cpp (1 hunks)
Additional comments: 15
inc/mkn/gpu/def.hpp (1)
- 5-10: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [6-17]
The conditional compilation logic to extend `is_floating_point_v` for non-CPU architectures by including `_Float16` is a thoughtful addition. It aligns with the PR's objectives of enhancing compatibility across different GPU architectures. However, ensure that `_Float16` is supported in all targeted environments to avoid potential compilation issues.

inc/mkn/gpu/defines.hpp (2)
- 8-40: The use of `__has_include` for conditionally defining `MKN_GPU_ROCM` and `MKN_GPU_CUDA`, and the subsequent logic to handle these definitions, is well implemented. It's a clean approach to adapting the codebase to different GPU environments. Consider adding comments to explain the logic behind these macros for future maintainability.
- 44-48: The `CompileFlags` struct is a neat addition for encapsulating the compile-time GPU environment flags. It can be very useful for runtime checks and decisions based on the compilation environment. Good job on this design choice.

test/any/managed.cpp (1)
- 24-24: Adding an assertion to check if a GPU pointer is a managed pointer before launching the kernel is a good practice for ensuring memory management correctness. This aligns with the PR's objectives of enhancing robustness. Make sure to thoroughly test this change to confirm it doesn't inadvertently restrict valid use cases.
inc/mkn/gpu.hpp (2)
- 34-34: Replacing conditional compilation directives with the direct inclusion of `defines.hpp` is a clean approach to centralizing the logic for determining the GPU environment. This change enhances code readability and maintainability.
- 39-50: Simplifying the `idx()` function with conditional checks based on macros for different GPU types is a good improvement. It makes the function more readable and adaptable to different environments. Consider enhancing the error message for unknown GPU environments to guide developers more effectively.

test/any/add.cpp (3)
- 34-35: Adding assertions to check if a pointer is a host pointer enhances the robustness of memory management validation. This is a good practice for catching potential issues early.
- 38-39: Similarly, adding an assertion to check if a pointer is a device pointer before operations is a prudent measure for ensuring memory management correctness.
- 43-43: Commenting out the assertion related to pointer data might indicate an unresolved issue or a change in testing strategy. Please clarify the reason for this change to ensure it aligns with the intended testing objectives.
inc/mkn/gpu/alloc.hpp (1)
- 75-86: The modifications in the `copy` function, including the direct initialization of `src_p` and `dst_p` and the correction of the argument order for the `take` function, improve the clarity and correctness of the memory management operations. These changes enhance the code's readability and functionality.

inc/mkn/gpu/cuda.hpp (2)
- 42-42: Reordering the include statements to place `#include <cuda_runtime.h>` below local includes is a subtle change that might have implications for compilation order or symbol resolution. It's generally a good practice to include system or library headers after local headers to avoid potential conflicts.
- 56-67: Adding the `idx()` template function within the `mkn::gpu::cuda` namespace is a valuable enhancement that simplifies the calculation of thread indices in CUDA kernels. This function is likely to be widely used. Consider adding comments to explain its usage and the rationale behind its implementation.

inc/mkn/gpu/rocm.hpp (3)
- 40-40: Reordering of include statements is generally a good practice for clarity and maintainability, ensuring that related headers are grouped together. This change appears to be aligned with the PR's objectives of simplifying and organizing the codebase.
- 55-68: The addition of the `mkn::gpu::hip` namespace and the `idx()` template function is a significant enhancement. It provides a structured way to calculate thread indices in HIP kernels, which is crucial for GPU programming. This addition aligns with the PR's goal of improving device interaction mechanisms. However, ensure that the index calculation (`x + (y * width) + (z * width * height)`) matches the intended use cases and doesn't introduce off-by-one or other logical errors.
- 95-116: The modifications in the `Pointer` struct, including additional checks and assignment of `type`, enhance robustness by ensuring that the pointer attributes are correctly initialized and by providing methods to query the type of memory (e.g., host, device, managed). These changes are in line with the PR's objectives of refining memory management practices. However, verify that the `assert(t)` statement in the constructor (line 96) aligns with the project's error-handling strategy, as assertions may be disabled in release builds, potentially leaving null pointers unchecked.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- .github/workflows/build.yml (1 hunks)
Additional comments: 4
.github/workflows/build.yml (4)
- 10-10: Renaming the `build` job to `cpu` clarifies the job's purpose, aligning with the addition of `cuda` and `rocm` jobs for specific GPU environments.
- 13-13: Updating the `actions/checkout` version to v3 is a good practice, ensuring the latest features and fixes are utilized.
- 20-30: The addition of a `cuda` job with a specific NVIDIA CUDA container is a significant improvement for testing in CUDA environments. However, ensure that the CUDA version (`12.3.0`) and Ubuntu version (`22.04`) specified in the container tag are compatible with the project's requirements and the ROCm 6.0.2 environment.
- 31-40: Similarly, the addition of a `rocm` job with a ROCm container is crucial for testing in ROCm environments. It's important to verify that the `rocm/rocm-terminal:latest` container is compatible with ROCm 6.0.2 and meets the project's specific requirements.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- .github/workflows/build.yml (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- .github/workflows/build.yml
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (4)
- .github/workflows/build.yml (1 hunks)
- inc/mkn/gpu/cpu.hpp (2 hunks)
- inc/mkn/gpu/cuda.hpp (5 hunks)
- inc/mkn/gpu/rocm.hpp (6 hunks)
Files skipped from review as they are similar to previous changes (3)
- .github/workflows/build.yml
- inc/mkn/gpu/cuda.hpp
- inc/mkn/gpu/rocm.hpp
Additional comments: 2
inc/mkn/gpu/cpu.hpp (2)
- 132-132: The `destroy` function is now declared as `inline`. This is a good practice for small, frequently called functions defined in header files, as it can reduce call overhead and lets the compiler expand the function at its call sites. However, ensure that this change does not lead to code bloat if the function is used extensively throughout the codebase.
- 180-180: Similarly, declaring the `sync` function as `inline` is appropriate given its trivial implementation. This change is consistent with the goal of reducing overhead in GPU-related operations. As with `destroy`, monitor for any negative impacts such as increased binary size, but given the simplicity of this function, such concerns are likely minimal.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (4)
- .github/workflows/build.yml (1 hunks)
- inc/mkn/gpu/cpu.hpp (2 hunks)
- inc/mkn/gpu/cuda.hpp (5 hunks)
- inc/mkn/gpu/rocm.hpp (6 hunks)
Files skipped from review as they are similar to previous changes (4)
- .github/workflows/build.yml
- inc/mkn/gpu/cpu.hpp
- inc/mkn/gpu/cuda.hpp
- inc/mkn/gpu/rocm.hpp
Force-pushed from 5c7b28d to dde2116 (Compare)
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (4)
- .github/workflows/build.yml (3 hunks)
- inc/mkn/gpu/cpu.hpp (2 hunks)
- inc/mkn/gpu/cuda.hpp (5 hunks)
- inc/mkn/gpu/rocm.hpp (6 hunks)
Files skipped from review as they are similar to previous changes (3)
- inc/mkn/gpu/cpu.hpp
- inc/mkn/gpu/cuda.hpp
- inc/mkn/gpu/rocm.hpp
Additional comments: 3
.github/workflows/build.yml (3)
- 1-1: The workflow name has been updated to `ubuntu`, which is more specific than the previous `ubuntu-latest`. This change is clear and straightforward.
- 11-11: Updating the `runs-on` configuration to `ubuntu-20.04` from `ubuntu-latest` ensures more predictable and stable build environments by using a fixed version of Ubuntu. This is a good practice to avoid unexpected issues when `ubuntu-latest` moves to a newer version.
- 21-26: The addition of a job using `Jimver/[email protected]` for CUDA version `12.3.2` is a significant enhancement. It ensures that the project is tested against a specific CUDA version, which is crucial for compatibility and stability. However, ensure that the chosen CUDA version is compatible with the project's requirements and other dependencies. Additionally, the build command for CUDA is consistent with the project's standards, using `-std=c++17` and other flags. This step enhances the CI pipeline by ensuring that CUDA builds are explicitly tested.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (4)
- .github/workflows/build.yml (2 hunks)
- inc/mkn/gpu/cpu.hpp (2 hunks)
- inc/mkn/gpu/cuda.hpp (5 hunks)
- inc/mkn/gpu/rocm.hpp (6 hunks)
Files skipped from review as they are similar to previous changes (3)
- inc/mkn/gpu/cpu.hpp
- inc/mkn/gpu/cuda.hpp
- inc/mkn/gpu/rocm.hpp
Additional comments: 2
.github/workflows/build.yml (2)
- 11-11: The update from `ubuntu-latest` to `ubuntu-20.04` is a good practice for ensuring a stable and predictable build environment by using a fixed version of Ubuntu. This change helps avoid potential issues that could arise from unexpected changes in the `ubuntu-latest` runner.
- 20-25: The addition of a job using `Jimver/[email protected]` for CUDA version `12.3.2` is a significant improvement for ensuring compatibility with specific CUDA versions. This step allows for testing against a precise version of CUDA, enhancing the reliability of the build process for CUDA-dependent components. However, ensure that the version of the CUDA toolkit aligns with the project's requirements and that any potential compatibility issues with other dependencies are addressed.
Review Status
Actionable comments generated: 3
Configuration used: CodeRabbit UI
Files selected for processing (4)
- .github/workflows/build.yml (2 hunks)
- inc/mkn/gpu/cpu.hpp (2 hunks)
- inc/mkn/gpu/cuda.hpp (6 hunks)
- inc/mkn/gpu/rocm.hpp (6 hunks)
Files skipped from review as they are similar to previous changes (3)
- inc/mkn/gpu/cpu.hpp
- inc/mkn/gpu/cuda.hpp
- inc/mkn/gpu/rocm.hpp
Additional comments: 1
.github/workflows/build.yml (1)
- 11-11: Updating the `runs-on` configuration to `ubuntu-20.04` is a good practice to ensure the workflow runs on a specific, stable version of Ubuntu, rather than relying on the moving target of `ubuntu-latest`. This change enhances the predictability and stability of the CI environment.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (1)
- .github/workflows/build.yml (2 hunks)
Additional comments: 2
.github/workflows/build.yml (2)
- 11-11: Updating the `runs-on` configuration to `ubuntu-20.04` is a good practice for ensuring consistency and reliability in the build environment. This change helps avoid potential issues that could arise from using the `ubuntu-latest` tag, which automatically updates to the latest version.
- 20-25: The commented-out steps for building with CUDA using `Jimver/[email protected]` and CUDA version `12.3.2` are prepared for future use. When uncommenting and implementing these steps, ensure that the CUDA toolkit action is from a trusted source and that the specified CUDA version aligns with the project's requirements. Additionally, consider adding error handling for the build steps to gracefully handle any failures.

Verification successful

The corrected script successfully fetched the necessary information about the `Jimver/cuda-toolkit` GitHub action. The repository was created on March 10, 2021, and has been actively maintained, with the last update on February 10, 2024, and the most recent push on February 25, 2024. It has garnered 121 stars and 38 forks, indicating a decent level of community interest and engagement.

Based on this information, the `Jimver/cuda-toolkit` action appears to be from a trusted source and is actively maintained, aligning with the review comment's advice to ensure the CUDA toolkit action is from a trusted source. The repository's activity and community interest further support its trustworthiness and relevance for future use in the project.

Scripts Executed

The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Verify the trustworthiness and maintenance status of the Jimver/cuda-toolkit action
gh repo view Jimver/cuda-toolkit --json owner,createdAt,updatedAt,pushedAt,stargazersCount,forksCount
```

Length of output: 1223

Script:

```shell
#!/bin/bash
# Corrected script to verify the trustworthiness and maintenance status of the Jimver/cuda-toolkit action
gh repo view Jimver/cuda-toolkit --json owner,createdAt,updatedAt,pushedAt,stargazerCount,forkCount
```

Length of output: 298
Review Status
Actionable comments generated: 12
Configuration used: CodeRabbit UI
Files selected for processing (5)
- inc/mkn/gpu/cpu.hpp (4 hunks)
- inc/mkn/gpu/cuda.hpp (7 hunks)
- inc/mkn/gpu/launchers.hpp (2 hunks)
- inc/mkn/gpu/rocm.hpp (6 hunks)
- test/any/async_streaming.cpp (1 hunks)
Additional comments: 9
test/any/async_streaming.cpp (1)
- 42-43: The usage of the `stream` function within `StreamDeviceFunction` demonstrates a practical application of launching kernels on specific streams. This is a good example of how the new functionality can be utilized in real-world scenarios. However, ensure that the lambda capture `[=, fn = fn]` correctly captures the necessary variables and does not lead to unintended copies or memory issues. Additionally, consider adding comments to explain the lambda's purpose and how it interacts with the `stream` function.

inc/mkn/gpu/cpu.hpp (2)
- 145-145: Changing the `destroy` function to be inline is a minor optimization that could improve performance in some cases. However, ensure that this change is consistent with the project's coding standards and performance goals. Additionally, consider whether other utility functions in this file and related files should also be inlined for consistency.
- 193-193: The inline specification for the `sync` function is another minor optimization. Given that this function is a no-op in the CPU context, the impact on performance might be negligible. Still, it's good practice to maintain consistency in the treatment of utility functions across the project. Ensure that the inline specification aligns with the project's coding standards and optimization strategies.

inc/mkn/gpu/cuda.hpp (3)
- 42-42: The reordering of includes and the addition of `cuda_runtime.h` are necessary changes to ensure that CUDA functionality is accessible and correctly prioritized over other includes. This adjustment helps avoid potential conflicts and ensures that the CUDA API is available for subsequent code. It's good practice to keep system and third-party includes well organized and separated from project-specific includes.
- 149-149: Changing the `destroy` function to be inline is a minor optimization that aligns with the changes made in other parts of the project. This consistency in handling utility functions is beneficial. However, ensure that the CUDA API calls within these functions are correctly error-checked to prevent silent failures.
- 206-207: The addition of `sync` functions with and without the stream parameter enhances the flexibility of synchronization operations in CUDA contexts. These functions are essential for ensuring that GPU operations are completed before proceeding with CPU operations. Ensure that they are used appropriately throughout the project to manage synchronization effectively.

inc/mkn/gpu/rocm.hpp (3)
- 113-130: Modifications to the `Pointer` struct to include HIP pointer attributes and memory-type checks are essential for managing memory in HIP contexts. These changes improve the robustness and flexibility of memory management. Ensure that the implementation correctly handles the various memory types (host, device, managed) and that it is consistent with HIP's memory-management paradigms.
- 161-161: Changing the `destroy` function to be inline is consistent with similar changes in other parts of the project. This optimization might have a minor impact but contributes to consistency. Ensure that HIP API calls within these functions are correctly error-checked to prevent silent failures and resource leaks.
- 217-218: The addition of `sync` functions with and without the stream parameter in the HIP context is a valuable enhancement, providing more control over synchronization operations. These functions are crucial for ensuring that GPU operations complete before CPU operations proceed. Ensure they are utilized appropriately throughout the project.
inc/mkn/gpu/cpu.hpp (outdated)

```cpp
struct StreamEvent {
  StreamEvent(Stream&) {}
  ~StreamEvent() {}

  auto& operator()() { return event; };
  void record() { ; }
  bool finished() const { return true; }
  void reset() {}

  Stream stream;
  std::size_t event = 0;
};
```
The introduction of the `StreamEvent` struct in the CPU context is an interesting addition, but it's essential to clarify its purpose and functionality, especially since the operations within the struct (e.g., `record`, `finished`, `reset`) are no-ops. If this struct is meant to provide a uniform interface across CPU and GPU backends, consider adding documentation to explain its intended use and the rationale behind the no-op implementations.
```cpp
namespace mkn::gpu::hip {

template <typename SIZE = uint32_t /*max 4294967296*/>
__device__ SIZE idx() {
  SIZE width = hipGridDim_x * hipBlockDim_x;
  SIZE height = hipGridDim_y * hipBlockDim_y;

  SIZE x = hipBlockDim_x * hipBlockIdx_x + hipThreadIdx_x;
  SIZE y = hipBlockDim_y * hipBlockIdx_y + hipThreadIdx_y;
  SIZE z = hipBlockDim_z * hipBlockIdx_z + hipThreadIdx_z;
  return x + (y * width) + (z * width * height);  // max 4294967296
}
```
The introduction of the `idx` function within the `mkn::gpu::hip` namespace is a critical utility for indexing in HIP kernels, similar to its CUDA counterpart. This function facilitates the calculation of thread indices in a straightforward manner. Ensure comprehensive testing across various grid and block configurations to validate its accuracy and reliability.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files ignored due to path filters (2)
- `mkn.yaml` is excluded by `!**/*.yaml`
- `res/mkn/hipcc.yaml` is excluded by `!**/*.yaml`
Files selected for processing (12)
- .github/workflows/build.yml (1 hunks)
- inc/mkn/gpu.hpp (1 hunks)
- inc/mkn/gpu/alloc.hpp (1 hunks)
- inc/mkn/gpu/cpu.hpp (4 hunks)
- inc/mkn/gpu/cuda.hpp (7 hunks)
- inc/mkn/gpu/def.hpp (1 hunks)
- inc/mkn/gpu/defines.hpp (1 hunks)
- inc/mkn/gpu/device.hpp (1 hunks)
- inc/mkn/gpu/launchers.hpp (2 hunks)
- inc/mkn/gpu/rocm.hpp (6 hunks)
- test/any/add.cpp (1 hunks)
- test/any/managed.cpp (1 hunks)
Files skipped from review as they are similar to previous changes (3)
- inc/mkn/gpu/cpu.hpp
- inc/mkn/gpu/cuda.hpp
- inc/mkn/gpu/launchers.hpp
Additional comments: 10
.github/workflows/build.yml (1)
- 13-13: The update of `actions/checkout` to `v4` is a good practice for staying up to date with action versions. Verify the workflow execution post-merge to confirm there are no compatibility issues.

inc/mkn/gpu/def.hpp (1)
- 5-10: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [6-17]
The simplification of conditional compilation for floating-point type checks is a good improvement. Consider adding comments explaining the inclusion of `_Float16` support under non-CPU platforms for future maintainability.

inc/mkn/gpu/defines.hpp (1)
- 8-48: The dynamic definition of macros based on the availability of CUDA and ROCm headers, along with the introduction of the `CompileFlags` structure, significantly enhances the codebase's flexibility and maintainability. Consider adding documentation for the `CompileFlags` structure to clarify its purpose and usage within the codebase.

test/any/managed.cpp (1)
- 24-24: The addition of the assertion to check for managed GPU pointers before kernel launch is a good practice for ensuring memory safety. Consider adding a comment explaining the rationale behind this assertion to aid future maintainability.
inc/mkn/gpu.hpp (1)
- 34-50: The reorganization of conditional includes and the modification of the `idx()` function to use the new `MKN_GPU_*` defines improve code clarity and consistency. Consider adding unit tests for the `idx()` function to ensure its correct behavior across different GPU platforms.

test/any/add.cpp (1)
- 34-43: The addition of assertions to check for host and device pointers before and after memory operations in the `test_add1` function is a good practice for ensuring memory safety and correctness. Consider adding comments explaining the purpose of these assertions to aid future maintainability.

inc/mkn/gpu/alloc.hpp (1)
- 75-86: Swapping the order of arguments in the `take` function call and adding assertions before creating `Pointer` instances correct potential issues and improve code robustness. Consider adding comments explaining the reason behind these changes to aid future maintainability.

inc/mkn/gpu/rocm.hpp (1)
- 89-137: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [40-292]
The introduction of the `mkn::gpu::hip` namespace, modifications to the `Pointer` struct, addition of the `StreamEvent` struct, and updates to memory-type checks and synchronization functions significantly enhance the codebase's functionality and robustness when working with ROCm. Consider adding comprehensive tests to ensure the correct behavior of these new functionalities and modifications.

inc/mkn/gpu/device.hpp (2)
- 103-104: The addition of `data()` member functions to the `DeviceMem` struct is a good practice for providing controlled access to the internal data pointer. This allows for both mutable and immutable access, which can be very useful in various contexts. Ensure that proper memory-management practices are followed throughout the `DeviceMem` struct to prevent issues such as memory leaks or dangling pointers, especially considering the manual memory-management operations performed in the constructor and destructor.
- 100-108: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1-250]
Ensure consistency and proper memory-management practices across all components related to device and host memory management in this file. Given the manual memory-management operations performed in various places, it's crucial to carefully manage resources to prevent memory leaks, dangling pointers, and other related issues.
Actionable comments posted: 4
Review Status
Configuration used: CodeRabbit UI
Files selected for processing (1)
- test/any/async_streaming.cpp (1 hunks)
Additional Context Used
Additional comments not posted (1)
test/any/async_streaming.cpp (1)
- 63-70: The constructor of `StreamLauncher` initializes member variables and sets up streams and events. However, the destructor only synchronizes streams without cleaning up or releasing any resources explicitly. Ensure that all allocated resources are properly managed and released to prevent memory leaks or resource exhaustion. Consider verifying whether additional cleanup is required for the streams and events to ensure proper resource management.
```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <thread>
#include <algorithm>
```
Consider organizing the standard library includes in alphabetical order for better readability and maintainability.
```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <iostream>
#include <thread>
```
I prefer to sort by line length first, and alphabetically second, which would look like
```cpp
#include <chrono>
#include <thread>
#include <cassert>
#include <iostream>
#include <algorithm>
```
New Learnings Added
User: PhilipDeegan
PR: mkn/mkn.gpu#19
File: test/any/async_streaming.cpp:2-6
Timestamp: 2024-05-18T14:17:54.861Z
Learning: @PhilipDeegan, thank you for sharing your preferred style for organizing includes. I'll remember this for future reviews in your codebase.
(Stored learning: PhilipDeegan prefers to sort `#include` statements by line length first, and alphabetically second. Note: Learnings are effective only in the context of specific code segments. To apply general coding guidelines, please configure review instructions.)
test/any/async_streaming.cpp (outdated)

```cpp
.dev([=] __device__(auto i) { views[i][mkn::gpu::idx()] += 3; })  //
();

for (auto const& vec : vecs) std::cout << __LINE__ << " " << vec[0] << std::endl;
```
Printing the first element of each vector in `vecs` using `std::cout` is useful for debugging but might clutter the output in production or automated testing environments. Consider using a logging framework or conditional compilation to control the verbosity of output.

```diff
- for (auto const& vec : vecs) std::cout << __LINE__ << " " << vec[0] << std::endl;
+ // For debugging purposes, consider using a logging framework or conditional compilation
```
Actionable comments posted: 3
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- inc/mkn/gpu.hpp (1 hunks)
- inc/mkn/gpu/cli.hpp (1 hunks)
- inc/mkn/gpu/cuda.hpp (9 hunks)
- inc/mkn/gpu/rocm.hpp (8 hunks)
Additional comments not posted (8)
inc/mkn/gpu.hpp (2)
- 34-35: LGTM!
- 40-51: Ensure comprehensive testing for the `idx` function.

The `idx` function determines the thread index based on the GPU type. Ensure that this function is thoroughly tested for various grid and block configurations to guarantee its correctness and robustness.

Verification successful

Comprehensive testing for the `idx` function is confirmed. The `idx` function is thoroughly tested across various files, including different grid and block configurations:
test/hip/async.cpp
test/cuda/async.cpp
test/hip/add.cpp
test/cuda/atomic.cpp
test/any/async_streaming.cpp
test/any/add.cpp
test/any/managed.cpp
test/any/struct.cpp
test/any/array.cpp
test/cpu/namespace.cpp
test/any/class1.cpp
test/any/class2.cpp
Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Description: Verify that the `idx` function is thoroughly tested.
# Test: Search for test cases that cover the `idx` function. Expect: Comprehensive coverage.
rg --type cpp $'idx()'
```

Length of output: 2763

Script:

```shell
#!/bin/bash
# Description: Verify comprehensive testing of the `idx` function for various grid and block configurations.
# Search for grid and block configuration patterns in the test files that call `idx()`.
rg --type cpp -e 'dim3' -e '<<<' -e 'idx()'
```

Length of output: 5507
inc/mkn/gpu/cuda.hpp (2)
- Line range hint **42-52**: LGTM!
- **56-66**: Ensure comprehensive testing for the `idx` function.

  The `idx` function determines the thread index in CUDA kernels. Ensure that this function is thoroughly tested for various grid and block configurations to guarantee its correctness and robustness.

inc/mkn/gpu/rocm.hpp (4)
- Line range hint **40-50**: LGTM!
- **54-65**: Ensure comprehensive testing for the `idx` function.

  The `idx` function determines the thread index in HIP kernels. Ensure that this function is thoroughly tested for various grid and block configurations to guarantee its correctness and robustness.
- **95-111**: Enable the destructor code for `StreamEvent`.

  The commented-out destructor code should be enabled to ensure proper resource management and avoid resource leaks.
- **226-241**: Document the implications of conditional synchronization in the `launch` function.

  The update to the `launch` function to include conditional synchronization based on the `_sync` template parameter is a significant improvement. Ensure that the implications of this change are well-documented, especially regarding how it affects kernel execution and synchronization.
inc/mkn/gpu/cli.hpp (Outdated)
```cpp
#ifndef _MKN_GPU_CLI_HPP_
#define _MKN_GPU_CLI_HPP_

#include <optional>
#include <type_traits>

#include "mkn/kul/env.hpp"

namespace mkn::gpu {

template <typename Device>
struct Cli {
  //

  auto bx_threads() const {
    char const* ENV = "MKN_GPU_BX_THREADS";
    if (mkn::kul::env::EXISTS(ENV)) {
      return as<std::int32_t>(mkn::kul::env::GET(ENV));
    }
    return dev.maxThreadsPerBlock;
  }

  template <typename T>
  auto static as(std::string const& from) {
    T t;
    std::stringstream ss(from);
    ss >> t;
    return t;
  }

  Device const& dev;
};

} /* namespace mkn::gpu */

#endif /*_MKN_GPU_CLI_HPP_*/
```
Ensure proper error handling for environment variable retrieval.
The function `bx_threads` retrieves an environment variable and converts it to an integer. Consider adding error handling for cases where the environment variable is not a valid integer.
```cpp
auto bx_threads() const {
  char const* ENV = "MKN_GPU_BX_THREADS";
  if (mkn::kul::env::EXISTS(ENV)) {
    try {
      return as<std::int32_t>(mkn::kul::env::GET(ENV));
    } catch (const std::invalid_argument& e) {
      // Handle error: invalid integer
    } catch (const std::out_of_range& e) {
      // Handle error: out of range
    }
  }
  return dev.maxThreadsPerBlock;
}
```
Caution: Review failed. The pull request is closed.

Walkthrough

The recent updates significantly enhance GPU management capabilities within the codebase. Key improvements include new device functions for indexing and memory management, and streamlined event handling.

Changes
Sequence Diagrams

```mermaid
sequenceDiagram
    participant User
    participant StreamLauncher
    participant StreamEvent
    participant Callable
    User->>StreamLauncher: Initiate parallel execution
    StreamLauncher->>StreamEvent: Create and manage events
    StreamLauncher->>Callable: Execute callable functions
    Callable-->>StreamLauncher: Return results
    StreamLauncher-->>User: Provide execution status/results
```
Assessment against linked issues
Actionable comments posted: 0
Outside diff range and nitpick comments (3)
test/any/async_streaming.cpp (3)
- **2-6**: Organize includes by line length first, and alphabetically second.

  Consider organizing the standard library includes by line length first, and alphabetically second for better readability and maintainability.

  ```diff
  -#include <cassert>
  -#include <chrono>
  -#include <iostream>
  -#include <thread>
  -#include <algorithm>
  +#include <chrono>
  +#include <thread>
  +#include <cassert>
  +#include <iostream>
  +#include <algorithm>
  ```
- **27-28**: Declare the destructor as `virtual` and use `override` for overridden methods.

  It's good practice to declare the destructor of a base class with virtual methods as `virtual` to ensure proper cleanup in derived classes. Since C++11, it's recommended to use `override` for overridden methods and `= default` for trivial destructors.

  ```diff
  - virtual ~StreamFunction() {}
  + virtual ~StreamFunction() = default;
  - virtual void run(std::uint32_t const){};
  + virtual void run(std::uint32_t const) override {};
  ```
- **136-150**: Add assertions to verify the correctness of operations.

  To ensure the test validates the expected behavior, consider adding assertions after the streaming operations to check the state of `vecs`.

  ```diff
  + for (std::size_t i = 0; i < vecs.size(); ++i) {
  +   assert(vecs[i][0] == expected_value); // Replace `expected_value` with the actual expected value after operations
  + }
  ```
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (16)
- .github/workflows/build.yml (1 hunks)
- inc/mkn/gpu.hpp (1 hunks)
- inc/mkn/gpu/alloc.hpp (1 hunks)
- inc/mkn/gpu/cli.hpp (1 hunks)
- inc/mkn/gpu/cpu.hpp (6 hunks)
- inc/mkn/gpu/cuda.hpp (9 hunks)
- inc/mkn/gpu/def.hpp (1 hunks)
- inc/mkn/gpu/defines.hpp (1 hunks)
- inc/mkn/gpu/device.hpp (1 hunks)
- inc/mkn/gpu/launchers.hpp (2 hunks)
- inc/mkn/gpu/rocm.hpp (8 hunks)
- mkn.yaml (2 hunks)
- res/mkn/hipcc.yaml (1 hunks)
- test/any/add.cpp (1 hunks)
- test/any/async_streaming.cpp (1 hunks)
- test/any/managed.cpp (1 hunks)
Files skipped from review due to trivial changes (4)
- .github/workflows/build.yml
- inc/mkn/gpu/def.hpp
- res/mkn/hipcc.yaml
- test/any/managed.cpp
Additional context used
Learnings (1)
test/any/async_streaming.cpp (1)
Learnt from: PhilipDeegan
PR: mkn/mkn.gpu#19
File: test/any/async_streaming.cpp:2-6
Timestamp: 2024-05-18T14:17:54.861Z
Learning: PhilipDeegan prefers to sort `#include` statements by line length first, and alphabetically second.
Additional comments not posted (29)
mkn.yaml (2)
- **13-13**: Verify the intention behind commenting out the `arg` for the `rocm` profile.

  Commenting out the `arg` may affect the build configuration. Ensure this change is intentional.
- **19-19**: Verify the intention behind commenting out the `arg` for the `cuda` profile.

  Commenting out the `arg` may affect the build configuration. Ensure this change is intentional.

inc/mkn/gpu/defines.hpp (1)
- **12-24**: LGTM! The conditional compilation directives and includes are correctly defined.

  The changes ensure that the appropriate headers are included based on the presence of ROCm and CUDA.

inc/mkn/gpu.hpp (1)
- **39-50**: LGTM! The `idx()` function is correctly defined based on the GPU configuration.

  The changes ensure that the correct `idx()` function is called based on the defined GPU configuration. The error directive for unknown configurations is a good addition.

test/any/add.cpp (2)
: LGTM! The assertion for the host pointer is a good addition.The assertion ensures that the data pointer is a host pointer before proceeding with the test.
38-38
: LGTM! The assertion for the device pointer is a good addition.The assertion ensures that the data pointer is a device pointer before proceeding with the test.
inc/mkn/gpu/cli.hpp (2)
46-51
: Ensure proper error handling for environment variable retrieval.The function
bx_threads
retrieves an environment variable and converts it to an integer. Consider adding error handling for cases where the environment variable is not a valid integer.auto bx_threads() const { char const* ENV = "MKN_GPU_BX_THREADS"; if (mkn::kul::env::EXISTS(ENV)) { try { return as<std::int32_t>(mkn::kul::env::GET(ENV)); } catch (const std::invalid_argument& e) { // Handle error: invalid integer } catch (const std::out_of_range& e) { // Handle error: out of range } } return dev.maxThreadsPerBlock; }
54-59
: LGTM!The
as
function correctly converts a string to the specified type.inc/mkn/gpu/launchers.hpp (2)
45-47
: Ensure theStream
type is well-defined and add documentation.The addition of the
stream
function in theGDLauncher
struct is a significant enhancement, allowing kernels to be launched on specific GPU streams. This change improves the flexibility and control over GPU resource management. Ensure theStream
type is well-defined and compatible with the intended GPU backends (CUDA, ROCm, etc.). Additionally, consider adding documentation for this function to clarify its usage and the expected type of theStream
parameter.Verification successful
Ensure the
Stream
type is well-defined and add documentation.The
Stream
type is well-defined and compatible with various GPU backends, including CUDA, ROCm, and CPU, as evidenced by its definitions ininc/mkn/gpu/cuda.hpp
,inc/mkn/gpu/cpu.hpp
, andinc/mkn/gpu/rocm.hpp
. Consider adding documentation for thestream
function to clarify its usage and the expected type of theStream
parameter.
inc/mkn/gpu/cuda.hpp
inc/mkn/gpu/cpu.hpp
inc/mkn/gpu/rocm.hpp
Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Verify the definition of the `Stream` type.
# Test: Search for the `Stream` type definition. Expect: A well-defined type compatible with GPU backends.
rg --type cpp $'Stream'
```

Length of output: 6525
59-61
: Ensure effective handling of the stream parameter and add error handling.The modification to the
_launch
function to accept a stream parameter (S& _s
) is a necessary change to support the newstream
function. This adjustment allows the kernel launch operation to target a specific stream, enhancing concurrency and resource utilization on the GPU. Ensure that theMKN_GPU_NS::launch
function is updated accordingly to handle the stream parameter effectively. It might also be beneficial to add error handling or assertions to verify the validity of the stream before launching the kernel.inc/mkn/gpu/alloc.hpp (1)
75-86
: LGTM!The assertions for
dst
andsrc
ensure that the pointers are valid. The switching of arguments in thetake
function call is correct and aligns with the logic for sending and taking data.inc/mkn/gpu/cpu.hpp (5)
41-41
: Inclusion ofcli.hpp
is appropriate.The inclusion of
cli.hpp
aligns with the new functionalities introduced in the file.
86-88
: Addition ofsetLimitMallocHeapSize
function is appropriate.This no-op function maintains interface consistency with the GPU counterpart.
101-112
: Addition ofStreamEvent
struct is appropriate, but document its purpose.The
StreamEvent
struct provides a uniform interface for event handling across CPU and GPU contexts. Ensure to add documentation explaining its purpose and the rationale behind the no-op implementations.
120-121
: Clarify behavior ofPointer
struct functions.Modifying the
Pointer
struct to always returntrue
foris_device_ptr
andis_managed_ptr
in the CPU context might be misleading. Add documentation to clarify the behavior of these functions.
207-207
: Document the implications of conditional synchronization in thelaunch
function.The update to the
launch
function to include conditional synchronization based on the_sync
template parameter is a significant improvement. Ensure that the implications of this change are well-documented, especially regarding how it affects kernel execution and synchronization.inc/mkn/gpu/cuda.hpp (6)
42-42
: Inclusion ofcli.hpp
is appropriate.The inclusion of
cli.hpp
aligns with the new functionalities introduced in the file.
57-67
: Addition ofidx
function is appropriate, ensure thorough testing.The
idx
function simplifies thread index calculation in CUDA kernels. Ensure that the function is thoroughly tested for various grid and block configurations.
81-83
: Addition ofsetLimitMallocHeapSize
function is appropriate.This function sets the heap size limit for CUDA memory allocation.
97-113
: Enable the destructor code forStreamEvent
.The
StreamEvent
struct addition for CUDA streams is crucial for event management. Ensure the destructor properly manages the destruction of CUDA events to prevent resource leaks.
215-230
: Document the implications of conditional synchronization in thelaunch
function.The update to the
launch
function to include conditional synchronization based on the_sync
template parameter is a significant improvement. Ensure that the implications of this change are well-documented, especially regarding how it affects kernel execution and synchronization.
255-262
: Addition ofGLauncher
class is appropriate, ensure thorough testing.The
GLauncher
class initializes the block size usingcli.bx_threads()
. Ensure that this functionality is thoroughly tested.inc/mkn/gpu/rocm.hpp (6)
40-40
: Inclusion ofcli.hpp
is appropriate.The inclusion of
cli.hpp
aligns with the new functionalities introduced in the file.
55-66
: Addition ofidx
function is appropriate, ensure thorough testing.The
idx
function simplifies thread index calculation in HIP kernels. Ensure that the function is thoroughly tested for various grid and block configurations.
80-82
: Addition ofsetLimitMallocHeapSize
function is appropriate.This function sets the heap size limit for HIP memory allocation.
96-112
: Enable the destructor code forStreamEvent
.The
StreamEvent
struct addition for HIP streams is crucial for event management. Ensure the destructor properly manages the destruction of HIP events to prevent resource leaks.
117-138
: Addition ofPointer
struct is appropriate, ensure documentation.The
Pointer
struct provides functions to determine the type of pointer (host, device, managed). Ensure that these functions are documented properly to clarify their behavior.
227-242
: Document the implications of conditional synchronization in thelaunch
function.The update to the
launch
function to include conditional synchronization based on the_sync
template parameter is a significant improvement. Ensure that the implications of this change are well-documented, especially regarding how it affects kernel execution and synchronization.inc/mkn/gpu/device.hpp (1)
103-105
: Addition ofdata
member functions toDeviceMem
struct is appropriate, ensure documentation.The
data
member functions provide access to the data pointer inDeviceMem
. Ensure that these functions are documented properly to clarify their behavior.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- inc/mkn/gpu/cuda.hpp (10 hunks)
- inc/mkn/gpu/rocm.hpp (8 hunks)
- res/mkn/clang_cuda.yaml (1 hunks)
Files skipped from review due to trivial changes (1)
- res/mkn/clang_cuda.yaml
Files skipped from review as they are similar to previous changes (2)
- inc/mkn/gpu/cuda.hpp
- inc/mkn/gpu/rocm.hpp
Actionable comments posted: 0
Outside diff range and nitpick comments (1)
test/any/coop.cpp (1)
- **16-34**: Add error checks for GPU launcher setup.

  The function does not include explicit error checks for the GPU launcher setup. Consider adding error handling to ensure robustness.

  ```cpp
  // Example error check for GPU launcher setup
  if (!launcher) {
    KOUT(NON) << "Failed to set up GPU launcher";
    return 1;
  }
  ```
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- inc/mkn/gpu/cpu.hpp (7 hunks)
- inc/mkn/gpu/cuda.hpp (10 hunks)
- inc/mkn/gpu/launchers.hpp (2 hunks)
- inc/mkn/gpu/rocm.hpp (8 hunks)
- test/any/coop.cpp (1 hunks)
Files skipped from review as they are similar to previous changes (4)
- inc/mkn/gpu/cpu.hpp
- inc/mkn/gpu/cuda.hpp
- inc/mkn/gpu/launchers.hpp
- inc/mkn/gpu/rocm.hpp
Additional comments not posted (3)
test/any/coop.cpp (3)
2-6
: LGTM!The includes and constants look appropriate for the context.
36-46
: LGTM!The
main
function is straightforward and correctly handles the cooperative launch support check.
16-34
: Verify compatibility with ROCm.The lambda function uses
__device__
which is specific to CUDA. Ensure compatibility with ROCm.Verification successful
Verified compatibility with ROCm.
The codebase includes ROCm-specific preprocessor directives and documentation, indicating that it is designed to be compatible with ROCm.
inc/mkn/gpu/rocm.hpp
inc/mkn/gpu/defines.hpp
test/hip/add.cpp
Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Description: Verify compatibility with ROCm.
# Test: Search for usage of `__device__` in ROCm-related files.
rg --type cpp $'__device__'
```

Length of output: 3024

Script:

```shell
#!/bin/bash
# Description: Check for ROCm-specific conditional compilation directives and documentation.
# Search for ROCm-specific preprocessor directives.
rg -i 'ROCm' --type cpp
# Search for comments or documentation related to ROCm.
rg -i 'ROCm' --type md --type hpp --type cpp
```

Length of output: 1391
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- inc/mkn/gpu/cuda.hpp (10 hunks)
- inc/mkn/gpu/rocm.hpp (8 hunks)
- test/any/coop.cpp (1 hunks)
Files skipped from review as they are similar to previous changes (3)
- inc/mkn/gpu/cuda.hpp
- inc/mkn/gpu/rocm.hpp
- test/any/coop.cpp
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- inc/mkn/gpu/cuda.hpp (10 hunks)
- inc/mkn/gpu/rocm.hpp (8 hunks)
Files skipped from review as they are similar to previous changes (2)
- inc/mkn/gpu/cuda.hpp
- inc/mkn/gpu/rocm.hpp
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- inc/mkn/gpu/cpu.hpp (7 hunks)
Files skipped from review as they are similar to previous changes (1)
- inc/mkn/gpu/cpu.hpp
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- inc/mkn/gpu/alloc.hpp (4 hunks)
- test/any/construct.cpp (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- inc/mkn/gpu/alloc.hpp
Additional comments not posted (5)
test/any/construct.cpp (5)
2-4
: LGTM!The include statement and constant definition are appropriate.
6-10
: LGTM!The template definitions for
ManagedVector
andManagedMemory
are correctly defined usingmkn::gpu
allocators.
12-15
: LGTM!The global variable and the struct
S
with a constructor that modifiesalloced
are correctly defined.
17-29
: LGTM!The test functions
test_does_construct_on_resize
andtest_does_not_construct_on_resize
correctly test the construction behavior on resize operations.
31-34
: LGTM!The main function correctly runs the tests and returns their combined result.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- inc/mkn/gpu/alloc.hpp (4 hunks)
- test/any/construct.cpp (1 hunks)
Files skipped from review as they are similar to previous changes (2)
- inc/mkn/gpu/alloc.hpp
- test/any/construct.cpp
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- inc/mkn/gpu/alloc.hpp (4 hunks)
- test/any/construct.cpp (1 hunks)
Files skipped from review as they are similar to previous changes (2)
- inc/mkn/gpu/alloc.hpp
- test/any/construct.cpp
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (7)
- inc/mkn/gpu/alloc.hpp (4 hunks)
- inc/mkn/gpu/cpu.hpp (9 hunks)
- inc/mkn/gpu/cuda.hpp (12 hunks)
- inc/mkn/gpu/defines.hpp (1 hunks)
- inc/mkn/gpu/rocm.hpp (10 hunks)
- test/any/construct.cpp (1 hunks)
- test/any/managed.cpp (1 hunks)
Files skipped from review as they are similar to previous changes (6)
- inc/mkn/gpu/alloc.hpp
- inc/mkn/gpu/cuda.hpp
- inc/mkn/gpu/defines.hpp
- inc/mkn/gpu/rocm.hpp
- test/any/construct.cpp
- test/any/managed.cpp
Additional comments not posted (6)
inc/mkn/gpu/cpu.hpp (6)
- **41-41**: Include review: Approved. The new include for `mkn/gpu/cli.hpp` likely brings in CLI functionalities.
- **86-88**: No-op function: Approved. The `setLimitMallocHeapSize` function is a no-op, likely for API consistency across platforms.
- **90-93**: No support for cooperative launch: Approved. The `supportsCooperativeLaunch` function returns 0, indicating no support for cooperative launch. This is likely for API consistency.
- **155-155**: Inlining function: Approved. Inlining the `destroy` function can improve performance by reducing function call overhead.
- **172-176**: New function: Approved. The `copy_on_device` function performs a memcpy operation, copying data from `src` to `dst` on the CPU.
- **209-209**: Inlining function: Approved. Inlining the `sync` function can improve performance by reducing function call overhead.
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (1)
test/any/add.cpp (1)
- **45-46**: Remove or clarify commented-out assertion.

  The commented-out assertion should either be removed or clarified with a comment explaining its purpose.

  ```diff
  - // assert(mkn::gpu::Pointer{a.data()}.is_device_ptr());
  ```
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (6)
- .github/workflows/build.yml (1 hunks)
- inc/mkn/gpu/alloc.hpp (4 hunks)
- inc/mkn/gpu/cuda.hpp (12 hunks)
- inc/mkn/gpu/rocm.hpp (10 hunks)
- test/any/add.cpp (1 hunks)
- test/any/construct.cpp (1 hunks)
Files skipped from review as they are similar to previous changes (5)
- .github/workflows/build.yml
- inc/mkn/gpu/alloc.hpp
- inc/mkn/gpu/cuda.hpp
- inc/mkn/gpu/rocm.hpp
- test/any/construct.cpp
Additional comments not posted (4)
test/any/add.cpp (4)
33-35
: LGTM! Assertion to check host pointer.The added assertion is a good practice to verify memory types before operations.
38-41
: LGTM! Conditional assertion to check device pointer.The added conditional assertion is a good practice to verify memory types based on compile-time flags.
42-42
: LGTM! Kernel function launcher call.The launcher call correctly executes the
vectoradd1
kernel function with appropriate arguments.
46-48
: LGTM! Result verification loop.The added for-loop correctly verifies the results of the kernel execution.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- inc/mkn/gpu/alloc.hpp (4 hunks)
Files skipped from review as they are similar to previous changes (1)
- inc/mkn/gpu/alloc.hpp
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (24)
- .github/workflows/build.yml (1 hunks)
- .sublime-project (1 hunks)
- LICENSE.md (1 hunks)
- inc/mkn/gpu.hpp (2 hunks)
- inc/mkn/gpu/alloc.hpp (4 hunks)
- inc/mkn/gpu/asio.hpp (1 hunks)
- inc/mkn/gpu/cli.hpp (1 hunks)
- inc/mkn/gpu/cpu.hpp (9 hunks)
- inc/mkn/gpu/cuda.hpp (12 hunks)
- inc/mkn/gpu/def.hpp (1 hunks)
- inc/mkn/gpu/defines.hpp (1 hunks)
- inc/mkn/gpu/device.hpp (2 hunks)
- inc/mkn/gpu/launchers.hpp (3 hunks)
- inc/mkn/gpu/multi_launch.hpp (1 hunks)
- inc/mkn/gpu/rocm.hpp (10 hunks)
- inc/mkn/gpu/tuple.hpp (1 hunks)
- mkn.yaml (2 hunks)
- res/mkn/clang_cuda.yaml (1 hunks)
- res/mkn/hipcc.yaml (1 hunks)
- test/any/add.cpp (1 hunks)
- test/any/async_streaming.cpp (1 hunks)
- test/any/construct.cpp (1 hunks)
- test/any/coop.cpp (1 hunks)
- test/any/managed.cpp (1 hunks)
Files skipped from review due to trivial changes (1)
- res/mkn/clang_cuda.yaml
Files skipped from review as they are similar to previous changes (23)
- .github/workflows/build.yml
- .sublime-project
- LICENSE.md
- inc/mkn/gpu.hpp
- inc/mkn/gpu/alloc.hpp
- inc/mkn/gpu/asio.hpp
- inc/mkn/gpu/cli.hpp
- inc/mkn/gpu/cpu.hpp
- inc/mkn/gpu/cuda.hpp
- inc/mkn/gpu/def.hpp
- inc/mkn/gpu/defines.hpp
- inc/mkn/gpu/device.hpp
- inc/mkn/gpu/launchers.hpp
- inc/mkn/gpu/multi_launch.hpp
- inc/mkn/gpu/rocm.hpp
- inc/mkn/gpu/tuple.hpp
- mkn.yaml
- res/mkn/hipcc.yaml
- test/any/add.cpp
- test/any/async_streaming.cpp
- test/any/construct.cpp
- test/any/coop.cpp
- test/any/managed.cpp
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- inc/mkn/gpu/launchers.hpp (3 hunks)
- test/any/managed.cpp (2 hunks)
Files skipped from review as they are similar to previous changes (2)
- inc/mkn/gpu/launchers.hpp
- test/any/managed.cpp
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- inc/mkn/gpu/launchers.hpp (3 hunks)
- inc/mkn/gpu/multi_launch.hpp (1 hunks)
Files skipped from review as they are similar to previous changes (2)
- inc/mkn/gpu/launchers.hpp
- inc/mkn/gpu/multi_launch.hpp
Actionable comments posted: 3
Outside diff range, codebase verification and nitpick comments (3)
test/any/async_streaming.cpp (2)
45-45
: Consider using a logging framework or conditional compilation for debugging output.Printing the first element of each vector in
vecs
usingstd::cout
is useful for debugging but might clutter the output in production or automated testing environments. Consider using a logging framework or conditional compilation to control the verbosity of output.- for (auto const& vec : vecs) std::cout << __LINE__ << " " << vec[0] << std::endl; + // For debugging purposes, consider using a logging framework or conditional compilation
81-81
: Consider using a logging framework or conditional compilation for debugging output.Printing the first element of each vector in
vecs
usingstd::cout
is useful for debugging but might clutter the output in production or automated testing environments. Consider using a logging framework or conditional compilation to control the verbosity of output.- for (auto const& vec : vecs) std::cout << __LINE__ << " " << vec[0] << std::endl; + // For debugging purposes, consider using a logging framework or conditional compilationinc/mkn/gpu/multi_launch.hpp (1)
50-58
: Declare the destructor asdefault
.It's good practice to declare the destructor of a base class with virtual methods as
default
to ensure proper cleanup in derived classes.- virtual ~StreamFunction() {} + virtual ~StreamFunction() = default;
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- inc/mkn/gpu/multi_launch.hpp (1 hunks)
- test/any/async_streaming.cpp (1 hunks)
Additional context used
Learnings (1)
test/any/async_streaming.cpp (1)
Learnt from: PhilipDeegan PR: mkn/mkn.gpu#19 File: test/any/async_streaming.cpp:2-6 Timestamp: 2024-05-18T14:17:54.861Z Learning: @PhilipDeegan, thank you for sharing your preferred style for organizing includes. I'll remember this for future reviews in your codebase. <!-- PhilipDeegan prefers to sort `#include` statements by line length first, and alphabetically second.
Additional comments not posted (5)
test/any/async_streaming.cpp (2)
- **2-6**: Includes are well-organized. The includes are sorted by line length first, and alphabetically second, as per your preference.
- **93-97**: LGTM! The function `main` is straightforward and does not contain any issues.

inc/mkn/gpu/multi_launch.hpp (3)
- **60-75**: LGTM! The structure `StreamDeviceFunction` is well-implemented and does not contain any issues.
- **77-84**: LGTM! The structure `StreamHostFunction` is well-implemented and does not contain any issues.
- **183-301**: LGTM! The structure `ThreadedStreamLauncher` is well-implemented and does not contain any issues.
test/any/async_streaming.cpp (Outdated)
```cpp
std::uint32_t test() {
  using namespace mkn::gpu;
  using T = double;

  KUL_DBG_FUNC_ENTER;

  std::vector<ManagedVector<T>> vecs(C, ManagedVector<T>(NUM, 0));
  for (std::size_t i = 0; i < vecs.size(); ++i) std::fill_n(vecs[i].data(), NUM, i);

  ManagedVector<T*> datas(C);
  for (std::size_t i = 0; i < vecs.size(); ++i) datas[i] = vecs[i].data();
  auto views = datas.data();

  StreamLauncher{vecs}
      .dev([=] __device__(auto i) { views[i][mkn::gpu::idx()] += 1; })
      .host([&](auto i) mutable {
        std::this_thread::sleep_for(200ms);
        for (auto& e : vecs[i]) e += 1;
      })
      .dev([=] __device__(auto i) { views[i][mkn::gpu::idx()] += 3; })  //
      ();

  for (auto const& vec : vecs) std::cout << __LINE__ << " " << vec[0] << std::endl;

  std::size_t val = 5;
  for (auto const& vec : vecs) {
    for (auto const& e : vec)
      if (e != val) return 1;
    ++val;
  };

  return 0;
}
```
Add assertions to verify correctness.
To ensure the test validates the expected behavior, consider adding assertions after the streaming operations to check the state of `vecs`.
```diff
+ for (std::size_t i = 0; i < vecs.size(); ++i) {
+   assert(vecs[i][0] == expected_value); // Replace `expected_value` with the actual expected value after operations
+ }
```
Committable suggestion was skipped due to low confidence.
test/any/async_streaming.cpp
Outdated
```cpp
std::uint32_t test_threaded() {
  using namespace mkn::gpu;
  using T = double;

  KUL_DBG_FUNC_ENTER;

  std::vector<ManagedVector<T>> vecs(C, ManagedVector<T>(NUM, 0));
  for (std::size_t i = 0; i < vecs.size(); ++i) std::fill_n(vecs[i].data(), NUM, i);

  ManagedVector<T*> datas(C);
  for (std::size_t i = 0; i < vecs.size(); ++i) datas[i] = vecs[i].data();
  auto views = datas.data();

  using namespace std::chrono_literals;

  ThreadedStreamLauncher{vecs, 6}
      .dev([=] __device__(auto i) { views[i][mkn::gpu::idx()] += 1; })
      .host([&](auto i) mutable {
        std::this_thread::sleep_for(200ms);
        for (auto& e : vecs[i]) e += 1;
      })
      .dev([=] __device__(auto i) { views[i][mkn::gpu::idx()] += 3; })  //
      ();

  for (auto const& vec : vecs) std::cout << __LINE__ << " " << vec[0] << std::endl;

  std::size_t val = 5;
  for (auto const& vec : vecs) {
    for (auto const& e : vec)
      if (e != val) return 1;
    ++val;
  };

  return 0;
}
```
Add assertions to verify correctness.
To ensure the test validates the expected behavior, consider adding assertions after the streaming operations to check the state of `vecs`.

```diff
+ for (std::size_t i = 0; i < vecs.size(); ++i) {
+   assert(vecs[i][0] == expected_value); // Replace `expected_value` with the actual expected value after operations
+ }
```
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
std::uint32_t test_threaded() {
  using namespace mkn::gpu;
  using T = double;
  KUL_DBG_FUNC_ENTER;
  std::vector<ManagedVector<T>> vecs(C, ManagedVector<T>(NUM, 0));
  for (std::size_t i = 0; i < vecs.size(); ++i) std::fill_n(vecs[i].data(), NUM, i);
  ManagedVector<T*> datas(C);
  for (std::size_t i = 0; i < vecs.size(); ++i) datas[i] = vecs[i].data();
  auto views = datas.data();
  using namespace std::chrono_literals;
  ThreadedStreamLauncher{vecs, 6}
      .dev([=] __device__(auto i) { views[i][mkn::gpu::idx()] += 1; })
      .host([&](auto i) mutable {
        std::this_thread::sleep_for(200ms);
        for (auto& e : vecs[i]) e += 1;
      })
      .dev([=] __device__(auto i) { views[i][mkn::gpu::idx()] += 3; })  //
      ();
  for (auto const& vec : vecs) std::cout << __LINE__ << " " << vec[0] << std::endl;
  std::size_t val = 5;
  for (auto const& vec : vecs) {
    for (auto const& e : vec)
      if (e != val) return 1;
    ++val;
  };
  for (std::size_t i = 0; i < vecs.size(); ++i) {
    assert(vecs[i][0] == expected_value); // Replace `expected_value` with the actual expected value after operations
  }
  return 0;
}
```
inc/mkn/gpu/multi_launch.hpp
Outdated
```cpp
template <typename Datas, typename Self_ = void>
struct StreamLauncher {
  using This = StreamLauncher<Datas, Self_>;
  using Self = std::conditional_t<std::is_same_v<Self_, void>, This, Self_>;

  StreamLauncher(Datas& datas_) : datas{datas_}, streams(datas.size()), data_step(datas.size(), 0) {
    for (auto& s : streams) events.emplace_back(s);
  }

  ~StreamLauncher() { sync(); }

  void sync() noexcept {
    for (auto& s : streams) s.sync();
  }

  template <typename Fn>
  Self& dev(Fn&& fn) {
    fns.emplace_back(std::make_shared<StreamDeviceFunction<Self, Fn>>(self, std::forward<Fn>(fn)));
    return self;
  }
  template <typename Fn>
  Self& host(Fn&& fn) {
    fns.emplace_back(std::make_shared<StreamHostFunction<Self, Fn>>(self, std::forward<Fn>(fn)));
    return self;
  }

  void operator()() {
    using namespace std::chrono_literals;

    if (fns.size() == 0) return;

    for (std::size_t i = 0; i < datas.size(); ++i) self(i);

    do {
      for (std::size_t i = 0; i < datas.size(); ++i) {
        if (is_finished(i)) continue;
        if (is_fn_finished(i)) {
          data_step[i] += 1;
          if (not is_finished(i)) self(i);
        }
      }
      std::this_thread::sleep_for(1ms);  // make sleep time configurable
    } while (!is_finished());
  }

  void operator()(std::uint32_t const i) {
    auto const& step = data_step[i];
    assert(step < fns.size());
    fns[step]->run(i);
    if (fns[step]->mode == StreamFunctionMode::DEVICE_WAIT) events[i].record();
  }

  bool is_finished() const {
    std::uint32_t finished = 0;
    for (std::size_t i = 0; i < datas.size(); ++i)
      if (is_finished(i)) ++finished;
    return finished == datas.size();
  }

  bool is_finished(std::uint32_t idx) const { return data_step[idx] == fns.size(); }

  bool is_fn_finished(std::uint32_t i) {
    auto const b = [&]() {
      auto const& step = data_step[i];
      if (fns[step]->mode == StreamFunctionMode::HOST_WAIT) return true;
      return events[i].finished();
    }();
    if (b) {
      events[i].reset();
    }
    return b;
  }

  Datas& datas;
  std::vector<std::shared_ptr<StreamFunction<Self>>> fns;
  std::vector<mkn::gpu::Stream> streams;
  std::vector<mkn::gpu::StreamEvent> events;
  std::vector<std::uint16_t> data_step;
  Self& self = *reinterpret_cast<Self*>(this);
};
```
Make sleep duration configurable.
The `operator()` method has a hardcoded sleep duration. Consider making it configurable to allow flexibility.

```diff
- std::this_thread::sleep_for(1ms); // make sleep time configurable
+ std::this_thread::sleep_for(sleep_duration); // make sleep time configurable
```

```cpp
std::chrono::milliseconds sleep_duration{1};
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- inc/mkn/gpu/cuda.hpp (12 hunks)
- inc/mkn/gpu/multi_launch.hpp (1 hunks)
- res/mkn/clang_cuda.yaml (2 hunks)
- test/any/async_streaming.cpp (1 hunks)
Files skipped from review as they are similar to previous changes (2)
- inc/mkn/gpu/multi_launch.hpp
- test/any/async_streaming.cpp
Additional comments not posted (13)
res/mkn/clang_cuda.yaml (2)
`8-12`: Addition of troubleshooting information is helpful. The added troubleshooting information for the "GPUassert: unknown error" is relevant and useful for users.
`26-27`: Modification of `env` property is appropriate. Including the CUDA binaries directory in the `PATH` ensures that CUDA tools are accessible, facilitating smoother compilation and execution of CUDA applications.
inc/mkn/gpu/cuda.hpp (11)
`42-44`: Inclusion of necessary headers is appropriate. The included headers (`cuda_runtime.h` and `cooperative_groups.h`) are necessary for the functionalities implemented in the file.
Line range hint `48-54`: Implementation of `gpuAssert` function is correct. The `gpuAssert` function is correctly implemented to handle CUDA errors and is defined as `inline` for potential performance benefits.
`60-68`: Implementation of `idx` device function is correct. The `idx` function is correctly implemented to calculate a unique index for each thread in a CUDA kernel.
`70-73`: Implementation of `block_idx_x` device function is correct. The `block_idx_x` function is correctly implemented to return the block index in the x-dimension in CUDA kernels.
`87-89`: Implementation of `setLimitMallocHeapSize` function is correct. The `setLimitMallocHeapSize` function is correctly implemented to set the CUDA device's heap size limit.
`91-95`: Implementation of `supportsCooperativeLaunch` function is correct. The `supportsCooperativeLaunch` function is correctly implemented to check if the device supports cooperative kernel launches.
`109-151`: Implementation of `StreamEvent` struct is correct. The `StreamEvent` struct is correctly implemented to manage CUDA events, ensuring proper resource management and providing methods for recording events and checking their completion status.
`191-193`: Implementation of `destroy` function is correct. The `destroy` function is correctly implemented to free CUDA memory and is defined as `inline` for potential performance benefits.
`208-212`: Implementation of `copy_on_device` function is correct. The `copy_on_device` function is correctly implemented to copy memory from one device location to another.
`253-254`: Implementation of `sync` function is correct. The `sync` function is correctly implemented to synchronize the CUDA device and is defined as `inline` for potential performance benefits.
`259-280`: Implementation of `launch` function is correct. The `launch` function is correctly implemented to support cooperative kernel launches and conditional synchronization, providing more flexible execution configurations for CUDA kernels.
Actionable comments posted: 2
Outside diff range, codebase verification and nitpick comments (13)
inc/mkn/gpu/cpu.hpp (5)
`86-87`: Consider adding documentation for `setLimitMallocHeapSize`. The function is a no-op in the CPU context. Adding a comment or documentation explaining this behavior would help future maintainers.

```cpp
// No-op for CPU context
void setLimitMallocHeapSize(std::size_t const& /*bytes*/) {}
```

`88-89`: Consider adding documentation for `setDevice`. The function is a no-op in the CPU context. Adding a comment or documentation explaining this behavior would help future maintainers.

```cpp
// No-op for CPU context
void setDevice(std::size_t const& /*dev*/) {}
```

`90-93`: Consider adding documentation for `supportsCooperativeLaunch`. The function always returns `0` in the CPU context. Adding a comment or documentation explaining this behavior would help future maintainers.

```cpp
// Always returns 0 for CPU context
auto supportsCooperativeLaunch(int const /*dev*/ = 0) {
  int supportsCoopLaunch = 0;
  return supportsCoopLaunch;
}
```

`106-117`: Clarify the purpose of `StreamEvent`. The `StreamEvent` struct is introduced in the CPU context with no-op implementations. Adding documentation to explain its intended use and the rationale behind the no-op implementations would be beneficial.

```cpp
struct StreamEvent {
  StreamEvent(Stream&) {}                 // Constructor
  ~StreamEvent() {}                       // Destructor
  auto& operator()() { return event; }    // Operator overload
  void record() { ; }                     // Record event (no-op)
  bool finished() const { return true; }  // Check if event is finished (always true)
  void reset() {}                         // Reset event (no-op)
  // Member variables
  Stream stream;
  std::size_t event = 0;
};
```
Line range hint `218-265`: Document the implications of `_sync` and `_coop` in `launch` function. The `launch` function now includes `_sync` and `_coop` template parameters. Ensure that the implications of these changes are well-documented, especially regarding how they affect kernel execution and synchronization.

```cpp
template <bool _sync = true, bool _coop = false, typename F, typename... Args>
void launch(F f, dim3 g, dim3 b, std::size_t /*ds*/, std::size_t /*stream*/, Args&&... args) {
  std::size_t N = (g.x * g.y * g.z) * (b.x * b.y * b.z);
  KLOG(TRC) << N;
  std::apply(
      [&](auto&&... params) {
        for (std::size_t i = 0; i < N; ++i) {
          f(params...);
          detail::idx++;
        }
      },
      devmem_replace(std::forward_as_tuple(args...), std::make_index_sequence<sizeof...(Args)>()));
  detail::idx = 0;
}
```

inc/mkn/gpu/rocm.hpp (3)
`244-265`: Document the implications of `_sync` and `_coop` in `launch` function. The `launch` function now includes `_sync` and `_coop` template parameters. Ensure that the implications of these changes are well-documented, especially regarding how they affect kernel execution and synchronization.

```cpp
template <bool _sync = true, bool _coop = false, typename F, typename... Args>
void launch(F&& f, dim3 g, dim3 b, std::size_t ds, hipStream_t& s, Args&&... args) {
  std::size_t N = (g.x * g.y * g.z) * (b.x * b.y * b.z);
  KLOG(TRC) << N;
  std::apply(
      [&](auto&&... params) {
        if constexpr (_coop) {
          auto address_of = [](auto& a) { return (void*)&a; };
          void* kernelArgs[] = {(address_of(params), ...)};
          hipLaunchCooperativeKernel<F>(f, g, b, kernelArgs, ds, s);
        } else {
          hipLaunchKernelGGL(f, g, b, ds, s, params...);
        }
        MKN_GPU_ASSERT(hipGetLastError());
      },
      devmem_replace(std::forward_as_tuple(args...), std::make_index_sequence<sizeof...(Args)>()));
  if constexpr (_sync) {
    if (s)
      sync(s);
    else
      sync();
  }
}
```
`88-93`: Ensure `supportsCooperativeLaunch` function is tested. The `supportsCooperativeLaunch` function is not currently tested in the codebase. The only instance found is commented out in `test/any/coop.cpp`. Please add tests to ensure this function is thoroughly tested.
`test/any/coop.cpp`: Uncomment and implement tests for `supportsCooperativeLaunch`.
Analysis chain
Ensure `supportsCooperativeLaunch` function is tested. The function checks if the selected device supports cooperative launches. Ensure that this function is thoroughly tested.
Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Description: Verify the usage of the `supportsCooperativeLaunch` function to ensure it is tested.
# Test: Search for the `supportsCooperativeLaunch` function usage. Expect: Verify if the function is tested.
rg --type cpp -A 5 $'supportsCooperativeLaunch'
```

Length of output: 1226
Script:

```shell
#!/bin/bash
# Description: Verify if the `supportsCooperativeLaunch` function is tested in the codebase.

# Search for test files that might contain tests for `supportsCooperativeLaunch`
rg --type cpp -l 'supportsCooperativeLaunch' | grep -E 'test|spec|unittest|check'

# Search for test functions that might be testing `supportsCooperativeLaunch`
rg --type cpp -A 5 'TEST|TEST_F|TEST_CASE|TEST_SUITE' | grep -A 5 'supportsCooperativeLaunch'
```

Length of output: 193
`86-86`: Ensure `setDevice` function is tested. The `setDevice` function is not currently tested. Please add tests to ensure this function works as expected.
- The function is defined in the following files but is not referenced in any test files:
  - `inc/mkn/gpu/rocm.hpp`
  - `inc/mkn/gpu/cuda.hpp`
  - `inc/mkn/gpu/cpu.hpp`
Analysis chain
Ensure `setDevice` function is tested. The function sets the active GPU device. Ensure that this function is thoroughly tested.
Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Description: Verify the usage of the `setDevice` function to ensure it is tested.
# Test: Search for the `setDevice` function usage. Expect: Verify if the function is tested.
rg --type cpp -A 5 $'setDevice'
```

Length of output: 1142
Script:

```shell
#!/bin/bash
# Description: Verify if the `setDevice` function is tested by searching for its usage in common test directories.

# Search for the `setDevice` function usage in test files.
rg 'setDevice' -g '*test*' -g '*Test*' -g '*tests*' -g '*Tests*'
```

Length of output: 213
Script:

```shell
#!/bin/bash
# Description: Diagnose why ripgrep did not search any files by using the --debug flag.

# Search for the `setDevice` function usage in test files with debug information.
rg 'setDevice' -g '*test*' -g '*Test*' -g '*tests*' -g '*Tests*' --debug
```

Length of output: 11076
Script:

```shell
#!/bin/bash
# Description: Search for the `setDevice` function usage in all files, including hidden ones, and explicitly include common test file extensions and directories.

# Search for the `setDevice` function usage in all files, including hidden ones.
rg 'setDevice' --hidden --glob '*test*' --glob '*Test*' --glob '*tests*' --glob '*Tests*' --glob '*.cpp' --glob '*.hpp'
```

Length of output: 393
inc/mkn/gpu/cuda.hpp (5)
`127-128`: Delete copy constructor and assignment operator for `StreamEvent`. The copy constructor and assignment operator are deleted to prevent copying of the `StreamEvent` object. Ensure that this behavior is well-documented.

```cpp
StreamEvent(StreamEvent const&) = delete;
StreamEvent& operator=(StreamEvent const&) = delete;
```
`261-282`: Document the implications of `_sync` and `_coop` in `launch` function. The `launch` function now includes `_sync` and `_coop` template parameters. Ensure that the implications of these changes are well-documented, especially regarding how they affect kernel execution and synchronization.

```cpp
template <bool _sync = true, bool _coop = false, typename F, typename... Args>
void launch(F&& f, dim3 g, dim3 b, std::size_t ds, cudaStream_t& s, Args&&... args) {
  std::size_t N = (g.x * g.y * g.z) * (b.x * b.y * b.z);
  KLOG(TRC) << N;
  std::apply(
      [&](auto&&... params) {
        if constexpr (_coop) {
          auto address_of = [](auto& a) { return (void*)&a; };
          void* kernelArgs[] = {(address_of(params), ...)};
          cudaLaunchCooperativeKernel((void*)f, g, b, kernelArgs, ds);
        } else {
          f<<<g, b, ds, s>>>(params...);
        }
        MKN_GPU_ASSERT(cudaGetLastError());
      },
      devmem_replace(std::forward_as_tuple(args...), std::make_index_sequence<sizeof...(Args)>
```

`93-97`: Ensure `supportsCooperativeLaunch` function is tested. The `supportsCooperativeLaunch` function is not currently tested, as the only reference found in the test files is commented out.
- `test/any/coop.cpp`: Commented-out reference to `mkn::gpu::supportsCooperativeLaunch()`
Please add tests to ensure this function is thoroughly tested.
Analysis chain
Ensure `supportsCooperativeLaunch` function is tested. The function checks if the selected device supports cooperative launches. Ensure that this function is thoroughly tested.
Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Description: Verify the usage of the `supportsCooperativeLaunch` function to ensure it is tested.
# Test: Search for the `supportsCooperativeLaunch` function usage. Expect: Verify if the function is tested.
rg --type cpp -A 5 $'supportsCooperativeLaunch'
```

Length of output: 1226
Script:

```shell
#!/bin/bash
# Description: Verify if there are any active tests for the `supportsCooperativeLaunch` function.

# Search for the `supportsCooperativeLaunch` function references in test files to see if there are any active tests.
rg --type cpp --glob 'test/**' 'supportsCooperativeLaunch'
```

Length of output: 165
`91-91`: Ensure `setDevice` function is tested. The `setDevice` function is not currently tested. Please add test cases to ensure that this function works correctly.
- `inc/mkn/gpu/cuda.hpp`
- `inc/mkn/gpu/rocm.hpp`
- `inc/mkn/gpu/cpu.hpp`
Analysis chain
Ensure `setDevice` function is tested. The function sets the active GPU device. Ensure that this function is thoroughly tested.
Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Description: Verify the usage of the `setDevice` function to ensure it is tested.
# Test: Search for the `setDevice` function usage. Expect: Verify if the function is tested.
rg --type cpp -A 5 $'setDevice'
```

Length of output: 1142
Script:

```shell
#!/bin/bash
# Description: Verify if the `setDevice` function is tested by searching for its usage in potential test directories.

# Search for the `setDevice` function usage in potential test directories.
rg --type cpp -A 5 'setDevice' test tests
```

Length of output: 87
Script:

```shell
#!/bin/bash
# Description: Verify if the `setDevice` function is tested by searching for its usage in the entire codebase.

# Search for the `setDevice` function usage in the entire codebase.
rg --type cpp -A 5 'setDevice'
```

Length of output: 1142
`87-89`: Ensure `setLimitMallocHeapSize` function is tested. The function `setLimitMallocHeapSize` is not explicitly tested in the current codebase. Please add tests to ensure its functionality.
- No instances of `setLimitMallocHeapSize` were found in the test directories.
Analysis chain
Ensure `setLimitMallocHeapSize` function is tested. The function sets the limit on the heap size for dynamic memory allocation. Ensure that this function is thoroughly tested.
Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Description: Verify the usage of the `setLimitMallocHeapSize` function to ensure it is tested.
# Test: Search for the `setLimitMallocHeapSize` function usage. Expect: Verify if the function is tested.
rg --type cpp -A 5 $'setLimitMallocHeapSize'
```

Length of output: 1049
Script:

```shell
#!/bin/bash
# Description: Verify if the `setLimitMallocHeapSize` function is tested.

# Search for the `setLimitMallocHeapSize` function usage in typical test directories.
rg --type cpp -A 5 'setLimitMallocHeapSize' tests/ test/
```

Length of output: 103
Script:

```shell
#!/bin/bash
# Description: Identify potential test directories and files in the repository.

# List all directories and files in the repository to identify potential test locations.
fd --type d
fd --type f
```

Length of output: 853
Script:

```shell
#!/bin/bash
# Description: Verify if the `setLimitMallocHeapSize` function is tested.

# Search for the `setLimitMallocHeapSize` function usage in identified test directories.
rg --type cpp -A 5 'setLimitMallocHeapSize' test/any/ test/cpu/ test/cuda/ test/hip/
```

Length of output: 84
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- inc/mkn/gpu/cpu.hpp (9 hunks)
- inc/mkn/gpu/cuda.hpp (12 hunks)
- inc/mkn/gpu/multi_launch.hpp (1 hunks)
- inc/mkn/gpu/rocm.hpp (10 hunks)
- test/any/async_streaming.cpp (1 hunks)
Files skipped from review as they are similar to previous changes (2)
- inc/mkn/gpu/multi_launch.hpp
- test/any/async_streaming.cpp
Additional comments not posted (10)
inc/mkn/gpu/cpu.hpp (3)
`155-157`: Inline `destroy` function. The `destroy` function is now inline, which may optimize its usage. Ensure that this change is necessary and beneficial for performance.
`209-209`: Inline `sync` function. The `sync` function is now inline, which may optimize its usage. Ensure that this change is necessary and beneficial for performance.
Verification successful
Inline `sync` function. The `sync` function in `inc/mkn/gpu/cpu.hpp` is an empty function. Inlining it is unlikely to have a significant impact on performance due to its simplicity. The change is reasonable and should not introduce any unintended side effects.
- File: `inc/mkn/gpu/cpu.hpp`
- Line: 209
Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Description: Verify the usage of the `sync` function to ensure inlining is beneficial.
# Test: Search for the `sync` function usage. Expect: Verify if inlining is beneficial.
rg --type cpp -A 5 $'sync'
```

Length of output: 21438
Script:

```shell
#!/bin/bash
# Description: Find all instances where the `sync` function from `inc/mkn/gpu/cpu.hpp` is called.

# Search for calls to `sync` in the context of `inc/mkn/gpu/cpu.hpp`
rg --type cpp 'sync\(\)' -A 5
```

Length of output: 4431
`172-176`: Ensure `copy_on_device` function is tested. The `copy_on_device` function is added to facilitate copying data to the device. Ensure that this function is thoroughly tested.
`238-239`: Inline `sync` function. The `sync` function is now inline, which may optimize its usage. Ensure that this change is necessary and beneficial for performance.
`134-145`: Ensure `Pointer` struct is tested. The `Pointer` struct now includes additional logic to determine the type of memory pointer. Ensure that this struct is thoroughly tested.
Verification successful
The `Pointer` struct is tested. The `Pointer` struct is used in assertions within the test files `test/any/managed.cpp` and `test/any/add.cpp`, indicating that it is being tested.
- `test/any/managed.cpp`: `assert(mkn::gpu::Pointer{mem.data()}.is_managed_ptr());`
- `test/any/add.cpp`: `assert(mkn::gpu::Pointer{b.data()}.is_host_ptr());`
- `test/any/add.cpp`: `assert(mkn::gpu::Pointer{devA.p}.is_device_ptr());`
Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Description: Verify the usage of the `Pointer` struct to ensure it is tested.
# Test: Search for the `Pointer` struct usage. Expect: Verify if the struct is tested.
rg --type cpp -A 5 $'Pointer'
```

Length of output: 3493
`176-176`: Inline `destroy` function. The `destroy` function is now inline, which may optimize its usage. Ensure that this change is necessary and beneficial for performance.
Verification successful
Inlining of the `destroy` function may not be necessary. The `destroy` function is primarily used in cleanup processes, such as destructors and deallocation functions, which are not typically performance-critical sections. Therefore, inlining this function may not provide significant performance benefits.
- `inc/mkn/gpu/device.hpp`: Used in destructors and deallocation functions.
- `inc/mkn/gpu/alloc.hpp`: Used in deallocation functions.
Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Description: Verify the usage of the `destroy` function to ensure inlining is beneficial.
# Test: Search for the `destroy` function usage. Expect: Verify if inlining is beneficial.
rg --type cpp -A 5 $'destroy'
```

Length of output: 3666
Script:

```shell
#!/bin/bash
# Description: Verify the context of the `destroy` function usage to determine if inlining is beneficial.
# Test: Search for the `destroy` function usage within loops or performance-critical sections.
rg --type cpp -A 10 -B 10 $'destroy('
```

Length of output: 105
Script:

```shell
#!/bin/bash
# Description: Verify the context of the `destroy` function usage to determine if inlining is beneficial.
# Test: Search for the `destroy` function usage within loops or performance-critical sections.
rg --type cpp -A 10 -B 10 'destroy\('
```

Length of output: 7652
`193-197`: Ensure `copy_on_device` function is tested. The `copy_on_device` function is added to facilitate copying data on the device. Ensure that this function is thoroughly tested.
`82-84`: Ensure `setLimitMallocHeapSize` function is tested. The function sets the limit on the heap size for dynamic memory allocation. Ensure that this function is thoroughly tested.
inc/mkn/gpu/cuda.hpp (2)
`255-256`: Inline `sync` function. The `sync` function is now inline, which may optimize its usage. Ensure that this change is necessary and beneficial for performance.
`210-214`: Ensure `copy_on_device` function is tested. The `copy_on_device` function is added to facilitate copying data on the device. Ensure that this function is thoroughly tested.
Actionable comments posted: 2
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- inc/mkn/gpu/cpu.hpp (9 hunks)
- inc/mkn/gpu/cuda.hpp (12 hunks)
- inc/mkn/gpu/multi_launch.hpp (1 hunks)
- inc/mkn/gpu/rocm.hpp (10 hunks)
- test/any/async_streaming.cpp (1 hunks)
Files skipped from review as they are similar to previous changes (3)
- inc/mkn/gpu/cpu.hpp
- inc/mkn/gpu/multi_launch.hpp
- test/any/async_streaming.cpp
Additional comments not posted (20)
inc/mkn/gpu/rocm.hpp (9)
`45-53`: LGTM! The `gpuAssert` function has been made inline to potentially optimize its usage.
`82-84`: LGTM! The `setLimitMallocHeapSize` function correctly sets the heap size limit for dynamic memory allocation using the `hipDeviceSetLimit` API.
`86-86`: LGTM! The `setDevice` function correctly sets the active GPU device using the `hipSetDevice` API.
`88-93`: LGTM! The `supportsCooperativeLaunch` function correctly checks if the selected device supports cooperative launches using the `hipDeviceGetAttribute` API.
Line range hint `94-106`: LGTM! The `Stream` structure correctly manages HIP streams, including creation, destruction, and synchronization, ensuring proper resource management.
`193-197`: LGTM! The `copy_on_device` function correctly performs device-to-device memory copying using the `hipMemcpy` API.
`238-239`: LGTM! The `sync` function correctly synchronizes the device or a specific stream using the `hipDeviceSynchronize` and `hipStreamSynchronize` APIs.
`244-265`: LGTM! But document the implications of the changes. The `launch` function has been modified to support cooperative kernel launches and conditional synchronization. Ensure that the implications of these changes are well-documented.
`336-341`: LGTM! The `grid_sync` function correctly synchronizes the entire grid using the cooperative groups API.
inc/mkn/gpu/cuda.hpp (11)
Line range hint `43-54`: LGTM! The `gpuAssert` function has been made inline to potentially optimize its usage.
`60-68`: LGTM! The `idx` function correctly calculates the thread index in a CUDA kernel, simplifying the process of determining a unique index for each thread.
`70-73`: LGTM! The `block_idx_x` function correctly returns the block index in the x-dimension, useful for grid and block indexing.
`87-89`: LGTM! The `setLimitMallocHeapSize` function correctly sets the heap size limit for dynamic memory allocation using the `cudaDeviceSetLimit` API.
`91-91`: LGTM! The `setDevice` function correctly sets the active GPU device using the `cudaSetDevice` API.
`93-97`: LGTM! The `supportsCooperativeLaunch` function correctly checks if the selected device supports cooperative launches using the `cudaDeviceGetAttribute` API.
Line range hint `98-110`: LGTM! The `Stream` structure correctly manages CUDA streams, including creation, destruction, and synchronization, ensuring proper resource management.
`210-214`: LGTM! The `copy_on_device` function correctly performs device-to-device memory copying using the `cudaMemcpy` API.
`255-256`: LGTM! The `sync` function correctly synchronizes the device or a specific stream using the `cudaDeviceSynchronize` and `cudaStreamSynchronize` APIs.
`261-282`: LGTM! But document the implications of the changes. The `launch` function has been modified to support cooperative kernel launches and conditional synchronization. Ensure that the implications of these changes are well-documented.
`359-364`: LGTM! The `grid_sync` function correctly synchronizes the entire grid using the cooperative groups API.
inc/mkn/gpu/rocm.hpp
Outdated
```cpp
struct StreamEvent {
  StreamEvent(Stream& stream_) : stream{stream_} { reset(); }
  ~StreamEvent() { /*MKN_GPU_ASSERT(result = hipEventDestroy(event));*/
  }

  auto& operator()() { return event; };
  void record() { MKN_GPU_ASSERT(result = hipEventRecord(event, stream())); }
  bool finished() const { return hipEventQuery(event) == hipSuccess; }
  void reset() {
    if (event) MKN_GPU_ASSERT(result = hipEventDestroy(event));
    MKN_GPU_ASSERT(result = hipEventCreate(&event));
  }

  Stream& stream;
  hipError_t result;
  hipEvent_t event = nullptr;
};
```
Enable the destructor code for `StreamEvent`.

The commented-out destructor code should be enabled to ensure proper resource management and avoid resource leaks.

```diff
- /*MKN_GPU_ASSERT(result = hipEventDestroy(event));*/
+ MKN_GPU_ASSERT(result = hipEventDestroy(event));
```
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
struct StreamEvent {
  StreamEvent(Stream& stream_) : stream{stream_} { reset(); }
  ~StreamEvent() { MKN_GPU_ASSERT(result = hipEventDestroy(event)); }

  auto& operator()() { return event; };
  void record() { MKN_GPU_ASSERT(result = hipEventRecord(event, stream())); }
  bool finished() const { return hipEventQuery(event) == hipSuccess; }
  void reset() {
    if (event) MKN_GPU_ASSERT(result = hipEventDestroy(event));
    MKN_GPU_ASSERT(result = hipEventCreate(&event));
  }

  Stream& stream;
  hipError_t result;
  hipEvent_t event = nullptr;
};
```
inc/mkn/gpu/cuda.hpp
Outdated
```cpp
struct StreamEvent {
  StreamEvent(Stream& stream_) : stream{stream_} { reset(); }
  ~StreamEvent() {
    if (start) {
      MKN_GPU_ASSERT(result = cudaEventDestroy(start))
    }
    if (stop) {
      MKN_GPU_ASSERT(result = cudaEventDestroy(stop))
    }
  }
```
Fix the destructor code for `StreamEvent`.

The destructor code is missing semicolons and should be corrected to ensure proper resource management.

```diff
- MKN_GPU_ASSERT(result = cudaEventDestroy(start))
+ MKN_GPU_ASSERT(result = cudaEventDestroy(start));
- MKN_GPU_ASSERT(result = cudaEventDestroy(stop))
+ MKN_GPU_ASSERT(result = cudaEventDestroy(stop));
```
Committable suggestion
```cpp
struct StreamEvent {
  StreamEvent(Stream& stream_) : stream{stream_} { reset(); }
  ~StreamEvent() {
    if (start) {
      MKN_GPU_ASSERT(result = cudaEventDestroy(start));
    }
    if (stop) {
      MKN_GPU_ASSERT(result = cudaEventDestroy(stop));
    }
  }
```
Actionable comments posted: 2
Outside diff range, codebase verification and nitpick comments (6)
inc/mkn/gpu/rocm.hpp (6)
88-93: Ensure testing of `supportsCooperativeLaunch` function.

The `supportsCooperativeLaunch` function appears to lack active test cases. The only reference found is commented out in `test/any/coop.cpp`. Please ensure that this function is properly tested.

- `test/any/coop.cpp`: commented-out reference to `supportsCooperativeLaunch`.

Analysis chain

Ensure that the `supportsCooperativeLaunch` function is tested to verify its correctness.

Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify the testing of the `supportsCooperativeLaunch` function.
# Test: Search for test cases involving the `supportsCooperativeLaunch` function. Expect: Test cases should exist.
rg --type cpp -A 5 $'supportsCooperativeLaunch'
```

Length of output: 1226
40-45: Review the necessity of the `hip/hip_cooperative_groups.h` include.

The include for `hip/hip_runtime.h` is necessary, as its constructs are used in the file. However, there is no evidence of usage for `hip/hip_cooperative_groups.h`. Please review and remove the include if it is not required.

- `hip/hip_runtime.h`: necessary
- `hip/hip_cooperative_groups.h`: unnecessary

Analysis chain

Verify the necessity of the new includes. Ensure that the new includes for `hip/hip_runtime.h` and `hip/hip_cooperative_groups.h` are necessary for the added functionality.

Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify the necessity of the new includes.
# Test: Search for the usage of the included headers. Expect: The headers should be used in the file.
rg --type cpp -A 5 $'#include "hip/hip_runtime.h"'
rg --type cpp -A 5 $'#include "hip/hip_cooperative_groups.h"'
```

Length of output: 996

```shell
#!/bin/bash
# Description: Verify the usage of specific constructs from the included headers.
# Test: Search for specific constructs from hip/hip_runtime.h and hip/hip_cooperative_groups.h in inc/mkn/gpu/rocm.hpp.

# Constructs from hip/hip_runtime.h
rg --type cpp 'hipLaunchKernelGGL|hipMalloc|hipFree|hipMemcpy|hipStreamCreate|hipStreamDestroy' inc/mkn/gpu/rocm.hpp

# Constructs from hip/hip_cooperative_groups.h
rg --type cpp 'cooperative_groups::' inc/mkn/gpu/rocm.hpp
```

Length of output: 1154
278-299: Ensure testing of cooperative kernel launch enhancements.

The enhancements to the `launch` function, specifically the support for cooperative kernel launches, are not currently covered by test cases. Please add tests to verify this functionality.

- File: `inc/mkn/gpu/rocm.hpp`
- Lines: 278-299

Analysis chain

Ensure that the enhancements to the `launch` function are tested to verify their correctness.

Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify the testing of the `launch` function enhancements.
# Test: Search for test cases involving the `launch` function enhancements. Expect: Test cases should exist.
rg --type cpp -A 5 $'launch'
```

Length of output: 8042

```shell
#!/bin/bash
# Description: Verify the testing of the cooperative kernel launch enhancements in the `launch` function.
# Test: Search for test cases involving cooperative kernel launches. Expect: Test cases should exist.
rg --type cpp -A 5 'hipLaunchCooperativeKernel'
```

Length of output: 394
227-231: Ensure testing of `copy_on_device` function.

No test cases were found for the `copy_on_device` function. Please ensure that this function is explicitly tested to verify its correctness.

Analysis chain

Ensure that the `copy_on_device` function is tested to verify its correctness.

Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify the testing of the `copy_on_device` function.
# Test: Search for test cases involving the `copy_on_device` function. Expect: Test cases should exist.
rg --type cpp -A 5 $'copy_on_device'
```

Length of output: 1212

```shell
#!/bin/bash
# Description: Verify the testing of the `copy_on_device` function by checking for tests involving `alloc.hpp` or functions that call `copy_on_device`.
# Test: Search for test cases involving the `alloc.hpp` file or functions that use `copy_on_device`.
rg --type cpp -A 5 'alloc.hpp'
```

Length of output: 1093

```shell
#!/bin/bash
# Description: Verify the testing of the `copy_on_device` function by checking for integration tests involving `alloc.hpp` or `copy_on_device`.
# Test: Search for test files or directories that might contain integration tests involving `alloc.hpp` or `copy_on_device`.
rg --type cpp -l 'alloc.hpp' | xargs -I {} rg --type cpp -l 'copy_on_device' {}
```

Length of output: 143
82-84: Ensure testing of `setLimitMallocHeapSize` function.

The `setLimitMallocHeapSize` function is not currently tested. Please add test cases to verify its correctness.

- File: `inc/mkn/gpu/rocm.hpp`
- Lines: 82-84

Analysis chain

Ensure that the `setLimitMallocHeapSize` function is tested to verify its correctness.

Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify the testing of the `setLimitMallocHeapSize` function.
# Test: Search for test cases involving the `setLimitMallocHeapSize` function. Expect: Test cases should exist.
rg --type cpp -A 5 $'setLimitMallocHeapSize'
```

Length of output: 1049

```shell
#!/bin/bash
# Description: Search for potential test files or directories that might contain tests for the `setLimitMallocHeapSize` function.

# Find all files and directories with names indicating they might contain tests.
fd test

# Search for references to `setLimitMallocHeapSize` in likely test files.
rg --type cpp -A 5 $'setLimitMallocHeapSize' tests/
```

Length of output: 113

```shell
#!/bin/bash
# Description: Search for C++ test files within the `test/` directory and check for references to `setLimitMallocHeapSize`.

# Find all C++ test files in the `test/` directory.
fd . --type f --extension cpp test/

# Search for references to `setLimitMallocHeapSize` in the found C++ test files.
rg --type cpp -A 5 $'setLimitMallocHeapSize' test/
```

Length of output: 428
86-86: Ensure testing of `setDevice` function.

No direct test cases for the `setDevice` function were found in the repository. It is important to add tests to verify the correctness of this function.

- File: `inc/mkn/gpu/rocm.hpp`
- Function: `setDevice`

Analysis chain

Ensure that the `setDevice` function is tested to verify its correctness.

Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify the testing of the `setDevice` function.
# Test: Search for test cases involving the `setDevice` function. Expect: Test cases should exist.
rg --type cpp -A 5 $'setDevice'
```

Length of output: 1481

```shell
#!/bin/bash
# Description: Verify the testing of the `setDevice` function.

# Search for test files or directories
fd test

# Search for test cases involving the `setDevice` function in test files
rg --type cpp -A 5 $'setDevice' $(fd test)
```

Length of output: 63

```shell
#!/bin/bash
# Description: Verify the testing of the `setDevice` function.

# Search for test cases involving the `setDevice` function in all files
rg --type cpp -A 5 $'setDevice'
```

Length of output: 1481
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (9)
- inc/mkn/gpu/cli.hpp (1 hunks)
- inc/mkn/gpu/cpu.hpp (9 hunks)
- inc/mkn/gpu/cuda.hpp (11 hunks)
- inc/mkn/gpu/multi_launch.hpp (1 hunks)
- inc/mkn/gpu/rocm.hpp (10 hunks)
- mkn.yaml (2 hunks)
- res/mkn/clang_cuda.yaml (2 hunks)
- res/mkn/hipcc.yaml (2 hunks)
- test/any/async_streaming.cpp (1 hunks)
Files skipped from review as they are similar to previous changes (5)
- inc/mkn/gpu/cpu.hpp
- inc/mkn/gpu/cuda.hpp
- inc/mkn/gpu/multi_launch.hpp
- res/mkn/clang_cuda.yaml
- test/any/async_streaming.cpp
Additional comments not posted (9)
res/mkn/hipcc.yaml (2)
- 3-5: Activation of local settings. The local settings have been activated by removing the comment markers. Ensure that these paths are correct and accessible in the build environment.
- 21-21: Compiler standard updated to C++20. The C++ standard has been raised from C++17 to C++20. Verify compatibility with all dependencies and the build environment.
mkn.yaml (2)
- 13-15: Deactivation of ROCM arguments and tests. The arguments and test paths for the `rocm` profile have been commented out, indicating a potential deactivation. Ensure this change is intentional and verify its impact on the build and testing processes.
- 19-19: Deactivation of CUDA argument. The argument for the `cuda` profile has been commented out, while the test paths remain active. Verify the necessity and impact of this change on the CUDA testing capabilities.

inc/mkn/gpu/rocm.hpp (5)
159-183: Ensure testing of `Pointer` structure.

Ensure that the `Pointer` structure is tested to verify its correctness.

Verification successful

The `Pointer` structure is adequately tested. The search results show that the `Pointer` structure is referenced and tested in multiple test files, such as `test/any/managed.cpp` and `test/any/add.cpp`. Assertions in these files verify the functionality of the `Pointer` structure.

- `test/any/managed.cpp`: `assert(mkn::gpu::Pointer{mem.data()}.is_managed_ptr());`
- `test/any/add.cpp`: `assert(mkn::gpu::Pointer{b.data()}.is_host_ptr());`

Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify the testing of the `Pointer` structure.
# Test: Search for test cases involving the `Pointer` structure. Expect: Test cases should exist.
rg --type cpp -A 5 $'Pointer'
```

Length of output: 3493
- 45-53: Verify the necessity and impact of making `gpuAssert` inline. Ensure that making the `gpuAssert` function inline is beneficial and does not introduce any issues.
- 57-68: Ensure comprehensive testing of the `idx` function. The `idx` function is a critical utility for indexing in HIP kernels; test it across various grid and block configurations to validate its accuracy and reliability.
- 272-273: Ensure testing of the `sync` function overloads to verify their correctness.
- 381-386: Ensure testing of the `grid_sync` function to verify its correctness.
```cpp
struct StreamEvent {
  StreamEvent(Stream& stream_) : stream{stream_} { reset(); }
  ~StreamEvent() { clear(); }

  StreamEvent(StreamEvent&& that) : stream{that.stream}, start{that.start}, stop{that.stop} {
    that.start = nullptr;
    that.stop = nullptr;
  }

  StreamEvent(StreamEvent const&) = delete;
  StreamEvent& operator=(StreamEvent const&) = delete;

  auto& operator()() { return stop; };
  auto& record() {
    if (stage == 0) {
      MKN_GPU_ASSERT(result = hipEventRecord(start, stream()));
      ++stage;
    } else {
      MKN_GPU_ASSERT(result = hipEventRecord(stop, stream()));
      ++stage;
    }
    return *this;
  }
  auto& wait() {
    if (stage == 0) {
      MKN_GPU_ASSERT(result = hipStreamWaitEvent(stream(), start));
    } else {
      MKN_GPU_ASSERT(result = hipStreamWaitEvent(stream(), stop));
    }
    return *this;
  }

  void clear() {
    if (start) MKN_GPU_ASSERT(result = hipEventDestroy(start));
    if (stop) MKN_GPU_ASSERT(result = hipEventDestroy(stop));
  }
  bool finished() const { return stage == 2 and hipEventQuery(stop) == hipSuccess; }
  void reset() {
    clear();
    MKN_GPU_ASSERT(result = hipEventCreate(&start));
    MKN_GPU_ASSERT(result = hipEventCreate(&stop));
    stage = 0;
  }

  Stream& stream;
  hipError_t result;
  hipEvent_t start = nullptr, stop = nullptr;
  std::uint16_t stage = 0;
};
```
Enable the destructor code for `StreamEvent`.

The commented-out destructor code should be enabled to ensure proper resource management and avoid resource leaks.

```diff
- /*MKN_GPU_ASSERT(result = hipEventDestroy(event));*/
+ MKN_GPU_ASSERT(result = hipEventDestroy(event));
```

Ensure testing of `StreamEvent` structure.

No test cases for the `StreamEvent` structure were found in the codebase. Ensure that the `StreamEvent` structure is tested to verify its correctness.

- Add test cases in the `test/` directory to cover the `StreamEvent` structure.
- Verify the proper functioning of the destructor and other methods.

Analysis chain

Ensure that the `StreamEvent` structure is tested to verify its correctness.
Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Description: Verify the testing of the `StreamEvent` structure.
# Test: Search for test cases involving the `StreamEvent` structure. Expect: Test cases should exist.
rg --type cpp -A 5 $'StreamEvent'
```

Length of output: 2477

Script:

```shell
#!/bin/bash
# Description: Search for test cases involving the `StreamEvent` structure.

# Find all potential test files
test_files=$(fd --type f --extension cpp --exec grep -l 'StreamEvent')

# Print the test files found
echo "Test files potentially using StreamEvent:"
echo "$test_files"

# Search within these files for instantiation and usage of StreamEvent
for file in $test_files; do
  echo "Searching in $file"
  rg 'StreamEvent' "$file"
done
```

Length of output: 175

Script:

```shell
#!/bin/bash
# Description: Search for test cases involving the `StreamEvent` structure across various file extensions and directories.

# Define common test file extensions
extensions=("cpp" "hpp" "h" "cxx" "cc")

# Search for files containing 'StreamEvent' across the defined extensions
for ext in "${extensions[@]}"; do
  fd --type f --extension "$ext" --exec grep -l 'StreamEvent'
done

# Search for common test directories
test_dirs=$(fd --type d --glob '*test*')

# Print the test directories found
echo "Test directories potentially containing tests:"
echo "$test_dirs"

# Search within these directories for files containing 'StreamEvent'
for dir in $test_dirs; do
  echo "Searching in directory $dir"
  rg 'StreamEvent' "$dir"
done
```

Length of output: 867
closes #18
Summary by CodeRabbit
New Features

- `StreamEvent` struct for effective management of CUDA events.

Improvements

- `StreamFunction` hierarchy.