Limit test_hipcub_device_radix_sort memory usage #454
base: release-staging/rocm-rel-6.4
Could we not implement a run-time check on the current GPU's total available memory via hipDeviceProp_t instead, and skip the test if the available memory is not sufficient?
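The run-time check suggested in this comment could take roughly the following shape. This is a minimal sketch, not code from this PR. The helper names `sort_fits_in_memory` and `sizes_that_fit` are hypothetical. The "buffers per element" figure is an assumed allowance for the input, output, and temporary storage, not a number hipCUB reports. In a real test, the available-memory value would be queried from the device, for example with `hipMemGetInfo` (free memory) or `hipDeviceProp_t::totalGlobalMem` (total memory).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if `size` keys fit in `available_bytes`, assuming the test needs
// `buffers` key-sized buffers per element (e.g. input, output, and an allowance
// for temporary storage). The buffer count is an assumption, not hipCUB's figure.
inline bool sort_fits_in_memory(std::size_t size,
                                std::size_t key_bytes,
                                std::size_t buffers,
                                std::size_t available_bytes)
{
    assert(key_bytes > 0 && buffers > 0);
    const std::size_t bytes_per_element = key_bytes * buffers;
    // Divide instead of multiplying so very large sizes cannot overflow size_t.
    return size <= available_bytes / bytes_per_element;
}

// Keeps only the sizes that fit; the caller would skip (and report) the rest.
// In the real test, `available_bytes` might come from hipMemGetInfo() (free
// memory) or hipDeviceProp_t::totalGlobalMem (total memory).
inline std::vector<std::size_t> sizes_that_fit(const std::vector<std::size_t>& sizes,
                                               std::size_t key_bytes,
                                               std::size_t buffers,
                                               std::size_t available_bytes)
{
    std::vector<std::size_t> result;
    for (const std::size_t size : sizes)
    {
        if (sort_fits_in_memory(size, key_bytes, buffers, available_bytes))
        {
            result.push_back(size);
        }
    }
    return result;
}
```

This kind of check is only as reliable as the memory figure it is given. It does not depend on `hipMalloc` reporting failure, which is the behavior the thread identifies as unreliable on Windows.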
We already have a check at
On Windows, HipcubDeviceRadixSort.SortKeysLargeSizes fails due to an out-of-memory error on some devices. This happens because of an issue that sometimes causes hipMalloc to return hipSuccess for allocation requests that are too large. As a result, we cannot reliably detect whether a data size is too large for the test. This change works around the problem for now by limiting the data sizes that are used for the test.
Branch updated from df2b24c to 1b94b50.
Good idea. I took a closer look at this, and the problem seems to be that in the latest Windows hipSDK build, hipMalloc returns hipSuccess for some sizes that returned hipErrorOutOfMemory in previous hipSDK builds. I'll log a bug with the hipSDK team. We can use this change as a workaround for now if you like. I've added a note about this problem to the "known issues" section of the changelog.
Changelog wording changes.
Co-authored-by: spolifroni-amd <[email protected]>
I'm not a fan of working around SDK bugs. Will that bug be fixed before the SDK is released? If so, we wouldn't need this workaround.
@@ -1215,7 +1215,7 @@ inline void sort_keys_large_sizes()

     hipStream_t stream = 0;

-    const std::vector<size_t> sizes = test_utils::get_large_sizes(seeds[0]);
+    const std::vector<size_t> sizes = test_utils::get_large_sizes<34>(seeds[0]);
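This change caps the test sizes by passing `34` as the template argument to `get_large_sizes`. One way to picture the effect is the hypothetical filter below, which drops every candidate size above 2^34 elements. It assumes the template argument acts as a power-of-two upper bound on the generated sizes. That reading is an assumption, since the actual semantics live in hipCUB's test utilities, which this PR does not show. `cap_sizes` is an illustrative name, not part of hipCUB.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration, not hipCUB's test_utils::get_large_sizes:
// keeps only the candidate sizes that are at most 2^MaxPow2 elements.
template<unsigned int MaxPow2>
std::vector<std::size_t> cap_sizes(const std::vector<std::size_t>& sizes)
{
    static_assert(MaxPow2 < sizeof(std::size_t) * 8, "shift must fit in size_t");
    constexpr std::size_t max_size = std::size_t{1} << MaxPow2;
    std::vector<std::size_t> capped;
    for (const std::size_t size : sizes)
    {
        if (size <= max_size)
        {
            capped.push_back(size);
        }
    }
    return capped;
}
```

With a compile-time bound like this, the cap can be selected per platform, for example behind `#ifdef _WIN32`, without changing the size list used on other platforms.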
It would be nice to document the workaround in the source as well, and to avoid affecting Linux builds.
-    const std::vector<size_t> sizes = test_utils::get_large_sizes<34>(seeds[0]);
+    // Workaround: on Windows, `hipMalloc` can return `hipSuccess` for some allocation
+    // requests that are too large, so the test cannot detect insufficient memory.
+    // We limit the maximum size so this bug doesn't occur.
+#ifdef _WIN32
+    const std::vector<size_t> sizes = test_utils::get_large_sizes<34>(seeds[0]);
+#else
+    const std::vector<size_t> sizes = test_utils::get_large_sizes(seeds[0]);
+#endif
On Windows, HipcubDeviceRadixSort.SortKeysLargeSizes fails due to an out-of-memory error on some devices. Limiting the data sizes that are used for this test fixes this problem.