Multi-device C/C++ sample #4

Beanavil · 2023-09-26T10:31:42Z

The commits for DEB packaging and the CI updates have been left out. The passing run of the updated CI can be checked here.

Beanavil · 2023-09-26T11:40:04Z

FYI: the branch has conflicts because it's based on KhronosGroup/main instead of StreamHPC/main.

mfep · 2023-09-27T07:38:44Z

> FYI: the branch has conflicts because it's based on KhronosGroup/main instead of StreamHPC/main.

Fixed!

mfep

I find the implemented example a good draft, but still somewhat unfocused, perhaps because several people have worked on it. The amount of boilerplate and accompanying code (e.g. the ubiquitous padding logic) is quite large compared to the OpenCL functionality it showcases.

I think this sample would be a fine opportunity to show more interesting OpenCL features, such as device partitioning and/or sub-buffers.

Let's discuss the next steps!

mfep · 2023-09-27T07:51:41Z

samples/core/multi-device/main.cpp

+void host_convolution(std::vector<cl_float> in, std::vector<cl_float>& out,
+                      std::vector<cl_float> mask, size_t x_dim, size_t y_dim)


The convention is to pass potentially expensive-to-copy arguments by (const) reference.
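
A minimal sketch of the suggested signature, not the sample's actual implementation. `float` stands in for `cl_float` so that the snippet builds without OpenCL headers, and the body assumes a square mask and an input that is already padded by `mask_dim - 1` in each dimension; both are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Read-only vectors are taken by const reference, so nothing is copied.
// Only `out` is written to, so it stays a non-const reference.
void host_convolution(const std::vector<float>& in, std::vector<float>& out,
                      const std::vector<float>& mask, std::size_t x_dim,
                      std::size_t y_dim)
{
    // Assumption: the mask is square, so its side is the integer root of its size.
    std::size_t mask_dim = 0;
    while (mask_dim * mask_dim < mask.size()) ++mask_dim;
    assert(mask_dim * mask_dim == mask.size());

    // Assumption: `in` is padded so that every output has a full neighbourhood.
    const std::size_t in_x_dim = x_dim + mask_dim - 1;
    assert(in.size() == in_x_dim * (y_dim + mask_dim - 1));

    out.assign(x_dim * y_dim, 0.0f);
    for (std::size_t y = 0; y < y_dim; ++y)
        for (std::size_t x = 0; x < x_dim; ++x)
            for (std::size_t my = 0; my < mask_dim; ++my)
                for (std::size_t mx = 0; mx < mask_dim; ++mx)
                    out[y * x_dim + x] += mask[my * mask_dim + mx]
                        * in[(y + my) * in_x_dim + (x + mx)];
}
```

Call sites do not change: the caller still passes its vectors directly, but `in` and `mask` are no longer copied on every call.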

mfep · 2023-09-27T08:00:39Z

samples/core/multi-device/main.cpp

+        cl::Context context2 =
+            cl::sdk::get_context(triplets.at((triplets.size() >= 2)));


This implicit bool-to-int conversion is quite unconventional. Can we express the same logic explicitly here, e.g. with a ternary operator?
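
One possible explicit phrasing, as a sketch. `second_device_index` is a hypothetical helper name, not something from the SDK; it only spells out the index that the boolean expression currently produces.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: use the second triplet when there is one,
// otherwise fall back to the first (and only) triplet.
std::size_t second_device_index(std::size_t triplet_count)
{
    assert(triplet_count > 0); // at least one device triplet is required
    return triplet_count >= 2 ? 1 : 0;
}
```

The call in the diff would then read `triplets.at(second_device_index(triplets.size()))`, or the ternary could simply be written inline.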

mfep · 2023-09-27T08:01:51Z

samples/core/multi-device/main.cpp

+        // Query device and runtime capabilities.
+        auto d1_highest_device_opencl_c_is_2_x =
+            cl::util::opencl_c_version_contains(dev1, "2.");
+        auto d1_highest_device_opencl_c_is_3_x =
+            cl::util::opencl_c_version_contains(dev1, "3.");


The whole setup should be implemented in a for loop, especially since the current implementation is buggy: the -cl-std value queried from the first device is used for both devices.
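
A rough sketch of the per-device loop. To keep it buildable without OpenCL headers, plain strings stand in for the value of `CL_DEVICE_OPENCL_C_VERSION` (which has the form `OpenCL C <major>.<minor> <vendor info>`); `cl_std_option` and `build_options_per_device` are hypothetical names, and the real sample would query each device instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Maps one device's OpenCL C version string to its own -cl-std option.
// Versions other than 2.x and 3.x get no explicit option in this sketch.
std::string cl_std_option(const std::string& opencl_c_version)
{
    const std::string prefix = "OpenCL C ";
    assert(opencl_c_version.compare(0, prefix.size(), prefix) == 0);
    const char major = opencl_c_version.at(prefix.size());
    if (major == '2') return "-cl-std=CL2.0";
    if (major == '3') return "-cl-std=CL3.0";
    return "";
}

// One entry per device, so no device reuses the option of another one.
std::vector<std::string>
build_options_per_device(const std::vector<std::string>& versions)
{
    std::vector<std::string> options;
    options.reserve(versions.size());
    for (const auto& version : versions)
        options.push_back(cl_std_option(version));
    return options;
}
```

The same loop could also create the context, queue and program for each device, which removes the duplicated `d1_...`/`d2_...` variables.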

mfep · 2023-09-27T08:12:29Z

samples/core/multi-device/README.md

+## Key APIs and Concepts
+The main idea behind this example is that a given kernel can be run simultaneously by two (or potentially more) devices, therefore reducing its execution time. One can essentially think of two strategies for this workflow:
+1. each device computes its proportional part of the solution at its own speed and the results are combined on the host's side when finished, and
+2. each device executes the kernel at its own speed but after each iteration there is P2P communication between the devices to share the partial results.


I think the second strategy is only possible when the two devices are in the same context (if the copy is done via Buffers), i.e. they must be on the same platform as well.

mfep · 2023-09-27T08:17:54Z

samples/core/multi-device/main.cpp

+        // Check that the WGSs can divide the global size (MacOS reports
+        // CL_INVALID_WORK_GROUP_SIZE otherwise). If WGS is smaller than the x
+        // dimension, then a NULL pointer will be used when initialising
+        // cl::EnqueueArgs for enqueuing the kernels.
+        if (pad_x_dim % wgs1 && pad_x_dim > wgs1)
+        {
+            size_t div = pad_x_dim / wgs1;
+            wgs1 = sqrt(div * wgs1);
+        }
+
+        if (pad_x_dim % wgs2 && pad_x_dim > wgs2)
+        {
+            size_t div = pad_x_dim / wgs2;
+            wgs2 = sqrt(div * wgs2);
+        }


I have yet to understand why we go through all this hassle with the local size when the kernel itself uses neither work-group functions nor local memory. Could you please give some input on that?

mfep · 2023-09-27T08:37:26Z

samples/core/multi-device/main.cpp

+        // Fill with 0s the extra rows and columns added for padding.
+        for (size_t j = 0; j < pad_x_dim; ++j)
+        {
+            for (size_t i = 0; i < pad_y_dim; ++i)
+            {
+                if (i == 0 || j == 0 || i == (pad_y_dim - 1)
+                    || j == (pad_x_dim - 1))
+                {
+                    h_input_grid[j + i * pad_x_dim] = 0;
+                }
+            }
+        }


The ubiquitous padding logic could be avoided by using Image2Ds for input and output: the read operation could then be performed with a padded sampler. The padding would obviously still have to be implemented in the host convolution, but it could be done on the fly, without regenerating the whole input array.
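
For the host side, the on-the-fly padding could look roughly like this. It is only a sketch: `read_zero_padded` and `host_convolution_padded` are hypothetical names, `float` stands in for `cl_float`, and an odd-sized square mask centred on each output element is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns in[y][x], treating every coordinate outside the grid as 0.
float read_zero_padded(const std::vector<float>& in, std::ptrdiff_t x,
                       std::ptrdiff_t y, std::size_t x_dim, std::size_t y_dim)
{
    assert(in.size() == x_dim * y_dim);
    if (x < 0 || y < 0 || static_cast<std::size_t>(x) >= x_dim
        || static_cast<std::size_t>(y) >= y_dim)
        return 0.0f;
    return in[static_cast<std::size_t>(y) * x_dim + static_cast<std::size_t>(x)];
}

// Convolution over the unpadded grid; the border is handled by the read.
std::vector<float> host_convolution_padded(const std::vector<float>& in,
                                           const std::vector<float>& mask,
                                           std::size_t mask_dim,
                                           std::size_t x_dim, std::size_t y_dim)
{
    assert(mask.size() == mask_dim * mask_dim && mask_dim % 2 == 1);
    const auto half = static_cast<std::ptrdiff_t>(mask_dim / 2);
    std::vector<float> out(x_dim * y_dim, 0.0f);
    for (std::size_t y = 0; y < y_dim; ++y)
        for (std::size_t x = 0; x < x_dim; ++x)
            for (std::size_t my = 0; my < mask_dim; ++my)
                for (std::size_t mx = 0; mx < mask_dim; ++mx)
                {
                    const auto sx = static_cast<std::ptrdiff_t>(x + mx) - half;
                    const auto sy = static_cast<std::ptrdiff_t>(y + my) - half;
                    out[y * x_dim + x] += mask[my * mask_dim + mx]
                        * read_zero_padded(in, sx, sy, x_dim, y_dim);
                }
    return out;
}
```

With this shape the input array keeps its original dimensions, so the separate loop that zero-fills the extra rows and columns is no longer needed on the host.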

mfep · 2023-09-27T08:39:59Z

samples/core/multi-device/main.cpp

+    }
+}
+
+int main(int argc, char* argv[])


Can you think of a way to break up this very long function into meaningful subroutines? Good candidates would be, for example, the generation of the input data, the command line parsing, the setup of the kernels, or the verification of the results. This would help readers follow the structure of the program without going too deep into the details of every section.
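
To illustrate the kind of split meant here, a very rough outline follows. Every name in it (`conv_options`, `generate_input`, `verify`, `run_sample`) is hypothetical, and the device path is replaced by a placeholder so that the snippet builds without OpenCL.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical options; the real sample would fill these from the command line.
struct conv_options
{
    std::size_t x_dim = 8;
    std::size_t y_dim = 8;
    unsigned seed = 42;
};

// Input generation, isolated from the rest of the program.
std::vector<float> generate_input(const conv_options& opts)
{
    std::mt19937 engine(opts.seed);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    std::vector<float> grid(opts.x_dim * opts.y_dim);
    for (auto& value : grid) value = dist(engine);
    return grid;
}

// Element-wise comparison of the device result against the host reference.
bool verify(const std::vector<float>& result,
            const std::vector<float>& reference, float tolerance)
{
    assert(tolerance >= 0.0f);
    if (result.size() != reference.size()) return false;
    for (std::size_t i = 0; i < result.size(); ++i)
        if (std::fabs(result[i] - reference[i]) > tolerance) return false;
    return true;
}

// What main() would shrink to: a short sequence of named steps.
bool run_sample(const conv_options& opts)
{
    const std::vector<float> input = generate_input(opts);
    // Placeholder: the real sample would set up the kernels, run them on the
    // devices and compute the host reference here.
    const std::vector<float> device_result = input;
    const std::vector<float> host_reference = input;
    return verify(device_result, host_reference, 1e-6f);
}
```

The individual steps stay small enough to read on their own, and `main` only has to parse the arguments and call the driver.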

* Add BUILD_UTILITY_LIBRARIES option
* Add whereami dependence
* Add exe relative utilities
* Update samples to use exe relative utilities
* Improve diagnostic on missing file
* Add missing default argument for error param
* Add docs on file utilities
* Add EOL
* Fix typo
  Co-authored-by: Ronan Keryell <[email protected]>
* Simplify byte size calculation
  Co-authored-by: Ronan Keryell <[email protected]>
* Fix typo
  Co-authored-by: Ben Ashbaugh <[email protected]>
* Fix formatting
* Remove implicit narrowing conversions
* No unnamed type on libSDK surface
* warning: enumeration value x not handled in switch

---------

Co-authored-by: Ronan Keryell <[email protected]>
Co-authored-by: Ben Ashbaugh <[email protected]>

Beanavil force-pushed the multi-device-sample branch 3 times, most recently from f9818a5 to 92d35c5 on September 26, 2023, 11:26

mfep force-pushed the main branch from 9f2dab7 to cd612e5 on September 27, 2023, 07:37

mfep suggested changes Sep 27, 2023

Beanavil force-pushed the multi-device-sample branch 2 times, most recently from 26c7bcc to 2d62412 on October 27, 2023, 12:50

Beanavil changed the title ~~Multi-device sample~~ Multi-device C/C++sample Oct 27, 2023

Beanavil changed the title ~~Multi-device C/C++sample~~ Multi-device C/C++ sample Oct 27, 2023

Beanavil force-pushed the multi-device-sample branch 5 times, most recently from 87a4154 to 419fe40 on October 30, 2023, 11:09

Beanavil force-pushed the multi-device-sample branch from 11c0765 to b818605 on November 27, 2023, 13:13

Implemented Callback sample (KhronosGroup#87)

cc5e561

* Implemented callback sample
* Minor fixes from code review
* Minor fixes from code review II.

Beanavil force-pushed the multi-device-sample branch from b818605 to b33c0e4 on December 7, 2023, 14:20

Multi-device C/C++ sample

33535f9

Beanavil force-pushed the multi-device-sample branch from b33c0e4 to 0babaf4 on December 12, 2023, 09:08

Fixes from review

b6475e4

Beanavil force-pushed the multi-device-sample branch from 0babaf4 to b6475e4 on December 13, 2023, 12:06

Beanavil closed this Dec 15, 2023
