
[GPU/OpenCL] Initial version of Reshape Layer with OpenCL ops #2651

Merged 1 commit into nnstreamer:main on Jul 3, 2024

Conversation

niket-agarwal (Contributor)

Added initial version of Reshape Layer for GPU. This is a basic implementation using a naive kernel.

Changes added with this PR:

  • reshape_cl.cpp added containing the new ReshapeLayerCL class for OpenCL implementation.
  • Added unittest_layers_reshape_cl.cpp to test Reshape Layer on GPU.

Signed-off-by: Niket Agarwal [email protected]

@taos-ci

taos-ci commented Jun 26, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2651. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before the review process by reviewers can start. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.

@taos-ci left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

@djeong20 (Contributor)

Just wondering: why do we need this ReshapeLayer in the first place?
I think the operations (e.g., forwarding, backwarding) would be a simple data copy that doesn't change the order of the data.

Comment on lines 51 to 46
std::string reshape_cl_kernel_ =
  R"(__kernel void reshape_cl(__global const float* input,
                              __global float* output,
                              const int batchsize,
                              const int channels,
                              const int height,
                              const int width) {

    int elements_per_batch = channels * height * width;
    int global_id = get_global_id(0);
    int batch_index = global_id / elements_per_batch;
    int element_index = global_id % elements_per_batch;

    if (batch_index < batchsize) {
      int input_channel = element_index / (height * width);
      int input_height = (element_index % (height * width)) / width;
      int input_width = element_index % width;

      int input_index = batch_index * channels * height * width +
                        input_channel * height * width +
                        input_height * width +
                        input_width;

      int output_index = batch_index * elements_per_batch + element_index;

      output[output_index] = input[input_index];
    }
  })";
djeong20 (Contributor)

If the whole process is simply copying, couldn't it be as follows?

int i = get_global_id(0);
output[i] = input[i];
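
For context, the index arithmetic in the kernel above collapses to the identity mapping, which is why this one-line copy is equivalent. Below is a small standalone host-side check (a sketch, not part of the PR) that replays the same arithmetic with arbitrary example shapes:

#include <cassert>

// Replays the reshape_cl index math on the host to show that both
// input_index and output_index reduce to global_id, so the kernel is a
// plain element-wise copy. The shape values here are arbitrary examples.
int main() {
  const int batchsize = 2, channels = 3, height = 4, width = 5;
  const int elements_per_batch = channels * height * width;
  for (int global_id = 0; global_id < batchsize * elements_per_batch;
       ++global_id) {
    int batch_index = global_id / elements_per_batch;
    int element_index = global_id % elements_per_batch;
    int input_channel = element_index / (height * width);
    int input_height = (element_index % (height * width)) / width;
    int input_width = element_index % width;
    int input_index = batch_index * channels * height * width +
                      input_channel * height * width + input_height * width +
                      input_width;
    int output_index = batch_index * elements_per_batch + element_index;
    assert(input_index == global_id && output_index == global_id);
  }
  return 0;
}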

niket-agarwal (Contributor, Author)

Thanks for the suggestion! I have tested and updated in the latest commit.
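
A minimal sketch of what the simplified kernel string might look like after this change (illustrative, not the exact committed code; it assumes the host enqueues one work item per element of the flattened tensor):

std::string reshape_cl_kernel_ =
  R"(__kernel void reshape_cl(__global const float *input,
                              __global float *output) {
    // Reshape does not move data for contiguous tensors, so each work item
    // copies one element of the flat buffer.
    int i = get_global_id(0);
    output[i] = input[i];
  })";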

@SeoHyungjun (Member) left a comment

LGTM

* @brief Process data and dimensions for reshape operation
* @param[in] input Tensor
* @param[in] result Tensor
* @param[in] RunLayerContext reference
Member

In the functions below, the same argument is documented with different descriptions. Please use a consistent form that includes the parameter name.

Suggested change
- * @param[in] RunLayerContext reference
+ * @param[in] context RunLayerContext reference

Comment on lines 5 to 10
* @file unittest_layers_reshape_cl.cpp
* @date 18th June 2024
* @brief Reshape Layer Test
* @see https://github.com/nnstreamer/nntrainer
* @author Niket Agarwal <[email protected]>
* @bug No known bugs except for NYI items
Member

The indentation of the code is slightly different.
In addition, it would be better to change the date format and the position of @brief for consistency with other parts.

Suggested change
- * @file unittest_layers_reshape_cl.cpp
- * @date 18th June 2024
- * @brief Reshape Layer Test
- * @see https://github.com/nnstreamer/nntrainer
- * @author Niket Agarwal <[email protected]>
- * @bug No known bugs except for NYI items
+ * @file unittest_layers_reshape_cl.cpp
+ * @date 18 June 2024
+ * @see https://github.com/nnstreamer/nntrainer
+ * @author Niket Agarwal <niket.a@samsung.com>
+ * @bug No known bugs except for NYI items
+ * @brief Reshape Layer Test

niket-agarwal (Contributor, Author)

Updated.

Comment on lines 51 to 47
std::string reshape_cl_kernel_ =
  R"(__kernel void reshape_cl(__global const float* input,
                              __global float* output,
                              const int batchsize,
                              const int channels,
                              const int height,
                              const int width) {

    int elements_per_batch = channels * height * width;
    int global_id = get_global_id(0);
    int batch_index = global_id / elements_per_batch;
    int element_index = global_id % elements_per_batch;

    if (batch_index < batchsize) {
      int input_channel = element_index / (height * width);
      int input_height = (element_index % (height * width)) / width;
      int input_width = element_index % width;

      int input_index = batch_index * channels * height * width +
                        input_channel * height * width +
                        input_height * width +
                        input_width;

      int output_index = batch_index * elements_per_batch + element_index;

      output[output_index] = input[input_index];
    }
  })";

Contributor

This kernel only holds for the case where the work size is {dimsize, 0, 0}.
If you intended this global-level parallelization only, @djeong20's suggestion seems better.
If you find a more generalized one, however, please update this kernel to use the indices you computed in this code block.

niket-agarwal (Contributor, Author)

Updated with @djeong20's suggestion. Thanks.

@niket-agarwal force-pushed the gpu_reshape branch 2 times, most recently from ba6570e to 0a7c9cd on July 2, 2024 11:09
@taos-ci left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

@djeong20 (Contributor) left a comment

Appreciate the hard work! One last suggestion is renaming the reshape_cl kernel to copy_cl. Since the kernel is basically copying data, I think copy_cl better reflects what the kernel does (the final result would be that ReshapeLayerCl uses the copy_cl kernel).
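
For illustration, a rough sketch of how such a copy kernel might be enqueued from the host side; the command queue, kernel object, and buffers are assumed to be created elsewhere, and the helper name here is hypothetical rather than nntrainer's actual API:

#include <CL/cl.h>

// Hypothetical helper: launches the copy kernel with one work item per
// element of the flattened tensor (batch * channel * height * width).
static cl_int enqueue_copy(cl_command_queue queue, cl_kernel copy_kernel,
                           cl_mem in_buf, cl_mem out_buf,
                           size_t element_count) {
  cl_int err = clSetKernelArg(copy_kernel, 0, sizeof(cl_mem), &in_buf);
  if (err != CL_SUCCESS)
    return err;
  err = clSetKernelArg(copy_kernel, 1, sizeof(cl_mem), &out_buf);
  if (err != CL_SUCCESS)
    return err;
  // 1-D NDRange; the global work size equals the total number of elements.
  return clEnqueueNDRangeKernel(queue, copy_kernel, 1, NULL, &element_count,
                                NULL, 0, NULL, NULL);
}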

@niket-agarwal force-pushed the gpu_reshape branch 2 times, most recently from b9d7964 to 484f0f4 on July 3, 2024 06:58
@taos-ci left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

Added a naive version of the OpenCL implementation for the Reshape Layer.
Incorporated the kernel for the ops used.
Added a unit test for Reshape_layer_cl.

Signed-off-by: Niket Agarwal <[email protected]>
@taos-ci left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

@djeong20 (Contributor) left a comment

Great work! LGTM

LayerGoldenTestParamOptions::SKIP_CALC_DERIV |
LayerGoldenTestParamOptions::SKIP_CALC_GRAD |
LayerGoldenTestParamOptions::USE_INC_FORWARD,
"nchw", "fp32", "fp32");
Contributor

Please add an fp16 test case and a negative test case (maybe in a later PR?).

@jijoongmoon (Collaborator) left a comment

LGTM

@jijoongmoon merged commit 46400ac into nnstreamer:main on Jul 3, 2024
40 checks passed