
[GPU/OpenCL] Broadcasting support added for GPU Addition kernel. #2759

Merged: 1 commit into nnstreamer:main, Nov 22, 2024

Conversation

niket-agarwal (Contributor)

@niket-agarwal niket-agarwal commented Oct 17, 2024

Performs addition where the dimensions of InputA and InputB differ.
Broadcasting support is added only for the case where the number of batches differs while all other dimensions are the same for both inputs.
The number of batches of InputB must be 1.
The output of add_i_cl(A, B) is stored in A in place.

Self evaluation:

Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped
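The broadcast rule described above (all dimensions equal except the batch count, with InputB fixed at one batch) can be sketched as a CPU reference. This is an illustrative sketch only, with std::vector standing in for nntrainer's Tensor and a hypothetical function name, not project code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the broadcast semantics described above (hypothetical
// helper, not the actual nntrainer kernel): A holds `batch` batches of
// `feature_len` elements each, B holds exactly one batch of `feature_len`
// elements. The sum is accumulated into A in place, mirroring add_i_cl(A, B).
void add_i_broadcast_ref(std::vector<float> &A, const std::vector<float> &B,
                         std::size_t batch) {
  const std::size_t feature_len = B.size();
  assert(A.size() == batch * feature_len);
  for (std::size_t b = 0; b < batch; ++b)
    for (std::size_t i = 0; i < feature_len; ++i)
      A[b * feature_len + i] += B[i]; // B is reused for every batch of A
}
```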

@taos-ci

taos-ci commented Oct 17, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2759. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments from reviewers quickly. Your PR must pass all verification processes of cibot before the review process by reviewers can start. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.

@taos-ci taos-ci left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.


CREATE_IF_EMPTY_DIMS(result, result.getDim());
CREATE_IF_EMPTY_DIMS(inputA, inputA.getDim());
Contributor

I have a question. The result tensor could be empty before this modification, but is it possible for the inputA tensor to be empty with the current modification?

Contributor Author

Right! It shouldn't be. I'll modify it, thanks.

@niket-agarwal niket-agarwal force-pushed the broadcasting branch 2 times, most recently from 237e218 to b60e13f on October 18, 2024 08:05
@taos-ci taos-ci left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

@niket-agarwal niket-agarwal force-pushed the broadcasting branch 2 times, most recently from 6d063ee to 52c823a on October 29, 2024 08:27
@taos-ci taos-ci left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

@myungjoo (Member)

PTAL: @baek2sm @skykongkong8 @EunjuYang

addition_cl(data, rdata, size);

} else if (input.getDataType() == ml::train::TensorDim::DataType::FP16) {
void add_i_cl(Tensor &inputA, Tensor const &inputB) {
Contributor

Is there any reason you changed input & result to inputA and inputB?

Contributor

If there is an output with inputA and inputB in the future, this change would make sense. If not, I think it would be better to preserve the naming input and result.

Contributor

I feel the same way. I guess the intention of this naming was to express "a = a + b" more clearly (before, it was "result(b) = input(a) + result(b)"). So the inputA and inputB parameter names are fine as well, but they seem inconsistent with the other functions.

Contributor Author

Okay I'll update with this naming convention.

if (idx < size) {
output[idx] = output[idx] + input[idx];
if (idx < size_res) {
output[idx] = output[idx] + input[idx % size_input];
Contributor

For this kernel, we are assuming size_res is always greater than or equal to size_input, right?

Contributor Author

Yes correct.
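Since size_res covers all batches of the result while size_input covers the single batch of the input, size_res is a whole multiple of size_input, so idx % size_input wraps the input across every batch of the result. A host-side sketch of the same indexing (illustrative only, not the actual OpenCL kernel source):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side equivalent of the kernel body above: every "work-item" index
// idx below size_res adds the input element at idx % size_input, which
// repeats the single-batch input across each batch of the result. This
// relies on size_res being a whole multiple of size_input.
void broadcast_add_host(std::vector<float> &output,
                        const std::vector<float> &input) {
  const std::size_t size_res = output.size();
  const std::size_t size_input = input.size();
  assert(size_res % size_input == 0);
  for (std::size_t idx = 0; idx < size_res; ++idx)
    output[idx] += input[idx % size_input];
}
```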

*/
void add_i_cl(Tensor const &input, Tensor &result);
void add_i_cl(Tensor &inputA, Tensor const &inputB);
Contributor

I'm assuming inputA is the result and inputB is an input. Is this correct?

Contributor Author

Both are inputs; the addition is performed in place, and inputA is returned as the output.
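The signature itself expresses that contract. A minimal sketch, with std::vector standing in for nntrainer's Tensor and a hypothetical function name: the non-const reference marks the operand that is overwritten in place, and the const reference marks the read-only operand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch of the add_i_cl contract (not the real API): inputA
// is both an operand and the destination; inputB is never modified.
void add_i(std::vector<float> &inputA, const std::vector<float> &inputB) {
  assert(inputA.size() == inputB.size());
  for (std::size_t i = 0; i < inputA.size(); ++i)
    inputA[i] += inputB[i]; // result accumulates into inputA in place
}
```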

@EunjuYang EunjuYang left a comment

Thank you for your contribution.
I left my opinions and questions below. Please check them. Thanks.

nntrainer/tensor/cl_operations/blas_kernel_interface.h (outdated; resolved)
nntrainer/tensor/cl_operations/blas_kernels.h (outdated; resolved)
nntrainer/tensor/cl_operations/blas_kernels_fp16.cpp (outdated; resolved)
@niket-agarwal niket-agarwal force-pushed the broadcasting branch 2 times, most recently from ed95ee3 to fcaad63 on November 13, 2024 07:30
@taos-ci taos-ci left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

@EunjuYang EunjuYang left a comment

In the current main branch, unittest_layers_addition_cl is disabled.

# ../unittest/layers/unittest_layers_addition_cl.cpp \

Please enable it and check that unittest_layers passes all unit test cases with ./tools/android_test.sh. If you find *.nnlayergolden missing errors, please update unittest_layers.tar.gz as well (cf. #2798).

@EunjuYang (Contributor)

Also, since you added the new feature of broadcasting support for the addition kernel, what about adding a unit test for that case?
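Such a unit test could take roughly this shape. Plain checks stand in for the project's gtest harness here, and all names and sizes are hypothetical, not the test that was actually added to unittest_blas_kernels_cl.cpp:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical broadcast-addition test case: A has 2 batches of 6 elements,
// all 1.0f; B has a single batch with B[i] = i. After the kernel-style
// modulo add, every batch of A should have gained B exactly once.
bool broadcast_add_case() {
  const std::size_t batch = 2, feature_len = 6;
  std::vector<float> A(batch * feature_len, 1.0f);
  std::vector<float> B(feature_len);
  for (std::size_t i = 0; i < feature_len; ++i)
    B[i] = static_cast<float>(i);

  // Same indexing as the GPU kernel: idx % size_input broadcasts B.
  for (std::size_t idx = 0; idx < A.size(); ++idx)
    A[idx] += B[idx % feature_len];

  // Verify against the expected per-batch result.
  for (std::size_t b = 0; b < batch; ++b)
    for (std::size_t i = 0; i < feature_len; ++i)
      if (A[b * feature_len + i] != 1.0f + static_cast<float>(i))
        return false;
  return true;
}
```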

@djeong20 djeong20 left a comment

Overall, LGTM!
As @EunjuYang mentioned, please add test cases for the newly added feature.

@niket-agarwal niket-agarwal force-pushed the broadcasting branch 2 times, most recently from 4c30910 to 45e15a2 on November 19, 2024 10:46
@taos-ci taos-ci left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

Added support for the case where the number of batches differs between input A and input B.
Added unit test case for new feature in unittest_blas_kernels_cl.cpp

Self evaluation:

    Build test: [X]Passed [ ]Failed [ ]Skipped
    Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Niket Agarwal <[email protected]>
@taos-ci taos-ci left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

@EunjuYang EunjuYang left a comment

LGTM!

@jijoongmoon jijoongmoon merged commit 7bd8e55 into nnstreamer:main Nov 22, 2024
38 checks passed
7 participants