[GPU/OpenCL] Broadcasting support added for GPU Addition kernel. #2759
Conversation
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2759. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before starting a review process from reviewers. If you are a new member joining this project, please read the manuals in the documentation folder and wiki page. In order to monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
CREATE_IF_EMPTY_DIMS(result, result.getDim());
CREATE_IF_EMPTY_DIMS(inputA, inputA.getDim());
I have a question. Before this change, the result tensor could be empty, but is it possible that the inputA tensor is empty in the current modification?
Right! It shouldn't be. I'll modify it, thanks.
Force-pushed from 237e218 to b60e13f
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
Force-pushed from b60e13f to aeb83e7
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
Force-pushed from aeb83e7 to 507f943
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
Force-pushed from 6d063ee to 52c823a
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
Force-pushed from 52c823a to 312b6ff
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
PTAL: @baek2sm @skykongkong8 @EunjuYang
addition_cl(data, rdata, size);
} else if (input.getDataType() == ml::train::TensorDim::DataType::FP16) {
void add_i_cl(Tensor &inputA, Tensor const &inputB) {
Is there any reason you changed input and result to inputA and inputB?
If there is going to be a separate output along with inputA and inputB in the future, this change would make sense. If not, I think it would be better to preserve the naming input and result.
I feel the same way. I guess the intention of this naming was to express "a = a + b" more clearly (before, it was "result(b) = input(a) + result(b)"). So the inputA and inputB parameter names are also good, but they seem to be inconsistent with the other functions.
Okay I'll update with this naming convention.
if (idx < size) {
  output[idx] = output[idx] + input[idx];
if (idx < size_res) {
  output[idx] = output[idx] + input[idx % size_input];
For this kernel, we are assuming size_res is always greater than or equal to size_input, right?
Yes correct.
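(For readers following the diff, here is a minimal host-side C++ sketch of the indexing the kernel performs. The function name add_broadcast_ref and the plain std::vector types are illustrative only, not part of nntrainer, and the sketch assumes size_res is an exact multiple of size_input, e.g. inputB has batch 1 while the result has batch N with all other dimensions equal.)

#include <cstddef>
#include <vector>

// CPU reference of the broadcast addition performed by the OpenCL kernel:
// every element of `output` accumulates an element of `input`, and
// idx % size_input wraps the smaller input over each batch of the output.
void add_broadcast_ref(std::vector<float> &output,
                       const std::vector<float> &input) {
  const std::size_t size_res = output.size();   // total elements of the result
  const std::size_t size_input = input.size();  // total elements of inputB
  for (std::size_t idx = 0; idx < size_res; ++idx) {
    output[idx] = output[idx] + input[idx % size_input];
  }
}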
*/
void add_i_cl(Tensor const &input, Tensor &result);
void add_i_cl(Tensor &inputA, Tensor const &inputB);
I'm assuming inputA is the result and inputB is an input. Is this correct?
Both are inputs; the addition is done in place, and inputA is returned as the output.
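(To make that contract concrete, here is a tiny self-contained sketch using a hypothetical stand-in struct rather than the real nntrainer Tensor class: the first, non-const parameter is both read and written, while the second, const parameter is only read and is broadcast when smaller.)

#include <cstddef>
#include <vector>

// Hypothetical stand-in type for illustration; the real function takes nntrainer Tensors.
struct FakeTensor {
  std::vector<float> data;
};

// Mirrors the shape of add_i_cl(Tensor &inputA, Tensor const &inputB):
// inputA is accumulated in place (it is also the output), inputB is read-only.
void add_i_ref(FakeTensor &inputA, const FakeTensor &inputB) {
  for (std::size_t i = 0; i < inputA.data.size(); ++i) {
    inputA.data[i] += inputB.data[i % inputB.data.size()];
  }
}

int main() {
  FakeTensor a{{1.f, 2.f, 3.f, 4.f}}; // "batch 2", two elements per batch
  FakeTensor b{{10.f, 20.f}};         // "batch 1", broadcast over a
  add_i_ref(a, b);
  // a.data is now {11, 22, 13, 24}; b is unchanged.
  return 0;
}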
Thank you for your contribution.
I left my opinions and questions below. Please check them. Thanks.
Force-pushed from ed95ee3 to fcaad63
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
Force-pushed from fcaad63 to 9dbde98
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
In the current main branch, unittest_layers_addition_cl is disabled:

Line 479 in a80a6e1
# ../unittest/layers/unittest_layers_addition_cl.cpp \

Please enable it and check that unittest_layers passes all unit test cases with ./tools/android_test.sh. If you find *.nnlayergolden missing errors, please update unittest_layers.tar.gz as well (c.f. #2798).
Also, since you added the new feature of broadcasting support for the addition kernel, what about adding a unit test for that case?
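(A rough illustration of the kind of broadcast test being requested; this is not the actual nntrainer test fixture or its Tensor/add_i_cl API. The helper, test names, and expected values below are made up for the example, and the real test would run the OpenCL kernel via add_i_cl and compare against a CPU-computed answer.)

#include <gtest/gtest.h>
#include <cstddef>
#include <vector>

// Stand-in CPU reference of the broadcast add, used only for this sketch.
static void add_broadcast_ref(std::vector<float> &out,
                              const std::vector<float> &in) {
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] += in[i % in.size()];
}

// Broadcast case: inputA has batch 2, inputB has batch 1, other dims equal.
TEST(blas_kernels_sketch, addition_broadcast_batch) {
  std::vector<float> a = {1.f, 2.f, 3.f, 4.f}; // batch 2 x 2 elements
  std::vector<float> b = {10.f, 20.f};         // batch 1 x 2 elements
  add_broadcast_ref(a, b);
  const std::vector<float> expected = {11.f, 22.f, 13.f, 24.f};
  for (std::size_t i = 0; i < expected.size(); ++i)
    EXPECT_FLOAT_EQ(a[i], expected[i]);
}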
Overall, LGTM!
As @EunjuYang mentioned, please add test cases for the newly added feature.
Force-pushed from 4c30910 to 45e15a2
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
Added support where the number of batches varies for input A and input B.
Added a unit test case for the new feature in unittest_blas_kernels_cl.cpp.

Self evaluation:
Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Niket Agarwal <[email protected]>
Force-pushed from 45e15a2 to 9591434
@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
LGTM!
Performs addition where the dimensions of InputA and InputB differ.
Broadcasting support is added only for the case where the number of batches differs and the other dimensions are the same for both inputs.
The batch size of InputB must be 1.
The output of add_i_cl(A, B) is stored in A in place.
Self evaluation: