[GPU/OpenCL] Initial version of Addition Layer with OpenCL ops #2606
Conversation
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2606. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments from reviewers quickly. Your PR must pass all verification processes of cibot before the review process by reviewers can start. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.
@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.
    addition_cl(data, rdata, size, context);
    } else
      throw std::invalid_argument("Error: OpenCL fp16 is not supported yet.");
I think this is because we have not implemented FP16 yet, isn't it? If so, then I think supporting FP16 is the more important part, because LLMs run on FP16.
Added FP16 support for Addition Layer in the latest commit.
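For reference, a minimal sketch of what an element-wise FP16 addition kernel in OpenCL C can look like is shown below. The kernel name, argument order, and reliance on the cl_khr_fp16 extension are illustrative assumptions, not the exact code merged in this PR.

    // Illustrative sketch only: a naive element-wise FP16 addition kernel.
    // Assumes the device exposes the cl_khr_fp16 extension.
    #pragma OPENCL EXTENSION cl_khr_fp16 : enable

    __kernel void addition_cl_fp16(__global const half *input,
                                   __global half *output,
                                   const unsigned int size) {
      size_t idx = get_global_id(0);
      if (idx < size) {
        // Accumulate the input into the output buffer, one element per work-item.
        output[idx] = output[idx] + input[idx];
      }
    }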
    input_step_dim.batch(1);
    input_step_dim.height(to - from);

    Tensor input_step = input_.getSharedDataTensor(
I wonder, if we do it this way, don't we need to transfer the data from CPU to GPU? Doesn't that affect the latency?
@yashSingh0723 ping!
@jijoongmoon This optimisation will be a part of future PRs.
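To illustrate the latency concern discussed above, the hypothetical host-side sketch below shows the per-call pattern in which every layer invocation pays a host-to-device copy, a kernel launch, and a device-to-host copy. The function and buffer names are assumptions for illustration and are not taken from the nntrainer code base; keeping tensors resident on the device between calls would remove the two copies from the per-call path.

    // Hypothetical sketch (not nntrainer code): per-call CPU<->GPU round trip.
    #include <CL/cl.h>
    #include <cstddef>

    void add_with_round_trip(cl_command_queue queue, cl_kernel addition_kernel,
                             cl_mem d_input, cl_mem d_output,
                             const float *h_input, float *h_output, size_t count) {
      const size_t bytes = count * sizeof(float);
      const cl_uint size = static_cast<cl_uint>(count);

      // 1. Host -> device copy (blocking): paid on every call.
      clEnqueueWriteBuffer(queue, d_input, CL_TRUE, 0, bytes, h_input, 0, nullptr, nullptr);

      // 2. Kernel launch: the element-wise addition itself.
      clSetKernelArg(addition_kernel, 0, sizeof(cl_mem), &d_input);
      clSetKernelArg(addition_kernel, 1, sizeof(cl_mem), &d_output);
      clSetKernelArg(addition_kernel, 2, sizeof(cl_uint), &size);
      size_t global_size = count;
      clEnqueueNDRangeKernel(queue, addition_kernel, 1, nullptr, &global_size,
                             nullptr, 0, nullptr, nullptr);

      // 3. Device -> host copy (blocking): paid on every call.
      clEnqueueReadBuffer(queue, d_output, CL_TRUE, 0, bytes, h_output, 0, nullptr, nullptr);
    }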
Force-pushed from 5de81ff to a8f089f
@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.
Force-pushed from 9976775 to a98c244
@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.
LGTM!
Force-pushed from cf43132 to 39bd0d9
@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.
@yashSingh0723 LGTM. However, please address Jijoong's concern before merging.
[GPU/OpenCL] Initial version of Addition Layer with OpenCL ops
Added naive version of OpenCL implementation for Addition Layer. Incorporated kernel for ops used. Added unit test for addition_layer_cl. Signed-off-by: yash.singh <[email protected]>

[GPU/OpenCL] Addition Kernel added in reusable blas OpenCL kernels
Added addition kernel to enhance reusability of the common blas kernels. Used AdditionLayer interface for both CPU and GPU calls. Signed-off-by: yash.singh <[email protected]>
Force-pushed from 39bd0d9 to 3a17d5f
cibot: @yashSingh0723, A builder checker could not be completed because one of the checkers did not finish. To find out the reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2606-202406241907050.10789704322815-3a17d5f84c1064d30aa730011032456d1efe5e94/.
Added fp16 support for Addition layer.
Added unit tests for fp16 support.
Updated the Layer Semantics for GPU.
Signed-off-by: yash.singh <[email protected]>
Force-pushed from 3a17d5f to 5ad3e53
@yashSingh0723, 💯 All CI checkers are successfully verified. Thanks.
LGTM
Added initial version of Addition Layer for GPU. This is a basic implementation using a naive kernel; an illustrative sketch of such a kernel follows the change list below.
Changes added with this PR:
- addition_layer_cl.cpp added, containing the new AdditionLayerCL class for the OpenCL implementation.
- LayerKernel enum added inside layer_context.h.
- unittest_layers_addition_cl.cpp added to test Addition Layer on GPU.

Signed-off-by: Yash Singh [email protected]
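As a rough illustration of the naive element-wise addition kernel mentioned in the description above, here is a minimal OpenCL C sketch. The kernel name, signature, and in-place accumulation are assumptions made for illustration, not the exact kernel merged in this PR.

    // Illustrative sketch only: a naive element-wise FP32 addition kernel.
    __kernel void addition_cl(__global const float *input,
                              __global float *output,
                              const unsigned int size) {
      size_t idx = get_global_id(0);
      if (idx < size) {
        // out += in, one element per work-item.
        output[idx] = output[idx] + input[idx];
      }
    }

Each work-item handles a single element, so the only tuning knob is the global work size; this is what makes the kernel "naive" compared to vectorized or tiled variants.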