[GPU/OpenCL] Added fp16 support for FC layer on GPU #2609
Conversation
Added blas_kernels_fp16.cpp for fp16 kernels. fp16 unit tests added. Signed-off-by: Debadri Samaddar <[email protected]>
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2609. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before the review process from reviewers starts. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.
@s-debadri, 💯 All CI checkers are successfully verified. Thanks.
```diff
@@ -115,7 +115,7 @@ void sgemv_cl(const float *matAdata, const float *vecXdata, float *vecYdata,
     break;
   }

-  result = kernel_sgemv.SetKernelArguments(4, &dim2, sizeof(int));
+  result = kernel_sgemv.SetKernelArguments(4, &lda, sizeof(int));
```
Is this a bugfix? It doesn't look like an fp16 implementation.
If this is a bugfix, please explain what was wrong in a dedicated bugfix commit message.
Could you please separate the bugfix commit from the feature-implementation commit?
Mixing the two topics in a single commit confuses reviewers.
I have used `lda` in the SGEMV kernel to generalize it for future use.
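For context, `lda` is the standard BLAS name for the leading dimension of the matrix, i.e. the stride between consecutive rows, which lets the same kernel operate on submatrices whose stride differs from their logical column count. Below is a minimal sketch of such a generalized SGEMV kernel; the kernel name, signature, and row-major layout are illustrative assumptions, not the PR's actual code:

```c
// Illustrative sketch of an SGEMV OpenCL kernel that takes lda (the
// leading dimension of A) instead of assuming the row stride equals
// the column count. Names and parameter order are assumptions.
__kernel void sgemv_cl_kernel(const __global float *A,
                              const __global float *X,
                              __global float *Y,
                              unsigned int M,    /* rows of A       */
                              unsigned int N,    /* cols of A       */
                              unsigned int lda)  /* row stride of A */ {
  unsigned int row = get_global_id(0); /* one thread per output row */
  if (row < M) {
    float acc = 0.0f;
    for (unsigned int col = 0; col < N; ++col)
      acc += A[row * lda + col] * X[col]; /* lda generalizes stride */
    Y[row] = acc;
  }
}
```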
Please identify the changes in `blas_kernels.cpp` before merging. They appear unrelated to the other changes.
PTAL: @skykongkong8 @lhs8928
It's good to read your contributions in GPU enablement. One quick question: do you have a plan to further improve the kernels? For example, `sgemv_cl_kernel`'s parallel level is one thread per component of the output vector, which can be further parallelized.
It would be great to know the current speed-up status compared to the CPU.
Yes, the kernels will be further improved going forward, depending on the extent of optimization we can achieve. Currently we are focusing on implementing the initial skeleton for running an LLM on the GPU.
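To illustrate the reviewer's point, one common way to raise the parallel level beyond one thread per output component is to dedicate a whole work-group to each output row and reduce partial sums in local memory. This is a sketch under assumed names and a power-of-two work-group size, not the project's code:

```c
// Illustrative only: work-group-per-row SGEMV with a local-memory
// tree reduction. Assumes get_local_size(0) is a power of two and
// that 'partial' is sized to the work-group via clSetKernelArg.
__kernel void sgemv_cl_kernel_wg(const __global float *A,
                                 const __global float *X,
                                 __global float *Y,
                                 unsigned int N, unsigned int lda,
                                 __local float *partial) {
  unsigned int row = get_group_id(0); /* one work-group per row */
  unsigned int lid = get_local_id(0);
  unsigned int lsz = get_local_size(0);

  float acc = 0.0f;
  for (unsigned int col = lid; col < N; col += lsz)
    acc += A[row * lda + col] * X[col]; /* strided partial dot product */
  partial[lid] = acc;
  barrier(CLK_LOCAL_MEM_FENCE);

  for (unsigned int s = lsz / 2; s > 0; s >>= 1) { /* tree reduction */
    if (lid < s)
      partial[lid] += partial[lid + s];
    barrier(CLK_LOCAL_MEM_FENCE);
  }
  if (lid == 0)
    Y[row] = partial[0];
}
```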
It was one of my suggestions from previous reviews to use terms like `lda`, `ldb`, or `ldc`, although it might have been better to separate the feature-implementation commit from the bugfix commit. I can confirm the current implementation is more desirable than before.
Not in this PR, but the blas_kernel code needs to be under the tensor directory for better maintenance.
FC Layer GPU kernels added for `fp16` operation:
- Added `blas_kernels_fp16.cpp` for BLAS `fp16` OpenCL kernels (a sketch of such a kernel follows after this description).
- Used `lda` for SGEMV computation for generalization.
- Added unit tests for `fp16` support on GPU.

Self evaluation:

Signed-off-by: Debadri Samaddar <[email protected]>
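As referenced in the first bullet above, here is a minimal sketch of what an fp16 SGEMV OpenCL kernel can look like. It assumes the `cl_khr_fp16` extension and fp32 accumulation; the kernel name and signature are hypothetical and may differ from `blas_kernels_fp16.cpp`:

```c
// Hypothetical fp16 SGEMV sketch. Direct half arithmetic in OpenCL C
// requires the cl_khr_fp16 extension. Names are illustrative only.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel void sgemv_cl_kernel_fp16(const __global half *A,
                                   const __global half *X,
                                   __global half *Y,
                                   unsigned int N, unsigned int lda) {
  unsigned int row = get_global_id(0);      /* one thread per output row */
  float acc = 0.0f;                         /* accumulate in fp32 to     */
  for (unsigned int col = 0; col < N; ++col) /* limit rounding error     */
    acc += (float)A[row * lda + col] * (float)X[col];
  Y[row] = (half)acc;                       /* narrow back to fp16       */
}
```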