[GPU/OpenCL] Added fp16 support for FC layer on GPU #2609

Merged: 1 commit merged into nnstreamer:main on Jun 4, 2024

Conversation

@s-debadri (Contributor) commented:

FC Layer GPU kernels added for fp16 operation:

  • Added blas_kernels_fp16.cpp for BLAS fp16 OpenCL kernels (an illustrative sketch follows this list).
  • Used lda in the SGEMV computation for generalization.
  • Added unit tests for FC layer fp16 support on GPU.
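
A minimal sketch of what such an fp16 SGEMV kernel can look like (illustrative only: the identifiers and the row-major layout with leading dimension lda are assumptions, not the exact code in blas_kernels_fp16.cpp):

```c
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

/* One work-item computes one element of Y = A * X. lda is the row stride
 * of A, so the same kernel also works on sub-matrix views where lda > N.
 * Partial products are accumulated in fp32 for accuracy. */
__kernel void sgemv_cl_fp16(const __global half *A, const __global half *X,
                            __global half *Y, unsigned int N,
                            unsigned int lda) {
  unsigned int row = get_global_id(0); /* one thread per output element */
  float sum = 0.0f;
  for (unsigned int col = 0; col < N; ++col)
    sum += (float)A[row * lda + col] * (float)X[col];
  Y[row] = (half)sum;
}
```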

Self evaluation:

  1. Build test: [X] Passed [ ] Failed [ ] Skipped
  2. Run test: [X] Passed [ ] Failed [ ] Skipped

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>

Added blas_kernels_fp16.cpp for fp16 kernels.
fp16 unit tests added.

Signed-off-by: Debadri Samaddar <[email protected]>
@taos-ci commented May 29, 2024:

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2609. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments from reviewers quickly. Your PR must pass all verification processes of cibot before the reviewers start a review. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.

@taos-ci left a comment:

@s-debadri, 💯 All CI checkers are successfully verified. Thanks.

```diff
@@ -115,7 +115,7 @@ void sgemv_cl(const float *matAdata, const float *vecXdata, float *vecYdata,
       break;
   }
 
-  result = kernel_sgemv.SetKernelArguments(4, &dim2, sizeof(int));
+  result = kernel_sgemv.SetKernelArguments(4, &lda, sizeof(int));
```
@myungjoo (Member) commented Jun 1, 2024:

Is this a bugfix? It doesn't look like part of an fp16 implementation.
If this is a bugfix, please explain what was wrong in a bugfix commit message.

Could you please separate the bugfix commit from the feature-implementation commit?
Mixing the two topics in a single commit confuses reviewers.

@s-debadri (Contributor, Author) replied:

I have used lda in the SGEMV kernel to generalize it for future use.
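
For concreteness, a standalone illustration of why lda generalizes the computation (hypothetical sizes, not values from the PR): the kernel indexes A[row * lda + col], so a full matrix uses lda == dim2, while a sub-matrix view inside a wider buffer simply passes a larger lda.

```cpp
// Standalone sketch: an M x N logical matrix stored in a buffer whose
// rows are lda floats wide (lda > N simulates a sub-matrix view).
#include <cstdio>

int main() {
  const unsigned int M = 2, N = 3; // logical GEMV size
  const unsigned int lda = 4;      // row stride of the underlying buffer
  float A[M * lda] = {1, 2, 3, 9,  // the 9s are padding beyond column N
                      4, 5, 6, 9};
  float X[N] = {1, 1, 1};
  for (unsigned int row = 0; row < M; ++row) {
    float sum = 0.0f;
    for (unsigned int col = 0; col < N; ++col)
      sum += A[row * lda + col] * X[col]; // same indexing as the kernel
    std::printf("Y[%u] = %g\n", row, sum); // prints 6 and 15; padding skipped
  }
}
```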

@myungjoo (Member) left a comment:

Please identify the changes in blas_kernels.cpp before merging. They appear unrelated to the other changes.

PTAL: @skykongkong8 @lhs8928

@EunjuYang (Contributor) left a comment:

It's good to read your contributions in GPU enablement. One quick question: do you have a plan to further improve the kernels? For example, sgemv_cl_kernel's parallelism is one thread per component of the output vector, which could be parallelized further.
It would also be great to know the current speed-up compared to the CPU.
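
For illustration, a hedged sketch of one possible further-parallelized variant (hypothetical, not code from this PR): one work-group per output row, with work-items striding across the columns and a local-memory tree reduction (the local size is assumed to be a power of two):

```c
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel void sgemv_cl_fp16_reduce(const __global half *A,
                                   const __global half *X, __global half *Y,
                                   unsigned int N, unsigned int lda,
                                   __local float *scratch) {
  unsigned int row = get_group_id(0);   /* one work-group per output row */
  unsigned int lid = get_local_id(0);
  unsigned int lsz = get_local_size(0); /* assumed power of two */

  /* Each work-item accumulates a strided slice of the dot product. */
  float sum = 0.0f;
  for (unsigned int col = lid; col < N; col += lsz)
    sum += (float)A[row * lda + col] * (float)X[col];
  scratch[lid] = sum;
  barrier(CLK_LOCAL_MEM_FENCE);

  /* Tree reduction of the partial sums in local memory. */
  for (unsigned int s = lsz / 2; s > 0; s >>= 1) {
    if (lid < s)
      scratch[lid] += scratch[lid + s];
    barrier(CLK_LOCAL_MEM_FENCE);
  }
  if (lid == 0)
    Y[row] = (half)scratch[0];
}
```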

@s-debadri (Contributor, Author) commented:

> It's good to read your contributions in GPU enablement. One quick question: do you have a plan to further improve the kernels? For example, sgemv_cl_kernel's parallelism is one thread per component of the output vector, which could be parallelized further. It would also be great to know the current speed-up compared to the CPU.

Yes, the kernels will be further improved going forward, depending on the extent of optimization we can achieve. Currently we are focusing on implementing the initial skeleton for running an LLM on the GPU.

@skykongkong8 (Member) commented:

> Please identify the changes in blas_kernels.cpp before merging. They appear unrelated to the other changes.
>
> PTAL: @skykongkong8 @lhs8928

Using terms like lda, ldb, or ldc was one of my suggestions in previous reviews, although it might have been better to separate the feature-implementation commit from the bugfix commit. I can confirm that the current implementation is more desirable than before.

@jijoongmoon (Collaborator) commented:

Not in this PR, but the blas_kernel code needs to move under the tensor directory for better maintainability.

@jijoongmoon merged commit 3874915 into nnstreamer:main on Jun 4, 2024. 33 checks passed.
@s-debadri deleted the gpu_fc_fp16 branch on June 5, 2024.