[GPU/OpenCL] Added fp16 support for FC layer on GPU #2609

Merged: 1 commit merged into nnstreamer:main on Jun 4, 2024

Conversation

@s-debadri (Contributor) commented:

FC Layer GPU kernels added for fp16 operation:

  • Added blas_kernels_fp16.cpp for BLAS fp16 OpenCL kernels (an illustrative sketch follows this list).
  • Used lda in the SGEMV computation for generalization.
  • Added unit tests for FC layer fp16 support on GPU.
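
A minimal sketch of what such an fp16 SGEMV kernel can look like (illustrative only: the identifiers and the row-major layout with leading dimension lda are assumptions, not the exact code in blas_kernels_fp16.cpp):

```c
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

/* One work-item computes one element of Y = A * X. lda is the row stride
 * of A, so the same kernel also works on sub-matrix views where lda > N.
 * Partial products are accumulated in fp32 for accuracy. */
__kernel void sgemv_cl_fp16(const __global half *A, const __global half *X,
                            __global half *Y, unsigned int N,
                            unsigned int lda) {
  unsigned int row = get_global_id(0); /* one thread per output element */
  float sum = 0.0f;
  for (unsigned int col = 0; col < N; ++col)
    sum += (float)A[row * lda + col] * (float)X[col];
  Y[row] = (half)sum;
}
```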

Self evaluation:

  1. Build test: [X] Passed [ ] Failed [ ] Skipped
  2. Run test: [X] Passed [ ] Failed [ ] Skipped

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>

Added blas_kernels_fp16.cpp for fp16 kernels.
fp16 unit tests added.

Signed-off-by: Debadri Samaddar <[email protected]>
@taos-ci commented May 29, 2024:

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2609. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments from reviewers quickly. Your PR must pass all verification processes of cibot before the reviewers start a review. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.

@taos-ci left a comment:

@s-debadri, 💯 All CI checkers are successfully verified. Thanks.

```diff
@@ -115,7 +115,7 @@ void sgemv_cl(const float *matAdata, const float *vecXdata, float *vecYdata,
       break;
   }
 
-  result = kernel_sgemv.SetKernelArguments(4, &dim2, sizeof(int));
+  result = kernel_sgemv.SetKernelArguments(4, &lda, sizeof(int));
```
@myungjoo (Member) commented Jun 1, 2024:

Is this a bugfix? It doesn't look like part of an fp16 implementation.
If this is a bugfix, please explain what was wrong in a bugfix commit message.

Could you please separate the bugfix commit from the feature-implementation commit?
Mixing the two topics in a single commit confuses reviewers.

@s-debadri (Contributor, Author) replied:

I have used lda in the SGEMV kernel to generalize it for future use.
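
For concreteness, a standalone illustration of why lda generalizes the computation (hypothetical sizes, not values from the PR): the kernel indexes A[row * lda + col], so a full matrix uses lda == dim2, while a sub-matrix view inside a wider buffer simply passes a larger lda.

```cpp
// Standalone sketch: an M x N logical matrix stored in a buffer whose
// rows are lda floats wide (lda > N simulates a sub-matrix view).
#include <cstdio>

int main() {
  const unsigned int M = 2, N = 3; // logical GEMV size
  const unsigned int lda = 4;      // row stride of the underlying buffer
  float A[M * lda] = {1, 2, 3, 9,  // the 9s are padding beyond column N
                      4, 5, 6, 9};
  float X[N] = {1, 1, 1};
  for (unsigned int row = 0; row < M; ++row) {
    float sum = 0.0f;
    for (unsigned int col = 0; col < N; ++col)
      sum += A[row * lda + col] * X[col]; // same indexing as the kernel
    std::printf("Y[%u] = %g\n", row, sum); // prints 6 and 15; padding skipped
  }
}
```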

@myungjoo (Member) left a comment:

Please identify the changes in blas_kernels.cpp before merging. They appear unrelated to the other changes.

PTAL: @skykongkong8 @lhs8928

@EunjuYang (Contributor) left a comment:

It's good to read your contributions in GPU enablement. One quick question: do you have a plan to further improve the kernels? For example, sgemv_cl_kernel's parallelism is one thread per component of the output vector, which could be parallelized further.
It would also be great to know the current speed-up compared to the CPU.
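
For illustration, a hedged sketch of one possible further-parallelized variant (hypothetical, not code from this PR): one work-group per output row, with work-items striding across the columns and a local-memory tree reduction (the local size is assumed to be a power of two):

```c
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel void sgemv_cl_fp16_reduce(const __global half *A,
                                   const __global half *X, __global half *Y,
                                   unsigned int N, unsigned int lda,
                                   __local float *scratch) {
  unsigned int row = get_group_id(0);   /* one work-group per output row */
  unsigned int lid = get_local_id(0);
  unsigned int lsz = get_local_size(0); /* assumed power of two */

  /* Each work-item accumulates a strided slice of the dot product. */
  float sum = 0.0f;
  for (unsigned int col = lid; col < N; col += lsz)
    sum += (float)A[row * lda + col] * (float)X[col];
  scratch[lid] = sum;
  barrier(CLK_LOCAL_MEM_FENCE);

  /* Tree reduction of the partial sums in local memory. */
  for (unsigned int s = lsz / 2; s > 0; s >>= 1) {
    if (lid < s)
      scratch[lid] += scratch[lid + s];
    barrier(CLK_LOCAL_MEM_FENCE);
  }
  if (lid == 0)
    Y[row] = (half)scratch[0];
}
```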

@s-debadri (Contributor, Author) commented:

> It's good to read your contributions in GPU enablement. One quick question: do you have a plan to further improve the kernels? For example, sgemv_cl_kernel's parallelism is one thread per component of the output vector, which could be parallelized further. It would also be great to know the current speed-up compared to the CPU.

Yes, the kernels will be further improved going forward, depending on the extent of optimization we can achieve. Currently we are focusing on implementing the initial skeleton for running an LLM on the GPU.

@skykongkong8 (Member) commented:

> Please identify the changes in blas_kernels.cpp before merging. They appear unrelated to the other changes.
>
> PTAL: @skykongkong8 @lhs8928

Using terms like lda, ldb, or ldc was one of my suggestions in previous reviews, although it might have been better to separate the feature-implementation commit from the bugfix commit. I can confirm that the current implementation is more desirable than before.

@jijoongmoon (Collaborator) commented:

Not in this PR, but the blas_kernel code needs to move under the tensor directory for better maintainability.

@jijoongmoon merged commit 3874915 into nnstreamer:main on Jun 4, 2024. 33 checks passed.
@s-debadri deleted the gpu_fc_fp16 branch on June 5, 2024.