Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hgemm/bugfix] Added conditions for 1x8 and 1x4 kernel calls to enhance accuracy @open sesame 05/09 09:04 #2573

Merged
merged 3 commits into from
May 13, 2024

Conversation

s-debadri
Copy link
Contributor

@s-debadri s-debadri commented May 8, 2024

Changes added in this PR:

  • K%8 == 0 condition added before calling 1x4 and 1x8 kernels to enhance accuracy.
  • Added hgemm_noTrans_4x4 before hgemm_noTrans_1x8.
  • Added test cases for hgemm_noTrans_1x8.
  • Used bitmasks for dimension checks to reduce latency.

Self evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Debadri Samaddar [email protected]

Moving 1x8 kernel call after 4x4 kernel call.
Added couple of testcases.

Signed-off-by: Debadri Samaddar <[email protected]>
@taos-ci
Copy link

taos-ci commented May 8, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2573. Please a submit 1commit/1PR (one commit per one PR) policy to get comments quickly from reviewers. Your PR must pass all verificiation processes of cibot before starting a review process from reviewers. If you are new member to join this project, please read manuals in documentation folder and wiki page. In order to monitor a progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.

Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@s-debadri, 💯 All CI checkers are successfully verified. Thanks.

Added condition for better accuracy while calling 1x4 and 1x8 kernels

Signed-off-by: Debadri Samaddar <[email protected]>
@s-debadri s-debadri changed the title [hgemm] Interchanged hgemm_noTrans_1x8 and hgemm_noTrans_4x4 calls [hgemm/bugfix] Added conditions for 1x8 and 1x4 kernel calls to enhance accuracy May 8, 2024
@taos-ci
Copy link

taos-ci commented May 8, 2024

:octocat: cibot: @s-debadri, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2573-202405081810420.82360291481018-ba95483e08b45051723aefb4a427b5a83ce3b4d6/.

@skykongkong8 skykongkong8 changed the title [hgemm/bugfix] Added conditions for 1x8 and 1x4 kernel calls to enhance accuracy [hgemm/bugfix] Added conditions for 1x8 and 1x4 kernel calls to enhance accuracy @open sesame 05/09 09:04 May 9, 2024
Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@s-debadri, 💯 All CI checkers are successfully verified. Thanks.

@myungjoo
Copy link
Member

myungjoo commented May 9, 2024

Anyway, especially for non-SIMD operations (it's not going to be SIMDified because it's in if-statement condition), please note that (N & 0x7) might be cheaper than (N % 8).

@skykongkong8
Copy link
Member

Anyway, especially for non-SIMD operations (it's not going to be SIMDified because it's in if-statement condition), please note that (N & 0x7) might be cheaper than (N % 8).

@s-debadri Could you please apply this idea?

Copy link
Member

@myungjoo myungjoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

For the readability of novice developers in the future, please leave a comment that you have applied bitwise operators instead of modulos for the performance. Otherwise, novice developers won't understand why in the heck you are using & 0x... here.

Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@s-debadri, 💯 All CI checkers are successfully verified. Thanks.

Used bitmasks for dimension checks.
e.g: N % 8 is same as N & 0x7

Signed-off-by: Debadri Samaddar <[email protected]>
@s-debadri
Copy link
Contributor Author

LGTM.

For the readability of novice developers in the future, please leave a comment that you have applied bitwise operators instead of modulos for the performance. Otherwise, novice developers won't understand why in the heck you are using & 0x... here.

@myungjoo Added the comments as well. Thanks for your suggestion regarding this change.

Copy link

@taos-ci taos-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@s-debadri, 💯 All CI checkers are successfully verified. Thanks.

Copy link
Member

@skykongkong8 skykongkong8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done!

} else if (M % 4 == 0 && N % 4 == 0 && K % 4 == 0) {
hgemm_noTrans_4x4(M, N, K, A, K, B, N, C, N, alpha, beta);
} else if (N % 8 == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed, there is a defect when kernel inside of this function is used for non-4-divisible K case.
How about adding conditions like other kernels?

} else if (M % 4 == 0 && N % 4 == 0 && K % 4 == 0) {
hgemm_noTrans_4x4(M, N, K, A, K, B, N, C, N, alpha, beta);
} else if (N % 4 == 0) {
} else if (K % 8 == 0 && N % 8 == 0) {

This comment was marked as off-topic.

} else if (N % 4 == 0) {
} else if (K % 8 == 0 && N % 8 == 0) {
hgemm_noTrans_1x8(M, N, K, A, K, B, N, C, N, alpha, beta);
} else if (K % 8 == 0 && N % 4 == 0) {

This comment was marked as off-topic.

@myungjoo
Copy link
Member

myungjoo commented May 9, 2024

Good to go! Thanks!

Copy link
Contributor

@djeong20 djeong20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@myungjoo myungjoo merged commit 258043c into nnstreamer:main May 13, 2024
32 checks passed
@s-debadri s-debadri deleted the hgemm_tc branch May 23, 2024 07:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants