[hgemm/bugfix] Added conditions for 1x8 and 1x4 kernel calls to enhance accuracy @open sesame 05/09 09:04 #2573
Moved the 1x8 kernel call after the 4x4 kernel call. Added a couple of test cases. Signed-off-by: Debadri Samaddar <[email protected]>
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2573. Please follow the 1commit/1PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before reviewers start the review process. If you are a new member joining this project, please read the manuals in the documentation folder and on the wiki page. To monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.
@s-debadri, 💯 All CI checkers are successfully verified. Thanks.
Added condition for better accuracy while calling 1x4 and 1x8 kernels Signed-off-by: Debadri Samaddar <[email protected]>
Anyway, especially for non-SIMD operations (it's not going to be SIMDified because it's in an if-statement condition), please note that
@s-debadri Could you please apply this idea?
LGTM.
For readability by novice developers in the future, please leave a comment explaining that you applied bitwise operators instead of modulo operations for performance. Otherwise, novice developers won't understand why in the heck you are using & 0x... here.
Used bitmasks for dimension checks, e.g., N % 8 == 0 is the same as (N & 0x7) == 0. Signed-off-by: Debadri Samaddar <[email protected]>
@myungjoo Added the comments as well. Thanks for your suggestion regarding this change.
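For readers unfamiliar with the trick discussed above, here is a minimal sketch of why the mask check matches the modulo check. The helper names are hypothetical and do not come from hgemm.cpp, and the dimensions are assumed to be unsigned. For a power of two such as 8, the remainder of an unsigned value is exactly its low three bits, so testing those bits against zero is the same as testing divisibility.

```cpp
// Hypothetical helpers (not from hgemm.cpp), assuming unsigned dimensions.
// n % 8 == 0 exactly when the low three bits of n are zero.
constexpr bool is_multiple_of_8(unsigned int n) { return (n & 0x7u) == 0u; }

// n % 4 == 0 exactly when the low two bits of n are zero.
constexpr bool is_multiple_of_4(unsigned int n) { return (n & 0x3u) == 0u; }

// Brute-force check that the mask form and the modulo form agree
// for every value in [0, limit).
bool masks_agree_with_modulo(unsigned int limit) {
  for (unsigned int n = 0; n < limit; ++n) {
    if (is_multiple_of_8(n) != (n % 8u == 0u)) return false;
    if (is_multiple_of_4(n) != (n % 4u == 0u)) return false;
  }
  return true;
}
```

The equivalence only holds for power-of-two divisors; a check such as N % 6 == 0 has no single-mask counterpart.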
Nicely done!
nntrainer/tensor/hgemm/hgemm.cpp
Outdated
} else if (M % 4 == 0 && N % 4 == 0 && K % 4 == 0) {
  hgemm_noTrans_4x4(M, N, K, A, K, B, N, C, N, alpha, beta);
} else if (N % 8 == 0) {
As we discussed, there is a defect when the kernel inside this function is used for a K that is not divisible by 4.
How about adding conditions like the ones for the other kernels?
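The thread does not spell out the exact defect, so the following is only an illustrative sketch of one common way a blocked kernel goes wrong when K is not a multiple of its block size. It is an assumption, not a description of the actual hgemm kernels. Both function names are hypothetical. A loop that consumes K four elements at a time and has no tail handling silently skips the last K % 4 elements.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical blocked dot product: walks K in blocks of 4 and has no
// tail loop, so the last K % 4 elements are never accumulated.
float dot_blocked_no_tail(const std::vector<float>& a,
                          const std::vector<float>& b) {
  const std::size_t K = a.size();
  float acc = 0.0f;
  for (std::size_t k = 0; k + 4 <= K; k += 4) {
    acc += a[k] * b[k] + a[k + 1] * b[k + 1] +
           a[k + 2] * b[k + 2] + a[k + 3] * b[k + 3];
  }
  return acc;
}

// Plain reference dot product that visits every element.
float dot_reference(const std::vector<float>& a,
                    const std::vector<float>& b) {
  float acc = 0.0f;
  for (std::size_t k = 0; k < a.size(); ++k) {
    acc += a[k] * b[k];
  }
  return acc;
}
```

With two all-ones vectors of length 6, the blocked version accumulates only the first four products while the reference accumulates all six. Guarding the call site with a divisibility check on K, as the review suggests, keeps such a kernel away from inputs it cannot handle.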
nntrainer/tensor/hgemm/hgemm.cpp
Outdated
} else if (M % 4 == 0 && N % 4 == 0 && K % 4 == 0) {
  hgemm_noTrans_4x4(M, N, K, A, K, B, N, C, N, alpha, beta);
} else if (N % 4 == 0) {
} else if (K % 8 == 0 && N % 8 == 0) {
nntrainer/tensor/hgemm/hgemm.cpp
Outdated
} else if (N % 4 == 0) {
} else if (K % 8 == 0 && N % 8 == 0) {
  hgemm_noTrans_1x8(M, N, K, A, K, B, N, C, N, alpha, beta);
} else if (K % 8 == 0 && N % 4 == 0) {
Good to go! Thanks!
LGTM
Changes added in this PR:
- K%8 == 0 condition added before calling 1x4 and 1x8 kernels to enhance accuracy.
- hgemm_noTrans_4x4 is called before hgemm_noTrans_1x8.
- hgemm_noTrans_1x8.
Self evaluation:
Signed-off-by: Debadri Samaddar [email protected]
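To make the resulting branch order easier to follow, here is a rough sketch of the selection logic as it appears in the diff hunks of this thread, written with the bitmask checks. It is a reading aid under stated assumptions, not the code merged into hgemm.cpp. The `Kernel` enum and `select_kernel` function are hypothetical, any branches before the 4x4 check are not visible in the thread, and `Kernel::Other` stands in for both those and the final fallback.

```cpp
// Hypothetical labels for the kernels named in the thread.
enum class Kernel { k4x4, k1x8, k1x4, Other };

// Hypothetical selector mirroring the visible branch order:
// 4x4 first, then 1x8, then 1x4, each guarded by a check on K.
// Bitmasks replace modulo: x % 4 == 0 <=> (x & 0x3) == 0,
//                          x % 8 == 0 <=> (x & 0x7) == 0.
Kernel select_kernel(unsigned int M, unsigned int N, unsigned int K) {
  if ((M & 0x3u) == 0u && (N & 0x3u) == 0u && (K & 0x3u) == 0u) {
    return Kernel::k4x4;
  } else if ((K & 0x7u) == 0u && (N & 0x7u) == 0u) {
    return Kernel::k1x8;
  } else if ((K & 0x7u) == 0u && (N & 0x3u) == 0u) {
    return Kernel::k1x4;
  }
  // Stand-in for branches that the thread does not show.
  return Kernel::Other;
}
```

For example, M = 3, N = 8, K = 8 selects the 1x8 path, while M = 3, N = 8, K = 6 falls through to `Kernel::Other` because K is no longer a multiple of 8.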