[ tensor ] Apply SIMD in matrix transpose #2603
…matrix transpose
- Previously, matrix transpose relied on a naive for-loop implementation.
- Using SIMD instructions leaves room for latency optimization.
- Note that the current implementation only supports half-precision matrices.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
- Add a new function "transpos_matrix" that uses the newly implemented matrix transpose code.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
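A minimal reference sketch of a strided transpose entry point of the kind the commit above introduces. The name `transpose_matrix_ref`, its parameter list (`M`, `N`, `ld_src`, `ld_dst`), and the use of `uint16_t` as storage for half-precision values are assumptions for illustration, not nntrainer's actual signature. Because a transpose only moves bits, a 16-bit integer type can stand in for fp16 without any numeric conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reference interface (not the actual nntrainer signature).
// Transposes the M x N row-major matrix `src` (row stride ld_src) into the
// N x M row-major matrix `dst` (row stride ld_dst). fp16 values are treated
// as raw 16-bit patterns, since transposing never inspects their values.
void transpose_matrix_ref(std::size_t M, std::size_t N,
                          const std::uint16_t *src, std::size_t ld_src,
                          std::uint16_t *dst, std::size_t ld_dst) {
  assert(ld_src >= N && ld_dst >= M);
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j)
      dst[j * ld_dst + i] = src[i * ld_src + j]; // element (i, j) -> (j, i)
}
```

The leading-dimension parameters let the same routine operate on a sub-matrix of a larger buffer, which is how BLAS-style kernels such as HGEMM usually address their operands.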
- For a height-width transpose, the SIMD-accelerated code can be used.
- Use the SIMD version when possible; otherwise, fall back to the existing implementation.
- With this commit, the following are expected to be accelerated, or can easily be accelerated in the near future:
  - "0:2:1" transpose
  - BiQHGEMM
  - HGEMM

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
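The dispatch described in this commit can be sketched as follows, assuming a (B, C, H, W) layout in which the direction string permutes only the (C, H, W) axes. The function names `transpose_hw` and `transpose_chw` are hypothetical, and `transpose_hw` is a scalar stand-in for the SIMD kernel. For "0:2:1", every (b, c) slice is an independent H x W matrix, so the 2-D matrix transpose applies slice by slice; any other permutation falls back to per-element index mapping.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using u16 = std::uint16_t; // fp16 stored as raw bits (assumption)

// 2-D kernel: scalar stand-in for the SIMD-accelerated matrix transpose.
static void transpose_hw(std::size_t H, std::size_t W, const u16 *src,
                         u16 *dst) {
  for (std::size_t i = 0; i < H; ++i)
    for (std::size_t j = 0; j < W; ++j)
      dst[j * H + i] = src[i * W + j];
}

// Permutes the (C, H, W) axes of a (B, C, H, W) tensor; `perm` = {0, 2, 1}
// corresponds to "0:2:1". Output axis k takes input axis perm[k].
std::vector<u16> transpose_chw(std::size_t B, std::array<std::size_t, 3> dims,
                               std::array<int, 3> perm,
                               const std::vector<u16> &src) {
  const std::size_t C = dims[0], H = dims[1], W = dims[2];
  assert(src.size() == B * C * H * W);
  std::vector<u16> dst(src.size());
  if (perm == std::array<int, 3>{0, 2, 1}) {
    // Fast path: height-width transpose of each (b, c) slice.
    for (std::size_t s = 0; s < B * C; ++s)
      transpose_hw(H, W, src.data() + s * H * W, dst.data() + s * H * W);
    return dst;
  }
  // Generic fallback: map every output coordinate back to its input offset.
  const std::size_t od[3] = {dims[perm[0]], dims[perm[1]], dims[perm[2]]};
  const std::size_t in_stride[3] = {H * W, W, 1};
  for (std::size_t b = 0; b < B; ++b)
    for (std::size_t i = 0; i < od[0]; ++i)
      for (std::size_t j = 0; j < od[1]; ++j)
        for (std::size_t k = 0; k < od[2]; ++k) {
          const std::size_t in = b * C * H * W + i * in_stride[perm[0]] +
                                 j * in_stride[perm[1]] +
                                 k * in_stride[perm[2]];
          dst[((b * od[0] + i) * od[1] + j) * od[2] + k] = src[in];
        }
  return dst;
}
```

Keeping the generic loop as the fallback means every direction string stays supported while only the height-width case takes the accelerated path.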
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2603. Please follow the 1commit/1PR (one commit per PR) policy to get comments from reviewers quickly. Your PR must pass all verification processes of cibot before reviewers start the review process. If you are a new member of this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.
cibot: @skykongkong8, nntrainer/tensor/matrix_transpose_neon/matrix_transpose_kernels_neon.h does not include Doxygen tags such as @file @brief @author @bug. You must include the Doxygen tags in the source code. Please refer to a Doxygen manual at http://github.com/nnstreamer/TAOS-CI/blob/main/ci/doc/doxygen-documentation.md
- Previously, there was a bug when transposing a matrix whose column count is not divisible by 4.
- Fix the bug and refactor the interface that uses it: move the transpose fallback into the path taken when NEON is supported.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
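A bug of this kind can arise when columns are only processed in full 4-wide tiles and the remainder is dropped. The sketch below is not the code from this PR; the names `transpose_4x4` and `transpose_blocked` are hypothetical, and fp16 is again stored as `uint16_t`. It transposes full 4x4 tiles, using the NEON TRN intrinsics `vtrn_u16`/`vtrn_u32` when `__ARM_NEON` is defined and a scalar loop otherwise, then handles the leftover `M % 4` rows and `N % 4` columns element by element so that no element is skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Transposes one full 4x4 tile of 16-bit elements.
static inline void transpose_4x4(const std::uint16_t *src, std::size_t ld_src,
                                 std::uint16_t *dst, std::size_t ld_dst) {
#if defined(__ARM_NEON)
  uint16x4_t r0 = vld1_u16(src + 0 * ld_src);
  uint16x4_t r1 = vld1_u16(src + 1 * ld_src);
  uint16x4_t r2 = vld1_u16(src + 2 * ld_src);
  uint16x4_t r3 = vld1_u16(src + 3 * ld_src);
  // 16-bit TRN on row pairs: val[0] = {a0,b0,a2,b2}, val[1] = {a1,b1,a3,b3}.
  uint16x4x2_t t01 = vtrn_u16(r0, r1);
  uint16x4x2_t t23 = vtrn_u16(r2, r3);
  // 32-bit TRN finishes the transpose: c02 holds columns 0 and 2,
  // c13 holds columns 1 and 3.
  uint32x2x2_t c02 = vtrn_u32(vreinterpret_u32_u16(t01.val[0]),
                              vreinterpret_u32_u16(t23.val[0]));
  uint32x2x2_t c13 = vtrn_u32(vreinterpret_u32_u16(t01.val[1]),
                              vreinterpret_u32_u16(t23.val[1]));
  vst1_u16(dst + 0 * ld_dst, vreinterpret_u16_u32(c02.val[0]));
  vst1_u16(dst + 1 * ld_dst, vreinterpret_u16_u32(c13.val[0]));
  vst1_u16(dst + 2 * ld_dst, vreinterpret_u16_u32(c02.val[1]));
  vst1_u16(dst + 3 * ld_dst, vreinterpret_u16_u32(c13.val[1]));
#else
  // Portable scalar fallback with the same result.
  for (std::size_t i = 0; i < 4; ++i)
    for (std::size_t j = 0; j < 4; ++j)
      dst[j * ld_dst + i] = src[i * ld_src + j];
#endif
}

// Blocked M x N transpose with explicit tail handling for sizes that are
// not multiples of 4.
void transpose_blocked(std::size_t M, std::size_t N, const std::uint16_t *src,
                       std::size_t ld_src, std::uint16_t *dst,
                       std::size_t ld_dst) {
  assert(ld_src >= N && ld_dst >= M);
  const std::size_t M4 = M - M % 4, N4 = N - N % 4;
  // Full 4x4 tiles.
  for (std::size_t i = 0; i < M4; i += 4)
    for (std::size_t j = 0; j < N4; j += 4)
      transpose_4x4(src + i * ld_src + j, ld_src, dst + j * ld_dst + i, ld_dst);
  // Column tail: columns N4..N-1 of the tiled rows.
  for (std::size_t i = 0; i < M4; ++i)
    for (std::size_t j = N4; j < N; ++j)
      dst[j * ld_dst + i] = src[i * ld_src + j];
  // Row tail: rows M4..M-1, all columns.
  for (std::size_t i = M4; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j)
      dst[j * ld_dst + i] = src[i * ld_src + j];
}
```

A production kernel would more likely use 8-wide `uint16x8_t` tiles for fp16, but the tail handling follows the same pattern.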
Branch updated from 433ec98 to 4efa98b
Branch updated from 41f6812 to bc598c4
@skykongkong8, 💯 All CI checkers are successfully verified. Thanks.
Review comment on nntrainer/tensor/matrix_transpose_neon/matrix_transpose_kernels_neon.h (outdated, resolved)
Branch updated from bc598c4 to c42beca
LGTM
Branch updated from c42beca to c7daba7
- Add Doxygen tags to avoid CI failures
- Trivial formatting

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
Branch updated from c7daba7 to 845c7d8
LGTM
This PR addresses the issue raised in #2582.
The matrix transpose function in the latest NNTrainer (as of 14.05.24) is implemented with plain for-loops.
Although the current implementation works for general (b,c,h,w) tensor transposes, it is a naive approach for an (h,w) matrix transpose.
Latency measurement
transpose("0:2:1")
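The latency results themselves are not reproduced here. As a minimal sketch of how such a measurement could be taken, the hypothetical helper `median_latency_us` below (not part of nntrainer) times a callable with `std::chrono::steady_clock` after one warm-up call and reports the median, which is less sensitive to scheduler noise than the mean.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical benchmark helper: returns the median wall-clock latency in
// microseconds of `fn` over `reps` timed runs, after one untimed warm-up.
template <typename Fn>
double median_latency_us(Fn &&fn, std::size_t reps) {
  assert(reps > 0);
  fn(); // warm-up: touch memory and caches before timing
  std::vector<double> samples;
  samples.reserve(reps);
  for (std::size_t r = 0; r < reps; ++r) {
    const auto t0 = std::chrono::steady_clock::now();
    fn();
    const auto t1 = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration<double, std::micro>(t1 - t0).count());
  }
  // Partially sort so the middle element is the median sample.
  std::nth_element(samples.begin(), samples.begin() + reps / 2, samples.end());
  return samples[reps / 2];
}
```

To compare implementations, one would wrap each candidate (for example, a tensor's transpose("0:2:1") call before and after this PR) in a lambda and pass it with the same `reps` on the same device.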