[ blas ] Custom-defined scopy functions : aarch64, armv7, x86, incremental indices #2791

Merged: 2 commits, Nov 12, 2024

Member

@skykongkong8 skykongkong8 commented Nov 11, 2024

  • Using the standard cblas_scopy causes segfaults in some cases.
  • Implement a custom scopy function to avoid this issue.
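
As a rough illustration of what a custom contiguous scopy can look like, here is a minimal sketch. It is not the code from this PR: the PR uses inline asm on x86 and aarch64, while this sketch uses intrinsics guarded by compiler macros and falls back to a scalar loop. The name `custom_scopy_contiguous` and its signature are hypothetical.

```cpp
#include <cstddef>
#if defined(__AVX2__)
#include <immintrin.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper (not the PR's implementation): copy n contiguous floats
// from x to y. The buffers are assumed not to overlap.
void custom_scopy_contiguous(std::size_t n, const float *x, float *y) {
  std::size_t i = 0;
#if defined(__AVX2__)
  // 8 floats per iteration with unaligned 256-bit loads and stores.
  for (; i + 8 <= n; i += 8) {
    _mm256_storeu_ps(y + i, _mm256_loadu_ps(x + i));
  }
#elif defined(__ARM_NEON)
  // 4 floats per iteration with 128-bit NEON loads and stores.
  for (; i + 4 <= n; i += 4) {
    vst1q_f32(y + i, vld1q_f32(x + i));
  }
#endif
  // Scalar tail; also the whole copy when no SIMD path is compiled in.
  for (; i < n; ++i) {
    y[i] = x[i];
  }
}
```

The scalar tail handles lengths that are not a multiple of the vector width, so the function never reads or writes past `n` elements.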

X86 (AVX2 asm)

TC = 20, unit : microseconds

| dim | intrinsic | cblas | asm | loop |
| --- | --- | --- | --- | --- |
| 3x1x768x768 | 567.384 | 238.796 | 346.572 | 1438.179 |

aarch64 (NEON asm), armv7l (NEON intrinsic)

TC = 20, unit : microseconds

| dim | intrinsic | cblas | asm | loop |
| --- | --- | --- | --- | --- |
| 3x1x768x768 | 224.382 | 210.520 | 213.494 | 655.476 |
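
For readers who want to reproduce numbers of this kind, a timing harness along the following lines may help. This is a sketch under assumptions, not the benchmark used for the tables: it assumes that TC is the number of timed runs and that the reported value is the mean wall-clock time per run. The names `mean_copy_time_us` and `loop_copy` are hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical harness: mean wall-clock time, in microseconds, of calling
// copy(n, src, dst) over `runs` timed runs (assumed meaning of "TC").
template <typename CopyFn>
double mean_copy_time_us(CopyFn copy, std::size_t n, int runs) {
  std::vector<float> src(n, 1.0f), dst(n, 0.0f);
  using clock = std::chrono::steady_clock;
  double total_us = 0.0;
  for (int r = 0; r < runs; ++r) {
    const auto start = clock::now();
    copy(n, src.data(), dst.data());
    const auto stop = clock::now();
    total_us +=
      std::chrono::duration<double, std::micro>(stop - start).count();
  }
  if (n > 0) {
    // Read the result so the compiler cannot discard the copies.
    volatile float sink = dst[n - 1];
    (void)sink;
  }
  return total_us / runs;
}

// Baseline comparable to the "loop" column: a plain element-wise copy.
inline void loop_copy(std::size_t n, const float *x, float *y) {
  for (std::size_t i = 0; i < n; ++i) {
    y[i] = x[i];
  }
}
```

A call such as `mean_copy_time_us(loop_copy, 3 * 1 * 768 * 768, 20)` would time the 3x1x768x768 shape, which is 1,769,472 floats. Results depend heavily on compiler flags, cache state, and the machine, so they will not match the tables exactly.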

This is my first time coding in inline asm style, so there might be a little bit of awkwardness. Please review carefully.

Self evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

- Using the standard cblas_scopy causes segfaults in some cases.
- Implement a custom scopy function to avoid this issue.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>

taos-ci commented Nov 11, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2791. Please follow the 1commit/1PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before the review process can start. If you are a new member of this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.


taos-ci commented Nov 11, 2024

:octocat: cibot: @skykongkong8, nntrainer/tensor/blas_neon.cpp includes bug(s). Please fix incorrect coding constructs in your commit before entering a review process.


taos-ci commented Nov 11, 2024

:octocat: cibot: @skykongkong8, The build checker could not be completed because one of the checkers did not finish. To find out the reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2791-202411111330500.056691884994507-765b724b372bea90084e7970c778b5ea51ce56d2/.

- The current internal SIMD implementation does not support incremental indices.
- Let __fallback_scopy handle this case.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
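
The approach in the second commit, sending strided copies to the fallback and keeping the fast path for unit increments, can be sketched roughly as follows. This is an illustration, not the merged code: `fallback_scopy` stands in for the `__fallback_scopy` named in the commit, its signature and the use of unsigned increments are assumptions, and `std::memcpy` is only a placeholder for the SIMD kernel.

```cpp
#include <cstddef>
#include <cstring>

// Stand-in for the fallback named in the commit (signature assumed):
// element-wise copy that honors the increments of x and y.
void fallback_scopy(unsigned int n, const float *x, unsigned int inc_x,
                    float *y, unsigned int inc_y) {
  for (unsigned int i = 0; i < n; ++i) {
    y[static_cast<std::size_t>(i) * inc_y] =
      x[static_cast<std::size_t>(i) * inc_x];
  }
}

// Hypothetical dispatcher: contiguous data takes the fast path, anything
// with a non-unit increment goes to the fallback.
void scopy_dispatch(unsigned int n, const float *x, unsigned int inc_x,
                    float *y, unsigned int inc_y) {
  if (inc_x == 1 && inc_y == 1) {
    // Placeholder for the SIMD kernel; buffers are assumed not to overlap.
    std::memcpy(y, x, static_cast<std::size_t>(n) * sizeof(float));
  } else {
    fallback_scopy(n, x, inc_x, y, inc_y);
  }
}
```

With `n = 3`, `inc_x = 2`, and `inc_y = 3`, the dispatcher copies `x[0]`, `x[2]`, `x[4]` into `y[0]`, `y[3]`, `y[6]` and leaves the other elements of `y` untouched.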

@taos-ci taos-ci left a comment

@skykongkong8, 💯 All CI checkers are successfully verified. Thanks.

Collaborator

@jijoongmoon jijoongmoon left a comment

LGTM

Contributor

@baek2sm baek2sm left a comment

LGTM

@jijoongmoon jijoongmoon merged commit d7838fd into nnstreamer:main Nov 12, 2024
40 checks passed
@skykongkong8 skykongkong8 deleted the pr/blas/scopy branch November 14, 2024 09:20