[AVX] Added VCVT (between floating-point and integer) AVX support for x86 architecture #2480

djeong20 · 2024-02-20T09:06:50Z

This pull request adds VCVT (between floating-point and integer) AVX support for the x86 architecture.
This feature allows efficient conversion between single-precision floating-point and half-precision floating-point data types using the AVX instruction set.

Changes proposed in this PR:

- Added new VCVT instructions to the blas_interface
- Fix incorrect meson.build logic in checking gcc version and AVX support
- Add enable-avx option to enable AVX hardware acceleration

Self-evaluation:

1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

taos-ci · 2024-02-20T09:06:53Z

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2480. Please follow the 1commit/1PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before reviewers start the review process. If you are a new member of this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.

taos-ci

@djeong20, 💯 All CI checkers are successfully verified. Thanks.

skykongkong8

Nice!

myungjoo · 2024-02-21T05:58:42Z

nntrainer/tensor/blas_interface.cpp

+  } else {
+    for (unsigned int i = 0; i < N; ++i)
+      Y[i * incy] = static_cast<_FP16>(X[i * incx]);
+  }
 #else
  for (unsigned int i = 0; i < N; ++i)
    Y[i * incy] = static_cast<_FP16>(X[i * incx]);


This is not for this PR or this month, but for later versions (you are accumulating technical debt here):

- Maintain the same code in general cpp files across NEON/AVX/None.
- Centralize architectural dependencies in a corresponding header and its implementation only.

You may also have a single header for all three cases and choose the corresponding implementation (.c/.cpp file) at build time.

Try the better approach in the next implementation (and fix this when you are ready).

You mean something like:

// new file : blas_raw.cpp
...
void foo( ... ) {
  ...
  } else {
    for (unsigned int i = 0; i < N; ++i)
      Y[i * incy] = static_cast<_FP16>(X[i * incx]);
  }
  ...
}
...

and use it in all of blas_interface.cpp, blas_neon.cpp, and blas_avx.cpp?

Thank you for bringing this up! I will keep it in mind for future improvement.

Or you could have a base class that provides the general CPU operations (operators/functions/methods) as virtual functions compatible with ALL systems, and a derived class that overrides those virtual functions with architecture-dependent (SIMD) implementations.

Then, if the derived (SIMD) class has no implementation for an operation, the fallback method in the base class is chosen automatically. If the derived class does have an implementation and that class is selected at run time or build time, the implementation is chosen automatically WITHOUT #if/#endif or even general if statements. You can control the selection in the base class initializer or in the "get_instance" method of a singleton design.

Because you still have a tight deadline for the release, don't start refactoring yet. Try to refactor once you have completed the imminent release, or when you start writing a new class or function.

Such refactoring needs planning and design before implementation. Don't just start without enough discussion; I don't want this refactoring effort to hinder the imminent release.

Anyway, for your reference: #2482

The concept suggestion : https://github.com/myungjoo/nntrainer/tree/suggestion/refactoring/archdep
(this is not for actual PR)

Both approaches seem great. I'll also research how other frameworks manage this. Let's keep the discussion going.

myungjoo · 2024-02-23T07:32:47Z

meson.build

+       avx512_code = '''#include <immintrin.h>
+       int main (int argc, char *argv[]) { const __m256i vec = {0, 1, 2, 3}; __m512 v = _mm512_cvtph_ps(vec); return 0; }
+       '''
+       has_avx512 = cc.compiles(avx512_code,  args : '-march=native', name: 'avx512 support')


You may proceed and fix it later, but...

Are you sure that the build machine and the target machine are always both AVX512 capable? Just as we build ARM binaries on X64 machines, we may build X64/AVX512 binaries on X64/non-AVX512 machines. We may need to force-enable them (and you will then be required to apply a proper arch name; in this case, -march=native won't work). Note that a few instances in our OBS or QB might not be ready for AVX512.

The current meson script is fine for developers and most users.

Also, the current script is fine for this release because the automated build system is not applied to x64 targets (we use it only for ARM targets).

But, you will need to consider this afterwards.

That makes sense. I thought that if AVX512 is not supported on the build machine, it should not be enabled in the first place. Since there are no AVX512 instructions in our code, I think we can modify it now without any problems.

… x86 architecture

This pull request adds VCVT (between floating-point and integer) AVX support for the x86 architecture.
This feature allows efficient conversion between single-precision floating-point and half-precision floating-point data types using the AVX instruction set.

**Changes proposed in this PR:**
- Added new VCVT instructions to the blas_interface
- Fix incorrect meson.build logic in checking gcc version and AVX support
- Add enable-avx option to enable AVX hardware acceleration

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <[email protected]>

djeong20 requested review from myungjoo, jijoongmoon, again4you, jaeyun-jung, leemgs, wooksong, helloahn, kparichay, gichan-jang, anyj0527, zhoonit, lhs8928, songgot, jihochu, DonghakPark, SeoHyungjun, baek2sm, skykongkong8, EunjuYang and a team as code owners February 20, 2024 09:06

github-actions bot added the Need Review label Feb 20, 2024

taos-ci approved these changes Feb 20, 2024

skykongkong8 approved these changes Feb 20, 2024

myungjoo reviewed Feb 21, 2024

myungjoo approved these changes Feb 23, 2024

myungjoo reviewed Feb 23, 2024

djeong20 force-pushed the blas/avx/vcvt branch from 25cad57 to 11d3784 Compare February 26, 2024 02:03

github-actions bot removed the Need Review label Feb 26, 2024

github-actions bot added the PR/READY2MERGE label Feb 26, 2024

taos-ci approved these changes Feb 26, 2024

myungjoo merged commit 3d68b89 into nnstreamer:main Feb 27, 2024
30 checks passed

djeong20 deleted the blas/avx/vcvt branch March 20, 2024 01:57
