-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AVX] Added VCVT (between floating-point and integer) AVX support for x86 architecture #2480
Conversation
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2480. Please a submit 1commit/1PR (one commit per one PR) policy to get comments quickly from reviewers. Your PR must pass all verificiation processes of cibot before starting a review process from reviewers. If you are new member to join this project, please read manuals in documentation folder and wiki page. In order to monitor a progress status of your PR in more detail, visit http://ci.nnstreamer.ai/. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@djeong20, 💯 All CI checkers are successfully verified. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
} else { | ||
for (unsigned int i = 0; i < N; ++i) | ||
Y[i * incy] = static_cast<_FP16>(X[i * incx]); | ||
} | ||
#else | ||
for (unsigned int i = 0; i < N; ++i) | ||
Y[i * incy] = static_cast<_FP16>(X[i * incx]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not for this PR or this month, but for later versions (you are now having technical debts here):
maintain the same code in general cpp files across NEON/AVX/None.
centralize architectural dependencies to a corresponding header and its implementation only.
You may have a single header for all three cases and choose a corresponding implementation (.c/.cpp file) at build time, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try the better approach from the next implementation (and fix this when you are ready)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean something like:
// new file : blas_raw.cpp
...
void foo( ... ) {
...
} else {
for (unsigned int i = 0; i < N; ++i)
Y[i * incy] = static_cast<_FP16>(X[i * incx]);
}
...
}
...
and use it for all blas_interface.cpp
, blas_neon.cpp
, blas_avx.cpp
?
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for bringing this up! I will keep it in mind for future improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or.. you may have a base class as a set of operators/functions/methods for general CPU operations compatible for ALL systems with virtual functions and a derived class as a set of operators/functions/methods for architecture-dependent (SIMD) operations overriding the virtual functions.
Then, if you don't have an implementation in the derived class (SIMD), the fallback method in the base class will be automatically chosen. If you have an implementation in the derived class and the class is chosen at run-time/build-time, it will be automatically chosen WITHOUT having #if/#endif or even general if statements. You can control that in the base class initializer or "get_instance" method of the singleton design.
Because you still have a tight deadline for the release, don't start refactoring, yet. Try to refactor when you have completed the immenent release or when you start writing a new class or function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such refactoring needs planning and design before implementation. Don't just start without enough discussion and I don't want this refacotring effort hinder the immenent release.
Anyway, for your reference: #2482
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concept suggestion : https://github.com/myungjoo/nntrainer/tree/suggestion/refactoring/archdep
(this is not for actual PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both approaches seem great. I'll also research how other frameworks manage this. let's keep the discussion going
meson.build
Outdated
avx512_code = '''#include <immintrin.h> | ||
int main (int argc, char *argv[]) { const __m256i vec = {0, 1, 2, 3}; __m512 v = _mm512_cvtph_ps(vec); return 0; } | ||
''' | ||
has_avx512 = cc.compiles(avx512_code, args : '-march=native', name: 'avx512 support') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may proceed and fix it later, but...
Are you sure that the build machine is AVX512 capabile while the target machine is also AVX512 capable? As we build ARM binaries in X64 machines, we may build X64/AVX512 binaries in X64/non-AVX512 machines. We may need to force-enable them (and you will be required to apply proper arch name, in this case, -march=native won't work.) Note that a few instances in our OBS or QB might be not ready for AVX512.
The current meson script is fine for developers and most users.
And, the current script is fine for this release because the automated build system is not applied for x64 targets. (we do it only for arm targets)
But, you will need to consider this afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I thought that if AVX512 is not supported on the build machine, it should not be enabled in the first place. Since there are no AVX512 instructions in our code, I think we can modify it now without any problems.
… x86 architecture This pull request adds VCVT (between floating-point and integer) AVX support for the x86 architecture. This feature allows efficient conversion between single-precision floating-point and half-precision floating-point data types using the AVX instruction set. **Changes proposed in this PR:** - Added new VCVT instructions to the blas_interface - Fix incorrect meson.build logic in checking gcc version and AVX support - Add enable-avx option to enable AVX hardware acceleration **Self-evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: Donghyeon Jeong <[email protected]>
25cad57
to
11d3784
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@djeong20, 💯 All CI checkers are successfully verified. Thanks.
This pull request adds VCVT (between floating-point and integer) AVX support for the x86 architecture.
This feature allows efficient conversion between single-precision floating-point and half-precision floating-point data types using the AVX instruction set.
Changes proposed in this PR:
Self-evaluation: