[ util ] Implement softmax calculation function in util #2479
Conversation
skykongkong8 commented on Feb 20, 2024 (edited)
- Current activation functions are implemented as function templates and compute entirely in the precision of the template parameter, unless NEON intrinsics with intermediate fp32 values are used explicitly.
- According to current papers, computing softmax in fp32 is quite critical for the safe convergence of mixed-precision training.
- This PR proposes a SIMD version of the softmax calculation that temporarily uses higher precision when the input is half-precision (see the sketch below this list).
- For numerical stability, a linear translation is applied (the maximum is subtracted so the inputs to the exponential function are non-positive), which avoids precision overflow.
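A minimal sketch of the max-subtraction trick with fp32 accumulation described above (the name `softmax_stable` and the signature are illustrative, not the PR's actual API):

```cpp
#include <algorithm>
#include <cmath>

// Numerically stable softmax sketch: subtracting the maximum makes every
// input to exp() non-positive, so exp() outputs stay in (0, 1] and cannot
// overflow. The sum is accumulated in fp32 even when T is half-precision.
template <typename T>
void softmax_stable(const unsigned int N, const T *X, T *Y) {
  float max_x = static_cast<float>(X[0]);
  for (unsigned int i = 1; i < N; ++i)
    max_x = std::max(max_x, static_cast<float>(X[i]));

  float sum = 0.f; // fp32 accumulator
  for (unsigned int i = 0; i < N; ++i) {
    const float e = std::exp(static_cast<float>(X[i]) - max_x);
    Y[i] = static_cast<T>(e); // store unnormalized value for now
    sum += e;
  }
  for (unsigned int i = 0; i < N; ++i)
    Y[i] = static_cast<T>(static_cast<float>(Y[i]) / sum);
}
```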
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2479. Please follow the one-commit-per-PR (1 commit/1 PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before reviewers start the review process. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.
Force-pushed from 8c72f55 to c87da5a
@skykongkong8, 💯 All CI checkers are successfully verified. Thanks.
LGTM
```cpp
float max(const unsigned int N, float *X) {
#ifdef USE_NEON
  return nntrainer::neon::max(N, X);
#else
```
This is for your future reference.

Use the STL properly:

```cpp
std::vector<float> v(X, X + N);
return *std::max_element(v.begin(), v.end());
```

And if you compile it properly, you may get x64 SIMD (AVX/SSE) for free.
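Note that the copy into a `std::vector` is not strictly needed; `std::max_element` accepts raw pointers as iterators. A sketch of that variant (`max_stl` is an illustrative name, not from the thread):

```cpp
#include <algorithm>

// Scans the input range in place with no temporary vector. With optimization
// enabled (e.g. -O3, plus -march flags permitting AVX/SSE), mainstream
// compilers can auto-vectorize this reduction.
float max_stl(const unsigned int N, const float *X) {
  return *std::max_element(X, X + N);
}
```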
Will apply this right away
Force-pushed from 94d3328 to 400b6ae
- Current softmax implementation does not consider fp32 use in half-precision softmax
- Implement raw and NEON-SIMD versions of the softmax function: fp32, and fp16 with fp32 accumulation

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>

- Unlike the isamax function of BLAS, this function returns the maximum 'value', not the index
- Note that this function is applicable only when the input data is contiguous

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>

- For numerical stability, using negative values as the input of the exponential function is recommended (since the output will then range from 0 to 1)
- Subtract the maximum value before calculating the exponential vectors

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>

- Add an in-place exponential function

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>

- For cleaner code, use std::max_element instead of a for-loop

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
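A sketch of what the in-place exponential from the commit above could look like (the actual name and signature in the PR may differ):

```cpp
#include <cmath>

// Hypothetical in-place exponential over a contiguous buffer: overwrites
// each element with exp(element).
void exp_i(const unsigned int N, float *X) {
  for (unsigned int i = 0; i < N; ++i)
    X[i] = std::exp(X[i]);
}
```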
Force-pushed from 400b6ae to e128e06
@skykongkong8, 💯 All CI checkers are successfully verified. Thanks.
LGTM
LGTM except minor comments.
```cpp
unsigned int i = 0;
float sum = 0.f;
float max_x = max(N, X);
while (i < N) {
```
It might depend on the matrix size N, but you could also optimize further using OpenMP and a temporary buffer to hold X[i] - max_x. You can optimize it later, since we also have to consider the case where NEON is not used.
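A sketch of the reviewer's suggestion (assumed shape, not the PR's code): stage `X[i] - max_x` through a temporary buffer and let OpenMP parallelize the exponential pass and the sum reduction. The name `softmax_omp` is illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Fallback (non-NEON) softmax parallelized with OpenMP. The temporary buffer
// keeps the shifted exponentials so normalization is a second cheap pass.
void softmax_omp(const unsigned int N, const float *X, float *Y) {
  const float max_x = *std::max_element(X, X + N);
  std::vector<float> shifted(N); // temporary buffer for exp(X[i] - max_x)

  float sum = 0.f;
#pragma omp parallel for reduction(+ : sum)
  for (unsigned int i = 0; i < N; ++i) {
    shifted[i] = std::exp(X[i] - max_x);
    sum += shifted[i];
  }

#pragma omp parallel for
  for (unsigned int i = 0; i < N; ++i)
    Y[i] = shifted[i] / sum;
}
```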