[Layer] add tanh-based approximate gelu activation function #2658
Conversation
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2658. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before the review process by reviewers starts. If you are a new member joining this project, please read the manuals in the documentation folder and wiki page. To monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.
@nnstreamer, 💯 All CI checkers are successfully verified. Thanks.
Not an issue directly related to this PR, but it seems the activation functions in this implementation are not using a SIMD implementation for half-precision at all (for example, swish and softmax).
As you might already know, mathematical implementations are not really desirable to be left here... I think we need to discuss the implementation structure of the activation function computation in the near future.
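As a rough illustration (not part of this PR) of what a half-precision SIMD path for an elementwise activation could look like, here is a minimal sketch using ARM NEON fp16 intrinsics. ReLU is used only because it needs no exp; the function name relu_fp16 and the contiguous-buffer layout are assumptions for illustration, and the code requires a target with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC.

#include <arm_neon.h>
#include <cstddef>

// Minimal sketch: elementwise ReLU over a contiguous half-precision buffer,
// processing 8 fp16 lanes per iteration, with a scalar tail loop.
void relu_fp16(const float16_t *in, float16_t *out, std::size_t len) {
  const float16x8_t zero = vdupq_n_f16((float16_t)0.0f);
  std::size_t i = 0;
  for (; i + 8 <= len; i += 8) {
    float16x8_t v = vld1q_f16(in + i);       // load 8 half-precision values
    vst1q_f16(out + i, vmaxq_f16(v, zero));  // store max(x, 0) per lane
  }
  for (; i < len; ++i)                       // handle the remaining elements
    out[i] = in[i] > (float16_t)0.0f ? in[i] : (float16_t)0.0f;
}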
LGTM!
LGTM
This PR is currently from a branch in the upstream repository, not a forked repo!
LGTM!
ACT_MISH, /**< Mish */
ACT_NONE, /**< no op */
ACT_UNKNOWN /**< unknown */
ACT_TANH, /**< tanh */
There is a conflict. I think it will be fine if you just fix this.
I added PR 2665, which adds the missing activation type after this PR.
Thanks. I rebased it.
I know there are many GELU approximations, but this one is representative because it is mentioned in the GELU paper.
So please consider renaming this activation to approximate gelu.
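For reference, the tanh-based approximation being discussed, as implemented in the lambda in acti_func.h below and as given in the GELU paper, is

\mathrm{GELU}(x) \approx 0.5\,x\left(1 + \tanh\!\left(\sqrt{2/\pi}\,\bigl(x + 0.044715\,x^{3}\bigr)\right)\right)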
Cpp-linter Review
Full clang-format patch
diff --git a/nntrainer/layers/acti_func.h b/nntrainer/layers/acti_func.h
index 8988a75..9e43219 100644
--- a/nntrainer/layers/acti_func.h
+++ b/nntrainer/layers/acti_func.h
@@ -476,2 +476,7 @@ public:
- [&](T x) { return static_cast<T>(
- 0.5 * x * (1 + tanhFloat<T>(static_cast<T>(sqrt(2/M_PI) * (x + 0.044715 * pow(x, 3)))))); }, t_out);
+ [&](T x) {
+ return static_cast<T>(
+ 0.5 * x *
+ (1 + tanhFloat<T>(
+ static_cast<T>(sqrt(2 / M_PI) * (x + 0.044715 * pow(x, 3))))));
+ },
+ t_out);
@@ -490,2 +495,2 @@ public:
- Tensor &outgoing_derivative,
- Tensor const &incoming_derivative = Tensor()) {
+ Tensor &outgoing_derivative,
+ Tensor const &incoming_derivative = Tensor()) {
@@ -493 +498,2 @@ public:
- ml_logw("tanhGeluPrime which is calculate derivate of tanhGelu function is not yet implemented");
+ ml_logw("tanhGeluPrime which is calculate derivate of tanhGelu function is "
+ "not yet implemented");
@@ -505 +511,4 @@ public:
- [&](T x) { return static_cast<T>(x * (sigmoid<T>(static_cast<T>(1.702 * x)))); }, t_out);
+ [&](T x) {
+ return static_cast<T>(x * (sigmoid<T>(static_cast<T>(1.702 * x))));
+ },
+ t_out);
@@ -517,3 +526,4 @@ public:
- static Tensor &sigmoidGeluPrime(Tensor const &t_in, Tensor const &t_out,
- Tensor &outgoing_derivative,
- Tensor const &incoming_derivative = Tensor()) {
+ static Tensor &
+ sigmoidGeluPrime(Tensor const &t_in, Tensor const &t_out,
+ Tensor &outgoing_derivative,
+ Tensor const &incoming_derivative = Tensor()) {
@@ -521 +531,2 @@ public:
- ml_logw("sigmoidGeluPrime which is calculate derivate of sigmoidGelu function is not yet implemented");
+ ml_logw("sigmoidGeluPrime which is calculate derivate of sigmoidGelu "
+ "function is not yet implemented");
diff --git a/nntrainer/layers/common_properties.h b/nntrainer/layers/common_properties.h
index 9a9bb0c..4a94f84 100644
--- a/nntrainer/layers/common_properties.h
+++ b/nntrainer/layers/common_properties.h
@@ -917,4 +917,3 @@ struct ActivationTypeInfo {
- static constexpr const char *EnumStr[] = {"tanh", "sigmoid", "relu",
- "softmax", "leaky_relu", "swish",
- "gelu", "tanh_gelu", "sigmoid_gelu",
- "none", "unknown"};
+ static constexpr const char *EnumStr[] = {
+ "tanh", "sigmoid", "relu", "softmax", "leaky_relu", "swish",
+ "gelu", "tanh_gelu", "sigmoid_gelu", "none", "unknown"};
nntrainer/layers/acti_func.h
Outdated
[&](T x) { return static_cast<T>(
0.5 * x * (1 + tanhFloat<T>(static_cast<T>(sqrt(2/M_PI) * (x + 0.044715 * pow(x, 3)))))); }, t_out);
clang-format suggestions
Original:
[&](T x) { return static_cast<T>(
0.5 * x * (1 + tanhFloat<T>(static_cast<T>(sqrt(2/M_PI) * (x + 0.044715 * pow(x, 3)))))); }, t_out);

Suggested:
[&](T x) {
return static_cast<T>(
0.5 * x *
(1 + tanhFloat<T>(
static_cast<T>(sqrt(2 / M_PI) * (x + 0.044715 * pow(x, 3))))));
},
t_out);
nntrainer/layers/acti_func.h
Outdated
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {
clang-format suggestions
Original:
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {

Suggested (indentation-only change):
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {
nntrainer/layers/acti_func.h
Outdated
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {
// NYI
ml_logw("tanhGeluPrime which is calculate derivate of tanhGelu function is not yet implemented");
clang-format suggestions
ml_logw("tanhGeluPrime which is calculate derivate of tanhGelu function is not yet implemented"); | |
ml_logw("tanhGeluPrime which is calculate derivate of tanhGelu function is " | |
"not yet implemented"); |
nntrainer/layers/acti_func.h
Outdated
* @param[in] t_in input tensor
* @param[in] t_out output tensor
*/
template <typename T = float>
static Tensor &quickGelu(Tensor const &t_in, Tensor &t_out) {
static Tensor &sigmoidGelu(Tensor const &t_in, Tensor &t_out) {
t_in.apply<T>(
[&](T x) { return static_cast<T>(x * (sigmoid<T>(static_cast<T>(1.702 * x)))); }, t_out);
clang-format suggestions
Original:
[&](T x) { return static_cast<T>(x * (sigmoid<T>(static_cast<T>(1.702 * x)))); }, t_out);

Suggested:
[&](T x) {
return static_cast<T>(x * (sigmoid<T>(static_cast<T>(1.702 * x))));
},
t_out);
nntrainer/layers/acti_func.h
Outdated
static Tensor &sigmoidGeluPrime(Tensor const &t_in, Tensor const &t_out,
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {
clang-format suggestions
Original:
static Tensor &sigmoidGeluPrime(Tensor const &t_in, Tensor const &t_out,
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {

Suggested:
static Tensor &
sigmoidGeluPrime(Tensor const &t_in, Tensor const &t_out,
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {
nntrainer/layers/acti_func.h
Outdated
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {
// NYI
ml_logw("quickGeluPrime which is calculate derivate of quickGelu function is not yet implemented");
ml_logw("sigmoidGeluPrime which is calculate derivate of sigmoidGelu function is not yet implemented");
clang-format suggestions
ml_logw("sigmoidGeluPrime which is calculate derivate of sigmoidGelu function is not yet implemented"); | |
ml_logw("sigmoidGeluPrime which is calculate derivate of sigmoidGelu " | |
"function is not yet implemented"); |
nntrainer/layers/common_properties.h
Outdated
static constexpr const char *EnumStr[] = {"tanh", "sigmoid", "relu",
"softmax", "leaky_relu", "swish",
"gelu", "tanh_gelu", "sigmoid_gelu",
"none", "unknown"};
clang-format suggestions
Original:
static constexpr const char *EnumStr[] = {"tanh", "sigmoid", "relu",
"softmax", "leaky_relu", "swish",
"gelu", "tanh_gelu", "sigmoid_gelu",
"none", "unknown"};

Suggested:
static constexpr const char *EnumStr[] = {
"tanh", "sigmoid", "relu", "softmax", "leaky_relu", "swish",
"gelu", "tanh_gelu", "sigmoid_gelu", "none", "unknown"};
@nnstreamer, 💯 All CI checkers are successfully verified. Thanks.
- add tanh-based approximate gelu (tanh gelu) for vision transformer.
- rename quick gelu to sigmoid gelu (it's a sigmoid-based approximate gelu)

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
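For clarity, the sigmoid-based approximation referred to here (the former quick gelu), as implemented in the sigmoidGelu lambda in acti_func.h, is

\mathrm{GELU}(x) \approx x \cdot \sigma(1.702\,x)

where \sigma is the logistic sigmoid.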
Cpp-linter Review
No concerns from clang-format.
Great job! 🎉
@nnstreamer, 💯 All CI checkers are successfully verified. Thanks.
LGTM