[Layer] add tanh-based approximate gelu activation function #2658
Conversation
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2658. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before the review process by reviewers starts. If you are a new member joining this project, please read the manuals in the documentation folder and wiki page. To monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.
@nnstreamer, 💯 All CI checkers are successfully verified. Thanks.
Not an issue directly related to this PR, but it seems the activation functions in this implementation are not using a SIMD implementation for half-precision at all (for example, swish and softmax).
As you might already know, mathematical implementations are not really desirable to be left here... I think we need to discuss the implementation structure of the activation function computation in the near future.
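As a rough illustration (not part of this PR) of what a half-precision SIMD path for an elementwise activation could look like, here is a minimal sketch using ARM NEON fp16 intrinsics. ReLU is used only because it needs no exp; the function name relu_fp16 and the contiguous-buffer layout are assumptions for illustration, and the code requires a target with __ARM_FEATURE_FP16_VECTOR_ARITHMETIC.

#include <arm_neon.h>
#include <cstddef>

// Minimal sketch: elementwise ReLU over a contiguous half-precision buffer,
// processing 8 fp16 lanes per iteration, with a scalar tail loop.
void relu_fp16(const float16_t *in, float16_t *out, std::size_t len) {
  const float16x8_t zero = vdupq_n_f16((float16_t)0.0f);
  std::size_t i = 0;
  for (; i + 8 <= len; i += 8) {
    float16x8_t v = vld1q_f16(in + i);       // load 8 half-precision values
    vst1q_f16(out + i, vmaxq_f16(v, zero));  // store max(x, 0) per lane
  }
  for (; i < len; ++i)                       // handle the remaining elements
    out[i] = in[i] > (float16_t)0.0f ? in[i] : (float16_t)0.0f;
}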
LGTM!
LGTM
This PR is currently from a branch in the upstream repository, not a forked repo!
LGTM!
ACT_MISH, /**< Mish */
ACT_NONE, /**< no op */
ACT_UNKNOWN /**< unknown */
ACT_TANH, /**< tanh */
There is a conflict. I think it will be fine if you just fix this.
I added PR 2665, which adds the missing activation type after this PR.
Thanks. I rebased it.
I know there are many GELU approximations, but this one is representative because it is mentioned in the GELU paper.
So please consider renaming this activation to approximate gelu.
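For reference, the tanh-based approximation being discussed, as implemented in the lambda in acti_func.h below and as given in the GELU paper, is

\mathrm{GELU}(x) \approx 0.5\,x\left(1 + \tanh\!\left(\sqrt{2/\pi}\,\bigl(x + 0.044715\,x^{3}\bigr)\right)\right)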
Cpp-linter Review
Full clang-format patch
diff --git a/nntrainer/layers/acti_func.h b/nntrainer/layers/acti_func.h
index 8988a75..9e43219 100644
--- a/nntrainer/layers/acti_func.h
+++ b/nntrainer/layers/acti_func.h
@@ -476,2 +476,7 @@ public:
- [&](T x) { return static_cast<T>(
- 0.5 * x * (1 + tanhFloat<T>(static_cast<T>(sqrt(2/M_PI) * (x + 0.044715 * pow(x, 3)))))); }, t_out);
+ [&](T x) {
+ return static_cast<T>(
+ 0.5 * x *
+ (1 + tanhFloat<T>(
+ static_cast<T>(sqrt(2 / M_PI) * (x + 0.044715 * pow(x, 3))))));
+ },
+ t_out);
@@ -490,2 +495,2 @@ public:
- Tensor &outgoing_derivative,
- Tensor const &incoming_derivative = Tensor()) {
+ Tensor &outgoing_derivative,
+ Tensor const &incoming_derivative = Tensor()) {
@@ -493 +498,2 @@ public:
- ml_logw("tanhGeluPrime which is calculate derivate of tanhGelu function is not yet implemented");
+ ml_logw("tanhGeluPrime which is calculate derivate of tanhGelu function is "
+ "not yet implemented");
@@ -505 +511,4 @@ public:
- [&](T x) { return static_cast<T>(x * (sigmoid<T>(static_cast<T>(1.702 * x)))); }, t_out);
+ [&](T x) {
+ return static_cast<T>(x * (sigmoid<T>(static_cast<T>(1.702 * x))));
+ },
+ t_out);
@@ -517,3 +526,4 @@ public:
- static Tensor &sigmoidGeluPrime(Tensor const &t_in, Tensor const &t_out,
- Tensor &outgoing_derivative,
- Tensor const &incoming_derivative = Tensor()) {
+ static Tensor &
+ sigmoidGeluPrime(Tensor const &t_in, Tensor const &t_out,
+ Tensor &outgoing_derivative,
+ Tensor const &incoming_derivative = Tensor()) {
@@ -521 +531,2 @@ public:
- ml_logw("sigmoidGeluPrime which is calculate derivate of sigmoidGelu function is not yet implemented");
+ ml_logw("sigmoidGeluPrime which is calculate derivate of sigmoidGelu "
+ "function is not yet implemented");
diff --git a/nntrainer/layers/common_properties.h b/nntrainer/layers/common_properties.h
index 9a9bb0c..4a94f84 100644
--- a/nntrainer/layers/common_properties.h
+++ b/nntrainer/layers/common_properties.h
@@ -917,4 +917,3 @@ struct ActivationTypeInfo {
- static constexpr const char *EnumStr[] = {"tanh", "sigmoid", "relu",
- "softmax", "leaky_relu", "swish",
- "gelu", "tanh_gelu", "sigmoid_gelu",
- "none", "unknown"};
+ static constexpr const char *EnumStr[] = {
+ "tanh", "sigmoid", "relu", "softmax", "leaky_relu", "swish",
+ "gelu", "tanh_gelu", "sigmoid_gelu", "none", "unknown"};
nntrainer/layers/acti_func.h
Outdated
[&](T x) { return static_cast<T>(
0.5 * x * (1 + tanhFloat<T>(static_cast<T>(sqrt(2/M_PI) * (x + 0.044715 * pow(x, 3)))))); }, t_out);
clang-format suggestions
Original:
[&](T x) { return static_cast<T>(
0.5 * x * (1 + tanhFloat<T>(static_cast<T>(sqrt(2/M_PI) * (x + 0.044715 * pow(x, 3)))))); }, t_out);

Suggested:
[&](T x) {
return static_cast<T>(
0.5 * x *
(1 + tanhFloat<T>(
static_cast<T>(sqrt(2 / M_PI) * (x + 0.044715 * pow(x, 3))))));
},
t_out);
nntrainer/layers/acti_func.h
Outdated
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {
clang-format suggestions
Original:
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {

Suggested (indentation-only change):
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {
nntrainer/layers/acti_func.h
Outdated
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {
// NYI
ml_logw("tanhGeluPrime which is calculate derivate of tanhGelu function is not yet implemented");
clang-format suggestions
ml_logw("tanhGeluPrime which is calculate derivate of tanhGelu function is not yet implemented"); | |
ml_logw("tanhGeluPrime which is calculate derivate of tanhGelu function is " | |
"not yet implemented"); |
nntrainer/layers/acti_func.h
Outdated
* @param[in] t_in input tensor
* @param[in] t_out output tensor
*/
template <typename T = float>
static Tensor &quickGelu(Tensor const &t_in, Tensor &t_out) {
static Tensor &sigmoidGelu(Tensor const &t_in, Tensor &t_out) {
t_in.apply<T>(
[&](T x) { return static_cast<T>(x * (sigmoid<T>(static_cast<T>(1.702 * x)))); }, t_out);
clang-format suggestions
Original:
[&](T x) { return static_cast<T>(x * (sigmoid<T>(static_cast<T>(1.702 * x)))); }, t_out);

Suggested:
[&](T x) {
return static_cast<T>(x * (sigmoid<T>(static_cast<T>(1.702 * x))));
},
t_out);
nntrainer/layers/acti_func.h
Outdated
static Tensor &sigmoidGeluPrime(Tensor const &t_in, Tensor const &t_out,
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {
clang-format suggestions
Original:
static Tensor &sigmoidGeluPrime(Tensor const &t_in, Tensor const &t_out,
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {

Suggested:
static Tensor &
sigmoidGeluPrime(Tensor const &t_in, Tensor const &t_out,
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {
nntrainer/layers/acti_func.h
Outdated
Tensor &outgoing_derivative,
Tensor const &incoming_derivative = Tensor()) {
// NYI
ml_logw("quickGeluPrime which is calculate derivate of quickGelu function is not yet implemented");
ml_logw("sigmoidGeluPrime which is calculate derivate of sigmoidGelu function is not yet implemented");
clang-format suggestions
ml_logw("sigmoidGeluPrime which is calculate derivate of sigmoidGelu function is not yet implemented"); | |
ml_logw("sigmoidGeluPrime which is calculate derivate of sigmoidGelu " | |
"function is not yet implemented"); |
nntrainer/layers/common_properties.h
Outdated
static constexpr const char *EnumStr[] = {"tanh", "sigmoid", "relu",
"softmax", "leaky_relu", "swish",
"gelu", "tanh_gelu", "sigmoid_gelu",
"none", "unknown"};
clang-format suggestions
Original:
static constexpr const char *EnumStr[] = {"tanh", "sigmoid", "relu",
"softmax", "leaky_relu", "swish",
"gelu", "tanh_gelu", "sigmoid_gelu",
"none", "unknown"};

Suggested:
static constexpr const char *EnumStr[] = {
"tanh", "sigmoid", "relu", "softmax", "leaky_relu", "swish",
"gelu", "tanh_gelu", "sigmoid_gelu", "none", "unknown"};
@nnstreamer, 💯 All CI checkers are successfully verified. Thanks.
- add tanh-based approximate gelu (tanh gelu) for vision transformer.
- rename quick gelu to sigmoid gelu (it's a sigmoid-based approximate gelu)

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
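For clarity, the sigmoid-based approximation referred to here (the former quick gelu), as implemented in the sigmoidGelu lambda in acti_func.h, is

\mathrm{GELU}(x) \approx x \cdot \sigma(1.702\,x)

where \sigma is the logistic sigmoid.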
Cpp-linter Review
No concerns from clang-format.
Great job! 🎉
@nnstreamer, 💯 All CI checkers are successfully verified. Thanks.
LGTM