v gates #25
evanatyourservice
started this conversation in
General
Replies: 2 comments 1 reply
-
Oh interesting, thanks for sharing this finding. What happens if you replace the
-
Ah yeah, let me try that. I was actually wondering whether the problem is the SiLU itself or the gate activation not being bounded, but I didn't try any replacements.
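For anyone following along, here is a rough sketch of the difference being discussed. The thread doesn't spell out how the v gate is implemented, so this assumes an elementwise multiplicative gate on the value, `v * activation(pre_gate)`; the names `gated_value` and `pre_gate` are made up for illustration. It only shows that a SiLU gate can scale the value by an arbitrarily large factor while a sigmoid gate keeps the factor in [0, 1]. It is not a claim about what actually causes the instability.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function; output stays in [0, 1].
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def silu(x: float) -> float:
    # SiLU(x) = x * sigmoid(x); unbounded above, roughly linear for large x.
    return x * sigmoid(x)


def gated_value(v: float, pre_gate: float, activation) -> float:
    # Hypothetical v gate: scale the value by an activation of the gate input.
    return v * activation(pre_gate)


if __name__ == "__main__":
    v = 2.0
    for pre_gate in (0.5, 5.0, 50.0):
        print(
            pre_gate,
            gated_value(v, pre_gate, silu),     # grows with pre_gate
            gated_value(v, pre_gate, sigmoid),  # never exceeds |v|
        )
```

With a bounded gate the gated value can never exceed the original value in magnitude, which is one reason swapping the activation is a cheap experiment to run.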
-
FWIW, I've found v gates to cause unstable training when the networks are very large, with some weird gradient explosions in early layers and NaNs. They help with smaller networks, but large networks don't train with them.