I am curious how the classification setting works. You mention in your paper that you use the cross-entropy loss. Do you use a softmax as the final layer? If so, how do you propagate the variance through the softmax?

Thanks for the interest, and good question! For MNIST classification we use an elementwise sigmoid followed by cross entropy rather than a softmax. The output mean of the sigmoid takes both the mean and the variance from the previous layer (the pre-activation linear layer) as input; this is how both moments can affect the final prediction. There have also been follow-ups to NPN (e.g., work from ICLR 2018, if I remember correctly) that extend it with a softmax layer.
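For concreteness, here is a minimal NumPy sketch of that idea. It uses the standard probit approximation E[sigmoid(a)] ≈ sigmoid(mu / sqrt(1 + (pi/8)·var)) for a Gaussian pre-activation a ~ N(mu, var); the exact closed-form moments in the NPN paper may differ, and the function names below are made up for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def expected_sigmoid(mu, var):
    """Approximate E[sigmoid(a)] for a ~ N(mu, var) with the
    standard probit approximation (zeta^2 = pi / 8). This may not
    be the exact formula used in the NPN paper."""
    zeta_sq = np.pi / 8.0
    return sigmoid(mu / np.sqrt(1.0 + zeta_sq * var))

def elementwise_sigmoid_cross_entropy(mu, var, y):
    """Cross entropy between one-hot labels y and the *mean* of the
    elementwise sigmoid output. The pre-activation variance enters
    the prediction through the probit correction above."""
    p = expected_sigmoid(mu, var)
    eps = 1e-12  # numerical safety for the logs
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Toy example: 10-way MNIST-style output with Gaussian pre-activations.
rng = np.random.default_rng(0)
mu = rng.normal(size=10)          # pre-activation means from the linear layer
var = rng.uniform(0.1, 1.0, 10)   # pre-activation variances from the linear layer
y = np.eye(10)[3]                 # one-hot label for class 3
print(elementwise_sigmoid_cross_entropy(mu, var, y))
```

Note how a larger pre-activation variance pulls the predicted probability toward 0.5, so more uncertain units make less confident contributions to the cross entropy.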