Error in Likelihood Computations? #81

Open
AlexanderMath opened this issue May 6, 2019 · 1 comment

Comments

@AlexanderMath

It seems there is an error in the NLL computations on MNIST. The article reports NLL in "bits per pixel". The "per pixel" part is computed by dividing by the product of the dimensions of x:

glow/model.py

Line 185 in eaff217

bits_x = nobj / (np.log(2.) * int(x.get_shape()[1]) * int(x.get_shape()[2]) * int(x.get_shape()[3]))

But in "data_loaders/get_mnist_cifar.py" the MNIST data is padded with zeros to size 32x32.
x_train = np.lib.pad(x_train, ((0, 0), (2, 2), (2, 2)), 'minimum')

This has two consequences:
1. The NLL is divided by 32^2 and not 28^2. This improves the loss.
2. The "scaling penalty" is also multiplied by 32^2 and not 28^2 (see below). This worsens the loss.

glow/model.py

Line 172 in eaff217

objective += - np.log(hps.n_bins) * np.prod(Z.int_shape(z)[1:])

It is not clear to us why this would yield the correct likelihood computations.
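For concreteness, here is a small sketch of how the two effects interact. The NLL value below is assumed for illustration only (it is not a measurement from the repository); the structure mirrors the two lines quoted above.

import numpy as np

# Hypothetical illustration: how the choice of denominator changes the reported
# bits per dimension for the same model output.
nll_nats = 1500.0        # assumed total NLL of one image under the flow, in nats
n_bins = 256             # 8-bit data

# The discretization ("scaling") penalty grows with the padded number of dimensions.
penalty_padded = np.log(n_bins) * 32 * 32      # applied to the 32x32 padded image
penalty_original = np.log(n_bins) * 28 * 28    # what a 28x28 image would incur

bits_padded = (nll_nats + penalty_padded) / (np.log(2.) * 32 * 32)
bits_original = (nll_nats + penalty_original) / (np.log(2.) * 28 * 28)
print(bits_padded, bits_original)              # the two conventions disagree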

On an intuitive level, it seems that the "per pixel" loss should be computed by dividing by the original data size 28x28 instead of the padded data size 32x32x3. Below we argue that, if these computations were correct, we could obtain a loss arbitrarily close to 0 by simply increasing the amount of zero padding.

Suppose we normalize the input to [-127.5, +127.5] and change the Gaussian to be N(0, 127.5/2). The scaling is then 1, so the log "scaling penalty" becomes 0. Since additional zero padding increases the shape (the denominator) while decreasing the loss, we can add more and more zero padding and drive the reported loss arbitrarily close to 0, which seems to be a problem.
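A small numeric sketch of this argument (the per-pixel NLL contributions below are assumed purely for illustration):

import numpy as np

# Assumed numbers: the informative 28x28 content contributes a roughly fixed NLL,
# while each padded zero pixel contributes only a small, bounded amount.
nll_content_nats = 1500.0   # assumed NLL of the original 28x28 pixels
nll_per_pad_nats = 0.01     # assumed NLL contribution per padded pixel

# With the scaling penalty removed, dividing by the ever larger padded dimensionality
# drives the reported bits per dimension toward 0 as the padding grows.
for size in (28, 32, 64, 128, 512):
    n_pad = size * size - 28 * 28
    total_nats = nll_content_nats + nll_per_pad_nats * n_pad
    print(size, total_nats / (np.log(2.) * size * size))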

Our sincerest apologies if we have misunderstood something. In either case, we'd be happy to see an argument for how one can compute the likelihood of a 28x28 MNIST image under a model that is trained on a 32x32 padded variant of MNIST. The reason we are interested in this is that it would allow a data-augmentation trick that interpolates images to a larger resolution by zero padding in Fourier space. With appropriate scaling, the Fourier transform is unitary and thus has unit determinant.
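As a quick check of that last claim (a toy sketch, not code from this repository): with orthonormal scaling the 2-D DFT preserves the norm, i.e. it is unitary with unit |det Jacobian|, and embedding the spectrum in a larger array of zeros does not change that.

import numpy as np

# Toy check: with 'ortho' normalization the 2-D DFT is unitary, so it preserves the
# L2 norm and contributes zero log-determinant to a change-of-variables computation.
x = np.random.randn(28, 28)
X = np.fft.fft2(x, norm='ortho')
assert np.isclose(np.linalg.norm(x), np.linalg.norm(X))

# Embedding the 28x28 spectrum into a 32x32 array of zeros and inverting (again with
# 'ortho' scaling) also preserves the norm; a proper band-limited interpolation would
# place the zeros at the high-frequency bins, but the norm argument is the same.
X_pad = np.zeros((32, 32), dtype=complex)
X_pad[:28, :28] = X
x_up = np.fft.ifft2(X_pad, norm='ortho')
print(np.linalg.norm(x), np.linalg.norm(x_up))   # equal up to floating point error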

@christabella

I guess the zero padding is part of the pre-processing used to obtain the input x, and from the paper "M is the dimensionality of x", which would then be 32x32?

While this wouldn't impact training/optimization since these factors are constants, I agree that intuitively the absolute likelihood computation seems "wrong", as you pointed out, since the original x is 28x28.
