Error while running it in PyTorch 1.7 #2

Open

mancunian1792 opened this issue Nov 26, 2020 · 3 comments

@mancunian1792

Issue: the existing ClusterGAN code doesn't work in PyTorch 1.7

Disclaimer: I know there is nothing wrong with the code or implementation you've written, but any help on this would be appreciated.

The following steps work in PyTorch 1.0 but not in PyTorch 1.7.0+cu101.

optimizer_ge = Adam(itertools.chain(encoder.parameters(), generator.parameters()) ....)
opt_disc = Adam(discriminator.parameters() .....) 
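
For context, a sketch of how these optimizers might look fully written out (the learning rate and betas here are illustrative assumptions, not values from the repo):

import itertools
from torch.optim import Adam

# Hyperparameters below are assumptions for illustration only.
optimizer_ge = Adam(
    itertools.chain(encoder.parameters(), generator.parameters()),
    lr=1e-4, betas=(0.5, 0.9),
)
opt_disc = Adam(discriminator.parameters(), lr=1e-4, betas=(0.5, 0.9))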

The generator and the encoder are updated together and the discriminator is updated separately.

The following is done for each batch of images:

generator.train()
encoder.train()

generator.zero_grad()
encoder.zero_grad()
discriminator.zero_grad()

optimizer_ge.zero_grad()

fake_image = generator(random_z)
fake_op = discriminator(fake_image)
real_op = discriminator(real_image)
zn, zc, zc_idx = encoder(fake_image)

ge_loss = cross_entropy_loss + clustering_loss  # generator/encoder objective
ge_loss.backward(retain_graph=True)  # graph is reused by the discriminator loss below
optimizer_ge.step()

opt_disc.zero_grad()
# Compute the vanilla GAN discriminator loss disc_loss from real_op and fake_op using BCE
disc_loss.backward()
opt_disc.step()

The above code works fine in PyTorch 1.0, but PyTorch 1.7 throws the following error:

one of the variables needed for gradient computation has been modified by an inplace operation:
[torch.cuda.FloatTensor [64, 1, 4, 4]] is at version 2; expected version 1 instead.
Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
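
As far as I can tell, this is the stricter in-place version checking introduced around PyTorch 1.5: optimizer_ge.step() updates the generator parameters in place, and disc_loss.backward() then walks the retained graph back through the generator (since fake_image was never detached) and finds those parameters at a newer version. A minimal, self-contained reproduction of the same error with a toy model (not the ClusterGAN code):

import torch
import torch.nn as nn

model = nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(1, 2)
out = model(x)

out.sum().backward(retain_graph=True)
opt.step()                   # in-place parameter update bumps the version counter

(out ** 2).sum().backward()  # RuntimeError on PyTorch >= 1.5: version mismatch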

The error seems to be resolved when I do

fake_op = discriminator(fake_image.detach())

or

ge_loss.backward(retain_graph=True)
disc_loss.backward()
optimizer_ge.step()
opt_disc.step()

However, after either of the above changes, the results no longer match those of the code run in PyTorch 1.0.

@Hong753

Hong753 commented Jan 23, 2021

It seems that the error appears because the same variables that require grad are used again after calling optGE.step().
I was able to fix it by simply recomputing those variables, i.e.:

fake_images = netG(zn, zc)
pred_fake = netD(fake_images)
...
optGE.step()
pred_fake = netD(fake_images.detach())
...
optD.step()
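
Spelled out slightly more (the loss names, bce, and the label tensors below are assumptions for illustration, not the exact repo code):

# Generator/encoder update: the graph runs through netD here.
fake_images = netG(zn, zc)
pred_fake = netD(fake_images)
ge_loss = bce(pred_fake, real_labels) + clustering_loss  # assumed loss terms
ge_loss.backward()
optGE.step()

# Discriminator update: recompute pred_fake on detached fakes so this
# graph never touches the just-updated generator parameters.
pred_fake = netD(fake_images.detach())
pred_real = netD(real_images)
d_loss = bce(pred_real, real_labels) + bce(pred_fake, fake_labels)
d_loss.backward()
optD.step()

Because the discriminator pass is recomputed, retain_graph=True is no longer needed for the first backward().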

@djsavic

djsavic commented Sep 7, 2022

@Hong753 Could you please paste the exact code showing how you solved this issue (and where it should go)? I am really struggling with migrating from Keras to PyTorch, and I would like to reproduce these results before even attempting to modify the code for other cases. Thank you.

@timodw

timodw commented Sep 28, 2022

@djsavic I fixed it in the following way.
First, I took a deepcopy of the discriminator using Python's copy library:
generator.train()
encoder.train()
generator.zero_grad()
encoder.zero_grad()
discriminator.zero_grad()
optimizer_G.zero_grad()

d_c = copy.deepcopy(discriminator)

x, y = batch
#zn, zc, zc_idx = generator_input_sampler(latent_space_zn, batch_size=32) # create fake digits
zn, zc, zc_idx = sample_z(latent_dim=latent_space_zn, shape=BATCH_SIZE)

Then I calculated pred_real and pred_fake using this copy:

x_fake = generator(zn.to(device), zc.to(device))  # create fake imgs
pred_real = d_c(x.to(device))
pred_fake = d_c(x_fake)

Now the code runs without errors and produces the results from the paper.
Hope this helped you!
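
For completeness, a sketch of how the discriminator's own update could then proceed against the live discriminator; this part is assumed (the snippet above only shows the generator/encoder side), and bce, valid, and fake are illustrative names:

optimizer_D.zero_grad()
pred_real = discriminator(x.to(device))
pred_fake = discriminator(x_fake.detach())  # detach so no gradients flow into the generator
d_loss = bce(pred_real, valid) + bce(pred_fake, fake)  # assumed vanilla GAN loss
d_loss.backward()
optimizer_D.step()

Deepcopying the discriminator on every batch is not free, but it guarantees that the parameters saved in the generator/encoder graph (those of d_c) are never stepped, so the version check can never fire.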
