Error while running it in PyTorch 1.7 #2

Open

mancunian1792 opened this issue Nov 26, 2020 · 3 comments

@mancunian1792

Issue: the existing ClusterGAN code doesn't work in PyTorch 1.7

Disclaimer: I know there is nothing wrong with the code or implementation you've written, but any help on this would be appreciated.

The following steps work in PyTorch 1.0 but not in PyTorch 1.7.0+cu101.

optimizer_ge = Adam(itertools.chain(encoder.parameters(), generator.parameters()) ....)
opt_disc = Adam(discriminator.parameters() .....) 
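
For context, a sketch of how these optimizers might look fully written out (the learning rate and betas here are illustrative assumptions, not values from the repo):

import itertools
from torch.optim import Adam

# Hyperparameters below are assumptions for illustration only.
optimizer_ge = Adam(
    itertools.chain(encoder.parameters(), generator.parameters()),
    lr=1e-4, betas=(0.5, 0.9),
)
opt_disc = Adam(discriminator.parameters(), lr=1e-4, betas=(0.5, 0.9))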

The generator and the encoder are updated together and the discriminator is updated separately.

The following is done for each batch of images:

generator.train()
encoder.train()

generator.zero_grad()
encoder.zero_grad()
discriminator.zero_grad()

optimizer_ge.zero_grad()

fake_image = generator(random_z)
fake_op = discriminator(fake_image)
real_op = discriminator(real_image)
zn, zc, zc_idx = encoder(fake_image)

ge_loss = cross_entropy_loss + clustering_loss  # generator/encoder objective
ge_loss.backward(retain_graph=True)  # graph is reused by the discriminator loss below
optimizer_ge.step()

opt_disc.zero_grad()
# Compute the vanilla GAN discriminator loss disc_loss from real_op and fake_op using BCE
disc_loss.backward()
opt_disc.step()

The above code works fine in PyTorch 1.0, but PyTorch 1.7 throws the following error:

one of the variables needed for gradient computation has been modified by an inplace operation:
[torch.cuda.FloatTensor [64, 1, 4, 4]] is at version 2; expected version 1 instead.
Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
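
As far as I can tell, this is the stricter in-place version checking introduced around PyTorch 1.5: optimizer_ge.step() updates the generator parameters in place, and disc_loss.backward() then walks the retained graph back through the generator (since fake_image was never detached) and finds those parameters at a newer version. A minimal, self-contained reproduction of the same error with a toy model (not the ClusterGAN code):

import torch
import torch.nn as nn

model = nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(1, 2)
out = model(x)

out.sum().backward(retain_graph=True)
opt.step()                   # in-place parameter update bumps the version counter

(out ** 2).sum().backward()  # RuntimeError on PyTorch >= 1.5: version mismatch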

The error seems to be resolved when I do

fake_op = discriminator(fake_image.detach())

or

ge_loss.backward(retain_graph=True)
disc_loss.backward()
optimizer_ge.step()
opt_disc.step()

However, after either of the above changes, the results no longer match those of the code run in PyTorch 1.0.

@Hong753

Hong753 commented Jan 23, 2021

It seems that the error appears because the same variables that require grad are used again after calling optGE.step().
I was able to fix it by simply recomputing those variables, i.e.:

fake_images = netG(zn, zc)
pred_fake = netD(fake_images)
...
optGE.step()
pred_fake = netD(fake_images.detach())
...
optD.step()
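
Spelled out slightly more (the loss names, bce, and the label tensors below are assumptions for illustration, not the exact repo code):

# Generator/encoder update: the graph runs through netD here.
fake_images = netG(zn, zc)
pred_fake = netD(fake_images)
ge_loss = bce(pred_fake, real_labels) + clustering_loss  # assumed loss terms
ge_loss.backward()
optGE.step()

# Discriminator update: recompute pred_fake on detached fakes so this
# graph never touches the just-updated generator parameters.
pred_fake = netD(fake_images.detach())
pred_real = netD(real_images)
d_loss = bce(pred_real, real_labels) + bce(pred_fake, fake_labels)
d_loss.backward()
optD.step()

Because the discriminator pass is recomputed, retain_graph=True is no longer needed for the first backward().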

@djsavic

djsavic commented Sep 7, 2022

@Hong753 Could you please paste the exact code showing how you solved this issue (and where it should go)? I am really struggling with migrating from Keras to PyTorch, and I would like to reproduce these results before even attempting to modify the code for other cases. Thank you.

@timodw

timodw commented Sep 28, 2022

@djsavic I fixed it in the following way.
First, I took a deepcopy of the discriminator using Python's copy library:
generator.train()
encoder.train()
generator.zero_grad()
encoder.zero_grad()
discriminator.zero_grad()
optimizer_G.zero_grad()

d_c = copy.deepcopy(discriminator)

x, y = batch
#zn, zc, zc_idx = generator_input_sampler(latent_space_zn, batch_size=32) # create fake digits
zn, zc, zc_idx = sample_z(latent_dim=latent_space_zn, shape=BATCH_SIZE)

Then I calculated pred_real and pred_fake using this copy:

x_fake = generator(zn.to(device), zc.to(device))  # create fake imgs
pred_real = d_c(x.to(device))
pred_fake = d_c(x_fake)

Now the code runs without errors and produces the results from the paper.
Hope this helped you!
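
For completeness, a sketch of how the discriminator's own update could then proceed against the live discriminator; this part is assumed (the snippet above only shows the generator/encoder side), and bce, valid, and fake are illustrative names:

optimizer_D.zero_grad()
pred_real = discriminator(x.to(device))
pred_fake = discriminator(x_fake.detach())  # detach so no gradients flow into the generator
d_loss = bce(pred_real, valid) + bce(pred_fake, fake)  # assumed vanilla GAN loss
d_loss.backward()
optimizer_D.step()

Deepcopying the discriminator on every batch is not free, but it guarantees that the parameters saved in the generator/encoder graph (those of d_c) are never stepped, so the version check can never fire.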
