ConvTranspose2d layers not being tracked #43
Hi,
So, I'm trying to implement the autoencoder from World Models (https://worldmodels.github.io/). This was the output from delve on that architecture (minus the deconv layers, plus an extra layer because I misread the paper, and with a straight-through rather than variational latent). I wasn't getting reconstruction results as good as the paper's, so I wanted to use delve to try to figure out where the bottlenecks were. Since delve doesn't track ConvTranspose2d, I switched to a different architecture that mostly used regular conv layers (well, CoordConv, because that paper looked useful and I'm already a mess) so I could use it. The architecture in the graph I first posted:
I'm mostly just messing around at this point; I'm definitely not a professional in this field. Thanks for the help!
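For reference, here is a minimal PyTorch sketch of the World Models convolutional autoencoder discussed above. Layer sizes follow the paper; the variational/straight-through latent is simplified to a plain linear bottleneck, which is my simplification rather than the paper's setup:

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """World-Models-style conv autoencoder: 3x64x64 input, 32-d latent."""
    def __init__(self, z_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),     # 64 -> 31
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),    # 31 -> 14
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),   # 14 -> 6
            nn.Conv2d(128, 256, 4, stride=2), nn.ReLU(),  # 6 -> 2
            nn.Flatten(),                                  # 2*2*256 = 1024
        )
        # Plain deterministic bottleneck (an assumption; the paper uses a VAE).
        self.to_z = nn.Linear(1024, z_dim)
        self.from_z = nn.Linear(z_dim, 1024)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(1024, 128, 5, stride=2), nn.ReLU(),  # 1 -> 5
            nn.ConvTranspose2d(128, 64, 5, stride=2), nn.ReLU(),    # 5 -> 13
            nn.ConvTranspose2d(64, 32, 6, stride=2), nn.ReLU(),     # 13 -> 30
            nn.ConvTranspose2d(32, 3, 6, stride=2), nn.Sigmoid(),   # 30 -> 64
        )

    def forward(self, x):
        z = self.to_z(self.encoder(x))
        h = self.from_z(z).view(-1, 1024, 1, 1)  # reshape for the deconv stack
        return self.decoder(h)
```

The decoder is exactly the part delve currently skips, since it is built entirely from ConvTranspose2d layers.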
To close the loop, I figured out what my issue was! I went back to basics and implemented the network without any modifications. That didn't solve my issue; it actually made it worse! It looked like the network wasn't training at all, so I used some code from here to look at the gradients in the network, which showed that they were tiny. The issue ended up being my optimizer: I had messed with the Adam hyperparameters in a misguided attempt to fix a previous issue. Resetting those fixed it, and I got much better results than before. Now the graph from delve looks like this: Getting a graph of the gradients was super helpful; it might be another good statistic to track with delve. If you want, I can open another issue with a feature request for that.
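For anyone who lands here with the same symptom, here is a minimal sketch of the kind of per-layer gradient check described above (plain PyTorch; not the exact code linked in the comment):

```python
import torch

def gradient_summary(model):
    """Collect the mean absolute gradient per weight tensor.

    Call this after loss.backward(). Tiny values across every layer
    (e.g. ~1e-8) point at the optimizer or loss scaling rather than
    the architecture itself.
    """
    stats = {}
    for name, p in model.named_parameters():
        if p.grad is not None and "weight" in name:
            stats[name] = p.grad.abs().mean().item()
    return stats

# Usage sketch: run one training step, then inspect the gradients.
# loss = criterion(model(x), x)
# loss.backward()
# for name, g in gradient_summary(model).items():
#     print(f"{name}: {g:.3e}")
```

Resetting the optimizer to PyTorch's Adam defaults, i.e. `torch.optim.Adam(model.parameters())` with lr=1e-3, betas=(0.9, 0.999), eps=1e-8, is the kind of reset that fixed the training run described above.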
I have two minor updates regarding this. After digging through the code, I think a mid-sized refactoring is necessary to include non-standard layers, such as transposed convolutions or more custom, functional-style convolutions like the ones used in the EfficientNet models, in the saturation statistics.
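Until that refactoring lands, one generic workaround is to record activations yourself with forward hooks, which work on any nn.Module, ConvTranspose2d included. A rough sketch (delve's internal hook machinery may look different; `attach_activation_hooks` is a hypothetical helper, not part of delve's API):

```python
import torch
import torch.nn as nn

def attach_activation_hooks(model, store):
    """Record flattened output activations of every (transposed) conv layer.

    Any custom layer wrapped in an nn.Module can be added to the
    isinstance check, so functional-style convolutions are covered too.
    """
    handles = []
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
            def hook(mod, inputs, output, name=name):
                # Flatten to (batch, features) so covariance-based
                # saturation statistics can be computed downstream.
                store[name] = output.detach().flatten(start_dim=1)
            handles.append(module.register_forward_hook(hook))
    return handles  # call h.remove() on each handle to detach the hooks
```

The recorded activations could then feed into the same saturation computation once non-standard layers are supported.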
This is an awesome tool, but I'd love to see how well the decoder part of my autoencoder works.