Artifacts compared to Lua version #71
@genekogan It's not immediately clear to me what is causing these artifacts, but I'll look into it and see if I can figure it out. Have you tested any of the other models? And have you ruled out artifacts from the input images (like JPEG artifacts)?
Just tried it with VGG-16. At first glance it wasn't as noticeable, but my eye isn't very trained to it yet, and I can't compare to the Lua version since I don't see VGG-16 available for it. As for JPEG artifacts, that seems unlikely, because the artifacts appear in different places each time you run it, and I'm using the exact same images for both the Lua and PyTorch versions.
The VGG-16 model for the Lua version can be found here: https://gist.github.com/ksimonyan/211839e770f7b538e2d8
A list of many of the supported models for the Lua version can be found here: https://github.com/jcjohnson/neural-style/wiki/Using-Other-Neural-Models
The full list of models supported by neural-style-pt can be found here: https://github.com/ProGamerGov/neural-style-pt/wiki/Other-Models
Maybe there are layer implementation differences between Torch and PyTorch that could be causing the artifacts? Have you tried using resize convolutions (which are designed to deal with checkerboard artifacts)?
I took a quick stab at this by replacing the code at line 119 with a variant that adds 2x upsampling, and then doubling style_scale to compensate for the 2x upsampling (a sketch of the kind of change is below). It doesn't solve the checkerboard problem for me, and it's now also slower and more memory-intensive. I made this change based on the discussion/advice here, but I'm not sure I made the right change or understood it correctly. I need to read a bit deeper and try again, but wanted to record these early attempts here.
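For illustration, a minimal sketch of that kind of change, assuming the network is assembled as an nn.Sequential in neural_style.py (the exact replaced code isn't quoted above, so treat this as a guess at the approach rather than the actual edit):

```python
import torch.nn as nn

# Prepend a 2x bilinear upsample so the convolutions operate on a resized
# image, per the resize-convolution advice for checkerboard artifacts.
# style_scale is then doubled elsewhere to compensate.
net = nn.Sequential(
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
    # ... the VGG layers built by neural_style.py would follow here ...
)
```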
@genekogan Did you ever have any success with this? One area that I had been looking at is the deprocess code, which might differ slightly in how the normalization is done. I thought that the clamp might be doing it:
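For reference, a paraphrased sketch of a deprocess step with the clamp in question (written from memory of neural-style-pt, so the exact mean values and ordering here are assumptions):

```python
import torch
import torchvision.transforms as transforms

def deprocess(output_tensor):
    # Undo the Caffe-style BGR mean subtraction.
    undo_mean = transforms.Normalize(mean=[-103.939, -116.779, -123.68], std=[1, 1, 1])
    output_tensor = undo_mean(output_tensor.squeeze(0).cpu())
    output_tensor = output_tensor[torch.LongTensor([2, 1, 0])] / 256  # BGR -> RGB, rescale
    # The clamp in question: out-of-range values are silently truncated here.
    output_tensor.clamp_(0, 1)
    return transforms.ToPILImage()(output_tensor)
```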
@ajhool I'm not sure if
@genekogan Can you still reproduce this issue on the latest version of PyTorch? I was trying to see if my gradient normalization code fixed it, but I can't even get the original artifacts to show up like they did before. |
Just tried it, with PyTorch 1.7.
Content, style, and output: [images omitted]
Same thing but with -normalize_gradients: [images omitted]
I'm still getting the checkerboard artifacts, but normalizing the gradients seems to reduce those grayish, desaturated regions, which is very nice. I haven't found anything yet for the checkerboard artifacts.
@genekogan I was testing with the brad_pitt.jpg example image and Hokusai: [images omitted]
Tests with and without gradient normalization (some control tests were set to (strength * strength) to make the weights similar to normalized gradients): [images omitted]
There are artifacts in my test results, but they are nowhere near as bad as yours with the Mona Lisa. In the original neural-style, I found that Adam produced grayish regions when its parameters (like the beta parameters) were not optimal. PyTorch's Adam optimizer actually uses the same parameters that I figured out for neural_style.lua. Maybe L-BFGS needs its parameters tuned in order to avoid creating grayish areas?
Another thing is that the gradient normalization is implemented differently in the two projects. From neural-style-pt: https://github.com/ProGamerGov/neural-style-pt/blob/master/neural_style.py#L414
And from the original neural-style: https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L480
Hopefully, now that I've figured out how to modify things in the backward pass without breaking neural-style-pt, we can get closer to a solution to the artifacts issue! More information about PyTorch autograd functions (what ScaleGradients uses) can be found here.
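As an illustration of the autograd-function pattern, here is a minimal sketch of gradient normalization in the backward pass (a guess at the general shape of ScaleGradients; see the linked line for the actual code):

```python
import torch

class ScaleGradients(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input_tensor, strength):
        ctx.strength = strength
        return input_tensor  # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # Normalize the incoming gradient by its L1 norm, mirroring the
        # normalize option in neural_style.lua, then rescale by strength.
        grad_input = grad_output.clone()
        grad_input = grad_input / (torch.norm(grad_input, 1) + 1e-8)
        return grad_input * ctx.strength, None
```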
This is what I get using default parameters (regular first, then with -normalize_gradients): [images omitted]
The artifacts are still there, but they are kind of mild (and with normalized gradients even milder). Your outputs seem to have no artifacts though; I'm a bit puzzled why we are getting differing results. Are you using Adam, or the default settings for L-BFGS? I'm on the latest version of your master branch and PyTorch 1.7. Maybe something even lower-level is causing the disparity? @alexjc wrote earlier a bit about the issue of muddy regions in this thread and this one. The paper by Heitz et al. says Gram loss fails to represent the whole style feature distribution, and suggests that swapping in Sliced Wasserstein Distance instead may help reduce or eliminate the muddy "desaturated" zones; I also suspect it could improve the issue I was struggling with earlier of blending styles together. I tried to implement SWD the other day but ran into issues. I recall swapping in histogram loss helped a bit but did not solve it. I'm curious whether this looks interesting to you.
@genekogan These are the different parameters that I used for testing with both Adam and L-BFGS, I think (the tests above are all with L-BFGS, on the master branch with no changes).
For the tests above, I may also have forgotten to set the TV weight to zero, which I normally do. In the past I have used a learning rate of 2 with the Adam optimizer in the original neural-style, but I'm not sure if I ever did the same with neural-style-pt.
@genekogan I have implemented spatial decorrelation, color decorrelation, and transform robustness into neural-style-pt here, to see if they could help resolve the artifacts, but they appear to make the artifacts much worse, as you can see above: https://gist.github.com/ProGamerGov/7294364e7e58d239fb1a8c0ae8a0957e
The main area in the script where you can adjust parameters is here. Be warned that there appear to be some bugs related to the size of your chosen content image and the FFT class; I still have to fix those for Captum and my other projects. The code is based on dream-creator and my work on Captum. There's a lot to experiment with, and you'll have to comment out / un-comment code to add or remove features for testing, as the new features don't have argparse parameters. A good starting point might be Lucid's style transfer work:
@ProGamerGov The artifacts in your above images look different, more like the kind of artifacts you get when TV regularization is set too low. The ones in the initial post are more sporadic and spaced out. The artifacts seem to lessen when the style weight is increased relative to the content weight, but I haven't tested that thoroughly. I will try out the links.
So, the Mona Lisa image uses the sRGB IEC61966-2.1 color profile, and when PIL loads it and converts it to an RGB image there's a slight change in the colors. But that doesn't explain the artifacts, or why changing the weights influences their prevalence. I also think the artifacts in your images look almost like ISO noise artifacts, which may have been somewhat aligned by the optimization process.
@genekogan Since your artifacts don't seem to resemble the checkerboard artifacts from conv layers, maybe the issue comes from the pooling layers? Figure 3 from this research paper: https://arxiv.org/abs/1511.06394 seems to show examples that look like your artifacts. VGG pooling artifacts (click on the images to make them bigger): [images omitted]
If we wanted to test it, we'd have to replace the pooling layers with L2 pooling layers. But I'm not sure how to turn the equation they give, L2(x) = √(g ∗ x²), into a PyTorch class.
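One way to read that equation: square the input pointwise, blur it with a fixed kernel g (a depthwise convolution), and take the square root. A minimal sketch, where the Gaussian kernel choice and the class name are assumptions rather than anything specified by the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class L2Pool2d(nn.Module):
    """L2 pooling per the paper's equation: L2(x) = sqrt(g * x^2)."""
    def __init__(self, channels, stride=2):
        super().__init__()
        # A simple 3x3 Gaussian-like blur kernel, one copy per channel.
        g = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]])
        g = (g / g.sum()).view(1, 1, 3, 3).repeat(channels, 1, 1, 1)
        self.register_buffer('g', g)
        self.channels = channels
        self.stride = stride

    def forward(self, x):
        squared = x * x
        blurred = F.conv2d(squared, self.g, stride=self.stride,
                           padding=1, groups=self.channels)
        return torch.sqrt(blurred + 1e-8)  # eps avoids NaN gradients at zero
```

To test the idea, each nn.MaxPool2d in the VGG network would be swapped for an L2Pool2d with a matching channel count and stride.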
So, I can implement my own version of MaxPool2d in a couple of different ways. But I'm not sure how to do the L2 pooling, or how to apply the blur: blurring with Conv2d seems to alter the size of the input unless padding is used. Edit: I have blurring set up for a MaxPool2d layer here (a sketch of the idea follows):
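A minimal sketch of a blurred MaxPool2d along those lines (hypothetical class name; the blur uses padding so it doesn't change the spatial size before pooling):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurMaxPool2d(nn.Module):
    def __init__(self, channels, kernel_size=2, stride=2):
        super().__init__()
        # Fixed 3x3 blur kernel applied depthwise, with padding=1 so the
        # blur preserves the input's spatial size.
        g = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]])
        self.register_buffer('blur', (g / g.sum()).view(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.channels = channels
        self.pool = nn.MaxPool2d(kernel_size, stride)

    def forward(self, x):
        x = F.conv2d(x, self.blur, padding=1, groups=self.channels)
        return self.pool(x)
```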
Cool! I tried this out -- the results are very striking. I'm still getting artifacts, but they are almost gone with the normalized gradients option. The features also look much sharper and more saturated, and the muddy regions are also reduced, especially with the TV regularization turned off.
- Regular, default parameters with blurred MaxPool2d: [images omitted]
- With normalized gradients: [images omitted]
- With tv_weight = 0 (un-normalized): [images omitted]
- With normalized gradients and tv_weight = 0: [images omitted]
One hacky idea that could help balance the tradeoff between checkerboard artifacts/high-frequency noise (which seem to lessen with increased TV regularization) and muddy regions (which lessen with decreased TV regularization) would be to modify TVLoss so that, before summing, the per-pixel differences are first multiplied element-wise by a saturation map (or something similar) of the image, so smoothing is applied more strongly where muddiness shows up (a sketch is below). Or, even simpler, use an L2 sum instead of L1, i.e. square each element instead of taking its absolute value before summing. I'm not sure there's much to gain from trying to optimize TV noise, as the effect is pretty subtle, and maybe the artifacts aren't even my biggest problem anymore, but it's some untrained/half-baked food for thought.
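A minimal sketch of the saturation-weighted idea, mimicking the pass-through loss-layer style used in neural-style-pt (the saturation proxy and weighting scheme here are hypothetical):

```python
import torch
import torch.nn as nn

class WeightedTVLoss(nn.Module):
    def __init__(self, strength):
        super().__init__()
        self.strength = strength
        self.loss = 0

    def forward(self, input):
        # Crude per-pixel saturation proxy: channel max minus channel min.
        sat = (input.max(dim=1, keepdim=True).values
               - input.min(dim=1, keepdim=True).values)
        x_diff = input[:, :, 1:, :] - input[:, :, :-1, :]
        y_diff = input[:, :, :, 1:] - input[:, :, :, :-1]
        # Weight each local difference by the (cropped) saturation map
        # before summing, instead of summing plain absolute differences.
        self.loss = self.strength * (
            torch.sum(sat[:, :, 1:, :] * torch.abs(x_diff)) +
            torch.sum(sat[:, :, :, 1:] * torch.abs(y_diff)))
        return input
```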
Hi guys. I'm not very familiar with the code. Could you please provide a version of the neural_style.py file with the MaxPool2d changes or genekogan's TV optimization already applied, and explain how to use it?
Coming back to this repo after a long break, interested in developing this further...
I was just comparing this implementation to the original Lua version and noticed something I hadn't before. The two produce almost identical results, but the PyTorch version appears to produce very subtle artifacts.
The following is an example, using Hokusai as the style image. On the left side is neural-style (Lua); on the right side is neural-style-pt.
Notice in scattered places the presence of high-frequency discolorations, often in almost checkerboard-like patterns. These do not appear in the Lua version. If you zoom in on a few parts of the neural-style-pt versions, you can see them clearly. Notice the pink and green checkers.
This generally happens consistently for any combination of content and style images, although for some style images the artifacts are more obvious. Sometimes obvious discolorations appear; other times they are smaller, giving the output an almost grainy appearance. The artifacts can be reduced by increasing -tv_weight, but at the expense of content/style reconstruction, and even then they're still visible. I tried fixing it a few ways. Clamping the image between iterations (not just at the end) didn't fix it. I also tried playing with the TVLoss module, for example changing the L1 sum of absolute differences to an L2 loss (a sketch of both variants is below); that also did not get rid of the artifact. (I tried this because my reading of the TV loss formula is that it uses an L2 loss, not absolute values, but I'm not sure this makes a big difference.)
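For concreteness, a sketch of the two variants, assuming TVLoss computes adjacent-pixel differences as in neural-style-pt:

```python
import torch

def tv_loss(input, strength, use_l2=False):
    # Differences between horizontally and vertically adjacent pixels.
    x_diff = input[:, :, 1:, :] - input[:, :, :-1, :]
    y_diff = input[:, :, :, 1:] - input[:, :, :, :-1]
    if use_l2:
        # L2 variant tried above: square the differences before summing.
        return strength * (torch.sum(x_diff ** 2) + torch.sum(y_diff ** 2))
    # L1 variant: sum of absolute differences.
    return strength * (torch.sum(torch.abs(x_diff)) + torch.sum(torch.abs(y_diff)))
```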
The artifact is very subtle, but I'm hoping to fix it, as I'd like to produce print-quality images in the future, and multi-stage or multi-scale techniques layered on top may amplify it. I wonder if you have any idea what might be causing this or what could potentially fix it.