diff --git a/images/output.png b/images/output.png new file mode 100644 index 0000000..ecd55a0 Binary files /dev/null and b/images/output.png differ diff --git a/tutorial notebooks/simple diffusion flax.ipynb b/tutorial notebooks/simple diffusion flax.ipynb index a51fcda..d479efb 100644 --- a/tutorial notebooks/simple diffusion flax.ipynb +++ b/tutorial notebooks/simple diffusion flax.ipynb @@ -6,6 +6,8 @@ "source": [ "# Diffusion Fundamentals\n", "\n", + "**Please bear with me at times and read through the entire notebook thoroughly.** I have tried to explain the concepts in the simplest way possible, but some are inherently complex and require a bit of mathematical understanding. I have simplified the math as much as possible and have provided links to the original papers for further reading.\n", + "\n", "## Diffusion and Score Based Models\n", "\n", "### Introduction\n", @@ -107,7 +109,17 @@ "\n", "These are basically defined via noise schedules in FlaxDiff and thus using a discretized noise schedule vs a continuous one is what differentiates DDPM/DDIM from the latest techniques (and also score based methods but along with the loss function which is just slightly different).\n", "\n", - "You can read more about the generalization of the diffusion process in the paper [Score based generative modeling through stochastic differential equations](https://arxiv.org/pdf/2011.13456)" + "You can read more about the generalization of the diffusion process in the paper [Score based generative modeling through stochastic differential equations](https://arxiv.org/pdf/2011.13456)\n", + "\n", + "## The Intuition\n", + "\n", + "Now that we have formulated all the maths in the previous sections, let's try to build the intuition for what we are trying to do. As we discussed briefly in the [The Idea](#the-idea) section, every image (or data sample) can be thought of as a point in a high-dimensional space. 
For example, a 64x64 image is basically an array of 64x64x3=12288 values (three channel values for each of its 4096 pixels). If you treat this array as a vector, it's a vector in a 12288-dimensional space. Now, it's hard for our monkey brains to imagine anything beyond a 4D space, let alone a 12288D one, but bear with me. A noisy image is thus also a point in this same high-dimensional space.\n", + "\n", + "Now, in the forward diffusion process, all we are doing is going from a point of low entropy, i.e., the data sample (image) $x_0$, to a point of high entropy, i.e., a complete noise sample $\\epsilon_0$. Every point we encounter along the trajectory we take is an intermediate noisy image $x_t$. Here is a simple visualization of the process:\n", + "\n", + "\n", + "\n", + "The denoising diffusion model that we would train (or the score based model, depending on the formulation) is what guides us along the trajectory of the reverse diffusion process. It shows the way (the direction) from any noisy image towards a cleaner image sample. The samplers then take steps along this direction to reach the final denoised image. In the above visualization, the green curve does not truly represent the reverse path, though, because our denoising model may take us along a different path towards a different image in the vicinity of the starting noise. It would still be a clean, low-entropy image nonetheless." ] }, {