
Commit

rearrange readings/miscs; add analysis part of math; fixed some typos
ZhouTimeMachine committed Nov 1, 2024
1 parent 15c1d71 commit bdc546d
Showing 19 changed files with 332 additions and 14 deletions.
3 changes: 3 additions & 0 deletions docs/courses/info-theory.md
@@ -0,0 +1,3 @@
# Information Theory

!!! warning "该页面还在施工中"
2 changes: 1 addition & 1 deletion docs/courses/probability/prob_lim.md
@@ -261,7 +261,7 @@ $$

which yields convergence in distribution.

!!! info "依概率收敛 $\nRightarrow$ 依分布收敛"
!!! info "依概率收敛 $\Rightarrow$ 依分布收敛"

??? general "Counterexample"

24 changes: 22 additions & 2 deletions docs/math/DEs/Intro2SDE/Brownian_noise.md
@@ -846,10 +846,30 @@

## Sample Path Properties

!!! warning "本节还在施工中"
布朗运动的采样路径具有一定的 Hölder 连续性,为了详细阐述与证明,需要首先阐明 Hölder 连续性的定义。

> See [Hölder condition - Wikipedia](https://en.wikipedia.org/wiki/H%C3%B6lder_condition); the various notions of continuity are also covered in detail in [Continuities](../../../analysis/continuities) in these notes.

!!! info "Hölder Continuity"
    Consider a function $f:[0, T]\to \mathbb{R}$ and $0 < \gamma \leqslant 1$:

    **(1)** If there exists a constant $K$ such that the following holds, then $f$ is called uniformly $\gamma$-Hölder continuous:

    $$
    |f(t) - f(s)| \leqslant K|t-s|^\gamma, \quad \forall t, s\in [0, T]
    $$

    **(2)** If there exists a constant $K$ such that the following holds, then $f$ is called $\gamma$-Hölder continuous at the point $s$:

    $$
    |f(t) - f(s)| \leqslant K|t-s|^\gamma, \quad \forall t\in [0, T]
    $$

For the Hölder continuity of the sample paths of a stochastic process, the Kolmogorov continuity theorem is the standard tool:

!!! abstract "Kolmogorov continuity theorem"

    Let $X = (X_t)_{t\in [0, T]}$ be a stochastic process. If there exist constants $\alpha, \beta, C > 0$ such that

    $$
    \mathbb{E}\left[|X_t - X_s|^{\alpha}\right] \leqslant C|t-s|^{1+\beta}, \quad \forall t, s\in [0, T],
    $$

    then $X$ admits a modification whose sample paths are almost surely $\gamma$-Hölder continuous for every $0 < \gamma < \beta/\alpha$.

> See [Kolmogorov continuity theorem - Wikipedia](https://en.wikipedia.org/wiki/Kolmogorov_continuity_theorem) for details.

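As an illustration (a standard computation, sketched here): for standard Brownian motion, $B_t - B_s \sim \mathcal{N}(0, |t-s|)$, so the even Gaussian moments give

$$
\mathbb{E}\left[|B_t - B_s|^{2n}\right] = \frac{(2n)!}{2^n n!}\,|t-s|^{n}, \quad \forall n \in \mathbb{N},
$$

which satisfies the Kolmogorov criterion with $\alpha = 2n$ and $1 + \beta = n$. Hence the sample paths are almost surely $\gamma$-Hölder continuous for every $\gamma < (n-1)/(2n)$, and letting $n\to\infty$, for every $\gamma < 1/2$.
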
## Markov Property
File renamed without changes.
5 changes: 5 additions & 0 deletions docs/math/analysis/funcRvar/LebesgueMeasure.md
@@ -0,0 +1,5 @@
<link rel="stylesheet" href="../../../../css/counter.css" />

# Lebesgue Measure

!!! warning "本页面还在施工中"
41 changes: 41 additions & 0 deletions docs/math/analysis/funcRvar/index.md
@@ -0,0 +1,41 @@
# Functions of a Real Variable

> *real analysis* (实分析) is too difficult for me, so I just try to learn *functions of a real variable* (实变函数) first.

!!! warning "This page is still under construction. The current plan is to study the course *Functions of a Real Variable* taught by 贾厚玉 via [智云课堂](https://classroom.zju.edu.cn)"

## Contents

- [Basics of Set Theory](sets.md)
- [Lebesgue Measure](LebesgueMeasure.md)
- ......(to be continued)

## Objective: Lebesgue Integral

Sequences of Riemann-integrable functions have a problem: they are not closed under taking limits, i.e., the limit of a sequence of integrable functions may fail to be integrable.

!!! example "极限运算不封闭"
记 $[0, 1]$ 上的可积函数类为 $R[0, 1]$,则考虑 $[0, 1]$ 上的“二进有理数”数列:

$$
0, 1, \frac{1}{2}, \frac{1}{4}, \frac{3}{4}, \frac{1}{8}, \frac{3}{8}, \frac{5}{8}, \frac{7}{8}, \cdots
$$

即该数列由 $0, 1$ 以及形式为 $1/2^n, 3/2^n, \cdots, (2^n-1)/2^n$ 的数构成,记该数列为 $\{r_n\}$。则定义函数列 $f_k(x)$ 如下:

$$
f_k(x) = \begin{cases}
1, & x = r_n, k\geqslant n \in \mathbb{N} \\
0, & \text{otherwise}
\end{cases}
$$

对于其极限函数 $f(x)$,对于任意细的划分,每个子区间一定有取值为 1 的离散点和取值为 0 的离散点,因此类似于 Dirichlet 函数,有 $f\notin R[0, 1]$。
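    In Darboux terms, a quick check of this claim: for any partition $P$ of $[0, 1]$, every subinterval contains dyadic rationals (where $f = 1$) as well as points that are not (where $f = 0$), so

    $$
    U(f, P) = \sum_k 1 \cdot \Delta x_k = 1, \qquad L(f, P) = \sum_k 0 \cdot \Delta x_k = 0,
    $$

    and the upper and lower Darboux integrals can never agree.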

The class of Lebesgue-integrable functions is introduced because it is closed under taking limits, i.e. (under suitable conditions, such as dominated convergence) integration and limits can be interchanged:

$$
\lim_{n\to \infty} \int_a^b f_n(x)\mathrm{d}x = \int_a^b \lim_{n\to \infty} f_n(x)\mathrm{d}x
$$

Moreover, Lebesgue integrability extends Riemann integrability: a function that is Riemann integrable remains Lebesgue integrable. In this sense the Lebesgue integral completes the Riemann integral, and it is precisely the core content of the theory of functions of a real variable.
38 changes: 38 additions & 0 deletions docs/math/analysis/funcRvar/sets.md
@@ -0,0 +1,38 @@
<link rel="stylesheet" href="../../../../css/counter.css" />

# Basics of Set Theory

## Common Symbols

- Power set: $\mathcal{P}(X) = \{A: A\subseteq X\}$
- A family of subsets of $X$ selected by an index set $\Lambda$: $\mathcal{A} = \{A_{\alpha}\subseteq X: \alpha\in\Lambda\}$
- Index sets conveniently express unions, intersections, and similar operations over many sets:

$$
\begin{gathered}
\bigcup_{\alpha\in\Lambda}A_{\alpha} = \{x: \exists \alpha\in \Lambda, x\in A_{\alpha}\}, \\
\bigcap_{\alpha\in\Lambda}A_{\alpha} = \{x: \forall \alpha\in \Lambda, x\in A_{\alpha}\}. \\
\end{gathered}
$$

- Set difference: $A \backslash B = \{x: x\in A, x\notin B\}$; complement: $A^c = \Omega \backslash A$ (with $\Omega$ denoting the universal set)

??? example "基本集合论练习"
- $\{x: \sup _n f_n(x) > t\} = \bigcup_{n=1}^{\infty}\{ x: f_n(x) > t \}$
- $\{x: \sup _n f_n(x) \leqslant t\} = \bigcap_{n=1}^{\infty}\{ x: f_n(x) \leqslant t \}$

第二行由第一行应用 De Morgan 律容易得到。令 $A=\{x: \sup _n f_n(x) > t\}$, $A_n=\{ x: f_n(x) > t \}$

- $\forall x\in A$,如果不存在 $n_0$ 使得 $f_{n_0}(x) = \sup_n f_n(x)$,则有 $f_n(x) < \sup_n f_n(x)$,两边同取 $\sup_n$ 后发现矛盾,因此有 $f_{n_0}(x) > t$, $x\in A_{n_0}$,也就有 $x\in \bigcup A_n\Rightarrow A\subseteq \bigcup A_n$
- $\forall x\in \bigcup A_n$,存在 $n_0$ 使得 $x\in A_{n_0}$,也就有 $\sup_n f_n(x) \geqslant f_{n_0}(x) > t$,因此 $x\in A\Rightarrow \bigcup A_n\subseteq A$

## Limits of Set Sequences

As with limits of numerical sequences, before defining the limit of a general sequence of sets, first consider the special case of monotone set sequences.

- Increasing: $A_k\subseteq A_{k+1}$, $k\in \mathbb{N}$; the sequence stays inside the universal set $\Omega$, and its limit always exists, namely $\bigcup_{k=1}^{\infty} A_k\subseteq \Omega$
- Decreasing: $A_k\supseteq A_{k+1}$, $k\in \mathbb{N}$; its limit always exists, namely $\bigcap_{k=1}^{\infty} A_k$
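
For a general (not necessarily monotone) set sequence, the monotone case suggests the standard definitions, sketched here as where this section is headed: $\bigcup_{k=n}^{\infty} A_k$ decreases in $n$ while $\bigcap_{k=n}^{\infty} A_k$ increases, so both of the following limits exist and serve as the upper and lower limits of $\{A_k\}$:

$$
\limsup_{k\to\infty} A_k = \bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} A_k, \qquad \liminf_{k\to\infty} A_k = \bigcup_{n=1}^{\infty}\bigcap_{k=n}^{\infty} A_k
$$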



!!! warning "本页面还在施工中"
7 changes: 7 additions & 0 deletions docs/math/analysis/index.md
@@ -0,0 +1,7 @@
# Analysis

- [Functions of a Real Variable](funcRvar/index.md)
- [Basics of Set Theory](funcRvar/sets.md)
- [Lebesgue Measure](funcRvar/LebesgueMeasure.md)
- ......
- [Continuities](continuities.md)
4 changes: 2 additions & 2 deletions docs/math/index.md
@@ -9,5 +9,5 @@
- [Partial Differential Equations](DEs/PDE/index.md)
- [Stochastic Differential Equations](DEs/Intro2SDE/index.md)
- [Probability & Statistics](probability/index.md)
- Misc
- [Continuities](misc/continuities.md)
- [Analysis](analysis/index.md)
- [Functions of a Real Variable](analysis/funcRvar/index.md)
4 changes: 2 additions & 2 deletions docs/readings/ICCV2023/DDPM_latent.md
@@ -83,7 +83,7 @@ $$
- PSNR: Quality
- SSIM: Similarity

Details in [Metrics](../metrics.md).
Details in [Metrics](../miscs/metrics.md).

### Zero-shot Image Editing

@@ -109,7 +109,7 @@ $$
- PSNR: Quality
- SSIM: Similarity between real image and generated image

Details in [Metrics](../metrics.md).
Details in [Metrics](../miscs/metrics.md).

<div style="text-align:center;">
<img src="../../imgs/ICCV2023/DDPM_latent_5.png" alt="DDPM_latent_5" style="zoom:80%;" />
2 changes: 2 additions & 0 deletions docs/readings/diffusion/SGM.md
@@ -1,3 +1,5 @@
<link rel="stylesheet" href="../../../css/counter.css" />

# Score-based Generative Models

!!! info "Reference"
Binary file added docs/readings/imgs/miscs/bn_vs_ln.png
Binary file added docs/readings/imgs/miscs/normalizations.png
3 changes: 1 addition & 2 deletions docs/readings/index.md
@@ -6,5 +6,4 @@

- [Diffusion Models](diffusion/index.md)
- [ICLR2024](ICLR2024/index.md)
- [ICCV2023](ICCV2023/index.md)
- [Metrics](metrics.md)
- [ICCV2023](ICCV2023/index.md)
File renamed without changes.
130 changes: 130 additions & 0 deletions docs/readings/miscs/einsum.md
@@ -0,0 +1,130 @@
<link rel="stylesheet" href="../../../css/counter.css" />

# Einsum

!!! info "Reference: [Einsum Is All You Need: NumPy, PyTorch and TensorFlow](https://www.youtube.com/watch?v=pkVwUVEHmfI)"

## Example Uses in PyTorch

```python hl_lines="4 10 14 18"
>>> x = torch.tensor([[1, 2, 3],
[4, 5, 6]])

# permutation
>>> torch.einsum("ij->ji", x)
tensor([[1, 4],
[2, 5],
[3, 6]])

# summation
>>> torch.einsum("ij->", x)
tensor(21)

# column sum
>>> torch.einsum("ij->j", x)
tensor([5, 7, 9])

# row sum
>>> torch.einsum("ij->i", x)
tensor([ 6, 15])
```

```python
>>> x = torch.tensor([[1, 2, 3],
[4, 5, 6]])
>>> v = torch.tensor([[1, 0, -1]])

# matrix-vector multiplication: xv^T
>>> torch.einsum("ij,kj->ik", x, v)
tensor([[-2],
[-2]])

# matrix-matrix multiplication: xx^T
>>> torch.einsum("ij,kj->ik", x, x) # 2*2: (2*3) @ (3*2)
tensor([[14, 32],
[32, 77]])

# dot product of the first row with itself
>>> torch.einsum("i,i->", x[0], x[0])
tensor(14)
```

```python
>>> x = torch.tensor([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])

# dot product of the matrix with itself (sum of elementwise products)
>>> torch.einsum("ij,ij->", x, x)
tensor(285)

# Hadamard product (element-wise multiplication)
>>> torch.einsum("ij,ij->ij", x, x)
tensor([[ 1, 4, 9],
[16, 25, 36],
[49, 64, 81]])
```

```python
# outer product
>>> a = torch.tensor([1, 0, -1])
>>> b = torch.tensor([1, 2, 3, 4, 5])
>>> torch.einsum("i,j->ij", a, b)
tensor([[ 1, 2, 3, 4, 5],
[ 0, 0, 0, 0, 0],
[-1, -2, -3, -4, -5]])

# batch matrix multiplication
>>> generator = torch.manual_seed(12)
>>> a = torch.rand((3, 2, 5), generator=generator)
>>> a
tensor([[[0.4657, 0.2328, 0.4527, 0.5871, 0.4086],
[0.1272, 0.6373, 0.2421, 0.7312, 0.7224]],

[[0.1992, 0.6948, 0.5830, 0.6318, 0.5559],
[0.1262, 0.9790, 0.8443, 0.1256, 0.4456]],

[[0.6601, 0.0554, 0.1573, 0.8137, 0.7216],
[0.2717, 0.3003, 0.6099, 0.5784, 0.6083]]])
>>> b = torch.rand((3, 5, 3), generator=generator)
>>> b
tensor([[[0.4339, 0.8813, 0.3216],
[0.2604, 0.2566, 0.1872],
[0.6423, 0.1786, 0.1435],
[0.7490, 0.7275, 0.1641],
[0.3273, 0.1239, 0.6138]],

[[0.4535, 0.7659, 0.1800],
[0.3338, 0.9526, 0.8919],
[0.9859, 0.6348, 0.8811],
[0.9391, 0.1173, 0.1342],
[0.9405, 0.6803, 0.5556]],

[[0.8713, 0.0782, 0.8578],
[0.7540, 0.6698, 0.5817],
[0.3829, 0.7163, 0.8930],
[0.5597, 0.2803, 0.2476],
[0.4738, 0.1306, 0.2024]]])
>>> torch.einsum("ijk,ikl->ijl", a, b)
tensor([[[1.1270, 1.0287, 0.6055],
[1.1608, 0.9403, 0.7584]],

[[2.0132, 1.6369, 1.5629],
[1.7535, 1.8831, 1.9043]],

[[1.4744, 0.5236, 1.0864],
[1.3086, 0.9008, 1.2188]]])
```


```python
>>> x = torch.tensor([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])

# matrix diagonal
>>> torch.einsum("ii->i", x)
tensor([1, 5, 9])

# matrix trace
>>> torch.einsum("ii->", x)
tensor(15)
```
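
As a mental model (a minimal sketch, not part of the referenced video): indices shared between the inputs but absent from the output are summed over, while the remaining indices enumerate the output. The helper below, whose name is ours purely for illustration, spells out `"ij,kj->ik"` as explicit loops:

```python
import torch

def einsum_ij_kj_ik(x, y):
    # Explicit-loop version of torch.einsum("ij,kj->ik", x, y):
    # j appears in both inputs but not in the output, so it is summed over;
    # i and k survive and index the output.
    I, J = x.shape
    K, _ = y.shape
    out = torch.zeros(I, K, dtype=x.dtype)
    for i in range(I):
        for k in range(K):
            for j in range(J):
                out[i, k] += x[i, j] * y[k, j]
    return out

x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
assert torch.equal(einsum_ij_kj_ik(x, x), torch.einsum("ij,kj->ik", x, x))
```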
File renamed without changes.
64 changes: 64 additions & 0 deletions docs/readings/miscs/normalization.md
@@ -0,0 +1,64 @@
<link rel="stylesheet" href="../../../css/counter.css" />

# Normalization

<div style="text-align:center;">
<img src="../../imgs/miscs/normalizations.png" alt="normalizations" style="50%;" />
</div>

## Batch Normalization

For each mini-batch $(z^{(1)}, \cdots, z^{(b)})$, where $b$ is the batch size, compute the mean $\mu$ and variance $\sigma^2$ of the samples' feature vectors:

$$
\mu = \frac{1}{b}\sum_{i=1}^{b}z^{(i)}, \quad \sigma^2 = \frac{1}{b}\sum_{i=1}^{b}(z^{(i)}-\mu)^2
$$

Then (with a small $\varepsilon>0$ introduced for numerical stability):

$$
z^{(i)}_{\text{norm}}=\frac{z^{(i)}-\mu}{\sqrt{\sigma^2+\varepsilon}}
$$

To obtain the $\tilde{z}^{(i)}$ that replaces $z^{(i)}$, the standardized $z^{(i)}_{\text{norm}}$ is further passed through a linear (affine) transformation:

$$
\tilde{z}^{(i)} = \gamma z^{(i)}_{\text{norm}} + \beta
$$

Here the affine parameters $\gamma$ and $\beta$ act like the network weights $w$: they take part in forward propagation and in the parameter updates of backpropagation. The point of the affine transformation is to let each layer's raw output follow a richer distribution rather than always being constrained by standardization.

> Interestingly, each layer's bias term is redundant with $\beta$, so the bias can simply be dropped.

Thus, with Batch Normalization in place, a layer's raw output $z^{(i)}$ is simply replaced by $\tilde{z}^{(i)}$; the activation function is then applied to obtain $a^{(i)}$, which is passed to the next layer.

> Whether to apply BN before or after the activation function is a real question; Andrew Ng suggests that BN is usually applied before the activation.

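A minimal sketch of the forward pass described above (training-mode statistics only; running averages for inference are omitted, and the function name is ours for illustration):

```python
import torch

def batch_norm_forward(z, gamma, beta, eps=1e-5):
    # z: (b, d) mini-batch of layer outputs; gamma, beta: (d,) learnable parameters
    mu = z.mean(dim=0)                      # per-feature mean over the batch
    var = z.var(dim=0, unbiased=False)      # per-feature (biased) variance
    z_norm = (z - mu) / torch.sqrt(var + eps)
    return gamma * z_norm + beta            # affine transform: z_tilde

z = torch.randn(32, 16)                     # batch of 32 samples, 16 features
z_tilde = batch_norm_forward(z, torch.ones(16), torch.zeros(16))
```
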
## Layer Normalization

Layer Normalization differs from Batch Normalization only in how $\mu$ and $\sigma^2$ are computed: Batch Normalization computes the mean $\mu$ and variance $\sigma^2$ along the mini-batch dimension, whereas Layer Normalization computes $\mu$ and $\sigma^2$ within each individual sample.

!!! warning "该页面还在建设中"

For a single sample $z^{(i)}\in\mathbb{R}^d$ (writing $z^{(i)}_j$ for its $j$-th feature):

$$
\mu^{(i)} = \frac{1}{d}\sum_{j=1}^{d}z^{(i)}_j, \quad (\sigma^{(i)})^2 = \frac{1}{d}\sum_{j=1}^{d}(z^{(i)}_j-\mu^{(i)})^2
$$

The difference in the dimensions along which the statistics are computed can be seen in the figure below:

<div style="text-align:center;">
<img src="../../imgs/miscs/bn_vs_ln.png" alt="bn_vs_ln" style="50%;" />
</div>
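
A minimal sketch of the difference in computation axes (assuming inputs of shape `(b, d)`; these few lines are illustrative, not a full LayerNorm with its affine parameters):

```python
import torch

z = torch.randn(4, 8)                       # (batch size b, feature dim d)

# Batch Normalization: statistics along the batch axis, one (mu, var) per feature
mu_bn  = z.mean(dim=0)                      # shape (d,)
var_bn = z.var(dim=0, unbiased=False)

# Layer Normalization: statistics within each sample, one (mu, var) per sample
mu_ln  = z.mean(dim=1, keepdim=True)        # shape (b, 1)
var_ln = z.var(dim=1, keepdim=True, unbiased=False)
```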

Examples of Batch Normalization and Layer Normalization:

*(figure: examples of Batch Normalization (left) and Layer Normalization (right))*

## Instance Normalization

## Group Normalization