Merge pull request RemyDegenne#173 from RemyDegenne/blueprint

Blueprint: add results about absolute continuity of Measure.compProd
LorenzoLuccioli · Oct 21, 2024 · 0f09ff1 · 0f09ff1
2 parents 006b68f + 22c2efa
commit 0f09ff1
Show file tree

Hide file tree

Showing 7 changed files with 151 additions and 47 deletions.
diff --git a/blueprint/src/sections/dpi_div.tex b/blueprint/src/sections/dpi_div.tex
@@ -106,7 +106,7 @@ \section{Generic divergences}
   \label{lem:div_comp_le_div_compProd_right}
   %\lean{}
   %\leanok
-  \uses{def:div, def:dpi}
+  \uses{def:div, def:condDiv, def:dpi}
   Let $D$ be a divergence that satisfies the DPI and for which $D(\kappa, \eta \mid \mu) = D(\mu \otimes \kappa, \mu \otimes \eta)$.
   Let $\mathcal \mu \in \mathcal M(\mathcal X)$ and let $\kappa, \eta : \mathcal X \rightsquigarrow \mathcal Y$ be Markov kernels. Then
   \begin{align*}

diff --git a/blueprint/src/sections/f_divergence.tex b/blueprint/src/sections/f_divergence.tex
@@ -1387,8 +1387,8 @@ \subsubsection{Data-processing and mean values}
 \end{lemma}
 
 \begin{proof}%\leanok
-\uses{thm:fDiv_data_proc_2}
-Let $u$ be the uniform distribution on $[0,1]$. Then $D_f(\mu, \nu) = D_f(\mu \times u, \nu \times u)$ (by lemma TODO about kernel with identity as first coordinate).
+\uses{thm:fDiv_data_proc_2,lem:fDiv_compProd_prod_eq}
+Let $u$ be the uniform distribution on $[0,1]$. Then $D_f(\mu, \nu) = D_f(\mu \times u, \nu \times u)$ (by Lemma~\ref{lem:fDiv_compProd_prod_eq}).
 Let $g : [0,1] \times [0,1] \to \{0,1\}$ be the measurable function with $g(x, y) = \mathbb{I}\{y \le x\}$. We also write $g$ for the corresponding deterministic kernel. By the data-processing inequality (Theorem~\ref{thm:fDiv_data_proc_2}),
 \begin{align*}
 D_f(\mu \times u, \nu \times u)

diff --git a/blueprint/src/sections/introduction.tex b/blueprint/src/sections/introduction.tex
@@ -6,20 +6,15 @@ \chapter*{Introduction}
 Our goal is to compile properties of information divergences and their uses in hypothesis testing. Most software libraries have tutorials, to learn how to use them, as well as a documentation listing all the available objects and methods of the library: in this project, we write the ``documentation'' of a part of the ``information theory library''.
 
 
-\section*{Source material}
-
-A very useful book (draft) about information theory and hypothesis testing: \cite{polyanskiy2024information} 
-
-Main reference for the properties of the Rényi divergence: \cite{van2014renyi}
-
-%As a reference, we want all notions needed in the recent paper \href{https://arxiv.org/abs/2403.01892}{[Degenne and Mathieu, Information Lower Bounds for Robust Mean Estimation, 2024]}. % TODO: why is this not a cite command?
 
 \section*{This document and the Lean implementation}
 
 This document is generated from a latex file written by the authors, thanks to the \texttt{leanblueprint} library developed by Patrick Massot (\url{https://github.com/PatrickMassot/leanblueprint}). It contains links to a Lean implementation of the results, but there is no technical guarantee that the results written here and the ones in the code match in any way.
 In particular, nothing in this document is automatically generated from the code.
 In summary, although the authors strive to make this document as close to the Lean implementation as possible, the only formally verified statements are those written in Lean.
 
+
+
 \section*{Assumptions}
 
 Information divergences are usually considered for probability measures, but we will be more granular about the assumptions we impose on the measures.
@@ -31,18 +26,27 @@ \section*{Assumptions}
 The countably generated assumption means that there is a countable set of sets that generates the sigma-algebra.
 This is the case for example in any standard Borel space, which covers the vast majority of applications.
 
+
+
 \section*{Notation}
 
 $\mathcal X$, $\mathcal Y$ will denote measurable spaces. Additional assumptions such as $\mathcal X$ being standard Borel will be specified where necessary.
 
 $\mu$, $\nu$ denote measures on a space $\mathcal X$. A measure $\mu$ is said to be finite if $\mu(\mathcal X) < \infty$, to be a probability measure if $\mu(\mathcal X) = 1$. A measure is s-finite if it can be written as a countable sum of finite measures, and is $\sigma$-finite if it can be written as a countable sum of finite measures which are pairwise mutually singular.
 The set of measures on $\mathcal X$ is $\mathcal M(\mathcal X)$. The set of probability measures is denoted by $\mathcal P(\mathcal X)$.
 
+For a function $g : \mathcal X \to \mathcal Y$ which is $a.e.$-measurable with respect to $\mu$, $g_* \mu$ denotes the pushforward or map of $\mu$ by $g$. This is a measure on $\mathcal Y$ such that $g_* \mu (A) = \mu(g^{-1} A)$.
+For $g : \mathcal X \to \mathbb{R}_{+,\infty}$, $g \cdot \mu$ denotes the measure on $\mathcal X$ with density $g$ with respect to $\mu$. That is, $g \cdot \mu (A) = \int_{x \in A} g(x) \partial\mu$.
+The Radon-Nikodym of a measure $\mu$ with respect to another measure $\mu$ is denoted by $\frac{d\mu}{d\nu}$. The singular part of $\mu$ with respect to $\mu$ is $\mu_{\perp \nu}$.
+
 $\kappa, \eta$ are used to refer to transition kernels, and we write $\kappa : \mathcal X \rightsquigarrow \mathcal Y$ to indicate that $\kappa$ is a transition kernel from $\mathcal X$ to $\mathcal Y$. That is, $\kappa$ is a measurable map from $\mathcal X$ to the measures on $\mathcal Y$.
 A kernel is finite if there exists $C < + \infty$ such that $\kappa(x, \mathcal Y) \le C$ for all $x \in \mathcal X$ (note that this is stronger than the condition that all $\kappa(x)$ are finite since it requires a uniform bound). It is s-finite if it can be written as a countable sum of finite kernels, and is called a Markov kernel if $\kappa(x)$ is a probability measure for all $x \in \mathcal X$.
+See Appendix~\ref{app:kernels} for more details about kernels and Appendix~\ref{app:rnDeriv} for results about Radon-Nikodym derivatives of kernels.
+
 
-TODO: composition/product of measures and kernels. $\mu \otimes \kappa$, $\kappa \circ \mu$.
 
-For a function $g : \mathcal X \to \mathcal Y$ which is $a.e.$-measurable with respect to $\mu$, $g_* \mu$ denotes the pushforward of $\mu$ by $g$. This is a measure on $\mathcal Y$ such that $g_* \mu (A) = \mu(g^{-1} A)$.
+\section*{Some references}
 
-For a function $g : \mathcal X \to \mathbb{R}_{+,\infty}$, $g \cdot \mu$ denotes the measure on $\mathcal X$ with density $g$ with respect to $\mu$. That is, $g \cdot \mu (A) = \int_{x \in A} g(x) \partial\mu$.
+A very useful book (draft) about information theory and hypothesis testing: \cite{polyanskiy2024information} 
+
+Main reference for the properties of the Rényi divergence: \cite{van2014renyi}
diff --git a/blueprint/src/sections/kernels.tex b/blueprint/src/sections/kernels.tex
@@ -1,4 +1,5 @@
 \chapter{Probability transition kernels}
+\label{app:kernels}
 
 Consider, informally, a process where we first observe a random state (for example the result of a coin flip), and then, depending on the value of that first state, we produce another random value.
 For example, if the coin lands on heads, we roll two dice and observe the sum. If on the other hand the coin lands on tails, we roll only one die and observe the result.
@@ -258,13 +259,45 @@ \section{Disintegration}
 
 
 
+TODO: find a way to hide the following dummy lemma
+\begin{lemma}[Dummy lemma: kernel properties]
+  \label{lem:kernel_properties}
+  %\lean{}
+  \leanok
+  \uses{
+    def:measure_measurableSpace,
+    def:kernel,
+    def:deterministic_kernel,
+    def:finite_kernel,
+    def:markov_kernel,
+    def:sFinite_kernel,
+    def:kernel_compProd,
+    def:kernel_comp,
+    def:kernel_prod,
+    def:kernel_parallel_prod,
+    def:measure_compProd,
+  }
+  Dummy node to summarize kernel properties.
+\end{lemma}
+
+\begin{proof}\leanok
+\uses{
+  lem:measurable_measure_fun,
+  thm:kernel_ext,
+  thm:disintegration,
+}
+\end{proof}
+
+
+
+
 \section{Bayesian inverse of a kernel}
 
 \begin{definition}
   \label{def:bayesInv}
   \lean{ProbabilityTheory.bayesInv}
   \leanok
-  \uses{def:kernel_comp, def:measure_compProd}
+  \uses{lem:kernel_properties}
   For $\mu \in \mathcal M(\mathcal X)$ and $\kappa : \mathcal X \rightsquigarrow \mathcal Y$, a Bayesian inverse of $\kappa$ is a Markov kernel $\kappa_\mu^\dagger : \mathcal Y \rightsquigarrow \mathcal X$ such that $\mu \otimes \kappa = ((\kappa \circ \mu) \otimes \kappa_\mu^\dagger)_\leftrightarrow$ in which $(\cdot)_\leftrightarrow$ denotes swapping the two coordinates.
   If such an inverse exists it is unique up to a $(\kappa \circ \mu)$-null set, and we talk about \emph{the} Bayesian inverse of $\kappa$ with respect to $\mu$.
 \end{definition}
@@ -412,33 +445,20 @@ \section{Bayesian inverse of a kernel}
 \end{proof}
 
 
+
 TODO: find a way to hide the following dummy lemma
-\begin{lemma}[Dummy lemma: kernel properties]
-  \label{lem:kernel_properties}
+\begin{lemma}[Dummy lemma: bayesInv properties]
+  \label{lem:bayesInv_properties}
   %\lean{}
   \leanok
   \uses{
-    def:measure_measurableSpace,
-    def:kernel,
-    def:deterministic_kernel,
-    def:finite_kernel,
-    def:markov_kernel,
-    def:sFinite_kernel,
-    def:kernel_compProd,
-    def:kernel_comp,
-    def:kernel_prod,
-    def:kernel_parallel_prod,
-    def:measure_compProd,
     def:bayesInv
   }
-  Dummy node to summarize kernel properties.
+  Dummy node to summarize properties of the Bayesian inverse.
 \end{lemma}
 
 \begin{proof}\leanok
 \uses{
-  lem:measurable_measure_fun,
-  thm:kernel_ext,
-  thm:disintegration,
   lem:exists_bayesInv,
   lem:eq_bayesInv_of_compProd_eq,
   lem:bayesInv_comp_self,

diff --git a/blueprint/src/sections/risk.tex b/blueprint/src/sections/risk.tex
@@ -255,7 +255,7 @@ \subsection{Properties of the Bayes risk of a prior}
 \end{lemma}
 
 \begin{proof}\leanok
-\uses{}
+\uses{lem:bayesInv_properties}
 Use the main property of the Bayesian inverse.
 \begin{align*}
   (P_\pi^\dagger \times \hat{y}) \circ P \circ \pi
@@ -359,7 +359,7 @@ \subsection{Properties of the Bayes risk of a prior}
 \end{lemma}
 
 \begin{proof}%\leanok
-\uses{lem:rnDeriv_bayesInv}
+\uses{lem:bayesInv_properties}
 
 \end{proof}
 
@@ -377,7 +377,7 @@ \subsection{Properties of the Bayes risk of a prior}
 \end{lemma}
 
 \begin{proof}%\leanok
-\uses{thm:isBayesEstimator_genBayesEstimator}
+\uses{thm:isBayesEstimator_genBayesEstimator,lem:bayesInv_properties}
 
 \end{proof}
 
@@ -539,7 +539,7 @@ \section{Simple binary hypothesis testing}
 \end{lemma}
 
 \begin{proof}\leanok
-\uses{lem:kernel_properties}
+\uses{lem:bayesInv_properties}
 
 \end{proof}
 

diff --git a/blueprint/src/sections/rn_deriv.tex b/blueprint/src/sections/rn_deriv.tex
@@ -1,4 +1,5 @@
 \chapter{Radon-Nikodym derivatives and disintegration}
+\label{app:rnDeriv}
 
 
 \section{Kernel Radon-Nikodym derivative}
@@ -44,6 +45,8 @@ \section{Kernel Radon-Nikodym derivative}
 \section{Composition-product}
 
 
+\subsection{Absolute Continuity}
+
 \begin{lemma}
   \label{lem:ac_compProd_iff}
   \lean{MeasureTheory.Measure.absolutelyContinuous_compProd_of_compProd,MeasureTheory.Measure.absolutelyContinuous_compProd_of_compProd',MeasureTheory.Measure.absolutelyContinuous_of_compProd}
@@ -55,16 +58,92 @@ \section{Composition-product}
     \item[(1b)] if $\mu \otimes \kappa \ll \nu \otimes \eta$ and $\kappa(x) \ne 0$ for all $x$ then $\mu \ll \nu$,
     \item[(2)]  if $\mu \ll \nu$ and $\mu \otimes \kappa \ll \mu \otimes \eta$ then $\mu \otimes \kappa \ll \nu \otimes \eta$.
   \end{itemize}
-  In particular, if $\kappa(x) \ne 0$ for all $x$ then
+  In particular,
+  \begin{itemize}
+    \item if $\kappa(x) \ne 0$ for all $x$ then
+      $\mu \otimes \kappa \ll \nu \otimes \eta \iff \left( \mu \ll \nu \ \wedge \ \mu \otimes \kappa \ll \mu \otimes \eta \right)$~.
+    \item If $\mu \ll \nu$ then $\mu \otimes \kappa \ll \nu \otimes \eta \iff \mu \otimes \kappa \ll \mu \otimes \eta$~.
+  \end{itemize}
+
+\end{lemma}
+
+\begin{proof}\leanok
+\uses{lem:kernel_properties}
+
+\end{proof}
+
+
+\begin{lemma}
+  \label{lem:mutuallySingular_compProd}
+  %\lean{}
+  %\leanok
+  \uses{lem:kernel_properties}
+  Let $\mu, \nu$ be two $\sigma$-finite measures on $\mathcal X$ and let $\kappa, \eta : \mathcal X \rightsquigarrow \mathcal Y$ be two finite kernels.
+  Let $\mu \sqcap \nu$ denote the infimum of $\mu$ and $\nu$.
+  Then
+  \begin{enumerate}
+    \item if $\mu \perp \nu$ then $\mu \otimes \kappa \perp \nu \otimes \eta$,
+    \item $\mu \otimes \kappa \perp \nu \otimes \eta \iff (\mu \sqcap \nu) \otimes \kappa \perp (\mu \sqcap \nu) \otimes \eta$, and the same holds for any measure which is equivalent to $\mu \sqcap \nu$, like $\frac{d \mu}{d \nu} \cdot \nu$~,
+    \item if $\mu \otimes \kappa \perp \nu \otimes \eta$ then for $(\mu \sqcap \nu)$-almost every $x$, $\kappa(x) \perp \eta(x)$.
+  \end{enumerate}
+\end{lemma}
+
+\begin{proof}%\leanok
+\uses{}
+First, let's state two facts about mutually singular measures that we will use without proof:
+\begin{itemize}
+  \item $\mu \perp (\nu + \nu') \iff \mu \perp \nu \ \wedge \ \mu \perp \nu'$~,
+  \item if $\mu \ll \mu'$ and  $\mu' \ll \mu$ then $\mu \perp \nu \iff \mu' \perp \nu$~.
+\end{itemize}
+
+\emph{1.} Let $s$ be a measurable set of $\mathcal X$ such that $\mu(s) = 0$ and $\nu(s^c) = 0$. One can check that the sets $s \times \mathcal Y$ and $s^c \times \mathcal Y$ demonstrate $\mu \otimes \kappa \perp \nu \otimes \eta$.
+
+\emph{2.} Write $\mu = \frac{d \mu}{d \nu} \cdot \nu + \mu_{\perp \nu}$ and $\nu = \frac{d \nu}{d \mu} \cdot \mu + \nu_{\perp \mu}$.
+Then
+\begin{align*}
+\mu \otimes \kappa \perp \nu \otimes \eta
+&\iff \left(\frac{d \mu}{d \nu} \cdot \nu\right) \otimes \kappa \perp \left(\frac{d \nu}{d \mu} \cdot \mu\right) \otimes \eta
+\\&\qquad \wedge \ \left(\frac{d \mu}{d \nu} \cdot \nu\right) \otimes \kappa \perp \nu_{\perp \mu} \otimes \eta
+\\&\qquad \wedge \ \mu_{\perp \nu} \otimes \kappa \perp \left(\frac{d \nu}{d \mu} \cdot \mu\right) \otimes \eta
+\\&\qquad \wedge \ \mu_{\perp \nu} \otimes \kappa \perp \nu_{\perp \mu} \otimes \eta
+\end{align*}
+The three last lines are true since $\mu \perp \nu_{\perp \mu}$ and $\mu_{\perp \nu} \perp \nu$. Only the first line remains.
+It suffices then to prove that $\mu \sqcap \nu$, $\frac{d \mu}{d \nu} \cdot \nu$ and $\frac{d \nu}{d \mu} \cdot \mu$ are equivalent.
+
+TODO
+
+\emph{3.} By 2. it suffices to consider the case $\nu = \mu$ and show that for $\mu$-almost every $x$, $\kappa(x) \perp \eta(x)$.
+Let $s$ be a measurable set of $\mathcal X \times \mathcal Y$ such that $(\mu \otimes \kappa)(s) = 0$ and $(\mu \otimes \eta)(s^c) = 0$.
+Then
+\begin{align*}
+0 &= (\mu \otimes \kappa)(s)
+\\
+&= \int_{x} \kappa(x)(\{y \mid (x, y) \in s\}) \partial \mu
+\: .
+\end{align*}
+Hence for $\mu$-almost all $x$, $\kappa(x)(\{y \mid (x, y) \in s\}) = 0$. Similarly, for $\mu$-almost all $x$, $\eta(x)(\{y \mid (x, y) \in s^c\}) = 0$. Since $\{y \mid (x, y) \in s^c\} = \{y \mid (x, y) \in s\}^c$, we have a measurable set witnessing $\kappa(x) \perp \eta(x)$ for $\mu$-almost all $x$.
+\end{proof}
+
+
+
+\subsection{Radon-Nikodym derivative and singular part}
+
+\begin{lemma}
+  \label{lem:singularPart_compProd}
+  %\lean{}
+  %\leanok
+  \uses{lem:kernel_properties}
+  Let $\mu, \nu$ be two $\sigma$-finite measures on $\mathcal X$ and let $\kappa, \eta : \mathcal X \rightsquigarrow \mathcal Y$ be two s-finite kernels.
+  We denote $\frac{d\mu}{d\nu}\cdot \nu$ by $\mu_{\parallel \nu}$.
+  Then
   \begin{align*}
-  \mu \otimes \kappa \ll \nu \otimes \eta
-  \iff \left( \mu \ll \nu \ \wedge \ \mu \otimes \kappa \ll \mu \otimes \eta \right)
+  (\mu \otimes \kappa)_{\perp (\nu \otimes \eta)} = \mu_{\perp \nu} \otimes \kappa + (\mu_{\parallel \nu} \otimes \kappa)_{\perp (\mu_{\parallel \nu} \otimes \eta)}
   \: .
   \end{align*}
 \end{lemma}
 
-\begin{proof}\leanok
-\uses{lem:kernel_properties}
+\begin{proof}%\leanok
+\uses{}
 
 \end{proof}
 
@@ -75,8 +154,8 @@ \section{Composition-product}
   \leanok
   \uses{}
   Let $\mu, \nu$ be two finite measures on $\mathcal X$ and let $\kappa, \eta : \mathcal X \rightsquigarrow \mathcal Y$ be two finite kernels.
-  Let $\mu' = \left(\frac{\partial \mu}{\partial \nu}\right) \cdot \nu$.
-  Then for $(\nu \otimes \eta)$-almost all $z$, $\frac{d (\mu' \otimes \kappa)}{d (\nu \otimes \eta)}(z) = \frac{d (\mu \otimes \kappa)}{d (\nu \otimes \eta)}(z)$.
+  Let $\mu_{\parallel \nu} = \left(\frac{\partial \mu}{\partial \nu}\right) \cdot \nu$.
+  Then for $(\nu \otimes \eta)$-almost all $z$, $\frac{d (\mu_{\parallel \nu} \otimes \kappa)}{d (\nu \otimes \eta)}(z) = \frac{d (\mu \otimes \kappa)}{d (\nu \otimes \eta)}(z)$.
 \end{lemma}
 
 \begin{proof} \leanok
@@ -93,8 +172,9 @@ \section{Composition-product}
 \end{lemma}
 
 \begin{proof} \leanok
-\uses{}
+\uses{lem:rnDeriv_eq_ac_left}
 We can suppose $\mu \ll \nu$ without loss of generality (by Lemma~\ref{lem:rnDeriv_eq_ac_left}).
+That implies $\mu \otimes \kappa \ll \nu \otimes \kappa$.
 It suffices to show that the integrals of the two functions agree on all sets in the $\pi$-system of products of measurable sets. Let $s, t$ be two measurable sets of $\mathcal X$ and $\mathcal Y$ respectively.
 \begin{align*}
 \int_{p \in s \times t} \frac{d (\mu \otimes \kappa)}{d (\nu \otimes \kappa)}(p) \partial(\nu \otimes \kappa)
@@ -119,7 +199,7 @@ \section{Composition-product}
 \begin{lemma}
   \label{lem:rnDeriv_chain}
   \lean{MeasureTheory.Measure.rnDeriv_mul_rnDeriv,MeasureTheory.Measure.rnDeriv_mul_rnDeriv'}
-  \leanok
+  \mathlibok
   \uses{}
   Let $\mu, \nu, \xi$ be $\sigma$-finite measures on $\mathcal X$.
   \begin{enumerate}
@@ -128,7 +208,7 @@ \section{Composition-product}
   \end{enumerate}
 \end{lemma}
 
-\begin{proof}\leanok
+\begin{proof}\mathlibok
 \uses{}
 \end{proof}
 
@@ -360,8 +440,6 @@ \section{Sigma-algebras}
 
 \begin{proof}\leanok
 \uses{lem:rnDeriv_map_eq_condexp}
-TODO: the current Lean proof is not the proof presented here (but a direct computation).
-
 The restriction $\mu_{| \mathcal A}$ is the map of $\mu$ by the identity, seen as a function from $\mathcal X$ with its $\sigma$-algebra to $\mathcal X$ with $\mathcal{A}$.
 We can thus apply Lemma~\ref{lem:rnDeriv_map_eq_condexp}.
 \end{proof}

diff --git a/blueprint/src/web.tex b/blueprint/src/web.tex
@@ -16,6 +16,8 @@
 \input{macros/common}
 \input{macros/web}
 
+\graphcolor{mathlib}{purple}{The statement of this result is in Mathlib}
+
 \home{https://RemyDegenne.github.io/testing-lower-bounds}
 \github{https://github.com/RemyDegenne/testing-lower-bounds}
 \dochome{https://RemyDegenne.github.io/testing-lower-bounds/docs}