Skip to content

Commit

Permalink
Merge pull request RemyDegenne#173 from RemyDegenne/blueprint
Browse files Browse the repository at this point in the history
Blueprint: add results about absolute continuity of Measure.compProd
  • Loading branch information
RemyDegenne authored Oct 21, 2024
2 parents 006b68f + 22c2efa commit 0f09ff1
Show file tree
Hide file tree
Showing 7 changed files with 151 additions and 47 deletions.
2 changes: 1 addition & 1 deletion blueprint/src/sections/dpi_div.tex
Original file line number Diff line number Diff line change
Expand Up @@ -106,7 +106,7 @@ \section{Generic divergences}
\label{lem:div_comp_le_div_compProd_right}
%\lean{}
%\leanok
\uses{def:div, def:dpi}
\uses{def:div, def:condDiv, def:dpi}
Let $D$ be a divergence that satisfies the DPI and for which $D(\kappa, \eta \mid \mu) = D(\mu \otimes \kappa, \mu \otimes \eta)$.
Let $\mathcal \mu \in \mathcal M(\mathcal X)$ and let $\kappa, \eta : \mathcal X \rightsquigarrow \mathcal Y$ be Markov kernels. Then
\begin{align*}
Expand Down
4 changes: 2 additions & 2 deletions blueprint/src/sections/f_divergence.tex
Original file line number Diff line number Diff line change
Expand Up @@ -1387,8 +1387,8 @@ \subsubsection{Data-processing and mean values}
\end{lemma}

\begin{proof}%\leanok
\uses{thm:fDiv_data_proc_2}
Let $u$ be the uniform distribution on $[0,1]$. Then $D_f(\mu, \nu) = D_f(\mu \times u, \nu \times u)$ (by lemma TODO about kernel with identity as first coordinate).
\uses{thm:fDiv_data_proc_2,lem:fDiv_compProd_prod_eq}
Let $u$ be the uniform distribution on $[0,1]$. Then $D_f(\mu, \nu) = D_f(\mu \times u, \nu \times u)$ (by Lemma~\ref{lem:fDiv_compProd_prod_eq}).
Let $g : [0,1] \times [0,1] \to \{0,1\}$ be the measurable function with $g(x, y) = \mathbb{I}\{y \le x\}$. We also write $g$ for the corresponding deterministic kernel. By the data-processing inequality (Theorem~\ref{thm:fDiv_data_proc_2}),
\begin{align*}
D_f(\mu \times u, \nu \times u)
Expand Down
24 changes: 14 additions & 10 deletions blueprint/src/sections/introduction.tex
Original file line number Diff line number Diff line change
Expand Up @@ -6,20 +6,15 @@ \chapter*{Introduction}
Our goal is to compile properties of information divergences and their uses in hypothesis testing. Most software libraries have tutorials, to learn how to use them, as well as a documentation listing all the available objects and methods of the library: in this project, we write the ``documentation'' of a part of the ``information theory library''.


\section*{Source material}

A very useful book (draft) about information theory and hypothesis testing: \cite{polyanskiy2024information}

Main reference for the properties of the Rényi divergence: \cite{van2014renyi}

%As a reference, we want all notions needed in the recent paper \href{https://arxiv.org/abs/2403.01892}{[Degenne and Mathieu, Information Lower Bounds for Robust Mean Estimation, 2024]}. % TODO: why is this not a cite command?

\section*{This document and the Lean implementation}

This document is generated from a latex file written by the authors, thanks to the \texttt{leanblueprint} library developed by Patrick Massot (\url{https://github.com/PatrickMassot/leanblueprint}). It contains links to a Lean implementation of the results, but there is no technical guarantee that the results written here and the ones in the code match in any way.
In particular, nothing in this document is automatically generated from the code.
In summary, although the authors strive to make this document as close to the Lean implementation as possible, the only formally verified statements are those written in Lean.



\section*{Assumptions}

Information divergences are usually considered for probability measures, but we will be more granular about the assumptions we impose on the measures.
Expand All @@ -31,18 +26,27 @@ \section*{Assumptions}
The countably generated assumption means that there is a countable set of sets that generates the sigma-algebra.
This is the case for example in any standard Borel space, which covers the vast majority of applications.



\section*{Notation}

$\mathcal X$, $\mathcal Y$ will denote measurable spaces. Additional assumptions such as $\mathcal X$ being standard Borel will be specified where necessary.

$\mu$, $\nu$ denote measures on a space $\mathcal X$. A measure $\mu$ is said to be finite if $\mu(\mathcal X) < \infty$, to be a probability measure if $\mu(\mathcal X) = 1$. A measure is s-finite if it can be written as a countable sum of finite measures, and is $\sigma$-finite if it can be written as a countable sum of finite measures which are pairwise mutually singular.
The set of measures on $\mathcal X$ is $\mathcal M(\mathcal X)$. The set of probability measures is denoted by $\mathcal P(\mathcal X)$.

For a function $g : \mathcal X \to \mathcal Y$ which is $a.e.$-measurable with respect to $\mu$, $g_* \mu$ denotes the pushforward or map of $\mu$ by $g$. This is a measure on $\mathcal Y$ such that $g_* \mu (A) = \mu(g^{-1} A)$.
For $g : \mathcal X \to \mathbb{R}_{+,\infty}$, $g \cdot \mu$ denotes the measure on $\mathcal X$ with density $g$ with respect to $\mu$. That is, $g \cdot \mu (A) = \int_{x \in A} g(x) \partial\mu$.
The Radon-Nikodym of a measure $\mu$ with respect to another measure $\mu$ is denoted by $\frac{d\mu}{d\nu}$. The singular part of $\mu$ with respect to $\mu$ is $\mu_{\perp \nu}$.

$\kappa, \eta$ are used to refer to transition kernels, and we write $\kappa : \mathcal X \rightsquigarrow \mathcal Y$ to indicate that $\kappa$ is a transition kernel from $\mathcal X$ to $\mathcal Y$. That is, $\kappa$ is a measurable map from $\mathcal X$ to the measures on $\mathcal Y$.
A kernel is finite if there exists $C < + \infty$ such that $\kappa(x, \mathcal Y) \le C$ for all $x \in \mathcal X$ (note that this is stronger than the condition that all $\kappa(x)$ are finite since it requires a uniform bound). It is s-finite if it can be written as a countable sum of finite kernels, and is called a Markov kernel if $\kappa(x)$ is a probability measure for all $x \in \mathcal X$.
See Appendix~\ref{app:kernels} for more details about kernels and Appendix~\ref{app:rnDeriv} for results about Radon-Nikodym derivatives of kernels.


TODO: composition/product of measures and kernels. $\mu \otimes \kappa$, $\kappa \circ \mu$.

For a function $g : \mathcal X \to \mathcal Y$ which is $a.e.$-measurable with respect to $\mu$, $g_* \mu$ denotes the pushforward of $\mu$ by $g$. This is a measure on $\mathcal Y$ such that $g_* \mu (A) = \mu(g^{-1} A)$.
\section*{Some references}

For a function $g : \mathcal X \to \mathbb{R}_{+,\infty}$, $g \cdot \mu$ denotes the measure on $\mathcal X$ with density $g$ with respect to $\mu$. That is, $g \cdot \mu (A) = \int_{x \in A} g(x) \partial\mu$.
A very useful book (draft) about information theory and hypothesis testing: \cite{polyanskiy2024information}

Main reference for the properties of the Rényi divergence: \cite{van2014renyi}
56 changes: 38 additions & 18 deletions blueprint/src/sections/kernels.tex
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
\chapter{Probability transition kernels}
\label{app:kernels}

Consider, informally, a process where we first observe a random state (for example the result of a coin flip), and then, depending on the value of that first state, we produce another random value.
For example, if the coin lands on heads, we roll two dice and observe the sum. If on the other hand the coin lands on tails, we roll only one die and observe the result.
Expand Down Expand Up @@ -258,13 +259,45 @@ \section{Disintegration}



TODO: find a way to hide the following dummy lemma
\begin{lemma}[Dummy lemma: kernel properties]
\label{lem:kernel_properties}
%\lean{}
\leanok
\uses{
def:measure_measurableSpace,
def:kernel,
def:deterministic_kernel,
def:finite_kernel,
def:markov_kernel,
def:sFinite_kernel,
def:kernel_compProd,
def:kernel_comp,
def:kernel_prod,
def:kernel_parallel_prod,
def:measure_compProd,
}
Dummy node to summarize kernel properties.
\end{lemma}

\begin{proof}\leanok
\uses{
lem:measurable_measure_fun,
thm:kernel_ext,
thm:disintegration,
}
\end{proof}




\section{Bayesian inverse of a kernel}

\begin{definition}
\label{def:bayesInv}
\lean{ProbabilityTheory.bayesInv}
\leanok
\uses{def:kernel_comp, def:measure_compProd}
\uses{lem:kernel_properties}
For $\mu \in \mathcal M(\mathcal X)$ and $\kappa : \mathcal X \rightsquigarrow \mathcal Y$, a Bayesian inverse of $\kappa$ is a Markov kernel $\kappa_\mu^\dagger : \mathcal Y \rightsquigarrow \mathcal X$ such that $\mu \otimes \kappa = ((\kappa \circ \mu) \otimes \kappa_\mu^\dagger)_\leftrightarrow$ in which $(\cdot)_\leftrightarrow$ denotes swapping the two coordinates.
If such an inverse exists it is unique up to a $(\kappa \circ \mu)$-null set, and we talk about \emph{the} Bayesian inverse of $\kappa$ with respect to $\mu$.
\end{definition}
Expand Down Expand Up @@ -412,33 +445,20 @@ \section{Bayesian inverse of a kernel}
\end{proof}



TODO: find a way to hide the following dummy lemma
\begin{lemma}[Dummy lemma: kernel properties]
\label{lem:kernel_properties}
\begin{lemma}[Dummy lemma: bayesInv properties]
\label{lem:bayesInv_properties}
%\lean{}
\leanok
\uses{
def:measure_measurableSpace,
def:kernel,
def:deterministic_kernel,
def:finite_kernel,
def:markov_kernel,
def:sFinite_kernel,
def:kernel_compProd,
def:kernel_comp,
def:kernel_prod,
def:kernel_parallel_prod,
def:measure_compProd,
def:bayesInv
}
Dummy node to summarize kernel properties.
Dummy node to summarize properties of the Bayesian inverse.
\end{lemma}

\begin{proof}\leanok
\uses{
lem:measurable_measure_fun,
thm:kernel_ext,
thm:disintegration,
lem:exists_bayesInv,
lem:eq_bayesInv_of_compProd_eq,
lem:bayesInv_comp_self,
Expand Down
8 changes: 4 additions & 4 deletions blueprint/src/sections/risk.tex
Original file line number Diff line number Diff line change
Expand Up @@ -255,7 +255,7 @@ \subsection{Properties of the Bayes risk of a prior}
\end{lemma}

\begin{proof}\leanok
\uses{}
\uses{lem:bayesInv_properties}
Use the main property of the Bayesian inverse.
\begin{align*}
(P_\pi^\dagger \times \hat{y}) \circ P \circ \pi
Expand Down Expand Up @@ -359,7 +359,7 @@ \subsection{Properties of the Bayes risk of a prior}
\end{lemma}

\begin{proof}%\leanok
\uses{lem:rnDeriv_bayesInv}
\uses{lem:bayesInv_properties}

\end{proof}

Expand All @@ -377,7 +377,7 @@ \subsection{Properties of the Bayes risk of a prior}
\end{lemma}

\begin{proof}%\leanok
\uses{thm:isBayesEstimator_genBayesEstimator}
\uses{thm:isBayesEstimator_genBayesEstimator,lem:bayesInv_properties}

\end{proof}

Expand Down Expand Up @@ -539,7 +539,7 @@ \section{Simple binary hypothesis testing}
\end{lemma}

\begin{proof}\leanok
\uses{lem:kernel_properties}
\uses{lem:bayesInv_properties}

\end{proof}

Expand Down
102 changes: 90 additions & 12 deletions blueprint/src/sections/rn_deriv.tex
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
\chapter{Radon-Nikodym derivatives and disintegration}
\label{app:rnDeriv}


\section{Kernel Radon-Nikodym derivative}
Expand Down Expand Up @@ -44,6 +45,8 @@ \section{Kernel Radon-Nikodym derivative}
\section{Composition-product}


\subsection{Absolute Continuity}

\begin{lemma}
\label{lem:ac_compProd_iff}
\lean{MeasureTheory.Measure.absolutelyContinuous_compProd_of_compProd,MeasureTheory.Measure.absolutelyContinuous_compProd_of_compProd',MeasureTheory.Measure.absolutelyContinuous_of_compProd}
Expand All @@ -55,16 +58,92 @@ \section{Composition-product}
\item[(1b)] if $\mu \otimes \kappa \ll \nu \otimes \eta$ and $\kappa(x) \ne 0$ for all $x$ then $\mu \ll \nu$,
\item[(2)] if $\mu \ll \nu$ and $\mu \otimes \kappa \ll \mu \otimes \eta$ then $\mu \otimes \kappa \ll \nu \otimes \eta$.
\end{itemize}
In particular, if $\kappa(x) \ne 0$ for all $x$ then
In particular,
\begin{itemize}
\item if $\kappa(x) \ne 0$ for all $x$ then
$\mu \otimes \kappa \ll \nu \otimes \eta \iff \left( \mu \ll \nu \ \wedge \ \mu \otimes \kappa \ll \mu \otimes \eta \right)$~.
\item If $\mu \ll \nu$ then $\mu \otimes \kappa \ll \nu \otimes \eta \iff \mu \otimes \kappa \ll \mu \otimes \eta$~.
\end{itemize}

\end{lemma}

\begin{proof}\leanok
\uses{lem:kernel_properties}

\end{proof}


\begin{lemma}
\label{lem:mutuallySingular_compProd}
%\lean{}
%\leanok
\uses{lem:kernel_properties}
Let $\mu, \nu$ be two $\sigma$-finite measures on $\mathcal X$ and let $\kappa, \eta : \mathcal X \rightsquigarrow \mathcal Y$ be two finite kernels.
Let $\mu \sqcap \nu$ denote the infimum of $\mu$ and $\nu$.
Then
\begin{enumerate}
\item if $\mu \perp \nu$ then $\mu \otimes \kappa \perp \nu \otimes \eta$,
\item $\mu \otimes \kappa \perp \nu \otimes \eta \iff (\mu \sqcap \nu) \otimes \kappa \perp (\mu \sqcap \nu) \otimes \eta$, and the same holds for any measure which is equivalent to $\mu \sqcap \nu$, like $\frac{d \mu}{d \nu} \cdot \nu$~,
\item if $\mu \otimes \kappa \perp \nu \otimes \eta$ then for $(\mu \sqcap \nu)$-almost every $x$, $\kappa(x) \perp \eta(x)$.
\end{enumerate}
\end{lemma}

\begin{proof}%\leanok
\uses{}
First, let's state two facts about mutually singular measures that we will use without proof:
\begin{itemize}
\item $\mu \perp (\nu + \nu') \iff \mu \perp \nu \ \wedge \ \mu \perp \nu'$~,
\item if $\mu \ll \mu'$ and $\mu' \ll \mu$ then $\mu \perp \nu \iff \mu' \perp \nu$~.
\end{itemize}

\emph{1.} Let $s$ be a measurable set of $\mathcal X$ such that $\mu(s) = 0$ and $\nu(s^c) = 0$. One can check that the sets $s \times \mathcal Y$ and $s^c \times \mathcal Y$ demonstrate $\mu \otimes \kappa \perp \nu \otimes \eta$.

\emph{2.} Write $\mu = \frac{d \mu}{d \nu} \cdot \nu + \mu_{\perp \nu}$ and $\nu = \frac{d \nu}{d \mu} \cdot \mu + \nu_{\perp \mu}$.
Then
\begin{align*}
\mu \otimes \kappa \perp \nu \otimes \eta
&\iff \left(\frac{d \mu}{d \nu} \cdot \nu\right) \otimes \kappa \perp \left(\frac{d \nu}{d \mu} \cdot \mu\right) \otimes \eta
\\&\qquad \wedge \ \left(\frac{d \mu}{d \nu} \cdot \nu\right) \otimes \kappa \perp \nu_{\perp \mu} \otimes \eta
\\&\qquad \wedge \ \mu_{\perp \nu} \otimes \kappa \perp \left(\frac{d \nu}{d \mu} \cdot \mu\right) \otimes \eta
\\&\qquad \wedge \ \mu_{\perp \nu} \otimes \kappa \perp \nu_{\perp \mu} \otimes \eta
\end{align*}
The three last lines are true since $\mu \perp \nu_{\perp \mu}$ and $\mu_{\perp \nu} \perp \nu$. Only the first line remains.
It suffices then to prove that $\mu \sqcap \nu$, $\frac{d \mu}{d \nu} \cdot \nu$ and $\frac{d \nu}{d \mu} \cdot \mu$ are equivalent.

TODO

\emph{3.} By 2. it suffices to consider the case $\nu = \mu$ and show that for $\mu$-almost every $x$, $\kappa(x) \perp \eta(x)$.
Let $s$ be a measurable set of $\mathcal X \times \mathcal Y$ such that $(\mu \otimes \kappa)(s) = 0$ and $(\mu \otimes \eta)(s^c) = 0$.
Then
\begin{align*}
0 &= (\mu \otimes \kappa)(s)
\\
&= \int_{x} \kappa(x)(\{y \mid (x, y) \in s\}) \partial \mu
\: .
\end{align*}
Hence for $\mu$-almost all $x$, $\kappa(x)(\{y \mid (x, y) \in s\}) = 0$. Similarly, for $\mu$-almost all $x$, $\eta(x)(\{y \mid (x, y) \in s^c\}) = 0$. Since $\{y \mid (x, y) \in s^c\} = \{y \mid (x, y) \in s\}^c$, we have a measurable set witnessing $\kappa(x) \perp \eta(x)$ for $\mu$-almost all $x$.
\end{proof}



\subsection{Radon-Nikodym derivative and singular part}

\begin{lemma}
\label{lem:singularPart_compProd}
%\lean{}
%\leanok
\uses{lem:kernel_properties}
Let $\mu, \nu$ be two $\sigma$-finite measures on $\mathcal X$ and let $\kappa, \eta : \mathcal X \rightsquigarrow \mathcal Y$ be two s-finite kernels.
We denote $\frac{d\mu}{d\nu}\cdot \nu$ by $\mu_{\parallel \nu}$.
Then
\begin{align*}
\mu \otimes \kappa \ll \nu \otimes \eta
\iff \left( \mu \ll \nu \ \wedge \ \mu \otimes \kappa \ll \mu \otimes \eta \right)
(\mu \otimes \kappa)_{\perp (\nu \otimes \eta)} = \mu_{\perp \nu} \otimes \kappa + (\mu_{\parallel \nu} \otimes \kappa)_{\perp (\mu_{\parallel \nu} \otimes \eta)}
\: .
\end{align*}
\end{lemma}

\begin{proof}\leanok
\uses{lem:kernel_properties}
\begin{proof}%\leanok
\uses{}

\end{proof}

Expand All @@ -75,8 +154,8 @@ \section{Composition-product}
\leanok
\uses{}
Let $\mu, \nu$ be two finite measures on $\mathcal X$ and let $\kappa, \eta : \mathcal X \rightsquigarrow \mathcal Y$ be two finite kernels.
Let $\mu' = \left(\frac{\partial \mu}{\partial \nu}\right) \cdot \nu$.
Then for $(\nu \otimes \eta)$-almost all $z$, $\frac{d (\mu' \otimes \kappa)}{d (\nu \otimes \eta)}(z) = \frac{d (\mu \otimes \kappa)}{d (\nu \otimes \eta)}(z)$.
Let $\mu_{\parallel \nu} = \left(\frac{\partial \mu}{\partial \nu}\right) \cdot \nu$.
Then for $(\nu \otimes \eta)$-almost all $z$, $\frac{d (\mu_{\parallel \nu} \otimes \kappa)}{d (\nu \otimes \eta)}(z) = \frac{d (\mu \otimes \kappa)}{d (\nu \otimes \eta)}(z)$.
\end{lemma}

\begin{proof} \leanok
Expand All @@ -93,8 +172,9 @@ \section{Composition-product}
\end{lemma}

\begin{proof} \leanok
\uses{}
\uses{lem:rnDeriv_eq_ac_left}
We can suppose $\mu \ll \nu$ without loss of generality (by Lemma~\ref{lem:rnDeriv_eq_ac_left}).
That implies $\mu \otimes \kappa \ll \nu \otimes \kappa$.
It suffices to show that the integrals of the two functions agree on all sets in the $\pi$-system of products of measurable sets. Let $s, t$ be two measurable sets of $\mathcal X$ and $\mathcal Y$ respectively.
\begin{align*}
\int_{p \in s \times t} \frac{d (\mu \otimes \kappa)}{d (\nu \otimes \kappa)}(p) \partial(\nu \otimes \kappa)
Expand All @@ -119,7 +199,7 @@ \section{Composition-product}
\begin{lemma}
\label{lem:rnDeriv_chain}
\lean{MeasureTheory.Measure.rnDeriv_mul_rnDeriv,MeasureTheory.Measure.rnDeriv_mul_rnDeriv'}
\leanok
\mathlibok
\uses{}
Let $\mu, \nu, \xi$ be $\sigma$-finite measures on $\mathcal X$.
\begin{enumerate}
Expand All @@ -128,7 +208,7 @@ \section{Composition-product}
\end{enumerate}
\end{lemma}

\begin{proof}\leanok
\begin{proof}\mathlibok
\uses{}
\end{proof}

Expand Down Expand Up @@ -360,8 +440,6 @@ \section{Sigma-algebras}

\begin{proof}\leanok
\uses{lem:rnDeriv_map_eq_condexp}
TODO: the current Lean proof is not the proof presented here (but a direct computation).

The restriction $\mu_{| \mathcal A}$ is the map of $\mu$ by the identity, seen as a function from $\mathcal X$ with its $\sigma$-algebra to $\mathcal X$ with $\mathcal{A}$.
We can thus apply Lemma~\ref{lem:rnDeriv_map_eq_condexp}.
\end{proof}
Expand Down
2 changes: 2 additions & 0 deletions blueprint/src/web.tex
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,8 @@
\input{macros/common}
\input{macros/web}

\graphcolor{mathlib}{purple}{The statement of this result is in Mathlib}

\home{https://RemyDegenne.github.io/testing-lower-bounds}
\github{https://github.com/RemyDegenne/testing-lower-bounds}
\dochome{https://RemyDegenne.github.io/testing-lower-bounds/docs}
Expand Down

0 comments on commit 0f09ff1

Please sign in to comment.