Suppose we have a biased die, but we don't know what the bias is. A good
estimate of the bias might come from rolling the die many times, calculating
the average of the outcomes, and comparing that with the expected value of a
fair die. Since each roll is random, the average fluctuates around the true
expectation, and we need enough rolls for those fluctuations to be small.
How large does the number of rolls $n$ need to be before the average is a
trustworthy estimate?
To put this into a formula: writing $\bar{X}_n$ for the average of the first
$n$ rolls and $\mu$ for the true expectation, we want $n$ large enough that
$$\mathbb{P}[|\bar{X}_n-\mu|\geq\epsilon]\leq\delta$$
for a tolerance $\epsilon>0$ and an error probability $\delta>0$ of our choosing.
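As a concrete illustration (a hypothetical simulation, with a bias of our own choosing rather than anything from the text), we can roll a die weighted toward 6 and watch the sample average settle near the true expectation, well away from the fair-die value of 3.5:

```python
import random

def roll_biased_die(weights, n, seed=0):
    """Roll a die with the given face weights n times; return the sample average."""
    rng = random.Random(seed)
    faces = [1, 2, 3, 4, 5, 6]
    rolls = rng.choices(faces, weights=weights, k=n)
    return sum(rolls) / n

# Hypothetical bias: face 6 is five times as likely as each other face.
weights = [1, 1, 1, 1, 1, 5]
true_mean = sum(f * w for f, w in zip([1, 2, 3, 4, 5, 6], weights)) / sum(weights)

for n in [100, 10_000, 1_000_000]:
    avg = roll_biased_die(weights, n)
    print(f"n={n:>9}: average={avg:.4f}  (true expectation {true_mean}, fair die 3.5)")
```

The averages concentrate around the true expectation as $n$ grows; how fast they concentrate is exactly what the inequalities below quantify.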
In order to prove this, let's take a detour to Markov's Inequality.
For a nonnegative random variable X (i.e. $X(\omega) \geq 0, \forall \omega\in \Omega$) with a finite mean,
for any positive constant $c>0$,
$$\mathbb{P}[X\geq c]\leq\frac{\mathbb{E}[X]}{c}$$
Let $\mathscr{A}$ denote the range of $X$, and consider any constant $c\in
\mathscr{A}$. Then,
$$\mathbb{E}[X]=\sum_{a\in\mathscr{A}}a\times\mathbb{P}[X=a] \geq\sum_{a\geq c}
a\times\mathbb{P}[X=a]\geq\sum_{a\geq c} c\times\mathbb{P}[X=a] \\
= c\sum_{a\geq c}\mathbb{P}[X=a]=c\times\mathbb{P}[X\geq c]$$
The proof starts with the definition of expectation, then uses the fact that
every term in the sum is nonnegative, so dropping the terms with $a<c$ can only
decrease the sum, and finally replaces each remaining $a$ by the smaller value $c$.
We can give a slicker proof using the indicator function $I\{A\}$, which equals
1 when the event $A$ occurs and 0 otherwise.
Since $X$ is a nonnegative random variable and $c>0$, we have, $\forall \omega
\in \Omega$,
$$\begin{gather}X(\omega)\geq cI\{X(\omega)\geq c\},\end{gather}$$
since the right hand side is 0 if $X(\omega)<c$ and is $c$ if $X(\omega)\geq c$.
Multiply both sides by $\mathbb{P}[\omega]$ and sum over $\omega\in\Omega$ to
give,
$$\mathbb{E}[X]\geq c\mathbb{E}[I\{X\geq c\}]=c\times\mathbb{P}[X\geq c]$$
where the inequality follows from (1) and the fact that $\mathbb{P}[
\omega] \geq 0, \forall \omega \in \Omega$, while the final equality follows
from the fact that the expectation of the indicator random variable
$I\{X\geq c\}$ equals $\mathbb{P}[X\geq c]$. Dividing both sides by $c$
recovers Markov's Inequality.
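A quick numerical sanity check makes the bound tangible. The distribution below is a hypothetical example (not from the text): for every threshold $c$, the tail probability $\mathbb{P}[X\geq c]$ indeed stays below $\mathbb{E}[X]/c$.

```python
# Markov's inequality check for a discrete nonnegative X.
# The distribution is a hypothetical example: value -> probability.
dist = {0: 0.2, 1: 0.3, 2: 0.25, 5: 0.15, 10: 0.1}

mean = sum(a * p for a, p in dist.items())  # E[X]
for c in [1, 2, 5, 10]:
    tail = sum(p for a, p in dist.items() if a >= c)  # P[X >= c]
    bound = mean / c                                  # Markov bound E[X]/c
    assert tail <= bound
    print(f"c={c:>2}: P[X>=c]={tail:.3f} <= E[X]/c={bound:.3f}")
```

Note how loose the bound can be for small $c$; Markov's Inequality uses only the mean, so this is the best one can say without more information about $X$.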
Let $Y$ be an arbitrary random variable, not necessarily nonnegative.
Generalized Markov's Inequality states that for any $c>0$ and $r>0$,
$$\mathbb{P}[|Y|\geq c]\leq \frac{\mathbb{E}[|Y|^r]}{c^r}$$
For $c>0$ and $r>0$ (if $r$ were negative, the last inequality below would not
hold), we have,
$$|Y|^r\geq |Y|^rI\{|Y|\geq c\}\geq c^rI\{|Y|\geq c\}.$$
Taking expectations on both sides,
$$\mathbb{E}[|Y|^r]\geq c^r\mathbb{E}[I\{|Y|\geq c\}]=c^r\mathbb{P}[|Y|\geq c]$$
Divide by $c^r$ to obtain the desired result.
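The generalized bound can also be checked numerically. The distribution below is again a hypothetical example; it deliberately takes negative values, so plain Markov's Inequality does not apply to $Y$ directly, but the bound on $|Y|$ holds for several choices of $r$:

```python
# Generalized Markov check: P[|Y| >= c] <= E[|Y|^r] / c^r.
# Hypothetical distribution with negative values: value -> probability.
dist = {-3: 0.1, -1: 0.3, 0: 0.2, 2: 0.3, 4: 0.1}

for r in [1, 2, 3]:
    moment = sum(abs(y) ** r * p for y, p in dist.items())  # E[|Y|^r]
    for c in [1, 2, 3]:
        tail = sum(p for y, p in dist.items() if abs(y) >= c)  # P[|Y| >= c]
        assert tail <= moment / c ** r
        print(f"r={r}, c={c}: P[|Y|>=c]={tail:.2f} <= {moment / c ** r:.3f}")
```

Different values of $r$ give different bounds on the same tail; one is free to pick whichever $r$ gives the tightest bound for the moments one knows.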
For a random variable $X$ with finite mean $\mu=\mathbb{E}[X]$ and finite
variance, Chebyshev's Inequality states that for any constant $c>0$,
$$\mathbb{P}[|X-\mu|\geq c]\leq \frac{Var(X)}{c^2}$$
Let $Y=(X-\mu)^2$, and note that $\mathbb{E}[Y]=\mathbb{E}[(X-\mu)^2]=Var(X)$.
$$|X-\mu|\geq c\iff Y=(X-\mu)^2\geq c^2,$$
since for $c>0$ both sides are nonnegative and squaring preserves order.
Therefore, $\mathbb{P}[|X-\mu|\geq c]=\mathbb{P}[Y\geq c^2]$. Since $Y$ is
nonnegative, applying Markov's inequality yields
$$\mathbb{P}[|X-\mu|\geq c]=\mathbb{P}[Y\geq c^2]\leq \frac{\mathbb{E}[Y]}{c^2}=
\frac{Var(X)}{c^2}$$
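Chebyshev's bound can be verified on a small example too. The distribution here is a hypothetical one of our choosing; the tail probability of each deviation from the mean stays below $Var(X)/c^2$:

```python
# Chebyshev check: P[|X - mu| >= c] <= Var(X) / c^2.
# Hypothetical distribution: value -> probability.
dist = {1: 0.25, 2: 0.25, 3: 0.25, 10: 0.25}

mu = sum(x * p for x, p in dist.items())                 # E[X]
var = sum((x - mu) ** 2 * p for x, p in dist.items())    # Var(X)
for c in [1, 2, 4]:
    tail = sum(p for x, p in dist.items() if abs(x - mu) >= c)
    assert tail <= var / c ** 2
    print(f"c={c}: P[|X-mu|>=c]={tail:.2f} <= Var/c^2={var / c ** 2:.3f}")
```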
And here's another proof using the generalized theorem:
Let $Y=X-\mu$. Then, since $|Y|^2=(X-\mu)^2$, we have $\mathbb{E}[|Y|^2]=\mathbb
{E}[(X-\mu)^2]=Var(X)$. Hence,
$$\mathbb{P}[|X-\mu|\geq c]=\mathbb{P}[|Y|\geq c]\leq \frac{\mathbb{E}[|Y|^2]}{c^2}=\frac{Var(X)}{c^2}$$
follows from applying Generalized Markov's Inequality to $Y=X-\mu$ with $r=2$.
Stopping to consider what Chebyshev's Inequality means, it tells us that the
probability of any given deviation from the mean, either above or below, is at
most $\frac{Var(X)}{c^2}$: the larger the deviation $c$, the smaller the
probability of observing it.
For any random variable $X$ with finite mean $\mu$ and finite standard
deviation $\sigma=\sqrt{Var(X)}$, setting $c=k\sigma$ in Chebyshev's Inequality
gives, for any constant $k>0$,
$$\mathbb{P}[|X-\mu|\geq k\sigma]\leq \frac{Var(X)}{k^2\sigma^2}=\frac{1}{k^2}$$
This means that, for example, the probability of deviating from the mean by more
than 2 standard deviations on either side is at most $\frac{1}{4}$.
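As a final sketch (the choice of an exponential distribution is ours, purely for illustration), we can sample from a skewed distribution and compare the empirical frequency of two-standard-deviation outliers against the $\frac{1}{4}$ bound:

```python
import random

rng = random.Random(42)
n = 100_000
# Exponential(1) samples: a skewed, nonnegative distribution (hypothetical choice).
samples = [rng.expovariate(1.0) for _ in range(n)]

mu = sum(samples) / n
sd = (sum((x - mu) ** 2 for x in samples) / n) ** 0.5
frac = sum(1 for x in samples if abs(x - mu) >= 2 * sd) / n
print(f"empirical P[|X-mu| >= 2*sd] = {frac:.4f}, Chebyshev bound = 0.25")
```

For this distribution the true tail probability is roughly $e^{-3}\approx 0.05$, far below $\frac{1}{4}$: Chebyshev's Inequality holds for every distribution with finite variance, which is exactly why it is often far from tight for any particular one.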