Until now, we have focused on discrete sample spaces. Previously, we discussed uniform probability, obtained as the total probability divided by the number of sample points. However, in a continuous setting there are infinitely many sample points, so this approach no longer works: each individual point must receive probability zero.
Consider any interval $[a,b]\subseteq\mathbb{R}$. Note that intervals are subsets of the sample space, so they are events. By specifying the probability of every interval, we have also specified the probability of any event of interest, since such events can be built up from intervals. Instead of specifying the probability of each interval directly, we describe the distribution by a density function.
A probability density function (p.d.f.) for a real-valued random variable $X$ is a function $f:\mathbb{R}\to\mathbb{R}$ satisfying:
- $f$ is non-negative: $f(x)\geq 0$ for all $x\in\mathbb{R}$.
- The total integral of $f$ is equal to 1: $\int^\infty_{-\infty}f(x)dx=1$.

Then the distribution of $X$ is given by
$$\mathbb{P}[a\leq X\leq b]=\int^b_a f(x)dx \quad\text{for all } a\leq b$$.
It is tempting to think of $f(x)$ as the probability that $X$ takes the value $x$, but this is not correct: any single point has probability zero, and $f(x)$ may even be larger than 1. Rather, $f(x)$ measures probability per unit length in the vicinity of $x$.
For a continuous random variable, one often discusses the cumulative distribution function (c.d.f.), which is the function
$$F(x)=\mathbb{P}[X\leq x]=\int^x_{-\infty}f(t)dt$$.
Thus, one can describe an r.v. equivalently by giving either its p.d.f. or its c.d.f., since $\mathbb{P}[a\leq X\leq b]=F(b)-F(a)$.
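To make the definitions concrete, here is a small numerical sketch in Python (not part of the notes themselves): it uses a hypothetical density $f(x)=2x$ on $[0,1]$, chosen purely for illustration, checks the two defining properties, and evaluates an interval probability and the c.d.f. with `scipy.integrate.quad`.

```python
from scipy.integrate import quad

# Hypothetical density for illustration: f(x) = 2x on [0, 1], and 0 elsewhere.
def f(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

# Property 1: non-negativity (spot-check on a grid of points).
assert all(f(x) >= 0 for x in [i / 100 for i in range(-100, 201)])

# Property 2: the total integral equals 1 (f vanishes outside [0, 1]).
total, _ = quad(f, 0.0, 1.0)
print("total integral:", round(total, 6))          # ~1.0

# Distribution: P[a <= X <= b] is the integral of f from a to b.
a, b = 0.25, 0.75
p_ab, _ = quad(f, a, b)
print("P[0.25 <= X <= 0.75]:", round(p_ab, 6))     # exact value 0.5

# C.d.f.: F(x) = P[X <= x]; here F(x) = x^2 on [0, 1].
F = lambda x: quad(f, 0.0, min(max(x, 0.0), 1.0))[0]
print("F(b) - F(a):", round(F(b) - F(a), 6))       # matches P[a <= X <= b]
```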
To connect to discrete probability, one might think of approximating a continuous random variable by a discrete one that records which small interval of width $dx$ its value falls into; then
$$\mathbb{P}[x\leq X\leq x+dx]\approx f(x)dx$$
for small $dx$, and sums over these small intervals approximate the integrals above. Taking the limit as $dx\to 0$ recovers the exact integral formulas.
The expectation of a continuous random variable $X$ with density $f$ is
$$\mathbb{E}[X]=\int^\infty_{-\infty}xf(x)dx$$.
The integral plays the role of summation in the discrete case. Likewise, the variance of a continuous random variable $X$ is
$$Var(X)=\mathbb{E}[(X-\mathbb{E}[X])^2]=\mathbb{E}[X^2]-\mathbb{E}[X]^2=\int^\infty_{-\infty}x^2f(x)dx-\left(\int^\infty_{-\infty}xf(x)dx\right)^2$$.
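Continuing the illustrative example above (again, the density $f(x)=2x$ on $[0,1]$ is a choice made here for demonstration, not from the notes), these integrals can be evaluated numerically:

```python
from scipy.integrate import quad

# Same hypothetical density as above: f(x) = 2x on [0, 1], 0 elsewhere.
f = lambda x: 2.0 * x if 0.0 <= x <= 1.0 else 0.0

# E[X] = integral of x f(x) dx (f vanishes outside [0, 1]).
mean, _ = quad(lambda x: x * f(x), 0.0, 1.0)

# Var(X) = E[X^2] - E[X]^2.
second_moment, _ = quad(lambda x: x**2 * f(x), 0.0, 1.0)
variance = second_moment - mean**2

print("E[X]   =", round(mean, 4))       # exact value 2/3
print("Var(X) =", round(variance, 4))   # exact value 1/18 ≈ 0.0556
```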
A joint density function for two random variables $X$ and $Y$ is a function $f:\mathbb{R}^2\to\mathbb{R}$ satisfying:
- $f$ is non-negative: $f(x,y)\geq 0$ for all $x,y\in \mathbb{R}$.
- The total integral of $f$ is equal to 1: $\int^\infty_{-\infty} \int^\infty_{-\infty}f(x,y)dx\,dy=1$.

The joint distribution of $X$ and $Y$ is then given by
$$\mathbb{P}[a\leq X\leq b,\ c\leq Y\leq d]=\int^d_c\int^b_a f(x,y)dx\,dy \quad\text{for all } a\leq b \text{ and } c\leq d$$.
In analogy with the above, we can connect the joint density to a discrete approximation: for small $dx$ and $dy$,
$$\mathbb{P}[x\leq X\leq x+dx,\ y\leq Y\leq y+dy]\approx f(x,y)dx\,dy$$.
Thus, we can interpret $f(x,y)$ as the probability per unit area in the vicinity of the point $(x,y)$.
Two continuous random variables $X$ and $Y$ are independent if the events $a\leq X\leq b$ and $c\leq Y\leq d$ are independent for all $a\leq b$ and $c\leq d$. For small $dx$ and $dy$, this requires
$$f(x,y)dx\,dy\approx f_X(x)dx\cdot f_Y(y)dy,$$
where $f_X$ and $f_Y$ denote the (marginal) densities of $X$ and $Y$. In other words, $X$ and $Y$ are independent exactly when their joint density factors as $f(x,y)=f_X(x)f_Y(y)$.
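As a rough sanity check of this factorization property, the following Monte Carlo sketch (an illustration with arbitrarily chosen distributions and interval endpoints, not part of the notes) estimates both sides of the product rule for two independently generated random variables:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two independently generated continuous random variables
# (a uniform and an exponential, chosen purely for illustration).
X = rng.uniform(0.0, 1.0, size=n)
Y = rng.exponential(1.0, size=n)

# Indicator arrays for the events a <= X <= b and c <= Y <= d.
a, b, c, d = 0.2, 0.7, 0.5, 2.0
in_x = (a <= X) & (X <= b)
in_y = (c <= Y) & (Y <= d)

# For independent X and Y, the joint probability should be the product.
print("P[a<=X<=b, c<=Y<=d] ~", (in_x & in_y).mean())
print("P[a<=X<=b] * P[c<=Y<=d] ~", in_x.mean() * in_y.mean())
```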
The exponential distribution is a continuous analogue of the geometric distribution. For example, if we are waiting for an apple to fall off a tree, it can fall at any moment, not necessarily on the tick of a discrete clock. Such waiting times are naturally modelled by the exponential distribution:
For $\lambda>0$, a continuous random variable $X$ with p.d.f.
$$f(x)=\begin{cases}\lambda e^{-\lambda x} & \text{if } x\geq 0,\\ 0 & \text{otherwise}\end{cases}$$
is called an exponential random variable with parameter $\lambda$. By definition, the total integral of $f$ must equal 1, and indeed $\int^\infty_0\lambda e^{-\lambda x}dx=-e^{-\lambda x}\rvert^\infty_0=1$.
Let us now compute the expectation and variance of $X$. Using integration by parts,
$$\mathbb{E}[X]=\int^\infty_{-\infty}xf(x)dx=\int^\infty_0\lambda xe^{-\lambda
x}dx=-xe^{-\lambda x}\rvert^\infty_0+\int^\infty_0e^{-\lambda x}dx=0+(-\frac{e
^{-\lambda x}}{\lambda})\rvert^\infty_0=\frac{1}{\lambda}$$.
Similarly,
$$\mathbb{E}[X^2]=\int^\infty_{-\infty}x^2f(x)dx=\int^\infty_0\lambda x^2e^{-\lambda x}dx=-x^2e^{-\lambda x}\rvert^\infty_0+\int^\infty_02xe^{-\lambda x}dx=0+\frac{2}{\lambda}\mathbb{E}[X]=\frac{2}{\lambda^2}$$.
Therefore,
$$Var(X)=\mathbb{E}[X^2]-\mathbb{E}[X]^2=\frac{2}{\lambda^2}-\frac{1}{\lambda^2}
=\frac{1}{\lambda^2}$$.
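These formulas are easy to sanity-check by simulation. The sketch below is illustrative only; note that NumPy parameterizes the exponential by its scale $1/\lambda$ rather than by the rate $\lambda$, and the particular value $\lambda=2$ is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0            # rate parameter lambda (arbitrary illustrative value)
n = 1_000_000

# NumPy parameterizes the exponential by its scale, which is 1/lambda.
samples = rng.exponential(scale=1.0 / lam, size=n)

print("sample mean     :", round(samples.mean(), 4), "  theory 1/lambda   =", 1 / lam)
print("sample variance :", round(samples.var(), 4), "  theory 1/lambda^2 =", 1 / lam**2)
```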
Like the geometric distribution, the exponential distribution has a single parameter $\lambda$, which plays the role of a rate. In other words, the probability that we have to wait more than time $t$ for the event to occur decays exponentially in $t$: for any $t\geq 0$,
$$\mathbb{P}[X>t]=\int^\infty_t\lambda e^{-\lambda x}dx=-e^{-\lambda x}\rvert^\infty_t=e^{-\lambda t}$$.
For comparison, suppose we check for the event once every $\delta$ units of time, and on each check it occurs independently with probability $\lambda\delta$, so the number of checks we wait is geometric with parameter $\lambda\delta$. Then
$$\mathbb{P}[\text{wait more than time } t]=(1-\lambda\delta)^{t/\delta}\approx e^{-\lambda t},$$
where the final approximation holds in the limit as $\delta\to 0$. We see this distribution has the same form as the exponential distribution with parameter $\lambda$, which justifies viewing the exponential distribution as the continuous analogue of the geometric distribution.
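The following short sketch (illustrative, with arbitrarily chosen $\lambda$ and $t$) tabulates the geometric tail $(1-\lambda\delta)^{t/\delta}$ for shrinking $\delta$ and compares it with $e^{-\lambda t}$:

```python
import math

lam, t = 2.0, 1.5    # arbitrary illustrative rate and time

# Discrete approximation: check every delta time units with success
# probability lambda*delta, so P[wait > t] = (1 - lambda*delta)^(t/delta).
for delta in [0.1, 0.01, 0.001, 0.0001]:
    geom_tail = (1.0 - lam * delta) ** (t / delta)
    print(f"delta = {delta:<7} geometric tail = {geom_tail:.5f}")

# Exponential tail for comparison.
print(f"exponential tail exp(-lambda*t) = {math.exp(-lam * t):.5f}")
```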
The normal, or Gaussian, distribution has two parameters $\mu$ and $\sigma^2$. For $\mu\in\mathbb{R}$ and $\sigma>0$, a continuous random variable $X$ with p.d.f.
$$f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
is called a normal random variable with parameters $\mu$ and $\sigma^2$, written $X\sim N(\mu,\sigma^2)$; the case $\mu=0$, $\sigma=1$ is called the standard normal distribution. By definition, the total integral of the density equals 1:
$$\frac{1}{\sqrt{2\pi\sigma^2}}\int^\infty_{-\infty}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx=1$$.
The fact that this integral evaluates to 1 is a routine exercise in multivariable integral calculus. A plot of the p.d.f. reveals the classic bell curve, symmetric around $x=\mu$.
If $X\sim N(\mu,\sigma^2)$, then $Y=\frac{X-\mu}{\sigma}\sim N(0,1)$.
Equivalently, if $Y\sim N(0,1)$, then $X=\sigma Y+\mu\sim N(\mu,\sigma^2)$.
Given $X\sim N(\mu,\sigma^2)$, we can calculate the distribution of $Y=\frac{X-
\mu}{\sigma}$ as
$$\mathbb{P}[a\leq Y\leq b]=\mathbb{P}[\sigma a+\mu\leq X\leq \sigma b+\mu]=\frac{1}{\sqrt{2\pi\sigma^2}}\int^{\sigma b+\mu}_{\sigma a+\mu} e^{-\frac{(x-\mu)^2}{2\sigma^2}} dx=\frac{1}{\sqrt{2\pi}}\int^b_a e^{-\frac{y^2}{2}} dy$$.
The last step uses the change of variables $y=\frac{x-\mu}{\sigma}$ (so $dx=\sigma\, dy$), and the result is precisely the distribution of an $N(0,1)$ random variable.
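A quick simulation makes the lemma tangible. The sketch below (the parameter values $\mu=3$, $\sigma=2$ are arbitrary choices for illustration) standardizes samples of $X\sim N(\mu,\sigma^2)$ and compares them against directly drawn $N(0,1)$ samples:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n = 3.0, 2.0, 1_000_000   # arbitrary illustrative parameters

X = rng.normal(mu, sigma, size=n)    # samples of X ~ N(mu, sigma^2)
Y = (X - mu) / sigma                 # standardized samples
Z = rng.normal(0.0, 1.0, size=n)     # reference N(0, 1) samples

a, b = -1.0, 1.0
print("mean(Y), std(Y):", round(Y.mean(), 4), round(Y.std(), 4))  # ~0 and ~1
print("P[-1 <= Y <= 1] ~", ((a <= Y) & (Y <= b)).mean())
print("P[-1 <= Z <= 1] ~", ((a <= Z) & (Z <= b)).mean())          # should agree
```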
For $X\sim N(\mu,\sigma^2)$,
$$\mathbb{E}[X]=\mu\ \text{ and }\ Var(X)=\sigma^2$$.
Consider the case when $X\sim N(0,1)$. By definition, its expectation is
$$\mathbb{E}[X]=\int^\infty_{-\infty}xf(x)dx=\frac{1}{\sqrt{2\pi}}\int^\infty_{-\infty} xe^{-\frac{x^2}{2}} dx=\frac{1}{\sqrt{2\pi}}\left(\int^0_{-\infty}xe^{-\frac{x^2}{2}}dx+\int^\infty_0xe^{-\frac{x^2}{2}}dx\right)=0$$.
The last step follows from the fact that $e^{-\frac{x^2}{2}}$ is symmetrical
about $x=0$, so the two integrals are the same except for the sign. For
variance,
$$Var(X)=\mathbb{E}[X^2]-\mathbb{E}[X]^2=\frac{1}{\sqrt{2\pi}}\int^\infty_
{-\infty}x^2e^{-\frac{x^2}{2}}dx\\=\frac{1}{\sqrt{2\pi}}(-xe^{-\frac{x^2}{2}})
\rvert^\infty_{-\infty}+\frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-\frac{
x^2}{2}}dx\\=\frac{1}{\sqrt{2\pi}}\int^\infty_{-\infty}e^{-\frac{x^2}{2}}dx=1$$.
In the first line we have used the fact $\mathbb{E}[X]=0$, in the second line we used integration by parts, and in the last line we used the fact that the standard normal density integrates to 1. So for the standard normal distribution, $\mathbb{E}[X]=0=\mu$ and $Var(X)=1=\sigma^2$.
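If one prefers not to take the integration by parts on faith, the three Gaussian integrals used above can be checked symbolically; the sketch below uses SymPy purely as an independent verification, not as part of the derivation:

```python
import sympy as sp

x = sp.symbols("x", real=True)
phi = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)   # standard normal p.d.f.

total = sp.integrate(phi, (x, -sp.oo, sp.oo))            # total integral: 1
mean = sp.integrate(x * phi, (x, -sp.oo, sp.oo))         # E[X]: 0
second = sp.integrate(x**2 * phi, (x, -sp.oo, sp.oo))    # E[X^2]: 1

print(total, mean, second)   # expected output: 1 0 1
```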
Now, consider the general case. By the Lemma, we know $Y=\frac{X-\mu}{\sigma}$
is a standard normal random variable, so $\mathbb{E}[Y]=0$ and $Var(Y)=1$, as we
have established above. Therefore, using linearity,
$$0=\mathbb{E}[Y]=\mathbb{E}[\frac{X-\mu}{\sigma}]=\frac{\mathbb{E}[X]-\mu}
{\sigma},$$
and hence $\mathbb{E}[X]=\mu$. For variance,
$$1=Var(Y)=Var(\frac{X-\mu}{\sigma})=\frac{Var(X)}{\sigma^2},$$
and hence $Var(X)=\sigma^2$.
The sum of independent normal random variables is also normally distributed.
Let $X$ and $Y$ be independent $N(0,1)$ random variables, and for constants $a,b\in\mathbb{R}$ (not both zero) let $Z=aX+bY$. We first show that $Z$ is normally distributed with mean $0$ and variance $a^2+b^2$.
Since $X$ and $Y$ are independent, by the theorem we know the joint density of
$(X,Y)$ is
$$f(x,y)=f(x)f(y)=\frac{1}{2\pi}e^{-\frac{x^2+y^2}{2}}$$.
The key observation is that $f(x,y)$ is rotationally symmetric around the origin (i.e., $f(x,y)$ depends only on $x^2+y^2$, the squared distance of the point $(x,y)$ from the origin $(0,0)$).
Thus, $f(T(x,y))=f(x,y)$ where $T$ is any rotation of the plane $\mathbb{R}^2$ about the origin. It follows that $\mathbb{P}[(X,Y)\in A]=\mathbb{P}[(X,Y)\in T(A)]$ for any set $A\subseteq\mathbb{R}^2$. Now, given any $t\in\mathbb{R}$, we have
$$\mathbb{P}[Z\leq t]=\mathbb{P}[aX+bY\leq t]=\mathbb{P}[(X,Y)\in A]$$
where $A$ is the half plane $\{(x,y)|ax+by\leq t\}$. The boundary line $ax+by=t$
lies at a distance $d=\frac{t}{\sqrt{a^2+b^2}}$ from the origin. Therefore, set
$A$ can be rotated into the set
$$T(A)=\{(x,y)\rvert x\leq\frac{t}{\sqrt{a^2+b^2}}\}$$.
This rotation does not change probability:
$$\mathbb{P}[Z\leq t]=\mathbb{P}[(X,Y)\in A]=\mathbb{P}[(X,Y)\in T(A)]\\=
\mathbb{P}[X\leq\frac{t}{\sqrt{a^2+b^2}}]=\mathbb{P}[\sqrt{a^2+b^2}X\leq t]$$.
Since the equation above holds for all $t\in\mathbb{R}$, we can conclude $Z$ has
the same distribution as $\sqrt{a^2+b^2}X$. Since $X$ has standard normal
distribution, we know by the lemma that $\sqrt{a^2+b^2}X$ has normal
distribution with mean 0 and variance $a^2+b^2$; hence we conclude that $Z=aX+bY$ is also normally distributed with mean $0$ and variance $a^2+b^2$.
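A Monte Carlo check of this conclusion (with arbitrary illustrative values of $a$ and $b$) compares $Z=aX+bY$ against $\sqrt{a^2+b^2}\,X$ for independent standard normal samples:

```python
import numpy as np

rng = np.random.default_rng(3)
a, b, n = 0.6, -1.3, 1_000_000       # arbitrary illustrative constants

X = rng.normal(size=n)               # independent standard normals
Y = rng.normal(size=n)

Z = a * X + b * Y                                # the combination aX + bY
W = np.sqrt(a**2 + b**2) * rng.normal(size=n)    # claimed matching distribution

t = 1.0
print("std(Z), std(W):", round(Z.std(), 4), round(W.std(), 4))   # both ~sqrt(a^2+b^2)
print("P[Z <= 1] ~", (Z <= t).mean())
print("P[W <= 1] ~", (W <= t).mean())                            # should agree
```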
The general case now follows:
Let $X\sim N(\mu_X,\sigma_X^2)$ and $Y\sim N(\mu_Y,\sigma^2_Y)$ be independent
normal random variables. Then for any constants $a, b \in \mathbb{R}$, the
random variable $Z=aX+bY$ is also normally distributed with mean $\mu=a\mu_X+
b\mu_Y$ and variance $\sigma^2=a^2\sigma^2_X+b^2\sigma^2_Y$.
By the lemma, $Z_1=\frac{X-\mu_X}{\sigma_X}$ and $Z_2=\frac{Y-\mu_Y}{\sigma_Y}$
are independent standard normal random variables. We can write
$$Z=aX+bY=a(\mu_X+\sigma_XZ_1)+b(\mu_Y+\sigma_YZ_2)=(a\mu_X+b\mu_Y)+(a\sigma_X
Z_1+b\sigma_YZ_2)$$.
By the previous theorem, $Z'=a\sigma_XZ_1+b\sigma_YZ_2$ is normally distributed with mean $0$ and variance $\sigma^2=a^2\sigma^2_X+b^2\sigma_Y^2$. Since $\mu=
a\mu_X+b\mu_Y$ is a constant, by the lemma we conclude that $Z=\mu+Z'$ is a
normal random variable with mean $\mu$ and variance $\sigma^2$.
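Finally, a simulation sketch of the general theorem (all parameter values below are arbitrary illustrative choices) compares the empirical mean and variance of $Z=aX+bY$ with $a\mu_X+b\mu_Y$ and $a^2\sigma_X^2+b^2\sigma_Y^2$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
mu_x, sig_x = 1.0, 2.0     # X ~ N(mu_x, sig_x^2), illustrative values
mu_y, sig_y = -3.0, 0.5    # Y ~ N(mu_y, sig_y^2), illustrative values
a, b = 2.0, 4.0

X = rng.normal(mu_x, sig_x, size=n)
Y = rng.normal(mu_y, sig_y, size=n)
Z = a * X + b * Y

print("mean(Z):", round(Z.mean(), 4), "  theory:", a * mu_x + b * mu_y)
print("var(Z) :", round(Z.var(), 4), "  theory:", a**2 * sig_x**2 + b**2 * sig_y**2)
```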