diff --git a/doc/binomial-priors.png b/doc/binomial-priors.png
new file mode 100644
index 0000000..4aeae70
Binary files /dev/null and b/doc/binomial-priors.png differ
diff --git a/doc/objective-bayesian-inference.md b/doc/objective-bayesian-inference.md
new file mode 100644
index 0000000..36ad573
--- /dev/null
+++ b/doc/objective-bayesian-inference.md
@@ -0,0 +1,815 @@
+# Introduction to Objective Bayesian Inference
+There's a common misconception that Bayesian inference is primarily a subjective approach to statistics and
+frequentist inference provides objectivity. While Bayesian statistics can certainly be used to incorporate
+subjective knowledge, it also provides powerful methods to do objective analysis. Moreover, objective Bayesian
+inference probably has a stronger argument for objectivity than many frequentist methods such as P-values
+([[13](https://si.biostat.washington.edu/sites/default/files/modules/BergerBerry.pdf)]).
+
+This blog post provides a brief introduction to objective Bayesian inference. It discusses the
+history of objective Bayesian inference from inverse probability up to modern reference priors, how
+metrics such as frequentist coverage matching provide a way to quantify what it means for a prior to be objective,
+and how to build objective priors. Finally, it revisits a few classical problems studied by Bayes and Laplace to
+show how they might be solved with a more modern approach to objective Bayesian inference.
+
+1. [History](#history)
+2. [Priors and Frequentist Matching](#priors-and-frequentist-matching)
+ * [Example 1: A Normal Distribution with Unknown Mean](#example-1-a-normal-distribution-with-unknown-mean)
+ * [Example 2: A Normal Distribution with Unknown Variance](#example-2-a-normal-distribution-with-unknown-variance)
+3. [The Binomial Distribution Prior](#the-binomial-distribution-prior)
+4. [Applications from Bayes and Laplace](#applications-from-bayes-and-laplace)
+ * [Example 3: Observing only 1s](#example-3-observing-only-1s)
+ * [Example 4: A Lottery](#example-4-a-lottery)
+ * [Example 5: Birth Rates](#example-5-birth-rates)
+5. [Discussion](#discussion)
+6. [Conclusion](#conclusion)
+# History
+In 1654 Pascal and Fermat worked together to solve the [problem of the points](https://en.wikipedia.org/wiki/Problem_of_points)
+and in so doing developed an early theory for deductive reasoning with direct probabilities. Thirty years later, Jacob
+Bernoulli worked to extend probability theory to solve inductive problems.
+He recognized that unlike in games of chance, it was futile to
+*a priori* enumerate possible cases and find out "how much more easily can some occur than the others":
+> But, who from among the mortals will be able to determine, for example, the
+> number of diseases, that is, the same number of cases which at each age
+> invade the innumerable parts of the human body and can bring about our
+> death; and how much easier one disease (for example, the plague) can
+> kill a man than another one (for example, rabies; or, the rabies than
+> fever), so that we would be able to conjecture about the future state
+> of life or death? And who will count the innumerable cases of changes
+> to which the air is subjected each day so as to form a conjecture about
+> its state in a month, to say nothing about a year? Again, who knows the
+> nature of the human mind or the admirable fabric of our body shrewdly
+> enough for daring to determine the cases in which one or another
+> participant can gain victory or be ruined in games completely or partly
+> depending on acumen or agility of body?
+[[1](http://www.sheynin.de/download/bernoulli.pdf), p. 18]
+
+The way forward, he reasoned, was to determine probabilities *a posteriori*
+> Here, however, another way for attaining the desired is really opening
+> for us. And, what we are not given to derive a priori, we at least can
+> obtain a posteriori, that is, can extract it from a repeated
+> observation of the results of similar examples.
+[[1](http://www.sheynin.de/download/bernoulli.pdf), p. 18]
+
+To establish the validity of the approach, Bernoulli proved a version of the law of large numbers for the binomial
+distribution. Let $X_1, \ldots, X_N$ represent independent samples from a Bernoulli distribution with parameter $r/t$ ($r$ and $t$ integers). Then
+if $c$ represents some positive integer, Bernoulli showed that for $N$ large enough
+```math
+    P\left(\left|\frac{X_1 + \cdots + X_N}{N} - \frac{r}{t}\right| < \frac{1}{t}\right)
+    >
+    c \cdot P\left(\left|\frac{X_1 + \cdots + X_N}{N} - \frac{r}{t}\right| > \frac{1}{t}\right).
+```
+In other words, *the probability that the sampled ratio from a binomial distribution falls within the bounds (r−1)/t to (r+1)/t is at least c times greater than the probability that it falls outside those bounds*. Thus, by taking enough samples, “we determine the [parameter] a posteriori almost as though it was known to us a priori”.
+
+Bernoulli additionally derived lower bounds, given $r$ and $t$, for how many samples would be needed to achieve
+a desired level of accuracy. For example, if $r=30$ and $t=50$, he showed
+> having made 25550 experiments, it will be more than a thousand times more
+> likely that the ratio of the number of obtained fertile observations to
+> their total number is contained within the limits 31/50 and 29/50
+> rather than beyond them [[1](http://www.sheynin.de/download/bernoulli.pdf), p. 30]
+
+This suggested an approach to inference, but it came up short in two respects: 1) the bounds derived were
+conditional on knowledge of the true parameter, so they provided no way to quantify uncertainty
+when the parameter was unknown; and 2) the number of experiments required to reach a high level of confidence in an
+estimate, *moral certainty* in Bernoulli's words, was quite large, limiting the approach's practicality.
+Abraham de Moivre would later improve on
+Bernoulli's work in his highly popular textbook *The Doctrine of Chances*.
+He derived considerably tighter bounds, but again failed to provide a way to quantify uncertainty when
+the binomial distribution's parameter was unknown, offering only this qualitative guidance
+> if after taking a great number of Experiments, it should be perceived that the
+> happenings and failings have been nearly in a certain proportion, such
+> as of 2 to 1, it may safely be concluded that the Probabilities of
+> happening or failing at any one time assigned will be very near that
+> proportion, and that the greater the number of Experiments has been, so
+> much nearer the Truth will the conjectures be that are derived from
+> them.
+[[2](https://www.ime.usp.br/~walterfm/cursos/mac5796/DoctrineOfChances.pdf), p. 242]
+
+
+
+INSPIRED BY DE MOIVRE'S book, Thomas Bayes took up the problem of inference with the
+binomial distribution. He reframed the goal to
+> Given the number of times in which an unknown event has happened and failed: Required the chance
+> that the probability of its happening in a single trial lies somewhere between any two degrees of
+> probability that can be named.
+[[3](https://web.archive.org/web/20110410085940/http://www.stat.ucla.edu/history/essay.pdf), p. 4]
+
+Recognizing that a solution would depend on prior probability, Bayes sought to
+give an answer for
+> the case of an event concerning the probability of which we absolutely know nothing antecedently
+> to any trials made concerning it
+[[3](https://web.archive.org/web/20110410085940/http://www.stat.ucla.edu/history/essay.pdf), p. 11]
+
+He reasoned that knowing nothing was equivalent to a uniform prior distribution [4, p. 184-188].
+Using the
+uniform prior and a geometric analogy with balls, Bayes succeeded in approximating integrals of posterior
+distributions of the form
+```math
+ \frac{\Gamma(n+2)}{\Gamma(y+1) \Gamma(n - y + 1)} \int_a^b \theta^y (1 - \theta)^{n - y} d\theta
+```
+and was able to answer questions like "if I observe $y$ successes and $n - y$ failures from a binomial
+distribution with unknown parameter $\theta$, what is the probability that $\theta$ is between $a$ and $b$?".
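With a modern library, the probability Bayes laboriously approximated by geometry is a short computation: under the uniform prior, the posterior is a Beta(y + 1, n − y + 1) distribution. A minimal sketch, assuming scipy is available (the example values are arbitrary):

```python
from scipy.stats import beta

def prob_theta_in(y, n, a, b):
    """P(a < theta < b | y successes in n trials) under the uniform prior.

    The posterior under the uniform prior is Beta(y + 1, n - y + 1)."""
    post = beta(y + 1, n - y + 1)
    return post.cdf(b) - post.cdf(a)

# e.g., 3 successes and 7 failures: probability theta lies in (0.2, 0.5)
print(prob_theta_in(3, 10, 0.2, 0.5))
```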
+
+Despite Bayes' success answering inferential questions, his method was not widely adopted
+and his work, published posthumously in 1763, remained obscure up until De Morgan renewed attention
+to it over fifty years later. A major obstacle was Bayes' geometric treatment of integration; as
+mathematical historian Stephen Stigler writes,
+> Bayes essay 'Towards solving a problem in the doctrine of chances' is extremely
+> difficult to read today--even when we know what to look for. [4, p. 179]
+
+
+
+A DECADE AFTER Bayes' death and likely unaware of his discoveries, Laplace pursued similar problems
+and independently arrived at the same approach. Laplace revisited the famous problem of the points, but this time considered
+the case of a skilled game where the probability of a player winning a round was modeled by a Bernoulli
+distribution with unknown parameter $p$. Like Bayes, Laplace assumed
+a uniform prior, noting only
+> because the probability that A will win a point is unknown, we may
+> suppose it to be any unspecified number whatever between $0$ and $1$. [[5](https://www.york.ac.uk/depts/maths/histstat/memoir1774.pdf)]
+
+Unlike Bayes, though, Laplace did not use a geometric approach. He approached the problems with a much more developed
+analytical toolbox and was able to derive more usable formulas with integrals and clearer notation.
+
+Following Laplace and up until the early 20th century,
+using a uniform prior together with Bayes' theorem became a popular approach to statistical inference.
+In 1837, De Morgan introduced the term *inverse probability*
+to refer to such methods and acknowledged Bayes' earlier work
+> De Moivre, nevertheless, did not discover the inverse method. This was
+> first used by the Rev. T. Bayes, in Phil. Trans. liii. 370.; and the author,
+> though now almost forgotten, deserves the most honourable rememberance from
+> all who read the history of this science. [[6](https://archive.org/details/134257988/page/n7/mode/2up)]
+
+
+
+IN THE EARLY 20TH CENTURY, inverse probability came under serious attack for its use of a uniform prior.
+Ronald Fisher, one of the fiercest critics, wrote
+> I know only one case in mathematics of a doctrine which has
+> been accepted and developed by the most eminent men of their
+> time, and is now perhaps accepted by men now living, which at the
+> same time has appeared to a succession of sound writers to be
+> fundamentally false and devoid of foundation. Yet that is quite
+> exactly the position in respect of inverse probability
+[[7](https://errorstatistics.com/wp-content/uploads/2016/02/fisher-1930-inverse-probability.pdf)]
+
+Fisher criticized inverse probability as "extremely arbitrary". Reviewing
+Bayes' essay, he pointed out how naive use of a uniform prior leads to solutions that depend on
+the scale used to measure probability. He gave a concrete example [[8](http://www.medicine.mcgill.ca/epidemiology/hanley/bios601/Likelihood/Fisher1922.pdf)]: Let $p$ denote the unknown parameter for a
+binomial distribution. Suppose that instead of $p$ we parameterize by
+```math
+ \theta = \arcsin \left(2p-1\right), \quad -\frac{\pi}{2} \le \theta \le \frac{\pi}{2},
+```
+and apply the uniform prior. Then the probability that $\theta$ is between $a$ and $b$ after observing
+$S$ successes and $F$ failures is
+```math
+ \frac{1}{\pi} \int_a^b \left(\frac{\sin \theta + 1}{2}\right)^S \left(\frac{1 - \sin \theta}{2}\right)^F d\theta.
+```
+A change of variables back to $p$ shows us this is equivalent to
+```math
+ \frac{1}{\pi} \int_{(\sin a + 1)/2}^{(\sin b + 1)/2} \left(p\right)^{S-1/2} \left(1 - p\right)^{F-1/2} dp.
+```
+Hence, the uniform prior in $\theta$ is equivalent to the prior $\frac{1}{\pi}p^{-1/2} (1-p)^{-1/2}$ in $p$. As
+an alternative to inverse probability, Fisher promoted maximum likelihood methods, p-values, and a frequentist
+definition for probability.
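Fisher's change of variables is easy to check numerically; a quick sketch, assuming scipy, with arbitrary illustrative values for S, F, and the interval:

```python
import numpy as np
from scipy.integrate import quad

S, F = 7, 3        # illustrative success/failure counts
a, b = -0.3, 0.8   # an arbitrary interval for theta

# Uniform prior in theta = arcsin(2p - 1)
lhs, _ = quad(lambda t: ((np.sin(t) + 1) / 2)**S * ((1 - np.sin(t)) / 2)**F / np.pi, a, b)

# Equivalent integral in p with the arcsine weight p^(-1/2) (1 - p)^(-1/2)
lo, hi = (np.sin(a) + 1) / 2, (np.sin(b) + 1) / 2
rhs, _ = quad(lambda p: p**(S - 0.5) * (1 - p)**(F - 0.5) / np.pi, lo, hi)

print(lhs, rhs)  # the two integrals agree
```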
+
+
+
+WHILE FISHER and others advocated for abandoning inverse probability in favor of frequentist methods,
+Harold Jeffreys worked to put inverse probability on a firmer foundation. He acknowledged previous
+approaches to inverse probability had lacked consistency, but he agreed with their goal of delivering
+statistical results in terms of degree of belief and thought frequentist definitions of probability
+to be hopelessly flawed:
+> frequentist definitions themselves lead to no results of the kind that we need until the notion of
+> reasonable degree of belief is reintroduced, and that since the whole purpose of these definitions
+> is to avoid this notion they necessarily fail in their object. [[9](https://archive.org/details/in.ernet.dli.2015.2608), p. 34]
+
+Jeffreys pointed out that inverse probability needn't be tied to the uniform prior:
+> There is no more need for [the idea that the uniform distribution of the prior probability was a necessary
+> part of the principle of inverse probability] than there is to say that an oven that has once cooked
+> roast beef can never cook anything but roast beef. [[9](https://archive.org/details/in.ernet.dli.2015.2608), p. 103]
+
+Seeking to achieve results that would be consistent under reparameterizations, Jeffreys proposed priors
+based on the Fisher information matrix,
+```math
+ \begin{align*}
+ \pi(\boldsymbol{\theta}) &\propto \left[\det{\mathcal{I}(\boldsymbol{\theta})}\right]^{1/2} \\
+ \mathcal{I}(\boldsymbol{\theta})_{st} &= \mathbb{E}_{\boldsymbol{y}}\left\{
+ \left(\frac{\partial}{\partial \theta_s} \log P(\boldsymbol{y}\mid\boldsymbol{\theta})\right)
+ \left(\frac{\partial}{\partial \theta_t} \log P(\boldsymbol{y}\mid\boldsymbol{\theta})\right)
+ \mid\boldsymbol{\theta}
+ \right\},
+ \end{align*}
+```
+writing
+> If we took the prior probability density for the parameters to be proportional to
+> [(det I(θ))^(1/2)]
+> ... any arbitrariness in the choice of the parameters could make no difference to the results,
+> and it is proved that for this wide class of laws a consistent theory of probability can be
+> constructed. [[9](https://archive.org/details/in.ernet.dli.2015.2608), p. 159]
+
+Twenty years later, Welch and Peers investigated priors from a different perspective ([[10](https://academic.oup.com/jrsssb/article-abstract/25/2/318/7035245?redirectedFrom=PDF)]).
+They analyzed one-tailed credible sets from posterior distributions and asked how closely probability
+mass coverage matched frequentist coverage. They found that for the case of a single parameter,
+the prior Jeffreys proposed was asymptotically optimal, providing further justification for a
+prior that aligns with how intuition suggests we might quantify Bayes' criterion of
+"knowing absolutely nothing".
+> Note: Deriving good priors in the multi-parameter case is considerably more involved. Jeffreys himself
+> was dissatisfied with the prior his rule produced for multi-parameter models and proposed an
+> alternative known as *Jeffreys independent prior* but never developed a rigorous approach.
+> José-Miguel Bernardo and James Berger would later develop *reference priors* as a refinement of
+> Jeffreys prior. Reference priors provide a general mechanism to produce good priors that works for
+> multi-parameter models and cases
+> where the Fisher information matrix doesn't exist. See [[11](https://projecteuclid.org/journals/annals-of-statistics/volume-37/issue-2/The-formal-definition-of-reference-priors/10.1214/07-AOS587.full)] and
+> [[12](https://www.amazon.com/Objective-Bayesian-Inference-James-Berger/dp/9811284903), part 3].
+
+
+
+IN AN UNFORTUNATE turn of events, mainstream statistics mostly ignored Jeffreys approach
+to inverse probability to chase a mirage of objectivity that frequentist methods seemed to provide.
+> Note: Development of inverse probability in the direction Jeffreys outlined
+> continued under the name objective Bayesian analysis; however, it hardly
+> occupies the center stage of statistics, and many people mistakenly think of
+> Bayesian analysis as more of a subjective theory.
+>
+> See [[13](https://si.biostat.washington.edu/sites/default/files/modules/BergerBerry.pdf)] for background on
+> why the objectivity that many perceive frequentist methods to have is largely
+> false.
+
+
+But much as Jeffreys had anticipated with his criticism that frequentist
+definitions of probability couldn’t provide “results of the kind that we need”,
+a majority of practitioners filled in the blank by misinterpreting frequentist
+results as providing belief probabilities. Goodman coined the term *P value
+fallacy* to refer to this common error and described just how prevalent it is
+
+> In my experience teaching many academic physicians, when physicians are
+> presented with a single-sentence summary of a study that produced a surprising
+> result with P = 0.05, the overwhelming majority will confidently state that
+> there is a 95% or greater chance that the null hypothesis is incorrect. [[14](https://pubmed.ncbi.nlm.nih.gov/10383371/)]
+
+James Berger and Thomas Sellke established theoretical and simulation results
+that show how *spectacularly wrong* this notion is
+
+> it is shown that actual evidence against a null (as measured, say, by posterior
+> probability or comparative likelihood) can differ by an order of magnitude from
+> the P value. For instance, data that yield a P value of .05, when testing a
+> normal mean, result in a posterior probability of the null of at least .30 for
+> any objective prior distribution. [[15](https://www2.stat.duke.edu/courses/Spring07/sta215/lec/BvP/BergSell1987.pdf)]
+
+They concluded
+
+> for testing “precise” hypotheses, p values should not be used directly, because
+> they are too easily misinterpreted. The standard approach in teaching–of
+> stressing the formal definition of a p value while warning against its
+> misinterpretation–has simply been an abysmal failure. [[16](https://hannig.cloudapps.unc.edu/STOR654/handouts/SellkeBayarriBerger2001.pdf)]
+
+In this post, we’ll look closer at how priors for objective Bayesian analysis
+can be justified by matching coverage; and we’ll reexamine the problems Bayes
+and Laplace studied to see how they might be approached with a more modern
+methodology.
+
+# Priors and Frequentist Matching
+The idea of matching priors intuitively aligns with how we might think about
+probability in the absence of prior knowledge. We can think of the frequentist
+coverage matching metric as a way to provide an answer to the question “How
+accurate are the Bayesian credible sets produced with a given prior?”.
+
+> Note: For more background on frequentist coverage matching and its relation to
+> objective Bayesian analysis, see [[17](https://www.uv.es/~bernardo/OBayes.pdf)] and [[12](https://www.amazon.com/Objective-Bayesian-Inference-James-Berger/dp/9811284903), ch. 5].
+
+
+
+CONSIDER A PROBABILITY model with a single parameter $\theta$. If we're given a prior,
+$\pi(\theta)$, how do we test if the prior reasonably expresses
+Bayes' requirement of knowing nothing? Let's pick a size $n$, a value $\theta_\textrm{true}$, and
+randomly sample observations $\boldsymbol{y}=(y_1$, $\ldots$, $y_n)^\top$ from the distribution
+$P(\cdot\vert\theta_\textrm{true})$. Then let's compute the two-tailed credible set $[\theta_a, \theta_b]$
+that contains 95% of the probability mass of the posterior,
+```math
+ \pi(\theta\mid\boldsymbol{y}) \propto P(\boldsymbol{y}\mid \theta) \times \pi(\theta),
+```
+and record whether or not the credible set contains $\theta_\textrm{true}$. Now
+suppose we repeat the experiment many times and vary $n$ and $\theta_\textrm{true}$. If $\pi(\theta)$ is
+a good prior, then the fraction of trials where $\theta_\textrm{true}$ is contained within the
+credible set will consistently be close to 95%.
+
+Here’s how we might express this experiment as an algorithm:
+```
+function coverage-test(n, θ_true, α):
+ cnt ← 0
+ N ← a large number
+ for i ← 1 to N do
+ y ← sample from P(·|θ_true)
+ t ← integrate_{-∞}^θ_true π(θ | y)dθ
+ if (1 - α)/2 < t < 1 - (1 - α)/2:
+ cnt ← cnt + 1
+ end if
+ end for
+ return cnt / N
+```
+## Example 1: A Normal Distribution with Unknown Mean
+Suppose we observe n normally distributed values, y, with variance 1 and unknown mean, μ. Let’s consider the prior
+```math
+\pi(\mu) \propto 1
+```
+> Note: In this case Jeffreys prior and the constant prior in μ are the same.
+
+Then
+```math
+\begin{align*}
+ P(\boldsymbol{y}\mid\mu)
+ &\propto \exp\left\{-\frac{1}{2}\left(\boldsymbol{y} - \mu \boldsymbol{1}\right)'\left(\boldsymbol{y} - \mu \boldsymbol{1}\right)\right\} \\
+ &\propto \exp\left\{-\frac{1}{2}\left(n \mu^2 - 2 \mu n \bar{y}\right)\right\} \\
+ &\propto \exp\left\{-\frac{n}{2}\left(\mu - \bar{y}\right)^2\right\}.
+\end{align*}
+```
+Thus,
+```math
+ \int_{-\infty}^t \pi(\mu\mid\boldsymbol{y})d\mu =
+ \frac{1}{2} \left[1 + \textrm{erf}\left(\frac{t - \bar{y}}{\sqrt{2/n}}\right)\right].
+```
+I ran a 95% coverage test with 10000 trials and various values of μ and n. As
+the table below shows, the results are all close to 95%, indicating the
+constant prior is a good choice in this case. [Source code for example](https://github.com/rnburn/bbai/blob/master/example/09-coverage-simulations.ipynb).
+
+| $\mu_\textrm{true}$ | n=5 | n=10 | n=20 |
+| ---- | -------- | -------- | ------- |
+| 0.1 | 0.9502 | 0.9486 | 0.9485 |
+| 0.5 | 0.9519 | 0.9478 | 0.9487 |
+| 1.0 | 0.9516 | 0.9495 | 0.9519 |
+| 2.0 | 0.9514 | 0.9521 | 0.9512 |
+| 5.0 | 0.9489 | 0.9455 | 0.9497 |
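The coverage-test algorithm above is straightforward to run for this model using the closed-form posterior CDF; a minimal simulation sketch (results vary slightly with the random seed):

```python
import numpy as np
from math import erf, sqrt

def coverage_test(n, mu_true, alpha=0.95, N=10000, seed=0):
    """Fraction of trials where the two-tailed credible set covers mu_true."""
    rng = np.random.default_rng(seed)
    cnt = 0
    for _ in range(N):
        y = rng.normal(mu_true, 1.0, size=n)
        # posterior CDF at mu_true under the constant prior (see formula above)
        t = 0.5 * (1 + erf((mu_true - y.mean()) / sqrt(2.0 / n)))
        if (1 - alpha) / 2 < t < 1 - (1 - alpha) / 2:
            cnt += 1
    return cnt / N

print(coverage_test(10, 0.5))  # close to 0.95
```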
+
+## Example 2: A Normal Distribution with Unknown Variance
+Now suppose we observe n normally distributed values, y, with zero mean
+and unknown variance, σ². Let’s test the constant prior and Jeffreys prior,
+```math
+ \pi_C(\sigma^2) \propto 1 \quad\textrm{and}\quad\pi_J(\sigma^2)\propto\frac{1}{\sigma^2}.
+```
+We have
+```math
+ P(\boldsymbol{y}\mid\sigma^2) \propto \left(\frac{1}{\sigma^2}\right)^{n/2} \exp\left\{-\frac{n s^2}{2 \sigma^2}\right\}
+```
+where $s^2 = \frac{\boldsymbol{y}' \boldsymbol{y}}{n}$. Put $u = \frac{n s^2}{2 \sigma^2}$. Then
+```math
+ \begin{align*}
+ \int_0^t
+ \left(\frac{1}{\sigma^2}\right)^{n/2} \exp\left\{-\frac{n s^2}{2 \sigma^2}\right\} d\sigma^2
+ &\propto \int_{\frac{n s^2}{2 t}}^\infty u^{n/2-2} \exp\left\{-u\right\} du \\
+ &= \Gamma(\frac{n - 2}{2}, \frac{n s^2}{2 t}).
+ \end{align*}
+```
+Thus,
+```math
+ \int_0^t \pi_C(\sigma^2\mid\boldsymbol{y}) d\sigma^2 =
+ \frac{1}{\Gamma(\frac{n - 2}{2})}
+ \Gamma(\frac{n - 2}{2}, \frac{n s^2}{2 t}).
+```
+Similarly,
+```math
+ \int_0^t \pi_J(\sigma^2\mid\boldsymbol{y}) d\sigma^2 =
+ \frac{1}{\Gamma(\frac{n}{2})}
+ \Gamma(\frac{n}{2}, \frac{n s^2}{2 t}).
+```
+The table below shows the results for a 95% coverage test with the constant
+prior. We can see that coverage is notably less than 95% for smaller values of
+n.
+| $\sigma^2_\textrm{true}$ | n = 5 | n = 10 | n = 20 |
+| ------------- | -------- | -------- | ------- |
+| 0.1 | 0.9014 | 0.9288 | 0.9445 |
+| 0.5 | 0.9035 | 0.9309 | 0.9429 |
+| 1.0 | 0.9048 | 0.9303 | 0.9417 |
+| 2.0 | 0.9079 | 0.9331 | 0.9418 |
+| 5.0 | 0.9023 | 0.9295 | 0.9433 |
+
+In comparison, coverage is consistently close to 95% for all values of n if we
+use Jeffreys prior. [Source code for example](https://github.com/rnburn/bbai/blob/master/example/09-coverage-simulations.ipynb).
+| $\sigma^2_\textrm{true}$ | n = 5 | n = 10 | n = 20 |
+| ------------- | -------- | -------- | ------- |
+| 0.1 | 0.9516 | 0.9503 | 0.9533 |
+| 0.5 | 0.9501 | 0.9490 | 0.9537 |
+| 1.0 | 0.9505 | 0.9511 | 0.9519 |
+| 2.0 | 0.9480 | 0.9514 | 0.9498 |
+| 5.0 | 0.9506 | 0.9497 | 0.9507 |
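The posterior CDFs above can be evaluated with the regularized upper incomplete gamma function, Γ(a, x)/Γ(a), which scipy exposes as `gammaincc`; a sketch for Jeffreys prior (the data vector is an arbitrary example):

```python
import numpy as np
from scipy.special import gammaincc

def jeffreys_posterior_cdf(t, y):
    """P(sigma^2 <= t | y) under pi_J(sigma^2) ∝ 1/sigma^2 (zero-mean model)."""
    n = len(y)
    s2 = np.dot(y, y) / n
    # regularized upper incomplete gamma: Gamma(n/2, n s^2 / (2t)) / Gamma(n/2)
    return gammaincc(n / 2, n * s2 / (2 * t))

y = np.array([0.5, -1.2, 0.3, 2.1, -0.7])
print(jeffreys_posterior_cdf(1.0, y))  # between 0 and 1, increasing in t
```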
+
+# The Binomial Distribution Prior
+Let’s apply Jeffreys approach to inverse probability to the binomial distribution.
+
+Suppose we observe n values from the binomial distribution. Let y denote the
+number of successes and θ denote the probability of success. The likelihood
+function is given by
+```math
+ L(\theta; y) \propto \theta^y (1 - \theta)^{n - y}.
+```
+Taking the log and differentiating, we have
+```math
+\begin{align*}
+ \frac{\partial}{\partial \theta} \log L(\theta;y) &= \frac{y}{\theta} - \frac{n - y}{1 - \theta} \\
+ &= \frac{y - n \theta}{\theta(1 - \theta)}.
+\end{align*}
+```
+Thus, the Fisher information matrix for the binomial distribution is
+```math
+\begin{align*}
+ \mathcal{I}(\theta) &= \mathbb{E}_{y} \left\{
+ \left(\frac{\partial}{\partial\theta} \log L(\theta;y) \right)^2\mid\theta\right\} \\
+ &= \mathbb{E}_{y}\left\{\left(\frac{y - n \theta}{\theta(1 - \theta)}\right)^2\mid\theta\right\} \\
+ &= \frac{n \theta (1 - \theta)}{\theta^2 (1 - \theta)^2} \\
+ &= \frac{n}{\theta (1 - \theta)},
+\end{align*}
+```
+and Jeffreys prior is
+```math
+\begin{align*}
+ \pi(\theta) &\propto \mathcal{I}(\theta)^{1/2} \\
+ &\propto \theta^{-1/2} (1 - \theta)^{-1/2}.
+\end{align*}
+```
+
+![Jeffreys prior and Laplace prior for the binomial model](./binomial-priors.png)
+
+The posterior is then
+```math
+ \pi(\theta\mid y) \propto \theta^{y-1/2} (1 - \theta)^{n - y - 1/2},
+```
+which we can recognize as the [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) with parameters y+1/2 and n-y+1/2.
+
+To test frequentist coverages, we can use an exact algorithm.
+```
+function binomial-coverage-test(n, θ_true, α):
+ cov ← 0
+ for y ← 0 to n do
+ t ← integrate_0^θ_true π(θ | y)dθ
+ if (1 - α)/2 < t < 1 - (1 - α)/2:
+ cov ← cov + binomial_coefficient(n, y) * θ_true^y * (1 - θ_true)^(n-y)
+ end if
+ end for
+ return cov
+```
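Here is a runnable version of the exact test; a sketch assuming scipy, with the conjugate Beta prior passed as pseudo-count parameters (a = b = 1/2 gives Jeffreys prior, a = b = 1 the uniform prior):

```python
from scipy.stats import beta, binom

def binomial_coverage_test(n, theta_true, alpha=0.95, a=0.5, b=0.5):
    """Exact coverage of the two-tailed credible set under a Beta(a, b) prior."""
    cov = 0.0
    for y in range(n + 1):
        # posterior probability mass below theta_true
        t = beta(y + a, n - y + b).cdf(theta_true)
        if (1 - alpha) / 2 < t < 1 - (1 - alpha) / 2:
            cov += binom.pmf(y, n, theta_true)
    return cov

print(binomial_coverage_test(5, 0.5))            # Jeffreys prior
print(binomial_coverage_test(5, 0.5, a=1, b=1))  # uniform prior
```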
+Here are the coverage results for α=0.95 and various values of θ_true and n using the Bayes-Laplace uniform prior:
+| θ_true | n = 5 | n = 10 | n = 20 | n = 100 |
+| ----------- | -------- | -------- | -------- | -------- |
+| 0.0001 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
+| 0.0010 | 0.0000 | 0.0000 | 0.0000 | 0.9048 |
+| 0.0100 | 0.9510 | 0.9044 | 0.8179 | 0.9206 |
+| 0.1000 | 0.9185 | 0.9298 | 0.9568 | 0.9364 |
+| 0.2500 | 0.9844 | 0.9803 | 0.9348 | 0.9513 |
+| 0.5000 | 0.9375 | 0.9785 | 0.9586 | 0.9431 |
+
+and here are the coverage results using Jeffreys prior:
+
+| θ_true | n = 5 | n = 10 | n = 20 | n = 100 |
+| ----------- | -------- | -------- | -------- | -------- |
+| 0.0001 | 0.9995 | 0.9990 | 0.9980 | 0.9900 |
+| 0.0010 | 0.9950 | 0.9900 | 0.9802 | 0.9048 |
+| 0.0100 | 0.9510 | 0.9044 | 0.9831 | 0.9816 |
+| 0.1000 | 0.9914 | 0.9872 | 0.9568 | 0.9557 |
+| 0.2500 | 0.9844 | 0.9240 | 0.9348 | 0.9513 |
+| 0.5000 | 0.9375 | 0.9785 | 0.9586 | 0.9431 |
+
+We can see coverage is identical for many table entries. For smaller values of
+n and θ_true, though, the uniform prior gives no coverage while Jeffreys prior
+provides decent results. [Source code for experiment](https://github.com/rnburn/bbai/blob/master/example/15-binomial-coverage.ipynb).
+
+# Applications from Bayes and Laplace
+Let’s now revisit some applications Bayes and Laplace studied. Given that the
+goal in all of these problems is to assign a belief probability to an interval
+of the parameter space, I think that we can make a strong argument that
+Jeffreys prior is a better choice than the uniform prior since it has
+asymptotically optimal frequentist coverage performance. This also addresses
+Fisher’s criticism of arbitrariness.
+
+> Note: See [12, p. 105–106] for a more thorough discussion of the uniform prior vs Jeffreys prior for the binomial distribution.
+
+In each of these problems, I’ll show both the answer given by Jeffreys prior
+and the original uniform prior that Bayes and Laplace used. One theme we’ll see
+is that many of the results are not that different. A lot of fuss is often made
+over minor differences in how objective priors can be derived. The differences
+can be important, but often the data dominates and different reasonable choices
+will lead to nearly the same result.
+
+## Example 3: Observing Only 1s
+In an appendix Richard Price added to Bayes’ essay, he considers the following problem:
+> Let us then first suppose, of such an event as that called M in the essay, or
+> an event about the probability of which, antecedently to trials, we know
+> nothing, that it has happened once, and that it is enquired what conclusion we
+> may draw from hence with respect to the probability of it’s happening on a
+> second trial. [[3](https://web.archive.org/web/20110410085940/http://www.stat.ucla.edu/history/essay.pdf), p. 16]
+
+Specifically, Price asks, “what’s the probability that θ is greater than 1/2?”
+Using the uniform prior in Bayes’ essay, we derive the posterior distribution
+```math
+    \pi_\textrm{B}(\theta\mid y) = 2 \theta.
+```
+
+Integrating gives us the answer
+```math
+    \int_{\frac{1}{2}}^1 2\theta\, d\theta = \left. \theta^2 \right\rvert_{\frac{1}{2}}^1 = \frac{3}{4}.
+```
+Using Jeffreys prior, we derive a beta distribution for the posterior
+```math
+ \pi_J(\theta\mid y) = \frac{\Gamma(2)}{\Gamma(3/2) \Gamma(1/2)} \theta^{1/2} (1 - \theta)^{-1/2},
+```
+
+and the answer
+```math
+    \int_{\frac{1}{2}}^1 \pi_J(\theta\mid y) d\theta = \frac{1}{2} + \frac{1}{\pi} \approx 0.818
+```
+
+Price then continues with the same problem but supposes we see two 1s, three
+1s, etc. The table below shows the result we’d get up to ten 1s. [source code](https://github.com/rnburn/bbai/blob/master/example/16-bayes-examples.ipynb)
+
+| 1s observed | p > 0.5 (Bayes) | p > 0.5 (Jeffreys) |
+| ------------ | ---------------- | ------------------- |
+| 1 | 0.7500 | 0.8183 |
+| 2 | 0.8750 | 0.9244 |
+| 3 | 0.9375 | 0.9669 |
+| 4 | 0.9688 | 0.9850 |
+| 5 | 0.9844 | 0.9931 |
+| 6 | 0.9922 | 0.9968 |
+| 7 | 0.9961 | 0.9985 |
+| 8 | 0.9980 | 0.9993 |
+| 9 | 0.9990 | 0.9997 |
+| 10 | 0.9995 | 0.9998 |
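These posterior probabilities can be reproduced in a few lines; a sketch assuming scipy, with the Beta pseudo-count parameters encoding the prior as before:

```python
from scipy.stats import beta

def prob_gt_half(num_ones, a, b):
    """P(theta > 1/2) after observing num_ones 1s under a Beta(a, b) prior."""
    return 1 - beta(a + num_ones, b).cdf(0.5)

for k in (1, 2, 3):
    print(k, prob_gt_half(k, 1, 1), prob_gt_half(k, 0.5, 0.5))
# k=1 reproduces the first table row: 0.7500 (Bayes) and 0.8183 (Jeffreys)
```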
+
+## Example 4: A Lottery
+Price also considers a lottery with an unknown chance of winning:
+
+> Let us then imagine a person present at the drawing of a lottery, who knows
+> nothing of its scheme or of the proportion of Blanks to Prizes in it. Let it
+> further be supposed, that he is obliged to infer this from the number of blanks
+> he hears drawn compared with the number of prizes; and that it is enquired what
+> conclusions in these circumstances he may reasonably make. [3, p. 19–20]
+
+He asks this specific question:
+
+> Let him first hear ten blanks drawn and one prize, and let it be enquired what
+> chance he will have for being right if he guesses that the proportion of blanks
+> to prizes in the lottery lies somewhere between the proportions of 9 to 1 and
+> 11 to 1. [[3](https://web.archive.org/web/20110410085940/http://www.stat.ucla.edu/history/essay.pdf), p. 20]
+
+With Bayes' prior and θ representing the probability of drawing a blank, we derive the posterior distribution
+```math
+ \pi_B(\theta\mid y) = \frac{\Gamma(13)}{\Gamma(11) \Gamma(2)} \theta^{10} (1 - \theta)^1,
+```
+and the answer
+```math
+ \int_{\frac{9}{10}}^{\frac{11}{12}} \pi_B(\theta\mid y) d\theta \approx 0.0770.
+```
+Using Jeffreys prior, we get the posterior
+```math
+ \pi_J(\theta\mid y) = \frac{\Gamma(12)}{\Gamma(21/2) \Gamma(3/2)} \theta^{19/2} (1 - \theta)^{1/2}
+```
+and the answer
+```math
+ \int_{\frac{9}{10}}^{\frac{11}{12}} \pi_J(\theta\mid y) d\theta \approx 0.0804.
+```
+
+Price then considers the same question (what’s the probability that θ lies
+between 9/10 and 11/12) for different cases where an observer of the lottery
+sees w prizes and 10w blanks. Below I show posterior probabilities using both
+Bayes’ uniform prior and Jeffreys prior for various values of w. [source code](https://github.com/rnburn/bbai/blob/master/example/16-bayes-examples.ipynb)
+| Blanks | Prizes | 9/10 < p < 11/12 (Bayes) | 9/10 < p < 11/12 (Jeffreys) |
+| ------- | -------- | -------------------------- | ----------------------------- |
+| 10 | 1 | 0.0770 | 0.0804 |
+| 20 | 2 | 0.1084 | 0.1107 |
+| 40 | 4 | 0.1527 | 0.1541 |
+| 100 | 10 | 0.2390 | 0.2395 |
+| 1000 | 100 | 0.6628 | 0.6618 |
+
+## Example 5: Birth Rates
+Let’s now turn to a problem that fascinated Laplace and his contemporaries: The
+relative birth rate of boys-to-girls. Laplace introduces the problem,
+
+> The consideration of the [influence of past events on the probability of future
+> events] leads me to speak of births: as this matter is one of the most
+> interesting in which we are able to apply the Calculus of probabilities, I
+> manage so to treat with all care owing to its importance, by determining what
+> is, in this case, the influence of the observed events on those which must take
+> place, and how, by its multiplying, they uncover for us the true ratio of the
+> possibilities of the births of a boy and of a girl. [[18](http://www.probabilityandfinance.com/pulskamp/Laplace/memoir_probabilities.pdf), p. 1]
+
+Like Bayes, Laplace approaches the problem using a uniform prior, writing
+
+> When we have nothing given a priori on the possibility of an event, it is
+> necessary to assume all the possibilities, from zero to unity, equally
+> probable; thus, observation can alone instruct us on the ratio of the births of
+> boys and of girls, we must, considering the thing only in itself and setting
+> aside the events, to assume the law of possibility of the births of a boy or of
+> a girl constant from zero to unity, and to start from this hypothesis into the
+> different problems that we can propose on this object. [[18](http://www.probabilityandfinance.com/pulskamp/Laplace/memoir_probabilities.pdf), p. 26]
+
Using data collected in Paris between 1745 and 1770, during which 251527 boys
and 241945 girls were born, Laplace asks, what is “the probability that the
possibility of the birth of a boy is equal or less than 1/2”?
+
+With a uniform prior, B = 251527, G = 241945, and θ representing the probability that a boy is born, we obtain the posterior
+```math
+ \pi_L(\theta\mid y) = \frac{\Gamma(B + G + 2)}{\Gamma(B + 1) \Gamma(G + 1)}
+ \theta^B (1 - \theta)^G
+```
+and the answer
+```math
+ \int_0^{1/2} \pi_L(\theta\mid y) d\theta \approx 1.1460\times 10^{-42}.
+```
+With Jeffreys prior, we similarly derive the posterior
+```math
+ \pi_J(\theta\mid y) = \frac{\Gamma(B + G + 1)}{\Gamma(B + 1/2) \Gamma(G + 1/2)}
+ \theta^{B -1/2} (1 - \theta)^{G-1/2}
+```
+and the answer
+```math
+ \int_0^{1/2} \pi_J(\theta\mid y) d\theta \approx 1.1458\times 10^{-42}.
+```
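Both tail probabilities can be reproduced directly, since each posterior is again a beta distribution. A minimal sketch with scipy:

```python
# Probability that theta <= 1/2 under each posterior for Laplace's birth data.
# beta.cdf evaluates the regularized incomplete beta function, which remains
# accurate even for tail probabilities this small.
from scipy.stats import beta

B, G = 251527, 241945                         # boys and girls, Paris 1745-1770
p_laplace = beta(B + 1, G + 1).cdf(0.5)       # uniform prior -> Beta(B+1, G+1)
p_jeffreys = beta(B + 0.5, G + 0.5).cdf(0.5)  # Jeffreys -> Beta(B+1/2, G+1/2)
print(p_laplace, p_jeffreys)                  # both on the order of 1e-42
```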
Here’s some simulated data using p_true = B / (B + G) that shows how the
answers might evolve as more births are observed.

+| Boys | Girls | p < 0.5 (Laplace) | p < 0.5 (Jeffreys) |
+| ------ | ------- | ------------------ | ------------------- |
+| 0 | 0 | 0.5000 | 0.5000 |
+| 749 | 751 | 0.5206 | 0.5206 |
+| 1511 | 1489 | 0.3440 | 0.3440 |
+| 2263 | 2237 | 0.3492 | 0.3492 |
+| 3081 | 2919 | 0.0182 | 0.0182 |
+| 3810 | 3690 | 0.0829 | 0.0829 |
+| 4514 | 4486 | 0.3839 | 0.3839 |
+| 5341 | 5159 | 0.0379 | 0.0379 |
+| 6139 | 5861 | 0.0056 | 0.0056 |
+| 6792 | 6708 | 0.2349 | 0.2349 |
+| 7608 | 7392 | 0.0389 | 0.0389 |
+| 8308 | 8192 | 0.1833 | 0.1832 |
+| 9145 | 8855 | 0.0153 | 0.0153 |
+| 9957 | 9543 | 0.0015 | 0.0015 |
+| 10618 | 10382 | 0.0517 | 0.0517 |
+
+# Discussion
+## Q1: Where does objective Bayesian analysis belong in statistics?
I think Jeffreys was right and standard statistical procedures should deliver
“results of the kind we need”. While Bayes and Laplace might not have been
fully justified in their choice of a uniform prior, they were correct in their
objective of quantifying results in terms of degree of belief. The approach
Jeffreys outlined (and that later evolved into reference priors) gives us a
pathway to provide “results of the kind we need” while addressing the
arbitrariness of a uniform prior. Jeffreys’s approach isn’t the only way to get
to results as degrees of belief, and a more subjective approach can also be
valid if the situation allows, but his approach gives us good answers for the
common case “of an event concerning the probability of which we absolutely know
nothing” and can be used as a drop-in replacement for frequentist methods.
+
+To answer more concretely, I think when you open up a standard
+introduction-to-statistics textbook and look up a basic procedure such as a
+hypothesis test of whether the mean of normally distributed data with unknown
+variance is non-zero, you should see a method built on objective priors and
Bayes factors like [[19](https://www.jstor.org/stable/2670175)] rather than a method based on P-values.
+
+## Q2: But aren’t there multiple ways of deriving good priors in the absence of prior knowledge?
+I highlighted frequentist coverage matching as a benchmark to gauge whether a
+prior is a good candidate for objective analysis, but coverage matching isn’t
+the only valid metric we could use and it may be possible to derive multiple
+priors with good coverage. Different priors with good frequentist properties,
+though, will likely be similar, and any results will be determined more by
+observations than the prior. If we are in a situation where multiple good
+priors lead to significantly differing results, then that’s an indicator we
+need to provide subjective input to get a useful answer. Here’s how Berger
+addresses this issue:
+
> Inventing a new criterion for finding “the optimal objective prior” has proven
> to be a popular research pastime, and the result is that many competing priors
> are now available for many situations. This multiplicity can be bewildering to
> the casual user. I have found the reference prior approach to be the most
> successful approach, sometimes complemented by invariance considerations as
> well as study of frequentist properties of resulting procedures. Through such
> considerations, a particular prior usually emerges as the clear winner in many
> scenarios, and can be put forth as the recommended objective prior for the
> situation. [[20](https://www2.stat.duke.edu/~berger/papers/obayes-debate.pdf)]
+
## Q3: Doesn’t that make inverse probability subjective, whereas frequentist methods provide an objective approach to statistics?
+
It’s a common misconception that frequentist methods are objective. Berger and
Berry provide this example to demonstrate [[21](https://si.biostat.washington.edu/sites/default/files/modules/BergerBerry.pdf)]:
Suppose we watch a researcher
study a coin for bias. We see the researcher flip the coin 17 times. Heads
comes up 13 times and tails comes up 4 times. Suppose θ represents the
probability of heads and the researcher is doing a standard P-value test with
the null hypothesis that the coin is not biased, θ=0.5. What P-value would they
get? We can’t answer the question because the researcher would get remarkably
different results depending on their experimental intentions.
+
+If the researcher intended to flip the coin 17 times, then the probability of
+seeing a value *less extreme* than 13 heads under the null hypothesis is given by
+summing binomial distribution terms representing the probabilities of getting 5
+to 12 heads,
+```math
+\sum_{k=5}^{12} \binom{17}{k} 0.5^{17} \approx 0.951
+```
which gives us a P-value of 1 - 0.951 = 0.049.
+
+If, however, the researcher intended to continue flipping until they got at
+least 4 heads and 4 tails, then the probability of seeing a value *less extreme*
+than 17 total flips under the null hypothesis is given by summing negative
+binomial distribution terms representing the probabilities of getting 8 to 16
+total flips,
+```math
+\sum_{k=7}^{15} 2 \binom{k}{3} 0.5^{k+1} \approx 0.979
+```
which gives us a P-value of 1 - 0.979 = 0.021.
+
+The result is dependent on not just the data but also on the hidden intentions
+of the researcher. As Berger and Berry argue “objectivity is not generally
+possible in statistics and … standard statistical methods can produce
+misleading inferences.” [[21](https://si.biostat.washington.edu/sites/default/files/modules/BergerBerry.pdf)] [source code for example](https://github.com/rnburn/bbai/blob/master/example/14-p-value-objectivity.ipynb)
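Both P-values are easy to verify with a short script (a sketch assuming the two stopping rules described above):

```python
# P-values for 13 heads in 17 flips under two different experimental intentions.
from math import comb

# Intention 1: flip exactly 17 times. P-value = 1 - P(5 <= heads <= 12)
# under Binomial(17, 0.5).
p_binomial = 1 - sum(comb(17, k) for k in range(5, 13)) / 2**17

# Intention 2: flip until at least 4 heads and 4 tails, so
# P(stop at n flips) = 2 * C(n-1, 3) * 0.5^n for n >= 8.
# P-value = 1 - P(8 <= n <= 16).
p_negbinom = 1 - sum(2 * comb(n - 1, 3) * 0.5**n for n in range(8, 17))

print(f"{p_binomial:.3f} {p_negbinom:.3f}")  # prints "0.049 0.021"
```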
+
## Q4: If subjectivity is unavoidable, why not just use subjective priors?
+
+When subjective input is possible, we should incorporate it. But we should also
+acknowledge that Bayes’ “event concerning the probability of which we
+absolutely know nothing” is an important fundamental problem of inference that
+needs good solutions. As Edwin Jaynes writes
+
+> To reject the question, [how do we find the prior representing “complete
+> ignorance”?], as some have done, on the grounds that the state of complete
+> ignorance does not “exist” would be just as absurd as to reject Euclidean
+> geometry on the grounds that a physical point does not exist. In the study of
+> inductive inference, the notion of complete ignorance intrudes itself into the
+> theory just as naturally and inevitably as the concept of zero in arithmetic.
+> If one rejects the consideration of complete ignorance on the grounds that the
+> notion is vague and ill-defined, the reply is that the notion cannot be evaded
+> in any full theory of inference. So if it is still ill-defined, then a major
+> and immediate objective must be to find a precise definition which will agree
+> with intuitive requirements and be of constructive use in a mathematical
+> theory. [[22](https://bayes.wustl.edu/etj/articles/prior.pdf)]
+
+Moreover, systematic approaches such as reference priors can certainly do much
+better than pseudo-Bayesian techniques such as choosing a uniform prior over a
+truncated parameter space or a vague proper prior such as a Gaussian over a
+region of the parameter space that looks interesting. Even when subjective
+information is available, using reference priors as building blocks is often
+the best way to incorporate it. For instance, if we know that a parameter is
+restricted to a certain range but don’t know anything more, we can simply adapt
+a reference prior by restricting and renormalizing it [[12](https://www.worldscientific.com/worldscibooks/10.1142/13640#t=aboutBook), p. 256].
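To make the restrict-and-renormalize idea concrete, here is a hypothetical sketch for a binomial probability; the interval, the data, and the helper name are all made up for illustration:

```python
# Hypothetical sketch: a binomial probability theta is known to lie in
# [0.2, 0.8] (assumed interval); otherwise we know nothing, so we restrict
# and renormalize the Jeffreys/reference prior Beta(1/2, 1/2).
from scipy.stats import beta

lo, hi = 0.2, 0.8  # assumed known restriction on theta
k, n = 3, 10       # hypothetical data: 3 successes in 10 trials

# With a Beta(1/2, 1/2) prior the unrestricted posterior is
# Beta(k + 1/2, n - k + 1/2); restricting the prior to [lo, hi] simply
# truncates and renormalizes that posterior.
post = beta(k + 0.5, n - k + 0.5)
mass = post.cdf(hi) - post.cdf(lo)

def restricted_cdf(x):
    """Posterior CDF under the restricted-and-renormalized reference prior."""
    x = min(max(x, lo), hi)
    return (post.cdf(x) - post.cdf(lo)) / mass

print(restricted_cdf(0.5))  # posterior probability that theta < 0.5
```

Because the beta prior is conjugate, restricting the prior and restricting the posterior are the same operation here; for non-conjugate models the renormalization would be done numerically.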
+
> Note: The term pseudo-Bayesian comes from [[20](https://www2.stat.duke.edu/~berger/papers/obayes-debate.pdf)]. See that paper for a more thorough discussion and comparison with objective Bayesian analysis.
+
+# Conclusion
+The common and repeated misinterpretation of statistical results such as P
+values or confidence intervals as belief probabilities shows us that there is a
+strong natural tendency to want to think about inference in terms of inverse
+probability. It’s no wonder that the method dominated for 150 years.
+
+Fisher and others were certainly correct to criticize naive use of a uniform
+prior as arbitrary, but this is largely addressed by reference priors and
+adopting metrics like frequentist matching coverage that quantify what it means
+for a prior to represent ignorance. As Berger puts it,
+
+> We would argue that noninformative prior Bayesian analysis is the single most
+> powerful method of statistical analysis, in the sense of being the ad hoc
+> method most likely to yield a sensible answer for a given investment of effort.
+> And the answers so obtained have the added feature of being, in some sense, the
+> most “objective” statistical answers obtainable [[23](https://www.amazon.com/Statistical-Decision-Bayesian-Analysis-Statistics/dp/0387960988), p. 90]
+
+
+
+## References
+[1] Bernoulli, J. (1713). [On the Law of Large Numbers, Part Four of Ars Conjectandi.](http://www.sheynin.de/download/bernoulli.pdf) Translated by Oscar Sheynin.
+
+[2] De Moivre, A. (1756). [The Doctrine of Chances](https://www.ime.usp.br/~walterfm/cursos/mac5796/DoctrineOfChances.pdf).
+
+[3] Bayes, T. (1763). [An essay towards solving a problem in the doctrine of chances. by the late rev. mr. bayes, f. r. s. communicated by mr. price, in a letter to john canton, a. m. f. r. s](https://web.archive.org/web/20110410085940/http://www.stat.ucla.edu/history/essay.pdf) . Philosophical Transactions of the Royal Society of London 53, 370–418.
+
+[4] Stigler, S. (1990). The History of Statistics: The Measurement of Uncertainty before 1900. Belknap Press.
+
+[5] Laplace, P. (1774). [Memoir on the probability of the causes of events](https://www.york.ac.uk/depts/maths/histstat/memoir1774.pdf)
+. Translated by S. M. Stigler.
+
+[6] De Morgan, A. (1838). [An Essay On Probabilities: And On Their Application To Life Contingencies And Insurance Offices](https://archive.org/details/134257988/page/n7/mode/2up).
+
+[7] Fisher, R. (1930). [Inverse probability](https://errorstatistics.com/wp-content/uploads/2016/02/fisher-1930-inverse-probability.pdf). Mathematical Proceedings of the Cambridge Philosophical Society 26(4), 528–535.
+
+[8] Fisher, R. (1922). [On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London](http://www.medicine.mcgill.ca/epidemiology/hanley/bios601/Likelihood/Fisher1922.pdf). Series A, Containing Papers of a Mathematical or Physical Character 222, 309–368.
+
+[9] Jeffreys, H. (1961). [Theory of Probability (3 ed.)](https://archive.org/details/in.ernet.dli.2015.2608). Oxford Classic Texts in the Physical Sciences.
+
[10] Welch, B. L. and H. W. Peers (1963). [On formulae for confidence points based on integrals of weighted likelihoods](https://academic.oup.com/jrsssb/article-abstract/25/2/318/7035245?redirectedFrom=PDF). Journal of the Royal Statistical Society, Series B 25, 318–329.

[11] Berger, J. O., J. M. Bernardo, and D. Sun (2009). [The formal definition of reference priors](https://projecteuclid.org/journals/annals-of-statistics/volume-37/issue-2/The-formal-definition-of-reference-priors/10.1214/07-AOS587.full). The Annals of Statistics 37(2), 905–938.

[12] Berger, J., J. Bernardo, and D. Sun (2024). [Objective Bayesian Inference](https://www.amazon.com/Objective-Bayesian-Inference-James-Berger/dp/9811284903). World Scientific.

[13] Berger, J. O. and D. A. Berry (1988). [Statistical analysis and the illusion of objectivity](https://si.biostat.washington.edu/sites/default/files/modules/BergerBerry.pdf). American Scientist 76(2), 159–165.

[14] Goodman, S. (1999, June). Toward evidence-based medical statistics. 1: [The p value fallacy](https://pubmed.ncbi.nlm.nih.gov/10383371/). Annals of Internal Medicine 130(12), 995–1004.

[15] Berger, J. and T. Sellke (1987). [Testing a point null hypothesis: The irreconcilability of p values and evidence](https://www2.stat.duke.edu/courses/Spring07/sta215/lec/BvP/BergSell1987.pdf). Journal of the American Statistical Association 82(397), 112–122.

[16] Sellke, T., M. J. Bayarri, and J. Berger (2001). [Calibration of p values for testing precise null hypotheses](https://hannig.cloudapps.unc.edu/STOR654/handouts/SellkeBayarriBerger2001.pdf). The American Statistician 55(1), 62–71.

[17] Berger, J., J. Bernardo, and D. Sun (2022). [Objective Bayesian inference and its relationship to frequentism](https://www.uv.es/~bernardo/OBayes.pdf).

[18] Laplace, P. (1778). [Mémoire sur les probabilités](http://www.probabilityandfinance.com/pulskamp/Laplace/memoir_probabilities.pdf). Translated by Richard J. Pulskamp.

[19] Berger, J. and J. Mortera (1999). [Default Bayes factors for nonnested hypothesis testing](https://www.jstor.org/stable/2670175). Journal of the American Statistical Association 94(446), 542–554.

[20] Berger, J. (2006). [The case for objective Bayesian analysis](https://www2.stat.duke.edu/~berger/papers/obayes-debate.pdf). Bayesian Analysis 1(3), 385–402.

[21] Berger, J. O. and D. A. Berry (1988). [Statistical analysis and the illusion of objectivity](https://si.biostat.washington.edu/sites/default/files/modules/BergerBerry.pdf). American Scientist 76(2), 159–165.

[22] Jaynes, E. T. (1968). [Prior probabilities](https://bayes.wustl.edu/etj/articles/prior.pdf). IEEE Transactions on Systems Science and Cybernetics 4(3), 227–241.

[23] Berger, J. (1985). [Statistical Decision Theory and Bayesian Analysis](https://www.amazon.com/Statistical-Decision-Bayesian-Analysis-Statistics/dp/0387960988). Springer.