diff --git a/2_mathematical_spaces/spaces.qmd b/2_mathematical_spaces/spaces.qmd index 95b5a05..d955dd3 100644 --- a/2_mathematical_spaces/spaces.qmd +++ b/2_mathematical_spaces/spaces.qmd @@ -36,7 +36,7 @@ sophisticated sets but also sets that are equipped with additional structure. These combinations of sets and structure are also known as _spaces_. In this chapter we will discuss the basic conceptual features of general spaces -before reviewing some of prototypical spaces that are particularly common in +before reviewing some of the prototypical spaces that are particularly common in practical applications. This presentation will include not only the properties of sets with an arbitrary number of elements but also a survey of some of the most fundamental structures that we can endow onto those sets. @@ -104,7 +104,7 @@ In many circumstances we will need to distinguish between variables that refer to _arbitrary_ elements and variables that refer to _particular_ but unspecified elements. Following the computer science canon I will refer to these as **unbound variables** and **bound variables**, respectively. To distinguish -between the two I will decorate bound variables with a tilde; in words $x$ +between the two I will decorate bound variables with a tilde; in other words $x$ denotes any element of the space $X$ while $\tilde{x}$ denotes a fixed but unspecified element. @@ -154,7 +154,7 @@ single element, and a **full set** consisting of the entire set (@fig-subsets). Most subsets, however, contain an intermediate number of elements. One of the key features of uncountable spaces is that most subsets also contain an uncountable number of elements. To visually represent subsets containing an -uncountable number of elements I will used filled shapes to contrast against +uncountable number of elements I will use filled shapes to contrast against individual points. ::: {#fig-subsets layout="[ [-5, 45, 45, -5], [-5, 45, 45, -5]]"} @@ -787,7 +787,7 @@ most subsets will be neither open nor closed. Unlike open balls these metric-derived open subsets are _closed_ under unions and intersections. If $\mathsf{x}_{1}$ and $\mathsf{x}_{2}$ are both open -subsets then $\mathsf{x}_{1} \cup \mathsf{x}_{2}$ will also an open subset. In +subsets then $\mathsf{x}_{1} \cup \mathsf{x}_{2}$ will also be an open subset. In fact the union of _any_ number of open subsets will be open. Likewise if $\mathsf{x}_{1}$ and $\mathsf{x}_{2}$ are both open subsets then $\mathsf{x}_{1} \cap \mathsf{x}_{2}$ will also be an open subset. Indeed the @@ -964,7 +964,7 @@ figures/structures/general_topology/convergence/convergence){ width=90% #fig-general-convergence} A subtle benefit of this topological definition of convergence is that because -it doesn't require a metric is also doesn't require us to define the positive +it doesn't require a metric, it also doesn't require us to define the positive real numbers. This can be helpful for avoiding circular logic in more technical mathematical analyses. @@ -1021,10 +1021,10 @@ algebra, or metric just builds on top of that foundation. Mathematically it is much easier to work with structures that are _compatible_ with each other. For example if we want to equip a set with both a topology and a metric then the resulting space will be particularly well-behaved if -the we use a metric topology. At the same time ambient structure can also +we use a metric topology. At the same time ambient structure can also distinguish certain compatible subsets. 
-For example is a set is equipped with an ordering then we can define **interval**
+For example if a set is equipped with an ordering then we can define **interval**
subsets that contain all elements above and below two boundary elements. An
**open interval** excludes both boundary elements,
$$
@@ -1045,7 +1045,7 @@ intervals that contain only one boundary,
\end{align*}
Note that these notions of "open" and "closed" subsets are in general distinct
from the open and closed subsets defined by a topology. Only when an ordering
-is compatible with a topology will these the open and closed intervals also be
+is compatible with a topology will these open and closed intervals also be
topologically open and closed.

As we saw in [Section 1.2.4.1](@sec:open-balls) a metric distinguishes subsets
@@ -1086,8 +1086,8 @@ $\mathsf{x}_{1} \subset X$ is smaller than a subset
$\mathsf{x}_{2} \subset X$ if $\mathsf{x}_{1} \subset \mathsf{x}_{2}$, and
larger if $\mathsf{x}_{2} \subset \mathsf{x}_{1}$. Two subsets that only
partially overlap are incomparable, and hence fall into the same place in the
sequential
-ordering. For any set the empty set will always the smallest subset and the
-full set will always the largest.
+ordering. For any set the empty set will always be the smallest subset and the
+full set will always be the largest.

The union and intersection operations introduce algebraic structure, known as
a **Boolean algebraic structure**, to the power set. They are both commutative,
@@ -1311,6 +1311,10 @@ figures/real_line_grid/real_line_grid){width=50% #fig-real-line-grid}

## Extended Real Lines

+
+
One limitation of a real line is that it contains points that _approach_
either negative or positive infinity, but not points that represent those
limits directly. An **extended real line** resolves this by introducing two new
@@ -1567,7 +1571,7 @@ Using this notation we can define an inverse function a bit more compactly as
$$
\text{Id} = f^{-1} \circ f.
$$
-In words the composition of a bijective function with its inverse function is
+In other words the composition of a bijective function with its inverse function is
the identity function.

## Relating Structures
@@ -1852,7 +1856,7 @@ Pushforward and pullback functions allow us to _lift_ a transformation between
sets into a transformation between spaces. For structure that can be pushed
forward along the function $f : X \rightarrow Y$ any input space
$(X, \mathfrak{x})$ automatically defines a compatible output space
-$(Y, f_{*}(\mathfrak{x}))$. Similarly for structure that can be pulled back
+$(Y, f_{*}(\mathfrak{x}))$. Similarly for a structure that can be pulled back
against $f$ any output space $(Y, \mathfrak{y})$ automatically defines a
compatible input space $(X, f^{*}(\mathfrak{y}))$.
@@ -1923,7 +1927,7 @@ latter not (@fig-monoticity).

Monotonically increasing functions preserve orderings so that larger inputs
always imply larger outputs. The function (a) $f_{1} : x \mapsto x^{3}$ is
-monotonic but the function (b) $f_{1} : x \mapsto -x^{3}$ is not.
+monotonic but the function (b) $f_{2} : x \mapsto -x^{2}$ is not.
:::

#### Algebra-Preserving Relations
diff --git a/3_product_spaces/product_spaces.qmd b/3_product_spaces/product_spaces.qmd
index f30f8f9..91d68af 100644
--- a/3_product_spaces/product_spaces.qmd
+++ b/3_product_spaces/product_spaces.qmd
@@ -181,7 +181,7 @@ $X_{2}$. Each element of the product set is uniquely specified by one element
of $X_{1}$ and one element of $X_{2}$.

Consequently every variable taking values in the
-product set $x \in X_{1} \times X_{2}$ is compromised of an ordered pair of
+product set $x \in X_{1} \times X_{2}$ is comprised of an ordered pair of
variables from each component space,
$$
x = (x_{1}, x_{2}),
$$
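To make the ordered-pair structure concrete, here is a minimal Python sketch that enumerates a small product set element by element; the component sets are hypothetical, chosen purely for illustration.

```python
import itertools

# Hypothetical component sets, chosen purely for illustration.
X1 = ["a", "b", "c"]
X2 = [0, 1]

# Every element of the product set X1 x X2 is an ordered pair (x1, x2),
# uniquely specified by one element from each component set.
product_set = list(itertools.product(X1, X2))

print(product_set)
# [('a', 0), ('a', 1), ('b', 0), ('b', 1), ('c', 0), ('c', 1)]
print(len(product_set) == len(X1) * len(X2))  # True
```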
@@ -282,7 +282,7 @@ X_{1} \times \ldots \times X_{i} \times \ldots \times X_{I}
=
\times_{i = 1}^{I} X_{i}
$$
-where every product variable $x \in \times_{i = 1}^{I} X_{i}$ is compromised of
+where every product variable $x \in \times_{i = 1}^{I} X_{i}$ is comprised of
an ordered collection of component variables
$$
x = ( x_{1}, \ldots, x_{i}, \ldots, x_{I})
$$
@@ -609,7 +609,7 @@ component full sets we will always be able to construct the product empty set
and product full set from this procedure.

Moreover because the component open subsets are finite intersections these
-productsubsets will be as well. For example given any finite collection of open
+product subsets will be as well. For example given any finite collection of open
component subsets
$$
\{ \mathsf{x}_{1, i}, \ldots, \mathsf{x}_{j, i}, \ldots \mathsf{x}_{J, i} \}
$$
@@ -870,7 +870,7 @@ $$
( 1, \ldots, i_{1} - 1, i_{1} + 1, \ldots, i_{J} - 1, i_{J} + 1, \ldots, J )
$$
define yet another product set. Replicating this second product set once for
-each element of first product set defines a collection of **cross sections sets**
+each element of the first product set defines a collection of **cross section sets**
(@fig-conditioning),
\begin{align*}
\times_{i' = 1}^{I} X_{i'} \mid (x_{i_{1}}, \ldots, x_{i_{J}})
diff --git a/4_probability_on_general_spaces/probability_on_general_spaces.qmd b/4_probability_on_general_spaces/probability_on_general_spaces.qmd
index 47f893e..ba997b9 100644
--- a/4_probability_on_general_spaces/probability_on_general_spaces.qmd
+++ b/4_probability_on_general_spaces/probability_on_general_spaces.qmd
@@ -205,7 +205,7 @@ sufficiently useful for practical application or if we need to
consider countably additive measures, let alone measures that might
be additive over even larger collections of subsets.

-For example a common problem that arises is practice is reconstructing
+For example a common problem that arises in practice is reconstructing
the measure allocated to a general subset from the measures
allocated to particularly nice subsets that are easier to work with.
If we could always decompose a generic subset into the disjoint union of a
@@ -217,7 +217,7 @@ Potentially some subsets might be decomposable only into an
uncountably infinite number of subsets in which case we would need
even stronger notions of additivity!

-Fortunately for us we don't have to go to that last extreme. In turns
+Fortunately for us we don't have to go to that last extreme. It turns
out that on most spaces that we'll encounter in practice, and typical
notions of "nice" subsets, countable additivity is sufficient for
reconstructing the measure allocated to more general subsets.
@@ -228,9 +228,9 @@ allocations to **rectangular** subsets (@fig-disk-decomposition).

In general a non-rectangular subset, in this case a disk, can be
crudely approximated by a single rectangular subset. The disk can be
approximated more precisely as the disjoint union of many different
-rectangular subsets, but that will never exact reconstruct the disk.
+rectangular subsets, but that will never exactly reconstruct the disk.
Only when we incorporate a countably infinite number of rectangular
-subsets can be reconstruct the disk without any error.
+subsets can we reconstruct the disk without any error.
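To see this refinement numerically, here is a minimal Python sketch that bounds the measure of the unit disk from below with finite unions of disjoint grid squares; the particular grid resolutions are hypothetical choices for illustration.

```python
import math

def disk_area_lower_bound(n):
    """Total area of the disjoint axis-aligned squares in an n-by-n
    grid over [-1, 1]^2 that lie entirely inside the unit disk."""
    h = 2.0 / n  # side length of each grid square
    area = 0.0
    for i in range(n):
        for j in range(n):
            # The corner of the square farthest from the origin; if it
            # lies inside the unit circle then the whole square does.
            x = max(abs(-1.0 + i * h), abs(-1.0 + (i + 1) * h))
            y = max(abs(-1.0 + j * h), abs(-1.0 + (j + 1) * h))
            if x * x + y * y <= 1.0:
                area += h * h
    return area

# Any finite union of squares undershoots the true area, pi; refining
# the grid shrinks the error but never eliminates it entirely.
for n in [10, 100, 1000]:
    a = disk_area_lower_bound(n)
    print(f"n = {n:4d}: area = {a:.4f}, error = {math.pi - a:.4f}")
```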
![On a two-dimensional real plane $\mathbb{R}^{2}$ a non-rectangular
disk can be approximated, but not exactly reconstructed, by the finite
@@ -339,7 +339,7 @@ Similarly the elements of a $\sigma$-algebra are known as **measurable
subsets** while any subsets in the power set but not in the
$\sigma$-algebra are referred to as **non-measurable** subsets.

-When non-measurable subsets are misbehaving subsets they reveals the
+When non-measurable subsets are misbehaving subsets they reveal the
subtle, and often counterintuitive, pathologies inherent to that
space. By working with $\sigma$-algebras directly we can avoid these
awkward pathologies entirely.
@@ -415,7 +415,7 @@ behaviors that we have to avoid at all!
I will refer to any measurable space $(X, 2^{X})$ compatible with a
discrete topology as **discrete measurable spaces**.

-On the the other hand the Borel $\sigma$-algebra derived from the
+On the other hand the Borel $\sigma$-algebra derived from the
topology that defines the real line filters out all of the
non-constructive subsets and their undesired behaviors while keeping
all of the interval subsets and the subsets that we can derive from them.
@@ -973,7 +973,7 @@ figures/interval_partitions/interval_partitions){
width=90% #fig-equal-length-intervals}

The easiest way to accomplish this uniformity is to allocate to each
-interval a measure directly equal to the its length,
+interval a measure directly equal to its length,
$$
\lambda( \, [x_{1}, x_{2}] \, )
=
L( \, [x_{1}, x_{2}] \, )
diff --git a/5_expectation_values/expectation_values.qmd b/5_expectation_values/expectation_values.qmd
index de1d65f..3375244 100644
--- a/5_expectation_values/expectation_values.qmd
+++ b/5_expectation_values/expectation_values.qmd
@@ -42,7 +42,7 @@ from calculus. This measure-informed integration operation summarizes
the interaction between a measure and a given function, allowing us
to use one to learn about the other.

-We will being our exploration of measure-informed integration with a
+We will begin our exploration of measure-informed integration with a
heuristic construction on finite measure spaces before considering
a more formal, but also more abstract, construction that applies
to any measure space. Next we'll investigate how the specification of
@@ -56,7 +56,7 @@ exceptional measures whose integrals can be computed algorithmically.
# Integration on Finite Measure Spaces {#sec:finite_integration}

To start our discussion of measure-informed integration as simply as
-possible let's begin by considering a finite measure space compromised
+possible let's begin by considering a finite measure space comprised
of the finite set
$$
X = \{ \Box, \clubsuit, \diamondsuit, \heartsuit, \spadesuit \},
$$
@@ -409,9 +409,9 @@ corresponding simple function decomposition,
In general a non-negative, measurable function can be represented by
more than one simple function decomposition. Fortunately the
measure-informed integral derived from any of them will always be the
-same. Consequently there's no worry for ambiguous of otherwise
+same. Consequently there's no worry about ambiguous or otherwise
inconsistent answers, and measure-informed integrals for non-negative,
-measurable function are completely well-behaved.
+measurable functions are completely well-behaved.

This procedure for defining measure-informed integrals through simple
function representations is known as **Lebesgue integration** in the
@@ -722,7 +722,7 @@ this form.
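To make this relationship concrete, here is a minimal Python sketch showing that integrating an indicator function recovers the measure allocated to the corresponding subset; the suit set follows the finite example above, while the particular atomic allocations are hypothetical.

```python
# A finite measure space: the suit set from above, with hypothetical
# atomic allocations mu({x}) chosen purely for illustration.
mu = {"box": 0.5, "club": 1.0, "diamond": 2.0, "heart": 0.25, "spade": 1.25}

def integrate(f, mu):
    """Measure-informed integral on a finite space,
    I_mu[f] = sum_x f(x) * mu({x})."""
    return sum(f(x) * m for x, m in mu.items())

# The indicator function of the subset {club, spade}.
subset = {"club", "spade"}
indicator = lambda x: 1.0 if x in subset else 0.0

# Integrating the indicator function recovers the measure of the subset.
print(integrate(indicator, mu))  # 2.25
print(mu["club"] + mu["spade"])  # 2.25
```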
Because $L(X, \mathcal{X}, \mu)$ contains all of the indicator
functions this functional relationship between $L(X, \mathcal{X}, \mu)$
and $\mathbb{R}$ determines the allocations to every measurable subset,
-and hence full determines the measure $\mu$. At the same time
+and hence fully determines the measure $\mu$. At the same time
$L(X, \mathcal{X}, \mu)$ also contains many integrands that are not
indicator functions, and hence quite a bit of redundant information
about $\mu$.
@@ -875,7 +875,7 @@ which is not, in general, equal to $1$. In other words scaling a
probability distribution results not in another probability
distribution but rather a generic measure.

-If we want transform one probability distribution into another then we
+If we want to transform one probability distribution into another then we
need to correct for the modified normalization, defining
\begin{align*}
\mathbb{E}_{g \ast \pi} [ f ]
@@ -1039,8 +1039,8 @@ important in practice.

### The Mean

-If an embedding function is an integrand than we can evaluate its
-measure-informed integral, $\mathbb{I}_{\mu}[\iota]$. The ultimately
+If an embedding function is an integrand then we can evaluate its
+measure-informed integral, $\mathbb{I}_{\mu}[\iota]$. The ultimate
utility of this measure-informed integral, however, depends on what
information about the ambient measure it extracts.
@@ -1272,7 +1272,7 @@ working with spaces like circles, spheres, tori, and more. Many
analyses on these spaces have been undermined by attempts to summarize
measures with moments that don't actually exist!

-All of this said we still to take care with the necessary conditions
+All of this said we still have to take care with the necessary conditions
when working with more familiar spaces as well. For example in
[Section 5.2.2](@sec:practical_lebesgue) we'll learn that the identity
function from a real line into itself is not integrable with respect to
@@ -1354,7 +1354,7 @@ skewed towards smaller or larger values.

![](figures/histograms/varying_behaviors/multimodal/multimodal){#fig-hist-multimodal}

-Histogram are extremely effective at communicating the basic features of
+Histograms are extremely effective at communicating the basic features of
a measure. The measure in (a) is diffuse but decaying, allocating more
measure at smaller points than larger points. Conversely the measure
in (b) concentrates around a single point while the measure in (c)
@@ -1409,7 +1409,7 @@ M :\; & X & &\rightarrow& \; &[0, \mu(X)]&
\\
& x & &\mapsto& & M_{\mu}(x) = \mu(\mathsf{I}_{x}) = \mathbb{I}_{\mu}[I_{\mathsf{I}_{x}}] &.
\end{alignat*}
-According this mapping is known as a
+This mapping is known as a
**cumulative distribution function** (@fig-cdf-basics).

![A cumulative distribution function quantifies how measure is allocated
@@ -1451,7 +1451,7 @@ are any gaps in the allocation, intermediate intervals with zero
allocated measure, then the cumulative distribution function will
flatten out completely (@fig-cdf-gap).

-::: {#fig-hist-examples layout="[-5, 30, 30, 30, -5]"}
+::: {#fig-cdf-examples layout="[-5, 30, 30, 30, -5]"}
![](figures/cdfs/cdf_behaviors/unimodal/unimodal){#fig-cdf-unimodal}

![](figures/cdfs/cdf_behaviors/narrow_unimodal/narrow_unimodal){#fig-cdf-narrow-unimodal}
@@ -1461,7 +1461,7 @@ flatten out completely (@fig-cdf-gap).

A careful survey of a cumulative distribution function can communicate
a wealth of information about the ambient measure.
(a) Here the ambient measure is unimodal with the cumulative distribution function -appreciably increasingly only one we reach the central neighborhood +appreciably increasing only once we reach the central neighborhood where the measure allocation is concentrated. (b) A narrower concentration results in a steeper cumulative distribution function. (c) A cumulative distribution function flattens if there are any gaps @@ -1607,7 +1607,7 @@ accumulated measure below $m$, $$ x_{m-} = \underset{x \in X}{\mathrm{argmax}} M(x) < m, $$ -and bounded above by the point $x_{+}$ that achieves the smallest +and bounded above by the point $x_{m+}$ that achieves the smallest accumulated measure above $m$ (@fig-quantile-inverse-problems), $$ x_{m+} = \underset{x \in X}{\mathrm{argmin}} M(x) > m. @@ -1715,7 +1715,7 @@ $$ $$ The integral of any real-valued function $f: X \rightarrow \mathbb{R}$ -with respect to counting measure is given by over summing all of the +with respect to counting measure is given by summing over all of the output values, \begin{align*} \mathbb{I}_{\chi}[f] @@ -1934,7 +1934,7 @@ When a real-valued function has a well-defined Riemann integral then we can apply the tools of calculus to evaluate Lebesgue integrals. The exceptional Riemann integrals that can be evaluated analytically allow us to compute the corresponding Lebesgue integrals exactly. More -generally we can use to numerical integration techniques to approximate +generally we can use numerical integration techniques to approximate the Riemann integrals, and hence approximately evaluate Lebesgue integrals. @@ -1959,7 +1959,7 @@ the sign of the Riemann integral. In order to properly relate Lebesgue integrals to Riemann integrals we have to fix the _orientation_ of the intervals. -Similarly the mean of a Lebesgue measure would by given by the integral +Similarly the mean of a Lebesgue measure would be given by the integral of the identity function, \begin{align*} \mathbb{I}_{\lambda}[\iota] diff --git a/6_density_functions/density_functions.qmd b/6_density_functions/density_functions.qmd index 25cda19..660fa2a 100644 --- a/6_density_functions/density_functions.qmd +++ b/6_density_functions/density_functions.qmd @@ -322,7 +322,7 @@ in a crude measurable partition might also be infinite. If we break up those subsets into finer and finer pieces, however, then the infinite allocations might spread out into finite allocations. When we can construct a fine enough measurable partition such that _all_ of the -subset allocations are finite we will always able to avoid infinity +subset allocations are finite we will always be able to avoid infinity entirely by working with small enough subsets. Moreover if that fine enough measurable partition is also countable then we will always be able to aggregate those smaller-subset allocations into any general @@ -404,7 +404,7 @@ behaviors. For example on a discrete space every measure with non-zero atomic allocations is absolutely continuous with respect to every other -measure with non-zero atomic allocations. Moreover Lebesgue meaures +measure with non-zero atomic allocations. Moreover Lebesgue measures defined with respect to different metrics are always absolutely continuous with each other. @@ -683,7 +683,7 @@ Admittedly I'm being a bit mathematically sloppy here because Radon-Nikodym derivatives are defined only up to $\nu$-null subsets; technically this mapping doesn't yield a single function but rather a collection of all functions that are equal $\nu$-almost everywhere. 
-In order to achieve a unique outptu function we need to introduce
+In order to achieve a unique output function we need to introduce
additional constraints, such as continuity or even smoothness.
This sloppy notation, however, does allow us to investigate many of
the useful properties of the operation.
@@ -861,7 +861,7 @@ their limitations. In particular probability density functions are
defined only relative to the given reference measure. If the reference
measure is at all ambiguous then a density function will not completely
determine a probability distribution! At the same time if the reference
-measure every changes then probability density functions will also have
+measure ever changes then the probability density functions will also have
to change if we want them to represent the same probability
distributions.
@@ -951,7 +951,7 @@ expectation value to an integral informed by the counting measure gives
&=
\mathbb{I}_{\chi} [ \pi \cdot f ],
\end{align*}

-where $\pi$ in the last term denotes a function maps each element of
+where $\pi$ in the last term denotes a function that maps each element of
$X$ to its atomic allocation,
\begin{alignat*}{6}
\pi :\; & X & &\rightarrow& \; & [0, \infty] &
@@ -1097,7 +1097,7 @@ $$
On the other hand we can evaluate the cumulative distribution function
at each boundary and subtract,
$$
-\mathrm{Poisson}( \, (n_{1}, n_{2}] \, ; \lambda) f
+\mathrm{Poisson}( \, (n_{1}, n_{2}] \, ; \lambda)
=
\Pi_{\mathrm{Poisson}}(n_{2}) - \Pi_{\mathrm{Poisson}}(n_{1}).
$$
@@ -1141,7 +1141,7 @@ accommodated but the atomic subsets are the most practically relevant.

Given a particular $D$-dimensional real space, that is a particular
rigid real space or particular parameterization of a flexible real
-space, and a compatible a compatible probability distribution $\pi$ we
+space, and a compatible probability distribution $\pi$ we
can define a Lebesgue probability density function
$$
\frac{ \mathrm{d} \pi \hphantom{ {}^{D} } }{ \mathrm{d} \lambda^{D} } :
@@ -1359,7 +1359,7 @@ $$

Integrating a probability density function by eye is not always
straightforward. In particular bounds between probability densities do
-not always imply bounds between interval probabilities. (a) Here
+not always imply bounds between interval probabilities. (a) Here the
largest probability density in the first interval, $p_{1}$, is
smaller than the smallest probability density in the second interval, $p_{2}$.
Because the two intervals are the same length the probability allocated
@@ -1685,7 +1685,7 @@ $$
$$
that define expectation values are particularly nice, at least as far
as integrals go. That isn't to say that the integrals are easy to evaluate
-but rather that they many of them actually admit closed-form solutions,
+but rather that many of them actually admit closed-form solutions,
which is pretty miraculous when it comes to integrals. For those
twisted individuals who fancy a good integral calculation, myself
included, I've included those calculations in the
@@ -1787,7 +1787,7 @@ probability distributions,
\end{align*}
where
$$
-\mathrm{erf} (x)
+\mathrm{erf} (x)
=
\frac{2}{\sqrt{\pi}} \int_{0}^{ x } \mathrm{d} t \, \exp \left( -t^{2} \right)
$$
is known as the **error function**.

Conveniently the error function, if not
the normal cumulative distribution functions themselves, is available
in most programming
-languages. This allows us directly compute interval probabilities by
+languages. This allows us to directly compute interval probabilities by
subtracting cumulative probabilities (@fig-normal-interval-prob),
$$
\text{normal}( \, (x_{1}, x_{2} ] \, ; \mu, \sigma )

![A normal probability density function
width=90% #fig-normal-interval-prob}
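As a quick demonstration, here is a minimal Python sketch that computes a normal interval probability by subtracting cumulative probabilities, with the cumulative distribution function built from the standard-library error function; the interval endpoints and parameters are hypothetical choices for illustration.

```python
import math

def normal_cdf(x, mu, sigma):
    """Normal cumulative distribution function via the error function:
    Phi(x; mu, sigma) = 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_interval_prob(x1, x2, mu, sigma):
    """Probability allocated to the interval (x1, x2]."""
    return normal_cdf(x2, mu, sigma) - normal_cdf(x1, mu, sigma)

# Hypothetical configuration: mu = 0, sigma = 1, interval (-1, 1].
print(normal_interval_prob(-1.0, 1.0, 0.0, 1.0))  # ~0.6827
```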
@@ -1816,7 +1816,7 @@
# Other Useful Probability Density Functions

-Most of applications of probability theory that we will tackle in this
+Most applications of probability theory that we will tackle in this
book will use probability distributions that are absolutely continuous
with respect to a counting measure or a Lebesgue measure and implemented
with appropriate probability density functions. There are a few
@@ -1878,7 +1878,7 @@ As they become narrower and narrower normal probability density
functions start to behave like a hypothetical singular density function.
(a) In the limit $\sigma \rightarrow 0$ the normal probability density
functions centered at $\mu = x'$ converge to an infinitely narrow spike
-at $x'$. (b) At the same the expectation values of all expectands $f$
+at $x'$. (b) At the same time the expectation values of all expectands $f$
converge to the point evaluations $f(x')$.
:::
@@ -2041,12 +2041,12 @@ is often worth the added subtlety.

Real spaces adequately model many phenomena that arise in practical
applications, but by no means all of them. In some cases we will need
to consider continuous spaces that look like real spaces _locally_ but
-exhibit different shapes _globally_ (@fig-circle). These include for
+exhibit different shapes _globally_ (@fig-circle-line). These include for
example spheres, tori, and even more foreign spaces. Mathematically
these spaces, along with real spaces, are collectively known as
**manifolds**.

-::: {#fig-circle layout="[ [-20, 60, -20], [-20, 60, -20] ]"}
+::: {#fig-circle-line layout="[ [-20, 60, -20], [-20, 60, -20] ]"}
![](figures/circle_vs_line/far/far){#fig-circle-far}

![](figures/circle_vs_line/close/close){#fig-circle-close}
@@ -2112,6 +2112,10 @@ are not implemented with classic Riemann integration but rather a more
general **manifold integration** that is not implemented in the same
way.

+
+
That said sometimes there are workarounds. For example removing a point
$x' \in \mathbb{S}^{1}$ from the circle defines a new space
$\mathbb{S}^{1} \setminus x'$. Circular probability distributions,