betanalpha · Leob000 · Jul 24, 2024
diff --git a/2_mathematical_spaces/spaces.qmd b/2_mathematical_spaces/spaces.qmd
@@ -36,7 +36,7 @@ sophisticated sets but also sets that are equipped with additional structure.
 These combinations of sets and structure are also known as _spaces_.
 
 In this chapter we will discuss the basic conceptual features of general spaces
-before reviewing some of prototypical spaces that are particularly common in
+before reviewing some of the prototypical spaces that are particularly common in
 practical applications.  This presentation will include not only the properties
 of sets with an arbitrary number of elements but also a survey of some of the
 most fundamental structures that we can endow onto those sets.
@@ -104,7 +104,7 @@ In many circumstances we will need to distinguish between variables that refer
 to _arbitrary_ elements and variables that refer to _particular_ but unspecified
 elements.  Following the computer science canon I will refer to these as
 **unbound variables** and **bound variables**, respectively.  To distinguish
-between the two I will decorate bound variables with a tilde; in words $x$
+between the two I will decorate bound variables with a tilde; in other words $x$
 denotes any element of the space $X$ while $\tilde{x}$ denotes a fixed but
 unspecified element.
 
@@ -154,7 +154,7 @@ single element, and a **full set** consisting of the entire set (@fig-subsets).
 Most subsets, however, contain an intermediate number of elements.  One of the
 key features of uncountable spaces is that most subsets also contain an
 uncountable number of elements.  To visually represent subsets containing an
-uncountable number of elements I will used filled shapes to contrast against
+uncountable number of elements I will use filled shapes to contrast against
 individual points.
 
 ::: {#fig-subsets layout="[ [-5, 45, 45, -5], [-5, 45, 45, -5]]"}
@@ -787,7 +787,7 @@ most subsets will be neither open nor closed.
 
 Unlike open balls these metric-derived open subsets are _closed_ under unions
 and intersections.  If $\mathsf{x}_{1}$ and $\mathsf{x}_{2}$ are both open
-subsets then $\mathsf{x}_{1} \cup \mathsf{x}_{2}$ will also an open subset.  In
+subsets then $\mathsf{x}_{1} \cup \mathsf{x}_{2}$ will also be an open subset.  In
 fact the union of _any_ number of open subsets will be open.  Likewise if
 $\mathsf{x}_{1}$ and $\mathsf{x}_{2}$ are both open subsets then
 $\mathsf{x}_{1} \cap \mathsf{x}_{2}$ will also be an open subset.  Indeed the
@@ -964,7 +964,7 @@ figures/structures/general_topology/convergence/convergence){
 width=90% #fig-general-convergence}
 
 A subtle benefit of this topological definition of convergence is that because
-it doesn't require a metric is also doesn't require us to define the positive
+it doesn't require a metric, it also doesn't require us to define the positive
 real numbers.  This can be helpful for avoiding circular logic in more technical
 mathematical analyses.
 
@@ -1021,10 +1021,10 @@ algebra, or metric just builds on top of that foundation.
 Mathematically it is much easier to work with structures that are _compatible_
 with each other.  For example if we want to equip a set with both a topology
 and a metric then the resulting space will be particularly well-behaved if
-the we use a metric topology.  At the same time ambient structure can also
+we use a metric topology.  At the same time ambient structure can also
 distinguish certain compatible subsets.
 
-For example is a set is equipped with an ordering then we can define **interval**
+For example if a set is equipped with an ordering then we can define **interval**
 subsets that contain all elements above and below two boundary elements.  An
 **open interval** excludes both boundary elements,
 $$
@@ -1045,7 +1045,7 @@ intervals that contain only one boundary,
 \end{align*}
 Note that these notions of "open" and "closed" subsets are in general distinct
 from the open and closed subsets defined by a topology.  Only when an ordering
-is compatible with a topology will these the open and closed intervals also be
+is compatible with a topology will these open and closed intervals also be
 topologically open and closed.
 
 As we saw in [Section 1.2.4.1](@sec:open-balls) a metric distinguishes subsets
@@ -1086,8 +1086,8 @@ $\mathsf{x}_{1} \subset X$ is smaller than a subset $\mathsf{x}_{2} \subset X$
 if $\mathsf{x}_{1} \subset \mathsf{x}_{2}$, and larger if
 $\mathsf{x}_{2} \subset \mathsf{x}_{1}$.  Two subsets that only partially
 overlap are incomparable, and hence fall into the same place in the sequential
-ordering.  For any set the empty set will always the smallest subset and the
-full set will always the largest.
+ordering.  For any set the empty set will always be the smallest subset and the
+full set will always be the largest.
 
 The union and intersection operations introduce algebraic structure, known as a
 **Boolean algebraic structure**, to the power set.  They are both commutative,
@@ -1311,6 +1311,10 @@ figures/real_line_grid/real_line_grid){width=50% #fig-real-line-grid}
 
 ## Extended Real Lines
 
+<!---
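+FIXME: the next two sentences need rewording: "does not contains points ... but not points" inverts the intended contrast, and "resolves introduces" doubles the verb.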
+-->
+
 One limitation of a real line is that it does not contains points that
 _approach_ either negative or positive infinity, but not points that represent
 those limits directly.  An **extended real line** resolves introduces two new
@@ -1567,7 +1571,7 @@ Using this notation we can define an inverse function a bit more compactly as
 $$
 \text{Id} = f^{-1} \circ f.
 $$
-In words the composition of a bijective function with its inverse function is
+In other words the composition of a bijective function with its inverse function is
 the identify function.
 
 ## Relating Structures
@@ -1852,7 +1856,7 @@ Pushforward and pullback functions allow us to _lift_ a transformation between
 sets into a transformation between spaces.  For structure that can be pushed
 forward along the function $f : X \rightarrow Y$ any input space
 $(X, \mathfrak{x})$ automatically defines a compatible output space
-$(Y, f_{*}(\mathfrak{x}))$.  Similarly for structure that can be pulled back
+$(Y, f_{*}(\mathfrak{x}))$.  Similarly for a structure that can be pulled back
 against $f$ any output space $(Y, \mathfrak{y})$ automatically defines a
 compatible input space $(X, f^{*}(\mathfrak{y}))$.
 
@@ -1923,7 +1927,7 @@ latter not (@fig-monoticity).
 
 Monotonically increasing functions preserve orderings so that larger inputs
 always imply larger outputs. The function (a) $f_{1} : x \mapsto x^{3}$ is
-monotonic but the function (b) $f_{1} : x \mapsto -x^{3}$ is not.
+monotonic but the function (b) $f_{2} : x \mapsto -x^{2}$ is not.
 :::
 
 #### Algebra-Preserving Relations

diff --git a/3_product_spaces/product_spaces.qmd b/3_product_spaces/product_spaces.qmd
@@ -181,7 +181,7 @@ $X_{2}$.
 
 Each element of the product set is uniquely specified by one element of $X_{1}$
 and one element of $X_{2}$.  Consequently every variable taking values in the
-product set $x \in X_{1} \times X_{2}$ is compromised of an ordered pair of
+product set $x \in X_{1} \times X_{2}$ is comprised of an ordered pair of
 variables from each component space,
 $$
 x = (x_{1}, x_{2}),
@@ -282,7 +282,7 @@ X_{1} \times \ldots \times X_{i} \times \ldots \times X_{I}
 =
 \times_{i = 1}^{I} X_{i}
 $$
-where every product variable $x \in \times_{i = 1}^{I} X_{i}$ is compromised of
+where every product variable $x \in \times_{i = 1}^{I} X_{i}$ is comprised of
 a ordered collection of component variables
 $$
 x = ( x_{1}, \ldots, x_{i}, \ldots, x_{I})
@@ -609,7 +609,7 @@ component full sets we will always be able to construct the product empty set
 and product full set from this procedure.
 
 Moreover because the component open subsets are finite intersections these
-productsubsets will be as well.  For example given any finite collection of open
+product subsets will be as well.  For example given any finite collection of open
 component subsets
 $$
 \{ \mathsf{x}_{1, i}, \ldots, \mathsf{x}_{j, i}, \ldots \mathsf{x}_{J, i} \}
@@ -870,7 +870,7 @@ $$
 ( 1, \ldots, i_{1} - 1, i_{1} + 1, \ldots, i_{J} - 1, i_{J} + 1, \ldots, J )
 $$
 define yet another product set.  Replicating this second product set once for
-each element of first product set defines a collection of **cross sections sets**
+each element of the first product set defines a collection of **cross section sets**
 (@fig-conditioning),
 \begin{align*}
 \times_{i' = 1}^{I} X_{i'} \mid (x_{i_{1}}, \ldots, x_{i_{J}})

diff --git a/4_probability_on_general_spaces/probability_on_general_spaces.qmd b/4_probability_on_general_spaces/probability_on_general_spaces.qmd
@@ -205,7 +205,7 @@ sufficiently useful for practical application or if we need to consider
 countably additive measures, let alone measures that might be additive
 over even larger collections of subsets.
 
-For example a common problem that arises is practice is reconstructing
+For example a common problem that arises in practice is reconstructing
 the measure allocated to a general subset from the measures allocated to
 particularly nice subsets that are easier with which to work.  If we
 could always decompose a generic subset into the disjoint union of a
@@ -217,7 +217,7 @@ Potentially some subsets might be decomposable only into an uncountably
 infinite number of subsets in which case we would need even stronger
 notions of additivity!
 
-Fortunately for us we don't have to go to that last extreme.  In turns
+Fortunately for us we don't have to go to that last extreme.  It turns
 out that on most spaces that we'll encounter in practice, and typical
 notions of "nice" subsets, countable additivity is sufficient for
 reconstructing the measure allocated to more general subsets.
@@ -228,9 +228,9 @@ allocations to **rectangular** subsets (@fig-disk-decomposition).  In
 general a non-rectangular subset, in this case a disk, can be crudely
 approximated by a single rectangular subset.  The disk can be
 approximated more precisely as the disjoint union of many different
-rectangular subsets, but that will never exact reconstruct the disk.
+rectangular subsets, but that will never exactly reconstruct the disk.
 Only when we incorporate a countably infinite number of rectangular
-subsets can be reconstruct the disk without any error.
+subsets can we reconstruct the disk without any error.
 
 ![On a two-dimensional real plane $\mathbb{R}^{2}$ a non-rectangular
 disc can be approximated, but not exactly reconstructed, by the finite
@@ -339,7 +339,7 @@ Similarly the elements of a $\sigma$-algebra are known as
 **measurable subsets** while any subsets in the power set but not in the
 $\sigma$-algebra are referred to as **non-measurable** subsets.
 
-When non-measurable subsets are misbehaving subsets they reveals the
+When non-measurable subsets are misbehaving subsets they reveal the
 subtle, and often counterintuitive, pathologies inherent to that space.
 By working with $\sigma$-algebras directly we can avoid these awkward
 pathologies entirely.
@@ -415,7 +415,7 @@ behaviors that we have to avoid at all!  I will refer to any measurable
 space $(X, 2^{X})$ compatible with a discrete topology as
 **discrete measurable spaces**.
 
-On the the other hand the Borel $\sigma$-algebra derived from the
+On the other hand the Borel $\sigma$-algebra derived from the
 topology that defines the real line filters out all of the
 non-constructive subsets and their undesired behaviors while keeping all
 of the interval subsets and the subsets that we can derive from them.
@@ -973,7 +973,7 @@ figures/interval_partitions/interval_partitions){
 width=90% #fig-equal-length-intervals}
 
 The easiest way to accomplish this uniformity is to allocate to each
-interval a measure directly equal to the its length,
+interval a measure directly equal to its length,
 $$
 \lambda( \, [x_{1}, x_{2}] \, )
 = L( \, [x_{1}, x_{2}] \, )

diff --git a/5_expectation_values/expectation_values.qmd b/5_expectation_values/expectation_values.qmd
@@ -42,7 +42,7 @@ from calculus.  This measure-informed integration operation summarizes
 the interaction between a measure and a given function, allowing us to
 use one to learn about the other.
 
-We will being our exploration of measure-informed integration with a
+We will begin our exploration of measure-informed integration with a
 heuristic construction on finite measure spaces before considering a
 more formal, but also more abstract, construction that applies to any
 measure space.  Next we'll investigate how the specification of
@@ -56,7 +56,7 @@ exceptional measures whose integrals can be computed algorithmically.
 # Integration on Finite Measure Spaces {#sec:finite_integration}
 
 To start our discussion of measure-informed integration as simply as
-possible let's begin by considering a finite measure space compromised
+possible let's begin by considering a finite measure space comprised
 of the finite set
 $$
 X = \{ \Box, \clubsuit, \diamondsuit, \heartsuit, \spadesuit \},
@@ -409,9 +409,9 @@ corresponding simple function decomposition,
 In general a non-negative, measurable function can be represented by
 more than one simple function decomposition.  Fortunately the
 measure-informed integral derived from any of them will always be the
-same.  Consequently there's no worry for ambiguous of otherwise
+same.  Consequently there's no worry for ambiguous or otherwise
 inconsistent answers, and measure-informed integrals for non-negative,
-measurable function are completely well-behaved.
+measurable functions are completely well-behaved.
 
 This procedure for defining measure-informed integrals through simple
 functions representations is known as **Lebesgue integration** in the
@@ -722,7 +722,7 @@ this form.
 Because $L(X, \mathcal{X}, \mu)$ contains all of the indicator functions
 this functional relationship between $L(X, \mathcal{X}, \mu)$ and
 $\mathbb{R}$ determines the allocations to every measurable subset,
-and hence full determines the measure $\mu$.  At the same time
+and hence fully determines the measure $\mu$.  At the same time
 $L(X, \mathcal{X}, \mu)$ also contains many integrands that are not
 indicator functions, and hence quite a bit of redundant information
 about $\mu$.
@@ -875,7 +875,7 @@ which is not, in general, equal to $1$.  In other words scaling a
 probability distribution results not in another probability distribution
 but rather a generic measure.
 
-If we want transform one probability distribution into another then we
+If we want to transform one probability distribution into another then we
 need to correct for the modified normalization, defining
 \begin{align*}
 \mathbb{E}_{g \ast \pi} [ f ]
@@ -1039,8 +1039,8 @@ important in practice.
 
 ### The Mean
 
-If an embedding function is an integrand than we can evaluate its
-measure-informed integral, $\mathbb{I}_{\mu}[\iota]$.  The ultimately
+If an embedding function is an integrand then we can evaluate its
+measure-informed integral, $\mathbb{I}_{\mu}[\iota]$.  The ultimate
 utility of this measure-informed integral, however, depends on what
 information about the ambient measure it extracts.
 
@@ -1272,7 +1272,7 @@ working with spaces like circles, spheres, torii, and more.  Many
 analyses on these spaces have been undermined by attempts to summarize
 measures with moments that don't actually exist!
 
-All of this said we still to take care with the necessary conditions
+All of this said we still have to take care with the necessary conditions
 when working with more familiar spaces as well.  For example in
 [Section 5.2.2](@sec:practical_lebesgue) we'll learn that the identify
 function from a real line into itself is not integrable with respect to
@@ -1354,7 +1354,7 @@ skewed towards smaller or larger values.
 
 ![](figures/histograms/varying_behaviors/multimodal/multimodal){#fig-hist-multimodal}
 
-Histogram are extremely effective at communicating the basic features of
+Histograms are extremely effective at communicating the basic features of
 a measure.  The measure in (a) is diffuse but decaying, allocating more
 measure at smaller points than larger points.  Conversely the measure in
 (b) concentrates around a single point while the measure in (c)
@@ -1409,7 +1409,7 @@ M :\; & X & &\rightarrow& \; &[0, \mu(X)]&
 \\
 & x & &\mapsto& & M_{\mu}(x) = \mu(\mathsf{I}_{x}) = \mathbb{I}_{\mu}[I_{\mathsf{I}_{x}}] &.
 \end{alignat*}
-According this mapping is known as a
+This mapping is known as a
 **cumulative distribution function** (@fig-cdf-basics).
 
 ![A cumulative distribution function quantifies how measure is allocated
@@ -1451,7 +1451,7 @@ are any gaps in the allocation, intermediate intervals with zero
 allocated measure, then the cumulative distribution function will
 flatten out completely (@fig-cdf-gap).
 
-::: {#fig-hist-examples layout="[-5, 30, 30, 30, -5]"}
+::: {#fig-cdf-examples layout="[-5, 30, 30, 30, -5]"}
 ![](figures/cdfs/cdf_behaviors/unimodal/unimodal){#fig-cdf-unimodal}
 
 ![](figures/cdfs/cdf_behaviors/narrow_unimodal/narrow_unimodal){#fig-cdf-narrow-unimodal}
@@ -1461,7 +1461,7 @@ flatten out completely (@fig-cdf-gap).
 A careful survey of a cumulative distribution function can communicate a
 wealth of information about the ambient measure.  (a) Here the ambient
 measure is unimodal with the cumulative distribution function
-appreciably increasingly only one we reach the central neighborhood
+appreciably increasing only once we reach the central neighborhood
 where the measure allocation is concentrated.  (b)  A narrower
 concentration results in a steeper cumulative distribution function.
 (c)  A cumulative distribution function flattens if there are any gaps
@@ -1607,7 +1607,7 @@ accumulated measure below $m$,
 $$
 x_{m-} = \underset{x \in X}{\mathrm{argmax}} M(x) < m,
 $$
-and bounded above by the point $x_{+}$ that achieves the smallest
+and bounded above by the point $x_{m+}$ that achieves the smallest
 accumulated measure above $m$ (@fig-quantile-inverse-problems),
 $$
 x_{m+} = \underset{x \in X}{\mathrm{argmin}} M(x) > m.
@@ -1715,7 +1715,7 @@ $$
 $$
 
 The integral of any real-valued function $f: X \rightarrow \mathbb{R}$
-with respect to counting measure is given by over summing all of the
+with respect to counting measure is given by summing over all of the
 output values,
 \begin{align*}
 \mathbb{I}_{\chi}[f]
@@ -1934,7 +1934,7 @@ When a real-valued function has a well-defined Riemann integral then we
 can apply the tools of calculus to evaluate Lebesgue integrals.  The
 exceptional Riemann integrals that can be evaluated analytically allow
 us to compute the corresponding Lebesgue integrals exactly.  More
-generally we can use to numerical integration techniques to approximate
+generally we can use numerical integration techniques to approximate
 the Riemann integrals, and hence approximately evaluate Lebesgue
 integrals.
 
@@ -1959,7 +1959,7 @@ the sign of the Riemann integral.  In order to properly relate Lebesgue
 integrals to Riemann integrals we have to fix the _orientation_ of the
 intervals.
 
-Similarly the mean of a Lebesgue measure would by given by the integral
+Similarly the mean of a Lebesgue measure would be given by the integral
 of the identity function,
 \begin{align*}
 \mathbb{I}_{\lambda}[\iota]