diff --git a/gaussian-processes/gaussian_processes_sparse.ipynb b/gaussian-processes/gaussian_processes_sparse.ipynb index ef60c00..15aecaa 100644 --- a/gaussian-processes/gaussian_processes_sparse.ipynb +++ b/gaussian-processes/gaussian_processes_sparse.ipynb @@ -54,11 +54,11 @@ "p(\\mathbf{f}_* \\mid \\mathbf{y}) = \\int p(\\mathbf{f}_* \\mid \\mathbf{f}) p(\\mathbf{f} \\mid \\mathbf{y}) d\\mathbf{f} \\tag{4}\n", "$$\n", "\n", - "where the conditioning on inputs $\\mathbf{X}$ and $\\mathbf{X}_*$ has been made implicit. The second term inside the integral is the posterior over the training latent variables $\\mathbf{f}$ conditioned on observations $\\mathbf{y}$, the first term is the posterior over predictions $\\mathbf{f}_*$ conditioned on latent training variables $\\mathbf{f}$ (see also equation $(3)$ in [this article](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes_classification.ipynb)). Both terms are intractable to compute for larger training datasets for reasons explained above.\n", + "where the conditioning on inputs $\\mathbf{X}$ and $\\mathbf{X}_*$ has been made implicit. The second term inside the integral is the posterior over the training latent variables $\\mathbf{f}$ conditioned on observations $\\mathbf{y}$, the first term is the posterior over predictions $\\mathbf{f}_*$ conditioned on latent training variables $\\mathbf{f}$ (see also equation $(3)$ in [this article](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb)). 
Both terms are intractable to compute for larger training datasets for reasons explained above.\n", "\n", "### Sparse Gaussian processes\n", "\n", - "Suppose there is a small set of $m$ inducing variables $\mathbf{f}_m$ evaluated at inputs $\mathbf{X}_m$ that describe the function to be modeled \"sufficiently well\" then we could use them as approximation to $\mathbf{f}$ and $\mathbf{X}$ and define an approximate posterior:\n", + "Suppose there is a small set of $m$ inducing variables $\mathbf{f}_m$ evaluated at inputs $\mathbf{X}_m$ that describe the function to be modeled \"sufficiently well\", then we could use them as an approximation to $\mathbf{f}$ and $\mathbf{X}$ and define an approximate posterior\n", "\n", "$$\n", "q(\mathbf{f}_*) = \int p(\mathbf{f}_* \mid \mathbf{f}_m) \phi(\mathbf{f}_m) d\mathbf{f}_m \tag{5}\n", @@ -258,7 +258,7 @@ "\end{align*}\n", "$$\n", "\n", - "where $\mathbf{c} = \mathbf{L_B}^{-1} \mathbf{A} \mathbf{y} \sigma_y^{-1}$, $\mathbf{B} = \mathbf{I} + \mathbf{A}\mathbf{A}^T$ and $\mathbf{A} = \mathbf{L}^{-1} \mathbf{K}_{mn} \sigma_y^{-1}$. Lower-triangular matrices $\mathbf{L_B}$ and $\mathbf{L}$ are obtained from a [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition) of $\mathbf{B} = \mathbf{L_B} \mathbf{L_B}^T$ and $\mathbf{K}_{mm} = \mathbf{L} \mathbf{L}^T$, respectively. $\mathbf{c}$ and $\mathbf{A}$ are obtained by solving the equations $\mathbf{L_B} \mathbf{c} = \mathbf{A} \mathbf{y} \sigma_y^{-1}$ and $\mathbf{L} \mathbf{A} = \mathbf{K}_{mn} \sigma_y^{-1}$, respectively. Using these definitions, a numerically stable implementation of a negative lower bound (`nlb`) is straightforward." + "where $\mathbf{c} = \mathbf{L_B}^{-1} \mathbf{A} \mathbf{y} \sigma_y^{-1}$, $\mathbf{B} = \mathbf{I} + \mathbf{A}\mathbf{A}^T$ and $\mathbf{A} = \mathbf{L}^{-1} \mathbf{K}_{mn} \sigma_y^{-1}$. 
Lower-triangular matrices $\\mathbf{L_B}$ and $\\mathbf{L}$ are obtained from a [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition) of $\\mathbf{B} = \\mathbf{L_B} \\mathbf{L_B}^T$ and $\\mathbf{K}_{mm} = \\mathbf{L} \\mathbf{L}^T$, respectively. $\\mathbf{c}$ and $\\mathbf{A}$ are obtained by solving the equations $\\mathbf{L_B} \\mathbf{c} = \\mathbf{A} \\mathbf{y} \\sigma_y^{-1}$ and $\\mathbf{L} \\mathbf{A} = \\mathbf{K}_{mn} \\sigma_y^{-1}$, respectively. The log determinant of $\\mathbf{B}$ is $2 \\sum_{i=1}^m \\log {L_B}_{ii}$ as explained in [this post](https://math.stackexchange.com/a/3211219/648651), for example. Using these definitions, a numerically stable implementation of a negative lower bound (`nlb`) is straightforward." ] }, {
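The quantities defined in the updated paragraph can be sketched in NumPy as follows. This is a minimal illustration, not the notebook's actual implementation: the RBF kernel and the names `rbf` and `stable_terms` are assumptions for the sketch, and `np.linalg.solve` stands in for a dedicated triangular solver.

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel matrix (assumed kernel for this sketch).
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def stable_terms(X, y, X_m, sigma_y, jitter=1e-8):
    """Compute A, B, L, L_B, c and log|B| as defined in the text.

    X: (n, d) training inputs, y: (n,) targets, X_m: (m, d) inducing inputs.
    """
    K_mm = rbf(X_m, X_m) + jitter * np.eye(len(X_m))  # jitter for stability
    K_mn = rbf(X_m, X)
    L = np.linalg.cholesky(K_mm)                   # K_mm = L L^T
    A = np.linalg.solve(L, K_mn) / sigma_y         # solves L A = K_mn / sigma_y
    B = np.eye(len(X_m)) + A @ A.T                 # B = I + A A^T
    L_B = np.linalg.cholesky(B)                    # B = L_B L_B^T
    c = np.linalg.solve(L_B, A @ y) / sigma_y      # solves L_B c = A y / sigma_y
    logdet_B = 2.0 * np.sum(np.log(np.diag(L_B)))  # log|B| from Cholesky diagonal
    return A, B, L, L_B, c, logdet_B
```

Because `B` is the identity plus a Gram matrix, its Cholesky factorization always exists, and summing the log of the diagonal of `L_B` avoids forming (and potentially underflowing) the determinant directly.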