diff --git a/README.md b/README.md
index c3defa3..df2f618 100644
--- a/README.md
+++ b/README.md
@@ -4,46 +4,46 @@
This repository is a collection of notebooks about *Bayesian Machine Learning*. The following links display
some of the notebooks via [nbviewer](https://nbviewer.jupyter.org/) to ensure a proper rendering of formulas.
+Dependencies are specified in `requirements.txt` files in subdirectories.
- [Bayesian regression with linear basis function models](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/bayesian-linear-regression/bayesian_linear_regression.ipynb).
- Introduction to Bayesian linear regression. Implementation from scratch with plain NumPy as well as usage of scikit-learn
- for comparison. See also
- [PyMC4 implementation](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/bayesian-linear-regression/bayesian_linear_regression_pymc4.ipynb) and
+ Introduction to Bayesian linear regression. Implementation with plain NumPy and scikit-learn. See also
[PyMC3 implementation](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/bayesian-linear-regression/bayesian_linear_regression_pymc3.ipynb).
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb)
[Gaussian processes](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb?flush_cache=true).
- Introduction to Gaussian processes for regression. Example implementations with plain NumPy/SciPy as well as with libraries
- scikit-learn and GPy ([requirements.txt](gaussian-processes/requirements.txt)).
+ Introduction to Gaussian processes for regression. Implementation with plain NumPy/SciPy as well as with scikit-learn and GPy.
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes_classification.ipynb)
[Gaussian processes for classification](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes_classification.ipynb).
- Introduction to Gaussian processes for classification. Example implementations with plain NumPy/SciPy as well as with
- scikit-learn ([requirements.txt](gaussian-processes/requirements.txt)).
+ Introduction to Gaussian processes for classification. Implementation with plain NumPy/SciPy as well as with scikit-learn.
+
+- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes_sparse.ipynb)
+ [Sparse Gaussian processes](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes_sparse.ipynb).
+ Introduction to sparse Gaussian processes using a variational approach. Example implementation with JAX.
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/krasserm/bayesian-machine-learning/blob/dev/bayesian-optimization/bayesian_optimization.ipynb)
[Bayesian optimization](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/bayesian-optimization/bayesian_optimization.ipynb).
- Introduction to Bayesian optimization. Example implementations with plain NumPy/SciPy as well as with libraries
- scikit-optimize and GPyOpt. Hyper-parameter tuning as application example.
+ Introduction to Bayesian optimization. Implementation with plain NumPy/SciPy as well as with libraries scikit-optimize
+ and GPyOpt. Hyper-parameter tuning as application example.
- [Variational inference in Bayesian neural networks](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/bayesian-neural-networks/bayesian_neural_networks.ipynb).
- Demonstrates how to implement a Bayesian neural network and variational inference of network parameters. Example implementation
- with Keras ([requirements.txt](bayesian-neural-networks/requirements.txt)). See also
- [PyMC4 implementation](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/bayesian-neural-networks/bayesian_neural_networks_pymc4.ipynb).
+ Demonstrates how to implement a Bayesian neural network and variational inference of weights. Example implementation
+ with Keras.
- [Reliable uncertainty estimates for neural network predictions](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/noise-contrastive-priors/ncp.ipynb).
- Uses noise contrastive priors in Bayesian neural networks to get more reliable uncertainty estimates for OOD data.
- Implemented with Tensorflow 2 and Tensorflow Probability ([requirements.txt](noise-contrastive-priors/requirements.txt)).
+ Uses noise contrastive priors for Bayesian neural networks to get more reliable uncertainty estimates for OOD data.
+ Implemented with TensorFlow 2 and TensorFlow Probability.
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/krasserm/bayesian-machine-learning/blob/dev/latent-variable-models/latent_variable_models_part_1.ipynb)
[Latent variable models, part 1: Gaussian mixture models and the EM algorithm](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/latent-variable-models/latent_variable_models_part_1.ipynb).
- Introduction to the expectation maximization (EM) algorithm and its application to Gaussian mixture models. Example
- implementation with plain NumPy/SciPy and scikit-learn for comparison. See also
+ Introduction to the expectation maximization (EM) algorithm and its application to Gaussian mixture models.
+ Implementation with plain NumPy/SciPy and scikit-learn. See also
[PyMC3 implementation](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/latent-variable-models/latent_variable_models_part_1_pymc3.ipynb).
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/krasserm/bayesian-machine-learning/blob/dev/latent-variable-models/latent_variable_models_part_2.ipynb)
[Latent variable models, part 2: Stochastic variational inference and variational autoencoders](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/latent-variable-models/latent_variable_models_part_2.ipynb).
- Introduction to stochastic variational inference with variational autoencoder as application example. Implementation
+ Introduction to stochastic variational inference with a variational autoencoder as application example. Implementation
with Tensorflow 2.x.
- [Deep feature consistent variational autoencoder](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/autoencoder-applications/variational_autoencoder_dfc.ipynb).
diff --git a/gaussian-processes/gaussian_processes.ipynb b/gaussian-processes/gaussian_processes.ipynb
index fa60a46..2888fb3 100644
--- a/gaussian-processes/gaussian_processes.ipynb
+++ b/gaussian-processes/gaussian_processes.ipynb
@@ -400,7 +400,8 @@
}
],
"source": [
- "from numpy.linalg import cholesky, det, lstsq\n",
+ "from numpy.linalg import cholesky, det\n",
+ "from scipy.linalg import solve_triangular\n",
"from scipy.optimize import minimize\n",
"\n",
"def nll_fn(X_train, Y_train, noise, naive=True):\n",
@@ -437,14 +438,15 @@
" # in http://www.gaussianprocess.org/gpml/chapters/RW2.pdf, Section\n",
" # 2.2, Algorithm 2.1.\n",
" \n",
- " def ls(a, b):\n",
- " return lstsq(a, b, rcond=-1)[0]\n",
- " \n",
" K = kernel(X_train, X_train, l=theta[0], sigma_f=theta[1]) + \\\n",
" noise**2 * np.eye(len(X_train))\n",
" L = cholesky(K)\n",
+ " \n",
+ " S1 = solve_triangular(L, Y_train, lower=True)\n",
+ " S2 = solve_triangular(L.T, S1, lower=False)\n",
+ " \n",
" return np.sum(np.log(np.diagonal(L))) + \\\n",
- " 0.5 * Y_train.dot(ls(L.T, ls(L, Y_train))) + \\\n",
+ " 0.5 * Y_train.dot(S2) + \\\n",
" 0.5 * len(X_train) * np.log(2*np.pi)\n",
"\n",
" if naive:\n",
diff --git a/gaussian-processes/gaussian_processes_sparse.ipynb b/gaussian-processes/gaussian_processes_sparse.ipynb
new file mode 100644
index 0000000..ef60c00
--- /dev/null
+++ b/gaussian-processes/gaussian_processes_sparse.ipynb
@@ -0,0 +1,51250 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes_sparse.ipynb)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "try:\n",
+ " # Check if notebook is running in Google Colab\n",
+ " import google.colab\n",
+ " # Get additional files from Github\n",
+ " !wget https://raw.githubusercontent.com/krasserm/bayesian-machine-learning/dev/gaussian-processes/gaussian_processes_util.py\n",
+ "except:\n",
+ " pass"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Sparse Gaussian processes\n",
+ "\n",
+ "## Introduction\n",
+ "\n",
+ "Exact Gaussian processes do not scale to large training datasets because their time complexity is $O(n^3)$, where $n$ is the size of the training set. Approximate or sparse Gaussian processes are based on a small set of $m$ inducing variables that reduce the time complexity to $O(nm^2)$. \n",
+ "\n",
+ "In this article I give an introduction to sparse Gaussian processes as described in \\[1\\] and provide a simple implementation with [JAX](https://github.com/google/jax). The main reason for using JAX instead of plain NumPy was the need to compute gradients of the variational lower bound. The mathematical descriptions in this article are kept on a rather high level so that the focus is on intuition rather than on a detailed derivation of equations. \n",
+ "\n",
+ "### Exact Gaussian processes\n",
+ "\n",
+ "In a [previous article](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes.ipynb) I introduced exact Gaussian processes for regression. A Gaussian process is a [random process](https://en.wikipedia.org/wiki/Stochastic_process) where any point $\\mathbf{x} \\in \\mathbb{R}^d$ is assigned a random variable $f(\\mathbf{x})$ and where the joint distribution of a finite number of these variables $p(f(\\mathbf{x}_1),...,f(\\mathbf{x}_N)) = p(\\mathbf{f} \\mid \\mathbf{X}) = \\mathcal{N}(\\mathbf{f} \\mid \\boldsymbol\\mu, \\mathbf{K})$ is itself Gaussian. The covariance matrix $\\mathbf{K}$ is defined by a kernel function $\\kappa$ where $\\mathbf{K} = \\kappa(\\mathbf{X},\\mathbf{X})$. The mean $\\boldsymbol\\mu$ is often set to $\\mathbf{0}$. A GP can be used to define a prior over functions.\n",
+ "\n",
+ "A GP prior can be converted into a posterior by conditioning on a training dataset $\\mathbf{X}, \\mathbf{y}$ where $\\mathbf{y}$ are noisy realizations of function values $\\mathbf{f}$. For independent Gaussian noise we have $y_i = f(\\mathbf{x}_i) + \\epsilon_i$ where $\\epsilon_i \\sim \\mathcal{N}(0, \\sigma_y^2)$. Noise-free training function values $\\mathbf{f}$ are not directly observed, i.e. they are latent variables. The posterior over function values $\\mathbf{f}_*$ at inputs $\\mathbf{X}_*$ conditioned on training data is given by\n",
+ "\n",
+ "$$\n",
+ "\\begin{align*}\n",
+ "p(\\mathbf{f}_* \\mid \\mathbf{X}_*,\\mathbf{X},\\mathbf{y}) &= \\mathcal{N}(\\mathbf{f}_* \\mid \\boldsymbol{\\mu}_*, \\boldsymbol{\\Sigma}_*)\\tag{1} \\\\\n",
+ "\\boldsymbol{\\mu}_* &= \\mathbf{K}_*^T \\mathbf{K}_y^{-1} \\mathbf{y}\\tag{2} \\\\\n",
+ "\\boldsymbol{\\Sigma}_* &= \\mathbf{K}_{**} - \\mathbf{K}_*^T \\mathbf{K}_y^{-1} \\mathbf{K}_*\\tag{3}\n",
+ "\\end{align*}\n",
+ "$$\n",
+ "\n",
+ "where $\\mathbf{K}_y = \\mathbf{K} + \\sigma_y^2\\mathbf{I}$, $\\mathbf{K}_* = \\kappa(\\mathbf{X},\\mathbf{X}_*)$ and $\\mathbf{K}_{**} = \\kappa(\\mathbf{X}_*,\\mathbf{X}_*)$. It can be used to predict function values $\\mathbf{f}_*$ at new inputs $\\mathbf{X}_*$. Computation of $\\boldsymbol{\\mu}_*$ and $\\boldsymbol{\\Sigma}_*$ requires the inversion of $\\mathbf{K}_y$, an $n \\times n$ matrix, where $n$ is the size of the training set. Matrix inversion becomes computationally intractable for larger $n$. To outline a solution to this problem, we note that the posterior can also be defined as\n",
+ "\n",
+ "$$\n",
+ "p(\\mathbf{f}_* \\mid \\mathbf{y}) = \\int p(\\mathbf{f}_* \\mid \\mathbf{f}) p(\\mathbf{f} \\mid \\mathbf{y}) d\\mathbf{f} \\tag{4}\n",
+ "$$\n",
+ "\n",
+ "where the conditioning on inputs $\\mathbf{X}$ and $\\mathbf{X}_*$ has been made implicit. The second term inside the integral is the posterior over the training latent variables $\\mathbf{f}$ conditioned on observations $\\mathbf{y}$, the first term is the posterior over predictions $\\mathbf{f}_*$ conditioned on latent training variables $\\mathbf{f}$ (see also equation $(3)$ in [this article](https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/dev/gaussian-processes/gaussian_processes_classification.ipynb)). Both terms are intractable to compute for larger training datasets for reasons explained above.\n",
+ "\n",
+ "### Sparse Gaussian processes\n",
+ "\n",
+ "Suppose there is a small set of $m$ inducing variables $\\mathbf{f}_m$, evaluated at inputs $\\mathbf{X}_m$, that describe the function to be modeled \"sufficiently well\". We could then use them as an approximation to $\\mathbf{f}$ and $\\mathbf{X}$ and define an approximate posterior:\n",
+ "\n",
+ "$$\n",
+ "q(\\mathbf{f}_*) = \\int p(\\mathbf{f}_* \\mid \\mathbf{f}_m) \\phi(\\mathbf{f}_m) d\\mathbf{f}_m \\tag{5}\n",
+ "$$\n",
+ "\n",
+ "where $\\phi(\\mathbf{f}_m)$ is an approximation to the intractable $p(\\mathbf{f}_m \\mid \\mathbf{y})$:\n",
+ "\n",
+ "$$\n",
+ "\\phi(\\mathbf{f}_m) = \\mathcal{N}(\\mathbf{f}_m \\mid \\boldsymbol{\\mu}_m, \\mathbf{A}_m) \\tag{6}\n",
+ "$$\n",
+ "\n",
+ "The goal is to find optimal values for mean $\\boldsymbol{\\mu}_m$ and covariance matrix $\\mathbf{A}_m$. The quality of $\\phi(\\mathbf{f}_m)$ also depends on the location of the inducing inputs $\\mathbf{X}_m$, so we want to find their optimal values as well. The mean and covariance matrix of the Gaussian approximate posterior $q(\\mathbf{f}_*)$ are defined in terms of $\\boldsymbol{\\mu}_m$, $\\mathbf{A}_m$ and $\\mathbf{X}_m$:\n",
+ "\n",
+ "$$\n",
+ "\\begin{align*}\n",
+ "q(\\mathbf{f}_*) &= \\mathcal{N}(\\mathbf{f}_* \\mid \\boldsymbol{\\mu}_*^q, \\boldsymbol{\\Sigma}_*^q) \\tag{7} \\\\\n",
+ "\\boldsymbol{\\mu}_*^q &= \\mathbf{K}_{*m} \\mathbf{K}_{mm}^{-1} \\boldsymbol{\\mu}_m \\tag{8} \\\\\n",
+ "\\boldsymbol{\\Sigma}_*^q &= \\mathbf{K}_{**} - \\mathbf{K}_{*m} \\mathbf{K}_{mm}^{-1} \\mathbf{K}_{m*} + \\mathbf{K}_{*m} \\mathbf{K}_{mm}^{-1} \\mathbf{A}_m \\mathbf{K}_{mm}^{-1} \\mathbf{K}_{m*} \\tag{9}\n",
+ "\\end{align*}\n",
+ "$$\n",
+ "\n",
+ "where $\\mathbf{K}_{mm} = \\kappa(\\mathbf{X}_m, \\mathbf{X}_m)$, $\\mathbf{K}_{*m} = \\kappa(\\mathbf{X}_*, \\mathbf{X}_m)$ and $\\mathbf{K}_{m*} = \\mathbf{K}_{*m}^T$. For a single test input, this approximate posterior can be computed in $O(nm^2)$ after having found optimal values for $\\boldsymbol{\\mu}_m$, $\\mathbf{A}_m$ and $\\mathbf{X}_m$. \\[1\\] uses a variational approach for optimizing $\\boldsymbol{\\mu}_m$, $\\mathbf{A}_m$ and $\\mathbf{X}_m$ by minimizing the Kullback-Leibler (KL) divergence between the approximate posterior $q(\\mathbf{f})$ and the exact posterior $p(\\mathbf{f} \\mid \\mathbf{y})$ over training latent variables $\\mathbf{f}$.\n",
+ "\n",
+ "Minimization of this KL divergence is equivalent to maximization of a lower bound $\\mathcal{L}(\\boldsymbol{\\mu}_m, \\mathbf{A}_m, \\mathbf{X}_m)$ on the true log marginal likelihood $\\log p(\\mathbf{y})$. This lower bound can be optimized by analytically solving for $\\boldsymbol{\\mu}_m$ and $\\mathbf{A}_m$. The resulting lower bound after optimization is a function of $\\mathbf{X}_m$:\n",
+ "\n",
+ "$$\n",
+ "\\mathcal{L}(\\mathbf{X}_m) = \\log \\mathcal{N} (\\mathbf{y} \\mid \\mathbf{0}, \\sigma_y^2 \\mathbf{I} + \\mathbf{Q}_{nn}) - \\frac{1}{2 \\sigma_y^2} \\mathrm{Tr}(\\mathbf{K}_{nn} - \\mathbf{Q}_{nn}) \\tag{10}\n",
+ "$$\n",
+ "\n",
+ "\n",
+ "where $\\mathbf{Q}_{nn} = \\mathbf{K}_{nm} \\mathbf{K}_{mm}^{-1} \\mathbf{K}_{mn}$, $\\mathbf{K}_{nn} = \\kappa(\\mathbf{X},\\mathbf{X})$, $\\mathbf{K}_{nm} = \\kappa(\\mathbf{X},\\mathbf{X}_m)$ and $\\mathbf{K}_{mn} = \\mathbf{K}_{nm}^T$. Equation $(10)$ can be computed in $O(nm^2)$, as shown in section [Optimization](#Optimization), and used to optimize inducing inputs $\\mathbf{X}_m$ jointly with kernel hyperparameters using a numeric optimization method. The first term on the RHS is an approximate log likelihood term. Optimizing this term alone could lead to overfitting.\n",
+ "\n",
+ "The second term is a regularization term which is a result of using a variational approach. This term can be interpreted as minimizing the error predicting $\\mathbf{f}$ from inducing variables $\\mathbf{f}_m$. The better the variables $\\mathbf{f}_m$ represent the function to be modeled the smaller this error will be. Hence, optimization will try to find optimal positions for inducing inputs $\\mathbf{X}_m$. With optimal values for $\\mathbf{X}_m$, we can optimize $\\boldsymbol{\\mu}_m$ and $\\mathbf{A}_m$ i.e. the parameters of $\\phi(\\mathbf{f}_m)$ analytically with \n",
+ "\n",
+ "$$\n",
+ "\\begin{align*}\n",
+ "\\boldsymbol{\\mu}_m &= \\frac{1}{\\sigma_y^2} \\mathbf{K}_{mm} \\boldsymbol{\\Sigma} \\mathbf{K}_{mn} \\mathbf{y} \\tag{11} \\\\\n",
+ "\\mathbf{A}_m &= \\mathbf{K}_{mm} \\boldsymbol{\\Sigma} \\mathbf{K}_{mm} \\tag{12}\n",
+ "\\end{align*}\n",
+ "$$\n",
+ "\n",
+ "where $\\boldsymbol{\\Sigma} = (\\mathbf{K}_{mm} + \\sigma_y^{-2} \\mathbf{K}_{mn} \\mathbf{K}_{nm})^{-1}$. $\\boldsymbol{\\mu}_m$ and $\\mathbf{A}_m$ are then substituted into equations $(8)$ and $(9)$ to compute the approximate posterior $q(\\mathbf{f}_*)$ at new inputs $\\mathbf{X}_*$. This is the minimum we need to know for implementing sparse Gaussian processes for regression. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Implementation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import jax.numpy as jnp\n",
+ "import jax.scipy as jsp\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "from jax import random, jit, value_and_grad\n",
+ "from jax import config\n",
+ "from scipy.optimize import minimize\n",
+ "\n",
+ "config.update(\"jax_enable_x64\", True)\n",
+ "%matplotlib inline"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Training dataset\n",
+ "\n",
+ "The training dataset is taken from [this example](https://gpflow.readthedocs.io/en/master/notebooks/advanced/gps_for_big_data.html#Generating-data) of the [GPflow](https://gpflow.readthedocs.io/)\\[2\\] project. We'll use `n` noisy training examples drawn from `func` with Gaussian noise `sigma_y`. The number of inducing variables is `m`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def func(x):\n",
+ " \"\"\"Latent function.\"\"\"\n",
+ " return 1.0 * jnp.sin(x * 3 * jnp.pi) + \\\n",
+ " 0.3 * jnp.cos(x * 9 * jnp.pi) + \\\n",
+ " 0.5 * jnp.sin(x * 7 * jnp.pi)\n",
+ "\n",
+ "\n",
+ "# Number of training examples\n",
+ "n = 1000\n",
+ "\n",
+ "# Number of inducing variables\n",
+ "m = 30\n",
+ "\n",
+ "# Noise\n",
+ "sigma_y = 0.2\n",
+ "\n",
+ "# Noisy training data\n",
+ "X = jnp.linspace(-1.0, 1.0, n).reshape(-1, 1)\n",
+ "y = func(X) + sigma_y * random.normal(random.PRNGKey(0), shape=(n, 1))\n",
+ "\n",
+ "# Test data\n",
+ "X_test = np.linspace(-1.5, 1.5, 1000).reshape(-1, 1)\n",
+ "f_true = func(X_test)\n",
+ "\n",
+ "# Inducing inputs\n",
+ "X_m = jnp.linspace(-0.4, 0.4, m).reshape(-1, 1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ "