From afa3b668bf12ad5ecd3c12206daad1b2e53ed484 Mon Sep 17 00:00:00 2001
From: Morten Hjorth-Jensen
Date: Thu, 14 Mar 2024 09:18:25 +0100
Subject: [PATCH] Update week9.do.txt

---
 doc/src/week9/week9.do.txt | 443 +++++++++----------------------------
 1 file changed, 101 insertions(+), 342 deletions(-)

diff --git a/doc/src/week9/week9.do.txt b/doc/src/week9/week9.do.txt
index 2c8c6e75..072352a9 100644
--- a/doc/src/week9/week9.do.txt
+++ b/doc/src/week9/week9.do.txt
@@ -1,15 +1,17 @@
TITLE: Week 11, March 11-15: Resampling Techniques, Bootstrap and Blocking
AUTHOR: Morten Hjorth-Jensen {copyright, 1999-present|CC BY-NC} Email morten.hjorth-jensen@fys.uio.no at Department of Physics and Center for Computing in Science Education, University of Oslo, Oslo, Norway & Department of Physics and Astronomy and Facility for Rare Ion Beams, Michigan State University, East Lansing, Michigan, USA
-DATE: March 18-22
+DATE: March 11-15

!split
===== Overview of week 11, March 11-15 =====
!bblock Topics
-* Resampling Techniques and statistics: Bootstrap and Blocking
-* Discussion of onebody densities
-* "Video of lecture TBA":"https://youtu.be/"
-* "Handwritten notes":"https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/HandWrittenNotes/2024/NotesMarch22.pdf"
+o Reminder from last week about statistical observables, the central limit theorem and bootstrapping, see notes from last week
+o Resampling Techniques: Blocking
+o Discussion of onebody densities
+o Start discussion on optimization and parallelization
+#* "Video of lecture TBA":"https://youtu.be/"
+#* "Handwritten notes":"https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/HandWrittenNotes/2024/NotesMarch22.pdf"
!eblock

!bblock Teaching Material, videos and written material
@@ -25,20 +27,106 @@ DATE: March 18-22
* Our simulations can be treated as *computer experiments*. This is particularly the case for Monte Carlo methods
* The results can be analysed with the same statistical tools as we would use when analysing experimental data.
* As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources of errors.
-
-
!eblock

!split
===== Statistical analysis =====
!bblock
* As in other experiments, many numerical experiments have two classes of errors:
-  o Statistical errors
-  o Systematical errors
+  o Statistical errors
+  o Systematical errors
* Statistical errors can be estimated using standard tools from statistics
* Systematical errors are method specific and must be treated differently from case to case.
!eblock

!split
===== And why do we use such methods? =====

As you will see below, due to correlations between the various
measurements, we need to evaluate the so-called covariance in order to
establish a proper evaluation of the total variance and thereby the
standard deviation of a given expectation value.

The covariance, however, leads to the evaluation of a double sum over the various stochastic variables. This becomes computationally too expensive to evaluate directly.

!split
===== Central limit theorem =====

Last week we derived the central limit theorem with the following assumptions:

!bblock Measurement $i$
We assumed that each individual measurement $x_{ij}$ is represented by stochastic variables which are independent and identically distributed (iid).
This defined the sample mean of experiment $i$ with $n$ samples as
!bt
\[
\overline{x}_i=\frac{1}{n}\sum_{j} x_{ij},
\]
!et
and the sample variance
!bt
\[
\sigma^2_i=\frac{1}{n}\sum_{j} \left( x_{ij}-\overline{x}_i\right)^2.
\]
!et
!eblock
Note that we use $n$ instead of $n-1$ in the definition of the sample
variance. The sample variance and the sample mean are not necessarily equal to
the exact values we would get if we knew the corresponding probability
distribution.
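!split
===== Illustration: sample mean and sample variance =====

To make the definitions above concrete, the following small sketch (not part of the original notes) draws $n$ iid samples from a known distribution and evaluates the sample mean and the sample variance with the $1/n$ normalization used here. The uniform distribution, the seed and all variable names are assumptions made only for this illustration.

!bc pycod
# Minimal sketch: sample mean and sample variance (1/n convention) for a
# single "experiment" i consisting of n iid measurements x_{ij}.
import numpy as np

n = 10_000
rng = np.random.default_rng(2024)        # seed chosen only for reproducibility
x = rng.uniform(0.0, 1.0, size=n)        # n iid measurements x_{ij}

xbar_i = np.sum(x)/n                     # sample mean of experiment i
sigma2_i = np.sum((x - xbar_i)**2)/n     # sample variance with 1/n, not 1/(n-1)

# exact values for the uniform distribution on [0,1]: mean 1/2, variance 1/12
print(f"sample mean     = {xbar_i:.5f}   (exact 0.5)")
print(f"sample variance = {sigma2_i:.5f}   (exact {1/12:.5f})")
!ec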
!split
===== Running many measurements =====

!bblock Adding $m$ measurements $i$
With the assumption that the averages $\overline{x}_{i}$ of the individual experiments $i$ are also iid stochastic variables with the same probability distribution $p$,
we defined the total average over the $m$ experiments as
!bt
\[
\overline{X}=\frac{1}{m}\sum_{i} \overline{x}_{i},
\]
!et
and the total variance
!bt
\[
\sigma^2_{m}=\frac{1}{m}\sum_{i} \left( \overline{x}_{i}-\overline{X}\right)^2.
\]
!et
!eblock
These are the quantities we used in showing that if the individual mean values are iid stochastic variables, then in the limit $m\rightarrow \infty$, the distribution for $\overline{X}$ is given by a Gaussian distribution with variance $\sigma^2_m$.

!split
===== Adding more definitions =====

The total sample variance over the $mn$ measurements is defined as
!bt
\[
\sigma^2=\frac{1}{mn}\sum_{i=1}^{m} \sum_{j=1}^{n}\left(x_{ij}-\overline{X}\right)^2.
\]
!et
We have from the equation for $\sigma_m^2$
!bt
\[
\overline{x}_i-\overline{X}=\frac{1}{n}\sum_{j=1}^{n}\left(x_{ij}-\overline{X}\right),
\]
!et
and introducing the centered value $\tilde{x}_{ij}=x_{ij}-\overline{X}$, we can rewrite $\sigma_m^2$ as
!bt
\[
\sigma^2_{m}=\frac{1}{m}\sum_{i} \left( \overline{x}_{i}-\overline{X}\right)^2=\frac{1}{m}\sum_{i=1}^{m}\left[ \frac{1}{n}\sum_{j=1}^{n}\tilde{x}_{ij}\right]^2.
\]
!et

!split
===== Further rewriting =====
We can rewrite the latter in terms of a sum over diagonal elements only and another sum which contains the non-diagonal (covariance) elements
!bt
\begin{align*}
\sigma^2_{m}& =\frac{1}{m}\sum_{i=1}^{m}\left[ \frac{1}{n}\sum_{j=1}^{n}\tilde{x}_{ij}\right]^2 \\
 & = \frac{1}{mn^2}\sum_{i=1}^{m} \sum_{j=1}^{n}\tilde{x}_{ij}^2+\frac{2}{mn^2}\sum_{i=1}^{m} \sum_{j<k}\tilde{x}_{ij}\tilde{x}_{ik}.
\end{align*}
!et
The first term is the uncorrelated contribution to the variance, while the second term contains the covariances between different measurements within the same experiment; it vanishes only if the measurements are uncorrelated.
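!split
===== Illustration: the covariance term for correlated samples =====

As a purely illustrative numerical check of the rewriting above (not part of the original notes), the sketch below generates $m$ correlated series, computes $\sigma^2_m$ directly from the sample means, and compares it with the sum of the diagonal and off-diagonal (covariance) terms. The AR(1) toy process, the parameters and all names are assumptions made for this example only.

!bc pycod
# Sketch: decompose sigma^2_m into a diagonal and a covariance part.
import numpy as np

rng = np.random.default_rng(2024)
m, n, phi = 100, 1000, 0.9               # m experiments, n samples each, AR(1) memory

# generate m correlated time series (AR(1)) as a stand-in for Monte Carlo data
x = np.zeros((m, n))
noise = rng.normal(size=(m, n))
for j in range(1, n):
    x[:, j] = phi*x[:, j-1] + noise[:, j]

xbar = x.mean(axis=1)                    # sample means \overline{x}_i
Xbar = xbar.mean()                       # total average \overline{X}
sigma2_m = np.mean((xbar - Xbar)**2)     # total variance of the means

xt = x - Xbar                            # centered values \tilde{x}_{ij}
diag = np.sum(xt**2)/(m*n**2)                                         # diagonal term
offdiag = np.sum(xt.sum(axis=1)**2 - np.sum(xt**2, axis=1))/(m*n**2)  # 2/(mn^2) sum_{j<k}

print(f"sigma^2_m (direct)          = {sigma2_m:.6f}")
print(f"diagonal + off-diagonal     = {diag + offdiag:.6f}")
print(f"covariance (off-diagonal)   = {offdiag:.6f}  # clearly nonzero here")
!ec

The off-diagonal term is what makes a naive error estimate based on the diagonal term alone too optimistic for correlated data, and it is the motivation for the blocking analysis discussed below.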
-
-
-The modified code here uses the BFGS algorithm but performs now a
-production run and writes to file all average values of the
-energy.
!split
===== Example code from last week =====
!bc pycod
# 2-electron VMC code for 2dim quantum dot with importance sampling
# Using gaussian rng for new positions and Metropolis- Hastings
@@ -933,89 +769,12 @@ print(frame)
!ec

-Note that the _minimize_ function returns the final values for the
-variable $\alpha=x0[0]$ and $\beta=x0[1]$ in the array $x$.
-
-When we have found the minimum, we use these optimal parameters to perform a production run of energies.
-The output is in turn written to file and is used, together with resampling methods like the _blocking method_,
-to obtain the best possible estimate for the standard deviation. The optimal minimum is, even with our guess, rather close to the exact value of $3.0$ a.u.
-
-The "sampling
-functions":"https://github.com/CompPhysics/ComputationalPhysics2/tree/gh-pages/doc/Programs/Resampling"
-can be used to perform both a blocking analysis, or a standard
-bootstrap and jackknife analysis.
-
===== How do we proceed? =====

There are several paths which can be chosen. One is to extend the
-brute force gradient descent method with an adapative stochastic
-gradient. There are several examples of this. A recent approach based
-on "the Langevin equations":"https://arxiv.org/pdf/1805.09416.pdf"
-seems like a promising approach for general and possibly non-convex
-optimization problems.
-
-Here we would like to point out that our next step is now to use the
-optimal values for our variational parameters and use these as inputs
-to a production run. Here we would output values of the energy and
-perform for example a blocking analysis of the results in order to get
-a best possible estimate of the standard deviation.
-
-
+!split
===== Resampling analysis =====

The next step is then to use the above data sets and perform a
-resampling analysis, either using say the Bootstrap method or the
-Blocking method. Since the data will be correlated, we would recommend
-to use the non-iid Bootstrap code here. The theoretical background for these resampling methods is found in the "statistical analysis lecture notes":"http://compphysics.github.io/ComputationalPhysics2/doc/pub/statanalysis/html/statanalysis.html"
-
-Here we have tailored the codes to the output file from the previous example. We present first the bootstrap resampling with non-iid stochastic event.
-
-!bc pycod
-# Common imports
-import os
-
-# Where to save the figures and data files
-DATA_ID = "Results/EnergyMin"
-
-def data_path(dat_id):
-    return os.path.join(DATA_ID, dat_id)
-
-infile = open(data_path("Energies.dat"),'r')
-
-from numpy import std, mean, concatenate, arange, loadtxt, zeros, ceil
-from numpy.random import randint
-from time import time
-
-
-def tsboot(data,statistic,R,l):
-    t = zeros(R); n = len(data); k = int(ceil(float(n)/l));
-    inds = arange(n); t0 = time()
-
-    # time series bootstrap
-    for i in range(R):
-        # construct bootstrap sample from
-        # k chunks of data. The chunksize is l
-        _data = concatenate([data[j:j+l] for j in randint(0,n-l,k)])[0:n];
-        t[i] = statistic(_data)
-
-    # analysis
-    print ("Runtime: %g sec" % (time()-t0)); print ("Bootstrap Statistics :")
-    print ("original           bias      std. error")
-    print ("%8g %14g %15g" % (statistic(data), \
-                              mean(t) - statistic(data), \
-                              std(t) ))
-    return t
-# Read in data
-X = loadtxt(infile)
-# statistic to be estimated. Takes two args.
-# arg1: the data
-def stat(data):
-    return mean(data)
-t = tsboot(X, stat, 2**12, 2**10)
-
-!ec
-
+resampling analysis using the blocking method. The blocking code, based on the article of "Marius Jonsson":"https://journals.aps.org/pre/abstract/10.1103/PhysRevE.98.043304" is given here
!bc pycod
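# The comments in this preamble are an added summary, not part of the original
# code: the automated blocking method of Jonsson estimates the standard error
# of the mean of a correlated time series. A blocking transformation replaces
# the series by averages of neighbouring pairs, halving its length; repeating
# the transformation reduces the correlations step by step, and a chi-squared
# based test selects automatically how many blocking steps are needed before
# the variance of the mean is read off. The input is assumed to be the energy
# samples written to file by the production run discussed above.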