A while ago I was giving lectures at a course about Monte Carlo (MC) simulations in Medical Physics using EGSnrc. One of my lectures was on Variance Reduction Techniques in the context of radiation transport simulations, and I was looking for a good (but simple) example to demonstrate to the students how one can increase the efficiency of MC simulations by orders of magnitude with a modest amount of cleverness. One classic introductory example for MC is the computation of $\pi$: pick random points in a square (`x = 2*rndm() - 1, y = 2*rndm() - 1`, where `rndm()` is a function returning random values uniformly distributed in 0...1), and count the number of points that fall inside the unit circle.
In analogy to the circle, the simplest implementation to pick a random point in a 12D unit sphere is
- Pick 12 random numbers $x_i$ uniformly distributed in $[-1, 1]$.
- Compute $r^2 = \sum x_i^2$.
- If $r^2 \le 1$, deliver $\{x_i\}$.
- Otherwise, go back to the first step.
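The rejection loop above can be sketched as follows. This is a minimal illustration, not the contents of the actual source file: it uses `std::mt19937_64` from the standard library as a stand-in for the RNG, and the names `Point12` and `sampleByRejection` are made up for this example.

```cpp
#include <array>
#include <cstdint>
#include <random>

// A point in 12 dimensions (hypothetical helper type for this sketch).
using Point12 = std::array<double, 12>;

// Rejection sampling: draw candidate points uniformly in the cube [-1, 1]^12
// and return the first one that falls inside the unit sphere.
// `attempts` is incremented once per candidate, accepted or not.
// Assumption: std::mt19937_64 stands in for whatever RNG the real code uses.
Point12 sampleByRejection(std::mt19937_64& rng, std::uint64_t& attempts) {
    std::uniform_real_distribution<double> uniform(-1.0, 1.0);
    Point12 x;
    while (true) {
        ++attempts;
        double r2 = 0.0;
        for (double& xi : x) {
            xi = uniform(rng);
            r2 += xi * xi;
        }
        if (r2 <= 1.0) return x;  // accept; otherwise loop and try again
    }
}
```

Calling this repeatedly with a shared `attempts` counter and dividing the number of delivered points by `attempts` gives an estimate of the sampling efficiency.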
Let's see how this goes. The code is in `simple12d.cpp`. I'm using a Xorshift random number generator (RNG), specifically `xoshiro256+`, which is currently my favorite RNG (passes all known tests, very fast). The RNG code is in `randomGenerator.h` and `randomGenerator.cpp`. So
cmake --build .
./bin/simple12d
It took 306201633 attempts to sample 100000 points in a 12d sphere
Sampling efficiency: 0.000326582
<r^2> = 0.856875
Run time: 5675.12 ms
This is on a Ryzen-7950X CPU. We need to sample ~3,000 random points in a 12d cube to find one that is inside the 12d sphere! This is in stark contrast to 2d (circle, fraction of points inside is $\pi/4 \approx 0.785$).
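As a rough cross-check of the measured efficiency, one can compare the volume of the 12d unit sphere with the volume of the enclosing cube. The sketch below relies on the standard formula $V_{2n} = \pi^n / n!$ for the volume of the unit ball in $2n$ dimensions, which is an addition here and not part of the original text:

$$\frac{V_{12}}{2^{12}} = \frac{\pi^6 / 6!}{4096} \approx \frac{1.3353}{4096} \approx 3.26 \times 10^{-4}$$

That is roughly one accepted point per 3,070 attempts, consistent with the sampling efficiency of 0.000326582 reported by the program.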
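We need to do some math. The probability distribution function (pdf) for points in a 12d unit sphere is

$$p(x_1, \dots, x_{12}) \prod_{i=1}^{12} {\rm d}x_i \propto \Theta\left(1 - \sum_{i=1}^{12} x_i^2\right) \prod_{i=1}^{12} {\rm d}x_i$$

where $\Theta$ is the step function. Grouping the coordinates in pairs and going to polar coordinates in each pair,

$$x_{2i-1} = r_i \cos \phi_i, \quad x_{2i} = r_i \sin \phi_i, \quad i = 1 \cdots 6$$

which results in

$$\Theta\left(1 - \sum_{i=1}^{6} r_i^2\right) \prod_{i=1}^{6} r_i \, {\rm d}r_i \, {\rm d}\phi_i$$

Now apply the trivial variable transformation

$$z_i = \sum_{j=1}^{i} r_j^2$$

which results in

$$\Theta(z_1) \, \Theta(z_2 - z_1) \cdots \Theta(z_6 - z_5) \, \Theta(1 - z_6) \prod_{i=1}^{6} {\rm d}z_i \, {\rm d}\phi_i$$

up to a constant factor. I.e., all we need to do to get the $z_i$ is to pick 6 random numbers in $[0, 1]$, sort them in ascending order, and assign the first to $z_1$, the second to $z_2$, etc., as expected from the Wikipedia article. The sampling algorithm is then as follows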
- Pick 6 random numbers in $[0, 1]$.
- Sort in ascending order and assign the lowest to $z_1$, second lowest to $z_2$, etc.
- Set $r_i = \sqrt{z_i - z_{i-1}}$, where we use the convention $z_0 = 0$.
- Pick 6 azimuthal angles $\phi_i$ randomly in $[0, 2 \pi]$.
- Deliver $x_{2i - 1} = r_i \cos \phi_i,~x_{2i} = r_i \sin \phi_i,~i = 1 \cdots 6$

without the need for any rejections. This is implemented in `smart12d.cpp` and we get
./bin/smart12d
It took 100000 attempts to sample 100000 points in a 12d sphere
Sampling efficiency: 1
<r^2> = 0.85702
Run time: 10.627 ms
Nice. That is 5675 / 10.6 ≈ 535 times faster than the simple method.
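The rejection-free steps listed earlier can be sketched like this. It is only an illustration under stated assumptions: `std::mt19937_64` replaces the RNG used in the real program, the names `Point12` and `sampleDirect` are invented for this example, and the coordinates are stored 0-based, so pair `i` lands in `x[2*i]` and `x[2*i + 1]`.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <random>

// A point in 12 dimensions (hypothetical helper type for this sketch).
using Point12 = std::array<double, 12>;

// Rejection-free sampling of a point uniformly distributed in the 12d unit sphere.
// Assumption: std::mt19937_64 stands in for whatever RNG the real code uses.
Point12 sampleDirect(std::mt19937_64& rng) {
    const double twoPi = 6.283185307179586;
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    // Six uniform numbers sorted in ascending order give z_1 <= ... <= z_6.
    std::array<double, 6> z;
    for (double& zi : z) zi = uniform(rng);
    std::sort(z.begin(), z.end());

    Point12 x;
    double zPrev = 0.0;                      // convention z_0 = 0
    for (int i = 0; i < 6; ++i) {
        double r = std::sqrt(z[i] - zPrev);  // r_i = sqrt(z_i - z_{i-1})
        zPrev = z[i];
        double phi = twoPi * uniform(rng);   // azimuthal angle of this pair
        x[2 * i]     = r * std::cos(phi);
        x[2 * i + 1] = r * std::sin(phi);
    }
    return x;
}
```

Every call delivers a point, and the squared radius of the result equals the largest of the six sorted numbers, so it never exceeds 1 (up to rounding).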
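I'm computing and reporting the average $\langle r^2 \rangle$ as a sanity check. Is `<r^2> = 0.85702` actually correct? As the pdf for $r$ in 12 dimensions is $r^{11} {\rm d}r$, we have

$$\langle r^2 \rangle = \frac{\int_0^1 r^{13} \, {\rm d}r}{\int_0^1 r^{11} \, {\rm d}r} = \frac{12}{14} = \frac{6}{7} \approx 0.857143$$

so yes, the result is correct (within statistical uncertainty).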
Obviously one can easily generalize the code here to sample random points in a sphere with any even number of dimensions.
We started by talking about picking random points inside a circle and using a rejection method. Obviously one can go to polar coordinates and have a rejection-free method:
- Pick random $r$ in $[0, 1]$ using $r {\rm d}r$ as pdf, i.e., $r = \sqrt{\eta}$ where $\eta$ is a random number uniformly distributed in $[0, 1]$.
- Pick a random angle $\phi$ in $[0, 2 \pi]$.
- Deliver point $(r \cos \phi, r \sin \phi)$.
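For comparison, here is a minimal sketch of the two circle samplers discussed so far: rejection from the enclosing square, and the polar-coordinate method above. The function names `circleRejection` and `circleDirect` are made up for this example, and `std::mt19937_64` is an assumed stand-in for the RNG.

```cpp
#include <cmath>
#include <random>
#include <utility>

// (x, y) coordinates of a point (hypothetical helper type for this sketch).
using Point2 = std::pair<double, double>;

// Rejection: pick points in the square [-1, 1]^2 until one lands in the unit circle.
Point2 circleRejection(std::mt19937_64& rng) {
    std::uniform_real_distribution<double> uniform(-1.0, 1.0);
    while (true) {
        double x = uniform(rng), y = uniform(rng);
        if (x * x + y * y <= 1.0) return {x, y};
    }
}

// Direct: r = sqrt(eta) has pdf proportional to r dr; phi is uniform in [0, 2 pi).
Point2 circleDirect(std::mt19937_64& rng) {
    const double twoPi = 6.283185307179586;
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double r = std::sqrt(uniform(rng));
    double phi = twoPi * uniform(rng);
    return {r * std::cos(phi), r * std::sin(phi)};
}
```

Both functions deliver points with the same distribution; they differ only in how much work is done per point (on average about 1.27 candidate points for rejection versus one `sqrt`, one `cos` and one `sin` for the direct method).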
But is this faster? `circle.cpp` has several methods for picking random points in a (unit) circle. We get
./bin/circle
===================== Rejection(1000000 points)
<r2> = 0.500039
Time: 8.54295 ms
===================== Direct1(1000000 points)
<r2> = 0.499589
Time: 27.7229 ms
===================== Direct2(1000000 points)
<r2> = 0.50024
Time: 9.45063 ms
===================== Direct3(1000000 points)
<r2> = 0.499968
Time: 10.5975 ms
Clearly no. Rejection is still fastest. `Direct1` is the above algorithm, and it is more than 3 times slower (despite hardware implementations for `sqrt` and `cos/sin`). `Direct2` uses a different method for obtaining the cosine and sine of the azimuthal angle (see the `randomAzimuth()` function in `circle.cpp`). This is much faster than `Direct1`, but still slower than rejection. `Direct3` uses a trick to avoid the evaluation of `sqrt`: one picks 2 random numbers in $[0, 1]$ and uses the larger of the two for $r$. `max(rndm(), rndm())` used to be faster than `sqrt(rndm())`, but the `sqrt` hardware implementation on the Ryzen is apparently fast enough to beat `max(rndm(), rndm())`.
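To make the two tricks concrete, here is a hedged sketch. `radiusByMax` shows the `max` trick: the probability that the larger of two uniform numbers is below $r$ is $r^2$, which is exactly the distribution of `sqrt(rndm())`. `azimuthWithoutTrig` is a hypothetical example of a trig-free azimuth based on the double-angle formulas; the text here does not show what `randomAzimuth()` actually does, so this should not be read as its implementation. `std::mt19937_64` is again an assumed stand-in for the RNG.

```cpp
#include <algorithm>
#include <random>
#include <utility>

// Radius with pdf proportional to r dr, without a sqrt:
// P(max(u1, u2) <= r) = r^2, the same distribution as sqrt(u).
double radiusByMax(std::mt19937_64& rng) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double u1 = uniform(rng), u2 = uniform(rng);
    return std::max(u1, u2);
}

// HYPOTHETICAL trig-free azimuth (not necessarily what randomAzimuth() does):
// take a point (x, y) uniform in the unit circle at angle theta and return
// (cos 2*theta, sin 2*theta) computed from the double-angle formulas.
std::pair<double, double> azimuthWithoutTrig(std::mt19937_64& rng) {
    std::uniform_real_distribution<double> uniform(-1.0, 1.0);
    while (true) {
        double x = uniform(rng), y = uniform(rng);
        double x2 = x * x, y2 = y * y, r2 = x2 + y2;
        if (r2 > 0.0 && r2 <= 1.0) {
            return {(x2 - y2) / r2, 2.0 * x * y / r2};
        }
    }
}
```

Since theta is uniform on the circle, 2*theta is uniform as well, so the returned pair is the cosine and sine of a uniformly distributed angle, obtained with one division-style normalization instead of `cos` and `sin` calls.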
What about 3D?
./bin/sphere
===================== Rejection(1000000 points)
<r2> = 0.600087
Time: 23.8875 ms
===================== Direct1(1000000 points)
<r2> = 0.600181
Time: 20.7947 ms
===================== Direct2(1000000 points)
<r2> = 0.600106
Time: 13.8537 ms
Here rejection becomes slower than both direct methods. The fastest method (on my CPU) is this:
- Set `r = max(rndm(), max(rndm(), rndm()))` (maximum of 3 random numbers uniformly distributed in $[0, 1]$). The resulting pdf for `r` is $r^2 {\rm d}r$.
- Set `cost = 2*rndm()-1, sint = sqrt(1 - cost*cost)` (cosine and sine of the polar angle).
- Pick `cphi, sphi` using `randomAzimuth()`.
- Deliver 3D point `(r*sint*cphi, r*sint*sphi, r*cost)`.
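The 3D steps above can be sketched as follows. This is an illustration under assumptions: `std::mt19937_64` replaces the RNG, the names `Point3` and `sphereDirect` are invented here, and plain `cos`/`sin` of a uniform angle are used in place of `randomAzimuth()`, whose implementation is not shown in the text.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <random>

// (x, y, z) coordinates of a point (hypothetical helper type for this sketch).
using Point3 = std::array<double, 3>;

// Rejection-free sampling of a point uniformly distributed in the 3d unit sphere.
Point3 sphereDirect(std::mt19937_64& rng) {
    const double twoPi = 6.283185307179586;
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    // Radius: the maximum of 3 uniform numbers has pdf proportional to r^2 dr.
    double u1 = uniform(rng), u2 = uniform(rng), u3 = uniform(rng);
    double r = std::max(u1, std::max(u2, u3));

    // Polar angle: cos(theta) uniform in [-1, 1), sin(theta) >= 0.
    double cost = 2.0 * uniform(rng) - 1.0;
    double sint = std::sqrt(1.0 - cost * cost);

    // Azimuth: plain cos/sin as a stand-in for randomAzimuth().
    double phi = twoPi * uniform(rng);
    double cphi = std::cos(phi), sphi = std::sin(phi);

    return {r * sint * cphi, r * sint * sphi, r * cost};
}
```

With this radius distribution the expected value of $r^2$ is $3/5$, which matches the `<r2>` values of about 0.6 printed by the program.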