Skip to content

Commit

Permalink
docs tutorials merged to master by hand
Browse files Browse the repository at this point in the history
  • Loading branch information
ahbarnett committed Jun 5, 2020
1 parent d8f49f3 commit 5b92daf
Show file tree
Hide file tree
Showing 9 changed files with 275 additions and 48 deletions.
4 changes: 2 additions & 2 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -75,9 +75,9 @@ only once.
matlab
pythoninterface
juliainterface
examples
tut
related
issues
trouble
users
ackn
refs
Expand Down
46 changes: 0 additions & 46 deletions docs/issues.rst

This file was deleted.

14 changes: 14 additions & 0 deletions docs/numft.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
.. _numft:

Numerical computation of continuous Fourier transforms of functions
===================================================================

It is common to assume that the FFT is the right tool to compute
*continuous Fourier transforms*, but this is not so, unless you are
content with very poor accuracy.
The reason is that the FFT applies to equispaced data samples,
that is, a quadrature scheme with only equispaced nodes.




File renamed without changes.
Binary file added docs/pics/fser1d.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/pics/fser2d.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
136 changes: 136 additions & 0 deletions docs/serieseval.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,136 @@
.. _serieseval:

Fast evaluation of Fourier series at arbitrary points
=====================================================

This is a simple demo of using type 2 NUFFTs to evaluate a given
1D and then 2D Fourier series rapidly (close to optimal scaling)
at arbitrary points.
For conciseness of code, we use the MATLAB interface.
The series we use are vaguely boring random ones relating
to Gaussian random fields---please
insert Fourier series coefficient vectors you care about.

1D Fourier series
~~~~~~~~~~~~~~~~~~

Let our periodic domain be $[0,L)$, so that we get to see how to
rescale from the fixed period of $2\pi$ in FINUFFT.
We set up a random Fourier series with Gaussian decaying coefficients
(this in fact is a sample from a stationary *Gaussian random field*,
or *Gaussian process* with covariance kernel itself a periodized Gaussian):

.. code-block:: matlab
L = 10; % period
kmax = 500; % bandlimit
k = -kmax:kmax-1; % freq indices (negative up through positive mode ordering)
N = 2*kmax; % # modes
rng(0); % make some convenient Fourier coefficients...
fk = randn(N,1)+1i*randn(N,1); % iid random complex data, column vec
k0 = 100; % a freq scale parameter
fk = fk .* exp(-(k/k0).^2).'; % scale the amplitudes, kills high freqs
Now we use a 1D type 2 to evaluate this series at a large number of
points very quickly:

.. code-block:: matlab
M = 1e6; x = L*rand(1,M); % make random target points in [0,L)
tol = 1e-12;
x_scaled = x * (2*pi/L); % don't forget to scale to 2pi-periodic!
tic; c = finufft1d2(x_scaled,+1,tol,fk); toc % evaluate Fourier series at x
::

Elapsed time is 0.026038 seconds.

Compare this to a naive calculation (which serves to remind us exactly what sum FINUFFT approximates):

.. code-block:: matlab
tic; cn = 0*c; for m=k, cn = cn + fk(m+N/2+1)*exp(1i*m*x_scaled.'); end, toc
norm(c-cn,inf)
::
Elapsed time is 11.679265 seconds.
ans =
1.76508266507874e-11

Thus, with only $10^3$ modes, FINUFFT is 500 times faster than naive
multithreaded summation. (Naive summation with reversed loop order is even worse, taking 29 seconds.) We plot $1\%$ of the resulting values and get the smooth but randomly-sampled graph below:

.. code-block:: matlab
Mp = 1e4; % how many pts to plot
jplot = 1:Mp; % indices to plot
plot(x(jplot),real(c(jplot)),'b.'); axis tight; xlabel('x'); ylabel('Re f(x)');
.. image:: pics/fser1d.png
:width: 90%

See the full code `matlab/examples/serieseval1d.m <https://github.com/flatironinstitute/finufft/blob/master/matlab/examples/serieseval1d.m>`_ which also shows
how to evaluate the same series on a uniform grid via the plain FFT.

2D Fourier series
~~~~~~~~~~~~~~~~~

Since we already know how to rescale to periodicity $L$, let's revert to
the natural period and work in the square $[0,2\pi)^2$. We create a random
2D Fourier series, which happens to be for a Gaussian random field with
(doubly periodized) isotropic Matérn kernel of arbitrary power:

.. code-block:: matlab
kmax = 500; % bandlimit per dim
k = -kmax:kmax-1; % freq indices in each dim
N1 = 2*kmax; N2 = N1; % # modes in each dim
[k1 k2] = ndgrid(k,k); % grid of freq indices
rng(0); fk = randn(N1,N2)+1i*randn(N1,N2); % iid random complex modes
k0 = 30; % freq scale parameter
alpha = 3.7; % power; alpha>2 to converge in L^2
fk = fk .* ((k1.^2+k2.^2)/k0^2 + 1).^(-alpha/2); % sqrt of spectral density
We then simply call a 2D type 2 to evaluate this double series at whatever
target points you like:

.. code-block:: matlab
M = 1e6; x = 2*pi*rand(1,M); y = 2*pi*rand(1,M); % random targets in square
tol = 1e-9;
tic; c = finufft2d2(x,y,+1,tol,fk); toc % evaluate Fourier series at (x,y)'s
::

Elapsed time is 0.092743 seconds.
1 million modes to 1 million points in 92 milliseconds on a laptop is decent.
We check the math (using a relative error measure) at just one (generic) point:

.. code-block:: matlab
j = 1; % do math check on 1st target...
c1 = sum(sum(fk.*exp(1i*(k1*x(j)+k2*y(j)))));
abs(c1-c(j)) / norm(c,inf)
::
ans =
2.30520830208365e-10
Finally we use a colored scatter plot to show the first $10\%$ of the points in the square, and see samples of the underlying random field (reminiscent of WMAP microwave background data):

.. code-block:: matlab
jplot = 1:1e5; % indices to plot
scatter(x(jplot),y(jplot),1.0,real(c(jplot)),'filled'); axis tight equal
xlabel('x'); ylabel('y'); colorbar; title('Re f(x,y)');
.. image:: pics/fser2d.png
:width: 70%

See the full code `matlab/examples/serieseval2d.m <https://github.com/flatironinstitute/finufft/blob/master/matlab/examples/serieseval2d.m>`_.

For background on Gaussian random fields, aka, Gaussian processes,
see, eg, C. E. Rasmussen & C. K. I. Williams, *Gaussian Processes for Machine Learning*, the MIT Press, 2006. http://www.GaussianProcess.org/gpml
102 changes: 102 additions & 0 deletions docs/trouble.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
.. _trouble:

Troubleshooting
===============

If you are having issues (segfaults, slowness, "wrong" answers, etc),
there is a high probability it is something we already know about, so
please first read all of the advice below in the section relevant
to your problem:
math, speed, or segfaults.


Mathematical "issues" and advice
********************************

- When requested tolerance is around $10^{-14}$ or less in double-precision,
or $10^{-6}$ or less in single-precision, it
will most likely be impossible for FINUFFT (or any other NUFFT library)
to achieve this, due to inevitable round-off error.
Here, "error" is to be understood relative to the norm of the returned vector
of values.
This is especially true when there is a large number of modes in
any single dimension ($N_1$, $N_2$ or $N_3$), since this empirically
scales the round-off error (fortunately, round-off does not appear to scale
with the total $N$ or $M$).
Such round-off error is analysed and measured in Section 4.2 of our `SISC paper <https://arxiv.org/abs/1808.06736>`_.

- If you request a tolerance that FINUFFT knows it cannot achieve, it will return ``ier=1`` after performing transforms as accurately as it can. However, the status ``ier=0`` does not imply that the requested accuracy *was* achieved, merely that parameters were chosen to give this estimated accuracy, if possible. As our SISC paper shows, for typical situations, relative $\ell_2$ errors match the requested tolerances over a wide range.
Users should always check *convergence* (by, for instance, varying ``tol`` and measuring any changes in their results); this is generally true in scientific computing.

- The type 1 and type 2 transforms are adjoints but **not inverses of each other** (unlike in the plain FFT case, where, up to a constant $N$, the adjoint is the inverse). Therefore, if you are not getting the expected answers, please check that you have not made this assumption. In the :ref:`tutorials <tut>` we will add examples showing how to invert the NUFFT; also see `NFFT3 inverse transforms <https://www-user.tu-chemnitz.de/~potts/nfft/infft.php>`_.


Speed issues and advice
***********************

If FINUFFT is slow (eg, less than $10^6$ nonuniform points per second), here is some advice:

- Try printing debug output to see step-by-step progress by FINUFFT.
Set ``opts.debug`` to 1 or 2 and look at the timing information.

- Try reducing the number of threads (recall as of v1.2 this is controlled externally to FINUFFT), perhaps down to 1 thread, to make sure you are not having collisions between threads. The latter is possible if large problems are run with a large number of (say more than 30) threads. We added the constant ``MAX_USEFUL_NTHREADS`` in ``include/defs.h`` to catch this case. Another corner case causing slowness is very many repetitions of small problems; see ``test/manysmallprobs`` which exceeds $10^7$ points/sec with one thread via the guru interface, but can get ridiculously slower with many threads; see https://github.com/flatironinstitute/finufft/issues/86

- Try setting a crude tolerance, eg ``tol=1e-3``. How many digits do you actually need? This has a big effect in higher dimensions, since the number of flops scales like $(\log 1/\epsilon)^d$, but not quite as big an effect as this scaling would suggest, because in higher dimensions the flops/RAM ratio is higher.

- If type 3, make sure your choice of points does not have a massive *space-bandwidth product* (ie, product of the volumes of the smallest $d$-dimension axes-aligned cuboids enclosing the nonuniform source and the target points); see Remark 5 of our `SISC paper <https://arxiv.org/abs/1808.06736>`_.
In short, if the spreads of $\mathbf{x}_j$ and of $\mathbf{s}_k$ are both big, you may be in trouble.
This can lead to enormous fine grids and hence slow FFTs. Set ``opts.debug=1`` to examine the ``nf1``, etc, fine grid sizes being chosen, and the array allocation sizes. If they are huge, consider direct summation, as discussed :ref:`here <need>`.

- The timing of the first FFTW call is complicated, depending on the FFTW flags (plan mode) used. This is really an
`FFTW planner flag usage <http://www.fftw.org/fftw3_doc/Planner-Flags.html#Planner-Flags>`_ question.
Such issues are known, and modes benchmarked in other documentation, eg for 2D in `poppy <https://poppy-optics.readthedocs.io/en/stable/fft_optimization.html>`_. In short, using more expensive FFTW planning modes like ``FFTW_MEASURE`` can give better performance for repeated FFTW calls, but be **much** more expensive in the first (planning) call. This is why we choose ``FFTW_ESTIMATE`` as our default ``opts.fftw`` option.

- Make sure you did not override ``opts.spread_sort``, which if set to zero
does no sorting, which can give very slow RAM access if the nonuniform points
are ordered poorly (eg randomly) in larger 2D or 3D problems.

- Are you calling the simple interface a huge number of times for small problems, but these tasks have something in common (number of modes, or locations of nonuniform points)? If so, try the "many vector" or guru interface, which removes overheads in repeated FFTW plan look-up, and in bin-sorting. They can be 10-100x faster.


Crash (segfault) issues and advice
****************************************

- The most common problem is passing in pointers to the wrong size of object,
eg, single vs double precision, or int32 vs int64. Equivalently, make sure you linked against the correct precision version of the library (``-lfinufft`` vs ``-lfinufftf``). Also make sure, if you changed the precision when compiling FINUFFT, that you did at least ``make objclean`` otherwise you'll get mixed precision object files and a guaranteed segfault!

- Currently you cannot link to both precisions at the same time, because
the namespaces will collide. Did you do this?

- If you use C++/C/Fortran and tried to change options, did you forget to call ``finufft_default_opts`` first?

- To isolate where a crash is occurring, set ``opts.debug`` to 1 or 2, and check the text output of the various stages. With a debug setting of 2 or above, when ``ntrans>1`` a large amount of text can be generated.
To diagnose problems with the spread/interpolation stage, similarly setting ``opts.spread_debug`` to 1 or 2 will print even more output. Here the setting 2 generates a large amount of output even for a single transform.




Other known issues with library and interfaces
**********************************************

The master list is the github issues for the project page,
https://github.com/flatironinstitute/finufft/issues

A secondary and more speculative list is in the ``TODO`` text file.

Please look through those issue topics, since sometimes workarounds
are discussed before the problem is fixed in a release.



Bug reports
***********

If you think you have found a new bug, and have read the above, please
file a new issue on the github project page,
https://github.com/flatironinstitute/finufft/issues .
Include a minimal code which reproduces the bug, along with
details about your machine, operating system, compiler, version of FINUFFT, and output with ``opts.debug`` at least 1.
If you have a known bug and have ideas, please add to the comments for that issue.

You may also contact Alex Barnett (``abarnett``
at-sign ``flatironinstitute.org``) with FINUFFT in the subject line.
21 changes: 21 additions & 0 deletions docs/tut.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
.. _tut:

Tutorials and application demos
==================================

The following are instructive demos of using FINUFFT for a variety of
spectrally-related tasks in
numerical computing and signal/image processing. We will slowly grow the
list (contact us to add one).
For conciseness of code, and ease of writing, they are currently
in MATLAB (tested on R2017a or later).

.. toctree::

serieseval
numft
peripois2d

For further applications, see :ref:`references <refs>`, and these
`PDF slides <http://users.flatironinstitute.org/~ahb/notes/wam19.pdf>`_.

0 comments on commit 5b92daf

Please sign in to comment.