diff --git a/docs/index.rst b/docs/index.rst
index 5c637f9d3..b28a1c6bc 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -75,9 +75,9 @@ only once.
    matlab
    pythoninterface
    juliainterface
-   examples           
+   tut          
    related
-   issues
+   trouble
    users
    ackn
    refs
diff --git a/docs/issues.rst b/docs/issues.rst
deleted file mode 100644
index 922bcbddd..000000000
--- a/docs/issues.rst
+++ /dev/null
@@ -1,46 +0,0 @@
-Known Issues
-============
-
-One should also check the github issues for the project page,
-https://github.com/flatironinstitute/finufft/issues
-
-Also see notes in the ``TODO`` file.
-
-Issues with library
-*******************
-
-- When requested accuracy is ``1e-14`` or less, it is sometimes not possible to match this, especially when there are a large number of input and/or output points. This is believed to be unavoidable round-off error.
-
-- Currently in Mac OSX, ``make lib`` fails to make the shared object library (.so).
-
-- The timing of the first FFTW call is complicated, depending on whether FFTW_ESTIMATE (the default) or FFTW_MEASURE is used. Such issues are known, and discussed in other documentation, eg https://pythonhosted.org/poppy/fft_optimization.html .
-  We would like to find a way of pre-storing some Intel-specific FFTW plans (as MATLAB does) to avoid the large FFTW planning times.
-  
-- Currently, a single library name is used for single- and multi-threaded versions. Thus, i) you need to ``make clean`` before changing such make options, and ii) if you wish to maintain multiple such versions you need to move them around and maintain them yourself, eg by duplicating the directory.
-
-- The overhead for small problem sizes (<10000 data points) is too high, due to things such as the delay in FFTW looking up pre-stored wisdom. A unified advanced interface with a plan stage is in the works.
-    
-  
-Issues with interfaces
-**********************
-
-- MATLAB, octave and python cannot exceed input or output data sizes of 2^31.
-
-- MATLAB, octave and python interfaces do not handle single precision.
-    
-- Fortran interface does not allow control of options, nor data sizes exceeding 2^31.
-  
-
-
-Bug reports
-***********
-  
-If you think you have found a bug, please
-file an issue on the github project page,
-https://github.com/flatironinstitute/finufft/issues .
-Include a minimal code which reproduces the bug, along with
-details about your machine, operating system, compiler, and version of FINUFFT.
-
-You may also contact Alex Barnett (``abarnett``
-at-sign ``flatironinstitute.org``) with FINUFFT in the subject line.
-
diff --git a/docs/numft.rst b/docs/numft.rst
new file mode 100644
index 000000000..cbf063de3
--- /dev/null
+++ b/docs/numft.rst
@@ -0,0 +1,14 @@
+.. _numft:
+
+Numerical computation of continuous Fourier transforms of functions
+===================================================================
+
+It is common to assume that the FFT is the right tool to compute
+*continuous Fourier transforms*, but this is not so, unless you are
+content with very poor accuracy.
+The reason is that the FFT applies to equispaced data samples,
+that is, a quadrature scheme with only equispaced nodes.
+
+
+
+
diff --git a/docs/examples.rst b/docs/peripois2d.rst
similarity index 100%
rename from docs/examples.rst
rename to docs/peripois2d.rst
diff --git a/docs/pics/fser1d.png b/docs/pics/fser1d.png
new file mode 100644
index 000000000..d26f56914
Binary files /dev/null and b/docs/pics/fser1d.png differ
diff --git a/docs/pics/fser2d.png b/docs/pics/fser2d.png
new file mode 100644
index 000000000..4f161e7b5
Binary files /dev/null and b/docs/pics/fser2d.png differ
diff --git a/docs/serieseval.rst b/docs/serieseval.rst
new file mode 100644
index 000000000..08cd484ff
--- /dev/null
+++ b/docs/serieseval.rst
@@ -0,0 +1,136 @@
+.. _serieseval:
+
+Fast evaluation of Fourier series at arbitrary points
+=====================================================
+
+This is a simple demo of using type 2 NUFFTs to evaluate a given
+1D and then 2D Fourier series rapidly (close to optimal scaling)
+at arbitrary points.
+For conciseness of code, we use the MATLAB interface.
+The series we use are vaguely boring random ones relating
+to Gaussian random fields---please
+insert Fourier series coefficient vectors you care about.
+
+1D Fourier series
+~~~~~~~~~~~~~~~~~~
+
+Let our periodic domain be $[0,L)$, so that we get to see how to
+rescale from the fixed period of $2\pi$ in FINUFFT.
+We set up a random Fourier series with Gaussian decaying coefficients
+(this in fact is a sample from a stationary *Gaussian random field*,
+or *Gaussian process* with covariance kernel itself a periodized Gaussian):
+
+.. code-block:: matlab
+
+  L = 10;             % period
+  kmax = 500;         % bandlimit
+  k = -kmax:kmax-1;   % freq indices (negative up through positive mode ordering)
+  N = 2*kmax;         % # modes
+  rng(0);             % make some convenient Fourier coefficients...
+  fk = randn(N,1)+1i*randn(N,1);       % iid random complex data, column vec
+  k0 = 100;           % a freq scale parameter
+  fk = fk .* exp(-(k/k0).^2).';        % scale the amplitudes, kills high freqs
+
+Now we use a 1D type 2 to evaluate this series at a large number of
+points very quickly:
+
+.. code-block:: matlab
+
+  M = 1e6; x = L*rand(1,M);        % make random target points in [0,L)
+  tol = 1e-12;
+  x_scaled = x * (2*pi/L);         % don't forget to scale to 2pi-periodic!
+  tic; c = finufft1d2(x_scaled,+1,tol,fk); toc   % evaluate Fourier series at x
+
+::
+
+  Elapsed time is 0.026038 seconds.
+
+Compare this to a naive calculation (which serves to remind us exactly what sum FINUFFT approximates):
+
+.. code-block:: matlab
+
+  tic; cn = 0*c; for m=k, cn = cn + fk(m+N/2+1)*exp(1i*m*x_scaled.'); end, toc
+  norm(c-cn,inf)
+
+::
+                
+  Elapsed time is 11.679265 seconds.
+  ans =
+        1.76508266507874e-11
+
+Thus, with only $10^3$ modes, FINUFFT is 500 times faster than naive
+multithreaded summation. (Naive summation with reversed loop order is even worse, taking 29 seconds.) We plot $1\%$ of the resulting values and get the smooth but randomly-sampled graph below:
+
+.. code-block:: matlab
+
+  Mp = 1e4;               % how many pts to plot
+  jplot = 1:Mp;           % indices to plot
+  plot(x(jplot),real(c(jplot)),'b.'); axis tight; xlabel('x'); ylabel('Re f(x)');
+
+.. image:: pics/fser1d.png
+   :width: 90%
+ 
+See the full code `matlab/examples/serieseval1d.m <https://github.com/flatironinstitute/finufft/blob/master/matlab/examples/serieseval1d.m>`_ which also shows
+how to evaluate the same series on a uniform grid via the plain FFT.
+           
+2D Fourier series
+~~~~~~~~~~~~~~~~~
+
+Since we already know how to rescale to periodicity $L$, let's revert to
+the natural period and work in the square $[0,2\pi)^2$. We create a random
+2D Fourier series, which happens to be for a Gaussian random field with
+(doubly periodized) isotropic Matérn kernel of arbitrary power:
+
+.. code-block:: matlab
+
+  kmax = 500;                   % bandlimit per dim
+  k = -kmax:kmax-1;             % freq indices in each dim
+  N1 = 2*kmax; N2 = N1;         % # modes in each dim
+  [k1 k2] = ndgrid(k,k);        % grid of freq indices
+  rng(0); fk =  randn(N1,N2)+1i*randn(N1,N2);         % iid random complex modes
+  k0 = 30;                     % freq scale parameter
+  alpha = 3.7;                 % power; alpha>2 to converge in L^2
+  fk = fk .* ((k1.^2+k2.^2)/k0^2 + 1).^(-alpha/2);    % sqrt of spectral density
+
+We then simply call a 2D type 2 to evaluate this double series at whatever
+target points you like:
+
+.. code-block:: matlab
+
+   M = 1e6; x = 2*pi*rand(1,M); y = 2*pi*rand(1,M);   % random targets in square
+   tol = 1e-9;
+   tic; c = finufft2d2(x,y,+1,tol,fk); toc   % evaluate Fourier series at (x,y)'s
+
+::
+
+  Elapsed time is 0.092743 seconds.
+     
+1 million modes to 1 million points in 92 milliseconds on a laptop is decent.
+We check the math (using a relative error measure) at just one (generic) point:
+
+.. code-block:: matlab
+
+  j = 1;                        % do math check on 1st target...
+  c1 = sum(sum(fk.*exp(1i*(k1*x(j)+k2*y(j)))));
+  abs(c1-c(j)) / norm(c,inf)
+                
+::
+                    
+  ans =
+        2.30520830208365e-10
+            
+Finally we use a colored scatter plot to show the first $10\%$ of the points in the square, and see samples of the underlying random field (reminiscent of WMAP microwave background data):
+
+.. code-block:: matlab
+
+  jplot = 1:1e5;           % indices to plot
+  scatter(x(jplot),y(jplot),1.0,real(c(jplot)),'filled'); axis tight equal
+  xlabel('x'); ylabel('y'); colorbar; title('Re f(x,y)');
+                
+.. image:: pics/fser2d.png
+   :width: 70%
+ 
+See the full code `matlab/examples/serieseval2d.m <https://github.com/flatironinstitute/finufft/blob/master/matlab/examples/serieseval2d.m>`_.
+
+For background on Gaussian random fields, aka, Gaussian processes,
+see, eg, C. E. Rasmussen & C. K. I. Williams, *Gaussian Processes for Machine Learning*, the MIT Press, 2006. http://www.GaussianProcess.org/gpml
diff --git a/docs/trouble.rst b/docs/trouble.rst
new file mode 100644
index 000000000..fdcbd64b0
--- /dev/null
+++ b/docs/trouble.rst
@@ -0,0 +1,102 @@
+.. _trouble:
+
+Troubleshooting
+===============
+
+If you are having issues (segfaults, slowness, "wrong" answers, etc),
+there is a high probability it is something we already know about, so
+please first read all of the advice below in the section relevant
+to your problem:
+math, speed, or segfaults.
+
+
+Mathematical "issues" and advice
+********************************
+
+- When requested tolerance is around $10^{-14}$ or less in double-precision,
+  or $10^{-6}$ or less in single-precision, it
+  will most likely be impossible for FINUFFT (or any other NUFFT library)
+  to achieve this, due to inevitable round-off error.
+  Here, "error" is to be understood relative to the norm of the returned vector
+  of values.
+  This is especially true when there is a large number of modes in
+  any single dimension ($N_1$, $N_2$ or $N_3$), since this empirically
+  scales the round-off error (fortunately, round-off does not appear to scale
+  with the total $N$ or $M$).
+  Such round-off error is analysed and measured in Section 4.2 of our `SISC paper <https://arxiv.org/abs/1808.06736>`_.
+
+- If you request a tolerance that FINUFFT knows it cannot achieve, it will return ``ier=1`` after performing transforms as accurately as it can. However, the status ``ier=0`` does not imply that the requested accuracy *was* achieved, merely that parameters were chosen to give this estimated accuracy, if possible. As our SISC paper shows, for typical situations, relative $\ell_2$ errors match the requested tolerances over a wide range.
+  Users should always check *convergence* (by, for instance, varying ``tol`` and measuring any changes in their results); this is generally true in scientific computing.
+
+- The type 1 and type 2 transforms are adjoints but **not inverses of each other** (unlike in the plain FFT case, where, up to a constant $N$, the adjoint is the inverse). Therefore, if you are not getting the expected answers, please check that you have not made this assumption. In the :ref:`tutorials <tut>` we will add examples showing how to invert the NUFFT; also see `NFFT3 inverse transforms <https://www-user.tu-chemnitz.de/~potts/nfft/infft.php>`_.
+
+
+Speed issues and advice
+***********************
+
+If FINUFFT is slow (eg, less than $10^6$ nonuniform points per second), here is some advice:
+
+- Try printing debug output to see step-by-step progress by FINUFFT.
+  Set ``opts.debug`` to 1 or 2 and look at the timing information.
+
+- Try reducing the number of threads (recall as of v1.2 this is controlled externally to FINUFFT), perhaps down to 1 thread, to make sure you are not having collisions between threads. The latter is possible if large problems are run with a large number of (say more than 30) threads. We added the constant ``MAX_USEFUL_NTHREADS`` in ``include/defs.h`` to catch this case. Another corner case causing slowness is very many repetitions of small problems; see ``test/manysmallprobs`` which exceeds $10^7$ points/sec with one thread via the guru interface, but can get ridiculously slower with many threads; see https://github.com/flatironinstitute/finufft/issues/86
+
+- Try setting a crude tolerance, eg ``tol=1e-3``. How many digits do you actually need? This has a big effect in higher dimensions, since the number of flops scales like $(\log 1/\epsilon)^d$, but not quite as big an effect as this scaling would suggest, because in higher dimensions the flops/RAM ratio is higher.
+
+- If type 3, make sure your choice of points does not have a massive *space-bandwidth product* (ie, product of the volumes of the smallest $d$-dimension axes-aligned cuboids enclosing the nonuniform source and the target points); see Remark 5 of our `SISC paper <https://arxiv.org/abs/1808.06736>`_.
+  In short, if the spreads of $\mathbf{x}_j$ and of $\mathbf{s}_k$ are both big, you may be in trouble.
+  This can lead to enormous fine grids and hence slow FFTs. Set ``opts.debug=1`` to examine the ``nf1``, etc, fine grid sizes being chosen, and the array allocation sizes. If they are huge, consider direct summation, as discussed :ref:`here <need>`.
+  
+- The timing of the first FFTW call is complicated, depending on the FFTW flags (plan mode) used. This is really an
+  `FFTW planner flag usage <http://www.fftw.org/fftw3_doc/Planner-Flags.html#Planner-Flags>`_ question.
+  Such issues are known, and modes benchmarked in other documentation, eg for 2D in `poppy <https://poppy-optics.readthedocs.io/en/stable/fft_optimization.html>`_. In short, using more expensive FFTW planning modes like ``FFTW_MEASURE`` can give better performance for repeated FFTW calls, but be **much** more expensive in the first (planning) call. This is why we choose ``FFTW_ESTIMATE`` as our default ``opts.fftw`` option.
+
+- Make sure you did not override ``opts.spread_sort``, which if set to zero
+  does no sorting, which can give very slow RAM access if the nonuniform points
+  are ordered poorly (eg randomly) in larger 2D or 3D problems.
+
+- Are you calling the simple interface a huge number of times for small problems, but these tasks have something in common (number of modes, or locations of nonuniform points)? If so, try the "many vector" or guru interface, which removes overheads in repeated FFTW plan look-up, and in bin-sorting. They can be 10-100x faster.
+
+
+Crash (segfault) issues and advice
+****************************************
+
+- The most common problem is passing in pointers to the wrong size of object,
+  eg, single vs double precision, or int32 vs int64. Equivalently, make sure you linked against the correct precision version of the library (``-lfinufft`` vs ``-lfinufftf``). Also make sure, if you changed the precision when compiling FINUFFT, that you did at least ``make objclean`` otherwise you'll get mixed precision object files and a guaranteed segfault!
+
+- Currently you cannot link to both precisions at the same time, because
+  the namespaces will collide. Did you do this?
+
+- If you use C++/C/Fortran and tried to change options, did you forget to call ``finufft_default_opts`` first?
+  
+- To isolate where a crash is occurring, set ``opts.debug`` to 1 or 2, and check the text output of the various stages. With a debug setting of 2 or above, when ``ntrans>1`` a large amount of text can be generated.
+  To diagnose problems with the spread/interpolation stage, similarly setting ``opts.spread_debug`` to 1 or 2 will print even more output. Here the setting 2 generates a large amount of output even for a single transform.
+
+
+
+  
+Other known issues with library and interfaces
+**********************************************
+
+The master list is the github issues for the project page,
+https://github.com/flatironinstitute/finufft/issues
+
+A secondary and more speculative list is in the ``TODO`` text file.
+
+Please look through those issue topics, since sometimes workarounds
+are discussed before the problem is fixed in a release.
+
+
+
+Bug reports
+***********
+  
+If you think you have found a new bug, and have read the above, please
+file a new issue on the github project page,
+https://github.com/flatironinstitute/finufft/issues .
+Include a minimal code which reproduces the bug, along with
+details about your machine, operating system, compiler, version of FINUFFT, and output with ``opts.debug`` at least 1.
+If you have a known bug and have ideas, please add to the comments for that issue.
+
+You may also contact Alex Barnett (``abarnett``
+at-sign ``flatironinstitute.org``) with FINUFFT in the subject line.
diff --git a/docs/tut.rst b/docs/tut.rst
new file mode 100644
index 000000000..39434cf3b
--- /dev/null
+++ b/docs/tut.rst
@@ -0,0 +1,21 @@
+.. _tut:
+
+Tutorials and application demos
+==================================
+
+The following are instructive demos of using FINUFFT for a variety of
+spectrally-related tasks in
+numerical computing and signal/image processing. We will slowly grow the
+list (contact us to add one).
+For conciseness of code, and ease of writing, they are currently
+in MATLAB (tested on R2017a or later).
+
+.. toctree::
+
+   serieseval
+   numft
+   peripois2d
+
+For further applications, see :ref:`references <refs>`, and these
+`PDF slides <http://users.flatironinstitute.org/~ahb/notes/wam19.pdf>`_.
+