The contents here have been upstreamed to PyTorch in pytorch/pytorch#106211, and further development happens in PyTorch.
The only non-obsolete things in this repository are:
- Documentation issues, which remain relevant and up to date, especially those documenting the differences between NumPy, PyTorch, and our numpy-in-pytorch implementation.
- End-to-end examples in the `e2e/` folder, which demonstrate several worked examples and limitations of our approach.
To test our wrapper, we use two strategies:
- port parts of the numpy test suite
- run several small examples which use NumPy and check that the results are identical to those of the original NumPy.
We run the ported tests in eager mode only, by replacing `import numpy as np` with `import torch_np as np`.
The examples we run in both eager and JIT modes.
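For the eager-mode tests, the swap is just the change of import. A minimal sketch (the array and the check below are illustrative, not taken from the test suite):

```python
# Eager-mode testing: the only change to an existing NumPy test is the import.
# import numpy as np        # original
import torch_np as np       # wrapper under test

a = np.arange(12).reshape(3, 4)
assert (a.sum(axis=0) == np.array([12, 15, 18, 21])).all()
```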
For the NumPy tests, see the `torch_np/testing/numpy_tests` folder.
The `e2e` folder contains examples we run our wrapper on:
- A toy NN from scratch using numpy
- Build a random maze and find a path in it
- Simulate a diffusion/advection process
- Construct and visualize the Mandelbrot fractal
- Inner operation of the k-means clustering
The main observation is that TorchDynamo unrolls Python-level loops. For iterative algorithms this leads to very long compile times. We therefore often only compile the inner loop.
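Schematically, the pattern looks like the sketch below (the update function and array names are made up, not code from `e2e/`): compile only the per-iteration array update and keep the outer, iterative Python loop in eager mode.

```python
import torch
import torch_np as np

@torch.compile
def step(u):
    # per-iteration array update (a stand-in for the real inner-loop body)
    return 0.25 * (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0)
                   + np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1))

u = np.zeros((128, 128))
u[60:68, 60:68] = 1.0
for _ in range(200):       # the outer loop stays in Python, so it is not unrolled
    u = step(u)
```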
The Bellman-Ford algorithm simply does not compile because it contains a data-dependent loop, `while point != start`.
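A toy sketch of the offending pattern (the parent-pointer table is made up): the number of iterations depends on array values rather than on shapes, so TorchDynamo cannot trace it into a fixed graph.

```python
parent = {3: 2, 2: 1, 1: 0}    # toy parent pointers left over from a finished search
start, point = 0, 3
path = [point]
while point != start:          # data-dependent condition: not compilable
    point = parent[point]
    path.append(point)
print(path[::-1])              # [0, 1, 2, 3]
```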
We compile the inner loop of the diffusion-advection simulation. While the code compiles, the performance is on par with or slightly worse than the original NumPy.
For the Mandelbrot fractal, results strongly depend on the implementation: a straightforward NumPy implementation uses a data-dependent loop, which does not compile.
The implementation based on the Mojo benchmark allows compiling the inner loop. The performance increase relative to NumPy is substantial and depends strongly on the data size and the machine: about 8x for smaller inputs and up to 50x for inputs larger than the cache size of the machine.
The inner loop of the k-means algorithm compiles into a straightforward C++ loop and offers speedups of up to 30x versus NumPy.
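As a hedged sketch (not the exact code in `e2e/`), the per-iteration assignment step is fully vectorized array arithmetic of roughly this shape, which is the kind of code that lowers to a tight compiled loop:

```python
import numpy as np

def assign(points, centroids):
    # squared distance of every point to every centroid, shape (n_points, n_clusters)
    d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)    # index of the nearest centroid for each point

rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 2))
cents = pts[:3]
labels = assign(pts, cents)
```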
In short, the main changes to examples are:
- Random number generators: our `random` module is a drop-in replacement for NumPy's, but the exact streams of random variates are different. Therefore, to preserve bit-to-bit identity, one needs to use NumPy's random numbers (see the sketch after this list).
- Interaction with matplotlib: for plotting, we need to convert our wrapper ndarrays to PyTorch tensors or original NumPy arrays. In practice, this will be done automatically by TorchDynamo.
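A minimal sketch of the random-stream workaround, assuming the wrapper's `asarray` accepts NumPy arrays (the seed and shapes are illustrative):

```python
import numpy as onp        # original NumPy: source of the reference random stream
import torch_np as np

onp.random.seed(17)
# Draw the variates with original NumPy, then hand them to the wrapper,
# so the numbers are bit-identical to the reference run.
w = np.asarray(onp.random.uniform(size=(4, 3)))
```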
We checked that our examples run on both CPU and GPU by setting the PyTorch global state:
`torch.set_default_device("cuda")  # or "cpu"`
More specifics for the examples we run:
Origin: https://www.geeksforgeeks.org/implementation-of-neural-network-from-scratch-using-numpy/
Source: `e2e/nn_from_scratch`.
- Use the original numpy random stream in both cases to initialize the NN weights
Results with numpy and torch_np are identical modulo different scalar vs 0D array reprs:
`epochs: 100 ======== acc: 98.64865709132037` with NumPy vs
`epochs: 100 ======== acc: array_w(98.6487, dtype=float64)` with torch_np
Origin: N. Rougier, From Python to Numpy, https://github.com/rougier/from-python-to-numpy/blob/master/code/maze_numpy.py
Source: `e2e/maze`.
Seed the numpy random generator and use the same random stream.
For plotting with matplotlib, convert the torch_np arrays to NumPy via `Z = Z.tensor.numpy()`.
Origin: N. Rougier, From Python to Numpy, https://github.com/rougier/from-python-to-numpy/blob/master/code/{smoke_solver,smoke_1,smoke_2}.py
Source: `e2e/smoke`.
- fix a bug in bool array minus bool array (fails on numpy 1.24)
- inline `np.fromfunction` into a call to `np.indices` (see the sketch after this list)
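A hedged sketch of that rewrite (the index function is made up), relying on the fact that `np.fromfunction(f, shape)` calls `f` on index grids equivalent to those produced by `np.indices(shape)`:

```python
import numpy as np

f = lambda i, j: (i + j) % 2      # made-up index function

a = np.fromfunction(f, (4, 4))    # original formulation
i, j = np.indices((4, 4))
b = f(i, j)                       # inlined version
assert np.array_equal(a, b)
```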
Origin: N. Rougier, From Python to Numpy, https://github.com/rougier/from-python-to-numpy/blob/master/code/mandelbrot.py and https://github.com/rougier/from-python-to-numpy/blob/master/code/mandelbrot_numpy_1.py
Source: `e2e/mandelbrot.py`
- use the `mandelbrot_numpy_1.py` version (slightly slower, but no `mgrid`)
- complex `abs` in float32 runs into pytorch/pytorch#99550