
Python: Can cufinufft automatically figure out gpu_device_id? #420

Open · WardBrian opened this issue Feb 13, 2024 · 3 comments

@WardBrian (Contributor)

Originally reported downstream: flatironinstitute/pytorch-finufft#103

The following crashes with either a Fatal Python error: Aborted or a Fatal Python error: PyThreadState_Get: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)

import numpy as np
import torch
import cufinufft

data = torch.view_as_complex(
    torch.stack((torch.randn(15, 80, 12000), torch.randn(15, 80, 12000)), dim=-1)
)
omega = torch.rand(2, 12000) * 2 * np.pi - np.pi

cufinufft.nufft2d1(
    *omega.to("cuda:1"),
    data.reshape(-1, 12000).to("cuda:1"),
    (320, 320),
    isign=-1,
)

If both arrays are placed on cuda:0 instead, it works fine.

The full error I get is:

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  exclusive_scan failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Fatal Python error: Aborted

Current thread 0x000015555552c4c0 (most recent call first):
  File "/mnt/home/bward/finufft/finufft/python/cufinufft/cufinufft/_plan.py", line 236 in setpts
  File "/mnt/home/bward/finufft/finufft/python/cufinufft/cufinufft/_simple.py", line 38 in _invoke_plan
  File "/mnt/home/bward/finufft/finufft/python/cufinufft/cufinufft/_simple.py", line 12 in nufft2d1
  File "/mnt/home/bward/finufft/finufft/mwe.py", line 14 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special (total: 20)
Aborted (core dumped)

@WardBrian WardBrian changed the title Cuda memory error when on device 1 Python: Cuda memory error when on device 1 Feb 13, 2024
@lu1and10 (Member)

Does it also break if you specify the device id explicitly in the kwargs? E.g.:

cufinufft.nufft2d1(
    *omega.to("cuda:1"),
    data.reshape(-1, 12000).to("cuda:1"),
    (320, 320),
    isign=-1,
    gpu_device_id=1,
)

@WardBrian (Contributor, Author)

@lu1and10 no, that seems to have fixed it (sorry for not chasing through enough of the **kwargs documentation to find that option).

So this issue can be reworded as a feature request: can _compat.py pick up a reasonable default for gpu_device_id?

@WardBrian WardBrian changed the title Python: Cuda memory error when on device 1 Python: Can cufinufft automatically figure out gpu_device_id? Feb 14, 2024
@lu1and10 (Member)

> @lu1and10 no, that seems to have fixed it (sorry for not chasing through enough of the **kwargs documentation to find that option).
>
> So this issue can be reworded as a feature request: can _compat.py pick up a reasonable default for gpu_device_id?

Yes, I guess so. It would be a nice feature for the device to be inferred from the inputs.
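A minimal sketch of what such inference could look like. This is not the actual _compat.py API; the helper name and default are hypothetical. It assumes only the public attributes that the two common input types expose: CuPy arrays carry a cupy.cuda.Device with an integer .id, and PyTorch tensors carry a torch.device with an .index (which is None for a bare "cuda" device):

```python
def infer_gpu_device_id(arr, default=0):
    """Hypothetical helper: guess the CUDA device id from an input array.

    Falls back to `default` when the array carries no device information
    (e.g. a NumPy array) or when the device index is unspecified.
    """
    device = getattr(arr, "device", None)
    if device is None:
        return default
    # CuPy: cupy.cuda.Device exposes an integer .id
    if hasattr(device, "id"):
        return int(device.id)
    # PyTorch: torch.device exposes .index (None for plain "cuda")
    index = getattr(device, "index", None)
    return default if index is None else int(index)
```

The simple-interface wrappers could then pass infer_gpu_device_id(data) as the default for gpu_device_id instead of a hard-coded 0, so the MWE above would run on cuda:1 without the explicit kwarg. One open design question is what to do when the points and the data disagree on device: raising an error there is probably friendlier than the current illegal-address abort.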
