
Python: Can cufinufft automatically figure out gpu_device_id? #420

Open · WardBrian opened this issue Feb 13, 2024 · 3 comments

@WardBrian (Contributor)

Originally reported downstream: flatironinstitute/pytorch-finufft#103

The following crashes with either a Fatal Python error: Aborted or a Fatal Python error: PyThreadState_Get: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)

import numpy as np
import torch
import cufinufft

data = torch.view_as_complex(
    torch.stack((torch.randn(15, 80, 12000), torch.randn(15, 80, 12000)), dim=-1)
)
omega = torch.rand(2, 12000) * 2 * np.pi - np.pi

cufinufft.nufft2d1(
    *omega.to("cuda:1"),
    data.reshape(-1, 12000).to("cuda:1"),
    (320, 320),
    isign=-1,
)

If both arrays are placed on cuda:0 instead, it works fine.

The full error I get is:

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  exclusive_scan failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Fatal Python error: Aborted

Current thread 0x000015555552c4c0 (most recent call first):
  File "/mnt/home/bward/finufft/finufft/python/cufinufft/cufinufft/_plan.py", line 236 in setpts
  File "/mnt/home/bward/finufft/finufft/python/cufinufft/cufinufft/_simple.py", line 38 in _invoke_plan
  File "/mnt/home/bward/finufft/finufft/python/cufinufft/cufinufft/_simple.py", line 12 in nufft2d1
  File "/mnt/home/bward/finufft/finufft/mwe.py", line 14 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special (total: 20)
Aborted (core dumped)

@WardBrian WardBrian changed the title Cuda memory error when on device 1 Python: Cuda memory error when on device 1 Feb 13, 2024
@lu1and10 (Member)

Does it also break if you specify the device id explicitly in the kwargs? E.g.:

cufinufft.nufft2d1(
    *omega.to("cuda:1"),
    data.reshape(-1, 12000).to("cuda:1"),
    (320, 320),
    isign=-1,
    gpu_device_id=1,
)

@WardBrian (Contributor, Author)

@lu1and10 no, that seems to have fixed it (sorry for not chasing through enough of the **kwargs documentation to find that option).

So this issue can be reworded as a feature request: can _compat.py pick up a reasonable default for gpu_device_id?

@WardBrian WardBrian changed the title Python: Cuda memory error when on device 1 Python: Can cufinufft automatically figure out gpu_device_id? Feb 14, 2024
@lu1and10 (Member)

> @lu1and10 no, that seems to have fixed it (sorry for not chasing through enough of the **kwargs documentation to find that option).
>
> So this issue can be reworded as a feature request: can _compat.py pick up a reasonable default for gpu_device_id?

Yes, I guess so. It would be a nice feature for the device to be inferred from the inputs.
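A minimal sketch of what such inference could look like. This is not the actual _compat.py API; the helper name and default are hypothetical. It assumes only the public attributes that the two common input types expose: CuPy arrays carry a cupy.cuda.Device with an integer .id, and PyTorch tensors carry a torch.device with an .index (which is None for a bare "cuda" device):

```python
def infer_gpu_device_id(arr, default=0):
    """Hypothetical helper: guess the CUDA device id from an input array.

    Falls back to `default` when the array carries no device information
    (e.g. a NumPy array) or when the device index is unspecified.
    """
    device = getattr(arr, "device", None)
    if device is None:
        return default
    # CuPy: cupy.cuda.Device exposes an integer .id
    if hasattr(device, "id"):
        return int(device.id)
    # PyTorch: torch.device exposes .index (None for plain "cuda")
    index = getattr(device, "index", None)
    return default if index is None else int(index)
```

The simple-interface wrappers could then pass infer_gpu_device_id(data) as the default for gpu_device_id instead of a hard-coded 0, so the MWE above would run on cuda:1 without the explicit kwarg. One open design question is what to do when the points and the data disagree on device: raising an error there is probably friendlier than the current illegal-address abort.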
