Just a tiny text fix. I checked a cmake build on my local RTX-A6000.
But it shows a weird timing inconsistency.
E.g., running it by hand:
(base) ahb@ccmlin019 /mnt/home/ahb/numerics/finufft/build> test/cuda/cufinufft1dspreadinterponly_test 1e7 1e8 1e-3 1e-2 f 0.0
spread-only test 1d:
100000000 NU pts spread to 10000000 grid in 0.046 s 2.17e+09 NU pts/s
rel mass err 0.0002
interp-only test 1d:
100000000 NU pts interp from 10000000 grid in 0.341 s 2.93e+08 NU pts/s
rel sup err 0.000497
It is nice to see 2e9 NU pts/s for spread, which matches what is expected from the actual 1d1 NUFFT. But interp is roughly 7x slower (2.9e8 NU pts/s)? Could you look into why that is?
I notice that in the full NUFFT tests 1d1 and 1d2 run at about the same speed (though the setNUpts time is about 10x longer than the exec time). Here I use precision d in order to test accuracy, but the speed is the same for f:
(base) ahb@ccmlin019 /mnt/home/ahb/numerics/finufft/build> test/cuda/cufinufft1d_test 1 1 1e7 1e8 1e-3 1e-2 d 0.0
[time ] dummy warmup call to CUFFT 0.00474 s
[time ] cufinufft plan: 0.0229 s
[time ] cufinufft setNUpts: 0.48 s
[time ] cufinufft exec: 0.0525 s
[time ] cufinufft destroy: 0.000309 s
[Method 1] 10000000 U pts to 100000000 NU pts in 0.556 s: 1.8e+08 NU pts/s
(exec-only thoughput: 1.9e+09 NU pts/s)
[gpu ] one mode: rel err in F[3700000] is 2.69e-05
(base) ahb@ccmlin019 /mnt/home/ahb/numerics/finufft/build> test/cuda/cufinufft1d_test 1 2 1e7 1e8 1e-3 1e-2 d 0.0
[time ] dummy warmup call to CUFFT 0.00385 s
[time ] cufinufft plan: 0.0225 s
[time ] cufinufft setNUpts: 0.48 s
[time ] cufinufft exec: 0.048 s
[time ] cufinufft destroy: 0.000335 s
[Method 1] 10000000 U pts to 100000000 NU pts in 0.551 s: 1.81e+08 NU pts/s
(exec-only thoughput: 2.08e+09 NU pts/s)
[gpu ] one targ: rel err in c[50000000] is 0.000339
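As a quick cross-check of the figures in the logs above, here is a minimal Python sketch that recomputes throughput as NU points divided by elapsed time. The timings and point count are copied from the logs; the dictionary keys and the `throughput` helper are names made up for illustration and are not part of cufinufft or its testers.

```python
# Number of nonuniform points used in every run above (1e8).
n_nu = 100_000_000

# Elapsed times in seconds, copied from the logs above.
# The labels are illustrative only.
times = {
    "spread-only": 0.046,
    "interp-only": 0.341,
    "1d1 exec": 0.0525,
    "1d2 exec": 0.048,
    "1d1 setNUpts": 0.48,
    "1d1 total": 0.556,
}


def throughput(seconds):
    """Return NU points processed per second for a given elapsed time."""
    return n_nu / seconds


rates = {name: throughput(t) for name, t in times.items()}
for name, rate in rates.items():
    print(f"{name:>13}: {rate:.3g} NU pts/s")

# Ratios behind the "weird timing inconsistency".
interp_vs_spread = times["interp-only"] / times["spread-only"]
setpts_vs_exec = times["1d1 setNUpts"] / times["1d1 exec"]
print(f"interp-only / spread-only time: {interp_vs_spread:.2f}x")
print(f"1d1 setNUpts / exec time:       {setpts_vs_exec:.2f}x")
```

The recomputed rates agree with the printed ones, so the reported throughputs are at least self-consistent with the reported times. The ratios show spread-only running at about the exec-only speed of the full transforms, while interp-only takes several times longer, without reaching the setNUpts time.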
These are all using method=1 as in your new tester.
It's very peculiar that the setNUpts time doesn't show up in the spread-only timing.
Could it be that more CUDA event synchronizations are needed?
Investigations welcome! Thanks, Alex
Originally posted by @ahbarnett in #631 (review)