cudaFftPlanMany.Exec only works with in place transforms #105

Open
TheWhiteAmbit opened this issue Aug 22, 2021 · 3 comments

@TheWhiteAmbit

When I call Exec on my CUDA plan with a single pointer (in-place), I find the transformed array at the original location. But whenever I call one of the overloads with separate input and output pointers, the output array is always filled with zeros. I see this with CudaDeviceVariable as well as with CudaPitchedDeviceVariable (mapped from a texture via CudaDirectXInteropResource). The array sizes should be correct: I use cufftType.R2C with output arrays twice the size of the input arrays.

```csharp
public void Exec(CUdeviceptr iodata);                                                 // working
public void Exec(CUdeviceptr iodata, TransformDirection direction);                   // working
public void Exec(CUdeviceptr idata, CUdeviceptr odata, TransformDirection direction); // not working
public void Exec(CUdeviceptr idata, CUdeviceptr odata);                               // not working
```
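When comparing the in-place and out-of-place calls, note that the cuFFT documentation gives different buffer requirements for R2C in each mode. Out-of-place, each transform of length n reads n floats and writes n/2 + 1 complex values; in-place, each real row must be padded to 2 * (n/2 + 1) floats so the complex result fits. A minimal sketch of that arithmetic, with hypothetical helper names (not part of ManagedCuda):

```csharp
// Hypothetical helpers illustrating cuFFT's documented R2C buffer sizes
// for a batch of 1D transforms of length n (integer division = floor).
public static class R2CBufferSizes
{
    // Out-of-place input: n real values per transform.
    public static long OutOfPlaceInputFloats(int n, int batch) => (long)n * batch;

    // Out-of-place output: only the non-redundant n/2 + 1 complex values per transform.
    public static long OutOfPlaceOutputComplex(int n, int batch) => (long)(n / 2 + 1) * batch;

    // In-place: the real rows are padded so the complex result fits in the same buffer.
    public static long InPlaceFloats(int n, int batch) => 2L * (n / 2 + 1) * batch;
}
```

For n = 1024 and batch = 768, this gives 786,432 input floats and 393,984 float2 outputs out-of-place, and 787,968 floats for an in-place buffer.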

@TheWhiteAmbit (author)

This does not occur with cufftType.C2C. Maybe I can format only the input data differently as a workaround, but this still seems to be a problem for cufftType.R2C.
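The C2C workaround amounts to giving the plan complex input whose imaginary parts are zero. A sketch of that packing on the host, assuming interleaved (re, im) floats as in float2; the helper name is hypothetical:

```csharp
// Hypothetical helper: pack real samples into an interleaved complex buffer
// (re, im, re, im, ...), the memory layout of float2 / cuFloatComplex,
// with every imaginary part set to zero.
public static class RealToComplexPacking
{
    public static float[] PackAsComplex(float[] real)
    {
        var interleaved = new float[real.Length * 2];
        for (int i = 0; i < real.Length; i++)
        {
            interleaved[2 * i] = real[i];  // real part
            interleaved[2 * i + 1] = 0f;   // imaginary part
        }
        return interleaved;
    }
}
```

For a real signal, the first n/2 + 1 outputs of a forward C2C transform hold the same values an R2C transform would produce (the remaining bins are their complex conjugates), so the workaround trades twice the input memory for avoiding the R2C path.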

@kunzmi (owner) commented Aug 22, 2021

Can you provide a minimal example showing the problem? Is it a 1D, 2D or 3D transform? Are the input data padded?

Because you mention DirectX, I assume you are using 2D transforms. Are you sure the array sizes are correct? "Output arrays twice the size of input arrays" is not correct: for 2D R2C transforms, the output array must hold (width / 2 + 1) x height elements of cuFloatComplex or float2, i.e. twice that many floats.
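The 2D sizing rule can be written as a short calculation. A sketch with hypothetical names, assuming 8-byte complex elements (float2 / cuFloatComplex):

```csharp
// Hypothetical helper for the 2D R2C rule: a width x height real input
// produces (width / 2 + 1) x height complex values.
public static class R2C2DSizes
{
    public const int ComplexBytes = 8; // size of float2 / cuFloatComplex

    public static long OutputComplexCount(int width, int height) => (long)(width / 2 + 1) * height;

    public static long OutputBytes(int width, int height) => OutputComplexCount(width, height) * ComplexBytes;
}
```

For a 512 x 256 input this gives 257 x 256 = 65,792 complex values (526,336 bytes). With the basic (non-embedded) layout, cuFFT writes these rows densely at width / 2 + 1 elements each, so rows of a buffer that is a full width of float2 values wide do not line up with them row for row.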

@TheWhiteAmbit (author) commented Aug 29, 2021

First, thank you for the great work! I made a workaround with C2C and don't have the original code anymore. I use a 1D transform with plan many and a stride on a Texture2D, and it works like a charm now :) So here is my redone sample, which should reproduce the failure. I hope I did not miss any edits:

```csharp
// Repro: batched 1D R2C transform (CudaFFTPlanMany) on a mapped D3D11 texture.
// Assumes class fields: CudaContext cudaContext; CudaFFTPlanMany cudaFftPlanMany;
void CudaFFTPlanManyOnMappedResource(Texture2D inputTexture, Texture2D outputTexture, uint startIndexOffset = 0)
{
    try
    {
        if (cudaContext == null)
            cudaContext = new CudaContext();
        cudaContext.SetCurrent();

        // Alternative linear buffers (these also give an all-zero result):
        //int elementCount = inputTexture.Description.Width * inputTexture.Description.Height;
        //float[] floatArrayInput = new float[elementCount];
        //for (int i = 0; i < elementCount; i++) {
        //    floatArrayInput[i] = rand.Next(0, 65535);
        //}
        //float[] floatArrayOutput = new float[elementCount * 2];
        //CudaDeviceVariable<float> cudaDeviceInput = new CudaDeviceVariable<float>(elementCount);
        //cudaDeviceInput.CopyToDevice(floatArrayInput);
        //CudaDeviceVariable<float> cudaDeviceOutput = new CudaDeviceVariable<float>(elementCount * 2);

        // Real input (R32_Float) and complex output (float2) pitched buffers
        CudaPitchedDeviceVariable<float> cudaPitchedDeviceInput = new CudaPitchedDeviceVariable<float>(inputTexture.Description.Width, inputTexture.Description.Height);
        CudaPitchedDeviceVariable<ManagedCuda.VectorTypes.float2> cudaPitchedDeviceOutput = new CudaPitchedDeviceVariable<ManagedCuda.VectorTypes.float2>(outputTexture.Description.Width, outputTexture.Description.Height);

        // Copy the mapped input texture into the pitched input buffer
        using (CudaDirectXInteropResource resourceInput = new CudaDirectXInteropResource(inputTexture.NativePointer, CUGraphicsRegisterFlags.None, CudaContext.DirectXVersion.D3D11, CUGraphicsMapResourceFlags.None))
        {
            resourceInput.Map();
            using (var dataInput = resourceInput.GetMappedArray2D(startIndexOffset, 0))
            {
                dataInput.CopyFromThisToDevice(cudaPitchedDeviceInput);
            }
            resourceInput.UnMap();
        }

        // Create the plan once; idist/odist are the row pitches in elements
        if (cudaFftPlanMany == null)
        {
            int cudaFftPlanManyWidth = inputTexture.Description.Width;
            int cudaFftPlanSizeHeight = inputTexture.Description.Height;

            int[] inembed = { 0 };
            int istride = 1;
            int idist = cudaPitchedDeviceInput.Pitch / cudaPitchedDeviceInput.TypeSize;
            int[] onembed = { 0 };
            int ostride = 1;
            int odist = cudaPitchedDeviceOutput.Pitch / cudaPitchedDeviceOutput.TypeSize;

            cudaFftPlanMany = new CudaFFTPlanMany(1, new int[] { cudaFftPlanSizeHeight }, cudaFftPlanManyWidth, cufftType.R2C, inembed, istride, idist, onembed, ostride, odist);
        }

        // Out-of-place forward transform: this is the call whose output stays all zeros
        cudaFftPlanMany.Exec(cudaPitchedDeviceInput.DevicePointer, cudaPitchedDeviceOutput.DevicePointer, TransformDirection.Forward);
        cudaContext.Synchronize();

        // Copy the result back into the mapped output texture
        using (CudaDirectXInteropResource resourceOutput = new CudaDirectXInteropResource(outputTexture.NativePointer, CUGraphicsRegisterFlags.None, CudaContext.DirectXVersion.D3D11, CUGraphicsMapResourceFlags.None))
        {
            resourceOutput.Map();
            using (var dataOutput = resourceOutput.GetMappedArray2D(0, 0))
            {
                dataOutput.CopyFromDeviceToThis(cudaPitchedDeviceOutput);
            }
            resourceOutput.UnMap();
        }

        //cudaDeviceOutput.CopyToHost(floatArrayOutput);

        cudaPitchedDeviceInput.Dispose();
        cudaPitchedDeviceOutput.Dispose();

        //cudaDeviceInput.Dispose();
        //cudaDeviceOutput.Dispose();
    }
    catch (ManagedCuda.CudaException)
    {
        // Note: swallowed here; logging or rethrowing would make CUDA errors visible
    }
}
```

It does not work with either the Texture2D or the commented-out CudaDeviceVariable buffers; the result is always all zeros. After changing to cufftType.C2C with the corresponding texture format R32G32_Float (instead of R32_Float) and ManagedCuda.VectorTypes.float2 mappings, it works. So I assume the array sizes are correct: I made no changes to the output buffers from my working code, and for cufftType.R2C the input buffers are of course half the output size.
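To cross-check what an R2C `CudaFFTPlanMany` should write for a given `istride`/`idist`/`ostride`/`odist`, a slow CPU reference can help. The sketch below follows the rank-1 indexing rule from the cuFFT advanced data layout documentation (`input[b * idist + x * istride]`, `output[b * odist + k * ostride]`). The class and method names are hypothetical, the transform is unnormalized like cuFFT's, and the output is interleaved (re, im) floats, so `ostride`/`odist` count complex (float2) elements.

```csharp
using System;

// Hypothetical CPU reference for a batched 1D R2C forward DFT using
// cuFFT's rank-1 advanced-layout indexing. Naive O(n^2) per transform.
public static class BatchedR2CReference
{
    public static void Forward(
        float[] input, int n, int batch, int istride, int idist,
        float[] output, int ostride, int odist)
    {
        int bins = n / 2 + 1; // R2C keeps only the non-redundant half of the spectrum
        for (int b = 0; b < batch; b++)
        {
            for (int k = 0; k < bins; k++)
            {
                double re = 0.0, im = 0.0;
                for (int x = 0; x < n; x++)
                {
                    double v = input[b * idist + x * istride];
                    double angle = -2.0 * Math.PI * k * x / n; // forward: exp(-2*pi*i*k*x/n)
                    re += v * Math.Cos(angle);
                    im += v * Math.Sin(angle);
                }
                int o = b * odist + k * ostride; // index in complex elements
                output[2 * o] = (float)re;
                output[2 * o + 1] = (float)im;
            }
        }
    }
}
```

In this notation the sample above uses istride = 1 and idist equal to the row pitch, so each transform walks along one row; n would then normally be the row length (width) and batch the row count (height), whereas the sample passes Height as n and Width as batch. That only differs for non-square textures, but may be worth ruling out. Comparing device output copied back to the host against this reference, with the same parameters, can also help separate plan-layout problems from copy or interop problems.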
