
problem with asynchronous peer to peer copy. #178

Open
ivandrodri opened this issue Jun 12, 2020 · 3 comments
ivandrodri commented Jun 12, 2020

Hi, I would like to port my CUDA code to cupla, and I have some issues with
multi-GPU/single-CPU P2P async copies. This is the error:

error: argument of type "cuplaStream_t" is incompatible with parameter of type "cudaStream_t"

This is a typical P2P copy in my code:

    const int numStreams = 10;
    cudaStream_t streams[numStreams];

    cuplaSetDevice(idGpuI);
    for (size_t i = 0; i < numStreams; ++i)
        cudaStreamCreate(&streams[i]);

    for (size_t loc1 = 0; loc1 < grid1Size*grid1Size*grid1Size; ++loc1)
    {
        cudaMemcpyPeerAsync(&(grid0GpuJ[loc1].grid0Size), idGpuJ,
                            &(grid0GpuI[loc1].grid0Size), idGpuI,
                            sizeof(int), streams[loc1 % numStreams]);
    }

So how do I write this in CUPLA?

In order to give access to P2P copy, this is what I am doing in CUDA:

inline void enableP2P(int ngpus, std::string info) {
    for (int i = 0; i < ngpus; i++) {
        cuplaSetDevice(i);
        for (int j = 0; j < ngpus; j++) {
            if (i == j) continue;
            int peer_access_available = 0;
            cudaDeviceCanAccessPeer(&peer_access_available, i, j);
            if (peer_access_available) {
                cudaDeviceEnablePeerAccess(j, 0);
                if (info == "info")
                    printf("> GPU%d enabled direct access to GPU%d\n", i, j);
            } else {
                if (info == "info")
                    printf("(%d, %d)\n", i, j);
            }
        }
    }
}

It seems that in cupla cudaDeviceEnablePeerAccess is handled automatically and
cudaDeviceCanAccessPeer has no equivalent, so I think the enableP2P function is no longer necessary, right?

Thanks for any help!

[edited by psychocoderHPC: fixed formatting]

sbastrakov commented Jun 12, 2020

Hello @ivandrodri. This is a good question. I have personally never tried this, but here are my thoughts after looking at the code just now. I am sure @psychocoderHPC knows this, but he is currently on vacation.

I believe alpaka, and therefore cupla, attaches every memory buffer to a device. For cupla, when you create a buffer with cuplaMalloc, the buffer is attached to the device active at that moment (which can be set via cuplaSetDevice).

Copy functions like cuplaMemcpyAsync then just take two buffers, internally check whether they live on the same device or not, and take care of it either way. For the CUDA backend, this function should simply call either cudaMemcpyAsync or cudaMemcpyPeerAsync. I think you are right about the automatic cudaDeviceEnablePeerAccess; it should not be needed.

So in your case, I think just creating buffers while the right devices are active, and then using the universal copy functions should work.

Regarding the streams and asynchronous part (which is orthogonal to peer-to-peer), it mirrors CUDA: create streams via cuplaStreamCreate, store them as type cuplaStream_t, provide such a variable as last parameter to cuplaMemcpyAsync.
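To make that concrete, here is an untested sketch of how your snippet could look. I am assuming the cupla runtime API mirrors CUDA's (cuplaMalloc, cuplaStreamCreate, cuplaMemcpyAsync, cuplaMemcpyDeviceToDevice), and I simplify the buffers to plain int arrays; idGpuI, idGpuJ and numElements are placeholders for your own values:

```cpp
const int numStreams = 10;
cuplaStream_t streams[numStreams];

// Allocate each buffer while its owning device is active;
// cupla remembers which device a buffer belongs to.
cuplaSetDevice(idGpuI);
int* gridGpuI;
cuplaMalloc((void**)&gridGpuI, numElements * sizeof(int));
for (int i = 0; i < numStreams; ++i)
    cuplaStreamCreate(&streams[i]);

cuplaSetDevice(idGpuJ);
int* gridGpuJ;
cuplaMalloc((void**)&gridGpuJ, numElements * sizeof(int));

// The universal copy should dispatch to a peer copy internally
// when source and destination live on different devices.
cuplaSetDevice(idGpuI);
for (size_t loc1 = 0; loc1 < numElements; ++loc1)
    cuplaMemcpyAsync(&gridGpuJ[loc1], &gridGpuI[loc1], sizeof(int),
                     cuplaMemcpyDeviceToDevice, streams[loc1 % numStreams]);
```

Again, this is just my reading of the API; I have not run it on a multi-GPU setup.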

@ivandrodri

Hello @sbastrakov, thanks a lot for the help! I'll try it and I let you know.

@psychocoderHPC

@ivandrodri Sorry for the late response.

Did you solve this issue already?
I have never tried peer memory copies, but alpaka should do the job transparently for you. A simple cuplaMemcpyAsync should be enough.
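For the loop in your original post, that would mean something like the following untested sketch (identifiers taken from your snippet; I assume cuplaMemcpyAsync and cuplaMemcpyDeviceToDevice mirror their CUDA counterparts, with streams holding cuplaStream_t handles):

```cpp
// Same loop as in the original post, but using the device-agnostic
// cupla copy instead of cudaMemcpyPeerAsync: no explicit device IDs,
// since each buffer already knows which device it lives on.
for (size_t loc1 = 0; loc1 < grid1Size*grid1Size*grid1Size; ++loc1)
{
    cuplaMemcpyAsync(&(grid0GpuJ[loc1].grid0Size),
                     &(grid0GpuI[loc1].grid0Size),
                     sizeof(int),
                     cuplaMemcpyDeviceToDevice,
                     streams[loc1 % numStreams]);
}
```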
