
problem with asynchronous peer to peer copy. #178

Open
ivandrodri opened this issue Jun 12, 2020 · 3 comments
ivandrodri commented Jun 12, 2020

Hi, I would like to port my CUDA code to cupla, and I have some issues with
multi-GPU/single-CPU P2P async copies. This is the error:

error: argument of type "cuplaStream_t" is incompatible with parameter of type "cudaStream_t"

This is a typical P2P copy in my code:

    const int numStreams = 10;
    cudaStream_t streams[numStreams];

    cuplaSetDevice(idGpuI);
    for (size_t i = 0; i < numStreams; ++i)
        cudaStreamCreate(&streams[i]);

    for (size_t loc1 = 0; loc1 < grid1Size*grid1Size*grid1Size; ++loc1)
    {
        cudaMemcpyPeerAsync(&(grid0GpuJ[loc1].grid0Size), idGpuJ,
                            &(grid0GpuI[loc1].grid0Size), idGpuI,
                            sizeof(int), streams[loc1 % numStreams]);
    }

So how do I write this in CUPLA?

In order to give access to P2P copy, this is what I am doing in CUDA:

inline void enableP2P(int ngpus, std::string info) {
    for (int i = 0; i < ngpus; i++) {
        cuplaSetDevice(i);
        for (int j = 0; j < ngpus; j++) {
            if (i == j) continue;
            int peer_access_available = 0;
            cudaDeviceCanAccessPeer(&peer_access_available, i, j);
            if (peer_access_available) {
                cudaDeviceEnablePeerAccess(j, 0);
                if (info == "info")
                    printf("> GPU%d enabled direct access to GPU%d\n", i, j);
            } else {
                if (info == "info")
                    printf("(%d, %d)\n", i, j);
            }
        }
    }
}

It seems that in cupla cudaDeviceEnablePeerAccess is handled automatically and
cudaDeviceCanAccessPeer has no equivalent, so I think the enableP2P function is no longer necessary, right?

Thanks for any help!

[edited by psychocoderHPC: fixed formatting]

sbastrakov commented Jun 12, 2020

Hello @ivandrodri. This is a good question. I have personally never tried this, but here are my thoughts after looking at the code just now. I am sure @psychocoderHPC knows this, but he is currently on vacation.

I believe alpaka, and therefore cupla, attaches every memory buffer to a device. For cupla, when you create a buffer with cuplaMalloc, the buffer is attached to the device active at that moment (which can be set via cuplaSetDevice).

Copy functions like cuplaMemcpyAsync then just take two buffers, internally check whether they live on the same device or not, and take care of it either way. For the CUDA backend, this function should simply call either cudaMemcpyAsync or cudaMemcpyPeerAsync. I think you are right about the automatic cudaDeviceEnablePeerAccess; it should not be needed.

So in your case, I think just creating buffers while the right devices are active, and then using the universal copy functions should work.

Regarding the streams and asynchronous part (which is orthogonal to peer-to-peer), it mirrors CUDA: create streams via cuplaStreamCreate, store them as type cuplaStream_t, provide such a variable as last parameter to cuplaMemcpyAsync.
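To make that concrete, here is an untested sketch of how your snippet could look. I am assuming the cupla runtime API mirrors CUDA's (cuplaMalloc, cuplaStreamCreate, cuplaMemcpyAsync, cuplaMemcpyDeviceToDevice), and I simplify the buffers to plain int arrays; idGpuI, idGpuJ and numElements are placeholders for your own values:

```cpp
const int numStreams = 10;
cuplaStream_t streams[numStreams];

// Allocate each buffer while its owning device is active;
// cupla remembers which device a buffer belongs to.
cuplaSetDevice(idGpuI);
int* gridGpuI;
cuplaMalloc((void**)&gridGpuI, numElements * sizeof(int));
for (int i = 0; i < numStreams; ++i)
    cuplaStreamCreate(&streams[i]);

cuplaSetDevice(idGpuJ);
int* gridGpuJ;
cuplaMalloc((void**)&gridGpuJ, numElements * sizeof(int));

// The universal copy should dispatch to a peer copy internally
// when source and destination live on different devices.
cuplaSetDevice(idGpuI);
for (size_t loc1 = 0; loc1 < numElements; ++loc1)
    cuplaMemcpyAsync(&gridGpuJ[loc1], &gridGpuI[loc1], sizeof(int),
                     cuplaMemcpyDeviceToDevice, streams[loc1 % numStreams]);
```

Again, this is just my reading of the API; I have not run it on a multi-GPU setup.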

@ivandrodri

Hello @sbastrakov, thanks a lot for the help! I'll try it and I let you know.

@psychocoderHPC

@ivandrodri Sorry for the late response.

Did you solve this issue already?
I have never tried peer memory copies, but alpaka should do the job transparently for you. A simple cuplaMemcpyAsync should be enough.
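For the loop in your original post, that would mean something like the following untested sketch (identifiers taken from your snippet; I assume cuplaMemcpyAsync and cuplaMemcpyDeviceToDevice mirror their CUDA counterparts, with streams holding cuplaStream_t handles):

```cpp
// Same loop as in the original post, but using the device-agnostic
// cupla copy instead of cudaMemcpyPeerAsync: no explicit device IDs,
// since each buffer already knows which device it lives on.
for (size_t loc1 = 0; loc1 < grid1Size*grid1Size*grid1Size; ++loc1)
{
    cuplaMemcpyAsync(&(grid0GpuJ[loc1].grid0Size),
                     &(grid0GpuI[loc1].grid0Size),
                     sizeof(int),
                     cuplaMemcpyDeviceToDevice,
                     streams[loc1 % numStreams]);
}
```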
