To enable P2P copies, this is what I do in CUDA:
```cpp
inline void enableP2P(int ngpus, std::string info) {
    for (int i = 0; i < ngpus; i++) {
        cudaSetDevice(i);
        for (int j = 0; j < ngpus; j++) {
            if (i == j) continue;
            int peer_access_available = 0;
            cudaDeviceCanAccessPeer(&peer_access_available, i, j);
            if (peer_access_available) {
                cudaDeviceEnablePeerAccess(j, 0);
                if (info == "info")
                    printf("> GPU%d enabled direct access to GPU%d\n", i, j);
            } else {
                if (info == "info")
                    printf("(%d, %d)\n", i, j);
            }
        }
    }
}
```
It seems that in cupla, cudaDeviceEnablePeerAccess is done automatically and cudaDeviceCanAccessPeer is not exposed, so I think the function enableP2P is no longer necessary, right?
Thanks for any help!
[edited by psychocoderHPC: fixed formatting]
Hello @ivandrodri . This is a good question. I have personally never tried this, but here are my thoughts after looking at the code just now. I am sure @psychocoderHPC knows this, but he is currently on vacation.
I believe alpaka, and therefore cupla, attach every memory buffer to a device. In cupla, when you create a buffer with cuplaMalloc, it is attached to the device that is active at that moment (set via cuplaSetDevice).
The copy functions like cuplaMemcpyAsync then just take two buffers, internally check whether they belong to the same device or not, and handle it accordingly. For the CUDA backend, this function should just call either cudaMemcpyAsync or cudaMemcpyPeerAsync. I think you are right about cudaDeviceEnablePeerAccess being automatic; it should not be needed.
So in your case, I think just creating buffers while the right devices are active, and then using the universal copy functions should work.
Regarding the streams and asynchronous part (which is orthogonal to peer-to-peer), it mirrors CUDA: create streams via cuplaStreamCreate, store them as type cuplaStream_t, provide such a variable as last parameter to cuplaMemcpyAsync.
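The steps above could be sketched roughly as follows. This is a non-authoritative sketch assuming cupla's CUDA-style runtime API (`cuplaSetDevice`, `cuplaMalloc`, `cuplaStreamCreate`, `cuplaMemcpyAsync`, `cuplaMemcpyDeviceToDevice`); the buffer names and size are illustrative, and it requires at least two visible devices:

```cpp
// Sketch only: assumes cupla's CUDA-style runtime API and two devices.
#include <cuda_to_cupla.hpp>

int main() {
    const size_t size = 1024 * sizeof(float);
    float *bufA = nullptr, *bufB = nullptr;

    // Create each buffer while its device is active, so cupla/alpaka
    // records which device the buffer is attached to.
    cuplaSetDevice(0);
    cuplaMalloc((void **)&bufA, size);
    cuplaStream_t stream;
    cuplaStreamCreate(&stream);

    cuplaSetDevice(1);
    cuplaMalloc((void **)&bufB, size);

    // The universal copy decides internally whether this is a same-device
    // or a peer (device-to-device) copy; no explicit enableP2P call.
    cuplaSetDevice(0);
    cuplaMemcpyAsync(bufB, bufA, size, cuplaMemcpyDeviceToDevice, stream);
    cuplaStreamSynchronize(stream);

    cuplaFree(bufA);
    cuplaFree(bufB);
    cuplaStreamDestroy(stream);
    return 0;
}
```

Note the stream is typed as `cuplaStream_t` (not `cudaStream_t`), which should also avoid the incompatible-parameter error from the original report.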
Did you already solve this issue?
I never tried peer memory copies, but alpaka should do the job transparently for you. A simple cuplaMemcpyAsync should be enough.
Hi, I would like to convert my CUDA code to cupla, and I have some issues with multi-GPU/single-CPU P2P asynchronous copies. This is the error:
error: argument of type "cuplaStream_t" is incompatible with parameter of type "cudaStream_t"
This is a typical P2P copy in my code:
So how do I write this in CUPLA?