Always SFENCE at end of memcpy_nontemporal
It might be needed for correct ordering with write-combining memory,
such as that used by gdrcopy.
bmerry committed Nov 24, 2020
1 parent 41bbfff commit 5182f7b
Showing 2 changed files with 10 additions and 1 deletion.
doc/changelog.rst (2 additions, 0 deletions)
@@ -8,6 +8,8 @@ Changelog
 - Make the ibverbs sender compatible with `PeerDirect`_.
 - Add examples programs showing integration with `gdrcopy`_ and
   `PeerDirect`_.
+- Always use SFENCE at end of :cpp:func:`memcpy_nontemporal` so that it is
+  appropriate for use with `gdrcopy`_.
 - Fix a memory leak when receiving with ibverbs.
 
 .. _gdrcopy: https://github.com/NVIDIA/gdrcopy
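A minimal sketch of the kind of ordering problem the commit message has in mind, assuming an x86 target with SSE2. `publish_payload` and its `ready` flag are hypothetical names, not part of this repository or of gdrcopy. Non-temporal (streaming) stores go through the write-combining buffers and are weakly ordered with respect to later ordinary stores, so a consumer such as an accelerator could see the flag before the payload unless an SFENCE sits between them.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // SSE2: _mm_stream_si32
#include <xmmintrin.h>  // SSE: _mm_sfence

// Hypothetical producer: write a payload with non-temporal stores, then
// publish a flag that a consumer polls before reading the payload.
void publish_payload(int *payload, const int *src, std::size_t count,
                     std::atomic<int> &ready)
{
    for (std::size_t i = 0; i < count; i++)
        _mm_stream_si32(payload + i, src[i]);  // bypasses the cache via WC buffers

    // A release store alone does not order earlier non-temporal stores on x86;
    // SFENCE makes them globally visible before the flag store below.
    _mm_sfence();
    ready.store(1, std::memory_order_release);
}
```

The same reasoning applies when the destination itself is mapped write-combining (as a gdrcopy BAR mapping would be): even ordinary `memcpy` stores to such memory can sit in combining buffers until a fence drains them.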
src/common_memcpy.cpp (8 additions, 1 deletion)
@@ -43,7 +43,14 @@ void *memcpy_nontemporal(void * __restrict__ dest, const void * __restrict__ src
 {
     if (head >= n)
     {
-        return std::memcpy(dest_c, src_c, n);
+        std::memcpy(dest_c, src_c, n);
+        /* Not normally required, but if the destination is
+         * write-combining memory then this will flush the combining
+         * buffers. That may be necessary if the memory is actually on
+         * a GPU or other accelerator.
+         */
+        _mm_sfence();
+        return dest;
     }
     std::memcpy(dest_c, src_c, head);
     dest_c += head;
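The hunk above only shows the short-copy early return. For context, here is a hypothetical sketch of how a function of this kind is commonly structured on x86 with SSE2; it is not the actual contents of src/common_memcpy.cpp, and the name `memcpy_nontemporal_sketch` and the 16-byte alignment granularity are assumptions. It copies the unaligned head with ordinary `memcpy`, the aligned body with 16-byte non-temporal stores (`_mm_stream_si128`), and the tail with `memcpy`, and issues an SFENCE on every return path, including the early one this commit fixes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2: _mm_loadu_si128, _mm_stream_si128
#include <xmmintrin.h>  // SSE: _mm_sfence

// Hypothetical sketch, not the repository's implementation.
void *memcpy_nontemporal_sketch(void * __restrict__ dest,
                                const void * __restrict__ src, std::size_t n)
{
    char *dest_c = static_cast<char *>(dest);
    const char *src_c = static_cast<const char *>(src);
    // Bytes needed to bring dest_c up to a 16-byte boundary (0 if already aligned).
    std::size_t head = (16 - reinterpret_cast<std::uintptr_t>(dest_c) % 16) % 16;
    if (head >= n)
    {
        // Too short for any aligned streaming store: plain copy, but still
        // fence in case dest is write-combining memory.
        std::memcpy(dest_c, src_c, n);
        _mm_sfence();
        return dest;
    }
    std::memcpy(dest_c, src_c, head);
    dest_c += head;
    src_c += head;
    n -= head;
    // Aligned body: unaligned loads from src, aligned non-temporal stores to dest.
    std::size_t blocks = n / 16;
    for (std::size_t i = 0; i < blocks; i++)
    {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src_c));
        _mm_stream_si128(reinterpret_cast<__m128i *>(dest_c), v);
        src_c += 16;
        dest_c += 16;
    }
    // Remaining tail of fewer than 16 bytes.
    std::memcpy(dest_c, src_c, n % 16);
    _mm_sfence();  // order the streaming stores before whatever the caller does next
    return dest;
}
```

Fencing on both return paths keeps the guarantee uniform: a caller never needs to know whether its copy was short enough to skip the streaming loop.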