
Memory Pool #270

Open · kaushikcfd wants to merge 47 commits into master
Conversation

@kaushikcfd

This PR implements a memory pool for ViennaCL's OpenCL backend.

Brief overview of the implementation details:

  • The code defining the memory pool is mostly pulled from PyOpenCL's memory pool implementation, with the appropriate license notices retained.
  • Classes that allocate memory, such as vector_base, now take an additional template parameter that determines the handle type of the allocated memory. The handle type can be either viennacl::ocl::mem_handle<cl_mem> or viennacl::ocl::pooled_clmem_handle (see the usage sketch after this list). This breaks backward compatibility, but I expect there are few cases in which user-facing code deals with memory handles directly.
  • The few changes that need to be propagated to PETSc can be seen in this commit.
  • The temporaries involved in linalg/vector_operations.hpp are now allocated through a pooled handle.
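
For illustration, here is a minimal usage sketch of the new template parameter, using the two handle type names from the list above and assuming the handle type is passed as the second template argument of viennacl::vector (as in the vector_base diff further down); the exact parameter order and defaults are determined by the PR, so treat this as a sketch only:

// Sketch only (not from the PR): assumes VIENNACL_WITH_OPENCL is defined
// and a default OpenCL context has been set up.
#include "viennacl/vector.hpp"

void example()
{
  // Plain OpenCL memory handle -- the previous behaviour.
  viennacl::vector<double, viennacl::ocl::mem_handle<cl_mem> > x(1 << 18);

  // Pooled handle: buffers are served from the memory pool and returned
  // to it on destruction instead of being released right away.
  viennacl::vector<double, viennacl::ocl::pooled_clmem_handle> y(1 << 18);
}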

These temporary allocations accounted for a substantial share of the runtime when vector operations were called from PETSc. The following table shows the timings (in ms) before and after introducing the pooled memory handle.

Operation   Before (ms)   After (ms)
VecNorm     0.294         0.036
VecMDot     0.741         0.506

Details of the test:

  • VecNorm on a vector of length 2^18.
  • VecMDot on vectors of length 2^18, involving 30 inner products.
  • Tests run on a machine with an Nvidia Titan V.

I have attached the files for the tests, along with their makefiles.
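
For orientation, the shape of the VecNorm test is roughly as below; this is a hypothetical reconstruction (error checking omitted), and the actual sources and makefiles are in the attachment:

// Hypothetical reconstruction of the VecNorm timing test; the real
// sources are in the attached timings.tar.gz.
#include <petscvec.h>

int main(int argc, char **argv)
{
  PetscInitialize(&argc, &argv, NULL, NULL);

  Vec x;
  VecCreate(PETSC_COMM_WORLD, &x);
  VecSetSizes(x, PETSC_DECIDE, 1 << 18);  // vector of length 2^18
  VecSetType(x, VECVIENNACL);             // ViennaCL-backed vector type
  VecSet(x, 1.0);

  PetscReal nrm;
  VecNorm(x, NORM_2, &nrm);  // each call allocated a temporary before this PR

  VecDestroy(&x);
  PetscFinalize();
  return 0;
}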

Attachments: timings.tar.gz

@kaushikcfd (Author)

cc @inducer

@karlrupp (Collaborator) commented Jan 2, 2019

Thanks, @kaushikcfd !

While a memory pool is one way to solve the issue you have encountered with PETSc, I'm afraid this PR is way too intrusive (yet backend-specific) for achieving the actual goal of eliminating the impact of temporaries you've seen in PETSc.

Please let me give some more thought to a more concise (yet equally powerful) fix that immediately carries over to the CUDA backend as well.

@kaushikcfd (Author)

@karlrupp: Thanks for taking a look. This can be extended to the CUDA backend as well: PyCUDA and PyOpenCL share the same memory pool implementation, so we would only need to add another memory pool allocator class, which would involve minimal changes; a rough sketch follows.
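
A minimal sketch of what such an allocator class might look like, modelled on the PyCUDA/PyOpenCL pool design in which the pool obtains its pointer type from the allocator; the class name and interface below are assumptions, not code from this PR:

// Hypothetical CUDA allocator for the pool (illustrative only).
#include <cuda_runtime.h>
#include <cstddef>
#include <new>

class cuda_allocator
{
public:
  typedef void *      pointer_type;
  typedef std::size_t size_type;

  // Called by the pool when no suitably sized free block is available.
  pointer_type allocate(size_type size) const
  {
    void *ptr = 0;
    if (cudaMalloc(&ptr, size) != cudaSuccess)
      throw std::bad_alloc();
    return ptr;
  }

  // Called by the pool when it decides to actually release a block.
  void free(pointer_type ptr) const { cudaFree(ptr); }
};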

@inducer left a comment

@karlrupp I agree that we should preserve backend independence. One thing I would like to know is what you think of memory pools in general. You mention a different way of being able to avoid expensive temporary allocation--I would be curious to hear what you have in mind.

@@ -466,8 +502,8 @@ namespace backend


/** @brief Copies data of the provided 'DataType' from 'handle_src' to 'handle_dst' and converts the data if the binary representation of 'DataType' among the memory domains differs. */
-template<typename DataType>
+template<typename DataType, typename H = viennacl::ocl::handle<cl_mem> >
 void typesafe_memory_copy(mem_handle const & handle_src, mem_handle & handle_dst)

How come this defaults to the OpenCL handle?

@@ -100,19 +100,19 @@ struct zero_vector : public scalar_vector<NumericT>
*
* @tparam NumericT The floating point type, either 'float' or 'double'
*/
-template<class NumericT, typename SizeT /* see forwards.h for default type */, typename DistanceT /* see forwards.h for default type */>
+template<class NumericT, typename OCLHandle, typename SizeT /* see forwards.h for default type */, typename DistanceT /* see forwards.h for default type */>

As in the definition, OCLHandle -> HandleImpl?

@@ -69,6 +69,8 @@
#include "viennacl/meta/enable_if.hpp"
#include "viennacl/version.hpp"

#include "CL/cl.h"

Can this be avoided?

@@ -86,6 +86,7 @@ inline memory_types default_memory_type(memory_types new_memory_type) { return d
* Instead, this class collects all the necessary conditional compilations.
*
*/
+template <typename OCLHandle>

  • Suggest renaming OCLHandle -> HandleImpl?

Maybe

template <typename OCLHandle =
#if defined(VIENNACL_WITH_OPENCL)
   stuff
#elif defined(VIENNACL_WITH_CUDA)
   stuff
#endif
>
class mem_handle_with_impl
{ ... };

typedef mem_handle_with_impl<> mem_handle;

This might also avoid having to add many of the angle brackets?
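
Concretely, the suggestion might be filled in along these lines; the default types chosen here are illustrative assumptions, not part of the PR:

// Illustrative fill-in of the sketch above; the concrete defaults and the
// CUDA handle name are hypothetical.
#if defined(VIENNACL_WITH_OPENCL)
typedef viennacl::ocl::handle<cl_mem> default_handle_impl;
#elif defined(VIENNACL_WITH_CUDA)
typedef viennacl::cuda::handle default_handle_impl;  // hypothetical type
#endif

template <typename HandleImpl = default_handle_impl>
class mem_handle_with_impl
{ /* ... */ };

// Existing code that spells plain 'mem_handle' keeps compiling unchanged,
// which is how the extra angle brackets are avoided:
typedef mem_handle_with_impl<> mem_handle;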


+private:
+  typedef uint32_t bin_nr_t;
+  typedef std::vector<cl_mem> bin_t;

The original PyOpenCL memory pool was not backend-specific--in fact, it was used identically (same header file!) in PyCUDA. How come cl_mem can't be obtained from the allocator type as in the original pool?
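
For reference, the allocator-parameterized pattern being referred to looks roughly like this (names illustrative, modelled on the PyOpenCL/PyCUDA pool); the pool derives its pointer type from the allocator, so no cl_mem is hard-coded:

// Sketch of a backend-agnostic pool: the pointer type comes from the
// allocator template parameter instead of being fixed to cl_mem.
#include <cstdint>
#include <vector>

template <typename Allocator>
class memory_pool
{
public:
  typedef typename Allocator::pointer_type pointer_type;
  typedef typename Allocator::size_type    size_type;

private:
  typedef uint32_t bin_nr_t;
  typedef std::vector<pointer_type> bin_t;  // rather than std::vector<cl_mem>
};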

@@ -0,0 +1,137 @@
+// Various odds and ends

This shouldn't be necessary. Particularly, we shouldn't add our own error classes (but use VCL's existing ones instead).
