Standard Library for CUDA Programming

This library provides data structures to ease programming in CUDA (version 12 or higher). For a tutorial and further information, please read this manual.

Example

A quick example of transferring a std::vector on the CPU to a battery::vector on the GPU (note that no manual memory allocation or deallocation is needed):

#include <vector>
#include "battery/vector.hpp"
#include "battery/unique_ptr.hpp"
#include "battery/allocator.hpp"

using mvector = battery::vector<int, battery::managed_allocator>;

__global__ void kernel(mvector* v_ptr) {
  mvector& v = *v_ptr;
  // ... Compute on `v` in parallel.
}

int main(int argc, char** argv) {
  std::vector<int> v(10000, 42);
  // Transfer from CPU vector to GPU vector.
  auto gpu_v = battery::make_unique<mvector, battery::managed_allocator>(v);
  kernel<<<256, 256>>>(gpu_v.get());
  CUDAEX(cudaDeviceSynchronize());
  // Transfer the new data back to the initial vector.
  for(std::size_t i = 0; i < v.size(); ++i) {
    v[i] = (*gpu_v)[i];
  }
  return 0;
}
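
The kernel body above is left open. One common way to let a fixed launch configuration such as `<<<256, 256>>>` cover all 10000 elements is a grid-stride loop, where each thread starts at its global index and advances by the total number of threads. The sketch below simulates that index mapping on the host in plain C++ so it can be run without a GPU. The names `grid_stride_increment` and `simulate_launch` are hypothetical and not part of this library; in a real kernel, `tid` would be `blockIdx.x * blockDim.x + threadIdx.x` and `total_threads` would be `gridDim.x * blockDim.x`.

```cpp
#include <cstddef>
#include <vector>

// Host-side stand-in for the work of ONE CUDA thread in a grid-stride loop.
// Assumption: the per-element work is a simple increment, chosen for illustration.
void grid_stride_increment(std::vector<int>& v, std::size_t tid, std::size_t total_threads) {
  for (std::size_t i = tid; i < v.size(); i += total_threads) {
    v[i] += 1;  // Each index is visited by exactly one simulated thread.
  }
}

// Runs every simulated thread one after another and returns the resulting vector.
// On a GPU these iterations would run concurrently; they touch disjoint indices,
// so no synchronization is needed for this particular access pattern.
std::vector<int> simulate_launch(std::size_t n, std::size_t blocks, std::size_t threads_per_block) {
  std::vector<int> v(n, 42);
  const std::size_t total_threads = blocks * threads_per_block;
  for (std::size_t tid = 0; tid < total_threads; ++tid) {
    grid_stride_increment(v, tid, total_threads);
  }
  return v;
}
```

Because the loop bound is `v.size()`, the same body works whether there are more threads than elements (extra threads do nothing) or fewer (each thread handles several elements).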

Common Questions

Quick Reference

  • Namespace: battery::*.
  • The documentation is not exhaustive (which is why we provide a link to the standard C++ STL documentation), but we document most of the main differences and the features without a standard counterpart.
  • The table below is a quick reference to the most useful features, but it is not exhaustive.
  • The structures provided here are not thread-safe; ensuring thread safety is the responsibility of the user of this library.

Category    Main features
Allocator   standard_allocator, global_allocator, managed_allocator, pool_allocator
Pointers    shared_ptr (std), make_shared (std), allocate_shared (std),
            unique_ptr (std), make_unique (std), make_unique_block, make_unique_grid
Containers  vector (std), string (std), dynamic_bitset,
            tuple, variant (std), bitset (std)
Utility     CUDA, INLINE, CUDAE, CUDAEX,
            limits, ru_cast, rd_cast,
            popcount (std), countl_zero (std), countl_one (std), countr_zero (std),
            countr_one (std), signum, ipow,
            add_up, add_down, sub_up, sub_down,
            mul_up, mul_down, div_up, div_down
Memory      local_memory, read_only_memory, atomic_memory,
            atomic_scoped_memory, atomic_memory_block, atomic_memory_grid