CUDA is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on its GPUs.
- CUDA stands for Compute Unified Device Architecture.
- The CUDA platform provides a software layer that gives access to the GPU's virtual instruction set and parallel computational elements for the execution of compute kernels.
- The platform is designed to work with programming languages such as C, C++, and Fortran.
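To illustrate what executing a compute kernel from C++ looks like, here is a minimal sketch of element-wise vector addition. The names `vectorAdd` and `addOnGpu` are hypothetical, chosen only for this example, and the block size of 256 threads is an arbitrary assumption rather than a tuned value.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Kernel: each GPU thread adds one pair of elements.
__global__ void vectorAdd(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may have spare threads
        out[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: copies the inputs to the GPU, launches the
// kernel, and copies the result back. Returns an empty vector if the
// sizes differ or any CUDA call fails.
std::vector<float> addOnGpu(const std::vector<float>& a,
                            const std::vector<float>& b) {
    if (a.size() != b.size()) return {};
    const int n = static_cast<int>(a.size());
    std::vector<float> out(a.size());
    if (n == 0) return out;
    const size_t bytes = a.size() * sizeof(float);

    float *dA = nullptr, *dB = nullptr, *dOut = nullptr;
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMalloc(&dOut, bytes) == cudaSuccess &&
              cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;                         // assumed block size
        const int blocks = (n + threads - 1) / threads;  // round up
        vectorAdd<<<blocks, threads>>>(dA, dB, dOut, n);
        // The blocking copy below also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);  // cudaFree accepts a null pointer
    cudaFree(dB);
    cudaFree(dOut);
    return ok ? out : std::vector<float>{};
}
```

Compiled with `nvcc`, calling `addOnGpu({1, 2, 3}, {10, 20, 30})` on a machine with a CUDA-capable GPU should yield a three-element vector holding the pairwise sums.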
CUDA 8.0 comes with the following libraries (for compilation & runtime, in alphabetical order):
- cuBLAS – CUDA Basic Linear Algebra Subroutines library
- CUDART – CUDA Runtime library
- cuFFT – CUDA Fast Fourier Transform library
- cuRAND – CUDA Random Number Generation library
- cuSOLVER – CUDA based collection of dense and sparse direct solvers
- cuSPARSE – CUDA Sparse Matrix library
- NPP – NVIDIA Performance Primitives library
- nvGRAPH – NVIDIA Graph Analytics library
- NVML – NVIDIA Management Library
- NVRTC – NVIDIA Runtime Compilation library for CUDA C++
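As an illustration of how one of these libraries is called, the following sketch uses cuBLAS to compute y = alpha * x + y (the SAXPY operation). The helper name `saxpyWithCublas` is hypothetical; the sketch assumes the `cublas_v2.h` interface and linking against cuBLAS (for example with `-lcublas`).

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: computes y = alpha * x + y on the GPU with cuBLAS.
// Returns false if the sizes differ, the input is empty, or any CUDA or
// cuBLAS call fails; y is only overwritten when the final copy succeeds.
bool saxpyWithCublas(float alpha, const std::vector<float>& x,
                     std::vector<float>& y) {
    if (x.size() != y.size() || x.empty()) return false;
    const int n = static_cast<int>(x.size());
    const size_t bytes = x.size() * sizeof(float);

    cublasHandle_t handle = nullptr;
    float *dX = nullptr, *dY = nullptr;
    bool ok = cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS &&
              cudaMalloc(&dX, bytes) == cudaSuccess &&
              cudaMalloc(&dY, bytes) == cudaSuccess &&
              cudaMemcpy(dX, x.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dY, y.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              // alpha is passed by host pointer (the default pointer mode);
              // both vectors use a stride of 1.
              cublasSaxpy(handle, n, &alpha, dX, 1, dY, 1) == CUBLAS_STATUS_SUCCESS &&
              cudaMemcpy(y.data(), dY, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(dX);
    cudaFree(dY);
    if (handle) cublasDestroy(handle);
    return ok;
}
```

With `x = {1, 2, 3}`, `y = {10, 20, 30}` and `alpha = 2`, a successful call should leave `y` holding `{12, 24, 36}`.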
CUDA 8.0 comes with these other software components:
- nView – NVIDIA nView Desktop Management Software
- NVWMI – NVIDIA Enterprise Management Toolkit
- GameWorks PhysX – a multi-platform game physics engine
CUDA 9.0–9.2 comes with these other components:
- CUTLASS 1.0 – custom linear algebra algorithms
- NVCUVID – NVIDIA Video Decoder; deprecated in CUDA 9.2 and now available in the NVIDIA Video Codec SDK
CUDA 10 comes with these other components:
- nvJPEG – Hybrid (CPU and GPU) JPEG processing
CUDA has several advantages over traditional general-purpose computation on GPUs (GPGPU) using graphics APIs:
- Scattered reads – code can read from arbitrary addresses in memory.
- Unified virtual memory (CUDA 4.0 and above)
- Unified memory (CUDA 6.0 and above)
- Shared memory – CUDA exposes a fast shared memory region that can be shared among threads. This can be used as a user-managed cache, enabling higher bandwidth than is possible using texture lookups.[16]
- Faster downloads and readbacks to and from the GPU
- Full support for integer and bitwise operations, including integer texture lookups
- On RTX 20 and 30 series cards, the CUDA cores are used for a feature called "RTX IO", which is intended to dramatically decrease game-loading times.
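Two of the points above, shared memory as a user-managed cache and unified memory, can be sketched together in a small block-wise sum. This is an illustrative sketch, not a tuned reduction: the names `blockSum`, `sumOneToN`, and `kThreads` are hypothetical, and it assumes a GPU and CUDA version that support `cudaMallocManaged` (CUDA 6.0 or later).

```cuda
#include <cuda_runtime.h>

// Threads per block (assumed value); must be a power of two for the
// tree reduction below.
constexpr int kThreads = 256;

// Each block loads its slice of `in` into shared memory, reduces it
// there, and writes a single partial sum.
__global__ void blockSum(const int* in, int* partial, int n) {
    __shared__ int tile[kThreads];  // fast on-chip memory shared by the block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0;
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) partial[blockIdx.x] = tile[0];
}

// Hypothetical helper: sums the integers 1..n on the GPU.
// Returns 0 for n <= 0 and -1 if a CUDA call fails. Per-block partial
// sums are ints, so very large n would overflow in this sketch.
long long sumOneToN(int n) {
    if (n <= 0) return 0;
    const int blocks = (n + kThreads - 1) / kThreads;
    int *in = nullptr, *partial = nullptr;
    long long total = -1;

    // Unified memory: the same pointers are valid on host and device.
    if (cudaMallocManaged(&in, n * sizeof(int)) == cudaSuccess &&
        cudaMallocManaged(&partial, blocks * sizeof(int)) == cudaSuccess) {
        for (int i = 0; i < n; ++i) in[i] = i + 1;  // host writes directly
        blockSum<<<blocks, kThreads>>>(in, partial, n);
        if (cudaGetLastError() == cudaSuccess &&
            cudaDeviceSynchronize() == cudaSuccess) {
            total = 0;
            // Host reads the results without an explicit cudaMemcpy.
            for (int b = 0; b < blocks; ++b) total += partial[b];
        }
    }
    cudaFree(in);
    cudaFree(partial);
    return total;
}
```

On a working setup, `sumOneToN(1000)` should return 500500, matching n(n + 1)/2. The final pass over the partial sums runs on the host to keep the sketch short; a production reduction would typically finish on the GPU.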