a-heintz/pyopencl-matmul
PyOpenCL-MatMul-GEMM

This is an OpenCL matrix multiplication (GEMM) implementation using PyOpenCL. The code was written and designed specifically for use on FPGAs, but it will run on any CPU, GPU, or FPGA that supports OpenCL. The accompanying notebook output is from a run on a MacBook Pro; performance can be greatly accelerated on GPUs and FPGAs.

The GEMM cases all perform the following operation: C = A B

where A is M x N, B is N x P, making C = M x P.
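The shape relationships above can be checked with NumPy, used here as a stand-in for the OpenCL kernels (the M, N, P values are illustrative, not from the repo):

```python
import numpy as np

# C = A B, where A is M x N, B is N x P, making C = M x P.
M, N, P = 4, 3, 5
A = np.arange(M * N, dtype=np.float32).reshape(M, N)   # A is M x N
B = np.arange(N * P, dtype=np.float32).reshape(N, P)   # B is N x P
C = A @ B                                              # C is M x P
print(C.shape)   # (4, 5)
```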

There are several implementations of GEMM:

  1. GEMM: NDRange Kernel with Local Memory Tiling where M, N, P must all be multiples of BLOCK_SIZE
  2. GEMM_1DREG: NDRange Kernel with Local Memory and 1D Register Tiling where M, N, P must all be multiples of BLOCK_SIZE
  3. GEMM_2DREG: NDRange Kernel with Local Memory and 2D Register Tiling where M, N, P must all be multiples of BLOCK_SIZE
  4. GEMM_IMITATE_PADDING: NDRange Kernel with Local Memory where M, N, P are arbitrarily sized
  5. GEMM_2DREG_IMITATE_PADDING: NDRange Kernel with Local Memory and 2D Register Tiling where M, N, P are arbitrarily sized -- this case computes the GEMM as if the matrices were padded (i.e. it imitates padding), but no explicit padding is necessary. This makes GEMM dramatically faster while still allowing arbitrarily sized matrices.
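The "imitate padding" idea in cases 4 and 5 can be sketched as follows: the NDRange launch grid is rounded up to the next multiple of the tile size, and the kernel guards out-of-range loads and stores, so no padded buffers are ever allocated. This is a hypothetical illustration; the repo's actual tile size and helper names may differ:

```python
BLOCK_SIZE = 16  # assumed tile width; the repo may use a different value

def padded(dim, block=BLOCK_SIZE):
    """Round dim up to the next multiple of block (ceil division)."""
    return ((dim + block - 1) // block) * block

# Arbitrary matrix sizes, none a multiple of BLOCK_SIZE.
M, N, P = 100, 70, 33

# One work-item per element of C; the grid covers the "imitated" padded
# shape, and work-items past the true edges simply skip their stores.
global_size = (padded(M), padded(P))
print(global_size)   # (112, 48)
```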
