Experiments with C++ and CUDA matrix multiplication algorithms.

sidyakinian/parallel-matmul

Parallel Matmul

Description

This repo contains 5 matrix multiplication algorithms in C++ and CUDA:

  1. Baseline single-threaded matrix multiplication.
  2. Tiled single-threaded matrix multiplication.
  3. Multithreaded matrix multiplication.
  4. CUDA kernel for matrix multiplication.
  5. CUDA kernel for tiled matrix multiplication.
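To make the difference between the first two variants concrete, here is a minimal C++ sketch of a baseline and a tiled single-threaded multiplication. It is an illustration, not the code from this repo: the `Matrix` alias, the row-major flat layout, the function names `matmul_baseline` and `matmul_tiled`, and the default tile size of 32 are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation: an n x n matrix stored row-major in a flat vector.
using Matrix = std::vector<float>;

// Baseline: classic triple loop computing C = A * B.
Matrix matmul_baseline(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];  // strided access into B
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// Tiled: same arithmetic, but the loops walk tile x tile blocks so that the
// parts of A, B and C being reused are more likely to stay in cache.
Matrix matmul_tiled(const Matrix& a, const Matrix& b, std::size_t n,
                    std::size_t tile = 32) {
    assert(a.size() == n * n && b.size() == n * n && tile > 0);
    Matrix c(n * n, 0.0f);
    for (std::size_t ii = 0; ii < n; ii += tile) {
        for (std::size_t kk = 0; kk < n; kk += tile) {
            for (std::size_t jj = 0; jj < n; jj += tile) {
                // Clamp block bounds so n need not be a multiple of tile.
                const std::size_t i_end = std::min(ii + tile, n);
                const std::size_t k_end = std::min(kk + tile, n);
                const std::size_t j_end = std::min(jj + tile, n);
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const float a_ik = a[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += a_ik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
    return c;
}
```

Both functions perform the same multiply-add operations; only the order in which memory is visited differs. The best tile size depends on the cache sizes of the target machine, so 32 is only a starting point.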

The purpose of this repo is to compare their implementations and performance.

Performance comparison

On input size 1024, the algorithms take the following times to execute, in seconds (measured on an AWS EC2 g4dn instance):

[Figure: matmul_speeds (execution time of each algorithm, in seconds)]

Multithreading clearly offers an order-of-magnitude speedup, and tiling also offers a 20-30% improvement due to better spatial locality.
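One common way to parallelize matrix multiplication on the CPU is to give each thread a contiguous band of output rows, so that no two threads ever write to the same element and no locking is needed. The sketch below shows that approach with `std::thread`. It is an assumption about how such a version could look, not necessarily how this repo implements it, and the name `matmul_threaded` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: multiplies two row-major n x n matrices, splitting the
// rows of the output evenly across num_threads worker threads.
std::vector<float> matmul_threaded(
        const std::vector<float>& a, const std::vector<float>& b, std::size_t n,
        unsigned num_threads = std::thread::hardware_concurrency()) {
    assert(a.size() == n * n && b.size() == n * n);
    if (num_threads == 0) num_threads = 1;  // hardware_concurrency() may return 0

    std::vector<float> c(n * n, 0.0f);
    const std::size_t rows_per_thread = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t row_begin = std::min(n, t * rows_per_thread);
        const std::size_t row_end = std::min(n, row_begin + rows_per_thread);
        if (row_begin == row_end) break;  // more threads than rows

        // Each worker owns rows [row_begin, row_end) of C exclusively.
        workers.emplace_back([&a, &b, &c, n, row_begin, row_end] {
            for (std::size_t i = row_begin; i < row_end; ++i) {
                for (std::size_t k = 0; k < n; ++k) {
                    const float a_ik = a[i * n + k];
                    for (std::size_t j = 0; j < n; ++j) {
                        c[i * n + j] += a_ik * b[k * n + j];
                    }
                }
            }
        });
    }
    for (auto& w : workers) w.join();  // wait for every band to finish
    return c;
}
```

On some toolchains this needs to be compiled with `-pthread`. The achievable speedup depends on the number of physical cores and on memory bandwidth, so it will vary between machines.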
