Cute-Learning

Welcome to the Cute-Learning repository! This project showcases several example implementations built with CuTe, the layout and tensor abstraction library that ships with NVIDIA CUTLASS for writing high-performance CUDA kernels.
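CuTe's central abstraction is the layout: a shape paired with a stride, which together map a logical coordinate to a linear memory offset. As a rough, CuTe-free illustration, a rank-2 layout can be modeled in plain C++ as below. The `Layout2D` type and the `col_major`/`row_major` helpers are hypothetical and are not CuTe's API (`cute::Layout` is a templated, largely compile-time type).

```cpp
#include <cassert>

// Hypothetical plain-C++ model of a rank-2 CuTe-style layout: a (shape, stride)
// pair mapping a logical coordinate (i, j) to a linear memory offset.
// This is NOT the CuTe API; it only mirrors the idea.
struct Layout2D {
    int m, n;    // shape: extent of each mode
    int s0, s1;  // stride: offset step along each mode

    int operator()(int i, int j) const {
        assert(0 <= i && i < m && 0 <= j && j < n);  // coordinate must be in bounds
        return i * s0 + j * s1;                      // inner product of coordinate and stride
    }
    int size() const { return m * n; }
};

// Compact column-major ("LayoutLeft", CuTe's default ordering): stride (1, M).
inline Layout2D col_major(int m, int n) { return {m, n, 1, m}; }
// Compact row-major ("LayoutRight"): stride (N, 1).
inline Layout2D row_major(int m, int n) { return {m, n, n, 1}; }
```

In CuTe notation these are `(4,8):(1,4)` and `(4,8):(8,1)`. The same coordinate `(2,3)` lands at offset `2*1 + 3*4 = 14` in the first and `2*8 + 3*1 = 19` in the second, which is how a kernel can change its memory order without changing its indexing code.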

Features

This repository includes implementations for:

  • GEMM (General Matrix Multiply)
  • GEMV (General Matrix-Vector Multiply)
  • Flash-Decoding
  • Data Copy
  • LDSM (ldmatrix instruction)
  • Tensor Dequant
  • TODO... (More features to come!)
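Flash-Decoding is the most algorithmic item in the list above. It speeds up single-query attention by splitting the KV sequence into chunks, processing the chunks in parallel, and then merging the per-chunk softmax results with a log-sum-exp rescale. Below is a minimal CPU reference sketch of that split-and-merge for one query and one head. The names `attend_chunk`, `flash_decode`, and `Partial` are hypothetical and not this repository's code. A reference like this is mainly useful for checking a GPU kernel's output.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>;  // Mat[row][col]

// Per-chunk result (on a GPU, each chunk would be handled by its own thread block).
struct Partial {
    float max_score;  // max attention score within the chunk
    float sum_exp;    // sum of exp(score - max_score) over the chunk
    Vec acc;          // unnormalized sum of exp(score - max_score) * v
};

Partial attend_chunk(const Vec& q, const Mat& K, const Mat& V,
                     std::size_t begin, std::size_t end, float scale) {
    assert(begin < end && end <= K.size());
    Vec scores;
    for (std::size_t t = begin; t < end; ++t) {
        float s = 0.f;
        for (std::size_t k = 0; k < q.size(); ++k) s += q[k] * K[t][k];
        scores.push_back(s * scale);
    }
    Partial p{*std::max_element(scores.begin(), scores.end()), 0.f,
              Vec(V[0].size(), 0.f)};
    for (std::size_t t = begin; t < end; ++t) {
        float w = std::exp(scores[t - begin] - p.max_score);  // stable softmax weight
        p.sum_exp += w;
        for (std::size_t k = 0; k < p.acc.size(); ++k) p.acc[k] += w * V[t][k];
    }
    return p;
}

// Split the KV sequence into `chunk`-sized pieces, then merge the partials by
// rescaling each one to the global max before normalizing.
Vec flash_decode(const Vec& q, const Mat& K, const Mat& V, std::size_t chunk) {
    assert(chunk > 0 && !K.empty() && K.size() == V.size());
    float scale = 1.f / std::sqrt(static_cast<float>(q.size()));
    std::vector<Partial> parts;
    for (std::size_t b = 0; b < K.size(); b += chunk)
        parts.push_back(attend_chunk(q, K, V, b, std::min(b + chunk, K.size()), scale));

    float global_max = -std::numeric_limits<float>::infinity();
    for (const auto& p : parts) global_max = std::max(global_max, p.max_score);

    float denom = 0.f;
    Vec out(V[0].size(), 0.f);
    for (const auto& p : parts) {
        float r = std::exp(p.max_score - global_max);  // rescale factor for this chunk
        denom += r * p.sum_exp;
        for (std::size_t k = 0; k < out.size(); ++k) out[k] += r * p.acc[k];
    }
    for (float& x : out) x /= denom;
    return out;
}
```

With `chunk` equal to the sequence length, this reduces to ordinary softmax attention. Any smaller chunk size should give the same result up to floating-point rounding.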

GEMM

The GEMM implementation is optimized for performance. The graph below shows its measured performance:

[Figure: GEMM Performance]
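Fast GPU GEMMs are usually organized around hierarchical tiling. Each thread block owns a BM×BN tile of C and walks the K dimension in BK-wide slabs that are staged through shared memory. The CPU sketch below shows only that loop structure. It is not the repository's kernel, and the tile sizes are illustrative assumptions. A reference like this is also handy for validating a CuTe kernel's output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// CPU sketch of GPU-style GEMM tiling: C (M x N) = A (M x K) * B (K x N), all row-major.
// (m0, n0) selects one BM x BN output tile (one thread block on a GPU);
// k0 walks K in BK-wide slabs (staged in shared memory on a GPU).
std::vector<float> gemm_tiled(const std::vector<float>& A, const std::vector<float>& B,
                              int M, int N, int K, int BM = 32, int BN = 32, int BK = 8) {
    assert(static_cast<int>(A.size()) == M * K && static_cast<int>(B.size()) == K * N);
    std::vector<float> C(static_cast<std::size_t>(M) * N, 0.f);
    for (int m0 = 0; m0 < M; m0 += BM)
        for (int n0 = 0; n0 < N; n0 += BN)
            for (int k0 = 0; k0 < K; k0 += BK)
                // Accumulate one A slab times one B slab into the C tile,
                // clamping at the edges so M, N, K need not be tile multiples.
                for (int m = m0; m < std::min(m0 + BM, M); ++m)
                    for (int n = n0; n < std::min(n0 + BN, N); ++n) {
                        float acc = C[m * N + n];
                        for (int k = k0; k < std::min(k0 + BK, K); ++k)
                            acc += A[m * K + k] * B[k * N + n];
                        C[m * N + n] = acc;
                    }
    return C;
}
```

For example, multiplying the 2×3 matrix `{1,2,3,4,5,6}` by the 3×2 matrix `{7,8,9,10,11,12}` yields `{58,64,139,154}` regardless of the tile sizes chosen.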

Refer to the following blog:

LDSM
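`ldmatrix` lets a warp cooperatively load 8×8 tiles of 16-bit elements from shared memory directly into the register fragment layout that `mma` expects. For the single-matrix form (`ldmatrix.sync.aligned.m8n8.x1.shared.b16`, no `.trans`), lanes 0-7 supply the eight row addresses, and lane `t` receives row `t/4`, columns `2*(t%4)` and `2*(t%4)+1`, packed into one 32-bit register. The host-side model below only illustrates that mapping. The function name is hypothetical, and the low-half/high-half packing order assumes little-endian placement of the lower column.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// An 8x8 tile of 16-bit values, stored row-major (stands in for shared memory).
using Tile8x8 = std::array<std::uint16_t, 64>;

// Which two elements lane `lane` holds after an ldmatrix .m8n8 .x1 .b16 load.
std::uint32_t ldmatrix_x1_lane(const Tile8x8& smem, int lane) {
    assert(0 <= lane && lane < 32);
    int row = lane / 4;        // four lanes share each row
    int col = 2 * (lane % 4);  // each lane takes two adjacent columns
    std::uint32_t lo = smem[row * 8 + col];      // lower column -> low 16 bits
    std::uint32_t hi = smem[row * 8 + col + 1];  // next column  -> high 16 bits
    return lo | (hi << 16);
}
```

With the `.x4` variant, lanes 8-15, 16-23, and 24-31 supply the row addresses of three further 8×8 matrices, and each lane receives one register per matrix.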

Refer to the following blog:


We hope you find this repository useful for your learning and development needs. Contributions and feedback are welcome!