# Change Log for hipBLASLt

## (Unreleased) hipBLASLt 0.3.0

### Added

- Added `getAllAlgos` extension APIs
- Added TensileLite support for new epilogues: gradient GELU, gradient D, gradient A/B, aux
- Added a sample package including three sample apps
- Added a new C++ GEMM class to the hipBLASLt extension (used together with `getAllAlgos` in the sketch after this list)
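
Both additions live in the `hipblaslt_ext` namespace. The sketch below is a minimal, non-authoritative outline of how they fit together: `getAllAlgos` enumerates every kernel that can serve a problem *type*, and the C++ `Gemm` class replaces the C-style descriptor calls. Signatures and enum spellings (`HIPBLAS_R_16F`, `HIPBLASLT_COMPUTE_F32`) are assumed from the 0.3.0-era `hipblaslt-ext.hpp` and changed in later releases, so verify against the installed header.

```cpp
// Sketch only: enumerate all FP16 GEMM kernels with getAllAlgos, then run
// one through the new hipblaslt_ext::Gemm class. Signatures are assumed
// from the 0.3.0-era hipblaslt-ext.hpp; error handling omitted.
#include <hip/hip_runtime.h>
#include <hipblaslt/hipblaslt.h>
#include <hipblaslt/hipblaslt-ext.hpp>
#include <vector>

void fp16_gemm_via_ext(hipblasLtHandle_t handle, hipStream_t stream,
                       void* dA, void* dB, void* dC, void* dD,
                       void* dWorkspace, size_t workspaceBytes,
                       int64_t m, int64_t n, int64_t k) {
    // All algorithms that can serve an FP16 GEMM of this *type*,
    // independent of the concrete m/n/k.
    std::vector<hipblasLtMatmulHeuristicResult_t> algos;
    hipblaslt_ext::getAllAlgos(handle, hipblaslt_ext::GemmType::HIPBLASLT_GEMM,
                               HIPBLAS_OP_N, HIPBLAS_OP_N,
                               HIPBLAS_R_16F, HIPBLAS_R_16F,
                               HIPBLAS_R_16F, HIPBLAS_R_16F,
                               HIPBLASLT_COMPUTE_F32, algos);

    // The C++ Gemm class bundles the matmul and layout descriptors.
    hipblaslt_ext::Gemm gemm(handle, HIPBLAS_OP_N, HIPBLAS_OP_N,
                             HIPBLAS_R_16F, HIPBLAS_R_16F,
                             HIPBLAS_R_16F, HIPBLAS_R_16F,
                             HIPBLASLT_COMPUTE_F32);

    float alpha = 1.0f, beta = 0.0f;
    hipblaslt_ext::GemmEpilogue epilogue;  // default: no activation, no bias
    hipblaslt_ext::GemmInputs inputs;
    inputs.a = dA; inputs.b = dB; inputs.c = dC; inputs.d = dD;
    inputs.alpha = &alpha; inputs.beta = &beta;
    gemm.setProblem(m, n, k, /*batch_count=*/1, epilogue, inputs);

    // Launch the first algorithm whose workspace requirement fits.
    for (auto& r : algos) {
        size_t needed = 0;  // isAlgoSupported is assumed to report this
        if (gemm.isAlgoSupported(r.algo, needed) == HIPBLAS_STATUS_SUCCESS &&
            needed <= workspaceBytes) {
            gemm.initialize(r.algo, dWorkspace);
            gemm.run(stream);
            break;
        }
    }
}
```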

### Changed

- Refactored the GroupGemm APIs into a C++ class in the hipBLASLt extension
- Renamed the scaleD vector enum to `HIPBLASLT_MATMUL_DESC_D_SCALE_VECTOR_POINTER` (see the sketch below)
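
The renamed attribute is set through the standard descriptor setter, as in this minimal sketch; `dScaleD` is a hypothetical device pointer used only for illustration.

```cpp
// Sketch only: attach a device-side scaleD vector under the renamed
// attribute. hipblasLtMatmulDescSetAttribute is the standard setter;
// dScaleD is a hypothetical device pointer for illustration.
#include <hipblaslt/hipblaslt.h>

void set_scale_d(hipblasLtMatmulDesc_t matmulDesc, const void* dScaleD) {
    // As with other *_POINTER attributes, the value copied is the pointer itself.
    hipblasLtMatmulDescSetAttribute(matmulDesc,
                                    HIPBLASLT_MATMUL_DESC_D_SCALE_VECTOR_POINTER,
                                    &dScaleD, sizeof(dScaleD));
}
```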

### Fixed

- Enabled norm-check validation for CI

### Optimizations

- GSU (GlobalSplitU) kernel optimizations: wider memory access, PGR N (PrefetchGlobalRead)
- Updated the logic YAML files to improve some FP16 NN sizes
- Added GSU kernel support to GroupGemm
- Added grouped GEMM tuning for aldebaran

## (Unreleased) hipBLASLt 0.2.0

### Added

- Added CI tests for TensileLite
- Initialized the extension grouped GEMM APIs (FP16 only; see the sketch after this list)
- Added a grouped GEMM sample app: example_hipblaslt_groupedgemm
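
The grouped GEMM extension runs several independent GEMMs from a single launch. The original 0.2.0 entry points were later refactored into the C++ `GroupedGemm` class noted under 0.3.0 above; the sketch below uses that class form with assumed signatures, since the 0.2.0 shapes are not documented here. See example_hipblaslt_groupedgemm for the authoritative usage.

```cpp
// Sketch only (assumed signatures): launch a batch of independent FP16
// GEMMs in one call via the grouped-GEMM extension.
#include <hip/hip_runtime.h>
#include <hipblaslt/hipblaslt-ext.hpp>
#include <vector>

void grouped_fp16(hipblasLtHandle_t handle, hipStream_t stream,
                  std::vector<int64_t> Ms, std::vector<int64_t> Ns,
                  std::vector<int64_t> Ks,
                  std::vector<hipblaslt_ext::GemmInputs>& inputs,
                  void* dWorkspace, size_t workspaceBytes) {
    hipblaslt_ext::GroupedGemm grouped(handle, HIPBLAS_OP_N, HIPBLAS_OP_N,
                                       HIPBLAS_R_16F, HIPBLAS_R_16F,
                                       HIPBLAS_R_16F, HIPBLAS_R_16F,
                                       HIPBLASLT_COMPUTE_F32);

    // One batch-count and epilogue entry per GEMM in the group.
    std::vector<int64_t> batches(Ms.size(), 1);
    std::vector<hipblaslt_ext::GemmEpilogue> epilogues(Ms.size());
    grouped.setProblem(Ms, Ns, Ks, batches, epilogues, inputs);

    hipblaslt_ext::GemmPreference pref;
    pref.setMaxWorkspaceBytes(workspaceBytes);
    std::vector<hipblasLtMatmulHeuristicResult_t> results;
    grouped.algoGetHeuristic(/*requestedAlgoCount=*/1, pref, results);

    if (!results.empty()) {
        grouped.initialize(results[0].algo, dWorkspace);
        grouped.run(stream);
    }
}
```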

### Fixed

- Fixed incorrect results from the ScaleD kernel

### Optimizations

- Tuned equality sizes for the HHS data type
- Reduced host-side overhead of hipblasLtMatmul()
- Removed unused kernel arguments
- Scheduled value setup before the first s_waitcnt
- Refactored the TensileLite host code
- Optimized build time

## hipBLASLt 0.1.0

### Added

- Enabled the hipBLASLt APIs
- Added support for gfx90a
- Added support for problem types FP32, FP16, and BF16
- Added support for the ReLU and GELU activations
- Added support for a bias vector (see the sketch after this list)
- Integrated with the TensileLite kernel generator
- Added GTest suite: hipblaslt-test
- Added full-function benchmarking tool: hipblaslt-bench
- Added sample app: example_hipblaslt_preference
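
Activations and the bias vector are applied through the matmul descriptor's epilogue attributes. The sketch below wires a fused GELU-plus-bias epilogue into a plain FP16 GEMM; the enum spellings (`HIPBLASLT_EPILOGUE_GELU_BIAS`, `HIPBLASLT_MATMUL_DESC_BIAS_POINTER`) follow the cuBLASLt-style API that hipBLASLt mirrors, and their availability in 0.1.0 is an assumption, so check the installed header.

```cpp
// Sketch only: an FP16 GEMM with a fused GELU + bias epilogue through the
// core hipBLASLt C API. Error checks and resource cleanup omitted; the
// epilogue/attribute enum spellings are assumed.
#include <hip/hip_runtime.h>
#include <hipblaslt/hipblaslt.h>

void gemm_gelu_bias(hipblasLtHandle_t handle, hipStream_t stream,
                    const void* dA, const void* dB, void* dD,
                    const void* dBias, void* dWorkspace, size_t wsSize,
                    int64_t m, int64_t n, int64_t k) {
    hipblasLtMatmulDesc_t op;
    hipblasLtMatmulDescCreate(&op, HIPBLASLT_COMPUTE_F32, HIPBLAS_R_32F);

    // Fused GELU activation plus per-row bias (assumed enum spelling).
    hipblasLtEpilogue_t epi = HIPBLASLT_EPILOGUE_GELU_BIAS;
    hipblasLtMatmulDescSetAttribute(op, HIPBLASLT_MATMUL_DESC_EPILOGUE,
                                    &epi, sizeof(epi));
    hipblasLtMatmulDescSetAttribute(op, HIPBLASLT_MATMUL_DESC_BIAS_POINTER,
                                    &dBias, sizeof(dBias));

    // Column-major layouts with leading dimension = row count.
    hipblasLtMatrixLayout_t a, b, d;
    hipblasLtMatrixLayoutCreate(&a, HIPBLAS_R_16F, m, k, m);
    hipblasLtMatrixLayoutCreate(&b, HIPBLAS_R_16F, k, n, k);
    hipblasLtMatrixLayoutCreate(&d, HIPBLAS_R_16F, m, n, m);

    hipblasLtMatmulPreference_t pref;
    hipblasLtMatmulPreferenceCreate(&pref);
    hipblasLtMatmulPreferenceSetAttribute(
        pref, HIPBLASLT_MATMUL_PREF_MAX_WORKSPACE_BYTES, &wsSize, sizeof(wsSize));

    hipblasLtMatmulHeuristicResult_t result;
    int found = 0;
    hipblasLtMatmulAlgoGetHeuristic(handle, op, a, b, d, d, pref,
                                    /*requestedAlgoCount=*/1, &result, &found);

    float alpha = 1.0f, beta = 0.0f;
    if (found > 0)
        hipblasLtMatmul(handle, op, &alpha, dA, a, dB, b, &beta,
                        dD, d, dD, d, &result.algo, dWorkspace, wsSize, stream);
}
```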

### Optimizations

- Grid-based solution search algorithm for untuned sizes
- Tuned 10k sizes for each problem type