- Add getAllAlgos extension APIs
- TensileLite support new epilogues: gradient gelu, gradient D, gradient A/B, aux
- Add sample package including three sample apps
- Add a new C++ GEMM class in the hipBLASLt extension
- Refactor the GroupGemm APIs as a C++ class in the hipBLASLt extension
- Rename the scaleD vector enum to HIPBLASLT_MATMUL_DESC_D_SCALE_VECTOR_POINTER
- Enable norm check validation for CI
- GSU kernel optimization: wider memory, PGR N
- Update logic YAML files to improve some FP16 NN sizes
- Add GSU kernel support to GroupGemm
- Add grouped GEMM tuning for Aldebaran
- Added CI tests for TensileLite
- Initialized extension grouped GEMM APIs (FP16 only)
- Added group gemm sample app: example_hipblaslt_groupedgemm
- Fixed incorrect results from the ScaleD kernel
- Tuned equality sizes for HHS data type
- Reduced host-side overhead for hipblasLtMatmul()
- Removed unused kernel arguments
- Scheduled value setup before the first s_waitcnt
- Refactored TensileLite host code
- Optimized build time
- Enable hipBLASLt APIs
- Support gfx90a
- Support problem types: FP32, FP16, BF16
- Support activations: ReLU, GELU
- Support bias vectors
- Integrate with the TensileLite kernel generator
- Add Gtest: hipblaslt-test
- Add full-function benchmarking tool: hipblaslt-bench
- Add sample app: example_hipblaslt_preference
- Grid-based solution search algorithm for untuned sizes
- Tune 10k sizes for each problem type