Releases: ROCm/rocFFT

rocFFT 1.0.23 for ROCm 5.6.1

29 Aug 20:12
9bd44ae

rocFFT code for ROCm 5.6.1 did not change. The library was rebuilt for the updated ROCm 5.6.1 stack.

rocFFT 1.0.23 for ROCm 5.6.0

28 Jun 23:17
946a75d

Added

  • Implemented half-precision transforms, which can be requested by passing rocfft_precision_half to rocfft_plan_create.
  • Implemented a hierarchical solution map, which records how a problem is decomposed and which kernels are used.
  • Implemented a first version of an offline tuner to support tuning kernels for C2C/Z2Z problems.
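
Half precision trades accuracy for speed and memory. The sketch below is not rocFFT code; it only illustrates the round-off that an IEEE 754 binary16 value introduces, using Python's standard `struct` module. The helper names `to_half` and `half_roundoff` are hypothetical, and whether this matches the library's internal rounding is an assumption.

```python
import struct


def to_half(value):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    # The "e" format code packs a float as a 16-bit half-precision number.
    return struct.unpack("<e", struct.pack("<e", value))[0]


def half_roundoff(samples):
    """Return the largest absolute error introduced by storing samples as half."""
    return max(abs(to_half(s) - s) for s in samples)


# Illustrative inputs only: values that are not exactly representable in
# binary16 lose precision, which can matter for long transforms.
samples = [0.1, 1.0 / 3.0, 1000.1, 2049.0]
print("worst round-off:", half_roundoff(samples))
```

Checking input data this way before requesting a half-precision plan gives a rough idea of the error floor to expect from storage alone.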

Changed

  • Replaced std::complex with hipComplex data types in the data generator.
  • FFT plan dimensions are now sorted to be row-major internally where possible, which produces better plans if the dimensions were accidentally specified in a different order (column-major, for example).
  • Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform.
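
The dimension sorting described above can be pictured as reordering dimensions by stride. This is a minimal sketch of that idea, not rocFFT's implementation: the function name `sort_dims_row_major` is hypothetical, and it assumes the convention that the first listed dimension should be the one with the smallest stride.

```python
def sort_dims_row_major(lengths, strides):
    """Reorder dimensions so the smallest stride (fastest-moving) comes first.

    Hypothetical helper for illustration; the real planner may apply
    further conditions before reordering ("where possible").
    """
    order = sorted(range(len(lengths)), key=lambda i: strides[i])
    return [lengths[i] for i in order], [strides[i] for i in order]


# Dimensions given with the contiguous dimension listed last.
lengths, strides = sort_dims_row_major([4, 8], [8, 1])
print(lengths, strides)
```

Only the description of the data changes; the memory layout itself stays the same, so the transform result is unaffected.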

Fixed

  • Fixed over-allocation of LDS in some real-complex kernels, which was resulting in kernel launch failure.

rocFFT 1.0.22 for ROCm 5.5.1

24 May 19:07
e7d6273

rocFFT code for ROCm 5.5.1 did not change. The library was rebuilt for the updated ROCm 5.5.1 stack.

rocFFT 1.0.22 for ROCm 5.5.0

01 May 21:04
e7d6273

Optimizations

  • Improved performance of 1D lengths < 2048 that use Bluestein's algorithm.
  • Reduced time for generating code during plan creation.
  • Optimized 3D R2C/C2R lengths 32, 84, 128.
  • Optimized batched small 1D R2C/C2R cases.

Added

  • Added gfx1101 to default AMDGPU_TARGETS.

Changed

  • Moved client programs to C++17.
  • Moved planar kernels and infrequently used Stockham kernels to be runtime-compiled.
  • Moved transpose, real-complex, Bluestein, and Stockham kernels to the library kernel cache.

Fixed

  • Removed zero-length twiddle table allocations, which fixes errors from hipMallocManaged.
  • Fixed incorrect freeing of HIP stream handles during twiddle computation when multiple devices are present.

rocFFT 1.0.21 for ROCm 5.4.4

22 Mar 20:47
5687cd9

rocFFT code for ROCm 5.4.4 did not change. The library was rebuilt for the updated ROCm 5.4.4 stack.

rocFFT 1.0.21 for ROCm 5.4.3

07 Feb 17:34
5687cd9

Fixed

  • Removed source directory from rocm_install_targets call to prevent installation of rocfft.h in an unintended location.

rocFFT 1.0.20 for ROCm 5.4.2

13 Jan 16:43
9961827

rocFFT code for ROCm 5.4.2 did not change. The library was rebuilt for the updated ROCm 5.4.2 stack.

rocFFT 1.0.20 for ROCm 5.4.1

15 Dec 18:40
9961827

Fixed

  • Fixed incorrect results on strided large 1D FFTs where batch size does not equal the stride.

rocFFT 1.0.19 for ROCm 5.4.0

30 Nov 17:38
6005bfa

Optimizations

  • Optimized some strided large 1D plans.

Added

  • Added rocfft_plan_description_set_scale_factor API to efficiently multiply each output element of an FFT by a given scaling factor.
  • Created a rocfft_kernel_cache.db file next to the installed library. SBCC kernels are moved to this file when built with the library, and are runtime-compiled for new GPU architectures.
  • Added gfx1100 and gfx1102 to default AMDGPU_TARGETS.
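
A common use for an output scale factor is normalizing an inverse transform by 1/N so that a forward and inverse pair returns the original data. The sketch below shows that arithmetic with a naive pure-Python DFT; it does not call rocFFT, and the `dft` function and its `scale` parameter are illustrative stand-ins rather than the library's API.

```python
import cmath


def dft(x, sign=-1, scale=1.0):
    """Naive O(n^2) DFT that multiplies every output element by `scale`.

    sign=-1 gives the forward transform, sign=+1 the inverse (unnormalized
    unless a scale is supplied).
    """
    n = len(x)
    return [
        scale * sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                    for j in range(n))
        for k in range(n)
    ]


signal = [1 + 0j, 2 - 1j, 0 + 3j, -1 + 0.5j]
spectrum = dft(signal)

# Folding the 1/N normalization into the inverse transform avoids a
# separate pass over the output.
restored = dft(spectrum, sign=+1, scale=1.0 / len(signal))
print(restored)
```

Applying the factor inside the transform rather than in a separate kernel saves one extra read and write of the output buffer, which is the efficiency the release note refers to.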

Changed

  • Moved runtime compilation cache to in-memory by default. A default on-disk cache can encounter contention problems
    on multi-node clusters with a shared filesystem. rocFFT can still be told to use an on-disk cache by setting the
    ROCFFT_RTC_CACHE_PATH environment variable.
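
One way to opt back into an on-disk cache while sidestepping shared-filesystem contention is to point each node at its own node-local file. The following is a sketch under assumptions: only the ROCFFT_RTC_CACHE_PATH variable name comes from the notes above; the helper `rtc_cache_env`, the file name pattern, and the idea that the variable holds a file path are illustrative assumptions.

```python
import os
import socket
import tempfile


def rtc_cache_env(base_env=None, cache_dir=None):
    """Return a copy of the environment with a per-host RTC cache path set."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: a node-local temporary directory avoids shared-filesystem
    # contention between nodes.
    cache_dir = cache_dir or tempfile.gettempdir()
    # Hypothetical file name; including the host name keeps paths distinct.
    filename = f"rocfft_rtc_cache_{socket.gethostname()}.db"
    env["ROCFFT_RTC_CACHE_PATH"] = os.path.join(cache_dir, filename)
    return env


env = rtc_cache_env()
print(env["ROCFFT_RTC_CACHE_PATH"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching an application that uses rocFFT.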

rocFFT 1.0.18 for ROCm 5.3.3

17 Nov 19:21
11c649a

rocFFT code for ROCm 5.3.3 did not change. The library was rebuilt for the updated ROCm 5.3.3 stack.