Releases: sphexa-org/sphexa

HIP and Spack

13 Dec 08:58
  • HIP compatibility without requiring the source code to be hipified
  • CMake changes to allow easier integration with Spack

Propagator library

21 Nov 14:56
f56fc7c

Enhancements:

  • Separate library with a translation unit for each propagator to reduce compilation times

Fixes:

  • Prevent GPU kernel launches with 0 thread blocks, which became an issue with CUDA 12.6
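
The fix above can be sketched as follows. This is a hedged illustration, not SPH-EXA's actual code: the helper names (`numBlocks`, `guardedLaunch`) are invented for this example, and the point is simply that a grid size of 0 is an invalid CUDA launch configuration, so it must be checked before launching.

```cpp
// Hypothetical sketch of guarding against zero-block launches.
// A launch with gridDim == 0 is an invalid configuration in CUDA;
// computing the grid size first makes the empty case easy to skip.
unsigned numBlocks(unsigned n, unsigned threadsPerBlock)
{
    // ceil(n / threadsPerBlock)
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

template<class Launch>
void guardedLaunch(unsigned n, unsigned threadsPerBlock, Launch launch)
{
    unsigned blocks = numBlocks(n, threadsPerBlock);
    if (blocks == 0) { return; } // empty particle group: skip the launch
    launch(blocks, threadsPerBlock); // e.g. kernel<<<blocks, threadsPerBlock>>>(...)
}
```

A caller would wrap the actual kernel launch in the lambda, e.g. `guardedLaunch(numParticles, 256, [&](unsigned b, unsigned t) { /* kernel<<<b, t>>>(...) */ });`.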

CUDA 12.5 compatibility

21 Nov 13:37

Fixes:

  • Full encapsulation of thrust::device_vector, because starting with CUDA 12.5 it can no longer be included in .cpp files
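
The encapsulation idea can be sketched with a PIMPL wrapper. This is an illustrative sketch, not SPH-EXA's actual class: the public header exposes only an opaque handle, so .cpp translation units compiled by the host compiler never see thrust headers; the `Impl` definition lives in a .cu file compiled by nvcc. Here `std::vector<double>` stands in for `thrust::device_vector<double>` so the sketch compiles without CUDA.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Public header side: no thrust include, only an opaque handle.
class DeviceVector
{
public:
    DeviceVector();
    ~DeviceVector();
    void resize(std::size_t n);
    std::size_t size() const;

private:
    struct Impl; // defined only in the .cu translation unit
    std::unique_ptr<Impl> impl_;
};

// .cu side (shown inline so the example is self-contained).
struct DeviceVector::Impl
{
    std::vector<double> data; // stand-in for thrust::device_vector<double>
};

DeviceVector::DeviceVector() : impl_(std::make_unique<Impl>()) {}
DeviceVector::~DeviceVector() = default;
void DeviceVector::resize(std::size_t n) { impl_->data.resize(n); }
std::size_t DeviceVector::size() const { return impl_->data.size(); }
```

The destructor must be defined in the .cu file (not defaulted in the header), because destroying the `unique_ptr` requires the complete `Impl` type.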

Dynamic LET surface refinement and node pruning

21 Nov 13:34

New features:

  • Refine LET resolution at surface after domain boundary changes
  • Prune LET nodes outside focus that exceed the LET resolution on the owning rank

Hierarchical block time steps

20 Nov 17:33

New features:

  • Hierarchical block time stepping
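
The idea behind hierarchical (power-of-two) block time steps can be sketched as below. This is a generic illustration of the technique, with invented names and rung conventions, not SPH-EXA's API: each particle is assigned a rung r so that its step dt_max / 2^r does not exceed its individually required time step, and only particles whose rung is due are integrated on a given substep.

```cpp
// Choose the smallest rung r (coarsest step) with dtMax / 2^r <= dtRequired.
int rungOf(double dtRequired, double dtMax, int maxRung)
{
    int r = 0;
    while (r < maxRung && dtMax / double(1 << r) > dtRequired) { ++r; }
    return r;
}

// A particle on rung r is integrated on every substep that is a multiple
// of 2^(maxRung - r) of the finest substep; rung maxRung is active always.
bool isActive(long substep, int rung, int maxRung)
{
    long period = 1L << (maxRung - rung);
    return substep % period == 0;
}
```

For example, with dtMax = 1.0 and a particle requiring dt = 0.3, the particle lands on rung 2 and advances with step 0.25, synchronizing with the coarser rungs every 4 of its own steps.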

Ewald summation

20 Nov 17:17
61a9674

New features:

  • Ewald summation on CPUs and GPUs for gravitational forces with periodic boundaries
  • New smoothing kernel for SPH: S49

Performance enhancements:

  • Improve tree refinement for remote LET nodes so that fewer remote nodes are needed to ensure successful gravity traversal; performance improves because less communication is needed
  • injectKeys on GPUs. This tree-resolution-enforcement mechanism is needed more frequently than previously thought, so it was ported to the GPU.

Fixes:

  • Fix compilation issues with CUDA 12.4 related to thrust::device_vector

Fields in 32/64 bit precision

20 Nov 17:11
5975c17

Performance enhancements:

  • By default: keep coordinates and temperature in double precision, all other hydrodynamics fields in single precision
  • Reduce temporary memory allocation in radix sort by reusing scratch buffers

New features:

  • reapplySync: repeats the domain update for additional fields after calling domain.sync()

Tree-based neighbor search

12 Jun 12:24

Performance enhancements:

  • Octree-based warp-aware neighbor searches for better performance and quasi-2D geometry support
  • Adaptive target particle groups for gravity traversal based on bounding volumes relative to volumes of local leaf cells. Avoids large traversal stacks.
  • Support for multi-level tree merges for faster octree rebalancing. Avoids a rare issue where LET updates couldn't keep up
    with changing domain boundaries. (Loss of a peer rank followed by inability to scale back the octree to the global resolution in a single step)

New features:

  • Support for particle splitting when initializing from a checkpoint file.
  • Support for initialization of rectangular domains at scale for Kelvin-Helmholtz and Wind-shock
  • Pure N-body gravity propagator
  • Coupled update of neighbor counts and smoothing lengths

Minor fixes and enhancements:

  • IAD tau determinants with normalization factors for better over/underflow resilience
  • Correct observable selection handling and settings parameter when writing and restoring from file
  • More robust initial domain synchronization that avoids the MPI_Send limit of INT32_MAX elements per message
  • Modify the signalling velocity to allow larger time steps
  • Added a divergence-of-velocity-based minDtRho criterion to time-step control
  • Added acceleration-based time-step control
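
Acceleration-based time-step control in SPH is commonly the criterion dt = C * sqrt(h / |a|), with h the smoothing length and a the particle acceleration. The sketch below illustrates that standard formula; the function name and the choice C = 0.25 are illustrative assumptions, not SPH-EXA's exact implementation.

```cpp
#include <cmath>
#include <limits>

// Standard SPH acceleration criterion: dt = C * sqrt(h / |a|).
// A particle with zero acceleration imposes no constraint.
double accelerationTimeStep(double h, double ax, double ay, double az,
                            double C = 0.25)
{
    double amag = std::sqrt(ax * ax + ay * ay + az * az);
    if (amag == 0.0) { return std::numeric_limits<double>::max(); }
    return C * std::sqrt(h / amag);
}
```

The global step would then be the minimum of this criterion over all particles, combined with the other criteria (CFL, minDtRho) listed above.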

AV cleaning

19 Jan 14:35
4a587ec
  • Added artificial viscosity cleaning as a feature
  • Added interface to GRACKLE for radiative cooling
  • Improvements to Domain: perform octree updates and halo discovery on GPUs
  • Bugfix: added missing device synchronization points in domain and halo exchange when using GPU_DIRECT=ON

v0.6

22 Sep 13:57
cb4aa8d
  • Volume elements are now the default SPH formulation, implemented on the GPU
  • Support for large-scale gravity through Ryoanji
  • HIP-support
  • GPU-direct halo exchange
  • Expanded test case selection
  • Turbulence stirring