Releases: sphexa-org/sphexa
HIP and Spack
Propagator library
Enhancements:
- Separate library with a translation unit for each propagator to reduce compilation times
Fixes:
- Prevent GPU kernel launches with 0 thread blocks which started to be an issue with CUDA 12.6
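The guard described in this fix can be sketched in plain C++ as follows. This is a minimal illustration, not the SPH-EXA implementation: the names `numBlocks` and `launchIfNonEmpty` are hypothetical, and the `launch` callable stands in for an actual `kernel<<<grid, block>>>(...)` call.

```cpp
#include <cassert>
#include <cstddef>

// Number of thread blocks needed to cover numElements with blockSize threads each.
constexpr std::size_t numBlocks(std::size_t numElements, std::size_t blockSize)
{
    return (numElements + blockSize - 1) / blockSize;
}

// Hypothetical wrapper: calls launch(grid, blockSize) only if the grid is non-empty.
// Returns true if the launch was issued, false if it was skipped.
template<class Launch>
bool launchIfNonEmpty(std::size_t numElements, std::size_t blockSize, Launch&& launch)
{
    assert(blockSize > 0);
    std::size_t grid = numBlocks(numElements, blockSize);
    if (grid == 0) { return false; } // nothing to do: skip the 0-block launch
    launch(grid, blockSize);
    return true;
}
```

With an empty input (`numElements == 0`) the callable is never invoked, so no launch with 0 thread blocks reaches the runtime.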
CUDA 12.5 compatibility
Fixes:
- Full encapsulation of thrust::device_vector, because starting from CUDA 12.5 including it in .cpp files is no longer possible
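One common way to keep a container type out of host-only translation units is the pointer-to-implementation idiom. The sketch below shows the idea under stated assumptions: the class name `DeviceBuffer` and its members are hypothetical, `std::vector` stands in for `thrust::device_vector` so the example compiles without CUDA, and the release note does not say that SPH-EXA uses exactly this layout.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// "Header" part: safe to include from .cpp files, names no device container.
template<class T>
class DeviceBuffer // hypothetical wrapper name
{
public:
    explicit DeviceBuffer(std::size_t n);
    ~DeviceBuffer();
    std::size_t size() const;
    void resize(std::size_t n);
    T get(std::size_t i) const; // copy one element back to the caller

private:
    struct Impl; // defined only in the implementation file
    std::unique_ptr<Impl> impl_;
};

// "Source" part: in a real project this would live in a .cu file.
template<class T>
struct DeviceBuffer<T>::Impl
{
    std::vector<T> data; // stand-in for thrust::device_vector<T>
};

template<class T>
DeviceBuffer<T>::DeviceBuffer(std::size_t n) : impl_(new Impl{std::vector<T>(n)}) {}
template<class T>
DeviceBuffer<T>::~DeviceBuffer() = default;
template<class T>
std::size_t DeviceBuffer<T>::size() const { return impl_->data.size(); }
template<class T>
void DeviceBuffer<T>::resize(std::size_t n) { impl_->data.resize(n); }
template<class T>
T DeviceBuffer<T>::get(std::size_t i) const
{
    assert(i < impl_->data.size());
    return impl_->data[i];
}

// Explicit instantiation for the element types the host code needs.
template class DeviceBuffer<double>;
```

Because the member definitions and the explicit instantiation sit in the implementation file, host code only ever sees the forward-declared `Impl`.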
Dynamic LET surface refinement and node pruning
New features:
- Refine LET resolution at surface after domain boundary changes
- Prune LET nodes outside focus that exceed the LET resolution on the owning rank
Hierarchical block time steps
New features:
- Hierarchical block time stepping
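For readers unfamiliar with the technique, the following sketch shows the generic power-of-two scheme usually meant by hierarchical block time steps. It is a textbook illustration, not SPH-EXA's implementation; the function names and the parameters `dtMax` and `maxLevel` are assumptions made for this example.

```cpp
#include <cassert>

// Smallest level k in [0, maxLevel] such that dtMax / 2^k <= dtWanted.
// Particles that need an even smaller step are clamped to maxLevel.
int timestepLevel(double dtWanted, double dtMax, int maxLevel)
{
    assert(dtWanted > 0 && dtMax > 0 && maxLevel >= 0);
    int level = 0;
    double dt = dtMax;
    while (dt > dtWanted && level < maxLevel)
    {
        dt *= 0.5;
        ++level;
    }
    return level;
}

// True if particles on `level` are advanced at fine substep `substep`,
// where one full step of length dtMax consists of 2^maxLevel substeps.
bool isActive(int level, int substep, int maxLevel)
{
    assert(level >= 0 && level <= maxLevel);
    int stride = 1 << (maxLevel - level);
    return substep % stride == 0;
}
```

Particles on the finest level are advanced at every substep, while particles on level 0 are advanced only once per full step, so slowly evolving particles are updated less often.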
Ewald summation
New features:
- Ewald summation on CPUs and GPUs for gravitational forces with periodic boundaries
- New smoothing kernel for SPH: S49
Performance enhancements:
- Improve tree refinement for remote LET nodes so that fewer remote nodes are needed to ensure successful gravity traversal. This reduces the amount of communication and improves performance.
- injectKeys on GPUs. This mechanism for enforcing tree resolution is needed more frequently than previously thought, so it was ported to the GPU.
Fixes:
- Fix compilation issues with CUDA 12.4 related to thrust::device_vector
Fields in 32/64 bit precision
Performance enhancements:
- By default: keep coordinates and temperature in double precision, all other hydrodynamics fields in single precision
- Reduce temporary memory allocation in radix sort by reusing scratch buffers
New features:
- reapplySync, which repeats the domain update for additional fields after calling domain.sync()
Tree-based neighbor search
Performance enhancements:
- Octree-based warp-aware neighbor searches for better performance and quasi-2D geometry support
- Adaptive target particle groups for gravity traversal based on bounding volumes relative to volumes of local leaf cells. Avoids large traversal stacks.
- Support for multi-level tree merges for faster octree rebalancing. Avoids a rare issue where LET updates couldn't keep up with changing domain boundaries (loss of a peer rank followed by inability to scale the octree back to the global resolution in a single step).
New features:
- Support for particle splitting when initializing from a checkpoint file.
- Support for initialization of rectangular domains at scale for Kelvin-Helmholtz and Wind-shock
- Pure N-body gravity propagator
- Coupled update of neighbor counts and smoothing lengths
Minor fixes and enhancements:
- IAD tau determinants with normalization factors for better over/underflow resilience
- Correct observable selection handling and settings parameter when writing and restoring from file
- More robust initial domain synchronization that avoids the MPI_Send limit of 2^31 - 1 elements per message (the count argument is a 32-bit int).
- Modify signalling velocity for larger time-steps
- Added minDtRho criterion, based on the divergence of velocity, to time-step control
- Added acceleration-based time-step control
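The per-message limit mentioned in the domain synchronization fix above comes from the `int` count argument of `MPI_Send`. One generic way to stay below it is to split a large transfer into several messages. The helper below is a minimal sketch of that splitting logic only: the name `messageChunks` is hypothetical, no MPI calls are made, and the release notes do not state that SPH-EXA uses this particular approach.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <utility>
#include <vector>

// Split the index range [0, numElements) into (offset, count) pieces
// such that every count fits into an int and is at most maxCount.
std::vector<std::pair<std::size_t, int>> messageChunks(std::size_t numElements, int maxCount = INT_MAX)
{
    assert(maxCount > 0);
    std::vector<std::pair<std::size_t, int>> chunks;
    for (std::size_t offset = 0; offset < numElements;)
    {
        std::size_t remaining = numElements - offset;
        int count = remaining < std::size_t(maxCount) ? int(remaining) : maxCount;
        chunks.emplace_back(offset, count);
        offset += std::size_t(count);
    }
    return chunks;
}
```

Each returned pair could then be passed as the buffer offset and count of one send, with a matching receive per chunk on the other side.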
AV cleaning
- Added artificial viscosity cleaning as a feature
- Added interface to GRACKLE for radiative cooling
- Improvements to Domain: perform octree updates and halo discovery on GPUs
- Bugfix: added missing device synchronization points in domain and halo exchange when using GPU_DIRECT=ON