Fastor V0.3
This release brings lots of fundamental new features, performance improvements and bug fixes to Fastor, in particular:
- Tensor views provide the ability to index, slice and broadcast multi-dimensional tensors using ranges/sequences/other tensors much like NumPy/MATLAB arrays. See the documentation.
- The evaluation mechanism in Fastor so far used `static_cast`ing for chaining operations in the corresponding `eval` functions. This used to generate a lot of unnecessary type conversion code. Starting from `V0.3` the `eval` functions are well-informed, leading to faster and much cleaner code and helping the compiler optimise much more.
- Support for FMA. The `matmul`, `norm` and `inner` functions and multiple other tensor overloads now use FMA instructions when available.
- Support for `norm`, `inner`, `sum` and `product` functions for any type of expressions.
- Bug fix in generic transpose and 2D SP transpose methods.
- Code splitting and plugins for cleaner maintainable code base.
- Division instructions can safely be dispatched to multiplication while hoisting the reciprocal out of the loop for expressions of type `Expr / Scalar`.
- `FASTOR_FAST_MATH` and `FASTOR_UNSAFE_MATH` are introduced. The `FASTOR_UNSAFE_MATH` flag turns `Expr / Scalar` expressions into approximate reciprocal and multiplication intrinsics, which can harm the accuracy. `FASTOR_FAST_MATH` is just a placeholder macro activated by default under `-Ofast`.
- Lots of new test cases introduced.
- New benchmark problems for views and finite difference introduced.
- `scalar_type` was not correctly implemented for expressions. Now fixed.
- The equal-rank restriction on tensor assignment is now relaxed, so expressions and views of any rank can be assigned to expressions of a different rank, as long as their size (capacity) is equal.
- Many functions are decorated `inline` and `constexpr`. This helps the compiler generate very compact code and aggressively eliminate dead code.
- Low and high rank tensors can be created using brace initialisers.
- Fix the `SP/DP` bug in `matmul`.
- Introduce the now very recommended `-DNDEBUG` flag to most Makefiles.
- Lots of other minor improvements and bug fixes.
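
The `Expr / Scalar` item in the list above describes replacing a division per element with one reciprocal and a multiplication per element. A minimal sketch of that idea in plain C++ follows. The function names `divide_naive` and `divide_hoisted` are hypothetical and are not part of Fastor; this only illustrates the transformation and is not the library's implementation.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not a Fastor function): one division per element.
template <typename T, std::size_t N>
std::array<T, N> divide_naive(const std::array<T, N>& a, T s) {
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = a[i] / s;
    }
    return out;
}

// Hypothetical helper (not a Fastor function): the reciprocal is computed
// once, outside the loop, and the loop body only multiplies.
template <typename T, std::size_t N>
std::array<T, N> divide_hoisted(const std::array<T, N>& a, T s) {
    const T inv = T(1) / s;  // single exact division, hoisted out of the loop
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = a[i] * inv;
    }
    return out;
}
```

For a divisor that is a power of two the two versions agree exactly; for a general floating-point divisor `a[i] * inv` can differ from `a[i] / s` by a rounding error in the last bit. This sketch still uses an exact reciprocal, unlike the approximate reciprocal intrinsics mentioned for `FASTOR_UNSAFE_MATH`.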
As a final note, when compiling views mixed with other complex expressions, it is really beneficial to add inlining flags to the compiler, such as `-finline-limit=n` for GCC, `-mllvm -inline-threshold=n` for Clang and `-inline-forceinline -inline-factor=n` for ICC. Although overlapping assignments are provided for convenience, it helps the compiler a lot in inlining if `-DFASTOR_NO_ALIAS` is issued. Also, for 1D and 2D views, `-DFASTOR_USE_VECTORISED_ASSIGN` can cut down runtimes by a factor of 2-4, if the compiler is successful at inlining.
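
To see why overlapping assignments need extra care, consider a plain C++ sketch of shifting an array right by one element, where the source and destination ranges overlap. The names `shift_in_place` and `shift_with_temporary` are hypothetical and are not Fastor functions. The reading that `FASTOR_NO_ALIAS` lets the library skip this kind of overlap handling is an assumption drawn from the note above, not a description of Fastor's internals.

```cpp
#include <array>
#include <cstddef>

// Hypothetical illustration (not Fastor code) of a[1..N-1] = a[0..N-2].

// Forward in-place copy: each read sees a value that was just overwritten,
// so the first element is propagated through the whole array.
template <std::size_t N>
void shift_in_place(std::array<int, N>& a) {
    for (std::size_t i = 1; i < N; ++i) {
        a[i] = a[i - 1];
    }
}

// Overlap-safe version: the source is copied to a temporary first,
// at the cost of an extra copy of the data.
template <std::size_t N>
void shift_with_temporary(std::array<int, N>& a) {
    const std::array<int, N> tmp = a;
    for (std::size_t i = 1; i < N; ++i) {
        a[i] = tmp[i - 1];
    }
}
```

Starting from `{1, 2, 3, 4}`, the in-place loop produces `{1, 1, 1, 1}` while the version with a temporary produces `{1, 1, 2, 3}`. When the two sides of an assignment are known not to overlap, the temporary is unnecessary, which is the situation a no-alias promise describes.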