Release Fastor V0.3 · romeric/Fastor

This release brings many fundamental new features, performance improvements and bug fixes to Fastor, in particular:

Tensor views provide the ability to index, slice and broadcast multi-dimensional tensors using ranges/sequences/other tensors much like NumPy/MATLAB arrays. See the documentation.
Until now, the evaluation mechanism in Fastor used static_cast to chain operations in the corresponding eval functions, which generated a lot of unnecessary type conversion code. Starting from V0.3 the eval functions are well-informed, leading to faster and much cleaner code and helping the compiler optimise much more.
Support for FMA. The matmul, norm and inner functions and multiple other tensor overloads now use FMA instructions when available.
Support for norm, inner, sum and product functions for any type of expression.
Bug fix in generic transpose and 2D SP transpose methods.
Code splitting and plugins for a cleaner, more maintainable code base.
Division instructions can safely be dispatched to multiplication while hoisting the reciprocal out of the loop for expressions of type Expr / Scalar.
FASTOR_FAST_MATH and FASTOR_UNSAFE_MATH are introduced. The FASTOR_UNSAFE_MATH flag turns Expr / Scalar expressions into approximate reciprocal and multiplication intrinsics, which can harm accuracy. FASTOR_FAST_MATH is just a placeholder macro activated by default under -Ofast.
Lots of new test cases introduced.
New benchmark problems for views and finite difference introduced.
scalar_type was not correctly implemented for expressions. Now fixed.
Equal rank tensor assignment restriction is now relaxed in order for expressions and views of any rank to be assigned to expressions of a different rank, as long as their size (capacity) is equal.
Many functions are decorated inline and constexpr. This helps the compiler generate very compact code and aggressively eliminate dead code.
Low and high rank tensors can be created using brace initialisers.
Fix the SP/DP bug in matmul.
The now highly recommended -DNDEBUG flag is added to most Makefiles.
Lots of other minor improvements and bug fixes.
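The Expr / Scalar item above describes replacing per-element division with a single reciprocal followed by multiplications. The following is a minimal sketch of that idea in plain C++, not Fastor's actual implementation; the helper name divide_by_scalar is hypothetical and std::array stands in for a tensor.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, not part of Fastor: divides every element by a scalar.
template <typename T, std::size_t N>
std::array<T, N> divide_by_scalar(const std::array<T, N>& a, T s) {
    const T inv = T(1) / s;      // the only division, hoisted out of the loop
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = a[i] * inv;     // multiplication replaces per-element division
    }
    return out;
}
```

The reciprocal here is computed exactly once with an ordinary division, so the loop body contains only multiplications. In general a[i] * (1 / s) can differ from a[i] / s in the last bit, which is a far smaller effect than using an approximate reciprocal instruction.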

As a final note, when compiling views mixed with other complex expressions, it is really beneficial to pass inlining flags to the compiler, such as -finline-limit=n for GCC, -mllvm -inline-threshold=n for Clang and -inline-forceinline -inline-factor=n for ICC. Although overlapping assignments are provided for convenience, it helps the compiler a lot in inlining if -DFASTOR_NO_ALIAS is issued. Also, for 1D and 2D views, -DFASTOR_USE_VECTORISED_ASSIGN can cut down runtimes by a factor of 2-4, if the compiler is successful at inlining.
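To illustrate why overlapping (aliased) assignment costs more than a no-alias assignment, here is a small sketch in plain C++. It is not Fastor code: both function names are hypothetical, and it assumes that handling overlap amounts to reading the source through a temporary copy, which may not match how Fastor does it.

```cpp
#include <array>
#include <cstddef>

// Hypothetical illustration, not Fastor code.
// Both functions perform a[i + 1] = a[i], where source and destination overlap.

// Alias-safe version: snapshot the source first so writes cannot corrupt reads.
template <std::size_t N>
void shift_right_alias_safe(std::array<int, N>& a) {
    const std::array<int, N> tmp = a;   // extra copy is the price of safety
    for (std::size_t i = 0; i + 1 < N; ++i) {
        a[i + 1] = tmp[i];
    }
}

// No-alias version: reads and writes in place, with no temporary.
// Cheaper and easier to inline, but wrong when the ranges overlap.
template <std::size_t N>
void shift_right_assume_no_alias(std::array<int, N>& a) {
    for (std::size_t i = 0; i + 1 < N; ++i) {
        a[i + 1] = a[i];
    }
}
```

Starting from {1, 2, 3, 4}, the alias-safe version yields {1, 1, 2, 3}, while the no-alias version propagates the first element and yields {1, 1, 1, 1}. A no-alias promise is therefore only appropriate when the two sides of an assignment never overlap.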
