V3.58: modular and saturating lns arithmetic (#300)

* bumping SEMVER to v3.58
* adding lns performance baseline
* adding dynamic range API to LNS
* adding special values constexpr constructor to lns
* Fix posit8_str C API declaration/definition mismatch
  Fixes a warning (gcc12):
    c_api/pure_c/posit/posit8.c:43:23: warning: argument 1 of type ‘char *’ declared as a pointer [-Warray-parameter=]
       43 | void posit8_str(char* str, posit8_t a)
    include/universal/number/posit/posit_c_macros.h:90:29: note: previously declared as an array ‘char[static 16]’
       90 | void POSIT_MKNAME(str)(char out[static POSIT_MKNAME(str_SIZE)], POSIT_T p);
* Fix missing <cstring> include for paranoia.c strrchr
  The missing #include breaks the build (on gcc12)
* Simplify clear()
  *this = {} copy-assign a default constructed object
* Reduce repetitive array code with variadics
  The blocksignificant(raw,radixPoint) constructor, and the significant_ull() member function were using constexpr-if chains to initialize or read from array elements. Variadic template expansion reduces the code size and the chances of cut-n-paste errors between the different-size code blocks. It does however increase the logic complexity (and compile time). Using std::index_sequence<I...> to deduce I.. requires two stages; a variadic helper constructor is added and delegated to while the significant_ull member function does a recursive bootstrap.
* Apply noexcept consistently
  Universal has a lot of noexcept spec functions, and that seems the right decision for a provider of low level types. The noexcept usage in this blocksignificant file was inconsistent, so I made it more consistent by adding noexcept in most places, apart from on the division functions in case divide by zero is defined as throwing in future.
* Remove inline specifier from member functions defined in class
  They're implicitly inline when defined in-line. Note that the inline spec is left on the free function defined at the end of the file (twosComplementFree) because it is needed there.
* Add more constexpr, for consistency
* Make more friends, make them public and hidden
  There were some private friend declarations at the end of the class definition, then the definitions were free function templates. I removed the private label, making them public, and brought the definitions inline, also bringing in other free operators that were not declared inline.
* Add missing template disambiguator
  Stops GCC 12 warnings
* Silence type-punning warning
  GCC warning: dereferencing type-punned pointer will break strict-aliasing rules on e.g.
    uint32_t f(float f) { return *(uint32_t*)&f; }
  Silenced by using a union instead.
* Add initializers to silence uninitialized warnings
  These are all due to type_tag(x) implementations that access their argument x. Ironically, it looks like these were modified at some point to access their argument in order to silence warnings about 'unused formal parameter'; a fix for that should be to remove the formal parameter name (and its superfluous use) - I'll do a PR.
* Remove type_tag formal arg name, and its use
  The type_tag(x) overload implementations deduce their argument type:
    template <typename X> string type_tag(X x);
  or
    template <typename X> string type_tag(X const& x);
  The formal argument x shouldn't be used; only its type X is pertinent. The declarations should drop the name:
    template <typename X> string type_tag(X);
  An alternative signature then takes no argument; convenient when there is no variable to use so it saves declaring one unnecessarily:
    template <typename X> string type_tag();
  Here the type cannot be deduced so must be provided: type_tag<X>(). The two overloads can be combined into one with a default argument:
    template <typename X> string type_tag(X = {});
  (assuming that X is default constructible). Note that type_tag should really be a constexpr function (or a variable template, or a type trait) because it deals only with types. However, relying on C++ runtime typeinfo disallows any compile-time definition, as does returning std::string (and implementing with stringstream). If the signature is changed to return a 'constexpr string' type then I can PR a constexpr type_tag implementation.
* adding MSC guard to specialize posit8_str for vc++
* Comment out unused retval variable
  Looks like it should've been commented out in recent change "cleaning up blocktriple regression tests to reduce output" which commented out retval's usage. Commenting out rather than removing because there's a TODO. (Third time I've prepared this change, it keeps getting lost.) (I think this is the last of the non _block related warnings.)
* completing classify regression test
* gcc fix for missing fpclassify declaration
* WIP: debugging fmod behavior: it appears to use double precision math
* bit_cast polyfill, detects and uses std, builtin or non-builtin
  C++20 introduced std::bit_cast, with clang, gcc and msvc implementing it with a builtin, __builtin_bit_cast(T,v), portable between compilers and available in < c++20 language modes. If compiler-provided constexpr-capable bit_cast is detected it's used, otherwise this header defines a non-constexpr bit_cast, useful as a UB-free alternative to 'type-punning', for trivially copyable types. The BIT_CAST_CONSTEXPR preprocessor symbol is defined as constexpr if sw::bit_cast is constexpr, otherwise it's defined as empty, allowing clients to propagate constexpr on functions that depend on bit_cast, and BIT_CAST_IS_CONSTEXPR symbol is defined as true or false. The earlier BIT_CAST_SUPPORT, and related CONSTEXRESSION, symbols are now redundant and can be deprecated.
* removing unused files
* adding dev containers for different gcc compilers
* adding MSVC compiler flag to enable latest C++20 features
* redesigning the lerp function include
* WIP: creating a common arithmetic behavior configuration enum
* adding negation to lns and streamlining arithmetic type testing
* forgot to make lns negation operator-() constexpr
* bug fix in tempered logt and expt functions and test
* bumping SEMVER to v3.59 and moving dev container to gcc11
* fixing identifying version environment vars for dev containers
* g++ exclusion of riscv, which is a specialized g++
* temporary removal of 3D hypotenuse math functions for g++
* removing conversion experiment test from posit regression suite
* code hygiene for quire test
* adding VerifyHypot test to posit mathlib regression suite
* adding VerifyHypot test to cfloat mathlib regression suite
* setting up lns arithmetic test suites for dev testing
* updating README with new LNCS proceedings and latest cmake install guidance
* adding Wilkinson's polynomial info
* WIP: adding mathlib stubs to lns
* adding protection to call out incorrect parameterization of lns
* WIP: adding arithmetic behavior parameters to LNS
* compilation fixes required for API change of fixpnt
* compilation fix for hypotl function for lns
* compilation fixes for gcc
* warnings removal for gcc
* rearranging assignment regression test for posits to the conversion set
* WIP: adding a conversion test to lns
* adding LNS tables to documentation
* WIP: lns saturation arithmetic
* lns conversion from IEEE-754 using saturating arithmetic
* WIP: saturating lns multiplication
* saturating lns multiplication
* saturating lns addition (marshalling through double)
* saturating lns subtraction (marshalling through double)
* ready to use generic or specialized regression test
* saturating lns division (native implementation)
* compilation fix for areal regression tests
* clang compilation fix
* first attempt to try to recover from a bad merge
Co-authored-by: Will Wray
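The type_tag discussion in the commit message can be illustrated with a small sketch. It is not the library's implementation: `type_tag_sketch` is a hypothetical name, and returning `typeid(X).name()` is only a stand-in for whatever string the real overloads build. It shows the combined signature with an unnamed, defaulted parameter, which accepts both the deduced and the explicitly specified call forms.

```cpp
#include <string>
#include <typeinfo>

// Hypothetical stand-in for the type_tag overloads described above.
// The parameter is unnamed because only its type X matters, and the
// default argument lets callers write type_tag_sketch<X>() when they
// have no variable at hand (X must be default constructible).
template <typename X>
std::string type_tag_sketch(X = {}) {
    // Assumption: the implementation-defined typeid name is good enough
    // for illustration; the real function formats its own string.
    return typeid(X).name();
}
```

Both `type_tag_sketch(1.0f)` and `type_tag_sketch<float>()` select the same instantiation, so no caller needs to declare a dummy variable just to name a type.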
stillwater-sc · Jul 30, 2022 · 584e380
1 parent f2b3345
commit 584e380
Showing 132 changed files with 15,110 additions and 2,115 deletions.
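The commit message describes a bit_cast polyfill that falls back to a non-constexpr implementation when no compiler builtin is available. The following is a minimal sketch of such a fallback, assuming a memcpy-based copy of the object representation. The name `bit_cast_sketch` and the helper `float_bits` are hypothetical and are not the library's `sw::bit_cast`.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical fallback: copy the bytes of `from` into a fresh To.
// Unlike *(uint32_t*)&f, this does not violate strict aliasing.
template <typename To, typename From>
To bit_cast_sketch(const From& from) noexcept {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
    To to;  // assumes To is default constructible
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// The type-punning example from the commit message, rewritten with the sketch.
inline std::uint32_t float_bits(float f) noexcept {
    return bit_cast_sketch<std::uint32_t>(f);
}
```

On a platform where `float` is an IEEE-754 binary32, `float_bits(1.0f)` yields `0x3F800000`. Because memcpy is not usable in constant expressions, this form cannot be constexpr, which is why the message distinguishes it from the builtin-backed variant.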
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -33,7 +33,7 @@ if(NOT DEFINED UNIVERSAL_VERSION_MAJOR)
   set(UNIVERSAL_VERSION_MAJOR 3)
 endif()
 if(NOT DEFINED UNIVERSAL_VERSION_MINOR)
-  set(UNIVERSAL_VERSION_MINOR 58)
+  set(UNIVERSAL_VERSION_MINOR 59)
 endif()
 if(NOT DEFINED UNIVERSAL_VERSION_PATCH)
   set(UNIVERSAL_VERSION_PATCH 1)

diff --git a/README.md b/README.md
@@ -55,6 +55,15 @@ The library contains integers, decimals, fixed-points, rationals, linear floats,
 Please cite [our work](https://arxiv.org/abs/2012.11011) if you use _Universal_.
 
 ```bib
+@inproceedings{Omtzigt:2022,
+  title={Universal: Reliable, Reproducible, and Energy-Efficient Numerics},
+  author={E. Theodore L. Omtzigt and James Quinlan},
+  booktitle={Conference on Next Generation Arithmetic},
+  pages={100--116},
+  year={2022},
+  organization={Springer}
+}
+
 @article{Omtzigt2020,
     author    = {E. Theodore L. Omtzigt and Peter Gottschling and Mark Seligman and William Zorn},
     title     = {{Universal Numbers Library}: design and implementation of a high-performance reproducible number systems library},
@@ -78,14 +87,14 @@ CTestTestfile.cmake  c_api         education              tools       universal-
 
 ## How to build
 
-If you do want to work with the code, the universal numbers software library is built using cmake version v3.18. 
+If you do want to work with the code, the universal numbers software library is built using cmake version v3.23. 
 Install the latest [cmake](https://cmake.org/download).
 There are interactive installers for MacOS and Windows. 
 For Linux, a portable approach downloads the shell archive and installs it at /usr/local:
 
 ```text
-> wget https://github.com/Kitware/CMake/releases/download/v3.18.2/cmake-3.18.2-Linux-x86_64.sh 
-> sudo sh cmake-3.18.2-Linux-x86_64.sh --prefix=/usr/local --exclude-subdir
+> wget https://github.com/Kitware/CMake/releases/download/v3.23.1/cmake-3.23.1-Linux-x86_64.sh 
+> sudo sh cmake-3.23.1-Linux-x86_64.sh --prefix=/usr/local --exclude-subdir
 ```
 
 For Ubuntu, snap will install the latest cmake, and would be the preferred method:

diff --git a/applications/adaptive/double-double.cpp b/applications/adaptive/double-double.cpp
@@ -45,7 +45,7 @@ try {
 	std::streamsize precision = std::cout.precision();
 
 	{
-		using lns = lns<16, 10, std::uint16_t>;
+		using lns = lns<16, 10, Saturating, std::uint16_t>;
 
 		lns a, b, c;
 		a = 0.5;
@@ -57,6 +57,7 @@ try {
 	std::cout << std::setprecision(precision);
 	std::cout << std::endl;
 
+	ReportTestSuiteResults(test_suite, nrOfFailedTestCases);
 	return EXIT_SUCCESS;
 }
 catch (char const* msg) {
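The hunk above adds an arithmetic behavior argument (`Saturating`) to the lns template. As an analogy only, the sketch below contrasts wrapping (modular) and clamping (saturating) addition on 8-bit integers. `OverflowPolicy` and `add8` are hypothetical names; this is not how lns implements either behavior on its logarithmic encoding.

```cpp
#include <cstdint>

// Hypothetical policy tag, standing in for a behavior template argument.
enum class OverflowPolicy { Wrap, Clamp };

template <OverflowPolicy P>
std::int8_t add8(std::int8_t a, std::int8_t b) noexcept {
    const int wide = int(a) + int(b);  // exact sum, cannot overflow int
    if (P == OverflowPolicy::Clamp) {
        // Saturating: pin out-of-range results to the nearest bound.
        if (wide > 127) return 127;
        if (wide < -128) return -128;
        return static_cast<std::int8_t>(wide);
    }
    // Modular: reduce modulo 2^8, then map back to the signed range.
    const unsigned u = static_cast<std::uint8_t>(wide);
    return static_cast<std::int8_t>(u < 128u ? int(u) : int(u) - 256);
}
```

With these definitions, `add8<OverflowPolicy::Clamp>(100, 100)` gives 127 while `add8<OverflowPolicy::Wrap>(100, 100)` gives -56, which is the kind of difference a behavior parameter lets the caller choose at compile time.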

diff --git a/applications/chebyshev/chebpoly.hpp b/applications/chebyshev/chebpoly.hpp
@@ -23,7 +23,7 @@ namespace chebyshev {
 
 		blas::vector<Scalar>Tn(n+1);
 		if (n==0) Tn(0) = 1;
-        if (n==1) Tn(0) = 0;Tn(1) = 1;
+        if (n==1) { Tn(0) = 0; Tn(1) = 1; }
         if (n>1){
             blas::vector<Scalar> T0(n+1);
             blas::vector<Scalar> T1(n+1);
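The chebpoly.hpp change above fixes a brace-less `if`: in the removed line only `Tn(0) = 0` was guarded, so `Tn(1) = 1` ran for every `n`. The sketch below reproduces both forms under stated assumptions: `seed_old` and `seed_new` are hypothetical helpers, `std::vector<double>` replaces `blas::vector<Scalar>`, and the vector gets `n + 2` slots filled with -1 so the stray write stays in bounds and is visible.

```cpp
#include <cstddef>
#include <vector>

// Behavior of the removed line: the second assignment is not part of the if.
std::vector<double> seed_old(std::size_t n) {
    std::vector<double> Tn(n + 2, -1.0);
    if (n == 0) Tn[0] = 1;
    if (n == 1) Tn[0] = 0;
    Tn[1] = 1;  // executes for every n
    return Tn;
}

// Behavior of the added line: both assignments are guarded by n == 1.
std::vector<double> seed_new(std::size_t n) {
    std::vector<double> Tn(n + 2, -1.0);
    if (n == 0) Tn[0] = 1;
    if (n == 1) { Tn[0] = 0; Tn[1] = 1; }
    return Tn;
}
```

For `n == 3` the old form overwrites slot 1 while the new form leaves it untouched. In the original code, where the vector has only `n + 1` elements, the old form with `n == 0` would also have written past the end.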

diff --git a/applications/chebyshev/chebtests.cpp b/applications/chebyshev/chebtests.cpp
@@ -59,7 +59,7 @@ try {
 		std::cout << "\nUsing POSIT<" << nbits << "," <<  es << ">\n\n";
 	#else	  
 		using Scalar = double;
-		std::cout << "\nUsing DOUBLE " << "\n\n";
+		std::cout << "\nUsing DOUBLE " << Scalar(0.0) << "\n\n";
 	#endif
 
 	// TESTS

diff --git a/applications/numeric/limits.cpp b/applications/numeric/limits.cpp
@@ -28,7 +28,7 @@ try {
 	using fixpnt32 = fixpnt<32, 16, Modulo, std::uint32_t>;
 	using posit32  = posit<32, 2>;
 	using areal32  = areal<32, 8, std::uint32_t>;
-	using lns32    = lns<32, 8, std::uint32_t>;
+	using lns32    = lns<32, 8, Saturating, std::uint32_t>;
 
 	// report on precision and dynamic range of the number system
 

diff --git a/applications/numeric/numbers.cpp b/applications/numeric/numbers.cpp
@@ -129,7 +129,7 @@ try {
 	using cfloat32 = cfloat<32, 8, std::uint32_t, true, true, false>;
 	using posit32 = posit<32, 2>;
 	using areal32 = areal<32, 8, std::uint32_t>;
-	using lns32 = lns<32, 8, std::uint32_t>;
+	using lns32 = lns<32, 8, Saturating, std::uint32_t>;
 
 	// report on precision and dynamic range of the number system
 

diff --git a/applications/stream/stream.cpp b/applications/stream/stream.cpp
@@ -210,7 +210,7 @@ try {
 	Sweep< float >(startSample, endSample);
 	Sweep< double >(startSample, endSample);
 	Sweep< fixpnt<8, 4, Modulo, std::uint8_t> >(startSample, endSample);
-	Sweep< fixpnt<8, 4, Saturating, std::uint8_t> >(startSample, endSample);
+	Sweep< fixpnt<8, 4, Saturate, std::uint8_t> >(startSample, endSample);
 	Sweep< cfloat<32, 8, std::uint32_t, true, false, false> >(startSample, endSample);
 #endif
 

diff --git a/benchmark/performance/arithmetic/fixpnt/performance.cpp b/benchmark/performance/arithmetic/fixpnt/performance.cpp
@@ -1,6 +1,6 @@
-//  performance.cpp : performance benchmarking for abitrary precision fixpnts
+//  performance.cpp : performance benchmarking for fixed-sized, abitrary precision fixpnts
 //
-// Copyright (C) 2017-2021 Stillwater Supercomputing, Inc.
+// Copyright (C) 2017-2022 Stillwater Supercomputing, Inc.
 //
 // This file is part of the universal numbers project, which is released under an MIT Open Source license.
 #include <iostream>
@@ -28,52 +28,52 @@ void TestShiftOperatorPerformance() {
 
 	constexpr uint64_t NR_OPS = 1000000;
 
-	PerformanceRunner("fixpnt<8,4,    Saturating, uint8_t>  shifts         ", ShiftPerformanceWorkload< sw::universal::fixpnt<8, 4, Saturating, uint8_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<16,8,   Saturating, uint16_t> shifts         ", ShiftPerformanceWorkload< sw::universal::fixpnt<16, 8, Saturating, uint16_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<32,16,  Saturating, uint32_t> shifts         ", ShiftPerformanceWorkload< sw::universal::fixpnt<32, 16, Saturating, uint32_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<64,32,  Saturating, uint32_t> shifts         ", ShiftPerformanceWorkload< sw::universal::fixpnt<64, 32, Saturating, uint32_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<128,32, Saturating, uint32_t> shifts         ", ShiftPerformanceWorkload< sw::universal::fixpnt<128, 32, Saturating, uint32_t> >, NR_OPS / 2);
-	PerformanceRunner("fixpnt<256,32, Saturating, uint32_t> shifts         ", ShiftPerformanceWorkload< sw::universal::fixpnt<256, 32, Saturating, uint32_t> >, NR_OPS / 4);
-	PerformanceRunner("fixpnt<512,32, Saturating, uint32_t> shifts         ", ShiftPerformanceWorkload< sw::universal::fixpnt<512, 32, Saturating, uint32_t> >, NR_OPS / 8);
-	PerformanceRunner("fixpnt<1024,32,Saturating, uint32_t> shifts         ", ShiftPerformanceWorkload< sw::universal::fixpnt<1024, 32, Saturating, uint32_t> >, NR_OPS / 16);
+	PerformanceRunner("fixpnt<   8,  4, Saturate, uint8_t>  shifts         ", ShiftPerformanceWorkload< sw::universal::fixpnt<   8,  4, Saturate, uint8_t> >, NR_OPS);
+	PerformanceRunner("fixpnt<  16,  8, Saturate, uint16_t> shifts         ", ShiftPerformanceWorkload< sw::universal::fixpnt<  16,  8, Saturate, uint16_t> >, NR_OPS);
+	PerformanceRunner("fixpnt<  32, 16, Saturate, uint32_t> shifts         ", ShiftPerformanceWorkload< sw::universal::fixpnt<  32, 16, Saturate, uint32_t> >, NR_OPS);
+	PerformanceRunner("fixpnt<  64, 32, Saturate, uint32_t> shifts         ", ShiftPerformanceWorkload< sw::universal::fixpnt<  64, 32, Saturate, uint32_t> >, NR_OPS);
+	PerformanceRunner("fixpnt< 128, 32, Saturate, uint32_t> shifts         ", ShiftPerformanceWorkload< sw::universal::fixpnt< 128, 32, Saturate, uint32_t> >, NR_OPS / 2);
+	PerformanceRunner("fixpnt< 256, 32, Saturate, uint32_t> shifts         ", ShiftPerformanceWorkload< sw::universal::fixpnt< 256, 32, Saturate, uint32_t> >, NR_OPS / 4);
+	PerformanceRunner("fixpnt< 512, 32, Saturate, uint32_t> shifts         ", ShiftPerformanceWorkload< sw::universal::fixpnt< 512, 32, Saturate, uint32_t> >, NR_OPS / 8);
+	PerformanceRunner("fixpnt<1024, 32, Saturate, uint32_t> shifts         ", ShiftPerformanceWorkload< sw::universal::fixpnt<1024, 32, Saturate, uint32_t> >, NR_OPS / 16);
 }
 
 
 // measure performance of arithmetic operations
 void TestArithmeticOperatorPerformance() {
 	using namespace sw::universal;
-	std::cout << "\nFIXPNT Fixed-Point Saturating Arithmetic operator performance\n";
+	std::cout << "\nFIXPNT Fixed-Point Saturate Arithmetic operator performance\n";
 
 	uint64_t NR_OPS = 1000000;
-	PerformanceRunner("fixpnt<8,4, Saturating,uint8_t>      add/subtract    ", AdditionSubtractionWorkload< sw::universal::fixpnt<8,4, Saturating, uint8_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<16,8, Saturating,uint16_t>    add/subtract    ", AdditionSubtractionWorkload< sw::universal::fixpnt<16,8, Saturating, uint16_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<32,16, Saturating,uint32_t>   add/subtract    ", AdditionSubtractionWorkload< sw::universal::fixpnt<32,16, Saturating, uint32_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<64,32, Saturating,uint32_t>   add/subtract    ", AdditionSubtractionWorkload< sw::universal::fixpnt<64,32, Saturating, uint32_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<128,32, Saturating,uint32_t>  add/subtract    ", AdditionSubtractionWorkload< sw::universal::fixpnt<128,32, Saturating, uint32_t> >, NR_OPS / 2);
+	PerformanceRunner("fixpnt<  8,  4, Saturate, uint8_t >  add/subtract    ", AdditionSubtractionWorkload< sw::universal::fixpnt<  8,  4, Saturate, uint8_t> >, NR_OPS);
+	PerformanceRunner("fixpnt< 16,  8, Saturate, uint16_t>  add/subtract    ", AdditionSubtractionWorkload< sw::universal::fixpnt< 16,  8, Saturate, uint16_t> >, NR_OPS);
+	PerformanceRunner("fixpnt< 32, 16, Saturate, uint32_t>  add/subtract    ", AdditionSubtractionWorkload< sw::universal::fixpnt< 32, 16, Saturate, uint32_t> >, NR_OPS);
+	PerformanceRunner("fixpnt< 64, 32, Saturate, uint32_t>  add/subtract    ", AdditionSubtractionWorkload< sw::universal::fixpnt< 64, 32, Saturate, uint32_t> >, NR_OPS);
+	PerformanceRunner("fixpnt<128, 32, Saturate, uint32_t>  add/subtract    ", AdditionSubtractionWorkload< sw::universal::fixpnt<128, 32, Saturate, uint32_t> >, NR_OPS / 2);
 
 #if 0
 	NR_OPS = 1024 * 32;
-	PerformanceRunner("fixpnt<8,4,    Saturating,uint8_t>   division        ", DivisionWorkload< sw::universal::fixpnt<8,4, Saturating, uint8_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<16,8,   Saturating,uint16_t>  division        ", DivisionWorkload< sw::universal::fixpnt<16,8, Saturating, uint16_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<32,16,  Saturating,uint32_t>  division        ", DivisionWorkload< sw::universal::fixpnt<32,16, Saturating, uint32_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<64,32,  Saturating,uint32_t>  division        ", DivisionWorkload< sw::universal::fixpnt<64,32, Saturating, uint32_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<128,32, Saturating,uint32_t>  division        ", DivisionWorkload< sw::universal::fixpnt<128,32, Saturating, uint32_t> >, NR_OPS / 2);
+	PerformanceRunner("fixpnt<  8,  4, Saturate,uint8_t >  division        ", DivisionWorkload< sw::universal::fixpnt<  8,  4, Saturate, uint8_t> >, NR_OPS);
+	PerformanceRunner("fixpnt< 16,  8, Saturate,uint16_t>  division        ", DivisionWorkload< sw::universal::fixpnt< 16,  8, Saturate, uint16_t> >, NR_OPS);
+	PerformanceRunner("fixpnt< 32, 16, Saturate,uint32_t>  division        ", DivisionWorkload< sw::universal::fixpnt< 32, 16, Saturate, uint32_t> >, NR_OPS);
+	PerformanceRunner("fixpnt< 64, 32, Saturate,uint32_t>  division        ", DivisionWorkload< sw::universal::fixpnt< 64, 32, Saturate, uint32_t> >, NR_OPS);
+	PerformanceRunner("fixpnt<128, 32, Saturate,uint32_t>  division        ", DivisionWorkload< sw::universal::fixpnt<128, 32, Saturate, uint32_t> >, NR_OPS / 2);
 
 	NR_OPS = 1024 * 32;
-	PerformanceRunner("fixpnt<8,4,    Saturating,uint8_t>   remainder       ", RemainderWorkload< sw::universal::fixpnt<8,4, Saturating, uint8_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<16,8,   Saturating,uint16_t>  remainder       ", RemainderWorkload< sw::universal::fixpnt<16,8, Saturating, uint16_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<32,16,  Saturating,uint32_t>  remainder       ", RemainderWorkload< sw::universal::fixpnt<32,16, Saturating, uint32_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<64,32,  Saturating,uint32_t>  remainder       ", RemainderWorkload< sw::universal::fixpnt<64,32, Saturating, uint32_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<128,32, Saturating,uint32_t>  remainder       ", RemainderWorkload< sw::universal::fixpnt<128,32, Saturating, uint32_t> >, NR_OPS / 2);
+	PerformanceRunner("fixpnt<  8,  4, Saturate,uint8_t >  remainder       ", RemainderWorkload< sw::universal::fixpnt<  8,  4, Saturate, uint8_t> >, NR_OPS);
+	PerformanceRunner("fixpnt< 16,  8, Saturate,uint16_t>  remainder       ", RemainderWorkload< sw::universal::fixpnt< 16,  8, Saturate, uint16_t> >, NR_OPS);
+	PerformanceRunner("fixpnt< 32, 16, Saturate,uint32_t>  remainder       ", RemainderWorkload< sw::universal::fixpnt< 32, 16, Saturate, uint32_t> >, NR_OPS);
+	PerformanceRunner("fixpnt< 64, 32, Saturate,uint32_t>  remainder       ", RemainderWorkload< sw::universal::fixpnt< 64, 32, Saturate, uint32_t> >, NR_OPS);
+	PerformanceRunner("fixpnt<128, 32, Saturate,uint32_t>  remainder       ", RemainderWorkload< sw::universal::fixpnt<128, 32, Saturate, uint32_t> >, NR_OPS / 2);
 #endif
 	// multiplication is the slowest operator
 
 	NR_OPS = 1024 * 32;
-	PerformanceRunner("fixpnt<8,4,   Saturating,uint8_t>    multiplication ", MultiplicationWorkload< sw::universal::fixpnt<8,4, Saturating, uint8_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<16,8,  Saturating,uint16_t>   multiplication ", MultiplicationWorkload< sw::universal::fixpnt<16,8, Saturating, uint16_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<32,16  Saturating,,uint32_t>  multiplication ", MultiplicationWorkload< sw::universal::fixpnt<32,16, Saturating, uint32_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<64,32, Saturating,uint32_t>   multiplication ", MultiplicationWorkload< sw::universal::fixpnt<64,32, Saturating, uint32_t> >, NR_OPS);
-	PerformanceRunner("fixpnt<128,32 Saturating,,uint32_t>  multiplication ", MultiplicationWorkload< sw::universal::fixpnt<128,32, Saturating, uint32_t> >, NR_OPS / 2);
+	PerformanceRunner("fixpnt<  8,  4, Saturate, uint8_t >  multiplication ", MultiplicationWorkload< sw::universal::fixpnt<  8,  4, Saturate, uint8_t> >, NR_OPS);
+	PerformanceRunner("fixpnt< 16,  8, Saturate, uint16_t>  multiplication ", MultiplicationWorkload< sw::universal::fixpnt< 16,  8, Saturate, uint16_t> >, NR_OPS);
+	PerformanceRunner("fixpnt< 32, 16  Saturate, uint32_t>  multiplication ", MultiplicationWorkload< sw::universal::fixpnt< 32, 16, Saturate, uint32_t> >, NR_OPS);
+	PerformanceRunner("fixpnt< 64, 32, Saturate, uint32_t>  multiplication ", MultiplicationWorkload< sw::universal::fixpnt< 64, 32, Saturate, uint32_t> >, NR_OPS);
+	PerformanceRunner("fixpnt<128, 32, Saturate, uint32_t>  multiplication ", MultiplicationWorkload< sw::universal::fixpnt<128, 32, Saturate, uint32_t> >, NR_OPS / 2);
 }
 
 // conditional compilation

diff --git a/docker/Dockerfile.gcc10.builder b/docker/Dockerfile.gcc10.builder
@@ -37,5 +37,5 @@ USER stillwater
 WORKDIR /home/stillwater
 
 # add a command that when you run the container without a command, it produces something meaningful
-ENV CONTAINER_ID "Universal Numbers Library Builder V3 GCC 10.3"
+ENV CONTAINER_ID "Universal Numbers Library Builder V3 GCC 10.4"
 CMD ["/usr/bin/env", "bash"]
diff --git a/docker/Dockerfile.gcc11.builder b/docker/Dockerfile.gcc11.builder
@@ -37,5 +37,6 @@ USER stillwater
 WORKDIR /home/stillwater
 
 # add a command that when you run the container without a command, it produces something meaningful
-ENV CONTAINER_ID "Universal Numbers Library Builder V3 GCC 10.3"
+ENV CONTAINER_ID "Universal Numbers Library Builder V3 GCC 11.3"
+
 CMD ["/usr/bin/env", "bash"]
diff --git a/docker/Dockerfile.gcc12.builder b/docker/Dockerfile.gcc12.builder
@@ -37,5 +37,6 @@ USER stillwater
 WORKDIR /home/stillwater
 
 # add a command that when you run the container without a command, it produces something meaningful
-ENV CONTAINER_ID "Universal Numbers Library Builder V3 GCC 10.3"
+ENV CONTAINER_ID "Universal Numbers Library Builder V3 GCC 12.1"
+
 CMD ["/usr/bin/env", "bash"]
diff --git a/docker/Dockerfile.gcc9.builder b/docker/Dockerfile.gcc9.builder
@@ -37,5 +37,5 @@ USER stillwater
 WORKDIR /home/stillwater
 
 # add a command that when you run the container without a command, it produces something meaningful
-ENV CONTAINER_ID "Universal Numbers Library Builder V3 GCC 9.4"
+ENV CONTAINER_ID "Universal Numbers Library Builder V3 GCC 9.5"
 CMD ["/usr/bin/env", "bash"]