This repository shows a non-GPU way to get faster linear algebra calculations. We test some software and hardware optimizations to get speedup.
- C simple
- C unroll
- C - Intel MKL
- Python simple
- Python numpy
- Requires C, Intel MKL library, Python and numpy
- For best performance for C programs compile with -O3 flag for compiler level optimizations
gcc program.c -O3 -o program
- To compile C program with Intel MKL
gcc program.c -O3 -o program -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm
Algorithm | Vector Size | Runtime(sec) | Bandwidth (GB/s) | Throughput |
---|---|---|---|---|
C simple | 1,000,000 | 0.001501 sec | 5.330960 | 1.333 GFLOPS |
C simple | 300,000,000 | 0.457934 sec | 5.240930 | 1.310 GFLOPS |
C unroll | 1,000,000 | 0.000465 sec | 17.202948 | 4.301 GFLOPS |
C unroll | 300,000,000 | 0.208287 sec | 11.522536 | 2.881 GFLOPS |
C - Intel MKL | 1,000,000 | 0.000194 sec | 41.342320 | 10.336 GFLOPS |
C - Intel MKL | 300,000,000 | 0.093496 sec | 25.669479 | 6.417 GFLOPS |
Python simple | 1,000,000 | 0.610351 sec | 0.013107 | 0.00328 GFLOPS |
Python simple | 300,000,000 | 177.1355 sec | 0.013549 | 0.00339 GFLOPS |
Python numpy | 1,000,000 | 0.001509 sec | 5.301609 | 1.3254 GFLOPS |
Python numpy | 300,000,000 | 0.452451 sec | 5.304442 | 1.3261 GFLOPS |