Compiler Flags for Different Architectures

Last updated: 3/29/19

This page lists the compiler flags used to build Spatter and STREAM for comparison on different architectures.

STREAM

Some general notes for STREAM can be found at this blog post:

Additionally:

ICC generally will generate the best quality code for STREAM and Spatter on Intel architectures.
Streaming (non-temporal) loads/stores may be needed to reach "peak" performance.

STREAM flags per architecture

Common flags for Intel compilers with OpenMP backend: -Ofast -qopenmp -qopenmp-link=static -fargument-noalias
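As a rough sketch of how the common flags combine with a per-architecture flag, the helper below only prints an icc command line instead of running it. The function name `stream_compile_cmd`, the output name `stream`, and the source file name `stream.c` are assumptions for illustration, not something this page prescribes.

```shell
# Common Intel flags quoted from this page.
COMMON_ICC_FLAGS="-Ofast -qopenmp -qopenmp-link=static -fargument-noalias"

# Print (do not run) an icc command line for STREAM.
# $1: architecture flag, e.g. -march=broadwell
# $2: source file (assumed here to be stream.c)
stream_compile_cmd() {
    printf '%s\n' "icc $COMMON_ICC_FLAGS $1 $2 -o stream"
}

# Example: show the command for Broadwell.
stream_compile_cmd -march=broadwell stream.c
```

Printing the command first makes it easy to review the flag set before copying it into a build script.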

TBD - when do we use -ffreestanding?

Note that in many cases, you can check for vectorized instructions by generating the assembly with the -S flag or by using objdump -d <compiled_app> to look at the assembly code. As mentioned in this StackOverflow post, you want to look for instructions with names like vgatherpf0qpd.
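The check described above can be scripted. The following is a minimal sketch, assuming the disassembly has already been saved to a text file (for example with objdump -d <compiled_app> > app.dis); the function name `count_gathers` and the file name are hypothetical.

```shell
# Count lines of a saved disassembly that contain an x86 gather
# instruction (mnemonics starting with vgather or vpgather).
# $1: path to a text file holding objdump -d (or -S) output
count_gathers() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c -E 'vp?gather' "$1" || true
}

# Usage (hypothetical file name):
#   objdump -d ./compiled_app > app.dis
#   count_gathers app.dis
```

A count of zero does not prove the loop is scalar; it only means no gather mnemonics appear in the listing.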

Architecture	Short Name	Compiler	Flags	Notes
Sandy Bridge	SNB	icc	-march=sandybridge
Broadwell	BDW	icc	-march=broadwell
Skylake	SKL	icc	-march=skylake
Skylake with AVX512	SKL	icc	-march=skylake-avx512
		cce	-hvector2 or -hvector3	moderate or aggressive vectorization
			-hvector1 or -hscalar1/2/3	limited automatic vectorization
Knights Landing with AVX512 and MCDRAM	KNL	icpc	-xCOMMON-AVX512	Compilation notes
Power9	PWR9	codexl		Use `xlc_r` to create a thread-safe version of Spatter
			-qtune=pwr9	Tune for Power9 arch (auto tunes for arch where compiled)
			-qsimd=auto	Implied for -O3 or higher opt level
			-qenablevmx	Enable vector generation
			-qhot=vector
ARM TX2	TX2	armclang	-O3 -mcpu=native	Let compiler decide based on host
			-O3 -mcpu=thunderx2t99
		gcc	-ftree-vectorize
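One way to use the table from a build script is a small lookup keyed on the short name. This is only a sketch: the function name `arch_flags` is hypothetical, the key `SKL-AVX512` is invented here because the table uses SKL for both Skylake rows, and the Power9 row lists its flags separately, so combining them on one line is an assumption.

```shell
# Print the architecture-specific flags from the table for a short name.
# Returns non-zero for names that are not in the table.
arch_flags() {
    case "$1" in
        SNB)        echo "-march=sandybridge" ;;
        BDW)        echo "-march=broadwell" ;;
        SKL)        echo "-march=skylake" ;;
        SKL-AVX512) echo "-march=skylake-avx512" ;;   # key invented for this sketch
        KNL)        echo "-xCOMMON-AVX512" ;;
        PWR9)       echo "-qtune=pwr9 -qsimd=auto -qenablevmx -qhot=vector" ;;
        TX2)        echo "-O3 -mcpu=thunderx2t99" ;;
        *)          echo "unknown architecture: $1" >&2; return 1 ;;
    esac
}

# Usage: flags=$(arch_flags BDW) || exit 1
```

Failing on unknown names keeps a typo in the short name from silently producing a build with no architecture flag.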

Intel-specific flags

To use HBM on KNL:

#Check mem settings
numactl -H 
#Run on NUMA mem region 1 (HBM)
numactl --membind 1 ./run-app
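The node number is not always 1. A possible way to find it, assuming the HBM shows up in numactl -H as a NUMA node with memory but no CPUs (as is typically the case when MCDRAM is exposed as separate memory), is to look for nodes whose "cpus:" line is empty. The function name `cpuless_nodes` is hypothetical.

```shell
# Read `numactl -H` output on stdin and print the IDs of NUMA nodes
# that have no CPUs attached (one ID per line).
cpuless_nodes() {
    # A line such as "node 1 cpus:" has exactly 3 fields when the CPU list is empty.
    awk '/^node [0-9]+ cpus:/ && NF == 3 { print $2 }'
}

# Usage:
#   numactl -H | cpuless_nodes
```

If this prints a single ID, that ID can be passed to numactl --membind in place of the hard-coded 1 when launching ./run-app.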

How do we know if code has been vectorized with a specific compiler?

ICC

Returns info on which loops were vectorized and why: -qopt-report=1 -qopt-report-phase=vec

Returns info on loops that were not vectorized and why: -qopt-report-phase=vec,loop -qopt-report=2

On older ICC versions you can also use the following high-level flag: -vec-report=3. Newer versions deprecate it in favor of the -qopt-report options above.
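Once a report exists, a quick count of vectorized loops can be pulled out with grep. This sketch assumes the report marks each vectorized loop with the text "LOOP WAS VECTORIZED" and that the report was written to a file; check both against the output of your ICC version. The function name `count_vectorized_loops` is hypothetical.

```shell
# Count loops reported as vectorized in an ICC optimization report.
# $1: path to the report file
# Assumption: vectorized loops are marked with "LOOP WAS VECTORIZED".
count_vectorized_loops() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c 'LOOP WAS VECTORIZED' "$1" || true
}

# Usage (hypothetical file name):
#   count_vectorized_loops stream.optrpt
```

The match is case-sensitive on purpose, so lowercase "loop was not vectorized" remarks are not counted.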

CodeXL

The -qreport and -qlist flags can be used to generate high-order transformation (HOT) reports or to print an object listing of the code, respectively.

Armclang

With the Arm HPC compiler, the vectorization report can be printed with: -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -Rpass-missed=loop-vectorize

As an example:

armclang -O3 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -Rpass-missed=loop-vectorize example.c -gline-tables-only 2> vecreport.txt

Alternatively, you can use armllvm-objdump with the correct disassemble flags, or you can use the -S flag to generate the assembly code during compilation.

#From the basic SVE example; ld1w and st1w are SVE instructions
$ armllvm-objdump -disassemble -mattr=+sve example &> example.dis
#Sample output from example.dis - ld1w and st1w are both SVE instructions
400898: a0 42 48 a5 ld1w { z0.s }, p0/z, [x21, x8, lsl #2]
40089c: c1 42 48 a5 ld1w { z1.s }, p0/z, [x22, x8, lsl #2]
4008a0: 00 04 a1 04 sub z0.s, z0.s, z1.s
4008a4: e0 42 48 e5 st1w { z0.s }, p0, [x23, x8, lsl #2]

#Option 2 - generate assembly during compilation
$ armclang -O3 -S --target=aarch64-arm-none-eabi -march=armv8-a+sve -o example.s example.c
#Sample output from example.s
.LBB1_3: // =>This Inner Loop Header: Depth=1
ld1w { z0.s }, p0/z, [x21, x8, lsl #2]
ld1w { z1.s }, p0/z, [x22, x8, lsl #2]
sub z0.s, z0.s, z1.s
st1w { z0.s }, p0, [x23, x8, lsl #2]

Cray Compiler - CCE

GNU

Use -fopt-info-missed-vec to print missed vectorizations to stderr, or -fopt-info-vec-missed=vec.miss to write them to a file.

Clang

-Rpass-analysis=loop-vectorize -Rpass=loop-vectorize -Rpass-missed=loop-vectorize
