I am having trouble understanding the output of ICC (18.0.1.163) when generating vectorized functions with pragma omp declare simd. Consider the following simple code for a vectorized pow10 function:
#include <math.h>

#pragma omp declare simd simdlen(4)
double pow10v(double x)
{
  return exp(2.3025850929940459*x);
}
I compile this for an AVX2 capable CPU:
icc -std=c++11 -qopenmp -xCORE-AVX2 -O3 -qopt-report-phase=vec -qopt-report=5 -c micro.c -o micro.o
The compiler generates two vectorized functions (masked and unmasked). The vectorization report for the unmasked version says that XMM registers are used, which I confirmed by looking at the assembly code:
Begin optimization report for: pow10v..xN4v(double)

    Report from: Vector optimizations [vec]

remark #15347: FUNCTION WAS VECTORIZED with xmm, simdlen=4, unmasked, formal parameter types: (vector)
remark #15305: vectorization support: vector length 4
remark #15475: --- begin vector cost summary ---
remark #15482: vectorized math library calls: 1
remark #15488: --- end vector cost summary ---
===========================================================================
_ZGVxN4v_pow10v:
# parameter 1: %xmm0
# parameter 2: %xmm1
[...]
        vinsertf128 $1, %xmm1, %ymm0, %ymm2                     #5.1
        vmulpd    .L_2il0floatpacket.0(%rip), %ymm2, %ymm0      #6.33
        call      *__svml_exp4_l9@GOTPCREL(%rip)                #6.10
        # LOE rbx r12 r13 r14 r15 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15 ymm0
# Execution count [1.00e+00]
        vextractf128 $1, %ymm0, %xmm1                           #6.10
        vzeroupper                                              #6.10
[...]
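So it seems that the argument is passed to __svml_exp4_l9 in a single YMM register, but the generated function itself receives its four doubles split across two XMM registers (%xmm0 and %xmm1), reassembles them into a YMM register with vinsertf128, and splits the result back into two XMM registers with vextractf128.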
Looking at the Vector ABI specification, the x in _ZGVxN4v_pow10v denotes an SSE (XMM) function. First, that label is not entirely accurate, since the function body uses AVX instructions and calls an AVX-enabled exp implementation. Second, why does ICC not generate the AVX version (which, IMO, is what I requested with -xCORE-AVX2) in the first place?
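For reference, this is how I am reading the mangled name. The sketch below is only my interpretation of the `_ZGV<isa><mask><vlen><params>_<name>` scheme from Intel's Vector ABI document; `isa_name` and `describe_vector_name` are hypothetical helpers I wrote for illustration (not part of ICC or any library), the ISA letter table is my assumption from that document, and the parsing only handles simple names like the one above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map the <isa> letter to the register class,
   as I understand Intel's Vector ABI (assumption, please correct me). */
static const char *isa_name(char c)
{
    switch (c) {
    case 'x': return "xmm (SSE2)";
    case 'y': return "ymm (AVX)";
    case 'Y': return "ymm (AVX2)";
    case 'z': return "zmm (MIC)";
    case 'Z': return "zmm (AVX512)";
    default:  return "unknown";
    }
}

/* Hypothetical helper: split "_ZGV<isa><mask><vlen><params>_<name>" into
   its parts and write a one-line description to out.
   Returns 0 on success, -1 if the symbol does not match the pattern. */
static int describe_vector_name(const char *sym, char *out, size_t outlen)
{
    assert(sym != NULL && out != NULL);

    /* Prefix, then one ISA letter and one mask letter must follow. */
    if (strncmp(sym, "_ZGV", 4) != 0 || sym[4] == '\0' || sym[5] == '\0')
        return -1;
    char isa = sym[4];
    char mask = sym[5];
    if (mask != 'M' && mask != 'N')
        return -1;

    /* Vector length: a decimal number. */
    const char *p = sym + 6;
    if (!isdigit((unsigned char)*p))
        return -1;
    int vlen = 0;
    while (isdigit((unsigned char)*p))
        vlen = vlen * 10 + (*p++ - '0');

    /* Parameter kinds run up to the next underscore; the rest is the
       scalar function name. */
    const char *us = strchr(p, '_');
    if (us == NULL || us == p)
        return -1;

    snprintf(out, outlen, "isa=%s, %s, vlen=%d, params=%.*s, scalar=%s",
             isa_name(isa), mask == 'M' ? "masked" : "unmasked",
             vlen, (int)(us - p), p, us + 1);
    return 0;
}
```

Calling `describe_vector_name("_ZGVxN4v_pow10v", buf, sizeof buf)` classifies the symbol as an unmasked XMM variant with vector length 4 and one vector parameter, which matches the report's "with xmm, simdlen=4, unmasked" but not the YMM code in the function body.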
Can somebody give me a hint as to what I am doing wrong?
Thanks a lot!