I am having trouble understanding the output of ICC (18.0.1.163) when it generates vectorized functions for #pragma omp declare simd. Consider the following simple code for a vectorized pow10 function:
#include <math.h>
#pragma omp declare simd simdlen(4)
double pow10v(double x)
{
    return exp(2.3025850929940459*x);
}

I compile this for an AVX2-capable CPU:
icc -std=c++11 -qopenmp -xCORE-AVX2 -O3 -qopt-report-phase=vec -qopt-report=5 -c micro.c -o micro.o
The compiler generates two vectorized variants (masked and unmasked). The vectorization report for the unmasked variant says that XMM registers are used, which I confirmed by looking at the assembly code:
Begin optimization report for: pow10v..xN4v(double)
Report from: Vector optimizations [vec]
remark #15347: FUNCTION WAS VECTORIZED with xmm, simdlen=4, unmasked, formal parameter types: (vector)
remark #15305: vectorization support: vector length 4
remark #15475: --- begin vector cost summary ---
remark #15482: vectorized math library calls: 1
remark #15488: --- end vector cost summary ---
===========================================================================
_ZGVxN4v_pow10v:
# parameter 1: %xmm0
# parameter 2: %xmm1
[...]
vinsertf128 $1, %xmm1, %ymm0, %ymm2 #5.1
vmulpd .L_2il0floatpacket.0(%rip), %ymm2, %ymm0 #6.33
call *__svml_exp4_l9@GOTPCREL(%rip) #6.10
# LOE rbx r12 r13 r14 r15 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15 ymm0
# Execution count [1.00e+00]
vextractf128 $1, %ymm0, %xmm1 #6.10
vzeroupper #6.10
[...]

So it seems that the argument is passed to __svml_exp4_l9 in a YMM (AVX) register, but the generated function itself receives its parameters in two XMM (SSE) registers and then reassembles them into a single YMM register.
According to the Vector ABI specification, the name _ZGVxN4v_pow10v denotes an SSE function. First, that label is not entirely accurate, since the function uses AVX instructions and calls an AVX-enabled exp implementation. Second, why does ICC not generate the AVX version in the first place, which is what I believe I requested with -xCORE-AVX2?
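To make the name structure explicit, here is a minimal sketch that splits a mangled name of the form `_ZGV<isa><mask><vlen><params>_<name>` into its fields. The struct `vec_variant` and the function `parse_vec_name` are hypothetical helpers written for illustration, not part of any ABI header. The sketch assumes simple parameter encodings without embedded underscores, and it returns the ISA letter as-is without interpreting it, since the meaning of each letter depends on the ABI variant in use.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type for the decoded fields. */
struct vec_variant {
    char isa;         /* ISA class letter, e.g. 'x' in _ZGVxN4v_pow10v */
    int masked;       /* 1 for 'M' (masked), 0 for 'N' (unmasked) */
    int vlen;         /* vector length, i.e. the simdlen value */
    char params[16];  /* parameter kind letters, e.g. "v" for vector */
    const char *name; /* points into the input at the scalar name */
};

/* Returns 0 on success, -1 if the name lacks the expected shape. */
int parse_vec_name(const char *s, struct vec_variant *out)
{
    if (strncmp(s, "_ZGV", 4) != 0) return -1;
    s += 4;
    if (*s == '\0') return -1;
    out->isa = *s++;

    if (*s != 'M' && *s != 'N') return -1;
    out->masked = (*s++ == 'M');

    if (!isdigit((unsigned char)*s)) return -1;
    char *end;
    out->vlen = (int)strtol(s, &end, 10);

    /* Everything up to the next underscore is the parameter list. */
    const char *us = strchr(end, '_');
    if (us == NULL || us == end) return -1;
    size_t len = (size_t)(us - end);
    if (len >= sizeof out->params) return -1;
    memcpy(out->params, end, len);
    out->params[len] = '\0';

    out->name = us + 1;
    return 0;
}
```

Applied to `_ZGVxN4v_pow10v`, this yields ISA letter `x`, unmasked, vector length 4, one vector parameter, and the scalar name `pow10v`, which matches the fields shown in the vectorization report.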
Can somebody hint at what I am doing wrong?
Thanks a lot!