Channel: Intel® Software - Intel® C++ Compiler

AVX and omp simd vectorization of functions

I am having trouble understanding the output of ICC (18.0.1.163) when generating vectorized functions with pragma omp declare simd. Consider the following simple code for a vectorized pow10 function:

#include <math.h>
#pragma omp declare simd simdlen(4)
double pow10v(double x)
{
  return exp(2.3025850929940459*x);
}

I compile this for an AVX2-capable CPU:

icc -std=c++11 -qopenmp -xCORE-AVX2 -O3 -qopt-report-phase=vec -qopt-report=5 -c micro.c -o micro.o

The compiler generates two vectorized variants of the function, one masked and one unmasked. The vectorization report for the unmasked variant says that xmm registers are used, and the assembly code confirms this:

Begin optimization report for: pow10v..xN4v(double)

    Report from: Vector optimizations [vec]

remark #15347: FUNCTION WAS VECTORIZED with xmm, simdlen=4, unmasked, formal parameter types: (vector)
remark #15305: vectorization support: vector length 4
remark #15475: --- begin vector cost summary ---
remark #15482: vectorized math library calls: 1
remark #15488: --- end vector cost summary ---
===========================================================================
_ZGVxN4v_pow10v:
# parameter 1: %xmm0
# parameter 2: %xmm1
[...]
        vinsertf128 $1, %xmm1, %ymm0, %ymm2                     #5.1
        vmulpd    .L_2il0floatpacket.0(%rip), %ymm2, %ymm0      #6.33
        call      *__svml_exp4_l9@GOTPCREL(%rip)                #6.10
                                # LOE rbx r12 r13 r14 r15 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15 ymm0
                                # Execution count [1.00e+00]
        vextractf128 $1, %ymm0, %xmm1                           #6.10
        vzeroupper                                              #6.10
[...]

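So it seems that the arguments are passed to __svml_exp4_l9 in a single AVX (ymm) register, while the vector variant of pow10v itself receives them in xmm registers. Four doubles occupy 256 bits, so they arrive split across two 128-bit registers (%xmm0 and %xmm1). The function reassembles them into a ymm register with vinsertf128, and splits the result again with vextractf128 before returning.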

According to the Vector ABI specification, the name _ZGVxN4v_pow10v denotes an SSE (xmm) variant. First, that label is not entirely accurate, since the function body uses AVX instructions and calls an AVX-enabled exp implementation. Second, why does ICC not generate the AVX variant in the first place, which (IMO) is what -xCORE-AVX2 requests?
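To make the naming question concrete, here is a minimal sketch of how such a name can be split into its fields. It assumes the layout _ZGV<isa><mask><vlen><params>_<name>. Only the reading of 'x' as SSE/xmm comes from the output above; the other ISA letters in the table are my recollection of Intel's Vector Function ABI and should be checked against the specification. The names parse_vec_name, isa_register_class, and struct vec_name are hypothetical helpers, not part of any compiler or library.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Fields of a vector-variant name, assumed to follow the layout
 * _ZGV<isa><mask><vlen><params>_<name>. Hypothetical helper type. */
struct vec_name {
    char isa;            /* ISA class letter, e.g. 'x' */
    int masked;          /* 1 for 'M' (masked), 0 for 'N' (unmasked) */
    int vlen;            /* vector length (simdlen) */
    char params[16];     /* parameter kind letters, e.g. "v" for one vector argument */
    const char *scalar;  /* points at the scalar function name inside the input string */
};

/* Returns 0 on success, -1 if the string does not match the assumed layout. */
int parse_vec_name(const char *s, struct vec_name *out)
{
    char *end;
    size_t n = 0;

    if (strncmp(s, "_ZGV", 4) != 0)
        return -1;
    s += 4;
    if (*s == '\0')
        return -1;
    out->isa = *s++;

    if (*s != 'N' && *s != 'M')
        return -1;
    out->masked = (*s++ == 'M');

    if (!isdigit((unsigned char)*s))
        return -1;
    out->vlen = (int)strtol(s, &end, 10);
    s = end;

    /* Everything up to the next underscore describes the parameters. */
    while (*s != '\0' && *s != '_') {
        if (n + 1 >= sizeof out->params)
            return -1;
        out->params[n++] = *s++;
    }
    out->params[n] = '\0';

    if (*s != '_')
        return -1;
    out->scalar = s + 1;
    return 0;
}

/* Maps the ISA letter to a register class. Only 'x' is confirmed by the
 * report above; the remaining entries are assumptions to verify. */
const char *isa_register_class(char isa)
{
    switch (isa) {
    case 'x': return "xmm (SSE2)";
    case 'y': return "ymm (AVX)";
    case 'Y': return "ymm (AVX2)";
    case 'Z': return "zmm (AVX-512)";
    default:  return "unknown";
    }
}
```

Calling parse_vec_name("_ZGVxN4v_pow10v", &v) fills in isa 'x', unmasked, vlen 4, params "v", and the scalar name pow10v, which matches the "xmm, simdlen=4, unmasked, (vector)" line in the report.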

Can somebody give me a hint as to what I am doing wrong?

Thanks a lot!

