AVX and omp simd vectorization of functions

I am having trouble understanding the output of ICC (18.0.1.163) when it generates vectorized functions from #pragma omp declare simd. Consider the following simple vectorized pow10 function:

#include <math.h>
#pragma omp declare simd simdlen(4)
double pow10v(double x)
{
  return exp(2.3025850929940459*x);
}

I compile this for an AVX2-capable CPU:

icc -std=c++11 -qopenmp -xCORE-AVX2 -O3 -qopt-report-phase=vec -qopt-report=5 -c micro.c -o micro.o

The compiler generates two vectorized variants of the function (masked and unmasked). The vectorization report for the unmasked variant states that XMM registers are used, which I confirmed by looking at the assembly code:

Begin optimization report for: pow10v..xN4v(double)

    Report from: Vector optimizations [vec]

remark #15347: FUNCTION WAS VECTORIZED with xmm, simdlen=4, unmasked, formal parameter types: (vector)
remark #15305: vectorization support: vector length 4
remark #15475: --- begin vector cost summary ---
remark #15482: vectorized math library calls: 1
remark #15488: --- end vector cost summary ---
===========================================================================

_ZGVxN4v_pow10v:
# parameter 1: %xmm0
# parameter 2: %xmm1
[...]
        vinsertf128 $1, %xmm1, %ymm0, %ymm2                     #5.1
        vmulpd    .L_2il0floatpacket.0(%rip), %ymm2, %ymm0      #6.33
        call      *__svml_exp4_l9@GOTPCREL(%rip)                #6.10
                                # LOE rbx r12 r13 r14 r15 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15 ymm0
                                # Execution count [1.00e+00]
        vextractf128 $1, %ymm0, %xmm1                           #6.10
        vzeroupper                                              #6.10
[...]

So it seems that the argument is passed to __svml_exp4_l9 in an AVX (YMM) register, but the function itself receives its parameter in two SSE2 (XMM) registers and first has to reassemble them into a YMM register.

Looking at the Vector ABI specification, the name _ZGVxN4v_pow10v denotes an SSE function. First, that label is not entirely accurate, since the function uses AVX instructions and calls an AVX-enabled exp implementation. Second, why does ICC not generate the AVX version (which, in my opinion, is what I requested) in the first place?
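
For reference, this is how I am reading the variant name. The sketch below splits a name of the form _ZGV<isa><mask><vlen><params>_<scalar name> into its fields. The struct and the functions decode_variant, isa_name and print_variant are hypothetical helpers written for this post, and the ISA letter table reflects my understanding of the letters Intel's compilers emit (GCC uses different letters), so treat it as an assumption rather than a reference.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Decoded fields of a vector-variant name such as "_ZGVxN4v_pow10v". */
struct vec_variant {
    char isa;            /* ISA class letter, e.g. 'x' */
    int masked;          /* 1 for 'M' (masked), 0 for 'N' (unmasked) */
    int vlen;            /* number of lanes (simdlen) */
    char params[32];     /* parameter descriptors, e.g. "v" for one vector arg */
    const char *scalar;  /* scalar function name, points into the input string */
};

/* Register class for an ISA letter.
   Assumption: letters as used by Intel's compilers; not exhaustive. */
const char *isa_name(char isa)
{
    switch (isa) {
    case 'x': return "XMM (SSE)";
    case 'y': return "YMM (AVX)";
    case 'Y': return "YMM (AVX2)";
    case 'Z': return "ZMM (AVX-512)";
    default:  return "unknown";
    }
}

/* Split a variant name into its fields. Returns 0 on success, -1 if the
   string does not have the expected shape. */
int decode_variant(const char *mangled, struct vec_variant *out)
{
    assert(mangled != NULL && out != NULL);

    if (strncmp(mangled, "_ZGV", 4) != 0)
        return -1;
    const char *p = mangled + 4;

    if (*p == '\0')
        return -1;
    out->isa = *p++;

    if (*p != 'N' && *p != 'M')
        return -1;
    out->masked = (*p++ == 'M');

    if (!isdigit((unsigned char)*p))
        return -1;
    out->vlen = 0;
    while (isdigit((unsigned char)*p))
        out->vlen = out->vlen * 10 + (*p++ - '0');

    /* Everything up to the next underscore describes the parameters. */
    size_t n = 0;
    while (*p != '\0' && *p != '_' && n + 1 < sizeof out->params)
        out->params[n++] = *p++;
    out->params[n] = '\0';
    if (*p != '_' || n == 0)
        return -1;

    out->scalar = p + 1;
    return 0;
}

/* Print the decoded fields in a readable form. */
void print_variant(const struct vec_variant *v)
{
    printf("scalar=%s isa=%c [%s] %s vlen=%d params=%s\n",
           v->scalar, v->isa, isa_name(v->isa),
           v->masked ? "masked" : "unmasked", v->vlen, v->params);
}
```

Applied to _ZGVxN4v_pow10v, decode_variant gives ISA letter x, unmasked, vector length 4 and a single vector parameter for the scalar function pow10v, which matches the "xmm, simdlen=4, unmasked" line in the vectorization report.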

Can somebody give me a hint as to what I am doing wrong?

Thanks a lot!
