Hi,
I am using the Intel 17.0.4 compiler on an Intel Xeon E5-2697 v4 (Broadwell) processor. I know that this processor supports fused multiply-add (FMA) instructions.
For this line of code:
yy += (A[i] * B[i]);
When I compile the C++ code to assembly, I can see that the compiler generates an FMA instruction:
vfmadd231pd 16(%rdx,%r11,8), %xmm6, %xmm1
However, when I use the intrinsic vYY = _mm256_fmadd_pd(vA, vB, vYY) in the C++ code, the compiler emits only separate vector multiply and add instructions:
vmulpd (%r15,%rsi,8), %ymm4, %ymm5
vaddpd %ymm1, %ymm5, %ymm1
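For anyone who wants to reproduce this, below is a minimal sketch of the kind of loop in question. It is not the original code: the function name dot_fma, the unaligned loads, the final horizontal sum, and the requirement that n be a multiple of 4 are all assumptions made for illustration. It also assumes the compiler defines the __FMA__ and __AVX__ macros when FMA code generation is enabled (GCC and Clang do so with -mfma); otherwise it falls back to the scalar std::fma.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#if defined(__FMA__) && defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper (not from the original code): dot product of A and B.
// Assumption: n is a multiple of 4, so no remainder loop is needed.
double dot_fma(const double* A, const double* B, std::size_t n) {
    assert(n % 4 == 0);
#if defined(__FMA__) && defined(__AVX__)
    __m256d vYY = _mm256_setzero_pd();
    for (std::size_t i = 0; i < n; i += 4) {
        __m256d vA = _mm256_loadu_pd(A + i);  // unaligned loads, to be safe
        __m256d vB = _mm256_loadu_pd(B + i);
        vYY = _mm256_fmadd_pd(vA, vB, vYY);   // vYY = vA * vB + vYY
    }
    // Sum the four lanes of the accumulator.
    double lanes[4];
    _mm256_storeu_pd(lanes, vYY);
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#else
    // Portable fallback when FMA intrinsics are not enabled at compile time.
    double yy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        yy = std::fma(A[i], B[i], yy);        // yy = A[i] * B[i] + yy
    }
    return yy;
#endif
}
```

Compiling a file like this with -S and searching the output for vfmadd versus vmulpd/vaddpd should show which form the compiler chose for the intrinsic.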
Is there any explanation for this?
Thanks,