Hi,
I was surprised by the following behavior of the Intel compiler (17.0.2 20170213 on Linux) when compiling with -xCORE-AVX2. The following code generates FMA instructions:
    double norm(double* x, int n) {
        double ans = 0.0;  /* accumulator for the sum of squares */
        for (int i = 0; i < n; ++i) {
            ans += x[i] * x[i];
        }
        return ans;
    }
but the following code does not
    float norm(float* x, int n) {
        float ans = 0.0f;  /* accumulator for the sum of squares */
        for (int i = 0; i < n; ++i) {
            ans += x[i] * x[i];
        }
        return ans;
    }
Is there a reason for this, or is it a missed optimization by the compiler?
Best regards,
Francois