Hi,
I was surprised by the following behavior of the Intel compiler (17.0.2 20170213 on Linux) when compiling with -xCORE-AVX2. The following code generates FMA instructions:
double norm(double* x, int n) {
    double ans = 0.0;  /* accumulator for the sum of squares */
    for (int i = 0; i < n; ++i) {
        ans += x[i] * x[i];
    }
    return ans;
}
but the following code does not:
float norm(float* x, int n) {
    float ans = 0.0f;  /* accumulator for the sum of squares */
    for (int i = 0; i < n; ++i) {
        ans += x[i] * x[i];
    }
    return ans;
}
Is there a reason for this, or is it a missed optimization from the compiler?
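For anyone who wants to reproduce this, here is a minimal sketch that puts both loops in a single C file. The names norm_double and norm_float are hypothetical, chosen only for this example, because the two functions above share the name norm and C does not allow that in one translation unit.

```c
/* Reproducer sketch: the same sum-of-squares loop in double and in float.
 * norm_double and norm_float are illustrative names, not part of any library. */

/* Double-precision version: the one reported to generate FMA instructions. */
double norm_double(const double* x, int n) {
    double ans = 0.0;
    for (int i = 0; i < n; ++i) {
        ans += x[i] * x[i];
    }
    return ans;
}

/* Single-precision version: the one reported not to generate FMA instructions. */
float norm_float(const float* x, int n) {
    float ans = 0.0f;
    for (int i = 0; i < n; ++i) {
        ans += x[i] * x[i];
    }
    return ans;
}
```

Adding -S to the -xCORE-AVX2 command line should produce an assembly listing, where the two loops can be compared by searching for vfmadd instructions.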
Best regards,
Francois