Hi all,
I have a complex code that calls many Fortran and MKL functions, and I need to optimize it. As a first step, I started with the compiler optimization flag -O3. I compiled the code twice with MKL and the Intel compilers: once with version 14 and once with version 16. I used the verbose flag to get more information and found that each version vectorizes a different set of loops. Although v14 vectorizes more loops, the code compiled with v16 performed better because more of the critical loops were vectorized. I expected v16 to vectorize more loops than v14, or at least every loop that v14 vectorized, but this is not the case. I would like to know why this happens, and whether anyone has seen over-vectorization slow a code down.
Any other hints on this would be greatly appreciated.
Regards,
Hossein