Channel: Intel® Software - Intel® C++ Compiler

AVX best performance min function with unsigned char

Hi everybody and thanks for your help!

I have this piece of code:

unsigned char *A, *B, *C;

// init A, B, C with _mm_malloc, 64 bit aligned

for (j = 0; j < size; j++)
    C[j] = fminf(255, 255 - (A[j] * B[j]));

Since A, B and C are 8-bit data types, with AVX vectorization I should get 16 operations per clock cycle, but fminf works on 32-bit floats, so I only get 8 operations per clock cycle. I see that the Intel intrinsics include a min function for the u8 data type.

I tried to translate the loop into intrinsics, but I cannot find load and multiply functions for the packed u8 data type (epu8).

How can I obtain the maximum performance in this loop?

Thanks

Best regards

Eric
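
One possible direction, offered only as a sketch: there is no packed 8-bit multiply in SSE or AVX, so a common workaround is to widen the bytes to 16 bits, multiply there, and pack back down. The version below uses SSE2 only and assumes the intended result is the saturating one, C[j] = 255 - min(255, A[j]*B[j]), which is an interpretation of the loop above and not something the original code states. The function name mul_sub_sat_u8 is hypothetical. Unaligned loads are used so the sketch does not depend on how the buffers were allocated.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: C[j] = 255 - min(255, A[j] * B[j]) (assumed semantics). */
void mul_sub_sat_u8(const unsigned char *A, const unsigned char *B,
                    unsigned char *C, size_t size)
{
    const __m128i zero = _mm_setzero_si128();
    const __m128i c255 = _mm_set1_epi16(255);
    size_t j = 0;

    /* Process 16 bytes per iteration. */
    for (; j + 16 <= size; j += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(A + j));
        __m128i b = _mm_loadu_si128((const __m128i *)(B + j));

        /* Zero-extend each byte to an unsigned 16-bit lane. */
        __m128i a_lo = _mm_unpacklo_epi8(a, zero);
        __m128i a_hi = _mm_unpackhi_epi8(a, zero);
        __m128i b_lo = _mm_unpacklo_epi8(b, zero);
        __m128i b_hi = _mm_unpackhi_epi8(b, zero);

        /* u8 * u8 is at most 65025, so the low 16 bits hold the full product. */
        __m128i p_lo = _mm_mullo_epi16(a_lo, b_lo);
        __m128i p_hi = _mm_mullo_epi16(a_hi, b_hi);

        /* Unsigned saturating subtract: max(0, 255 - product). */
        __m128i r_lo = _mm_subs_epu16(c255, p_lo);
        __m128i r_hi = _mm_subs_epu16(c255, p_hi);

        /* Results are in [0, 255], so packing back to bytes is lossless. */
        _mm_storeu_si128((__m128i *)(C + j), _mm_packus_epi16(r_lo, r_hi));
    }

    /* Scalar tail for the remaining elements. */
    for (; j < size; j++) {
        unsigned int p = (unsigned int)A[j] * B[j];
        C[j] = (unsigned char)(p > 255 ? 0 : 255 - p);
    }
}
```

With this formulation the saturating subtract already does the clamping, so _mm_min_epu8 is not needed. If the buffers are known to be 16-byte aligned, _mm_load_si128 and _mm_store_si128 could replace the unaligned variants. On AVX2 hardware the same pattern should carry over to the 256-bit _mm256_* counterparts, processing 32 bytes per iteration.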

 
