I'm trying to implement the _mm512_adds_epu8 function for CPUs that don't support it (it requires AVX-512BW), and CI picked up a problem with ICC earlier today. The problem occurs when a portable implementation of _mm_adds_epu8 is called four times in a loop, once per 128-bit chunk, to emulate _mm512_adds_epu8. GCC and clang produce the expected results.
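For context, the pattern looks roughly like the following. This is a simplified sketch, not the actual project code: the types `portable_m128i` and `portable_m512i` and the functions `portable_mm_adds_epu8` and `portable_mm512_adds_epu8` are hypothetical stand-ins, and the 128-bit fallback is assumed to be a plain scalar loop doing per-byte unsigned saturating addition.

```c
#include <stdint.h>

/* Hypothetical portable stand-ins for __m128i and __m512i. */
typedef struct { uint8_t u8[16]; } portable_m128i;
typedef struct { portable_m128i m128i[4]; } portable_m512i;

/* Scalar emulation of _mm_adds_epu8: add each pair of unsigned bytes,
   clamping the result to 255 instead of wrapping around. */
static portable_m128i portable_mm_adds_epu8(portable_m128i a, portable_m128i b) {
  portable_m128i r;
  for (int i = 0; i < 16; i++) {
    unsigned sum = (unsigned)a.u8[i] + (unsigned)b.u8[i];
    r.u8[i] = (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
  }
  return r;
}

/* Emulation of _mm512_adds_epu8: call the 128-bit version once for each
   of the four 128-bit chunks of the 512-bit vector. */
static portable_m512i portable_mm512_adds_epu8(portable_m512i a, portable_m512i b) {
  portable_m512i r;
  for (int i = 0; i < 4; i++) {
    r.m128i[i] = portable_mm_adds_epu8(a.m128i[i], b.m128i[i]);
  }
  return r;
}
```

With inputs `a = {0, 1, ..., 63}` and every byte of `b` equal to 200, bytes 0 through 55 should come out as `index + 200` and bytes 56 through 63 should saturate to 255; a miscompiled loop would show up as wrapped or missing values in one of the four chunks.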
Unfortunately, I've had a hard time putting together a reduced test case; when I tried, I ended up with an internal compiler error instead. AFAICT that reduced test case is valid (GCC and clang accept it), so I'm attaching it too, since it may be related.
The original test is gigantic because it includes tons of unrelated code, but it does compile; it just produces incorrect results.
This is with icc 19.1 20200117 on (and targeting) Linux x86_64 (Fedora 31).