Hello,
I am using icc (ICC) version 16.0.2 (20160204) on a machine with an Intel Skylake CPU. I have found a bug in the way its MPX transformation pass creates bounds for SSE-heavy, heavily optimized code.
Here is a minimal test case that reproduces the problem (adapted from the Vips code in which the bug was first triggered). I saved it as vipstest.c:
#define SCALE (1<<6)

float ar[SCALE + 1][SCALE + 1][4];

void __attribute__ ((noinline)) foo() {
    int x, y;
    for( x = 0; x < SCALE + 1; x++ )
        for( y = 0; y < SCALE + 1; y++ ) {
            double X, Y, Xd, Yd;
            double c1, c2, c3, c4;
            X = (double) x / SCALE;
            Y = (double) y / SCALE;
            Xd = 1.0 - X;
            Yd = 1.0 - Y;
            c1 = Xd * Yd;
            c2 = X * Yd;
            c3 = Xd * Y;
            c4 = X * Y;
            ar[x][y][0] = c1;
            ar[x][y][1] = c2;
            ar[x][y][2] = c3;
            ar[x][y][3] = c4;
        }
}

int main() {
    foo();
    return ar[0][0][0];
}

The program raises #BR (bound range exceeded) exceptions when built with -O2 together with -no-check-pointers-narrowing (on my computer, only this exact combination triggers it):
>>> icc -O2 -ggdb -check-pointers-mpx=rw -no-check-pointers-narrowing -lmpx -lmpxwrappers vipstest.c
>>> ./a.out
Saw a #BR! status 1 at 0x400c26
Saw a #BR! status 1 at 0x400c2e
...

# now with -O1: works correctly
>>> icc -O1 -ggdb -check-pointers-mpx=rw -no-check-pointers-narrowing -lmpx -lmpxwrappers vipstest.c
>>> ./a.out
[ no output ]

# now without -no-check-pointers-narrowing
>>> icc -O2 -ggdb -check-pointers-mpx=rw -lmpx -lmpxwrappers vipstest.c
>>> ./a.out
[ no output ]
The offending asm snippet looks like this:
bndmk  0x13(%rdx),%bnd1       # INCORRECT BOUND: TRIGGERS BR
bndmk  0x1080f(%rdx),%bnd0    # CORRECT BUT UNUSED BOUND
...
bndcl  0x603904(%rdi),%bnd1
bndcl  0x603908(%rdi),%bnd1
bndcl  0x60390c(%rdi),%bnd1
bndcu  0x603917(%rdi),%bnd1   # TRIGGERS BR
bndcu  0x60391b(%rdi),%bnd1
bndcu  0x60391f(%rdi),%bnd1
...
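To make the bound arithmetic concrete, here is a small C model of the BNDMK/BNDCL/BNDCU semantics (a simplified sketch, not Intel's implementation). BNDMK takes its lower bound from the base register and its upper bound from the effective address, so `0x1080f(%rdx)` describes a region of 0x1080f + 1 = 0x10810 = 67600 bytes, which is exactly sizeof(ar) (65 * 65 * 4 floats * 4 bytes), while `0x13(%rdx)` describes only 0x14 = 20 bytes. The model assumes, hypothetically, that %rdx holds the address of ar; the names bnd_t, model_bndmk, model_bndcl, model_bndcu and model_snippet_bounds are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SCALE (1<<6)
/* Same global array as in vipstest.c. */
float ar[SCALE + 1][SCALE + 1][4];

/* Simplified model of an MPX bound register; both limits are inclusive.
 * (The hardware keeps the upper bound in one's-complement form and BNDCU
 * undoes that before comparing, so plain values are used here.) */
typedef struct {
    uintptr_t lb;
    uintptr_t ub;
} bnd_t;

/* bndmk disp(%base),%bndN: lower bound = base register,
 * upper bound = effective address base + disp. */
static bnd_t model_bndmk(uintptr_t base, uintptr_t disp)
{
    bnd_t b = { base, base + disp };
    return b;
}

/* bndcl/bndcu: return 1 if the check passes, 0 where the real
 * instruction would raise #BR. */
static int model_bndcl(uintptr_t addr, bnd_t b) { return addr >= b.lb; }
static int model_bndcu(uintptr_t addr, bnd_t b) { return addr <= b.ub; }

/* Rebuild the two bounds from the snippet, ASSUMING (hypothetically)
 * that %rdx holds the address of ar. */
static void model_snippet_bounds(bnd_t *bnd0, bnd_t *bnd1)
{
    uintptr_t base = (uintptr_t)ar;
    *bnd0 = model_bndmk(base, 0x1080f);
    *bnd1 = model_bndmk(base, 0x13);
    /* 0x1080f + 1 == sizeof ar, so BND0 ends exactly at ar's last byte. */
    assert(bnd0->ub == base + sizeof ar - 1);
}
```

For example, a 4-byte store to ar[0][1][1] ends at byte offset 0x17 from the start of ar: under the assumption above it passes the upper-bound check against BND0, but fails against BND1, whose upper bound sits at offset 0x13.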
Note that when the program is compiled with -O1 (or without -no-check-pointers-narrowing), the generated checks use the correct bound register, BND0. It appears that some autovectorization (SSE) optimization pass clashes with the MPX instrumentation.
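One way to probe the vectorization hypothesis is to rebuild the same loop with vectorization of the inner loop disabled. The sketch below is untested under icc: it reuses foo() unchanged except for an Intel-specific `#pragma novector`, guarded by `__INTEL_COMPILER` so that other compilers skip it. Whether this removes the #BR under -O2 -no-check-pointers-narrowing has not been verified; icc's -no-vec option should be the global equivalent. check_foo_results is a hypothetical helper that only confirms the variant still computes the same weights.

```c
#include <assert.h>

#define SCALE (1<<6)
float ar[SCALE + 1][SCALE + 1][4];

/* Same computation as foo() in vipstest.c; only the pragma is new. */
void __attribute__ ((noinline)) foo(void)
{
    int x, y;
    for (x = 0; x < SCALE + 1; x++) {
#ifdef __INTEL_COMPILER
        /* Intel-specific: ask icc not to vectorize the next loop. */
#pragma novector
#endif
        for (y = 0; y < SCALE + 1; y++) {
            double X = (double) x / SCALE;
            double Y = (double) y / SCALE;
            double Xd = 1.0 - X;
            double Yd = 1.0 - Y;
            ar[x][y][0] = Xd * Yd;  /* c1 */
            ar[x][y][1] = X * Yd;   /* c2 */
            ar[x][y][2] = Xd * Y;   /* c3 */
            ar[x][y][3] = X * Y;    /* c4 */
        }
    }
}

/* Hypothetical sanity check: call after foo(). */
static void check_foo_results(void)
{
    assert(ar[0][0][0] == 1.0f);          /* x = y = 0: c1 = 1 */
    assert(ar[SCALE][SCALE][3] == 1.0f);  /* x = y = SCALE: c4 = 1 */
    assert(ar[SCALE / 2][0][1] == 0.5f);  /* X = 0.5, Y = 0: c2 = 0.5 */
}
```

If the #BR disappears with only this change, that would support the idea that the vectorized version of the inner loop is what picks up the wrong bound register.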