Hello,
I am using icc (ICC) version 16.0.2 (20160204) on an Intel Skylake CPU. I found a bug in the way its MPX transformation pass creates bounds for SSE-heavy (and heavily optimized) code.
Here is a minimal test case that reproduces the problem (adapted from the Vips program, where the bug was triggered):
    #define SCALE (1<<6)

    float ar[SCALE + 1][SCALE + 1][4];

    void __attribute__ ((noinline)) foo() {
        int x, y;

        for( x = 0; x < SCALE + 1; x++ )
            for( y = 0; y < SCALE + 1; y++ ) {
                double X, Y, Xd, Yd;
                double c1, c2, c3, c4;

                X = (double) x / SCALE;
                Y = (double) y / SCALE;
                Xd = 1.0 - X;
                Yd = 1.0 - Y;

                c1 = Xd * Yd;
                c2 = X * Yd;
                c3 = Xd * Y;
                c4 = X * Y;

                ar[x][y][0] = c1;
                ar[x][y][1] = c2;
                ar[x][y][2] = c3;
                ar[x][y][3] = c4;
            }
    }

    int main() {
        foo();
        return ar[0][0][0];
    }
The code raises a #BR exception when built with -O2 and -no-check-pointers-narrowing (only this exact combination fails on my computer):
    >>> icc -O2 -ggdb -check-pointers-mpx=rw -no-check-pointers-narrowing -lmpx -lmpxwrappers vipstest.c
    >>> ./a.out
    Saw a #BR! status 1 at 0x400c26
    Saw a #BR! status 1 at 0x400c2e
    ...

    # now with O1: works correctly
    >>> icc -O1 -ggdb -check-pointers-mpx=rw -no-check-pointers-narrowing -lmpx -lmpxwrappers vipstest.c
    >>> ./a.out
    [ no output ]

    # now without no-check-pointers-narrowing
    >>> icc -O2 -ggdb -check-pointers-mpx=rw -lmpx -lmpxwrappers vipstest.c
    >>> ./a.out
    [ no output ]
The offending asm snippet looks like this:
    bndmk  0x13(%rdx),%bnd1       # INCORRECT BOUND: TRIGGERS BR
    bndmk  0x1080f(%rdx),%bnd0    # CORRECT BUT UNUSED BOUND
    ...
    bndcl  0x603904(%rdi),%bnd1
    bndcl  0x603908(%rdi),%bnd1
    bndcl  0x60390c(%rdi),%bnd1
    bndcu  0x603917(%rdi),%bnd1   # TRIGGERS BR
    bndcu  0x60391b(%rdi),%bnd1
    bndcu  0x60391f(%rdi),%bnd1
    ...
Note that when compiled with -O1 (or without -no-check-pointers-narrowing), the asm uses the correct BND0 register. Clearly, some autovectorization (SSE) optimization pass clashes with the MPX instrumentation.
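As a sanity check on the two bndmk annotations, here is a small sketch of the bounds arithmetic. It only models the documented behaviour that bndmk disp(%base) produces the inclusive range [base, base + disp]; it does not use MPX itself. The names bounds, make_bounds, access_ok and last_byte_of_array are hypothetical helpers introduced for illustration, not part of any compiler or MPX runtime API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCALE (1 << 6)

/* Same shape as the array in the test case above. */
float ar[SCALE + 1][SCALE + 1][4];

/* 65 * 65 * 4 floats * 4 bytes = 0x10810 bytes, so the last valid byte
 * is at offset 0x1080f: exactly the displacement used for %bnd0. */
static_assert(sizeof(ar) - 1 == 0x1080f, "whole-array upper bound offset");

/* Hypothetical model of an MPX bounds register (inclusive range). */
typedef struct {
    uintptr_t lower;
    uintptr_t upper;
} bounds;

/* Models bndmk disp(%base): lower = base, upper = base + disp. */
static bounds make_bounds(uintptr_t base, uintptr_t disp) {
    bounds b = { base, base + disp };
    return b;
}

/* Models a bndcl/bndcu pair: the first and the last byte of the
 * access must both lie inside the bounds. */
static int access_ok(bounds b, uintptr_t addr, size_t size) {
    return addr >= b.lower && addr + size - 1 <= b.upper;
}

/* Offset of the last byte of the whole array from its base. */
static size_t last_byte_of_array(void) {
    return sizeof(ar) - 1;
}
```

With these helpers, make_bounds(base, 0x1080f) accepts a 4-byte store at any element of ar, while make_bounds(base, 0x13) covers only the first 20 bytes and rejects, for example, a store to ar[0][1][1] at offset 20. That is consistent with the 0x13 bound in BND1 being too narrow for the accesses it is checked against.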