I wish I had a small bit of code to demonstrate this, but unfortunately I don't. Wondering if anyone else has run into something similar, though.
I followed the instructions for linking mkl_rt at https://software.intel.com/en-us/articles/build-r-301-with-intel-c-compi.... I’m building and running on a Skylake with AVX-512, on a fresh install of RHEL/CentOS 7.4 with glibc-2.17-196.el7_4.2 and Intel Compiler 18 update 1. The compile finishes without errors and R mostly works, but it deadlocks under certain workloads, typically when a lot of computation has been done and R then forks. A guaranteed way to trigger it is running “make check-all” after compilation. MKL_THREADING_LAYER=intel deadlocks even on some very basic tests (while forking to run system commands). MKL_THREADING_LAYER=tbb gets around it most, but not all, of the time (tests involving the R parallel library fail). MKL_THREADING_LAYER=sequential, or setting OMP_NUM_THREADS=1 in the other two modes, passes all tests.
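For anyone who wants to repeat this comparison unattended, here is a rough Python sketch that runs one command under each of the settings above and reports pass, fail, or probable hang. It is only a sketch under assumptions: the names CONFIGS, run_with_timeout and survey are hypothetical, the timeout is arbitrary, a timeout is merely treated as a likely deadlock, and the environment values are spelled as in the post.

```python
import os
import signal
import subprocess

# Settings taken from the post; the dictionary keys are just report labels.
CONFIGS = {
    "intel": {"MKL_THREADING_LAYER": "intel"},
    "tbb": {"MKL_THREADING_LAYER": "tbb"},
    "sequential": {"MKL_THREADING_LAYER": "sequential"},
    "intel, 1 thread": {"MKL_THREADING_LAYER": "intel", "OMP_NUM_THREADS": "1"},
    "tbb, 1 thread": {"MKL_THREADING_LAYER": "tbb", "OMP_NUM_THREADS": "1"},
}


def run_with_timeout(cmd, extra_env, timeout_s):
    """Run cmd with extra_env added; return 'pass', 'fail', or 'hang'."""
    env = dict(os.environ)
    env.update(extra_env)
    # Start a new session so the whole process tree can be killed on a hang
    # (POSIX only; a spinning parent would otherwise outlive the harness).
    proc = subprocess.Popen(
        cmd,
        env=env,
        start_new_session=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        code = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child leads its own process group, so its pid is the group id.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return "hang"
    return "pass" if code == 0 else "fail"


def survey(cmd, timeout_s):
    """Run cmd once per configuration and collect the outcomes."""
    return {label: run_with_timeout(cmd, extra, timeout_s)
            for label, extra in CONFIGS.items()}
```

From the R build directory this could be called as survey(["make", "check-all"], timeout_s=3600), where the one-hour limit is a guess that would need adjusting to how long a healthy run takes on the machine.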
Here’s the relevant part of the stack for the deadlocked process:
#0 0x00007fae3d963945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fae3de53b77 in ___kmp_suspend_template_aux (th_gtid=<optimized out>, th=<optimized out>, flag=<optimized out>) at ../../src/z_Linux_util.cpp:1781
#2 __kmp_suspend_template (th_gtid=<optimized out>, flag=<optimized out>) at ../../src/z_Linux_util.cpp:1910
#3 __kmp_suspend_64 (th_gtid=635251012, flag=0x80) at ../../src/z_Linux_util.cpp:2019
#4 0x00007fae3dde2f13 in suspend (this=<optimized out>, th_gtid=<optimized out>) at ../../src/kmp_wait_release.h:731
#5 __kmp_wait_template (this_thr=<optimized out>, flag=<optimized out>, final_spin=<optimized out>, itt_sync_obj=<optimized out>) at ../../src/kmp_wait_release.h:343
#6 wait (this=<optimized out>, this_thr=<optimized out>, final_spin=<optimized out>, itt_sync_obj=<optimized out>) at ../../src/kmp_wait_release.h:742
#7 _INTERNAL_25_______src_kmp_barrier_cpp_ce635104::__kmp_hyper_barrier_release (bt=635251012, this_thr=0x80, gtid=1, tid=-1, propagate_icvs=635250944, itt_sync_obj=0x0) at ../../src/kmp_barrier.cpp:865
#8 0x00007fae3dde4556 in __kmp_fork_barrier (gtid=635251012, tid=128) at ../../src/kmp_barrier.cpp:2177
#9 0x00007fae3de1cc1f in __kmp_launch_thread (this_thr=0x7fae25dd2944) at ../../src/kmp_runtime.cpp:5768
#10 0x00007fae3de4fc00 in _INTERNAL_26_______src_z_Linux_util_cpp_c3d2e46c::__kmp_launch_worker (thr=0x7fae25dd2944) at ../../src/z_Linux_util.cpp:585
#11 0x00007fae3d95fe25 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fae3d68d34d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fae3eef3780 (LWP 27709)):
#0 0x00007fae3d671e47 in sched_yield () from /lib64/libc.so.6
#1 0x00007fae3de530a4 in _INTERNAL_26_______src_z_Linux_util_cpp_c3d2e46c::__kmp_atfork_prepare () at ../../src/z_Linux_util.cpp:1531
#2 0x00007fae3d654232 in fork () from /lib64/libc.so.6
#3 0x00007fae3d601bbc in _IO_proc_open@@GLIBC_2.2.5 () from /lib64/libc.so.6
#4 0x00007fae3d601e4c in popen@@GLIBC_2.2.5 () from /lib64/libc.so.6
#5 0x00007fae3e756617 in do_system () from /tmp/rbuild/lib/libR.so
If you strace the parent, you see sched_yield() being called in an endless loop, and the CPU is pinned at 100%. (I’ve seen a similar issue reported on here, but it concerned v15 and was fixed in v16.) I’ve recompiled close to 100 times with different compiler options, but the result is always the same. I’ve also tried the entire process with the kernel from 7.3, and that fails the same way, so I’m more inclined to think the problem lies in the interaction between Intel’s OpenMP and glibc. Builds made with clang against the system-provided OpenMP run fine with threading. I haven’t had a chance to test on other distros or a different processor.
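To tell this kind of busy loop apart from a process that is merely blocked, one option is to sample the CPU time the kernel has charged to the PID. The Python sketch below reads utime and stime (fields 14 and 15 of /proc/<pid>/stat, in clock ticks) twice and reports the fraction of one core used in between. The function names are hypothetical and it is Linux-only. The figures are process-wide totals, so a value near 1.0 matches one thread spinning while the others sleep, but it does not prove that sched_yield() is what is being called.

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime, in clock ticks, from one /proc/<pid>/stat line."""
    # Field 2 (comm) may contain spaces or parentheses, so split only the
    # text that follows the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field 14 is rest[11] and field 15 is rest[12].
    return int(rest[11]) + int(rest[12])


def cpu_ticks(pid):
    """Read the current CPU tick total for pid from /proc."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_cpu_ticks(f.read())


def cpu_busy_fraction(pid, interval_s=1.0):
    """Fraction of one core that pid consumed over interval_s seconds."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    before = cpu_ticks(pid)
    time.sleep(interval_s)
    used = cpu_ticks(pid) - before
    return used / (ticks_per_second * interval_s)
```

For the parent in the trace above, that would be cpu_busy_fraction(27709), run while the process is still stuck.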
Any thoughts?