I'm on Scientific Linux 6.X and have a parallel real-time program that uses OpenMP. I originally compiled this program with GCC 5.2.0. While looking at a Kernelshark trace of my program, I noticed that the worker threads spin before sleeping at the barrier at the end of each parallel region (while the original thread does not). With GCC this spin seemed to be a constant 1.5ms. After doing some more research into OpenMP, it looked like I could set OMP_WAIT_POLICY=passive to make these threads sleep instead. However, changing this between active and passive made no difference. See the image below for the 1.5ms delay at the end of the parallel section.
I found some documentation and answers in the posts https://software.intel.com/en-us/forums/intel-c-compiler/topic/707453 and https://software.intel.com/en-us/forums/intel-many-integrated-core/topic/338457 that made me believe that by using ICC I could get rid of this with KMP_BLOCKTIME=0. However, I saw the same sort of behavior using ICC 18.0.0, regardless of what values I used for KMP_BLOCKTIME and OMP_WAIT_POLICY. The measured spin time when compiling with ICC was 200ms (0.2s), which is consistent with the default Andrey Churbanov cites in one of the posts linked above.
My question is: how can I get rid of this spinning at the end of a parallel region? We potentially need to run this at rates as high as 1024Hz, so we obviously can't afford a 1-2ms delay on each iteration.
Thanks a lot,
James