Channel: Intel® Software - Intel® C++ Compiler

ICC 19.0.4.243 parallelizes a loop with a confirmed race condition (Lenovo Legion Y7000, 16 GB RAM, 8th-gen i7, Ubuntu 18.04)

Hi there,

I am compiling the code below with ICC using the following command line:

icc -w -par-threshold0 -no-vec -fno-inline -parallel -qopt-report-phase=all -qopt-report=5

  1 #include <stdio.h>
  2 int a[100];
  3
  4 int main(int argc, char *argv[])
  5 {
  6   int len=argc;
  7   int i,x=10;
  8
  9   for (i=0;i<len;i++)
 10   {
 11     a[x] = i;
 12     x=i;
 13   }
 14
 15   for (i = 0; i < len; i++)
 16     printf("%d ", a[i]);
 17   printf("x=%d",x);
 18   return 0;
 19 }

The code is a modification of the following program in AutoParBench:

https://github.com/LLNL/dataracebench/blob/master/micro-benchmarks/DRB016-outputdep-orig-yes.c

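The loop at lines 9-13 has two loop-carried dependences:

1. A loop-carried output dependence (write after write) on x:

 x = ..;   // written in iteration i
 x = ..;   // written again in iteration i+1

2. A loop-carried true (flow) dependence on x, due to:

 .. = x;   // read in a[x] in iteration i+1
 x = ..;   // written in iteration i
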
Below is the optimization report produced by ICC. It shows that ICC auto-parallelized the loop at lines 9-13 (remark #17109: LOOP WAS AUTO-PARALLELIZED).

Intel(R) Advisor can now assist with vectorization and show optimization
  report messages with your source code.
See "https://software.intel.com/en-us/intel-advisor-xe" for details.

Intel(R) C Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.0.4.243 Build 20190416

Compiler options: -par-threshold0 -no-vec -fno-inline -parallel -qopt-report-phase=all -qopt-report=5 -o test.out

    Report from: Interprocedural optimizations [ipo]

  WHOLE PROGRAM (SAFE) [EITHER METHOD]: false
  WHOLE PROGRAM (SEEN) [TABLE METHOD]: true
  WHOLE PROGRAM (READ) [OBJECT READER METHOD]: false

INLINING OPTION VALUES:
  -inline-factor: 100
  -inline-min-size: 30
  -inline-max-size: 230
  -inline-max-total-size: 2000
  -inline-max-per-routine: 10000
  -inline-max-per-compile: 500000

In the inlining report below:
   "sz" refers to the "size" of the routine. The smaller a routine's size,
      the more likely it is to be inlined.
   "isz" refers to the "inlined size" of the routine. This is the amount
      the calling routine will grow if the called routine is inlined into it.
      The compiler generally limits the amount a routine can grow by having
      routines inlined into it.

Begin optimization report for: main(int, char **)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (main(int, char **)) [1/1=100.0%] modified_clean_DRB016-outputdep-orig-yes.c(5,1)
  -> EXTERN: (16,5) printf(const char *__restrict__, ...)
  -> EXTERN: (17,3) printf(const char *__restrict__, ...)


    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]


LOOP BEGIN at modified_clean_DRB016-outputdep-orig-yes.c(9,3)
   remark #17109: LOOP WAS AUTO-PARALLELIZED
   remark #17101: parallel loop shared={ .2 } private={ } firstprivate={ argc } lastprivate={ } firstlastprivate={ i } reduction={ }
   remark #15540: loop was not vectorized: auto-vectorization is disabled with -no-vec flag
   remark #25439: unrolled with remainder by 2  
   remark #25456: Number of Array Refs Scalar Replaced In Loop: 1
   remark #25015: Estimate of max trip count of loop=100
LOOP END

LOOP BEGIN at modified_clean_DRB016-outputdep-orig-yes.c(9,3)
<Remainder>
   remark #25456: Number of Array Refs Scalar Replaced In Loop: 1
   remark #25015: Estimate of max trip count of loop=100
LOOP END

LOOP BEGIN at modified_clean_DRB016-outputdep-orig-yes.c(15,3)
   remark #17104: loop was not parallelized: existence of parallel dependence
   remark #15382: vectorization support: call to function printf(const char *__restrict__, ...) cannot be vectorized   [ modified_clean_DRB016-outputdep-orig-yes.c(16,5) ]
   remark #15344: loop was not vectorized: vector dependence prevents vectorization
   remark #25015: Estimate of max trip count of loop=100
LOOP END

LOOP BEGIN at modified_clean_DRB016-outputdep-orig-yes.c(9,3)
   remark #15540: loop was not vectorized: auto-vectorization is disabled with -no-vec flag
   remark #25439: unrolled with remainder by 2  
   remark #25456: Number of Array Refs Scalar Replaced In Loop: 1
   remark #25015: Estimate of max trip count of loop=100
LOOP END

LOOP BEGIN at modified_clean_DRB016-outputdep-orig-yes.c(9,3)
<Remainder>
   remark #25456: Number of Array Refs Scalar Replaced In Loop: 1
   remark #25015: Estimate of max trip count of loop=100
LOOP END

    Report from: Code generation optimizations [cg]

modified_clean_DRB016-outputdep-orig-yes.c(5,1):remark #34051: REGISTER ALLOCATION : [main] modified_clean_DRB016-outputdep-orig-yes.c:5

    Hardware registers
        Reserved     :    2[ rsp rip]
        Available    :   39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save  :    6[ rbx rbp r12-r15]
        Assigned     :   14[ rax rdx rcx rbx rsi rdi r8-r15]
        
    Routine temporaries
        Total         :     125
            Global    :      33
            Local     :      92
        Regenerable   :      46
        Spilled       :       1
        
    Routine stack
        Variables     :      32 bytes*
            Reads     :       6 [0.00e+00 ~ 0.0%]
            Writes    :       9 [0.00e+00 ~ 0.0%]
        Spills        :      48 bytes*
            Reads     :      11 [5.00e+00 ~ 0.6%]
            Writes    :      11 [0.00e+00 ~ 0.0%]
    
    Notes
    
        *Non-overlapping variables and spills may share stack space,
         so the total stack size might be less than this.
    

===========================================================================

However, Intel Inspector reports a data race in the loop that ICC parallelized. The contents of “log/realtime_mode.log”, generated by Intel Inspector, are shown below.

<?xml version="1.0" encoding="UTF-8"?>
<feedback>
 <message severity="verbose">Analysis started...</message>
 <nop/>
 <message severity="info">Collection started. To stop the collection, either press CTRL-C or enter from another console window: inspxe-cl -r /home/gleison/Desktop/Fernando_modifed_example/r005ti3 -command stop.</message>
 <nop/>
 <message severity="verbose">Result file: /home/gleison/Desktop/Fernando_modifed_example/r005ti3/r005ti3.inspxe </message>
 <nop/>
 <message severity="verbose">Found target process /home/gleison/Desktop/Fernando_modifed_example/test.out (PID = 20895). Analysis started... </message>
 <nop/>
 <message severity="verbose">Loaded module: /home/gleison/Desktop/Fernando_modifed_example/test.out. </message>
 <nop/>
 <message severity="verbose">Loaded module: /lib64/ld-linux-x86-64.so.2. </message>
 <nop/>
 <message severity="verbose">Loaded module: [vdso]. </message>
 <nop/>
 <message severity="verbose">Loaded module: /lib/x86_64-linux-gnu/libm.so.6. </message>
 <nop/>
 <message severity="verbose">Loaded module: /usr/lib/x86_64-linux-gnu/libiomp5.so. </message>
 <nop/>
 <message severity="verbose">Loaded module: /lib/x86_64-linux-gnu/libgcc_s.so.1. </message>
 <nop/>
 <message severity="verbose">Loaded module: /lib/x86_64-linux-gnu/libpthread.so.0. </message>
 <nop/>
 <message severity="verbose">Loaded module: /lib/x86_64-linux-gnu/libc.so.6. </message>
 <nop/>
 <message severity="verbose">Loaded module: /lib/x86_64-linux-gnu/libdl.so.2. </message>
 <nop/>
 <message severity="verbose">Loaded module: /opt/intel/inspector_2019.4.0.597413/lib64/runtime/libittnotify.so. </message>
 <nop/>
 <message severity="warning">One or more threads in the application accessed the stack of another thread. This may indicate one or more bugs in your application. Setting the Inspector to detect data races on stack accesses and running another analysis may help you locate these and other bugs.</message>
 <nop/>
 <message severity="verbose">Unloaded module: /home/gleison/Desktop/Fernando_modifed_example/test.out. </message>
 <nop/>
 <message severity="verbose">Unloaded module: /lib64/ld-linux-x86-64.so.2. </message>
 <nop/>
 <message severity="verbose">Unloaded module: [vdso]. </message>
 <nop/>
 <message severity="verbose">Unloaded module: /lib/x86_64-linux-gnu/libm.so.6. </message>
 <nop/>
 <message severity="verbose">Unloaded module: /usr/lib/x86_64-linux-gnu/libiomp5.so. </message>
 <nop/>
 <message severity="verbose">Unloaded module: /lib/x86_64-linux-gnu/libgcc_s.so.1. </message>
 <nop/>
 <message severity="verbose">Unloaded module: /lib/x86_64-linux-gnu/libpthread.so.0. </message>
 <nop/>
 <message severity="verbose">Unloaded module: /lib/x86_64-linux-gnu/libc.so.6. </message>
 <nop/>
 <message severity="verbose">Unloaded module: /lib/x86_64-linux-gnu/libdl.so.2. </message>
 <nop/>
 <message severity="verbose">Unloaded module: /opt/intel/inspector_2019.4.0.597413/lib64/runtime/libittnotify.so. </message>
 <nop/>
 <message severity="verbose">Process /home/gleison/Desktop/Fernando_modifed_example/test.out (PID = 20895) has terminated. </message>
 <nop/>
 <message severity="verbose">Application exit code: 0 </message>
 <nop/>
 <message severity="verbose">Result file: /home/gleison/Desktop/Fernando_modifed_example/r005ti3/r005ti3.inspxe </message>
 <nop/>
 <message severity="verbose">Analysis completed</message>
 <nop/>
 <message severity="info">  </message>
 <nop/>
 <message severity="info">1 new problem(s) found </message>
 <nop/>
 <message severity="info">    1 Data race problem(s) detected </message>
 <nop/>
</feedback>

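The loop has a race condition on a[10]. Because x is 10 before the first iteration and becomes 10 again at the end of iteration i = 10, both iteration i = 0 and iteration i = 11 write a[10] (the latter only when len = argc >= 12). Run serially, a[10] ends up holding 11; run in parallel, it can end up holding either 0 (written by the first iteration, i = 0) or 11 (written by the twelfth iteration, i = 11).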

Regards,

Gleison

TCE Open Date: Thursday, January 16, 2020 - 13:26
