Channel: Intel® Software - Intel® C++ Compiler

icpc memory alignment - unexpected output

Hi all,

I would like to understand the behavior of a small piece of code that I extracted from a larger application that uses vectorization and SIMD instructions.
Please don't focus on the design: it is inherited from my original code, and I want to keep it as it is to reproduce the anomaly, although I agree that it makes little sense in this small context. I'm following the guidelines described here about the alignment.

I have the following Dummy class.
dummy.h

#ifdef __INTEL_COMPILER
  typedef double * __restrict__ Real_ptr __attribute__((align_value(32)));
  typedef const double * const __restrict__ ConstReal_ptr __attribute__((align_value(32)));
#else
  typedef double * __restrict__ Real_ptr __attribute__((aligned(32)));
  typedef const double * const __restrict__ ConstReal_ptr __attribute__((aligned(32)));
#endif

class Dummy {
public:
   virtual void calculate( const unsigned int n, ConstReal_ptr x, ConstReal_ptr y, Real_ptr z ) const;
private:
   double computeSingleValue( const double x, const double y ) const;
};

dummy.cpp

#include "dummy.h"
#include <algorithm>

static const double K = 10.0;

void Dummy::calculate( const unsigned int n, ConstReal_ptr x, ConstReal_ptr y, Real_ptr z ) const
{
   for( unsigned int i = 0; i < n; ++i)
   {
      z[i] = computeSingleValue( x[i], y[i] );
   }
}

double Dummy::computeSingleValue( const double x, const double y ) const
{
   return std::max(K, (x >= y) ? x : y);
}

The main function tests the calculate method and prints a message whenever an output value differs from the expected one. main.cpp is the following:

#include "dummy.h"
#include <cassert>
#include <cmath>
#include <iostream>
#include <stdlib.h>

int main()
{
   const unsigned int N = 4;

   Real_ptr x;
   assert( 0 == posix_memalign ( (void **)&x, 32, sizeof ( double ) * N ) );
   x[0] = 0.0;
   x[1] = 10.0;
   x[2] = 100.0;
   x[3] = 1000.0;

   Real_ptr y;
   assert( 0 == posix_memalign ( (void **)&y, 32, sizeof ( double ) * N ) );
   y[0] = 0.0;
   y[1] = 10.0;
   y[2] = 100.0;
   y[3] = 1000.0;

   Real_ptr z;
   assert( 0 == posix_memalign ( (void **)&z, 32, sizeof ( double ) * N ) );
   z[0] = 0.0;
   z[1] = 0.0;
   z[2] = 0.0;
   z[3] = 0.0;

   Dummy obj;
   obj.calculate( N, x, y, z );
   if( std::abs(10.0   - z[0]) > 1.0E-18 ) { std::cout << "FAIL 0: z = " << z[0] << std::endl; }
   if( std::abs(10.0   - z[1]) > 1.0E-18 ) { std::cout << "FAIL 1: z = " << z[1] << std::endl; }
   if( std::abs(100.0  - z[2]) > 1.0E-18 ) { std::cout << "FAIL 2: z = " << z[2] << std::endl; }
   if( std::abs(1000.0 - z[3]) > 1.0E-18 ) { std::cout << "FAIL 3: z = " << z[3] << std::endl; }

   free(x);
   free(y);
   free(z);
}

Now, I'm compiling it with -O2 using the following compilers:

  • g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
  • icpc (ICC) 16.0.3 20160415

With GCC everything works fine and the result is as expected, while with the Intel compiler the last two elements of the z array are wrong and the program prints

FAIL 2: z = 10
FAIL 3: z = 10

The thing that puzzles me, apart from the compiler dependency, is that I get the correct output if I do any one of the following:

  • decrease the optimization level to -O1 or -O0
  • move all the source code into a single translation unit
  • replace z[i] = computeSingleValue( x[i], y[i] ); with z[i] = std::max(K, (x[i] >= y[i]) ? x[i] : y[i]); in dummy.cpp
  • add a std::cout << std::endl; to the body of computeSingleValue in dummy.cpp
  • remove the __restrict__ keyword from the ConstReal_ptr typedef

I'm probably doing something wrong, but I don't get it. Any help would be really appreciated.

Thanks in advance and regards,

Massi

Attachments:

  • dummy.h (text/x-chdr, 671 bytes)
  • main.cpp (text/x-c++src, 1.09 KB)
  • dummy.cpp (text/x-c++src, 407 bytes)

Thread Topic: Question