Hi all,
I would like to understand the behavior of this small piece of code, which I extracted from a bigger application that makes use of vectorization and SIMD instructions.
Please don't look at the design: it is inherited from my original code and I want to keep it as it is to reproduce the anomaly, even though I agree it makes no sense in this small context. For the alignment I'm following the guidelines described here.
I have the following Dummy class.
dummy.h
#ifdef __INTEL_COMPILER
typedef double * __restrict__ Real_ptr __attribute__((align_value(32)));
typedef const double * const __restrict__ ConstReal_ptr __attribute__((align_value(32)));
#else
typedef double * __restrict__ Real_ptr __attribute__((aligned(32)));
typedef const double * const __restrict__ ConstReal_ptr __attribute__((aligned(32)));
#endif

class Dummy
{
public:
    virtual void calculate( const unsigned int n,
                            ConstReal_ptr x,
                            ConstReal_ptr y,
                            Real_ptr z ) const;

private:
    double computeSingleValue( const double x, const double y ) const;
};
dummy.cpp
#include "dummy.h" #include <algorithm> static const double K = 10.0; void Dummy::calculate( const unsigned int n, ConstReal_ptr x, ConstReal_ptr y, Real_ptr z ) const { for( unsigned int i = 0; i < n; ++i) { z[i] = computeSingleValue( x[i], y[i] ); } } double Dummy::computeSingleValue( const double x, const double y ) const { return std::max(K, (x >= y) ? x : y); }
The main function tests the calculate method and prints a message whenever the output differs from the expected value. main.cpp is the following:
#include "dummy.h" #include <cassert> #include <cmath> #include <iostream> #include <stdlib.h> int main() { const unsigned int N = 4; Real_ptr x; assert( 0 == posix_memalign ( (void **)&x, 32, sizeof ( double ) * N ) ); x[0] = 0.0; x[1] = 10.0; x[2] = 100.0; x[3] = 1000.0; Real_ptr y; assert( 0 == posix_memalign ( (void **)&y, 32, sizeof ( double ) * N ) ); y[0] = 0.0; y[1] = 10.0; y[2] = 100.0; y[3] = 1000.0; Real_ptr z; assert( 0 == posix_memalign ( (void **)&z, 32, sizeof ( double ) * N ) ); z[0] = 0.0; z[1] = 0.0; z[2] = 0.0; z[3] = 0.0; Dummy obj; obj.calculate( N, x, y, z ); if( std::abs(10.0 - z[0])> 1.0E-18 ) { std::cout << "FAIL 0: z = "<< z[0] << std::endl; }; if( std::abs(10.0 - z[1])> 1.0E-18 ) { std::cout << "FAIL 1: z = "<< z[1] << std::endl; }; if( std::abs(100.0 - z[2])> 1.0E-18 ) { std::cout << "FAIL 2: z = "<< z[2] << std::endl; }; if( std::abs(1000.0 - z[3])> 1.0E-18 ) { std::cout << "FAIL 3: z = "<< z[3] << std::endl; }; free(x); free(y); free(z); }
Now, I'm compiling it with -O2 and the following compilers (rough build commands are shown after the list):
- g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
- icpc (ICC) 16.0.3 20160415
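The build is nothing special; it looks roughly like this (the exact invocations on my machine may differ slightly, I'm only adding -O2 on top of the defaults):

g++  -O2 -c dummy.cpp main.cpp && g++  -O2 dummy.o main.o -o test_gcc
icpc -O2 -c dummy.cpp main.cpp && icpc -O2 dummy.o main.o -o test_icc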
With GCC everything works fine and the result is as expected. With the Intel compiler, however, the last two elements of the z array are wrong and the output of the program is
FAIL 2: z = 10
FAIL 3: z = 10
What puzzles me, apart from the compiler dependency, is that I get the correct output if I do any one of the following:
- decrease the optimization to -O1 or -O0
- move all the source code into a single translation unit
- replace z[i] = computeSingleValue( x[i], y[i] ); with z[i] = std::max( K, (x[i] >= y[i]) ? x[i] : y[i] ); directly in the loop in dummy.cpp
- add a std::cout << std::endl; in the body of computeSingleValue in dummy.cpp
- remove the __restrict__ keyword from the ConstReal_ptr typedef (see the sketch after this list)
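For instance, the last workaround means the ConstReal_ptr typedef in dummy.h becomes the following (shown for both branches; only __restrict__ is removed, the alignment attribute and Real_ptr stay as they are):

#ifdef __INTEL_COMPILER
typedef const double * const ConstReal_ptr __attribute__((align_value(32)));
#else
typedef const double * const ConstReal_ptr __attribute__((aligned(32)));
#endif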
I'm probably doing something wrong, but I don't get it. Any help would be really appreciated.
Thanks in advance and regards,
Massi