Channel: Intel® Software - Intel® C++ Compiler

Intel C++ Compiler 16.0: OpenMP initialization of thread-local static member in template class


Hi,

It appears that on Windows, thread-local storage (TLS) static members of template classes are not initialized properly in OpenMP worker threads.

Here is a (maybe not so minimal) reproducing test:

--------------------------------------------------------------------------------------------------------------------------------------------------

#include <iostream>
#include <sstream>
#include <string>
#include <thread>
#include <omp.h>

#define NTHREADS 2
#define THREAD thread_local

class MyClass
{
public:
  MyClass(int n): m_n(n)
  {
    std::ostringstream s;
    s << "MyClass::MyClass(int) in thread "<< std::this_thread::get_id() << ", m_n = "<< m_n << std::endl;
    std::cout << s.str();
  }
  MyClass(MyClass & other): m_n(other.m_n)
  {
    std::ostringstream s;
    s << "MyClass::MyClass(MyClass &) in thread "<< std::this_thread::get_id() << ", m_n = "<< m_n << std::endl;
    std::cout << s.str();
  }
  long long m_n;
};

/*=======================================*/

template<typename T>
class TemplateContainer
{
public:
  typedef T obj_type;
  static void print();
  static THREAD obj_type s_obj;
};

template<typename T>
THREAD typename TemplateContainer<T>::obj_type TemplateContainer<T>::s_obj(1);

template <typename T>
void TemplateContainer<T>::print()
{
  std::ostringstream s;
  s << "TemplateContainer<>::print() in thread "<< std::this_thread::get_id() << ", MyClass::m_n = "<< s_obj.m_n << std::endl;
  std::cout << s.str();
}

/*=======================================*/

class NonTemplateContainer
{
public:
  typedef MyClass obj_type;
  static void print();
  static THREAD obj_type s_obj;
};

THREAD typename NonTemplateContainer::obj_type NonTemplateContainer::s_obj(2);

void NonTemplateContainer::print()
{
  std::ostringstream s;
  s << "NonTemplateContainer::print() in thread "<< std::this_thread::get_id() << ", MyClass::m_n = "<< s_obj.m_n << std::endl;
  std::cout << s.str();
}

/*=======================================*/

int main()
{
  omp_set_dynamic(0);
  omp_set_num_threads(NTHREADS);
  std::cout << "======================================================================"<< std::endl;
#pragma omp parallel
  NonTemplateContainer::print();
  std::cout << "======================================================================"<< std::endl;
#pragma omp parallel
  TemplateContainer<MyClass>::print();
  std::cout << "======================================================================"<< std::endl;
}

--------------------------------------------------------------------------------------------------------------------------------------

 

When compiled in Intel Parallel Studio XE 2016 (i.e. MSVS 2015 + Intel Compiler 16.0 Update 3) in Debug configuration with all default options plus /Qopenmp and /Qstd=c++11, this code produces the following typical output:

MyClass::MyClass(int) in thread 13724, m_n = 2
MyClass::MyClass(int) in thread 13724, m_n = 1
======================================================================
MyClass::MyClass(int) in thread 14276, m_n = 2
NonTemplateContainer::print() in thread 13724, MyClass::m_n = 2
MyClass::MyClass(int) in thread 11568, m_n = 2
NonTemplateContainer::print() in thread 11568, MyClass::m_n = 2
======================================================================
TemplateContainer<>::print() in thread 13724, MyClass::m_n = 1
TemplateContainer<>::print() in thread 11568, MyClass::m_n = 2073053888752
======================================================================

Here the static TLS member s_obj is first initialized in the master thread (13724) for both the non-template and the template class, and then again in the two spawned worker threads (14276 and 11568), but only for the non-template container. For the template container, no static initialization happens in the worker threads. One of the TemplateContainer<>::print() calls in the parallel region seems to run in the master thread (why?), so it prints the initialized value. The other call runs in a worker thread and prints uninitialized garbage, and that is the lucky case: most of the time it simply crashes with an access violation.

The same code on Linux (CentOS 6.8) with icpc 16.0.3 compiled with -qopenmp -g -std=c++11 produces:

======================================================================
MyClass::MyClass(int) in thread 140110818879296, m_n = 2
MyClass::MyClass(int) in thread 140110660962176, m_n = 2
MyClass::MyClass(int) in thread 140110818879296, m_n = 1
MyClass::MyClass(int) in thread 140110660962176, m_n = 1
NonTemplateContainer::print() in thread 140110818879296, MyClass::m_n = 2
NonTemplateContainer::print() in thread 140110660962176, MyClass::m_n = 2
======================================================================
TemplateContainer<>::print() in thread 140110660962176, MyClass::m_n = 1
TemplateContainer<>::print() in thread 140110818879296, MyClass::m_n = 1
======================================================================

Here only two threads are used, and in both of them TemplateContainer<MyClass>::s_obj seems to be initialized correctly.
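To help tell whether OpenMP is involved at all, the same pattern can be tried with plain std::thread instead of a parallel region. The following is only a sketch: Value, Holder, values_seen_by_threads and assert_all_threads_initialized are hypothetical names that are not part of the test above, and the sketch has not been run against Intel Compiler 16.0 on Windows. On a compiler that initializes the member correctly, every thread should read 1.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Reduced stand-in for MyClass: only the field that gets printed.
struct Value {
  explicit Value(int n) : m_n(n) {}
  long long m_n;
};

// Same shape as TemplateContainer: a thread_local static member of a class template.
template <typename T>
class Holder {
public:
  static thread_local T s_obj;
};

template <typename T>
thread_local T Holder<T>::s_obj(1);

// Starts `count` threads; each one stores the m_n it reads from its own copy of s_obj.
inline std::vector<long long> values_seen_by_threads(int count) {
  std::vector<long long> seen(static_cast<std::size_t>(count), -1LL);
  std::vector<std::thread> workers;
  for (int i = 0; i < count; ++i) {
    workers.emplace_back([&seen, i] { seen[i] = Holder<Value>::s_obj.m_n; });
  }
  for (std::thread& w : workers) w.join();
  return seen;
}

// Aborts (in a build without NDEBUG) if any thread saw something other than the initial value.
inline void assert_all_threads_initialized(int count) {
  for (long long v : values_seen_by_threads(count)) assert(v == 1);
}
```

Calling assert_all_threads_initialized(2) from main() would mirror the two-thread setup of the OpenMP test. If this variant also misbehaves on Windows, the problem is presumably not specific to OpenMP.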

Can anyone explain whether I'm doing something wrong or if this seems to be a bug?
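In the meantime, one possible way to sidestep the problem might be to replace the static member with a function-local thread_local object behind an accessor, so that each thread constructs its copy on first use. This is a sketch under assumptions, not a confirmed fix for Intel Compiler 16.0 on Windows: LazyTemplateContainer, Payload, obj(), value_seen_by_new_thread and check_per_thread_copies are hypothetical names, and std::thread is used here in place of the OpenMP parallel region.

```cpp
#include <cassert>
#include <thread>

// Reduced stand-in for MyClass: only the field that gets printed.
struct Payload {
  explicit Payload(int n) : m_n(n) {}
  long long m_n;
};

template <typename T>
class LazyTemplateContainer {
public:
  typedef T obj_type;

  // Function-local thread_local: constructed the first time each thread calls obj().
  static obj_type& obj() {
    thread_local obj_type instance(1);
    return instance;
  }
};

// Returns the m_n value observed by a freshly started thread.
inline long long value_seen_by_new_thread() {
  long long seen = -1;
  std::thread worker([&seen] { seen = LazyTemplateContainer<Payload>::obj().m_n; });
  worker.join();  // join() makes the worker's write to `seen` visible here
  return seen;
}

// Sanity check: each thread has its own, initialized copy.
inline void check_per_thread_copies() {
  LazyTemplateContainer<Payload>::obj().m_n = 42;           // change the calling thread's copy
  assert(value_seen_by_new_thread() == 1);                  // a new thread still sees the initial value
  assert(LazyTemplateContainer<Payload>::obj().m_n == 42);  // the caller's copy is untouched
}
```

With this layout, print() would read obj().m_n instead of s_obj.m_n. The cost is an initialization check on every call to obj().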

P.S.: I'm running the tests with OMP_NUM_THREADS set to 1 and then changing the number of threads to 2 in code (see main() above), but the way the number of threads is set appears to be irrelevant.
