
Unable to use std::complex

I am unable to compile the following snippet that uses std::complex; I get the error "qualified name is not allowed". The compiler version (15.0.6) seems fairly up to date and compatible with the existing gcc installation (5.1.0). Am I missing anything?

#include <cmath>
#include <complex>
namespace tensor {
  typedef std::complex<double> cdouble;
}
 
[nnp05@cierzo ~]$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/cm/local/apps/gcc/5.1.0/libexec/gcc/x86_64-unknown-linux-gnu/5.1.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-5.1.0/configure --prefix=/cm/local/apps/gcc/5.1.0 --enable-languages=c,c++,fortran --with-gmp-include=/root/rpmbuild/BUILD/gcc-5.1.0-obj/../gcc-5.1.0/our-gmp --with-gmp-lib=/root/rpmbuild/BUILD/gcc-5.1.0-obj/../gcc-5.1.0/our-gmp/.libs --with-mpc-include=/root/rpmbuild/BUILD/gcc-5.1.0-obj/../gcc-5.1.0/our-mpc/src --with-mpc-lib=/root/rpmbuild/BUILD/gcc-5.1.0-obj/../gcc-5.1.0/our-mpc/src/.libs --with-mpfr-include=/root/rpmbuild/BUILD/gcc-5.1.0-obj/../gcc-5.1.0/our-mpfr/src --with-mpfr-lib=/root/rpmbuild/BUILD/gcc-5.1.0-obj/../gcc-5.1.0/our-mpfr/src/.libs
Thread model: posix
gcc version 5.1.0 (GCC)
[nnp05@cierzo ~]$ icpc -v
icpc version 15.0.6 (gcc version 5.1.0 compatibility)
[nnp05@cierzo ~]$ icc -v
icc version 15.0.6 (gcc version 5.1.0 compatibility)

Thread Topic: Bug Report

icc (ICC) 17.0.1 20161005 creates a much larger and slower executable than gcc

If I compile plain C code using icc 17.0.1, it creates a stripped executable of 105k, while gcc (6.3.0 and 4.8.5) creates an 8k executable. The executable compiled with icc is also about 20% slower. Also, for some reason icc unnecessarily links to the libdl.so and libgcc_s.so libraries.

I was under the impression that icc was meant to produce better-performing executables than gcc.

Is there anything I can do with icc to get a smaller, better-performing executable than with gcc?

AVX best performance min function with unsigned char

Hi everybody and thanks for your help!

I have this piece of code:

unsigned char *A, *B, *C;

// init A,B,C with mm_malloc, 64 bit aligned

for(j=0;j<size;j++)
       C[j] = fminf(255,255-(A[j]*B[j]));

Since A, B, and C are 8-bit data types, AVX vectorization should give me 16 operations per clock cycle, but fminf works on the 32-bit float data type, so I get only 8 operations per clock cycle. I see that the Intel intrinsics include a min function for the u8 data type.

I tried to translate the loop into intrinsics, but I cannot find load and multiply functions for the packed u8 data type (epu8).

How can I obtain the maximum performance in this loop?
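There is no packed 8-bit multiply in SSE or AVX, which may be why no epu8 multiply turned up. One common approach is to widen the bytes to 16 bits, multiply with a 16-bit instruction, and pack the result back to 8 bits. The sketch below does that with SSE2 intrinsics (16 bytes per iteration) and falls back to scalar integer code elsewhere. It assumes the intended result is 255 - A[j]*B[j] clamped to the range 0..255, which is a guess about the original loop, and the function and macro names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   // SSE2 intrinsics
#define SKETCH_HAVE_SSE2 1
#endif

// Scalar reference: c[j] = max(0, 255 - a[j]*b[j]), integer math only.
// The clamp to 0 is an assumption about what the original loop intended.
inline void mul_sub_clamp_scalar(const std::uint8_t* a, const std::uint8_t* b,
                                 std::uint8_t* c, std::size_t n)
{
    for (std::size_t j = 0; j < n; ++j) {
        const int v = 255 - int(a[j]) * int(b[j]);
        c[j] = static_cast<std::uint8_t>(v < 0 ? 0 : v);
    }
}

inline void mul_sub_clamp(const std::uint8_t* a, const std::uint8_t* b,
                          std::uint8_t* c, std::size_t n)
{
    assert(n == 0 || (a && b && c));
    std::size_t j = 0;
#ifdef SKETCH_HAVE_SSE2
    const __m128i zero = _mm_setzero_si128();
    const __m128i k255 = _mm_set1_epi16(255);
    for (; j + 16 <= n; j += 16) {
        // Integer loads are untyped: 16 bytes each, no alignment required here.
        const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + j));
        const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + j));
        // Widen u8 -> u16 by interleaving with zero bytes.
        const __m128i a_lo = _mm_unpacklo_epi8(va, zero);
        const __m128i a_hi = _mm_unpackhi_epi8(va, zero);
        const __m128i b_lo = _mm_unpacklo_epi8(vb, zero);
        const __m128i b_hi = _mm_unpackhi_epi8(vb, zero);
        // 16-bit products; 255*255 = 65025 still fits in an unsigned 16-bit lane.
        const __m128i p_lo = _mm_mullo_epi16(a_lo, b_lo);
        const __m128i p_hi = _mm_mullo_epi16(a_hi, b_hi);
        // Unsigned saturating subtract gives max(0, 255 - product).
        const __m128i r_lo = _mm_subs_epu16(k255, p_lo);
        const __m128i r_hi = _mm_subs_epu16(k255, p_hi);
        // Results are in 0..255, so packing back to u8 loses nothing.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(c + j),
                         _mm_packus_epi16(r_lo, r_hi));
    }
#endif
    // Remaining elements (or everything, when SSE2 is unavailable).
    mul_sub_clamp_scalar(a + j, b + j, c + j, n - j);
}
```

Calling mul_sub_clamp(A, B, C, size) replaces the loop. The 256-bit AVX2 counterparts of these intrinsics exist as well, but they pack per 128-bit lane, so the element order needs extra care there.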

Thanks

Best regards

Eric

 


Crash in vector destructor when used with a range-based for loop

Hello,

I have a problem with the vector destructor when I use it with a range-based for loop.

Sample code to reproduce this issue:

#include <vector>

class foo
{
public:
    void bar(const std::vector<int>& ints)
    {
        std::vector<int> items;
#if 1
        for (const auto& i : ints)
        {
#else
        for (size_t j = 0; j < ints.size(); ++j)
        {
            int i = ints[j];
#endif
            items.push_back(i+5);
        }
    }
};

int main()
{
    std::vector<int> ints{ 1,2,5,7,9,0 };

    foo f;
    f.bar(ints);

    return 0;
}

The call stack:

>    VectorCrash.exe!std::_Container_base12::_Orphan_all() Line 223    C++
     VectorCrash.exe!std::_Vector_alloc<std::_Vec_base_types<int, std::allocator<int> > >::_Orphan_all() Line 614    C++
     VectorCrash.exe!std::vector<int, std::allocator<int> >::_Tidy() Line 1640    C++
     VectorCrash.exe!std::vector<int, std::allocator<int> >::~vector() Line 977    C++
     VectorCrash.exe!main() Line 36    C++
     VectorCrash.exe!invoke_main() Line 64    C++

It crashes in:

inline void _Container_base12::_Orphan_all()

function in xutility header, when _ITERATOR_DEBUG_LEVEL == 2, also in Debug configuration.

My compiler: Intel® Parallel Studio XE 2017 Update 1 Composer Edition for C++ Windows* Integration for Microsoft* Visual Studio* 2015, Version 17.0.71.14

It works fine with Microsoft C++ compiler.

Best regards

Przemek


#pragma warning( disable: 46 ) does not disable

Compiling using Parallel Studio 2017 on Windows 10, VS 2015 community edition

 

#ifdef WIN32
#pragma warning( push,    46 )
#pragma warning( disable: 46 )
#define _Pragma(text) __pragma(text)    // ignore this warning message
#pragma warning( pop )
#endif

 

1>..\..\src\./checker.h(294): warning #46: "_Pragma" is predefined; attempted redefinition ignored
1>    #define _Pragma(text) __pragma(text)
1>            ^

Why did the warning #46 not get suppressed?

 

Intel C++ Compiler 16.0 OpenMP initialization of thread local static member in template class

Hi,

It appears that on Windows, thread-local storage (TLS) static members in template classes do not get initialized properly in worker threads.

Here is a (maybe not so minimal) reproducing test:

--------------------------------------------------------------------------------------------------------------------------------------------------

#include <iostream>
#include <sstream>
#include <string>
#include <thread>
#include <omp.h>

#define NTHREADS 2
#define THREAD thread_local

class MyClass
{
public:
  MyClass(int n): m_n(n)
  {
    std::ostringstream s;
    s << "MyClass::MyClass(int) in thread "<< std::this_thread::get_id() << ", m_n = "<< m_n << std::endl;
    std::cout << s.str();
  }
  MyClass(MyClass & other): m_n(other.m_n)
  {
    std::ostringstream s;
    s << "MyClass::MyClass(MyClass &) in thread "<< std::this_thread::get_id() << ", m_n = "<< m_n << std::endl;
    std::cout << s.str();
  }
  long long m_n;
};

/*=======================================*/

template<typename T>
class TemplateContainer
{
public:
  typedef T obj_type;
  static void print();
  static THREAD obj_type s_obj;
};

template<typename T>
THREAD typename TemplateContainer<T>::obj_type TemplateContainer<T>::s_obj(1);

template <typename T>
void TemplateContainer<T>::print()
{
  std::ostringstream s;
  s << "TemplateContainer<>::print() in thread "<< std::this_thread::get_id() << ", MyClass::m_n = "<< s_obj.m_n << std::endl;
  std::cout << s.str();
}

/*=======================================*/

class NonTemplateContainer
{
public:
  typedef MyClass obj_type;
  static void print();
  static THREAD obj_type s_obj;
};

THREAD typename NonTemplateContainer::obj_type NonTemplateContainer::s_obj(2);

void NonTemplateContainer::print()
{
  std::ostringstream s;
  s << "NonTemplateContainer::print() in thread "<< std::this_thread::get_id() << ", MyClass::m_n = "<< s_obj.m_n << std::endl;
  std::cout << s.str();
}

/*=======================================*/

int main()
{
  omp_set_dynamic(0);
  omp_set_num_threads(NTHREADS);
  std::cout << "======================================================================"<< std::endl;
#pragma omp parallel
  NonTemplateContainer::print();
  std::cout << "======================================================================"<< std::endl;
#pragma omp parallel
  TemplateContainer<MyClass>::print();
  std::cout << "======================================================================"<< std::endl;
}

--------------------------------------------------------------------------------------------------------------------------------------

 

When compiled in Intel Parallel Studio XE 2016 (i.e. MSVS 2015 + Intel Compiler 16.0 Update 3) in Debug configuration with all default options plus /Qopenmp and /Qstd=c++11, this code produces the following typical output:

MyClass::MyClass(int) in thread 13724, m_n = 2
MyClass::MyClass(int) in thread 13724, m_n = 1
======================================================================
MyClass::MyClass(int) in thread 14276, m_n = 2
NonTemplateContainer::print() in thread 13724, MyClass::m_n = 2
MyClass::MyClass(int) in thread 11568, m_n = 2
NonTemplateContainer::print() in thread 11568, MyClass::m_n = 2
======================================================================
TemplateContainer<>::print() in thread 13724, MyClass::m_n = 1
TemplateContainer<>::print() in thread 11568, MyClass::m_n = 2073053888752
======================================================================

Here we see that static TLS member s_obj was first initialized in master thread in both non-template and template class, and then in two spawned worker threads for the non-template container. For the template container there was no static initialization in worker threads. Since one of the TemplateContainer<>::print() functions of the parallel region seems to be executed in the master thread (why?), it outputs the initialized value, but the other one executed in worker thread outputs non-initialized garbage - that is, if you're lucky, mostly it will just crash with access violation.

The same code on Linux (CentOS 6.8) with icpc 16.0.3 compiled with -qopenmp -g -std=c++11 produces:

======================================================================
MyClass::MyClass(int) in thread 140110818879296, m_n = 2
MyClass::MyClass(int) in thread 140110660962176, m_n = 2
MyClass::MyClass(int) in thread 140110818879296, m_n = 1
MyClass::MyClass(int) in thread 140110660962176, m_n = 1
NonTemplateContainer::print() in thread 140110818879296, MyClass::m_n = 2
NonTemplateContainer::print() in thread 140110660962176, MyClass::m_n = 2
======================================================================
TemplateContainer<>::print() in thread 140110660962176, MyClass::m_n = 1
TemplateContainer<>::print() in thread 140110818879296, MyClass::m_n = 1
======================================================================

Here only two threads are used and in both of them TemplateContainer::s_obj seems to be initialized correctly.

Can anyone explain whether I'm doing something wrong or if this seems to be a bug?

P.S.: I'm running tests with OMP_NUM_THREADS set to 1 and then number of threads dynamically changed to 2 in code (see main() above), but looks like the way the number of threads is set is irrelevant.


ICC ignoring all linker information

Dear Intel forum,

  I am using icc 17.0.1 to compile several cpp sources into object files and then link them into an executable. Something odd is happening: if I use either of the following lines to link the executable:

icc in_boundary.o in_gfmd.o in_integration.o in_print.o in_rand.o in_util.o in_forces.o in_init.o in_namelist.o in_prop.o in_singlestep.o main.o -o gfmd

icc -o gfmd in_boundary.o in_gfmd.o in_integration.o in_print.o in_rand.o in_util.o in_forces.o in_init.o in_namelist.o in_prop.o in_singlestep.o main.o

and add any argument like -lm, or even a library that does not exist such as -lasdf, the compiler ignores all the -l information and generates several "undefined reference to" errors.

  Please, what is wrong? Is there a required argument?

  Best regards.

 

 

Thread Topic: Help Me

error #31001: The dll for reading and writing the pdb (for example, mspdb110.dll) could not be found on your path

Hi everyone,

I recently upgraded my Intel Composer version (2016 -> 2017 update 1) and cannot compile my program anymore in 32bit mode. I get this error :

"error #31001: The dll for reading and writing the pdb (for example, mspdb110.dll) could  not be found on your path"

I found a similar topic, but in my case it works fine in 64-bit and fails in 32-bit. The compilation fails both from the command line and in Visual Studio.

My config: Win7 (64bit), Visual Studio 2013, Platform Toolset: Intel C++ Compiler 17.0, Base Platform Toolset: V100.

Can someone help me please ?

Guix



Inheriting an explicit constructor

Intel C++ compiler (Version 16.0.3.207 Build 20160415) seems to drop the explicit specifier when the constructor of the base class is inherited with using.

struct B
{
    explicit B(int) { }
};

struct D : B
{
    using B::B;
};

B b = 1; // Not OK, fine
D d = 1; // Not OK with Microsoft C++ and GCC, but OK with Intel C++
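The same behaviour can be checked at compile time with type traits, independent of the initialization syntax. The sketch below reuses B and D from the snippet above; the two helper function names are hypothetical and exist only for this illustration. On a compiler that keeps explicit on inherited constructors, int should not be implicitly convertible to D, while direct construction should still work.

```cpp
#include <type_traits>

struct B
{
    explicit B(int) { }
};

struct D : B
{
    using B::B;   // inherits B(int), which is expected to stay explicit
};

// Hypothetical helpers, introduced only for this illustration.
constexpr bool d_is_implicitly_convertible_from_int()
{
    return std::is_convertible<int, D>::value;     // models "D d = 1;"
}

constexpr bool d_is_directly_constructible_from_int()
{
    return std::is_constructible<D, int>::value;   // models "D d(1);"
}
```

Putting static_assert(!d_is_implicitly_convertible_from_int(), "explicit was dropped"); into a translation unit would turn the discrepancy into a build failure on an affected compiler.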

Thread Topic: Bug Report

Equivalent of "gcc -Q --help=target"

Intel documentation indicates that the optimization parameters used with -O3 can change between versions of the compiler. I would like to evaluate the difference in optimizations between v16.3 and v17. Is there a simple way to arrive at the options used in a similar fashion that gcc allows with the options shown in my Subject line?

Intel compiler chokes on template instantiation

I have written code that compiles fine on GCC 6.2, clang 3.9, and MSVS 2015 (the C++ build tools version).

It's available on github:
https://www.github.com/rubenvb/skui

Build with the following define to skip building Skia and head straight for the error:

cmake -DSKIP_SKIA -DCMAKE_CXX_COMPILER=icpc
make

The error is as follows:

/home/ruben/Development/skui/core/value_ptr.h++(50): error: object of abstract class type "skui::core::implementation::slot<void>" is not allowed:
            function "skui::core::implementation::slot<ReturnType, ArgTypes...>::operator() [with ReturnType=void, ArgTypes=<>]" is a pure virtual function
               { return static_cast<void*>(new T(*static_cast<T*>(other))); };
                                               ^
          detected during:
            instantiation of "void *(*skui::core::copy_constructor_copier<T>())(void *) [with T=skui::core::implementation::slot<void>]" at line 69
            instantiation of "skui::core::smart_copy<T>::smart_copy() [with T=skui::core::implementation::slot<void>]" at line 140

/home/ruben/Development/skui/core/value_ptr.h++(59): error: static assertion failed with "Cannot default construct smart_copy for an abstract type."
        explicit smart_copy() { static_assert(!std::is_abstract<T>::value, "Cannot default construct smart_copy for an abstract type."); }
                                ^
          detected during instantiation of "skui::core::smart_copy<T>::smart_copy() [with T=skui::core::implementation::slot<void>]" at line 140

/home/ruben/Development/skui/core/value_ptr.h++(50): error: object of abstract class type "skui::core::implementation::slot<void, skui::core::string>" is not allowed:
            function "skui::core::implementation::slot<ReturnType, ArgTypes...>::operator() [with ReturnType=void, ArgTypes=<skui::core::string>]" is a pure virtual function
               { return static_cast<void*>(new T(*static_cast<T*>(other))); };
                                               ^
          detected during:
            instantiation of "void *(*skui::core::copy_constructor_copier<T>())(void *) [with T=skui::core::implementation::slot<void, skui::core::string>]" at line 69
            instantiation of "skui::core::smart_copy<T>::smart_copy() [with T=skui::core::implementation::slot<void, skui::core::string>]" at line 140
            instantiation of class "skui::core::value_ptr<T, Copier, Deleter> [with T=skui::core::implementation::slot<void, skui::core::string>, Copier=skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, Deleter=std::default_delete<skui::core::implementation::slot<void, skui::core::string>>]" at line 72 of "/usr/include/c++/6.2.1/bits/list.tcc"
            instantiation of "void std::__cxx11::_List_base<_Tp, _Alloc>::_M_clear() [with _Tp=std::pair<const skui::core::trackable *, skui::core::value_ptr<skui::core::implementation::slot<void, skui::core::string>, skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, std::default_delete<skui::core::implementation::slot<void, skui::core::string>>>>, _Alloc=std::allocator<std::pair<const skui::core::trackable *,
                      skui::core::value_ptr<skui::core::implementation::slot<void, skui::core::string>, skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, std::default_delete<skui::core::implementation::slot<void, skui::core::string>>>>>]" at line 442 of "/usr/include/c++/6.2.1/bits/stl_list.h"
            instantiation of "std::__cxx11::_List_base<_Tp, _Alloc>::~_List_base() [with _Tp=std::pair<const skui::core::trackable *, skui::core::value_ptr<skui::core::implementation::slot<void, skui::core::string>, skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, std::default_delete<skui::core::implementation::slot<void, skui::core::string>>>>, _Alloc=std::allocator<std::pair<const skui::core::trackable *,
                      skui::core::value_ptr<skui::core::implementation::slot<void, skui::core::string>, skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, std::default_delete<skui::core::implementation::slot<void, skui::core::string>>>>>]" at line 62 of "/home/ruben/Development/skui/core/signal.h++"
            implicit generation of "std::__cxx11::list<_Tp, _Alloc>::~list() [with _Tp=std::pair<const skui::core::trackable *, skui::core::value_ptr<skui::core::implementation::slot<void, skui::core::string>, skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, std::default_delete<skui::core::implementation::slot<void, skui::core::string>>>>, _Alloc=std::allocator<std::pair<const skui::core::trackable *, skui::core::value_ptr<skui::core::implementation::slot<void,
                      skui::core::string>, skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, std::default_delete<skui::core::implementation::slot<void, skui::core::string>>>>>]" at line 62 of "/home/ruben/Development/skui/core/signal.h++"
            instantiation of class "std::__cxx11::list<_Tp, _Alloc> [with _Tp=std::pair<const skui::core::trackable *, skui::core::value_ptr<skui::core::implementation::slot<void, skui::core::string>, skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, std::default_delete<skui::core::implementation::slot<void, skui::core::string>>>>, _Alloc=std::allocator<std::pair<const skui::core::trackable *, skui::core::value_ptr<skui::core::implementation::slot<void,
                      skui::core::string>, skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, std::default_delete<skui::core::implementation::slot<void, skui::core::string>>>>>]" at line 62 of "/home/ruben/Development/skui/core/signal.h++"
            instantiation of "skui::core::implementation::signal_base<ArgTypes...>::~signal_base() [with ArgTypes=<skui::core::string>]" at line 191 of "/home/ruben/Development/skui/core/signal.h++"

/home/ruben/Development/skui/core/value_ptr.h++(59): error: static assertion failed with "Cannot default construct smart_copy for an abstract type."
        explicit smart_copy() { static_assert(!std::is_abstract<T>::value, "Cannot default construct smart_copy for an abstract type."); }
                                ^
          detected during:
            instantiation of "skui::core::smart_copy<T>::smart_copy() [with T=skui::core::implementation::slot<void, skui::core::string>]" at line 140
            instantiation of class "skui::core::value_ptr<T, Copier, Deleter> [with T=skui::core::implementation::slot<void, skui::core::string>, Copier=skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, Deleter=std::default_delete<skui::core::implementation::slot<void, skui::core::string>>]" at line 72 of "/usr/include/c++/6.2.1/bits/list.tcc"
            instantiation of "void std::__cxx11::_List_base<_Tp, _Alloc>::_M_clear() [with _Tp=std::pair<const skui::core::trackable *, skui::core::value_ptr<skui::core::implementation::slot<void, skui::core::string>, skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, std::default_delete<skui::core::implementation::slot<void, skui::core::string>>>>, _Alloc=std::allocator<std::pair<const skui::core::trackable *,
                      skui::core::value_ptr<skui::core::implementation::slot<void, skui::core::string>, skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, std::default_delete<skui::core::implementation::slot<void, skui::core::string>>>>>]" at line 442 of "/usr/include/c++/6.2.1/bits/stl_list.h"
            instantiation of "std::__cxx11::_List_base<_Tp, _Alloc>::~_List_base() [with _Tp=std::pair<const skui::core::trackable *, skui::core::value_ptr<skui::core::implementation::slot<void, skui::core::string>, skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, std::default_delete<skui::core::implementation::slot<void, skui::core::string>>>>, _Alloc=std::allocator<std::pair<const skui::core::trackable *,
                      skui::core::value_ptr<skui::core::implementation::slot<void, skui::core::string>, skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, std::default_delete<skui::core::implementation::slot<void, skui::core::string>>>>>]" at line 62 of "/home/ruben/Development/skui/core/signal.h++"
            implicit generation of "std::__cxx11::list<_Tp, _Alloc>::~list() [with _Tp=std::pair<const skui::core::trackable *, skui::core::value_ptr<skui::core::implementation::slot<void, skui::core::string>, skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, std::default_delete<skui::core::implementation::slot<void, skui::core::string>>>>, _Alloc=std::allocator<std::pair<const skui::core::trackable *, skui::core::value_ptr<skui::core::implementation::slot<void,
                      skui::core::string>, skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, std::default_delete<skui::core::implementation::slot<void, skui::core::string>>>>>]" at line 62 of "/home/ruben/Development/skui/core/signal.h++"
            instantiation of class "std::__cxx11::list<_Tp, _Alloc> [with _Tp=std::pair<const skui::core::trackable *, skui::core::value_ptr<skui::core::implementation::slot<void, skui::core::string>, skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, std::default_delete<skui::core::implementation::slot<void, skui::core::string>>>>, _Alloc=std::allocator<std::pair<const skui::core::trackable *, skui::core::value_ptr<skui::core::implementation::slot<void,
                      skui::core::string>, skui::core::smart_copy<skui::core::implementation::slot<void, skui::core::string>>, std::default_delete<skui::core::implementation::slot<void, skui::core::string>>>>>]" at line 62 of "/home/ruben/Development/skui/core/signal.h++"
            instantiation of "skui::core::implementation::signal_base<ArgTypes...>::~signal_base() [with ArgTypes=<skui::core::string>]" at line 191 of "/home/ruben/Development/skui/core/signal.h++"

compilation aborted for /home/ruben/Development/skui/core/application.c++ (code 2)

I have absolutely no idea why it fails, and the fact that three other independent compilers like the code as it is (with /W4 or -Wextra -pedantic warning free) points me in the direction of a compiler bug. Am I right or is there a subtlety in the code I have written?

Thread Topic: Bug Report

icpc (17.0.1.132) slow when invoking multiple icpc processes

When we use a single icpc process and perform "icpc -V", the command usually finishes in about 1 - 1.5 seconds.  When we use multiple icpc processes, something appears to serialize.  For example, if we perform 16 "icpc -V" commands concurrently, each one takes approximately 16 seconds to return.  The execution time gets worse and worse as we add processes.

After examining strace and various flexlm debug output, the delay and serialization appears to occur when something in icpc (flexlm?) is scanning all devices in the system using libudev.  Why does icpc scan all devices (stat and readlink on them, too)?  This device scan happens after the license server sends the license information... The following is printed "INTEL_LMD: checkoutfilter: returns ACCEPT".  The scan happens before we see "Checkout succeeded".  Is there any way to disable the device scan?  Or speed up the device scan?

The only way I've found to make the intel C++ compiler function at acceptable speeds is to create zero length libudev.so.0 and libudev.so.1 and add them to my LD_LIBRARY_PATH when using icpc.  This gets rid of the device scan, but it is not the safest thing to do.  Is there any other better workaround?

Note: We have had similar slowness in earlier intel C++ compiler releases, too.  Some previous releases of the intel C++ compiler only looked for libudev.so.0, so the intel C++ compiler worked reasonably fast on a system that did not have libudev.so.0.  However, previous intel C++ compiler releases were slow on systems that did have libudev.so.0.

Thread Topic: Bug Report

OMP_WAIT_POLICY and KMP_BLOCKTIME

I had assumed that the Intel OpenMP Runtime library would respect OMP_WAIT_POLICY, and further I assumed that

OMP_WAIT_POLICY=passive

would be the same as KMP_BLOCKTIME=0.  I think this is a wrong assumption. 

 

But after some timing, I believe the following is true:

OMP_WAIT_POLICY=passive is the same as KMP_BLOCKTIME=<default 200ms>

OMP_WAIT_POLICY=active is the same as KMP_BLOCKTIME=infinite

Can someone confirm the above?

Thread pinning with OpenMP

Hi,

I need to make scaling graphs for an OpenMP application.

My machine is a Dual-Xeon (14 cores per Xeon), with hyper-threading. I would like to place threads using the OpenMP 4 standard, so using OMP_PLACES, OMP_PROC_BIND, OMP_NUM_THREADS.

One of the benchmarks is the following: use 4 threads, where the first two threads should be bound to the first core of the first socket, and the other 2 threads should be bound to the first core of the second socket. For that, I use:

export OMP_PLACES='{0}, {14}'
export OMP_PROC_BIND=close
export OMP_NUM_THREADS=4

but I am not sure that it does the right job. Bear in mind that I don't want the first and the third threads to be on core 0. I want the first and the second threads to be on this core, as I want to use the first-touch policy and limit the number of chunks of arrays being allocated in different NUMA domains.

Could you also confirm that the numbers in OMP_PLACES refer to core numbers, and that different NUMA domains (including on KNL with the 2 cores on a tile, and the quadrants) are grouped in ascending order?
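One way to see what the runtime actually does with these settings is to have every thread report the place it is bound to. The sketch below is a minimal check, assuming a compiler and runtime that implement the OpenMP 4.5 place query routine omp_get_place_num(); the function names thread_places and print_thread_places are hypothetical. With two places and four threads, the close policy in the OpenMP specification is meant to keep consecutive thread numbers together, so threads 0 and 1 would be expected on place 0 and threads 2 and 3 on place 1, but that is exactly what the output should confirm or refute.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns, for each thread of a parallel region, the place number it is bound to.
// Built without OpenMP 4.5 support, the result is a single entry of -1.
inline std::vector<int> thread_places()
{
#if defined(_OPENMP) && _OPENMP >= 201511
    std::vector<int> places(static_cast<std::size_t>(omp_get_max_threads()), -1);
    #pragma omp parallel
    {
        // omp_get_place_num() returns -1 when the thread is not bound to a place.
        places[static_cast<std::size_t>(omp_get_thread_num())] = omp_get_place_num();
    }
    return places;
#else
    return std::vector<int>(1, -1);
#endif
}

// Prints one line per thread slot, e.g. for pasting next to the benchmark results.
inline void print_thread_places()
{
    const std::vector<int> places = thread_places();
    for (std::size_t t = 0; t < places.size(); ++t)
        std::printf("thread %zu -> place %d\n", t, places[t]);
}
```

Calling print_thread_places() at the start of the benchmark, with the three environment variables exported as above, shows the thread-to-place mapping. How place numbers map to physical cores still has to be checked against the machine topology.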

Thanks for your help,

Francois

 

Static library created with icpc, cannot use it using g++

The objects in the library are compiled as follows:

icpc -c  -fPIC -Wno-unused-function -Wall -pthread -static-intel -O3 -axCORE-AVX2 -fma -fp-model fast=2 -funroll-all-loops -unroll-aggressive -D__AVX2__ -ffat-lto-objects -xHost -ipo -ipo-jobs20 -qopenmp-link=dynamic -o "fileN.o" "fileN.cpp"

The library is created:

gcc-ar ruvs ./libtest.a file*.o

When I try to compile with g++:

g++   -Wall -pthread -O3 -march=native -mtune=native -flto=8 -minline-stringops-dynamically -fvariable-expansion-in-unroller -fweb -fgcse-sm -fgcse-las -fgcse-after-reload -fipa-pta -ftree-loop-distribution -ftree-loop-im -ftree-loop-ivcanon -fivopts -funswitch-loops -frename-registers -funroll-loops   -o MyProgram ../lib/libtest.a
./lib/libtest.a(kmp_stub.o): In function `ompc_set_num_threads':
(.text+0x0): multiple definition of `ompc_set_num_threads'
../lib/libtest.a(kmp_csupport.o):(.text+0x11f0): first defined here
collect2: error: ld returned 1 exit status

I thought the problem was with OpenMP, but if I try to make OpenMP link dynamically with

-qopenmp-link=dynamic

I get the same errors.

How can I create a working static library with icpc?

Thanks.

 

Thread Topic: Question

Benchmark tips

Hi,

 

I need to run some benchmarks on a Dual-Xeon and on a KNL. When benchmarking, I would like to get a time and an estimate of the error. For that, I run the same program 10 times in a row, take the average as the time, and compute the standard deviation to get an error. The goal is obviously to reduce the standard deviation.

I do the following, on a Linux CentOS 7.3 box:

- Disable turbo boost in the BIOS

- Make sure that the Dual-Xeon is in NUMA mode in the BIOS (That's what I want)

- Make sure to correctly pin threads to the correct hardware thread with OMP_PLACES and OMP_PROC_BIND. I also tend to use the following tip to do a scaling graph. I plot the number of cores on the x-axis and I plot 2 curves for the Dual-Xeon. One curve for 1 thread per core, and one curve for 2 threads per core. I do the same for KNL with 4 curves.

- I'll also try to boot without the graphical user interface, see if it makes a difference

Do you have any other tips?
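The averaging step described above can be sketched as follows. This is a minimal illustration, not taken from the original post: the names RunStats and summarize are hypothetical, and it assumes the sample standard deviation (n - 1 in the denominator) is the error estimate wanted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct RunStats {
    double mean;     // average run time
    double stddev;   // sample standard deviation (n - 1 in the denominator)
};

// Summarizes repeated timings of the same benchmark.
// Assumes at least two timings, all in the same unit (e.g. seconds).
inline RunStats summarize(const std::vector<double>& timings)
{
    assert(timings.size() >= 2);
    const double n = static_cast<double>(timings.size());

    double sum = 0.0;
    for (double t : timings) sum += t;
    const double mean = sum / n;

    double squared_dev = 0.0;
    for (double t : timings) squared_dev += (t - mean) * (t - mean);

    return RunStats{mean, std::sqrt(squared_dev / (n - 1.0))};
}
```

Feeding it the 10 measured wall-clock times gives the value and error bar for one point of the scaling graph.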

17.0 update 1: link break with /Quse-intel-optimized-headers

Hi, I've a project which builds and runs fine using VC's headers.  Changing over to Intel optimized headers compiles without issue but fails at link time with

    LINK : fatal error LNK1104: cannot open file 'ipps.lib'

A full drive search indicates Parallel Studio XE 2017 hasn't installed this .lib anywhere, so it isn't something which can be addressed by adding a link path.

I see there are a couple threads about what seems to be the same issue in earlier ICC versions.  Intel support's responded by asking if the compiler's installed and there hasn't been follow up.  In this case, yes, it's most definitely installed.  Is the .lib perhaps mistakenly registered to some other Parallel Studio component and, if so, which one needs to be installed to get it?

Attachment: ICC 17.0u1.png (image/png, 81.24 KB)

Thread Topic: Bug Report

Misleading error message (from declaring a friend function)

Hello, icpc reports an error on the following code: 'error: "a" has already been declared in the current scope '. This is misleading, as the error can be resolved by declaring the function (as in the two commented-out lines). gcc instead warns with: 'warning: friend declaration ‘T N::a(C, int)’ declares a non-template function [-Wnon-template-friend] friend T N::a(C, int);' and adds the note: 'note: (if this is not what you intended, make sure the function template has already been declared and add <> after the function name here)'.

#include <iostream>

template <typename T>
class C;

namespace N {

template <typename T>
T a(C<T> b);

//template <typename T>
//T a(C<T>, int);

}   // END namespace N

template <typename T>
class C
{
    T i;
 public:
    C(T i = 0) : i(i) { }

    template <typename T2>
    friend T N::a(C<T2>);

    friend T N::a(C, int);
};

int main()
{
    C<int> mc(7);
}


namespace N {

template <typename T>
T a(C<T> b)
{
    return b.i;
}

template <typename T>
T a(C<T> b, int c)
{
    return b.i+c;
}

}   // END namespace N

I am aware that the first friend declaration is not really usable, but this is the shortest way to show the error message that I could think of. Best regards, Thomas Köhler

 

 

Thread Topic: Bug Report

Linking libimf.a statically

How do I statically link libimf using the GCC compiler?

I have a simple program that I want to statically link with libimf:

#include <stdio.h>
#include <math.h>

int main() {
    int c = 0;
    float k = 6.25;
    float pow_value = powf(2.15, k);
    printf("POW(2.15,6.25) : %f \n", pow_value);
    return 0;
}

Running gcc powf-example.c -static /pathtointelLibrary/libimf.a

gives the following error

 undefined reference to `__intel_cpu_feature_indicator_x'
libm_feature_flag.c:(.text+0x5a): undefined reference to `__intel_cpu_features_init_x'
libm_feature_flag.c:(.text+0x80): undefined reference to `__intel_cpu_features_init_x'
libm_feature_flag.c:(.text+0xa5): undefined reference to `__intel_cpu_features_init_x'
libm_feature_flag.c:(.text+0xc4): undefined reference to `__intel_cpu_features_init_x'
libm_feature_flag.c:(.text+0xe3): undefined reference to `__intel_cpu_features_init_x'

C++11 support issue - memory_order_release is undefined

Dear Sir,
I am trying to compile some code, and I am getting an error:
icc -c -O2 -lpthread -xHOST -D_XOPEN_SOURCE=500 -D_POSIX_C_SOURCE=200112 -std=c11 -fno-strict-aliasing load.c
 

load.c(277): warning #266: function "atomic_thread_fence" declared implicitly
                      { atomic_thread_fence(memory_order_release);};
                        ^

load.c(277): error: identifier "memory_order_release" is undefined
                      { atomic_thread_fence(memory_order_release);};
                                            ^

compilation aborted for load.c (code 2)

Is the following version of the Intel compiler missing C++11 support (http://en.cppreference.com/w/cpp/atomic/memory_order), or am I using the wrong flags?
$ icc -V
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.3.187 Build 20150407
Copyright (C) 1985-2015 Intel Corporation.  All rights reserved.

File is attached herewith.
Eagerly awaiting your reply.
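For reference, atomic_thread_fence and memory_order_release are declared by a header in both languages: <stdatomic.h> in C11 and <atomic> in C++11. An "implicitly declared" warning followed by an undefined identifier is what one would expect if neither header is visible to the compiler, though whether that applies to load.c, and whether this icc installation provides <stdatomic.h>, are assumptions that would need checking. The sketch below shows the C++11 form of a release/acquire fence pair; the names payload, ready, publish, and try_consume are hypothetical.

```cpp
#include <atomic>

// Hypothetical names, used only for this illustration.
static int payload = 0;
static std::atomic<bool> ready(false);

// Writer side: store the data, then publish it.
inline void publish(int value)
{
    payload = value;
    // Writes before this fence become visible to a thread that observes
    // ready == true and then executes an acquire fence.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

// Reader side: returns false until the writer has published.
inline bool try_consume(int& out)
{
    if (!ready.load(std::memory_order_relaxed))
        return false;
    std::atomic_thread_fence(std::memory_order_acquire);
    out = payload;
    return true;
}
```

In C11 the calls have the same names without the std:: prefix once <stdatomic.h> is included.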

Attachment: load.c.txt (text/plain, 14.59 KB)