Channel: Intel® Software - Intel® C++ Compiler

Compilation gets stuck with flag -Wshadow when using boost/operators


The following code cannot be compiled with the Intel C++ compiler when the -Wshadow flag is used: the compilation hangs without any error message. The same code compiles fine with the Intel C++ compiler when -Wshadow is omitted.

#include <boost/operators.hpp>

class C
   : boost::ordered_euclidian_ring_operators1< C
   , boost::ordered_euclidian_ring_operators2< C, int
   , boost::ordered_euclidian_ring_operators2< C, double
> > > 
{
   C (long i) {}
};

int main() {}

I have tested and reproduced this issue with versions 13.1.3, 14.0.3, 16.0.4, 17.0.5, 18.0.4, and 19.0.4 of the Intel C++ compiler, using Boost versions 1.61.0 and 1.70.0.
In contrast, the same code compiles fine with -Wshadow under various versions of the GNU C++ compiler and LLVM Clang.
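
In case it helps narrow things down, the workaround I am currently considering is suppressing the shadow warning only around the Boost header, roughly as sketched below. This assumes icpc honors the GCC diagnostic pragmas for -Wshadow, which I have not verified, and it may not even avoid the hang if the analysis still runs:

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wshadow"   // suppress shadow warnings inside the Boost header only
#include <boost/operators.hpp>
#pragma GCC diagnostic pop
// ... rest of the translation unit unchanged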


omp pragma different time


Hi,

I'm working on a complicated project that requires parallel calculations in order to achieve good time performance.

For this purpose our company bought a machine with an Intel Xeon Platinum 8168 processor (96 cores; I'll call it "96"). We also have a computer with an Intel Core i9-7960X processor (16 cores; "16").

I'm using the "#pragma omp parallel for" directive, as all calculations happen in for loops, and at this point I got strange results.

When I run my code on the 16 machine with fewer than 16 iterations in the for loop (so fewer than 16 threads and cores are used), I get almost the same time for 5, 10, and 15 iterations. That is correct, since not all of the CPU's power is used.

Then I ran the SAME code on the 96 machine and saw strange timing results. With 40 iterations (and thus 40 threads) the time is almost twice that of 1 iteration, and with 90 iterations (still not full power!) the time increases almost 4 times.

My question is: does this specific processor (Intel Xeon Platinum 8168) have some known issue when working with the IPP libraries?

What could be the possible reason for such an increase in execution time? I am aware of our dynamic memory allocations and of the time needed to create a large number of threads, but that still does not seem to be the real reason.
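
To make the measurements easier to compare across the two machines, this is the kind of minimal timing harness I have in mind; it is only a sketch, and do_work() is a placeholder for one iteration of my real loop:

#include <cstdio>
#include <cstdlib>
#include <omp.h>

// Placeholder for one iteration of the real calculation.
void do_work(int)
{
    volatile double s = 0.0;
    for (long k = 0; k < 50 * 1000 * 1000; ++k) s += 1e-9;   // dummy load
}

int main(int argc, char** argv)
{
    const int n = (argc > 1) ? std::atoi(argv[1]) : 40;   // loop iterations (at most this many threads do work)

    double t0 = omp_get_wtime();
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
        do_work(i);
    double t1 = omp_get_wtime();

    printf("%d iterations on %d max threads: %.3f s\n", n, omp_get_max_threads(), t1 - t0);
    return 0;
}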

Thanks   

 

omp different time execution


Hi,

I am working with a complicated algorithm that requires a lot of computation. For this purpose an Intel Xeon Platinum 8168 CPU was purchased, with 96 cores (I'll call this machine "96"). Besides that, we already have an Intel Core i9-7960X CPU with 16 cores ("16").

I'm running a "#pragma omp parallel for" directive on a for loop in order to get parallel calculations. First I tried this approach on the 16 PC with 5, 10, and 15 iterations of that for loop, and got almost the same results for all three cases (which is correct, since not all of the CPU's power was used).

As the next step I ran the same code on the 96 PC. I tried different numbers of iterations and see a constantly increasing total execution time.

With 40 iterations the total time increased almost twice, and with 90 iterations it increased 3.5 times (still not the full power of the CPU, as it has 96 cores!).

I'm aware of the thread pool and of the time needed to create that many threads, but still, this does not seem to explain it. Does OpenMP have specific problems with Intel Xeon Platinum processors that I am not aware of? Maybe something about its architecture does not play well with OpenMP.

* It is not a cooling problem, since the tests only run for about 3 minutes.

* It is not a problem of allocations or memory copies, since exactly the same amount of memory is allocated and copied.

Can you think of any possible problems? I have run out of ideas.
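
One thing I still plan to try is pinning the threads, since on a multi-socket Xeon the placement of threads across sockets and memory can change the timing a lot. Below is only a sketch of that experiment (the inner loop is a dummy load standing in for my real work), meant to be run with OMP_PLACES=cores set in the environment:

#include <cstdio>
#include <omp.h>

int main()
{
    const int n = 90;               // number of loop iterations to test
    double result[90] = { 0.0 };

    double t0 = omp_get_wtime();
    // proc_bind(spread) asks the runtime to spread the team over the available places.
    #pragma omp parallel for proc_bind(spread) schedule(static)
    for (int i = 0; i < n; ++i)
    {
        volatile double s = 0.0;
        for (long k = 0; k < 10 * 1000 * 1000; ++k) s += 1e-9;   // dummy per-iteration load
        result[i] = s;
    }
    double t1 = omp_get_wtime();

    printf("%d iterations, %d max threads: %.3f s (result[0] = %f)\n",
           n, omp_get_max_threads(), t1 - t0, result[0]);
    return 0;
}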

Thanks

 

LLD cannot output functional shared object with object file generated by ICPC


Here is the sample:

~$ cat lib.h
#ifndef LIB_H_
#define LIB_H_
void foo();
#endif
~$ cat lib.cpp
#include <iostream>
#include "lib.h"
void foo() {
    std::cout << "Hello, world!"<< std::endl;
}
~$ cat main.cpp
#include "lib.h"
int main() {
    foo();
    return 0;
}
~$ icpc -fPIC -c lib.cpp
~$ icpc -fuse-ld=lld -shared -o libmylib.so lib.o
ld.lld: error: can't create dynamic relocation R_X86_64_64 against symbol: __gxx_personality_v0 in readonly segment; recompile object files with -fPIC or pass '-Wl,-z,notext' to allow text relocations in the output
>>> defined in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
>>> referenced by lib.cpp
>>>               lib.o:(.gnu.linkonce.d.DW.ref.__gxx_personality_v0+0x0)
~$ icpc -fuse-ld=lld -shared -o libmylib.so lib.o -Wl,-z,notext
~$ icpc -c main.cpp
~$ icpc main.o -L . -lmylib
~$ ./a.out 
Segmentation fault (core dumped)
~$ gdb a.out core
[New LWP 50478]
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f2ffdbe7526 in std::ostream::sentry::sentry(std::ostream&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0  0x00007f2ffdbe7526 in std::ostream::sentry::sentry(std::ostream&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007f2ffdbe7ba8 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007f2ffdbe8027 in std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f2ffe07f0f4 in foo() () from ./libmylib.so
#4  0x0000000000400ddf in main ()
~$ g++ -c -fPIC lib.cpp
~$ icpc -fuse-ld=lld -shared -o libmylib.so lib.o
~$ ./a.out
Hello, world!
~$

LLD refuses to link the object file produced by ICPC, even though I compiled the object file with the -fPIC option present. If I pass the -Wl,-z,notext option to the linker as prompted by LLD, it links, but the generated libmylib.so does not work and segfaults in libstdc++. If I instead compile the object file with g++ and keep everything else the same, it works.
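
For what it's worth, the diagnostic I intend to run next is comparing how the two compilers place the DW.ref.__gxx_personality_v0 reference in the object file, since that is what LLD complains about; this is just the check, not a fix:

~$ readelf -SW lib.o | grep -A1 DW.ref            # section name and flags holding the personality reference
~$ readelf -r lib.o | grep gxx_personality        # relocations emitted against it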

xmm16 used without specifying AVX512


I have a weird issue that I don't understand. One of my customers has a problem with my software: it crashes immediately on startup with a 0xC000001D (illegal instruction) exception. Unfortunately I have no access to the system on which it happens.

OS: Microsoft Server 2012 R2 with Hyper-V inside Microsoft Server 2016
CPU: Intel Xeon Silver 4108 CPU @ 1.80 GHz, which supports AVX-512.

Binary created with Intel Compiler 17.0 inside Visual Studio 2015.
64 bit, minimum supported target needs to have SSE2 (/arch:SSE2), with optional paths for SSE4.1, 4.2, AVX and AVX2: /QaxSSE4.1 /QaxSSE4.2 /QaxAVX /QaxCORE-AVX-I /QaxCORE-AVX2.

The customer ran a debugger, and the code crashes on an instruction that attempts to use the xmm16 register, which only exists with AVX-512. I'm not building for AVX-512, so I don't understand why the compiler would generate such an instruction. Also, Server 2012 doesn't support AVX-512, so any dispatch code should have protected against reaching this instruction (Server 2016 and the CPU itself both do support AVX-512).

Unfortunately my customer has no direct access to the Server 2016 host, so he hasn't been able to test whether the software runs fine outside of the Server 2012 guest.

The instruction it crashes on:

00007ff7`e76c2626 62817e08100498  vmovss  xmm16,dword ptr [r8+r11*4] ds:00007ff7`e862e320=3ec3ef15
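
For what it's worth, the guard I am considering adding on my own side is an explicit runtime check that both the CPU and the OS expose the AVX-512 state, before any feature-specific code is reached. This is only a sketch of that check using the usual CPUID/XCR0 layout; it does not explain why the dispatched code path was taken in the first place:

#include <cstdio>
#include <intrin.h>      // __cpuidex
#include <immintrin.h>   // _xgetbv

static bool os_and_cpu_support_avx512f()
{
    int regs[4];
    __cpuidex(regs, 1, 0);
    if ((regs[2] & (1 << 27)) == 0)        // ECX bit 27: OSXSAVE (XGETBV usable)
        return false;

    // XCR0 must have SSE (bit 1), AVX (bit 2) and the AVX-512 state bits 5-7 enabled by the OS.
    const unsigned long long xcr0 = _xgetbv(0);
    if ((xcr0 & 0xE6) != 0xE6)
        return false;

    __cpuidex(regs, 7, 0);
    return (regs[1] & (1 << 16)) != 0;     // CPUID.(EAX=7,ECX=0):EBX bit 16: AVX-512F
}

int main()
{
    printf("AVX-512F usable on this system: %s\n", os_and_cpu_support_avx512f() ? "yes" : "no");
    return 0;
}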

 

ldexpf already defined linker error


Hi, when I try to build a project with Intel compiler 19.0, I get the following linker error:

1>libmmd.lib(libmmd.dll) : : error LNK2005: ldexpf already defined in mkl_intel_lp64_dll.lib(dftifreedescriptor_lp64.obj)

Any ideas what I am doing wrong?

Edit: Forgot to mention that I'm building on Windows 10 x64.

Thanks,

Sergey.

Segmentation fault when compiling with icpc


Hello,

I ran into a strange problem with icpc when compiling the attached C++ code. When I compile with icpc -O0 icpc_segfault.cpp -o icpc_segfault, the program crashes with a segmentation fault. If I compile with any other optimization flag, or with clang++ or g++, the program runs fine.

I can't see any UB in the code, so my only other guess is that it is a problem with icpc. The attached code is a stripped-down version of the actual program. I can "fix" the segfault in the attached code in several ways (e.g., add a const data member to the FormatList class, or declare pointers volatile), but these "fixes" don't work in the actual program. This makes me think even more that it's a compiler problem and not UB.

The target architecture is intel64 and I tried several versions of icpc (16.0.4, 17.0.5, 18.0.3, 19.0.5.281).

Any help on this is appreciated. It can very well be that I just don't see the UB in the code.

Attachment: icpc_segfault.cpp (1.31 KB)

Intel C++ for Windows __cplusplus value


When I specify the /Qstd=c++0x command line option for Intel C++ on Windows, this is the equivalent of C++11 according to the documentation at https://software.intel.com/en-us/cpp-compiler-developer-guide-and-refere.... Yet the __cplusplus value for C++11, which is 201103L, is not set; instead the __cplusplus value for C++03, which is 199711L, is set. This seems to me a bug in Intel C++, as a programmer cannot rely on checking the __cplusplus value to determine the level of the C++ standard available to his code.
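
For reference, this is the tiny check I use to observe the values. Printing _MSVC_LANG is an extra assumption on my part; it is the macro that MSVC-compatible compilers use to report the selected standard even when __cplusplus is left at its old value:

#include <cstdio>

int main()
{
    std::printf("__cplusplus = %ld\n", (long)__cplusplus);
#ifdef _MSVC_LANG
    std::printf("_MSVC_LANG  = %ld\n", (long)_MSVC_LANG);   // only defined in MSVC-compatible mode
#endif
    return 0;
}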


SIGSEGV __kmp_cleanup_threadprivate_caches()


I am getting a segmentation fault at:

signal SIGSEGV, Segmentation fault.
0x00007f21293df7c3 in __kmp_cleanup_threadprivate_caches () at ../../src/kmp_threadprivate.cpp:812

Since I can't provide code details, could you please suggest what the possible reason behind this might be?

My cases run fine in the optimized build, but in the debug build, if I run under gdb, I get the above error at the exit of the executable.
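
In case it points in the right direction: the crashing frame is in the OpenMP runtime's threadprivate-cache cleanup, so I assume constructs like the sketch below (threadprivate data, possibly combined with copyin) are what exercise that code path at program exit. This is only an illustrative example, not my actual code:

#include <cstdio>
#include <omp.h>

static int counter = 0;
#pragma omp threadprivate(counter)         // each thread keeps its own copy; the runtime cleans these up at exit

int main()
{
    counter = 42;
    #pragma omp parallel copyin(counter)   // broadcast the primary thread's value into every thread's copy
    {
        counter += omp_get_thread_num();
        #pragma omp critical
        printf("thread %d: counter = %d\n", omp_get_thread_num(), counter);
    }
    return 0;
}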

Anurag

Intel C++ 19.0 for Windows invalid macro expansion with stringizing operator


This example is taken from the C++ standard when discussing the rules for redefinition and reexamination. Given:

#define str(x) # x

the macro expansion of:

char c[2][6] = { str(hello), str() };

should be:

char c[2][6] = { "hello", "" };

but with Intel C++ 19.0 on Windows the expansion is erroneously:

char c[2][6] = { "hello", };

This is clearly a bug in the preprocessor so I am reporting it here.
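
To make the bug observable without inspecting the preprocessed output, this is the self-contained check I plan to attach; the array-size trick is my own addition to the standard's example:

#define str(x) # x

// With the correct expansion { "hello", "" } the array has two rows; with the
// erroneous expansion { "hello", } it has only one, so the assertion fires.
char c[][6] = { str(hello), str() };
static_assert(sizeof(c) == 2 * 6, "str() did not expand to an empty string literal");

int main() { return 0; }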

TCE Open Date: Wednesday, November 13, 2019 - 20:10

ICC 2019 on VS2019: std::map inserted gives unexpected results


Hello!

I use ICC 2019 Update 5 embedded into VS2019 16.1.6. I observed something strange with std::map::insert(). I insert elements from a std::vector<int> into a std::map<int, int>, using the element value as the map key. Depending on the way I insert them or the compilation switches I use, I don't obtain the same map. I also asked this as a question on StackOverflow.

Here is the code snippet I used:

#include <map>
#include <vector>
#include <iostream>

std::map<int, int> AddToMapWithDependencyBetweenElementsInLoop(const std::vector<int>& values)
{
    std::map<int, int>  myMap;
    for (int i = 0; i < values.size(); i+=3)
    {
        myMap.insert(std::make_pair(values[i], myMap.size()));
        myMap.insert(std::make_pair(values[i + 1], myMap.size()));
        myMap.insert(std::make_pair(values[i + 2], myMap.size()));
    }
    return myMap;
}

std::map<int, int> AddToMapOnePerLoop(const std::vector<int>& values)
{
    std::map<int, int>  myMap;
    for (int i = 0; i < values.size(); ++i)
    {
        myMap.insert(std::make_pair(values[i], 0));
    }
    return myMap;
}

int main()
{
    std::vector<int> values{ 6, 7,  15, 5,  4,  12, 13, 16, 11, 10, 9,  14, 0,  1,  2,  3,  8,  17 };

    {
        auto myMap = AddToMapWithDependencyBetweenElementsInLoop(values);
        for (const auto& keyValuePair : myMap)
        {
            std::cout << keyValuePair.first << ", ";
        }
        std::cout << std::endl;
    }

    {
        auto myMap = AddToMapOnePerLoop(values);
        for (const auto& keyValuePair : myMap)
        {
            std::cout << keyValuePair.first << ", ";
        }
        std::cout << std::endl;
    }

    return 0;
}

If I compile using "icl mycode.cpp" and I run the program, I obtain this:

0, 1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 14, 15, 16, 17,

0, 1, 2, 3, 4, 5, 6, 7, 12, 13, 14, 15, 16, 17

(I expected the same numbers on both lines, and 18 numbers on each line, since 18 different values are inserted into the map.)

 

If I compile using "icl /EHsc mycode.cpp" and I run, I obtain:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17,

0, 1, 2, 3, 4, 5, 6, 7, 12, 13, 14, 15, 16, 17

(Again, unexpected results?)

 

If I compile using "icl /Od mycode.cpp" and I run:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,

This is the expected result...

Note that I compared using the same switches but with cl.exe (Microsoft compiler) and I always obtain the expected result.

Any idea about the case? Is it a bug? Did I do something wrong? Thanks for your help!
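
One more variation I still intend to test, just to rule out the pair-conversion path inside insert(), is computing the mapped value explicitly before each call. This is only the variant I have in mind (a drop-in replacement for the first function above, reusing the same includes), not a claim about the root cause:

std::map<int, int> AddToMapWithExplicitValue(const std::vector<int>& values)
{
    std::map<int, int> myMap;
    for (std::size_t i = 0; i + 2 < values.size(); i += 3)
    {
        int v = static_cast<int>(myMap.size());    // compute the mapped value before the call
        myMap.insert({ values[i], v });
        v = static_cast<int>(myMap.size());
        myMap.insert({ values[i + 1], v });
        v = static_cast<int>(myMap.size());
        myMap.insert({ values[i + 2], v });
    }
    return myMap;
}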

 

TCE Open Date: Thursday, November 14, 2019 - 06:36


Do you know if Intel plans to release its compiler for free?


I have made a voxel-based game engine that uses only the CPU, and here's the benchmark I made for 1 million voxels:

  • 20 FPS for non-optimized code, using the Visual Studio compiler
  • 144 FPS for optimized code, using the VS compiler
  • 400 FPS for optimized code, using the Intel C++ Compiler.

And I am using an i3-6100.

So yes, ICC can help me a lot with this project.

Also, this program is written in C#; does ICC support C#? ^^

TCE Open Date: 

Friday, November 15, 2019 - 02:07

Optimize pow(x, n) + pow(y, n) == pow(z, n)


I'm having a go at the x^n + y^n = z^n problem and want to see how fast I can make it go and whether I can get a result, but I can't figure out how to make my code vectorize or parallelize:

#include <iostream>
#include <cmath>   // for pow
using namespace std;

int main(){
    unsigned long n = 0;
    cout << "n = ";
    cin >> n;
    for(unsigned long x = 1; x < 0xffffffff; ++x){
        cout << "x = " << x << "\n";
        for(unsigned long y = 2; y < 0xffffffff; ++y){
            // note: pow works in double precision here
            const unsigned long xy = pow(x, n) + pow(y, n);
            cout << "y = " << y << "\n";
            for(unsigned long z = 1; z < 0xffffffff; ++z){
                if(xy == pow(z, n)) cout << x << "^n + " << y << "^n = " << z << "^n\n";
            }
        }
    }
}

the optimization diagnostic gives remark #15523

I'm using Intel C++ compiler 18.0
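
For what it's worth, the restructuring I'm experimenting with is sketched below: an integer power helper so everything stays exact in 64-bit integers (pow goes through double, which loses exactness long before 0xffffffff), a bisection search for z instead of a third linear scan, and an OpenMP parallel outer loop. The bounds here are placeholders, much smaller than my real ones:

#include <cstdio>
#include <cstdint>
#include <omp.h>

// Integer power by repeated squaring; overflow is not checked in this sketch.
static uint64_t ipow(uint64_t base, uint64_t exp)
{
    uint64_t result = 1;
    while (exp) {
        if (exp & 1) result *= base;
        base *= base;
        exp >>= 1;
    }
    return result;
}

int main()
{
    const uint64_t n = 3;          // exponent to test
    const uint64_t limit = 2000;   // placeholder search bound

    #pragma omp parallel for schedule(dynamic)
    for (int64_t x = 1; x < (int64_t)limit; ++x) {
        for (uint64_t y = (uint64_t)x; y < limit; ++y) {
            const uint64_t xy = ipow((uint64_t)x, n) + ipow(y, n);

            // Bisection: find the smallest z with z^n >= xy, then test for equality.
            uint64_t lo = 1, hi = limit;
            while (lo < hi) {
                const uint64_t mid = lo + (hi - lo) / 2;
                if (ipow(mid, n) < xy) lo = mid + 1; else hi = mid;
            }
            if (lo < limit && ipow(lo, n) == xy)
                printf("%llu^%llu + %llu^%llu = %llu^%llu\n",
                       (unsigned long long)x, (unsigned long long)n,
                       (unsigned long long)y, (unsigned long long)n,
                       (unsigned long long)lo, (unsigned long long)n);
        }
    }
    return 0;
}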

TCE Open Date: Saturday, November 16, 2019 - 08:22

Intel 19 type-deduction allows non-const lvalue reference to prvalue


Intel 19 (Update 5) for Windows introduces a regression in which it is possible to take a non-const l-value reference to a temporary (prvalue). This code is correctly flagged in VS2019 as aberrant.

#include <cstdio>

struct T
   {
   T(char ch) : m_ch( ch ) { printf("Constructing '%c' (0x%08p)\n", ch, this); }
  ~T() { printf("Destructing '%c' (0x%08p)\n", m_ch, this); }
   char m_ch;
   };

T CreateT(char ch)
   {
   return T{ch};
   }

template <typename T_>
void Func1(T_&& t)
   {
   printf("Calling Func1 on object '%c' (0x%08p)\n", t.m_ch, &t);
   }

void Func2(T& t)
   {
   printf("Calling Func2 on object '%c' (0x%08p)\n", t.m_ch, &t);
   }

int main()
{
    const auto& a = CreateT('A'); // ok, const l-value reference to r-value
    auto&& b = CreateT('B'); // ok, deduces to r-value (T&&)
    Func1(CreateT('C')); // ok, deduces to T&&
//  Func2(CreateT('D')); // error, cannot bind non-const l-value ref to r-value (Intel 19.0.5 allows this!)
//  auto& d = CreateT('D'); // error, cannot bind non-const l-value ref to r-value (Intel 19.0.5 allows this!)

    return 0;
}

 

TCE Open Date: Monday, November 18, 2019 - 17:13

operator with default definition fails to compile


Overriding operator `=` with `= default` fails to compile.

Is this a bug or am I missing something?

Error Message

icpc -std=c++17 sample.cpp
ld: /tmp/icpczIU0ix.o:(.rodata._ZTV6Entity[_ZTV6Entity]+0x10): undefined reference to `Entity::operator=(Entity const&)'

Source code

#include <iostream>
class Entity {
    public:
        Entity() = default;
        virtual Entity & operator = ( Entity const & )=default;
    public:
        int index;
};

Entity e1 = Entity();

int main() {
    std::cout << e1.index;
    return(0);
}

Intel c++ compiler version

icpc (ICC) 19.0.5.281 20190815
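
A workaround I intend to try is moving the `= default` out of the class so the definition is emitted explicitly; this is only a sketch, I believe it is standard-conforming, but I have not yet verified whether icpc accepts it:

#include <iostream>
class Entity {
    public:
        Entity() = default;
        virtual Entity & operator = ( Entity const & );   // declared here...
    public:
        int index;
};

// ...and explicitly defaulted out of class, which forces the definition to be emitted.
Entity & Entity::operator = ( Entity const & ) = default;

Entity e1 = Entity();

int main() {
    std::cout << e1.index;
    return(0);
}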

TCE Open Date: Sunday, November 17, 2019 - 20:30

Assertion failed at "shared/cfe/edgcpfe/ms_lower_name.c"

$
0
0

Hello,

Our software product is built with MSVC. Now I want to rebuild only the computing module with the Intel compiler, while the other parts are still compiled with MSVC. The computing module consists of three .cpp files and is built as a DLL. The compiler versions are as follows.

MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.23.28107 for x64 (from Visual Studio 2019)

Intel Compiler: Intel(R) 64, Version 19.0.5.281 Build 20190815

During compilation no error is reported for our source code, but the compilation eventually ends with the following error.

 C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.23.28105\include\complex(26): error : assertion failed at: "shared/cfe/edgcpfe/ms_lower_name.c", line 1826.

I have no idea which part of our code triggers this error. Any suggestion on how to proceed? Many thanks.

 

TCE Open Date: Monday, November 18, 2019 - 04:54

command line remark #10148: option '/Qvec-report0' not supported

$
0
0

When I use the Intel(R) compiler 19.0 to compile a program, this remark appears: command line remark #10148: option '/Qvec-report0' not supported. I don't know what that means. Can any expert help me? Thanks.
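
From searching around, it sounds as if the old /Qvec-report levels were replaced by the opt-report family of options; if that is right, an equivalent invocation would be something like the line below (myprogram.cpp stands for the source file here), but I would appreciate confirmation:

icl /Qopt-report:2 /Qopt-report-phase:vec myprogram.cpp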

 

Attachment: 1.PNG (97.37 KB)

TCE Open Date: Wednesday, November 20, 2019 - 05:25

No Intel C++ compiler found in Visual Studio even though I have installed it


I have installed the Intel C++ compiler on Windows Server 2012 R2 using the PSXE 2019 (Parallel Studio XE) Cluster Edition installer, but I don't find the Intel C++ compiler toolset in Visual Studio.

Attachment: No-intelc++Compiler.png (92.39 KB)

TCE Open Date: Friday, November 22, 2019 - 01:41