Channel: Intel® Software - Intel® C++ Compiler

Extremely slow compilation time with insanely big expressions


Hello everybody,
Here's my problem. I'm generating a C++ file with large arithmetic expressions (mostly adds and muls) from a high-level specification of a numerical problem. Basically, I have a loop nest with at least one such expression in the innermost loop. The innermost loop is decorated with a #pragma simd to enforce vectorisation; the outermost loop is decorated with a #pragma omp for for OpenMP parallelism. In complex cases, an expression can consist of thousands of arithmetic operations. There are common sub-expressions, which are already captured and assigned to temporaries.

We'd like to compile this kind of code with `-O3 -xHost`, but compilation is insanely slow, as in taking hours to produce an executable. If I use `-O2` instead, compilation time drops (as expected), but it's still unacceptably high. Disabling vectorisation helps significantly, but still not enough. Using #pragma ivdep in place of #pragma simd also helps (by avoiding data-dependency analysis), but again not enough. Summing up, `-O2 -xHost` with #pragma ivdep causes a remarkable drop in compilation time, but we are still far from the target (tens of seconds). Without the high-level common sub-expression elimination, compilation time is significantly worse. So here's the question: what can I do to improve this? I'd like to switch the individual optimisations applied at O2/O3 on and off until I find an adequate compromise, but apparently this is not possible -- at least, I couldn't find useful information on how to do so in the various manuals.

An extreme solution would be disabling optimizations entirely... but that'd be too bad. Another possibility would be *not* generating such expressions and instead expressing the computation as a composition of function calls, but this would be terrible for performance (e.g., by being much more difficult to vectorize, hiding redundancies in the expressions, missing loop-invariant code motion opportunities, ...).

Another hypothesis for such a horrible compilation time is that the compiler spends a lot of time applying scalar replacement -- i.e., basically (partially) unpicking the high-level common sub-expression elimination that we apply.

Thoughts?

I think this is an interesting problem because it's really stretching the optimization capabilities of the Intel compiler. 

I can try to attach a self-contained example to reproduce the issue, but before that I'd need to consult with my boss to see what I can share.

Ah, the above occurs with both icc 2016 and 2017.

Thanks,

-- Fabio