Channel: Intel® Software - Intel® C++ Compiler

Extremely slow compilation time with insanely big expressions


Hello everybody,
Here's my problem. I'm generating a C++ file with large arithmetic expressions (mostly adds and muls) from a high-level specification of a numerical problem. Basically, I have a loop nest with at least one such expression in the innermost loop. The innermost loop is decorated with a #pragma simd to enforce vectorisation; the outermost loop is decorated with a #pragma omp for for OpenMP parallelism. In complex cases, an expression can consist of thousands of arithmetic operations. There are common sub-expressions, which are already captured and assigned to temporaries.

We'd like to compile this kind of code with `-O3 -xHost`, but compilation is insanely slow, as in taking hours to produce an executable. If I use `-O2` instead, compilation time drops (as expected), but it's still unacceptably high. Disabling vectorisation helps significantly, but still not enough. Using #pragma ivdep in place of #pragma simd also helps (by avoiding data-dependency analysis), but again not enough. Summing up, `-O2 -xHost` with #pragma ivdep causes a remarkable drop in compilation time, but we are still far from the target (tens of seconds). Without the high-level common sub-expression elimination, compilation time is significantly worse. So here's the question: what can I do to improve this? I'd like to switch the individual optimisations applied at O2/O3 on and off until I find an adequate compromise, but apparently this is not possible -- at least, I couldn't find useful information on how to do so in the various manuals.

An extreme solution would be disabling optimizations entirely... but that'd be too bad. Another possibility would be *not* generating such expressions and instead expressing the computation as a composition of function calls, but this would be terrible for performance (e.g., by being much more difficult to vectorize, hiding redundancies in the expressions, missing loop-invariant code motion opportunities, ...).

Another hypothesis for such a horrible compilation time is that the compiler spends a lot of time applying scalar replacement -- i.e., basically (partially) unpicking the high-level common sub-expression elimination that we apply.

Thoughts?

I think this is an interesting problem because it's really stretching the optimization capabilities of the Intel compiler. 

I can try to attach a self-contained example to reproduce the issue, but before that I'd need to consult with my boss to see what I can share.

Ah, the above occurs with both icc 2016 and 2017.

Thanks,

-- Fabio