Channel: Intel® Software - Intel® C++ Compiler

OpenMP task performance issues


Hello,

I am seeing surprising OpenMP task performance with Intel C++ 19.0 Update 5 that I do not see with GCC 9.2. In the demo app below, I expect the in-loop taskwait (or the alternative taskgroup) to leave only one thread busy at a time, so the program should run at about the same speed as the serial version. GCC behaves this way, but Intel C++ shows 100% CPU load and a 1.8x slowdown. More importantly, our real application shows the same slowdowns instead of speedups when a set of tasks is placed within a taskgroup or followed by a taskwait.

// Demo for Intel C++ 19.0 Update 5 OpenMP performance issues

// Serial build with Intel C++ is ~3.3x slower than with GCC 9.2.0

// With only taskwait after while loop on quad-core Haswell CPU:
//  Intel C++: 2.8x speedup
//  GCC: 3.5x speedup
//  Both use 100% CPU as expected

// With taskwait in while loop:
//  Intel C++: 100% CPU usage and 1.8x slowdown
//  GCC: 1 CPU/thread used and no slowdown as expected
//  This taskwait is not needed here but the same issue is seen in real application with multiple tasks followed by a taskwait
//  Same behavior seen with a taskgroup around the one task instead of this taskwait

// icl /Qstd=c++11 /DNOMINMAX /DWIN32_LEAN_AND_MEAN /DNDEBUG /Qopenmp /O3

#include <atomic>
#include <cstddef>
#include <iostream>
#include <omp.h>

int
main()
{
	#pragma omp parallel
	{
		#pragma omp single
		{
			bool run( true );
			std::size_t i( 0 );
			std::atomic_size_t sum( 0u );
			double const wall_time_beg( omp_get_wtime() );
			while ( run ) {
//				#pragma omp taskgroup // Same behavior as the in-loop taskwait
				{
				#pragma omp task shared(sum)
				{
					std::size_t loc( 0u );
					for ( std::size_t k = 0u; k < 2000000000u; ++k ) loc += k/2;
					sum += loc;
				} // omp task
				} // omp taskgroup
				#pragma omp taskwait // GCC gives expected 1 CPU/thread usage: Intel C++ gives 100% CPU and 1.8x slowdown!
				if ( ++i > 50u ) run = false;
			}
			#pragma omp taskwait
			std::cout << "sum = " << sum << ' ' << omp_get_wtime() - wall_time_beg << ' ' << i << std::endl;
		} // omp single
	} // omp parallel
}
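
To help narrow this down, a reduced probe along the following lines might be useful. It is only a sketch, not something from the measurements above: `ProbeResult` and `run_probe` are hypothetical names, and the task count and work size are parameters so that a run finishes quickly. It repeats the task-plus-taskwait pattern from the demo and records which thread executed each task. The `_OPENMP` guards are an assumption added so that it also builds serially without OpenMP enabled.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper type (not part of the original demo)
struct ProbeResult
{
	std::size_t sum; // Accumulated result of all tasks
	std::vector< std::size_t > per_thread; // Number of tasks executed by each thread
};

// Hypothetical helper: runs n_tasks tasks, each followed by a taskwait,
// and counts how many tasks each thread executed
ProbeResult
run_probe( std::size_t const n_tasks, std::size_t const work )
{
	int max_threads( 1 );
#ifdef _OPENMP
	max_threads = omp_get_max_threads();
#endif
	std::atomic_size_t sum( 0u );
	std::vector< std::size_t > per_thread( static_cast< std::size_t >( max_threads ), 0u );

	#pragma omp parallel
	{
		#pragma omp single
		{
			for ( std::size_t i = 0u; i < n_tasks; ++i ) {
				#pragma omp task shared(sum, per_thread)
				{
					std::size_t loc( 0u );
					for ( std::size_t k = 0u; k < work; ++k ) loc += k/2;
					sum += loc;
					int tid( 0 );
#ifdef _OPENMP
					tid = omp_get_thread_num();
#endif
					assert( static_cast< std::size_t >( tid ) < per_thread.size() );
					// The taskwait below keeps at most one task in flight,
					// so this increment is never concurrent with another
					++per_thread[ static_cast< std::size_t >( tid ) ];
				} // omp task
				#pragma omp taskwait // Same placement as the in-loop taskwait in the demo
			}
		} // omp single
	} // omp parallel (implicit barrier before the results are read)

	return ProbeResult{ sum.load(), per_thread };
}
```

Calling `run_probe( 51u, 2000000000u )` from `main` and printing `per_thread` together with the wall time would reproduce the demo's workload. Comparing the counts between the two compilers may show whether tasks are being handed to other threads or whether the idle threads are just busy-waiting. The sketch does not by itself explain the slowdown.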

Can anyone shed light on this? Looks like a buggy task implementation but maybe there is more to it.

Thanks,
Stuart

