Channel: Intel® Software - Intel® C++ Compiler

Two optimization suggestions


Hello. I would like to propose two optimizations that the MSVC compiler performs but ICC does not. All tests were run on Windows x86.

1) Combining several small mov's into one. Example:

struct struct_t
{
	char a, b, c, d;
};

void __declspec(noinline) test(struct_t& s)
{
	s.a = 'a';
	s.b = 'b';
	s.c = 'c';
	s.d = 'd';
}

Code generated by the current ICC with -Ox:

        mov       BYTE PTR [eax], 97
        mov       BYTE PTR [1+eax], 98
        mov       BYTE PTR [2+eax], 99
        mov       BYTE PTR [3+eax], 100
        ret

These four byte mov's can be combined into a single dword mov, as MSVC does:

        mov       DWORD PTR [ecx], 1684234849		; 64636261H
        ret
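To make the equivalence concrete, here is a minimal sketch that checks the four byte stores against one 32-bit store of 64636261H. It assumes a little-endian target such as x86; the helper names store_bytes, store_dword and stores_match are made up for this illustration and do not come from either compiler.

```cpp
#include <cstdint>
#include <cstring>

struct struct_t
{
	char a, b, c, d;
};

// Four separate byte stores, as in the original test().
void store_bytes(struct_t& s)
{
	s.a = 'a';
	s.b = 'b';
	s.c = 'c';
	s.d = 'd';
}

// One combined 32-bit store (hypothetical hand-written equivalent).
// Assumes little-endian byte order: the low byte 61H ('a') lands at offset 0.
void store_dword(struct_t& s)
{
	const std::uint32_t packed = 0x64636261u; // 1684234849 decimal
	static_assert(sizeof(struct_t) == sizeof(packed), "struct_t must be 4 bytes");
	std::memcpy(&s, &packed, sizeof(packed)); // memcpy avoids aliasing problems
}

// Returns true when both variants leave identical bytes in memory.
bool stores_match()
{
	struct_t x{}, y{};
	store_bytes(x);
	store_dword(y);
	return std::memcmp(&x, &y, sizeof(struct_t)) == 0;
}
```

On a little-endian machine stores_match() should return true, which is why the compiler is free to merge the stores when the struct members are not volatile.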

2) Eliminating useless copies from volatile memory into registers. I think this is correct, and MSVC performs this optimization. Example:

#include <stdio.h>

bool isInt(int)
{
	return true;
}

bool isInt(short)
{
	return false;
}

void __declspec(noinline) test()
{
	volatile int a = 5;
	volatile short b = 2;
	printf("int = %i, short = %i\n", isInt(a), isInt(b));
}

Result:

        sub       esp, 8
        mov       eax, 2
        mov       DWORD PTR [esp], 5
        mov       WORD PTR [4+esp], ax
        mov       edx, DWORD PTR [esp] ; <- unnecessary copy
        movzx     ecx, WORD PTR [4+esp] ; <- unnecessary copy
        ; the copies from edx and ecx to non-volatile memory were originally here, but were eliminated as dead code
        push      0
        push      1
        push      OFFSET "int = %i, short = %i\n"
        call      DWORD PTR [__imp__printf]
        add       esp, 12
        add       esp, 8
        ret

Here we can also see two add's before the return that were not combined into one.
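As a side note on the second example, the constants 1 and 0 pushed before the printf call come from overload resolution, which depends only on the static type of each argument and not on the value loaded. The sketch below illustrates this; marking isInt as constexpr and the helper classify are assumptions added for the illustration and are not part of the original test.

```cpp
#include <utility>

// Same overloads as above, marked constexpr so the results can be checked
// at compile time (an addition made only for this sketch).
constexpr bool isInt(int)   { return true;  }
constexpr bool isInt(short) { return false; }

// The overload is selected from the argument type alone.
static_assert(isInt(5), "an int argument selects isInt(int)");
static_assert(!isInt(static_cast<short>(2)), "a short argument selects isInt(short)");

// Hypothetical helper mirroring test(): the volatile objects are still read
// to form the arguments, but the returned values never depend on what was read.
std::pair<bool, bool> classify()
{
	volatile int a = 5;
	volatile short b = 2;
	return { isInt(a), isInt(b) };
}
```

classify() yields true for the int and false for the short regardless of the stored values, so the loads into edx and ecx are the only remaining trace of the volatile reads.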

