Hello
I recently came across what seems to be a bug when compiling the following code (designed to reproduce the bug):
intel_test.hpp:
#pragma once
#include <cstdint>

#pragma pack(push, 1)
struct packed_struct
{
    std::uint32_t uint_val;
    std::uint8_t byte_val;
};
#pragma pack(pop)

packed_struct create_packed_struct(std::uint8_t byte_val);
intel_test.cpp:
#include "intel_test.hpp"

packed_struct create_packed_struct(std::uint8_t byte_val)
{
    packed_struct result;
    result.uint_val = 0xbaadf00d;
    result.byte_val = byte_val;
    return result;
}
main.cpp:
#include <iostream>
#include "intel_test.hpp"

int main()
{
    const auto packed_struct_val = create_packed_struct(0);
    std::cout << "packed struct: uint_val = " << std::hex << packed_struct_val.uint_val
              << ", byte_val = " << int{packed_struct_val.byte_val} << std::endl;
}
This code should result in the output:
"packed struct: uint_val = baadf00d, byte_val = 0"
which is what happens when compiling with Visual Studio 2015 or Intel Compiler 15 in release mode or Intel Compiler 17 in debug mode.
However, when compiling with Intel Compiler 17 in release mode, I get the following output:
"packed struct: uint_val = 3fb636a4, byte_val = 1".
When I viewed the disassembly I found that create_packed_struct() returns the whole struct packed into the RAX register:
mov         rax,0BAADF00Dh
movzx       r8d,dl
shl         r8,20h
or          rax,r8
ret
while the calling code expects the result to be written to memory on the stack pointed to by the RCX register:
xor         edx,edx
lea         rcx,[rbp+10h]
call        create_packed_struct (013F4B1000h)
mov         dl,byte ptr [rbp+14h]
mov         eax,dword ptr [rbp+10h]
mov         byte ptr [rbp+24h],dl
Since the caller ignores the value in RAX and overwrites it right after create_packed_struct() returns, the result is always whatever garbage was previously at RBP+10h.
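To make the mismatch easier to follow, here is a minimal sketch that mirrors the callee's instruction sequence in plain C++. The helper names pack_into_rax, unpack_from_rax and check_round_trip are hypothetical and exist only for this illustration. The sketch also assumes a little-endian target, which holds for x64.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
struct packed_struct
{
    std::uint32_t uint_val;
    std::uint8_t byte_val;
};
#pragma pack(pop)

// Hypothetical helper: mirrors the callee's mov / movzx / shl / or sequence
// and returns the 64-bit value that ends up in RAX.
std::uint64_t pack_into_rax(std::uint8_t byte_val)
{
    std::uint64_t rax = 0xBAADF00Du; // mov   rax,0BAADF00Dh
    std::uint64_t r8 = byte_val;     // movzx r8d,dl
    r8 <<= 32;                       // shl   r8,20h
    return rax | r8;                 // or    rax,r8
}

// Hypothetical helper: what a caller would have to do to recover the struct
// from the register value (assumes little-endian byte order).
packed_struct unpack_from_rax(std::uint64_t rax)
{
    static_assert(sizeof(packed_struct) <= sizeof(rax), "struct must fit in RAX");
    packed_struct result;
    std::memcpy(&result, &rax, sizeof(result));
    return result;
}

// Round-trip check: the fields survive packing and unpacking.
void check_round_trip(std::uint8_t byte_val)
{
    const packed_struct s = unpack_from_rax(pack_into_rax(byte_val));
    assert(s.uint_val == 0xBAADF00Du);
    assert(s.byte_val == byte_val);
}
```

The caller in the release build never performs the equivalent of unpack_from_rax(). It reads the five bytes at RBP+10h instead, and the callee never wrote to that memory.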
Further testing showed that removing the "#pragma pack" directives fixes the problem (the caller correctly reads the result from RAX).
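A possible explanation for why packing matters, assuming the Microsoft x64 calling convention applies here: as far as I understand it, a user-defined type is returned in RAX only if its size is 1, 2, 4 or 8 bytes, and any other size is returned through a hidden pointer passed by the caller. The sketch below checks the two struct sizes at compile time. The names unpacked_struct and fits_return_register are hypothetical, and the size rule is my reading of the convention rather than something I verified against the Intel compiler.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct packed_struct
{
    std::uint32_t uint_val;
    std::uint8_t byte_val;
};
#pragma pack(pop)

// Same members with default packing (hypothetical name, for comparison only).
struct unpacked_struct
{
    std::uint32_t uint_val;
    std::uint8_t byte_val;
};

// Assumption: sizes that the Microsoft x64 convention returns in RAX.
constexpr bool fits_return_register(std::size_t size)
{
    return size == 1 || size == 2 || size == 4 || size == 8;
}

// With #pragma pack(1) the struct is 5 bytes, so it would go through memory.
static_assert(sizeof(packed_struct) == 5, "packed struct has no padding");
static_assert(!fits_return_register(sizeof(packed_struct)), "returned via hidden pointer");

// With default packing the struct is padded to 8 bytes, so it fits in RAX.
static_assert(sizeof(unpacked_struct) == 8, "default packing pads to 8 bytes");
static_assert(fits_return_register(sizeof(unpacked_struct)), "returned in RAX");
```

If that reading is right, the caller's use of the RCX buffer matches the convention for a 5-byte struct, and the callee is the side that deviates by leaving the result only in RAX.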
Adding the __regcall calling convention specifier to the declaration of create_packed_struct() also fixes the problem (the caller correctly reads the result from RAX).
For reference:
release compiler flags: /MP /GS /Zc:rvalueCast /W4 /QxCORE-AVX2 /Gy /Zc:wchar_t /Zi /O2 /Ob2 /GF /Zc:forScope /GR /arch:CORE-AVX2 /Oi /MD /EHsc /nologo /Gw /Zo /Qstd=c++14 /Qvc14
debug compiler flags: /MP /GS /Zc:rvalueCast /W4 /Gy /Zc:wchar_t /Zi /Od /Zc:forScope /RTC1 /GR /MDd /EHsc /nologo /Gw /Zo /Qstd=c++14 /Qvc14
linker flags: /MANIFEST /NXCOMPAT /DYNAMICBASE /DEBUG /MACHINE:X64 /OPT:REF /qnoipo /INCREMENTAL:NO /SUBSYSTEM:CONSOLE /OPT:ICF /NOLOGO /TLBID:1