Why is std::optional<int> more expensive than std::pair<int, bool>?
Consider the following two approaches for representing an "optional int":
using std_optional_int = std::optional<int>;
using my_optional_int = std::pair<int, bool>;
Given these two functions...
auto get_std_optional_int() -> std_optional_int
{
    return {42};
}
auto get_my_optional() -> my_optional_int
{
    return {42, true};
}
...both g++ trunk and clang++ trunk (with -std=c++17 -Ofast -fno-exceptions -fno-rtti) produce the following assembly:
get_std_optional_int():
    mov rax, rdi
    mov DWORD PTR [rdi], 42
    mov BYTE PTR [rdi+4], 1
    ret
get_my_optional():
    movabs rax, 4294967338 // == 0x 0000 0001 0000 002a
    ret
Why does get_std_optional_int() require three mov instructions, while get_my_optional() only needs a single movabs? Is this a QoI issue, or is there something in std::optional's specification preventing this optimization?
Also note that users of the functions might be completely optimized out regardless:
volatile int a = 0;
volatile int b = 0;
int main()
{
    a = get_std_optional_int().value();
    b = get_my_optional().first;
}
...resulting in:
main:
    mov DWORD PTR a[rip], 42
    xor eax, eax
    mov DWORD PTR b[rip], 42
    ret
libstdc++ apparently does not implement P0602, "variant and optional should propagate copy/move triviality". You can verify this with:
static_assert(std::is_trivially_copyable_v<std::optional<int>>);
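This fails with libstdc++ and passes with libc++ and the MSVC standard library (which really needs a proper name so we don't have to call it either "the MSVC implementation of the C++ standard library" or "the MSVC STL").
Of course, MSVC still won't pass optional<int> in a register because of the MS ABI.
EDIT: This issue has been fixed in the GCC 8 release series.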
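To see which individual properties a given standard library provides, the single static_assert can be split into separate trait queries that report a result instead of failing the build. The following is a minimal sketch, not part of the original answer; triviality_report and report_for are hypothetical names introduced here, and the values obtained for std::optional<int> depend on the standard library version in use.

```cpp
#include <optional>
#include <type_traits>
#include <utility>

// Hypothetical helper type (not part of any library): collects the
// triviality traits relevant to P0602 for one type.
struct triviality_report {
    bool copy_constructible;  // trivial copy constructor?
    bool move_constructible;  // trivial move constructor?
    bool destructible;        // trivial destructor?
    bool copyable;            // std::is_trivially_copyable?
};

template <typename T>
constexpr triviality_report report_for() {
    return {
        std::is_trivially_copy_constructible_v<T>,
        std::is_trivially_move_constructible_v<T>,
        std::is_trivially_destructible_v<T>,
        std::is_trivially_copyable_v<T>,
    };
}

// Library-dependent: `copyable` is expected to be false on a library
// that does not implement P0602 and true on one that does.
constexpr triviality_report optional_report = report_for<std::optional<int>>();

// std::pair declares its copy and move constructors as defaulted, so the
// constructor and destructor traits hold for <int, bool>.
constexpr triviality_report pair_report = report_for<std::pair<int, bool>>();
```

The fields of optional_report and pair_report can then be printed or compared side by side. The copy constructor and destructor fields are the ones that matter for how the object is returned.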
Why does get_std_optional_int() require three mov instructions, while get_my_optional() only needs a single movabs?
The direct cause is that optional is returned through a hidden pointer while pair is returned in a register. Why is that, though? The SysV ABI specification, section 3.2.3 Parameter Passing, says:
If a C++ object has either a non-trivial copy constructor or a non-trivial destructor, it is passed by invisible reference.
Sorting out the C++ mess that is optional is not easy, but there seems to be a non-trivial copy constructor at least in the optional_base class of the implementation I checked.
In "Calling conventions for different C++ compilers and operating systems", Agner Fog says that a copy constructor or destructor prevents a structure from being returned in registers. This explains why optional is not returned in registers.
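The ABI rule can be illustrated with two structs that have identical members and differ only in whether the copy constructor is user-provided. This is a sketch assuming the x86-64 SysV ABI; trivial_opt and nontrivial_opt are hypothetical types made up for illustration, not what any standard library uses internally.

```cpp
#include <type_traits>

// Hypothetical type: implicit (trivial) copy constructor and destructor.
// An 8-byte object like this is eligible to be returned in rax.
struct trivial_opt {
    int value;
    bool engaged;
};

// Hypothetical type: same members, but the copy constructor is
// user-provided, which makes it non-trivial even though it does the
// same memberwise copy.
struct nontrivial_opt {
    int value;
    bool engaged;
    nontrivial_opt(int v, bool e) : value(v), engaged(e) {}
    nontrivial_opt(const nontrivial_opt& other)
        : value(other.value), engaged(other.engaged) {}
};

// Same size...
static_assert(sizeof(trivial_opt) == sizeof(nontrivial_opt));
// ...but only one has the trivial copy constructor the ABI rule asks for.
static_assert(std::is_trivially_copy_constructible_v<trivial_opt>);
static_assert(!std::is_trivially_copy_constructible_v<nontrivial_opt>);

// Expected to be returned in a register.
trivial_opt get_trivial() { return {42, true}; }

// Expected to be returned through a hidden pointer supplied by the caller.
nontrivial_opt get_nontrivial() { return {42, true}; }
```

Compiling both functions with optimizations and comparing the assembly should show the same split as in the question: a single constant load for get_trivial() and stores through rdi for get_nontrivial().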
Something else must also be preventing the compiler from doing store merging (merging contiguous stores of immediate values narrower than a word into fewer, wider stores to reduce the number of instructions)... Update: gcc bug 82434 - -fstore-merging does not work reliably.
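What merging the two stores would produce can be worked out by hand: on little-endian x86-64, writing the int 42 at offset 0 and the byte 1 at offset 4 yields the same bytes as one 64-bit store of a single constant. A minimal sketch of that arithmetic follows, where merged_constant is a hypothetical helper name introduced for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: the 64-bit value whose low 32 bits hold the int
// and whose byte at offset 4 holds the flag, assuming little-endian layout.
constexpr std::uint64_t merged_constant(std::uint32_t value, bool engaged) {
    return (static_cast<std::uint64_t>(engaged) << 32) | value;
}

// Matches the immediate in the movabs emitted for get_my_optional().
static_assert(merged_constant(42, true) == 4294967338ULL);
static_assert(merged_constant(42, true) == 0x000000010000002aULL);
```

This is the same constant the compiler already loads into rax for the pair version, so a merged store for the optional version would write that value to [rdi] in one instruction.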
The optimization is technically permitted, even with std::is_trivially_copyable_v<std::optional<int>> being false. However, it may require an unreasonable degree of "cleverness" for the compiler to find. Also, for the specific case of using std::optional as the return type of a function, the optimization may need to be done at link-time rather than compile-time.
Performing this optimization would have no effect on any (well-defined) program's observable behavior,* and is therefore implicitly allowed under the as-if rule. However, for reasons which are explained in other answers, the compiler has not been explicitly made aware of that fact and would need to infer it from scratch. Behavioral static analysis is inherently difficult, so the compiler may not be able to prove that this optimization is safe under all circumstances.
Assuming the compiler can find this optimization, it would then need to alter this function's calling convention (i.e. change how the function returns a given value), which normally needs to be done at link time because the calling convention affects all of the call sites. Alternatively, the compiler could inline the function entirely, which may or may not be possible to do at compile time. These steps would not be necessary with a trivially-copyable object, so in this sense the standard does inhibit and complicate the optimization.
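One way to picture the transformation described above is to apply it by hand: return a trivially copyable carrier from the out-of-line function and rebuild the std::optional in an inline wrapper at each call site. This is only an illustrative sketch under the x86-64 SysV ABI assumption, not something a compiler or library is known to do; raw_optional_int, get_raw_optional_int, and get_optional_int are hypothetical names.

```cpp
#include <optional>
#include <type_traits>

// Hypothetical trivially copyable carrier used only at the function
// boundary, so the return value is eligible to travel in a register.
struct raw_optional_int {
    int value;
    bool engaged;
};
static_assert(std::is_trivially_copyable_v<raw_optional_int>);

// Out-of-line function with the register-friendly return type.
// (In a real program this would live in another translation unit.)
raw_optional_int get_raw_optional_int() {
    return {42, true};
}

// Inline wrapper: converts back to std::optional at the call site, where
// the compiler can see both sides of the conversion.
inline std::optional<int> get_optional_int() {
    const raw_optional_int raw = get_raw_optional_int();
    if (raw.engaged) {
        return raw.value;
    }
    return std::nullopt;
}
```

Callers keep the std::optional interface, while the actual function boundary uses the cheaper convention. The cost is that every call site must see the wrapper, which is the same constraint that pushes the automatic version toward link time or inlining.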
std::is_trivially_copyable_v<std::optional<int>> ought to be true. If it were true, it would be much easier for compilers to discover and perform this optimization. So, to answer your question:
Is this a QoI issue, or is there something in std::optional's specification preventing this optimization?
It's both. The spec makes the optimization substantially harder to find, and the implementation is not "smart" enough to find it under those constraints.
* Assuming you haven't done something really weird, like #define int something_else.