GCC Optimization Pragmas Lie to You

GCC has many optimization pragmas that can be prepended to files. Generally, they should speed up your code by the same amount as the equivalent command-line argument; however, this is not always the case.

Theoretically, you would expect

#pragma GCC optimize("O3")

to optimize your code the same way as

g++ main.cpp -O3

But it doesn't! Let's take a look at an example program:

#pragma GCC optimize("O3")
#include <bits/stdc++.h>
using namespace std;

#define SZ 10000005

int arr[SZ] = {};

int main() {
    iota(arr, arr+SZ, 1);               // arr = 1, 2, ..., SZ
    int tgt = 19999473;                 // the target sum
    unordered_set<int> s(arr, arr+SZ);  // every value, for O(1) average lookups
    for (int i = 0; i < SZ; i++) {
        // Stop at the first arr[i] whose complement tgt - arr[i] is also present
        if (s.count(tgt-arr[i])) {
            cout << arr[i] << ' ' << tgt-arr[i] << '\n';
            break;
        }
    }
}

This is a simple and well-known solution to the Two-Sum problem using a hash set.
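The same technique can be pulled out of main into a reusable function so it can be tested in isolation. This is a sketch. The name two_sum_pair and the std::optional return type are my own choices and are not part of the original program. Like the loop above, it only checks set membership, so a value can pair with itself when twice that value equals the target.

```cpp
#include <optional>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: returns the first pair (a, target - a), scanning
// `values` left to right, such that target - a is also in `values`.
// Membership-only check, as in the blog's loop: a may pair with itself.
std::optional<std::pair<int, int>> two_sum_pair(const std::vector<int>& values, int target) {
    std::unordered_set<int> seen(values.begin(), values.end());
    for (int a : values) {
        if (seen.count(target - a)) return std::make_pair(a, target - a);
    }
    return std::nullopt;  // no pair sums to target
}
```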

Here's a table showing the runtime of the program with and without the pragma (running on a Ryzen 7 7700X, compiled with no flags other than the ones listed, mean of 5 trials):

Command-line flag	Pragma	Time (s)
(none)	no	~1.79
(none)	yes	~0.98
-O3	no	~0.36
-O3	yes	~0.36
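The post doesn't say how each run was timed. One in-process way to compute a "mean of N trials" is sketched below, as an assumption. mean_seconds is a hypothetical helper. Timing a function call this way also excludes process startup, unlike timing the whole executable with an external tool such as time.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` `trials` times and returns the mean
// wall-clock time per run in seconds, measured with a monotonic clock.
double mean_seconds(const std::function<void()>& work, int trials = 5) {
    using steady = std::chrono::steady_clock;
    if (trials <= 0) return 0.0;  // nothing to average
    double total = 0.0;
    for (int t = 0; t < trials; ++t) {
        const auto start = steady::now();
        work();
        const std::chrono::duration<double> elapsed = steady::now() - start;
        total += elapsed.count();
    }
    return total / trials;
}
```

steady_clock is used rather than system_clock so that adjustments to the system time during a run can't skew the measurement.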

As the table shows, the pragma alone (~0.98 s) is much slower than the -O3 flag (~0.36 s), even though the two should be equivalent. Why is this?

Looking at the generated assembly, we can see that the code compiled with the -O3 command-line argument does not contain a single occurrence of unordered_set, whilst the code compiled with the pragma contains loads of them. What does this mean?

This tells us that -O3 performs more optimization (specifically, inlining) than the pragma. This is further demonstrated by the following example:

#pragma GCC optimize("O3")
static int return5() {
    return 5;
}
int main() {
    return return5();
}

With the pragma, the generated assembly code (simplified) is:

main:
    jmp return5
return5:
    mov eax, 5
    ret

With -O3, the generated assembly code is:

main:
    mov eax, 5
    ret

Anyone can see that return5 should be inlined; it's even a static function! Yet the pragma doesn't inline it, whilst -O3 does. Even after adding inline-functions,inline-small-functions,inline-functions-called-once to the pragma, it still doesn't get optimized. Only after adding __attribute__((always_inline)) to the function does it finally get inlined. Why this is the case is beyond me. This is a very minor example, but as the first example shows, these kinds of small improvements matter more and more as the program gets more complex.
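For reference, here is roughly the variant described above as finally getting inlined. Two details are my additions. First, the function is also marked inline, because GCC may warn that an always_inline function "might not be inlinable" otherwise. Second, call_return5 is a hypothetical stand-in for main so the snippet can be linked into other code.

GCC's documentation says that -fno-inline, the default when not optimizing, still expands functions marked always_inline. That would fit this observation, though I haven't verified that this is exactly what happens under the pragma.

```cpp
#pragma GCC optimize("O3")

// always_inline asks GCC to inline this function even when ordinary
// inlining is disabled; `inline` is added to avoid an attribute warning.
static inline __attribute__((always_inline)) int return5() {
    return 5;
}

// Stand-in for the blog's main(): with the attribute, the call should be
// replaced by the constant 5 in the generated assembly.
int call_return5() {
    return return5();
}
```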

There are probably many more optimizations that -O3 performs and the pragma doesn't, but the most important takeaway is that the pragma is not quite equivalent to -O3.

Most of the time, there is no reason to use #pragma GCC optimize("O3") over -O3, because you can just change the compiler's command-line arguments. The only place where this is necessary would be competitive programming, since most judges compile with -O2, and sometimes you can squeeze under the time limit by enabling O3 and AVX2.

What can we do with this information? Not much, really. Just be aware that the pragma is not equivalent to -O3, and use -O3 over the pragma whenever possible. However, when specifying -O3 on the command line is not possible, the pragma is a passable alternative.

One final note: if you use the pragma, put it at the very top of the file, before any includes. If you put it in the middle of the file, it will only apply to the code after it.
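The pitfall with a mid-file pragma can be sketched like this. The function names are made up for illustration, and plain (non-template) functions are used so that the "defined before or after the pragma" distinction is unambiguous.

```cpp
// Defined above the pragma, so it is compiled with the command-line
// options only (e.g. no optimization under a plain `g++ main.cpp`).
int square_before(int x) { return x * x; }

#pragma GCC optimize("O3")

// Defined below the pragma, so the pragma's options apply to it.
// To cover everything, including functions from headers, the pragma has
// to be the first line of the file, before any #include.
int square_after(int x) { return x * x; }
```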

Thanks for reading my first blog post! I hope you enjoyed it!

Comments (9)

TianyiChen

2 months ago

For your second example, I did see GCC 14 inline the function, which doesn't happen with previous versions.

I suspect that "in the source file" in the docs means only the current file, excluding the headers it includes, which could explain the first example.

kevlu8

2 months ago

I did consider the possibility of the pragma only optimizing functions defined in the source file, but it doesn't seem to be the case:

File header.h:

int sum100() {
    int sum = 0;
    for (int i = 1; i <= 100; i++) sum += i;
    return sum;
}

File main.cpp:

#include "header.h"
int main() {
    return sum100();
}

Without the pragma (compiling with just g++ main.cpp), sum100() actually performs the computation, whilst with the pragma it doesn't and just returns the value (5050) directly.

heavenly_

weren't you in that one github PR?

htoshiro

all hail king *fuck you*

Yup, that's me, for better or for worse...

sugo143

actually crazy that people know you from that now

Ylandolsi

so should we use this #pragma GCC optimize("O3") or what?

qmk

Even after adding inline-functions,inline-small-functions,inline-functions-called-once to the pragma, it still doesn't get optimized.

While you tried all of the inline flags listed in gcc/Optimize-Options, it might not be enough:

Depending on the target and how GCC was configured, a slightly different set of optimizations may be enabled at each -O level than those listed here. You can invoke GCC with -Q --help=optimizers to find out the exact set of optimizations that are enabled at each level.

Only one way to find out the exact set of optimizations!

> g++ -Q --help=optimizers | grep inline
  -ffold-simple-inlines                 [available in C++, ObjC++]
  -finline                              [disabled]
  -finline-atomics                      [enabled]
  -finline-functions                    [disabled]
  -finline-functions-called-once        [disabled]
  -finline-small-functions              [disabled]
  -finline-stringops                    [disabled]
  -finline-stringops=                   [default]
  -flive-patching                       -flive-patching=inline-clone
  -flive-patching=[inline-only-static|inline-clone]     [default]
  -fno-inline-stringops                 [enabled]

> g++ -O3 -Q --help=optimizers | grep inline
  -ffold-simple-inlines                 [available in C++, ObjC++]
  -finline                              [enabled]
  -finline-atomics                      [enabled]
  -finline-functions                    [enabled]
  -finline-functions-called-once        [enabled]
  -finline-small-functions              [enabled]
  -finline-stringops                    [disabled]
  -finline-stringops=                   [default]
  -flive-patching                       -flive-patching=inline-clone
  -flive-patching=[inline-only-static|inline-clone]     [default]
  -fno-inline-stringops                 [enabled]

As you can see, -finline is enabled by -O3 but was missing from the flags you added to the pragma, so simply add it as follows:

#pragma GCC optimize("O3,inline")

main.cpp

#pragma GCC optimize("O3,inline")
#include <bits/stdc++.h>
using namespace std;

#define SZ 10000005

int arr[SZ] = {};

int main() {
    iota(arr, arr+SZ, 1);
    int tgt = 19999473;
    unordered_set<int> s(arr, arr+SZ);
    for (int i = 0; i < SZ; i++) {
        if (s.count(tgt-arr[i])) {
            cout << arr[i] << ' ' << tgt-arr[i] << '\n';
            break;
        }
    }
}

> g++ main.cpp && time ./a.out
9999468 10000005
./a.out  0.38s user 0.12s system 99% cpu 0.498 total

This yields performance similar to the -O3 flag! For more information, please refer to "C++ and the -O3 compilation flag".

If you want a 1:1 match with -O3 rather than just -finline, try to make the pragma match the output of g++ -O3 -Q --help=optimizers. That is beyond the scope of this comment, so it is left as an exercise to the reader.
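One possible starting point for that exercise is a hypothetical helper that diffs the two saved outputs. Everything here is an assumption: the function names are made up, and the parser relies on the line format shown above. It also relies on GCC's documented rule that optimize strings not starting with O get a -f prefix, so "inline" means -finline. The result could then be appended after "O3," in the pragma string.

There are two caveats. First, -Q --help=optimizers describes command-line settings, so it can't confirm what the pragma itself ends up enabling. Second, it doesn't cover --param values.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parses `g++ [-O3] -Q --help=optimizers` text into flag -> enabled.
// Only "-f..." lines whose last token is "[enabled]" or "[disabled]" are
// kept; "[default]" and "[available in C++, ObjC++]" lines are skipped.
std::map<std::string, bool> parse_optimizers(const std::string& text) {
    std::map<std::string, bool> flags;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string flag, state, word;
        words >> flag;
        while (words >> word) state = word;  // keep the last token
        if (flag.rfind("-f", 0) != 0) continue;
        if (state == "[enabled]") flags[flag] = true;
        else if (state == "[disabled]") flags[flag] = false;
    }
    return flags;
}

// Flags enabled in `target` but not in `base`, with "-f" stripped and
// joined by commas, in the form the optimize pragma accepts.
std::string missing_pragma_options(const std::string& base, const std::string& target) {
    const auto b = parse_optimizers(base);
    const auto t = parse_optimizers(target);
    std::string out;
    for (const auto& [flag, on] : t) {
        const auto it = b.find(flag);
        const bool base_on = it != b.end() && it->second;
        if (on && !base_on) {
            if (!out.empty()) out += ',';
            out += flag.substr(2);  // "-finline" -> "inline"
        }
    }
    return out;
}
```

Applied to the two grep'd listings above (plain g++ as base, -O3 as target), it would return inline,inline-functions,inline-functions-called-once,inline-small-functions.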

The only place where this is necessary would be competitive programming, since most judges compile with -O2

The one judge you should worry about is LeetCode.

Not only do they use #pragma GCC optimize ("O2") instead of passing -O2, they also add it at the bottom of the merged code. You can see it yourself by adding the following snippet:

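// Runs during static initialization (before main) and prints the merged
// source file that LeetCode actually compiles.
int src = [] {
    system("cat /mnt/prog_joined.cpp");
    return 0;
}();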

If you click the icon next to C++ in LeetCode's submission UI, you will see this:

C++ Compiled with clang 17 using the latest C++ 23 standard, and libstdc++ provided by GCC 13.
Your code is compiled with level two optimization (-O2). AddressSanitizer is also enabled to help detect out-of-bounds and use-after-free bugs.
Most standard library headers are already included automatically for your convenience.

So it is compiled with clang++, and clang++ main.cpp -finline && time ./a.out doesn't improve the runtime. Does anyone know a clang++ equivalent of g++ -Q --help=optimizers?

Ah, you're right! Can't believe that I forgot about -finline... haha

In that case, #pragma GCC optimize("O3,inline") would be pretty close to optimal, while still being relatively short.

I've done some light digging on clang optimizations, but it doesn't seem like clang has anything similar to #pragma GCC optimize. I guess this choice was intentional: LeetCode test data is usually weak, so constant-factor optimization would often be enough to pass with a suboptimal solution. I do find it weird that LeetCode includes a useless pragma at the end of the file, though...

kevlu8's blog