kevlu8's blog

By kevlu8, 29 hours ago, in English

GCC has many optimization pragmas that can be placed at the top of a file. In general, you would expect them to speed up your code by the same amount as the equivalent command-line argument. However, this is not always the case.

Theoretically, you would expect

#pragma GCC optimize("O3")

to optimize your code the same way as

g++ main.cpp -O3

But it doesn't! Let's take a look at an example program:

#pragma GCC optimize("O3")
#include <bits/stdc++.h>
using namespace std;

#define SZ 10000005

int arr[SZ] = {};

int main() {
    iota(arr, arr+SZ, 1);
    int tgt = 19999473;
    unordered_set<int> s(arr, arr+SZ);
    for (int i = 0; i < SZ; i++) {
        if (s.count(tgt-arr[i])) {
            cout << arr[i] << ' ' << tgt-arr[i] << '\n';
            break;
        }
    }
}

This is a simple, well-known solution to the Two-Sum problem that uses a hash set.

Here's a chart showing the runtime of the program with and without the pragma (running on Ryzen 7 7700X, compiled with no other arguments, mean of 5 trials):

Optimization              Time (s)
None, without pragma      ~1.79
None, with pragma         ~0.98
-O3, without pragma       ~0.36
-O3, with pragma          ~0.36
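The post does not say how these means were measured. For anyone who wants to reproduce numbers like these, here is a minimal sketch of one possible way to average several trials in-process, assuming std::chrono::steady_clock is an acceptable timer. The helper name mean_seconds is hypothetical and not from the original measurements.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not from the original post): runs `work`
// `trials` times and returns the mean wall-clock time in seconds.
double mean_seconds(const std::function<void()>& work, int trials = 5) {
    using clock = std::chrono::steady_clock;
    double total = 0.0;
    for (int t = 0; t < trials; t++) {
        auto start = clock::now();
        work();  // the code being timed
        auto stop = clock::now();
        total += std::chrono::duration<double>(stop - start).count();
    }
    return total / trials;
}
```

Wrapping the body of the example program in a lambda and passing it to mean_seconds would give a comparable mean, although timing the whole process from the shell also counts startup and teardown, so the two methods will not agree exactly.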

As you can see, the pragma on its own (~0.98 s) does much worse than the -O3 flag (~0.36 s), even though they should be equivalent. Why is this?

Looking into the assembly code generated, we can see that the code generated by the -O3 command-line argument actually does not contain any occurrences of unordered_set, whilst the code generated by the pragma contains loads of occurrences. What does this mean?

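This tells us that -O3 performs more optimization than the pragma: under -O3 the unordered_set member functions appear to have been inlined away, while under the pragma they remain as separate functions. This is further demonstrated by the following example: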

static int return5() {
    return 5;
}
int main() {
    return return5();
}

With the pragma, the generated assembly code (simplified) is:

main:
    jmp return5
return5:
    mov eax, 5
    ret

With -O3, the generated assembly code is:

main:
    mov eax, 5
    ret

Anyone can see that return5 should be inlined. It's even a static function! But the pragma doesn't inline it, whilst -O3 does. Even after adding inline-functions,inline-small-functions,inline-functions-called-once to the pragma, it still doesn't get inlined. After adding __attribute__((always_inline)) to the function, it finally gets inlined. Why this is the case is beyond me.
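For reference, the always_inline variant might look like the sketch below. Two things here are assumptions rather than part of the original test: the inline keyword is added because GCC can warn that an always_inline function "might not be inlinable" without it, and call_return5 is a hypothetical stand-in for main so the snippet can be compiled to assembly on its own.

```cpp
#pragma GCC optimize("O3")

// always_inline asks GCC to inline this function at its call sites
// regardless of the usual inlining heuristics.
// The `inline` keyword is an addition to avoid a GCC warning.
__attribute__((always_inline)) static inline int return5() {
    return 5;
}

// Hypothetical stand-in for main() in the example above.
int call_return5() {
    return return5();
}
```

Compiling this with g++ -S and no other flags should let you check whether call_return5 still contains a call or jump to return5.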

There are probably many more examples of optimizations that -O3 does that the pragma doesn't, but the most important thing to take away from this is that the pragma is not quite equivalent to -O3.

Most of the time, there is no reason to use #pragma GCC optimize("O3") over -O3, because you can just modify your compile-time command-line arguments. The only place where this is necessary would be competitive programming, since most judges compile with -O2 and sometimes you're able to squeeze into the time limit by using O3 and avx2.

What can we do with this information? Not much, really. Just be aware that the pragma is not equivalent to -O3, and that you should use -O3 over the pragma whenever possible. However, in situations where -O3 is not possible, the pragma is a decent alternative.

One final note: make sure that if you use the pragma, you use it at the top of the file, before any includes. If you use it in the middle of the file, it will only apply to the code after the pragma.
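Here is a small sketch of that scoping rule. It also uses #pragma GCC push_options and #pragma GCC pop_options, which are GCC pragmas not mentioned above, to restore the previous options afterwards. The function names are made up for illustration.

```cpp
// Defined before the pragma: compiled with the command-line options only.
int sum_before(int n) {
    int sum = 0;
    for (int i = 1; i <= n; i++) sum += i;
    return sum;
}

#pragma GCC push_options      // save the current optimization options
#pragma GCC optimize("O3")

// Defined after the pragma: the O3 options apply to this function.
int sum_after(int n) {
    int sum = 0;
    for (int i = 1; i <= n; i++) sum += i;
    return sum;
}

#pragma GCC pop_options       // restore the options saved above

// Defined after pop_options: back to the command-line options.
int sum_restored(int n) {
    int sum = 0;
    for (int i = 1; i <= n; i++) sum += i;
    return sum;
}
```

All three functions return the same values. Only the generated code should differ, which you can compare by compiling with g++ -S and no optimization flag.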

Thanks for reading my first blog post! I hope you enjoyed!

» 91 minutes ago

For your second example, I see that GCC 14 does inline the function, which doesn't happen with earlier versions.

I suspect that "in the source file" in the docs means only the current file, excluding the headers it includes, which could explain the first example.

    » 55 minutes ago (Rev. 2)

    I did consider the possibility of the pragma only optimizing functions defined in the source file, but it doesn't seem to be the case:

    File header.h:

    int sum100() {
       int sum = 0;
       for (int i = 1; i <= 100; i++) sum += i;
       return sum;
    }
    

    File main.cpp:

    #include "header.h"
    int main() {
        return sum100();
    }
    

    Without the pragma (and only compiling with g++ main.cpp), sum100() actually does the computation, whilst with the pragma it doesn't and just directly returns the value.