Блог пользователя Xiaohuba

Автор Xiaohuba, история, 4 месяца назад, По-английски

UPD: I've reduced the code size.

I've recently found that the following code will generate wrong output.

#include <bitset>
#include <iostream>
const int N = 105;
std::bitset<N> ok[N][N];
int n = 5;
int main() {
  ok[2][2].set(2);
  for (int i = n; i; i--)
    for (int j = i; j <= n; j++) {
      ok[i][j] = ok[i][j] | ok[i + 1][j] | ok[i][j - 1];
    }
  std::cout << ok[2][5][2] << '\n';
  return 0;
}

Compiled with -O3 -mtune=skylake -march=skylake, the code outputs 0.

However if you simulate the code you will know that the correct answer should be 1.

Note that the compiler seems to generate wrong sse instruction.

Godbolt link

Again, I believe this code is ub-free, and has nothing to do with implementation-defined stuff.

  • Проголосовать: нравится
  • +63
  • Проголосовать: не нравится

»
4 месяца назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Auto comment: topic has been updated by Xiaohuba (previous revision, new revision, compare).

»
4 месяца назад, # |
Rev. 8   Проголосовать: нравится +13 Проголосовать: не нравится

Further reduced:

typedef struct {
  unsigned long words[2];
} Child;

typedef struct {
  Child child;
} Parent;

Parent my_or(Parent x, const Parent *y) {
  const Child *y_child = &y->child;
  for (int i = 0; i < 2; i++) {
    x.child.words[i] |= y_child->words[i];
  }
  return x;
}

int main() {
  Parent bs[4];
  __builtin_memset(bs, 0, sizeof(bs));

  bs[0].child.words[0] = 1;
  for (int i = 1; i <= 3; i++) {
    bs[i] = my_or(bs[i], &bs[i - 1]);
  }
  return bs[2].child.words[0];
}
»
4 месяца назад, # |
  Проголосовать: нравится -13 Проголосовать: не нравится

I bet AI couldn't do that

»
4 месяца назад, # |
Rev. 2   Проголосовать: нравится +1 Проголосовать: не нравится

It's good with O2. I am always scared to use O3, now I have an additional reason to keep being scared. While searching I found something related to O3 and AVX probably not being addressed for a long time (>10 years) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49001 . This may not be this issue though, as I tried to adjust alignas and that didn't help.