ToxicPie9's blog

You won't believe how this simple trick defeated the unexplained bug destroying every top LGM

By ToxicPie9, 9 months ago, in English

TL;DR

Currently people are discussing a slowdown bug on Codeforces that seems to happen randomly, and can cause code to run 100x slower and get TLE. More details in pajenegod's blog post.

In this article, I present a mitigation: add the following to your code.

#include <windows.h>
void *operator new(size_t size) {
    if (void *ptr = HeapAlloc(GetProcessHeap(), 0, size ? size : 1)) return ptr;
    throw std::bad_alloc{};
}
void operator delete(void *ptr) {
    HeapFree(GetProcessHeap(), 0, ptr);
}

If you use malloc/free in C++ (you shouldn't), also change them.

void *my_malloc(size_t size) {
    if (void *ptr = HeapAlloc(GetProcessHeap(), 0, size ? size : 1)) return ptr;
    throw std::bad_alloc{};
}
void my_free(void *ptr) {
    HeapFree(GetProcessHeap(), 0, ptr);
}

If you use aligned new/delete in C++, also change them.

void *operator new(size_t size, align_val_t al) {
    auto align = (uintptr_t)al;
    void *mem = ::operator new(size + align + sizeof(void *));
    void **ptr = (void **)(((uintptr_t)mem + align + sizeof(void *)) & ~(align - 1));
    ptr[-1] = mem;
    return ptr;
}
void operator delete(void *ptr, align_val_t al) noexcept {
    ::operator delete(((void **)ptr)[-1]);
}

DISCLAIMER

TL;DR, part 2

This section is for Codeforces admins including MikeMirzayanov.

I have very high confidence that the root cause of the bug is a set of poorly implemented functions in Microsoft's C runtime. This is outside contestants' control and creates unfair disadvantages in contests. The bug has likely existed since C++17 (64-bit) first came out on Codeforces.

Every C++ submission on Codeforces (and also the entire Codeforces infrastructure) currently depends on that library, on top of the internals of Windows that very few people have any clue about. Both of these are infamous for having bad implementations in many places, and I don't think it's a good idea to rely on those for one of the online judges with the most users.

I ask Codeforces developers to seriously consider other, more reliable possibilities, for example an option for judgment on Linux machines, possibly using one of the many judge server implementations already available. Increased stability and robustness would hopefully make the contest and judging experience on Codeforces a lot better.

What's happening?

There has been a bug on Codeforces causing random TLEs recently. Sometimes when a seemingly unrelated line of code is added, or the order of two lines is swapped, or a vector size is changed, etc., the code suddenly becomes 100x slower on certain inputs. The bug got some attention because of its unexplained behavior, and the fact that many submissions (including tourist's) that should have passed a problem got TLE or were hacked because of this. An example showing the bug (by kostia244):

#include <bits/stdc++.h>

using namespace std;

int main() {
    vector<vector<int>> TLE(40000, vector<int>(7));

    string s;
    for (int i = 0; i < 17; i++) s += to_string(i) + " ";

    for (int j = 0; j < 10000; ++j) {
        istringstream ss(s);
        int x;
        while (ss >> x);
    }
}

A loop of about 10000 * 17 iterations takes 1.5s to complete. The weird and funny thing about the bug is that if 40000 or 7 is changed slightly, or if the vector is moved to the middle, then nothing happens and the code runs normally. This peculiar behavior indicates something is buggy and TLE is not caused by the code's authors. It only happens on Codeforces, and only with 64-bit C++. No one was able to reproduce it anywhere else or explain why.

After pajenegod also posted a blog on the situation, people in the comments (drdilyor, mtw, et al.) quickly discovered that it's only the (de-)allocations that matter. Here's my attempt at reducing the code to only new and delete:

#include <bits/stdc++.h>
using namespace std;

void *alloc(size_t s) {
    auto c = new char[s];
    asm volatile("" ::"r"(c));
    return c;
}

int main() {
    auto c = alloc(0x1c);
    for (int i = 0; i < 40000; i++) {
        alloc(0x1c);
    }
    delete[] c;
    for (int i = 0; i < 100000; i++) {
        delete[] alloc(0x1c);
    }
}

This consistently takes around 900ms in both C++17 (64) and C++20 (64).

An overview of what I investigated in the past few hours

The vector<int>(7) in one case immediately caught my attention, because it makes an allocation of 28 bytes. 28 is special because it's $$$32 - 4$$$, where $$$32$$$ is a common chunk size of various memory allocation algorithms, and $$$4$$$ is the size of a pointer used to store chunks (if you didn't know, on Codeforces malloc uses 32-bit libraries and returns 32-bit pointers, even with 64-bit C++. you are scammed). This combined with the fact that changing the 7 makes the bug disappear, hints that something is wrong with allocations.

By default new and delete with GCC simply call the malloc and free functions from the C standard library. I tried adding some time measurements to see if malloc is really at fault. And sure enough, it is.

Code

#include <bits/stdc++.h>

using namespace std;
using namespace std::chrono;

nanoseconds alloc_time, dealloc_time;

void *alloc(size_t s) {
    auto t = high_resolution_clock::now();
    auto p = (void *)malloc(s);
    alloc_time += high_resolution_clock::now() - t;
    asm volatile("" ::"r"(p));
    return p;
}

void dealloc(void *p) {
    auto t = high_resolution_clock::now();
    free(p);
    dealloc_time += high_resolution_clock::now() - t;
}

int main() {
    auto c = alloc(0x1c);
    for (int i = 0; i < 40000; i++) {
        alloc(0x1c);
    }
    dealloc(c);
    for (int i = 0; i < 100000; i++) {
        dealloc(alloc(0x1c));
    }
    printf("alloc: %lld ms\n", round<milliseconds>(alloc_time).count());
    printf("dealloc: %lld ms\n", round<milliseconds>(dealloc_time).count());
}

Result in custom test

alloc: 899 ms
dealloc: 0 ms

=====
Used: 904 ms, 1904 KB

Knowing that it's not caused by the C++ STL containers themselves, but by the allocation functions called when creating them, debugging became a little easier.

With my superficial knowledge of heaps, I can quickly guess what is going on. The 40000 mallocs eat the top chunk down to a specific size, then the first free sends a chunk into the tcache, or the 0x20 fastbin if that's not a thing. Then the next sequence of malloc/frees keeps taking and returning the same chunk, and this combined with the 40000 mallocs somehow breaks the allocation algorithm, making it take a lot more time. That's what happened, right? Oh, wait. uh. no...

...wait. WAIT. I forgor 💀

The above is what might happen in GNU's C library. Codeforces runs on Windows, using Microsoft's C library. What does that do?

Codeforces's C++20 (64) uses WinLibs, which uses either MSVCRT (Microsoft Visual C++ Runtime) or UCRT (Universal C Runtime) as the C standard library. We don't know what exact setup Codeforces has, but we can first investigate what each library does. Fortunately, the source code of both is available.

Let's look at UCRT first. In non-debug mode, malloc simply calls _malloc_base, which is just a wrapper to the Windows API HeapAlloc. Similarly free calls HeapFree. These memory management functions are provided by the operating system, could they be the source of bugs? We can test this by replacing malloc and free calls by the API calls directly.

Test code

#include <bits/stdc++.h>
#include <windows.h>

using namespace std;

void *operator new(size_t size) {
    if (void *ptr = HeapAlloc(GetProcessHeap(), 0, size ? size : 1)) return ptr;
    throw std::bad_alloc{};
}

void operator delete(void *ptr) {
    HeapFree(GetProcessHeap(), 0, ptr);
}

void *alloc(size_t s) {
    auto c = new char[s];
    asm volatile("" ::"r"(c));
    return c;
}

int main() {
    auto c = alloc(0x1c);
    for (int i = 0; i < 40000; i++) {
        alloc(0x1c);
    }
    delete[] c;
    for (int i = 0; i < 100000; i++) {
        delete[] alloc(0x1c);
    }
}

This runs normally, which means that UCRT is unlikely to be the cause of the bug.

Then we look at MSVCRT's source code. It also uses Windows API functions, except when it doesn't. Depending on the system version, there are 2 more implementations of memory allocation algorithms, V6_HeapAlloc and V5_HeapAlloc, which seem to allocate memory by looking through a linked list for free chunks. I did not have enough time or energy to read the implementation, but it is possible that the function has bad performance in certain cases.

Then I found this interesting article, which shows that MSVCRT's malloc might really be some terrible implementation, and Codeforces users are not its first victims.

If we can know for sure which C library C++20 (winlibs) and C++17 (msys2) use, we can basically pin the possibility down to one bad function in a library. However, it's difficult to get the exact setup on Codeforces even with custom test. If only there was a way to download compiled binaries from Codeforces... wait, there is. On Polygon, full problem packages include compiled versions of checkers, validators, etc. I can simply add a generator to a random problem and download the package to obtain an executable file (although only C++17). Fortunately, it has some debug info, and is statically linked.

After briefly inspecting the 3MB, impossible-to-reverse-engineer PE32+ executable, I can confirm that it indeed uses MSVCRT's malloc. Now we can basically be sure that the cause of the weirdest slowdown bugs in Codeforces history is the poor implementation of the C runtime.

A possible fix?

Now we know the exact issue that caused the mysterious slowdowns: Microsoft's astronomically disastrous dumpster fire C library. How to fix this? It's easy, just replace it with memory management libraries that are known to work well, for example jemalloc, mimalloc (ironically made by MS themselves to replace their library functions they themselves know are terrible), or ptmalloc. One could also replace the entire library with, say, UCRT or GNU libc.

But as regular Codeforces users we can't control what libraries we use; only Codeforces is able to change that. However, we can rewrite our code to not use the C library to allocate at all. It's possible to do it by calling raw Windows API functions, as shown above. The following is copy-pasted from the TL;DR section:

#include <windows.h>
void *operator new(size_t size) {
    if (void *ptr = HeapAlloc(GetProcessHeap(), 0, size ? size : 1)) return ptr;
    throw std::bad_alloc{};
}
void operator delete(void *ptr) {
    HeapFree(GetProcessHeap(), 0, ptr);
}

If you use malloc/free in C++ (you shouldn't), also change them.

void *my_malloc(size_t size) {
    if (void *ptr = HeapAlloc(GetProcessHeap(), 0, size ? size : 1)) return ptr;
    throw std::bad_alloc{};
}
void my_free(void *ptr) {
    HeapFree(GetProcessHeap(), 0, ptr);
}

If you use aligned new/delete in C++, also change them.

void *operator new(size_t size, align_val_t al) {
    auto align = (uintptr_t)al;
    void *mem = ::operator new(size + align + sizeof(void *));
    void **ptr = (void **)(((uintptr_t)mem + align + sizeof(void *)) & ~(align - 1));
    ptr[-1] = mem;
    return ptr;
}
void operator delete(void *ptr, align_val_t al) noexcept {
    ::operator delete(((void **)ptr)[-1]);
}

DISCLAIMER

Does this fix the slowdown bug? No one will likely ever know, because unlike Microsoft's C libraries or STL, the lack of source code (and the sheer complexity) makes the Windows API impossible to understand. But at least we now know the bug is probably caused by library functions and can pray that skipping the library solves it.

In a way you could say that the bug is not Codeforces's fault... but it's not really not their fault either, as there is a very simple solution (that would also solve countless other issues reported by users before): run GNU/Linux, like literally every other online judge of considerable scale.

Update: About hacks

After the bug was discovered, people started hacking submissions by using special inputs that cause a certain sequence of allocations to happen. Since this is a problem with the library and not with contestants' code, I don't think it's fair to do that.

A similar case is hacking bad std::unordered_map, compared to, e.g., hacking bad rolling hashes. The latter is a problem with contestant implementations, and I enjoy hacking them. But the former is just bad implementation from C++ STL, and isn't really the contestants' fault, so I don't usually hack them. This problem is more well-known though, and has many simple fixes available, so it's probably OK to hack them.

Until this slowdown bug and its fixes are well known, I think hacking using the bug is equivalent to exploiting n-day vulnerabilities on Codeforces to manipulate verdicts, and I'm not very supportive of that. Let's hope a patch is deployed soon.

Afterword

I'm not touching Windows or PE for the next 6 months. Leave a comment if you play CTFs and understand my feeling.

Subscribe for more posts like this where I solve CP bugaboos with the most absurd techniques


Will there be April Fools Day Contest 2023?

By ToxicPie9, 20 months ago, in English

Having a special joke contest on April 1st each year has become a tradition on Codeforces. Usually, April Fools contests are prepared by Nickolas. However, last year there wasn't an official one, and a contest was made by Agnimandur and magnus.hegdahl instead.

Since April 1st is coming soon, I want to ask: Is there a plan for Codeforces to hold April Fools Day Contest 2023?

If an official April Fools contest is not planned this year, a group of friends and I (AlperenT, Ari, BucketPotato, flamestorm, ScarletS, ToxicPie9) already have a lot of problem ideas and are ready to prepare a round (we will need help from MikeMirzayanov to make it happen).


[Tutorial/Parody Post] Common Mistakes in Competitive Programming and How to Avoid Them

By ToxicPie9, 23 months ago, in English

Yesterday YouKn0wWho posted a blog explaining common mistakes in competitive programming and how to avoid them. I was greatly inspired by that post so I decided to make my own.

I have compiled some of the mistakes that I didn't make in my early Competitive Programming phase. I also mentioned how to avoid them. Also, in most cases, I will give you a chance to find out what the bug is before I reveal the culprit as I tried to make this blog interactive. The code in this blog is written in Rust as it is the most beloved language for CP.

Mistake 1

Check out the following code:

Code

fn main() {
    let x: i32 = 1_000_000_000;
    let y: i32 = 1_000_000_000;
    println!("{}", x * y);
}

The output should be $$$10^{18}$$$. But if you run the code, you will get a different output. Why?

Reason

Because it won't even compile.

error: this arithmetic operation will overflow
 --> foo.rs:4:20
  |
4 |     println!("{}", x * y);
  |                    ^^^^^ attempt to compute `1000000000_i32 * 1000000000_i32`, which would overflow
  |
  = note: `#[deny(arithmetic_overflow)]` on by default

error: aborting due to previous error

Overflow!

$$$10^9 \times 10^9$$$ does not fit inside the i32 type which can only hold numbers up to $$$2^{31}-1$$$. To store large numbers like $$$10^{18}$$$, you'll need bigger integer types like i64.
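
A minimal sketch of the fix described above, assuming the inputs stay i32 and only the product needs the wider type. The helper name `product` is made up for illustration; `checked_mul` is the standard library method that reports the overflow as `None`.

```rust
// Hypothetical helper: widen both operands to i64 before multiplying.
fn product(x: i32, y: i32) -> i64 {
    i64::from(x) * i64::from(y)
}

fn main() {
    let x: i32 = 1_000_000_000;
    let y: i32 = 1_000_000_000;
    // The i64 product fits, so this prints 10^18.
    println!("{}", product(x, y));
    // Staying in i32 overflows; checked_mul turns that into None.
    println!("{:?}", x.checked_mul(y));
}
```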

Mistake 2

Check out the following code:

Code

fn test(v: Vec<i32>) -> usize {
    v.len()
}

fn main() {
    const N: usize = 1_000_000;
    let mut sum = 0;
    let v = vec![0; N];
    for _i in 1..=N {
        sum += test(v);
    }
    println!("{}", sum);
}

Try to run this locally. What is the time complexity of this?

Is it $$$O(n)$$$?

Reason

You guessed correctly! It won't compile.

error[E0382]: use of moved value: `v`
  --> foo.rs:10:21
   |
8  |     let v = vec![0; N];
   |         - move occurs because `v` has type `Vec<i32>`, which does not implement the `Copy` trait
9  |     for _i in 1..=N {
   |     --------------- inside of this loop
10 |         sum += test(v);
   |                     ^ value moved here, in previous iteration of loop
   |
note: consider changing this parameter type in function `test` to borrow instead if owning the value isn't necessary
  --> foo.rs:1:12
   |
1  | fn test(v: Vec<i32>) -> usize {
   |    ----    ^^^^^^^^ this parameter takes ownership of the value
   |    |
   |    in this function
help: consider cloning the value if the performance cost is acceptable
   |
10 |         sum += test(v.clone());
   |                      ++++++++

Types like Vec don't have the Copy trait to avoid accidental copying. So, it is moved into the test function. As you can see from the extremely helpful compile error messages, you moved v into a function during the first iteration of the loop, so you can't use it in the next iterations.

How to fix this? The compiler already told you the answer! You can either borrow the parameter (&Vec instead of Vec), or add .clone() to get an owned version of v. (And as the compiler message suggests, the second method will be slower.)
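
A sketch of the borrowing fix, under the assumption that `test` only needs to read the vector. It takes a slice (`&[i32]`), which a `&Vec<i32>` coerces to; the wrapper `total_len` is a hypothetical name introduced only to make the loop callable.

```rust
// Borrowing a slice: the caller keeps ownership of the Vec.
fn test(v: &[i32]) -> usize {
    v.len()
}

// Hypothetical wrapper around the loop from the example.
fn total_len(v: &[i32], rounds: usize) -> usize {
    let mut sum = 0;
    for _ in 0..rounds {
        // Only a reference is passed, so nothing is moved or cloned.
        sum += test(v);
    }
    sum
}

fn main() {
    const N: usize = 1_000_000;
    let v: Vec<i32> = vec![0; N];
    println!("{}", total_len(&v, 3));
}
```

Each call is O(1), so the loop as a whole is linear in the number of rounds.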

Mistake 3

Check out the following code:

Code

use std::io::{stdin, BufRead};

const N: usize = 100_000;

fn main() {
    let test_cases = 100_000;
    for _i in 0..test_cases {
        let a = vec![0; N];
        let n = stdin().lock().lines().next().parse();
        let mut sum: i32 = 0;
        for i in 0..n {
            sum += a[i];
        }
        println!("{}", sum);
    }
    // it is guaranteed that total sum of n is <= 100000
}

Notice that it is guaranteed that total sum of n is <= 100000. So how many operations will the code take in the worst case?

Reason

You guessed correctly again! It won't compile.

error[E0599]: no method named `parse` found for enum `Option` in the current scope
 --> foo.rs:9:47
  |
9 |         let n = stdin().lock().lines().next().parse();
  |                                               ^^^^^ method not found in `Option<Result<String, std::io::Error>>`

This is because a BufRead returns a Result<String, std::io::Error> when you read a line. The lines() method returns an iterator of lines. The next() method returns the next value of an iterator, which can be None, so its type here is Option<Result<String, std::io::Error>>. Finally, the parse() method parses the string into a number, which may fail, so it returns a Result.

The fix is to extract the "success" values inside Options and Results! You can do it with unwrap (NOT RECOMMENDED). For example: let n: usize = stdin().lock().lines().next().unwrap().unwrap().parse().unwrap();.

However, this is considered bad practice, as you should always handle the error and None values properly.
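
One way to handle the None and error cases without unwrap, sketched under some assumptions: the helper name `read_count` and the choice of `String` as the error type are illustrative, not from the original post. The function accepts any `BufRead`, so a byte string can stand in for stdin.

```rust
use std::io::BufRead;

// Hypothetical helper: read the first line of `input` and parse it as usize.
fn read_count<R: BufRead>(input: R) -> Result<usize, String> {
    let line = input
        .lines()
        .next()
        // None means there was no line at all.
        .ok_or_else(|| "unexpected end of input".to_string())?
        // Err means reading the line failed.
        .map_err(|e| e.to_string())?;
    // parse() can fail too, so its error is converted as well.
    line.trim().parse::<usize>().map_err(|e| e.to_string())
}

fn main() {
    // A byte string stands in for stdin so the sketch runs without input.
    match read_count("17\n".as_bytes()) {
        Ok(n) => println!("n = {}", n),
        Err(e) => eprintln!("bad input: {}", e),
    }
}
```

With real input, `read_count(std::io::stdin().lock())` would be the corresponding call.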

Mistake 4

What is happening in the following code?

Code

fn main() {
    const N: usize = 5;
    let a = [0; N];
    a.fill(1);
    println!("{:?}", a);
}

The output is supposed to be [1, 1, 1, 1, 1]. But it's not the case actually! Why?

Reason

Because it won't compile.

error[E0596]: cannot borrow `a` as mutable, as it is not declared as mutable
 --> foo.rs:4:5
  |
3 |     let a = [0; N];
  |         - help: consider changing this to be mutable: `mut a`
4 |     a.fill(1);
  |     ^^^^^^^^^ cannot borrow as mutable

Since you modify a, you need to declare it as mutable with let mut a = [0; N];.
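
For completeness, a small sketch of the corrected version. The function name `filled` is hypothetical and only wraps the example so the result can be returned.

```rust
const N: usize = 5;

// Hypothetical wrapper: build the array, fill it, and return it.
fn filled() -> [i32; N] {
    // `mut` is required because fill() borrows the array mutably.
    let mut a = [0; N];
    a.fill(1);
    a
}

fn main() {
    println!("{:?}", filled());
}
```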

Mistake 5

Don't use endl! If your code needs to print millions of newlines, then using endl turns out to be much slower than using '\n'. Why?

Reason

Mistake 6

Use pow() function for integer calculations.

Why?

Rust has the functions pow(), ilog(), ilog2(), etc. for integer types. These are pretty safe to use, since their results are either integers or rounded to integers so are precise. If you worry about overflowing, functions like checked_pow() return Option to help you check cases where integer overflows happen.

fn main() {
    assert_eq!(5i32.pow(2), 25);
    assert_eq!(69i32.ilog(3), 3);
}
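
The overflow check mentioned above can be sketched like this. `checked_pow` is the standard library method; the wrapper `power_of_ten` is a made-up name for illustration.

```rust
// Hypothetical helper: 10^exp as i64, or None if it does not fit.
fn power_of_ten(exp: u32) -> Option<i64> {
    10i64.checked_pow(exp)
}

fn main() {
    for exp in [9u32, 18, 19] {
        match power_of_ten(exp) {
            Some(v) => println!("10^{} = {}", exp, v),
            // 10^19 exceeds i64::MAX, so this branch is taken for exp = 19.
            None => println!("10^{} does not fit in i64", exp),
        }
    }
}
```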

Mistake 7

Run the following code

Code

fn main() {
    let v = Vec::<i32>::new();
    println!("{}", v.len() - 1);
}

You might expect the output to be $$$-1$$$. But the output is actually not! Why?

Reason

Because of integer overflow.

thread 'main' panicked at 'attempt to subtract with overflow', foo.rs:3:20
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

v.len() is an unsigned integer 0 and is the lowest number an unsigned type can represent. Attempting to subtract a positive value from it will result in an overflow.
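
A sketch of two standard ways to make the subtraction explicit about the empty case: `checked_sub` returns an Option, and `saturating_sub` clamps at zero. The helper name `last_index` is hypothetical.

```rust
// Hypothetical helper: index of the last element, or None for an empty slice.
fn last_index(v: &[i32]) -> Option<usize> {
    v.len().checked_sub(1)
}

fn main() {
    let empty = Vec::<i32>::new();
    let three = vec![10, 20, 30];
    println!("{:?}", last_index(&empty));
    println!("{:?}", last_index(&three));
    // saturating_sub never panics; it stops at 0 instead.
    println!("{}", empty.len().saturating_sub(1));
}
```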

Mistake 8

Using eprintln! might be a good way to debug your code as it doesn't output to the standard output. But leaving the eprintln! instances in your code while submitting in OJ might be one of the worst ways of getting TLE.

Smash me for more info.

Mistake 9

Look at the following code

Code

fn main() {
    let a = 1;
    let b = 3;
    // we want to take a + b and bitwise & it with 3
    let result = a + b & 3;
    println!("{}", result);
}

The output is $$$0$$$, which is correct. Why?

Reason
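
The collapsed explanation is not reproduced here. One fact that can be checked directly is that in Rust the `+` operator has higher precedence than `&`, so the expression already groups the way the comment asks. A small sketch (the function name `masked_sum` is made up):

```rust
// Hypothetical helper containing the same expression as the example.
fn masked_sum(a: i32, b: i32) -> i32 {
    // Parsed as (a + b) & 3, because + binds tighter than & in Rust.
    a + b & 3
}

fn main() {
    let (a, b) = (1, 3);
    println!("{}", masked_sum(a, b)); // the expression from the example
    println!("{}", (a + b) & 3);      // explicit grouping, same value
    println!("{}", a + (b & 3));      // the other grouping gives a different value
}
```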

Mistake 11

Consider the following code for calculating the maximum occurrence in an array.

Code

use std::collections::HashMap;
use std::io::{stdin, BufRead};

fn main() {
    const N: usize = 30_000;
    let mut mp = HashMap::<i64, usize>::new();
    for _i in 0..N {
        let x = stdin().lock().lines().next().unwrap().unwrap().parse().unwrap();
        *mp.entry(x).or_insert(0) += 1;
    }
    println!("{}", mp.values().max().unwrap())
}

This code seems like it can be hacked easily but in fact, for almost every valid input, it should get AC.

Why?

Mistake 12

Run the following code:

Code

use std::collections::HashMap;

fn main() {
    let mut mp = HashMap::<i64, usize>::new();
    // add 0 to 4 to the map
    for k in 0..5 {
        *mp.entry(k).or_insert(0) += 1;
    }
    let mut cnt: usize = 0;
    // check how many numbers in the keys exist in the map
    for k in mp.keys() {
        if *mp.entry(*k).or_insert(0) != 0 {
            cnt += 1;
        }
    }
    // now print the size of the map
    println!("{} {}", cnt, mp.len())
}

What will be the size of the map? $$$5$$$?

Check

Wrong, it's actually a compile error.

error[E0502]: cannot borrow `mp` as mutable because it is also borrowed as immutable
  --> foo.rs:11:13
   |
10 |     for k in mp.keys() {
   |              ---------
   |              |
   |              immutable borrow occurs here
   |              immutable borrow later used here
11 |         if *mp.entry(*k).or_insert(0) != 0 {
   |             ^^^^^^^^^^^^ mutable borrow occurs here

In Rust, when you have a mutable borrow of a variable, then you cannot have other references to that variable, even immutable ones. The borrow checker of Rust is an extremely powerful feature, as it prevents many bugs (usually data races) that are caused by accidentally changing values while using them. You can learn more about references and borrowing by smashing me.
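
One possible way to write the counting loop so it compiles, assuming the lookup is only meant to read the map: `get` needs just a shared borrow, so it can coexist with the `keys()` iterator. The function name `count_present` is hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical helper: count keys whose stored value is non-zero.
fn count_present(mp: &HashMap<i64, usize>) -> usize {
    let mut cnt = 0;
    for k in mp.keys() {
        // get() borrows the map immutably, unlike entry().
        if mp.get(k).copied().unwrap_or(0) != 0 {
            cnt += 1;
        }
    }
    cnt
}

fn main() {
    let mut mp = HashMap::<i64, usize>::new();
    for k in 0..5 {
        *mp.entry(k).or_insert(0) += 1;
    }
    // The read-only lookup cannot insert, so the size stays the same.
    println!("{} {}", count_present(&mp), mp.len());
}
```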

Mistake 13

I forgot Mistake 10.

Mistake 14

Check this out.

Code

fn main() {
    const N: usize = 1_000_000;
    let mut s = String::new();
    for _i in 0..N {
        s = s + "ゑ";
    }
    println!("{}", s)
}

What is the time complexity of this?

Check
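
The collapsed answer is not reproduced here. What can be said with confidence is that `String + &str` takes the left operand by value and appends to its existing buffer, so the loop does not rebuild the string from scratch on every iteration. A sketch that makes the append explicit with `push_str` (the function name `build` is made up):

```rust
// Hypothetical helper: append the same piece n times.
fn build(n: usize) -> String {
    let piece = "ゑ";
    // Reserving up front avoids reallocations; piece.len() is in bytes.
    let mut s = String::with_capacity(n * piece.len());
    for _ in 0..n {
        s.push_str(piece);
    }
    s
}

fn main() {
    let s = build(1_000_000);
    // len() counts UTF-8 bytes, chars().count() counts characters.
    println!("{} bytes, {} chars", s.len(), s.chars().count());
}
```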

More Mistakes and Non-Mistakes

Do not insert or erase from a container (Vec, HashSet etc) while traversing it using for x in s.iter() like syntax at the same time. This is because the compiler won't let you do that. In Rust, you are not allowed to borrow a variable while it is also borrowed as mutable at the same time (e.g. when inserting).
Create variables only when you need them instead of just declaring let a, b, c, d, e, f, g, h; and 69 other variables at the beginning of your code! This is because Rust's grammar doesn't let you do that.
If you want to count the number of set bits in an i64 number, then use the count_ones() method instead of the __builtin_popcount function. This is because there is no __builtin_popcount function.
If you want to compute the square root of an f64 number, then use the sqrt() method instead of sqrt() because sqrt() function takes self as input whereas sqrt() takes self.
Speaking of Runtime Error, the most likely case of getting Runtime Error is when your code encounters an error while running.
let x: i64 = 1 << 40; will not overflow as Rust will try to deduce the types of integer literals if they're not specified. In this case, the compiler determines that 1 is i64.
Don't accidentally write your code in C++.
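
Two of the points in the list above can be checked with a short sketch. The wrapper name `popcount` is hypothetical; `count_ones` is the standard library method.

```rust
// Hypothetical wrapper around the built-in bit count.
fn popcount(x: i64) -> u32 {
    x.count_ones()
}

fn main() {
    // The literal 1 is inferred as i64 from the annotation, so no overflow.
    let x: i64 = 1 << 40;
    println!("{} {}", x, popcount(x));
    // All 64 bits are set in the two's complement representation of -1.
    println!("{}", popcount(-1));
}
```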

Thanks for reading the blog. Feel free to add more mistakes that are common in CP in the comment section.

See you on a later episode, friend blobheart!

P.S. Although this is mostly a joke post, I tried to be accurate about the facts and did not intentionally put incorrect information in it (which means you may treat it as a somewhat educational blog). It also aims to showcase some features of Rust that help you avoid some common CP mistakes in languages like C++.


ICPC World Finals 2021 (Dhaka) Discord Group

By ToxicPie9, 2 years ago, in English

Hello Codeforces!

I have created a Discord server for the upcoming ICPC World Finals in Dhaka, so the finalists can have a place to chat with each other.

You can join it with the link: https://discord.gg/p3W9YCZ3AK


Is sus taking over the world?

By ToxicPie9, 3 years ago, in English

sus has just overtaken Errichto and become the second top contributor on Codeforces. During the last few days, his contribution points skyrocketed at a terrifying rate.

Will sus eventually top Monogon and become the new ruler of Codeforces? Or is the power of Monogon's comments simply unmatched by mere mortals?

meme
