EbTech's blog

By EbTech, history, 6 years ago, In English

Having spent a number of Codeforces rounds practicing the ins and outs of Rust, I think it's finally safe to encourage wider participation in the language! This guide is intended primarily for C++ programmers who may have taken interest in Rust, but had doubts regarding its feasibility in timed contests. On Quora, I summarized why I think Rust is suitable for contests in 2019 onward. Granted, the learning curve is substantial enough that it may not seem that way at first. To make the transition easier, for anyone who's interested, here I'll go over some aspects of my Codeforces submissions in more detail, with some tips along the way.

My solution to 1168C: And Reachability

My solution to 1158D: Winding polygonal line

Right away, you'll notice certain characteristics:

  • The only global variable is a compile-time constant. By scoping things appropriately, I don't have to worry about accidentally forgetting to reset data.

  • Very few mutable variables. This makes code easier to reason about, and mistakes less likely.

  • Very strict type-checking. For example, I might use the usize type for array indices, and i64 for geometric coordinates. Conversions between the two must be made explicit. This seems annoying at first, but it caught quite a few of my bugs.

  • A polymorphic Scanner.next() method. It can read space-separated tokens of any type that implements the trait FromStr.

  • Output via BufWriter. This is needed for speed, if you want to write a large number of lines. BufWriter flushes automatically when it goes out of scope, but you'll probably want to flush() manually on interactive problems. Further I/O optimizations are possible (see comments section), but this is the major one and is sufficient for contests.

  • A mix of imperative-style and functional-style constructions, depending on which is clearer.

In Rust, you can read a Vec (i.e., vector in C++, ArrayList in Java) of floats from standard input in imperative style:

let mut v = Vec::with_capacity(n);
for _ in 0..n {
    let elem = scan.next::<f64>();
    v.push(elem)
}

Or you can do it in functional style, rendering the result immutable:

let v: Vec<f64> = (0..n).map(|_| scan.next()).collect();

Both versions are very efficient: the Vec initially allocates space for n elements, similar to v.reserve(n) in C++!

You can "consume" (i.e., move) a Vec if you won't need it anymore:

for elem in v { // equivalent to v.into_iter().for_each(|elem| ...)
    // do something with elem
}

Or borrow its contents mutably, to change some of its elements:

for elem in &mut v { // equivalent to v.iter_mut().for_each(|elem| ...)
    // do something with *elem
}

Or borrow its contents immutably:

for elem in &v { // equivalent to v.iter().for_each(|elem| ...)
    // do something with *elem
}

If the elements implement Copy, the dereference can be copied into the variable elem by pattern matching. This time, let's also keep track of the index:

for (i, &elem) in v.iter().enumerate() {
    // do something with elem
}

Rust Strings are UTF-8. To get random access, you'll have to convert them to .bytes() or .chars(). And if you're reading a String made entirely of 0s and 1s? Convert them to bools as follows:

let s: String = scan.next();
let v: Vec<bool> = s.chars().map(|ch| ch == ‘1’).collect();

My 1168C submission features the following rather magical line:

let (zero_bits, one_bits): (Vec<usize>, Vec<usize>) =
                (0..BITS).partition(|b| (ai & (1usize << b)) == 0);

Where did I get the idea to do this? Well, I wanted to know which positions of the binary number ai contain a 0, and which contain a 1. I knew that the Range object 0..BITS implements the Iterator trait, so I Googled "rust iterator", landed on a doc page, and browsed through the list of methods. (0..BITS).filter().collect() seems close to what I wanted: it selects the bits which satisfy a given condition. However, (0..BITS).partition() does more, giving both the passing and the failing bits! Note that collect() and partition() are polymorphic, so I could just as easily have obtained HashSets instead.

As you can see, the language is very expressive, and the standard library quite flexible. One very interesting finding from my experience with Rust is that "production-quality" code and "quick hacky" code look much more alike than they do in C++. This is because Rust not only makes it harder to do things the wrong way, but also makes it much easier to do things the right way. As a result, I naturally find myself coding in a clearer style, even under time pressure.

Overall, my solutions attain much fewer WA verdicts in Rust than they did in C++. Development time is sometimes more, sometimes less, but it gets better with practice. Best of all, I now find coding to be a less paranoid experience: I can have fun with problems instead of being hyper-alert to programming errors. Try it out and see!

As additional resources, feel free to check out my past submissions on Codeforces, my competitive programming codebook full of algorithms and data structures you can use, and Rust's new dbg!() macro for a more informative way to inspect run-time values. Leave your competitive Rust questions in the comments, and good luck!

  • Vote: I like it
  • +112
  • Vote: I do not like it

| Write comment?
»
6 years ago, # |
  Vote: I like it 0 Vote: I do not like it

there's my IO template for rust: 53175781. Yeah, it's not pretty, but it is very, very fast, on par with fastest C++ templates (getchar etc).

Also rust compiler version on codeforces is a bit outdated, so you need to put

[package]
edition = "2015"

into your Cargo.toml

  • »
    »
    6 years ago, # ^ |
    Rev. 4   Vote: I like it 0 Vote: I do not like it

    The Rust version on Codeforces is 1.31, which is exactly the release of 2018 edition!

    Just for fun, I re-submitted your solution with my Scanner. It seems to work well enough, though indeed yours is faster: https://codeforces.me/contest/1151/submission/55073106

    • »
      »
      »
      6 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      The Rust version on Codeforces is 1.31, which is exactly the release of 2018 edition!

      codeforces compiler can't deal with "non-lexical lifetimes".

      fn main() {
          let mut scores = vec![1, 2, 3];
          let score = &scores[0];
          scores.push(4);
      }
      

      - compilation error. Hence edition = "2015"

  • »
    »
    6 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Actually you can use getchar from Rust.

    https://codeforces.me/contest/947/submission/41213727

    If I understand correctly, at linking stage the program is linked to the symtem's libc. Hence that extern { fn getchar() -> i32 } works. Tho there's no getchar_unlocked since it's not compiled on POSIX system.

    • »
      »
      »
      6 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      what a dirty hack! I like it. However, my template can read from any BufReader. For example from Vec<u8>, which is good for testing.

    • »
      »
      »
      6 years ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      Looks like my unsafe Scanner is now even faster than getchar()!

      https://codeforces.me/contest/947/submission/55268770

      Thanks for the discussion everyone. It took a while to get to this point, but I think Rust is seriously competitively viable now!!

  • »
    »
    6 years ago, # ^ |
    Rev. 3   Vote: I like it 0 Vote: I do not like it

    What I'd like to know is which part of Scanner<StdinLock> makes it slower, and what can we do to get optimal performance while being generic in T: FromStr?

    Could it be that read_line reads too little at a time? But it seems wrapping the StdinLock in a BufReader doesn't help much...

    Is it the fact that Scanner reads to a String, which must be checked to be valid UTF-8? If so, that's a waste since it seems the source code for parse::<i64>() ends up calling .as_bytes() anyway.

    Is it because I'm allocating new Strings for each word and storing them in a Vec? If so, maybe we can use the SplitWhitespace object directly without the allocations.

    Is the parse() method itself much slower than your custom digit multiplication? It shouldn't be, since the implementation seems similar, just with a few extra checks for the sign and such.

    • »
      »
      »
      6 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      yes, it copies bytes from buffer to string, then validates this string. This leads to serious slowdown.

      • »
        »
        »
        »
        6 years ago, # ^ |
        Rev. 2   Vote: I like it 0 Vote: I do not like it

        Thanks. Alright, time to bring this to the attention of the Rust community: https://stackoverflow.com/questions/56449027

        Gosh I had never been active on sites like Quora, Reddit, Stackoverflow before...

      • »
        »
        »
        »
        6 years ago, # ^ |
        Rev. 2   Vote: I like it 0 Vote: I do not like it

        I don't know how to profile properly, but my unscientific measurements suggest the Scanner<StdinLock> (maybe even without the lock) is at most twice as slow as FastInput, which is probably good enough!

        More data would be nice though.

      • »
        »
        »
        »
        6 years ago, # ^ |
        Rev. 6   Vote: I like it 0 Vote: I do not like it

        Hey check this out! By avoiding the Vec<String> allocations, I made it almost as fast as yours! It's ugly though because it has to read line by line, passing around references to a String and a StdinLock. I commented the nicer version which, unfortunately, doesn't compile. Would you care to play against the lifetime checker to see if it can be made to work?

        Edit: improved, but with an unsafe operation...

        Edit2: and the really unsafe dark arts version! It's as fast as you could ever want, I think, and generic so you can use it with any FromStr type.

        • »
          »
          »
          »
          »
          3 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          hello, is their a way to something like "has_next()" to get whether there is non-empty and non-whitespace input remaining or not...???

          If yes, please share it, thanks :D

          • »
            »
            »
            »
            »
            »
            3 years ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            If you just want to know if the current line has more tokens, you can check the buffer's status. I didn't implement this, but you can derive it from any of the existing Scanner implementations.

            However, checking whether there's a next line might stall if stdin is waiting for the user to type something.

»
6 years ago, # |
  Vote: I like it +16 Vote: I do not like it

Regarding

Very few mutable variables.

C++ has const, but it's not the default, and it's inconvenient to use them because of std::cin.

Very strict type-checking.

g++ also have -Wconversion for warning for e.g. conversion from long long to int. I often use it and it's quite useful (although it's annoying that it doesn't recognize that (long long)a*b%MOD can be converted to int safely if MOD is an int.

»
6 years ago, # |
  Vote: I like it 0 Vote: I do not like it

I tried using Rust one year ago. Here's my template: https://codeforces.me/contest/1036/submission/42644334. It's written for Rust 1.15 (atCoder still hasn't updated since) that's why it contains some weird workarounds. It supports reading in tuples, which makes the input very nice in my opinion, especially together with automatic type inference:

fn main() {
    init(); // would not be needed anymore today by using something similar to lazy_static
    let n: usize = read(); // explicitly writing the type
    let mut lines = Vec::new();
    for _ in 0..n {
        let (a, b, c, d) = read(); // type is automatically inferred
        lines.push((Point { x: a, y: b }, Point { x: c, y: d }));
    }
    ...
}

I had to do my own I/O implementation anyways, since stdin() and stdout() are impossible to use efficiently (every time you read or write something, a mutex gets locked and I tested that that's already too slow for some tasks).

Note that I have written my own debug!() macro before dbg!() even existed (or was discussed about).

My two main reason against Rust are the lack of a good codebook (there's this one which is pretty good though: https://github.com/hatoo/competitive-rust-snippets -- it even comes with its snippet library: https://github.com/hatoo/cargo-snippet) and that I am turned down by the old compilers available on CF and atCoder. Now that CF has updated, I might give it another try.

»
6 years ago, # |
  Vote: I like it +21 Vote: I do not like it

The current command line to compile Rust solutions is rustc --edition=2018 -O -C link-args=/STACK:268435456 --cfg ONLINE_JUDGE %1

  • »
    »
    5 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    It'll be really nice if you add text_io crate within compilation so we can use easier io more easily without the need of writing it down by ourselves.

  • »
    »
    5 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Why no second level compiler optimization is used here similar to C/C++?

»
5 years ago, # |
  Vote: I like it 0 Vote: I do not like it

rng_58 it would be great to upgrade version of Rust in Atcoder to 1.35.0 as in Codeforces. The current version (1.15.0), is practically useless and the language has evolved a lot since that.

»
5 years ago, # |
  Vote: I like it 0 Vote: I do not like it

I modified your code and added a reader field for Scanner

pub struct Scanner<U: Sized + Read> {
    pub buffer: Vec<String>,
    pub reader: U,
}
impl<U: Sized + Read> Scanner<U> {
    pub fn next<T: std::str::FromStr>(&mut self) -> T {
        loop {
            if let Some(token) = self.buffer.pop() {
                return token.parse().ok().expect("Failed parse");
            }
            let mut input = String::new();
            self.reader.read_to_string(&mut input).expect("Failed read");
            self.buffer = input.split_whitespace().rev().map(String::from).collect();
        }
    }

    pub fn new(reader: U) -> Self {
        return Scanner {
            buffer: vec![],
            reader,
        };
    }
}

And my test for p71a looks like

#[test]
    fn test_solution_of_p71a() {
        let mut input_file = BufReader::new(
            "4
word
localization
internationalization
pneumonoultramicroscopicsilicovolcanoconiosis
"
            .as_bytes(),
        );
        let mut out_file = BufWriter::new(Vec::new());
        solution_of_p71a(&mut input_file, &mut out_file);

        assert_eq!(
            "word
l10n
i18n
p43s
",
            String::from_utf8(out_file.into_inner().unwrap())
                .unwrap()
                .as_str()
        );
    }

The stdin can be replaced with a string reader, so easier to test

  • »
    »
    5 years ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    Yep! My contest submissions are a bit abbreviated, but you can find my full Scanners here.

    I had to use BufRead because interactive problems need to read line-by-line. Reading from Strings is still possible (see unit tests), meeting the BufRead requirement using either as_bytes or Cursor.

»
5 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Would it be possible to add itertools to the available crates when compiling? It's pretty much a must-have, especially when writing algorithms.

»
5 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Anyone uses Rust to solve SGU problems? My solutions in Rust even fail at the simplest ones, e.g., SGU 118 (Wrong answer on test 2). I compared my code with others' in C/C++ but couldn't find any substantial differences. My code:

// The IO part removed for brevity.

fn dr(x: u32) -> u32 {
    if x == 0 {
        // The digital root of 0 should be 0. Some code posted online omit this condition. Anyway,
        // it doesn't seem to make a difference.
        0
    } else {
        let d = x % 9;
        if d == 0 {
            9
        } else {
            d
        }
    }
}

fn digital_root(a: Vec<u32>) -> u32 {
    dr(
        a
        .iter()
        .scan(1, |p, &x| {
            let x = dr(x);
            *p = dr((*p) * x);
            Some(*p)
        })
        .sum()
    )
}

fn solve<B: BufRead, W: Write>(mut scan: Scanner<B>, mut w: W) {
    let k: usize = scan.token();
    for _ in 0..k {
        let n: usize = scan.token();
        let mut a = Vec::with_capacity(n);
        for _ in 0..n {
            let x: u32 = scan.token();
            a.push(x);
        }
        writeln!(w, "{}", digital_root(a)).expect("failed write");
        w.flush().expect("failed flush"); // not necessary?
    }
}

Could anyone take a look? Thanks.

»
5 years ago, # |
  Vote: I like it +8 Vote: I do not like it

[submission:https://codeforces.me/contest/793/submission/77476009]

Byte by byte reading using macros.

input! {
    n: usize,
    v: [i32; n],
}

Reads n and Vec<i32> of size n. That's all what you need to write to read an input.

»
4 years ago, # |
Rev. 4   Vote: I like it 0 Vote: I do not like it

MikeMirzayanov, can codeforces add proconio crate for rust just like what atcoder does? This will be useful for competitive programming in rust.

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

When I solved this problem, my optimal program in Rust fails because of TLE.

I tried fast I/O from here, and found that fast writing save more time than fast reading, so I suggest to use write!() instead of println!() because it's faster. Use it this way:

// Run this once in start of main
let stdout = std::io::stdout();
let mut writer = std::io::BufWriter::new(stdout.lock());

// Use this everytime you need to print
writeln!(writer, "{}", result).unwrap();

My solution with println!() works more than 1000ms and gets TLE, but the same solution with write!() works 171ms.

Also, when I added the fast read from this comment, I get 140ms instead of 171ms.

»
3 years ago, # |
  Vote: I like it 0 Vote: I do not like it

How do you deal with random values for several tasks like maintaining treaps, or just randomized solutions?

»
3 years ago, # |
Rev. 2   Vote: I like it +8 Vote: I do not like it

Any insights about Rust-vs-C++ performance from people who've tried it out in programming contests?

As far as I know, in theory either can happen: Rust can outperform C++ in some aspects (based on the compiler having better understanding of lifetimes, ranges, etc.), but can also be behind in others (e.g., in case one uses array indices extensively and the compiler can't optimize away the boundary checks). Also the differences can stem from different standard library algorithms: Rust's BTreeMap B-Tree vs. C++'s std::map red-black tree (usually).

But I'm curious whether anyone could share real-world, non-synthetic data, e.g., from solutions with segment trees, complex DP, etc.?

  • »
    »
    3 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Some time ago I was also interested in whether Rust can indeed outperform C++ in lifetimes and so on, so I decided to test and came up with a benchmark. The benchmark idea was to take $$$10^7$$$ roughly random numbers (powers of 3 modulo $$$2^{32}$$$), push them all into skew heap and pop in sorted order, while computing some hash. The idea was that Rust's mem::replace and similar functions that are useful when you rotate trees should do better job (in theory) than what you usually do in C++, and therefore Rust should be slightly faster.

    And it turned out that it is not. Moreover, my first Rust implementation works ~27s while my sample C++ implementation (with unique_ptr inside) works ~17s (10s of difference, not a typo). Then, after some experiments, I found an implementation in Rust that works ~17s. The difference was using manual casework instead of mem::swap inside merge procedure (see the code below, function subheap_merge). The interesting thing is that both manual casework and std::swap work the same time in C++! I spent a lot of time trying to understand why mem::swap gives 60% slowdown in Rust, but I still have no clue (and thus, I would be very grateful if someone could explain this to me).

    Here is the minimal working example of the effect. The difference between fast and slow versions of subheap_merge is only one line.

    MWE

    As for the comparison C++ vs Rust. The fastest Rust implementation I came up with was slightly faster than sample C++ implementation, with the difference less than the average fluctuation. However, I am more concerned about the unexpected 60% slowdown which is not present in C++. All the results (and the slowdown) are consistent on both my PCs.

    • »
      »
      »
      2 years ago, # ^ |
        Vote: I like it +8 Vote: I do not like it

      Do you still have the C++ code that was used for this comparison? Was it compiled by Clang++ using the same LLVM version as Rust or have you used GNU C++?

      GCC and LLVM are both modern state of the art code generation backends and generally provide similar performance in practice, but in each individual case one of them may be faster than the other. By the way, GCC will eventually have Rust as one of the supported languages too: https://www.phoronix.com/news/GCC-Rust-v4-Cleared-For-Landing

      • »
        »
        »
        »
        2 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        I don't, but it was pure line-to-line translation with std::unique_ptr playing the role of Option<Box<Node<T>>>. I think, I can reimplement it, if it is really necessary. Or you can even do it yourself.

        For most of my tests I used g++, but I ran clang a couple of times and it was giving roughly the same results. Not sure if LLVM backends had exactly the same versions, I was simply using the most modern versions of compilers that stable Debian was able to provide at the moment.

        Anyway, I believe that the effect persists across the versions of the compilers, so you can choose whatever you want for your investigation.

        • »
          »
          »
          »
          »
          2 years ago, # ^ |
            Vote: I like it +10 Vote: I do not like it

          I have profiled your fast/slow Rust code with rustc 1.66.0 on Intel Core i7 860 and compared them side by side.

          The IPC is low and this is caused by a lot of cache misses. An interesting thing is that the results are rather unusual, because the "slow" version was supposed to be actually better by all metrics when looking at the hardware performance counters (fewer mispredicted branches, fewer executed instructions, fewer cache misses). But cachegrind shows nearly identical number of cache misses for both slow and fast versions on an emulated processor and this is what we would normally expect.

          Comparing the actual generated instructions:

          So for the "fast" version LLVM generates a conditional jump and for the "slow" version it generates a pair of CMOVs. The choice between conditional branches and CMOVs in the LLVM generated code is a rather hot topic and in some cases CMOVs are preferable.

          Why are CMOVs causing a slowdown in this test even though they reduce the number of mispredicted branches? The answer is that CMOV instructions block speculative execution. So the processor is just waiting in the case if a CMOV needs some value that is still being fetched from memory. Whereas a conditional branch gets predicted and the execution continues doing various stuff and reading data from memory. If branch prediction was wrong and the execution took a wrong path, then the results of this speculative execution are discarded. But the data that had been fetched into the CPU cache while executing the wrong path still remains in the cache and manifests itself as more cache misses reported by perf stat. This also enables Spectre.

          The problem is not in the Rust frontend, but in the LLVM backend. How can we tell LLVM not to use CMOVs here? I tried to find an answer, but the only usable workaround that had any effect was to compile it as rustc --target=i586-unknown-linux-gnu -O test_slow.rs. The old 32-bit i586 processors did not support CMOV instructions.

          • »
            »
            »
            »
            »
            »
            2 years ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            Cool! And why it did not happen with clang then? Pure luck? Some tricky C++ optimization?

            • »
              »
              »
              »
              »
              »
              »
              2 years ago, # ^ |
                Vote: I like it 0 Vote: I do not like it

              The version of LLVM used by Clang could be possibly different or it was indeed just pure luck. But just like Rust, without the -mllvm -x86-cmov-converter-force-all=true option Clang tends to generate CMOV instructions too. See the code generated for functions foo1 and foo2 at https://godbolt.org/z/3TPhjf1fP (the whole program can be used as a benchmark).

  • »
    »
    3 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    It's hard to say as you usually won't have both c++ and rust solutions naturally. I had not had any unexpected problems with getting tl in rust where I had not expected it

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Although one time I had problem with fitting my solution with segment tree and treap into tl I wrote same in C++ and it was 1 second (out of 7) slower than my rust solution

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +12 Vote: I do not like it

    I saw a couple of times when Rust is slower than C++ if you are not careful enough (but could be optimized to have the same speed as C++ if you actually receive TL).

    For example if you try to implement the Floyd-Warshall algorithm with Vec<Vec<_>>, it will be ~1.5 slower than C++ version just because on every g[i][j] Rust does additional bounds-checks. Usually Rust can eliminate such checks, so they don't add almost any overhead, but in case of 2-dimensional arrays it doesn't work very well. But it is possible to "prove" to the compiler that checks are not neccessary.

    Another example — if you have a lot of short strings, C++ will use small string optimization, while Rust doesn't have it, so you potentially could get ML (or TL because it doesn't fit into the cache of some level).

    One more difference — by default in Rust you need to use usize (which is usually 64 bits) when you want to store some index, while in C++ you probably use just int (which is 32 bits). For example it is natural to store graphs as Vec<Vec<usize>> in Rust, which takes x2 memory compared to vector<vector<int>> in C++.

  • »
    »
    3 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    While I haven't studied this scientifically, my experience on Codeforces suggests that "natural" Rust code (meaning, without crazy/unsafe low-level optimizations) is about equally likely to be faster or slower than natural C++, so that overall it's about as competitive as C++. Some of the standard library algorithms (like sort_unstable) and data structures (sets and maps) are really fast!

    The only bit of unsafe I actually use in contests is my I/O Scanner: I don't trust myself to write new unsafe code under time pressure.

»
2 years ago, # |
Rev. 2   Vote: I like it -10 Vote: I do not like it

Cool. As a Rust fan, I'm really thankful for this blog. By the way, I have a question.

How to write interactive problems using Rust

I had this question but I couldn't get any response from the community. If anyone knows, please help https://codeforces.me/blog/entry/109692