Some concerning trends related to system test duration

#	User	Rating
1	tourist	4009
2	jiangly	3823
3	Benq	3738
4	Radewoosh	3633
5	jqdai0815	3620
6	orzdevinwang	3529
7	ecnerwala	3446
8	Um_nik	3396
9	ksun48	3390
10	gamegame	3386

#	User	Contrib.
1	cry	167
2	Um_nik	163
3	maomao90	162
3	atcoder_official	162
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	Dominater069	153
9	nor	153

Hello all,

While waiting to submit my solution for last div3's G, I realized that the system tests last way more as of late in div3, div4 and educational rounds. System tests started around 5 hours ago and we are only around 00:50 into the contest, as of the time I am writing this blog.

Although some of the difference is caused by the ever increasing number of contestants for these rounds, this still doesn't explain what seems to be the huge downgrade of the judging servers. From my memory, this phase used to not last more than 1h, maybe 2h in exceptional cases.

Educational Codeforces Round 166 (Rated for Div. 2) System tests lasted around 5 hours

Codeforces Round 946 (Div. 3) System tests lasted around 3 hours

Codeforces Round 944 (Div. 4) System tests lasted around 8 hours

Codeforces Round 943 (Div. 3) System tests lasted around 3.5 hours

Here is a series of 4 contests around 1 year ago

Codeforces Round 898 (Div. 4) System tests lasted around 2.5 hours

Codeforces Round 895 (Div. 3) System tests lasted around 2 hours

Educational Codeforces Round 154 (Rated for Div. 2) System tests lasted around 1 hour

Codeforces Round 894 (Div. 3) System tests lasted around 30 minutes (Probably due to few hacks)

While this is usually not a huge problem, it can lead to situations where in case of back to back rounds, people get assigned to wrong divisions and in general it leads to an annoying experience for the users.

As some suggestions, maybe the hacking phase can be reduced to 8 hours so that the system tests can be hold around what would be early morning in Europe, when the user activity is the smallest? I guess another option would be to be more selective as far as hack tests go so that we don't see 100 tests which test the same thing (i.e: unordered map hacks in the case of this round)?

I think finding some solution to this problem is very important for the purpose of future rounds.

UPD: Codeforces Round 964 (Div. 4)'s system testing beat all the records with over 10 hours, as shown by these judging times from my friends' submissions 274711443 and 274945800, however Div. 1 and Div. 2 rounds seem to generally have reasonably fast system tests, does anyone know why is this the case?

Comments (35)

Write comment?

SirPh

6 months ago, # |

the servers are just rusty :(

→ Reply

2020TAndr

6 months ago, # ^ |

-57

I do not believe Rust solutions are the issue)))

Though there can be slow solutions and algorithms in any language..

bgopc

bro rusty means old, as old as dust(the particle of dirt) gathered on it, not that it used rust programming language

pathetique

5 months ago, # ^ |

+10

no he literally meant rust is bad

It's not bad, it's shit.

Raiden1507

Agreed. Even We have to wait if we want to upsolve and submit.

DP_is_tough

Agreed to your point totally . This is a very concerning issue.

paramveer5404

Yeah they shouldn't add all the the hacks to the main test cases. The C problem has 150 test cases but most of them are for Unordered Map. They can try ignoring similar purposed hacks by categorizing them or something. Now the C problem is taking over a minute per submission to be tested.

riteshfyi

can you explain why does unordered map causes TLE in the Q3.

https://codeforces.me/blog/entry/62393 You can read this blog about it. It is mostly about how an unordered map stores data in a hash table and how collisions can make data extraction O(n) instead of O(1).

ad1tya_kumar

-8

Its quite surprising but worst case of unordered_map is O(n) its better to use normal maps sometimes, its much more consistent for large test cases.

-is-this-fft-

In my experience, in competitive programming it's almost always better to use normal maps.

Yes, theoretically unordered_map does in $$$O(1)$$$ what map does in $$$O(\log n)$$$, but the constant is large. In problems where you have a tight time limit and e.g. $$$O(n \log^2 n)$$$ is too slow, replacing maps with unordered_maps and getting it down to $$$O(n \log n)$$$ is still too slow. There are maybe 4-5 times when I have successfully gotten below the time limit by just changing map to unordered_map.

And the theoretically better time complexity is really the only advantage unordered_map has over map. On the other hand, map can support lower_bound, is correctly ordered when you loop over it, and is not prone to adversarial cases.

With unordered map used correctly and with every performance boost possible, you can at best make the running time at half, but so many people just spam unordered map without any optimizations and hashing that it makes it 3x worst than better. there should be a rating limit on using unordered map

Fork512Hz

How can you categorize similar hacks? I don't know. Maybe some neural network magic?

stefdasca

-25

Most unordered map hacks are done via some script, that should simplify things a lot.

Vladosiya

+52

Just because C has around 100 tests after hacks. I guess 90+ of them are antihash tests.

+15

This explains some of the issues but at the same times in the other cases mentioned there were not that many hacks. Maybe in the future add some pretests to avoid such situations? Though in a way people learn better if they see themselves being hacked from such a mistake.

Again, the point of the blog is not directed at you or your colleagues because it's not your fault. The round was pretty nice overall and I enjoyed solving the problems.

djm03178

+24

The latest Educational round was a special case, because not long after starting system test we had Codeforces Round 949 (Div. 2), so it could only be very slow during that time. Also, when we had Codeforces Round 944 (Div. 4), 2023 Post World Finals Online ICPC Challenge powered by Huawei was also on the way which took like forever to test one solution.

In Codeforces Round 943 (Div. 3) we had 193 tests on G1 and 69 on G2, and in Codeforces Round 946 (Div. 3) we had some good number of TL hacks on B and C and especially problem C had very time-consuming solutions with 4 seconds of TL.

It also depends on many other factors as well, not just the number of hacks. Basically system test gets really long when 1. a problem that many people solved 2. of which the execution time is normally long 3. gets hundreds of TL hacks. This time the hacks on problem C over-satisfies all of the conditions by a huge amount. I could see submissions on C were still running in like page 15 when problem A or B solutions were accepted in page 1.

vstiff

-7

Any problem that has a slightest chance to be solved with hash map should have a warning that it will TLE. Nobody benefits from scrolling pages of comments about it and unnecessary cpu load on testing servers.

AkiLotus

Having that would be practically similar to having a warning of some sorts such as

"GNU C++ rand() might not works properly with large ranges"
"string hashing with modulo $$$2^{64}$$$ is known on Codeforces to be easily breached"
etc.

, which on hindsight isn't too bad, but in reality, turns terrible as it directly gives away one possible approach for the problem in question. It's not like the kind of warning that states "answers might not fit in 32-bit integer representation", as most of the times it's just to save some minor hassles and it doesn't really distinguish between a wrong and correct approach.

It's not that obvious in many cases that a solution would involve hash table, unordered_map for counting is a thing, but what about checking appearances through unordered_set? And that was just one possible counterexample.

Good point, then this notification should be for every task, or round announcement. Like it was about 64 bit integers, about large input and so on until it basically became well-known and non-existent.

Then the reverse problem might appear though. Something like "hey, why are you warning me about string rolling hash incorrectness in a full-fledged number theory problem?"

Like, the kind of warnings I disagreed above are of very specific niches and not some general usages, so warning about them in all tasks/rounds seem pretty wacky in another sense.

physics0523

+51

TBH too long systest is actually harmful. In CF, we can't submit any solutions during the systest(if it's a usual round, we can't submit anything from the end of the official round until the systest ends, and it takes 1-2h usually).
Very personal opinion, but this quarantine period demotivates me from doing the review of the round.

Umi

I believe this can (or at least could) be bypassed by having a mashup contest and adding the problems there.

Can anyone confirm if this work?

alexwice

+13

The 12 hour hack period should just go. Nowadays, the meta is to have strong pre and sys tests. So most hacks are hacking unordered_map/dict. In my opinion this is not appropriate, especially for Div3 or Div4 level. Imagine being a 1500 level new member and getting your otherwise correct dict solution hacked because you didn't use a Wrapper(int) class. Why do we allow this? Why such an emphasis on hacks? This was maybe a thing in the topcoder era, to copy topcoder's format, but now it doesn't need to be here. Everytime I brought this up I was met with insane amounts of resistance, I guess by those that love breaking some kid's day over not knowing the requisite arcane lore. It's like hazing or something.

SuperJ6

← Rev. 2 →

I love breaking kid's day over not knowing requisite arcane lore.

I think this is clown every time you bring it up. (I agree with stef idea to limit dup tests or similar)

Mazga

How do they hack? Dict breaks?

ishwarendra

In ~~all~~ most of the Division 3 and 4 rounds, it is always those anti-hash testcases. Maybe it is better if we can start including these tests in the pretest itself.

carolinux

Yes, in my opinion it's better that a participant gets a TLE during the contest, where they have the chance to ponder what could be causing it, than a hack that appears 24 hours later.

I do advocate the idea of adding exploitative tests into original testsets, since as it's under one umbrella's control, it's more likely to have ~5-10 tests that covered most of the known implementations needed to be exploited (i.e. different compiler versions and different envs) — especially so when multitest is a thing now that we could stack stuff upon each other, maybe two $$$10^5$$$ tests packed in a file for a $$$2 \cdot 10^5$$$ limit.

However, I still feel like this idea is lacking. In other kinds of heuristics/unstable methodologies that implementations tend to be more creative, i.e. string rolling hashes, or exploits upon questionable aspects i.e. compiler alloc bugs, the choice of adding the tests from the start may raise some ethical issues and/or turn out to be too wide to handle, the latter case practically makes the idea defeat its own purpose.

Heh. The thing is a dilemma.

Md_Pranto

is there any website more popular than codeforces ? I don't think so .

Newbie_99

I don't think so.

Piyash_

Leet code , Codechef and Atcoder also popular .

Is it possible for problemsetters to explicitly disable hacks for some problems in favor of hashing solutions and reducing server load?

But this might work as a hint for participants：if a string problem says hacks are disabled, it's a clue that hashes should be used.

4 months ago, # |

Auto comment: topic has been updated by stefdasca (previous revision, new revision, compare).

stefdasca's blog