A Type of Tests That Might Be Effective for Many Multi-Test Problems

Правка en1, от djm03178, 2024-03-08 19:36:37

I've always loved hacking in CF rounds, especially for open hacking and uphacking. Recently, I've been focusing on hacking for TLE because there are plently of solutions that can pass every single randomly generated tests but can still have broken worst case time complexity and the tests for that can only be found under deep investigation on the code.

One of the findings that I saw during hacking in recent rounds, is that there is a fairly common mistake that many people make but most problems don't have tests against it. The conditions for such mistakes are:

  1. The problem must be a multi-test problem (around $$$10^4$$$ or more if possible).
  2. The constraint on the $$$n$$$ variable of a single test should be as large as the constraint on the sum of $$$n$$$ in all tests.
  3. The solution has both of these slow aspects described below but none of them are that slow to get TLE when only one of them is attacked.

Condition #1 and #2 are very common in recent rounds so it's not hard to find two or more problems of such types in every round. Condition #3 is usually found when you just sort accepted solutions by execution time.

So, the slow aspects are:

  1. Slow operations for every test case: Most commonly, using endl (or just alternating cin and cout without tie(0)) or initializing a whole array with memset or defining a vector with maximum $$$n$$$ size, etc.
  2. Bad time complexity or bad constant but still fitting in TL.

Now you can see what I'm talking about. Most such problems have tests that are against #1 or #2, but not both. #1s are just easily be attacked by any random test with maximum number of test cases. Attacking #2s may require further preparation of the writer to come up with specific edge cases, but usually it is pretty well-done. However, I have never seen a round myself where they prepared a case that can counter both.

For example, let's see a problem from a recent round: 1923D - Slimes. The constraint on the number of test cases is $$$10^4$$$ so the #1 constraint is satisfied, and we have $$$n \le 3 \cdot 10^5$$$ so the #2 constraint is also satisfied.

Now let's take a look at one of the hacks I made for it: https://codeforces.me/contest/1923/hacks/998199. If you look at the submission (247991883), you'll see that the solution was accepted before the hack while the original test set already had a test with maximum number of tests (test #2-4) and various maximum $$$n$$$ tests (tests #8~). The maximum $$$t$$$ test took 1794 ms (test #3) while the maximum $$$n$$$ test took 639 ms (test #18).

The reason why it took long on maximum $$$t$$$ is simple: it calls con() for every test, which does some calculations that take $$$O(N)$$$ time where $$$N$$$ is the maximum $$$n$$$ possible. Therefore, just by having $$$10^4$$$ tests the code will perform like $$$10^4 \cdot 3 \cdot 10^5$$$ lightweight operations, but it's still fitting in TL. It is likely that test #3 and #4 will also reach almost at the limit of sum of $$$n$$$ in all test cases, but they really didn't add up much, because each $$$n$$$ is too small.

For maximum $$$n$$$ tests I didn't even try to find anything special about the worst case test, though if anything something with a series of small answer would have been more effective by the looks of the tests.

So, here's what I did: https://codeforces.me/contest/1923/hacks/998199/test. If you look at the generator, it's simply making $$$9999$$$ test cases with $$$n=1$$$, and a single $$$n=300000-9999$$$ test which wasn't even against its weakness (it was made for another submission before). A $$$n=290001$$$ test shouldn't be much different from a $$$n=300000$$$ test, but having $$$t=10000$$$ itself caused huge slowdown. So you know what happened: Successful hacking attempt.

Similar situations can occur in almost every problem with similar constraints. Therefore, I think it is something that future writers should consider: Add several tests of such type for such problems. I hope this blog would help strengthening main tests against these solutions that really should not pass even without my hacks.

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en2 Английский djm03178 2024-03-08 21:58:29 5
en1 Английский djm03178 2024-03-08 19:36:37 4133 Initial revision (published)