How do authors ensure that their solutions are correct? - Codeforces

→ Pay attention

Before contest
Codeforces Round 986 (Div. 2)
10:23:00
Register now »

*has extra registration

→ Streams

Codeforces Round 986 (Div 2) - Solution Discussion

By Shayan

Before stream 12:27:58

View all →

→ Top rated

#	User	Rating
1	tourist	4009
2	jiangly	3821
3	Benq	3736
4	Radewoosh	3631
5	jqdai0815	3620
6	orzdevinwang	3529
7	ecnerwala	3446
8	Um_nik	3396
9	ksun48	3388
10	gamegame	3386

Countries | Cities | Organizations

→ Top contributors

#	User	Contrib.
1	cry	164
1	maomao90	164
3	Um_nik	163
4	atcoder_official	161
5	-is-this-fft-	158
6	awoo	157
7	adamant	156
8	TheScrasse	154
8	nor	154
10	Dominater069	153

View all →

→ Find user

→ Recent actions

Detailed →

mazihang2022's blog

How do authors ensure that their solutions are correct?

By mazihang2022, history, 3 years ago, In English

In English

Is it possible to hack authors' solutions?

Tags

question, hack, contest

+38

mazihang2022
3 years ago
6

Comments

Comments (6)

Write comment?

»

3 years ago, # |

Vote: I like it

+54

Vote: I do not like it

Sometimes it is possible to hack the authors' solution, as seen here. However, it is generally rare, as there are many people (the coordinator and the testers) who read the problem and the solution and verify that they are correct.

Another way that authors ensure that the solutions are correct is by writing an exponential brute force, and checking their solutions against thousands of small cases. This is usually enough to catch most errors.

→ Reply

»

»

3 years ago, # ^ |

Vote: I like it

+4

Vote: I do not like it

So will the contest be unrated if this happens?

→ Reply

»

»

»

3 years ago, # ^ |

Vote: I like it

+17

Vote: I do not like it

It depends on how much it affects the contest. For example. when issue is fixed quickly enough (by changing solution or problem statement) or when error in solution appears only on systests and correct solution exists, contest will be rated.

→ Reply

»

3 years ago, # |

← Rev. 2 →

Vote: I like it

+4

Vote: I do not like it

Multiple solutions: different authors, various approaches (including slow ones).
Stress-testing.
Setter should write down a proof (if any) and somebody else should check it.

→ Reply

»

»

3 years ago, # ^ |

Vote: I like it

+13

Vote: I do not like it

Also, reference solutions with added bugs to ensure tests actually catch them.

→ Reply

»

3 years ago, # |

Vote: I like it

+49

Vote: I do not like it

For Codeforces, the problem selection procedure is pretty high quality, so the chance of this happening is very low. What follows below is only from my experience with problemsetting/coordinating for my uni contests (which spans about a dozen contests or so, and about a hundred problems).

Whenever I coordinate contests for my university CP club contests, I ask problemsetters to submit a formal mathematical proof of correctness of the intended solution as well as implementation (line-by-line, if it isn't standard) before setting the problem on Polygon (for instance, proof of claims needed for greedy, dp and binary search problems; structural properties for graph problems and so on). Any problem that I have ever approved using this strategy has had a correct solution.

From my experience, the more formal (and low-level) the proofs are, the less probable it is to make mistakes and miss out corner cases. For instance, while a high-level proof might have a chance of missing the fact that there are no cycles with length $$$\le 2$$$ in an undirected graph but there are such cycles in directed graphs, a low-level proof will definitely not miss out that detail. Maybe this arises out of my mathematical inclination, but I've seen that the more "machine-verifiable" a plausible proof seems to be, the more the chances are that it is correct.

An interesting thing that we've been thinking of implementing is to somewhat formalize the test-generation pipeline (since the judge has the responsibility to minimize false positives and eliminate false negatives), which is overlooked by quite a few new problemsetters. Testing with a brute-force solution also helps detect any incorrect author's solution, so it is imperative that the tests comprise a diverse enough subsets of all permissible inputs which can eliminate false positives and negatives on most sane approaches, even for the author's solution. Some of the things we currently think of while making tests are as follows, but they are just heuristics and not totally formal:

TLE tests for both heuristic solutions as well as plain dumb solutions (using stress tests)
WA tests for stuff like greedy/dp, for heuristic solutions with the wrong greedy or dp
Exhaustive tests for small cases
Random tests (with certain structural properties, so not completely random) to account for cases we can't manually find
RE tests (out of bounds, overflow)
Ensure that if the problem has subtasks, then if a subtask A generalizes subtask B, then tests of subtask A should contain those of subtask B.
Look through the proof of the author's solution (as well as the tester's solution and others who have solved the problem), and consult a list of generic mistakes that people make while implementing various algorithms (for instance, for binary search, people sometimes write an infinite loop, make off by one errors, miss out the boundaries and so on), and modify some pre-built tests to fit that problem.

If there are some more test-generation ideas that I missed out on, or some strategies that can "formalize" this procedure to a good extent, please let me know!

→ Reply