An iterative AI-solving framework may boost o1's cheating ability on Codeforces.

I should say up front that I have no idea how exactly OpenAI tested the o1 model on IOI and Codeforces contests. This framework may not work, or they may have already tried it.

Here are some facts:

1. o1 performs relatively poorly at IOI with 50 tries per problem.
2. o1 achieves an IOI Gold Medal with 10000 tries per problem.
3. o1 only achieves a 1807 rating on Codeforces (far from IOI Gold Medal level).
4. According to a community survey (https://codeforces.me/blog/entry/133887), o1 can solve some very hard problems (rated 2700) but also fails some very easy ones (rated 800).
5. Codeforces rules keep o1 from making too many tries, since rejected submissions are penalized.

Facts 4 and 5 may explain why o1 only reaches 1807 on Codeforces. The difference between IOI Gold and 1807 is that IOI rules provide no-cost validation, so the final score on a task is the maximum over all tries.
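To see why free validation matters so much, here is a back-of-the-envelope model under a strong simplifying assumption that is not from the post: each try independently solves a task fully with some probability p. When the score is the maximum over k tries, the task is fully solved unless every single try fails:

$$P(\text{full score with } k \text{ tries}) = 1 - (1 - p)^k$$

For an illustrative p = 0.001 (not a measured value for o1), k = 50 gives about 0.049, while k = 10000 gives about 0.99995. On Codeforces, wrong tries are penalized instead of being free, so this max-over-tries effect is not available.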

I believe OpenAI did not pay much attention to overcoming the submission limits of Codeforces; they may simply have generated 50 or 10000 programs independently. If so, o1's potential for cheating is being held back, and once unlocked it could soon threaten higher-rated players.

The key question is: is there a way to validate each piece of code without submitting it? Yes.

Any well-trained competitive programmer or OIer will recognize the standard practice from contests where participants can submit only once. They write a test generator, a correct but slow brute-force solution, and their final solution, then keep comparing the outputs of the two on generated tests until either a difference shows up or a large batch of tests passes.
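A minimal stress-testing sketch in Python, assuming a hypothetical toy problem (maximum subarray sum) that is not from the post; `brute`, `fast`, `fast_buggy`, `gen`, and `stress` are illustrative names. In a real contest the solutions would usually be separate programs, but plain functions keep the idea visible:

```python
import random

# Hypothetical toy problem (not from the post): maximum subarray sum of a
# non-empty array of integers.

def brute(a):
    # O(n^2): try every subarray; slow but obviously correct.
    best = a[0]
    for i in range(len(a)):
        s = 0
        for j in range(i, len(a)):
            s += a[j]
            best = max(best, s)
    return best

def fast(a):
    # O(n) Kadane's algorithm: the "final solution" under test.
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def fast_buggy(a):
    # Same idea with a classic bug: starting `best` at 0 means the answer is
    # never negative, so every all-negative array gets the wrong answer.
    best = cur = 0
    for x in a:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def gen(rng, max_n=8, max_v=10):
    # Small random tests: a failing case found here is easy to read.
    n = rng.randint(1, max_n)
    return [rng.randint(-max_v, max_v) for _ in range(n)]

def stress(reference, candidate, generator, iterations=1000, seed=0):
    """Return the first generated input where the outputs differ, or None."""
    rng = random.Random(seed)
    for _ in range(iterations):
        test = generator(rng)
        if reference(test) != candidate(test):
            return test
    return None

print(stress(brute, fast, gen))        # None: no difference found
print(stress(brute, fast_buggy, gen))  # an all-negative array
```

Keeping the generated tests small is a deliberate choice: when a mismatch appears, a short failing input is much easier to debug.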

Brute force is almost always easier to write, and extremely slow brute-force solutions, such as exponential-time ones, can hardly be wrong. Solving problems iteratively like this is common experience for all of us.
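To illustrate why exponential brute force is hard to get wrong, here is a sketch for another hypothetical toy problem (maximum sum of a subset with no two adjacent elements, empty subset allowed). The bitmask enumeration is almost a direct restatement of the problem; for tiny inputs it can even be checked against a faster DP on every possible array, not just random ones:

```python
from itertools import product

# Hypothetical toy problem (not from the post): choose elements with no two
# adjacent, maximizing the sum; the empty subset (sum 0) is allowed.

def brute_exponential(a):
    # O(2^n * n): every subset is a bitmask; skip masks with adjacent bits.
    n = len(a)
    best = 0
    for mask in range(1 << n):
        if (mask & (mask >> 1)) == 0:
            best = max(best, sum(a[i] for i in range(n) if (mask >> i) & 1))
    return best

def dp_linear(a):
    # O(n) DP: take = best sum with the current element chosen,
    # skip = best sum with it not chosen.
    take, skip = float("-inf"), 0
    for x in a:
        take, skip = skip + x, max(take, skip)
    return max(take, skip)

def first_mismatch(max_n=5, values=range(-2, 3)):
    # Exhaustive check: every array of length <= max_n over `values`.
    for n in range(max_n + 1):
        for a in product(values, repeat=n):
            if brute_exponential(list(a)) != dp_linear(list(a)):
                return list(a)
    return None

print(first_mismatch())  # None: the DP agrees on every small input
```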

So the simple framework works like this:

Step 1: generate an exponential solution and validate that it passes all given pretests.
Step 2: generate larger pretests and use the exponential solution to validate a newly generated O(n^2) solution.

...

Final step: generate full-scale pretests and use the previous fast solution to validate the final solution.
Then submit.
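A minimal sketch of this staged validation loop, under heavy assumptions: the "generated" solutions are hard-coded Python functions standing in for model outputs, the toy problem is a hypothetical one (maximum non-adjacent subset sum), and `SAMPLES`, `STAGES`, and `run_pipeline` are illustrative names. Each stage is checked against the previous trusted stage on random tests only as large as that previous stage can handle; checking the final solution's running time on full-scale tests is left out:

```python
import random

# Hypothetical toy problem (not from the post): maximum sum of a subset
# with no two adjacent elements; the empty subset (sum 0) is allowed.
# The three functions stand in for solutions an AI model would generate.

def solve_exponential(a):
    # Stage 1, O(2^n * n): enumerate every subset as a bitmask.
    n = len(a)
    best = 0
    for mask in range(1 << n):
        if (mask & (mask >> 1)) == 0:  # no two adjacent indices chosen
            best = max(best, sum(a[i] for i in range(n) if (mask >> i) & 1))
    return best

def solve_quadratic(a):
    # Stage 2, O(n^2): end[i] = best sum of a valid subset whose last pick is i.
    end = []
    for i, x in enumerate(a):
        prev = max((end[j] for j in range(i - 1)), default=0)
        end.append(x + max(prev, 0))
    return max(end + [0])

def solve_linear(a):
    # Final stage, O(n): take/skip DP.
    take, skip = float("-inf"), 0
    for x in a:
        take, skip = skip + x, max(take, skip)
    return max(take, skip)

# Hypothetical "given pretests": (input, expected output).
SAMPLES = [([2, 7, 9, 3, 1], 12), ([-1, -2], 0), ([5], 5)]

# (name, solution, largest n it can handle quickly), slowest first.
STAGES = [
    ("exponential", solve_exponential, 10),
    ("quadratic", solve_quadratic, 200),
    ("linear", solve_linear, 10**5),
]

def random_test(rng, n_max, v=20):
    return [rng.randint(-v, v) for _ in range(rng.randint(1, n_max))]

def run_pipeline(stages, samples, tests_per_stage=100, seed=1):
    """Validate each stage against the previous one; return the last stage's name."""
    rng = random.Random(seed)
    ref_name, ref, ref_limit = stages[0]
    # Step 1: the brute force must pass every given pretest.
    for a, expected in samples:
        if ref(a) != expected:
            raise ValueError(f"{ref_name} fails a given pretest; regenerate it")
    for name, cand, limit in stages[1:]:
        # Next step: random tests as large as the current reference can handle.
        for _ in range(tests_per_stage):
            a = random_test(rng, ref_limit)
            if cand(a) != ref(a):
                raise ValueError(f"{name} disagrees with {ref_name} on {a}")
        ref_name, ref, ref_limit = name, cand, limit  # cand becomes the reference
    return ref_name

print(run_pipeline(STAGES, SAMPLES))  # linear
```

A `ValueError` corresponds to the "regenerate" branch of the framework: either the brute force fails a given pretest, or a faster candidate disagrees with the current reference.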

If the process gets stuck at step 2 for a long time, the exponential solution is likely wrong: generate a new one and ask for more human-made pretests. Validation may take a lot of time and should be accelerated with a multi-threaded strategy. Next-stage solutions can also be generated and validated in parallel.
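A sketch of parallel validation, assuming candidate solutions are separate programs that read stdin and write stdout. Here they are hypothetical Python sources for a toy problem (maximum non-adjacent subset sum), run through `sys.executable`; in practice they would be compiled executables. Because each test launches separate processes, a `ThreadPoolExecutor` gives real parallelism despite Python's GIL. Generating and validating next-stage solutions in parallel is not shown:

```python
import random
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical candidate programs (input: n, then n integers).
BRUTE_SRC = """
import sys
a = list(map(int, sys.stdin.read().split()))[1:]
n = len(a)
best = 0
for mask in range(1 << n):
    if (mask & (mask >> 1)) == 0:
        best = max(best, sum(a[i] for i in range(n) if (mask >> i) & 1))
print(best)
"""
FAST_SRC = """
import sys
a = list(map(int, sys.stdin.read().split()))[1:]
take, skip = float("-inf"), 0
for x in a:
    take, skip = skip + x, max(take, skip)
print(max(take, skip))
"""

def run(src, test_input, timeout=10):
    # Run one program on one test; a crash raises instead of passing silently.
    result = subprocess.run([sys.executable, "-c", src], input=test_input,
                            capture_output=True, text=True, timeout=timeout)
    result.check_returncode()
    return result.stdout.strip()

def make_test(seed, n_max=10, v=20):
    # Deterministic per-seed generator, so workers need no shared state.
    rng = random.Random(seed)
    n = rng.randint(1, n_max)
    values = [rng.randint(-v, v) for _ in range(n)]
    return f"{n}\n{' '.join(map(str, values))}\n"

def check(seed):
    # Return the failing test input, or None if both programs agree.
    test = make_test(seed)
    return None if run(BRUTE_SRC, test) == run(FAST_SRC, test) else test

def parallel_stress(num_tests=32, workers=8):
    """Return the first failing test input found, or None."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for failing in pool.map(check, range(num_tests)):
            if failing is not None:
                return failing
    return None

print(parallel_stress())  # None: both programs agree on every test
```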
