piaoyun's blog

By piaoyun, history, 2 months ago, In English

I must say that I have no idea about the details of how OpenAI tested the o1 model on IOI and Codeforces contests. This framework may not work, or they may have already tried it.

Here are some facts:

  1. o1 performs relatively poorly at IOI with 50 tries per problem.

  2. o1 achieves an IOI gold medal with 10000 tries per problem.

  3. o1 only achieves a 1600+ rating (far from IOI gold medal level) on Codeforces.

  4. According to a community survey (https://codeforces.me/blog/entry/133887), o1 can solve very hard problems (2700) but also fails some very easy ones (800).

  5. Codeforces's rules prohibit o1 from making too many tries.

Facts 4 and 5 may be the reason why o1 only achieves 1600 on Codeforces. The difference between IOI gold and a 1600 rating is that the IOI rules provide no-cost validation: every try gets judged, so the final score is the maximum over all tries.

I believe OpenAI didn't pay much attention to how to get around the submission limit on Codeforces. They may simply have generated 50 or 10000 programs independently. As a result, the potential of AI cheating is suppressed for now, but it could soon become a threat to higher-rated players.

The point is: is there a way to validate each piece of code without submitting it? Yes.

Any well-trained CPer or OIer will easily recall what they do in contests where participants can only submit once. They write a pretest generator, a correct but slow brute-force solution, and their final solution. Then they keep comparing the outputs of the two solutions on generated tests until either a difference shows up or a large batch of tests passes without one.

Brute force is always easier to write, and some extremely slow brute-force solutions, such as exponential algorithms, can hardly be wrong. Solving problems iteratively like this is an experience we all share.
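The practice described above can be sketched in a few lines of Python. This is only an illustration under assumptions of mine: the toy problem (maximum subarray sum), the function names `gen_test`, `brute`, `fast` and `stress`, and the use of plain Python functions instead of separately compiled programs are all hypothetical and not taken from any real contest setup.

```python
import random

def gen_test(rng, max_n=8, max_abs=10):
    """Hypothetical pretest generator: a small random integer array."""
    n = rng.randint(1, max_n)
    return [rng.randint(-max_abs, max_abs) for _ in range(n)]

def brute(a):
    """Slow but hard to get wrong: try every non-empty subarray."""
    n = len(a)
    return max(sum(a[i:j]) for i in range(n) for j in range(i + 1, n + 1))

def fast(a):
    """Candidate final solution: Kadane's algorithm, O(n)."""
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def stress(candidate, reference, rounds=1000, seed=0):
    """Return the first test on which the two solutions disagree, or None."""
    rng = random.Random(seed)
    for _ in range(rounds):
        a = gen_test(rng)
        if candidate(a) != reference(a):
            return a
    return None

if __name__ == "__main__":
    bad = stress(fast, brute)
    print("no difference found" if bad is None else f"counterexample: {bad}")
```

Keeping the tests tiny is deliberate: the brute force stays fast, and any counterexample that comes back is small enough to debug by hand.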

So the simple framework works like this:

  1. Generate an exponential solution and validate that it passes all given pretests.

  2. Generate larger pretests and use the exponential solution to validate a newly generated n^2 solution.

...

  n-1. Generate full-scale pretests and use the previous fast solution to validate the final solution.

  n. Submit.

If the process is stuck at step 2 (validating the n^2 solution against the exponential one) for a long time, the exponential solution is probably wrong: generate a new one and ask for more human-made pretests. The validation process may take a lot of time and should be accelerated with a multi-threaded strategy. Next-stage solutions can also be generated and validated in parallel.
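A rough sketch of this staged framework follows, again under my own assumptions: the toy problem is longest strictly increasing subsequence, the three solutions stand in for model-generated code, and `staged_validation`, `validate` and `gen` are hypothetical names. Solutions are plain Python functions here, so the threads mostly show the structure; with real solutions run as external processes, the workers would actually run concurrently.

```python
import random
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def lis_exp(a):
    """Exponential: try all index subsets, longest first."""
    for r in range(len(a), 0, -1):
        for idx in combinations(range(len(a)), r):
            if all(a[idx[i]] < a[idx[i + 1]] for i in range(r - 1)):
                return r
    return 0

def lis_quadratic(a):
    """O(n^2) dynamic programming."""
    dp = [1] * len(a)
    for i in range(len(a)):
        for j in range(i):
            if a[j] < a[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp, default=0)

def lis_fast(a):
    """O(n log n) with binary search over subsequence tails."""
    tails = []
    for x in a:
        p = bisect_left(tails, x)
        if p == len(tails):
            tails.append(x)
        else:
            tails[p] = x
    return len(tails)

def gen(rng, size):
    """Hypothetical generator: random array with at most `size` elements."""
    return [rng.randint(1, 20) for _ in range(rng.randint(1, size))]

def validate(candidate, trusted, size, rounds, seed):
    """Return a test where candidate and trusted disagree, or None."""
    rng = random.Random(seed)
    for _ in range(rounds):
        t = gen(rng, size)
        if candidate(t) != trusted(t):
            return t
    return None

def staged_validation(stages, rounds=100, workers=4):
    """stages: list of (solution, max_size), slowest first.
    Stage k is checked against stage k-1 on inputs stage k-1 can still handle.
    Returns (failing_stage_index, counterexample) or (None, None)."""
    for k in range(1, len(stages)):
        trusted, size = stages[k - 1]
        candidate = stages[k][0]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            jobs = [pool.submit(validate, candidate, trusted, size,
                                rounds // workers, 1000 * k + w)
                    for w in range(workers)]
            for job in jobs:
                bad = job.result()
                if bad is not None:
                    return k, bad
    return None, None

# The max_size of the last stage is unused; it only documents the target scale.
STAGES = [(lis_exp, 10), (lis_quadratic, 60), (lis_fast, 60)]

if __name__ == "__main__":
    print(staged_validation(STAGES))
```

If a stage returns a counterexample, either the candidate or the solution it was checked against is wrong, which is the "stuck at step 2" situation: the sketch only reports it and leaves regenerating a solution to the caller.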


By piaoyun, history, 19 months ago, In English

I don't mean setting a rating limit, such as allowing only pupils or higher to post a blog or send messages. That's discrimination.

But requiring anyone who writes comments to have participated in at least one Codeforces contest is reasonable, even if they get WA on A and don't solve any problem.

I'm just sad to see so much spam and so many disrespectful comments, which have many causes. People want to post spoofs but don't want to get downvoted, so they create another account. Most of them are children under 14. I hope this requirement can help increase the time cost of doing so.

I think no kid would create an account and then wait for the next contest just to look funny in public.
