There seems to be a general trend of problem-setters going out of their way to ensure that their problems aren't solvable by AI. This obviously restricts the meta to the kinds of problems that AI is bad at, which, in practice, seem to be primarily ad-hoc problems requiring multiple observations. This approach doesn't always (and, in my opinion, rarely does) yield the problems that humans find most enjoyable or natural.
The only motivation for this tendency is to ensure that CF ratings and leaderboards in contests retain some meaning.
What I want to express with this blog:
I think we should stop intentionally making anti-GPT problems and optimize problem-setting for human enjoyment, with a complete disregard for how AI models (and people using said models to cheat) perform on these problems.
I will try to explain why I think so below.
There are only two ways (distinct in their relevance to CP) in which AI's future can play out:
- It becomes smarter than humans, and creating problems which are unnaturally hard for it (relative to other kinds of problems) becomes infeasible, because it no longer has any weaknesses that humans can exploit. This seems like a very probable outcome based on how quickly it has improved in the last few years (I can recall discussions from 2022 in which people made fun of GPT's incompetence and claimed it would never be able to solve even slightly non-trivial problems due to fundamental issues in how AI models work). However, I don't have any real knowledge about how AI works or what the current AI "scene" looks like, so let's not focus too much on whether this is bound to happen.
Let's say it does happen in 5 years. What happens to CP then? Assuming that sites like CF don't go down due to a lack of revenue (I don't actually know how CF generates revenue), we would realize that since there is no way to prevent misuse of AI, we might as well maximize enjoyment for actual humans (enjoyment for humans + cheating > no enjoyment for humans + cheating).
Now, you may ask: how is the future relevant to the present moment? I think it's because most people are subconsciously treating the current decision to make anti-AI problems as going down a path on which AI never makes CF ratings meaningless, at the cost of some eternal but minor decrease in problem quality (akin to making greedily optimal choices in an optimization problem; they aren't looking far enough ahead). If one recognizes that ratings will become meaningless in a fairly short amount of time (5 years being a good estimate), then one views the two choices in a different light:
- Keep making anti-GPT problems, which become increasingly hard to create, and less pleasant for humans to solve, as AI keeps getting smarter, cheaper and faster. This also entails dealing with the countless blogs and comments by crying cyans about how a guy using o69 caused their rating delta to go from +2 to -10. Do this for 5 years, until a pivotal moment arrives when people realize the futility of this charade. Looking back on those years from that point, giving up the prestige and meaning we attach to ratings in exchange for preserving the actual enjoyment we derive from this activity would seem like the right thing to have done.
- Stop caring about misuse of AI and how it affects ratings. Optimize for elegance and enjoyment when creating problems.
In other words, the destination is the same: we can either choose to be slowly dragged to it while throwing a hissy fit, or accept reality and walk to it ourselves, enjoying every part of the journey.
- Somehow, magically, AI hits a "wall" beyond which it cannot meaningfully improve. For us, this would mean that its expected performance rating asymptotes to some value. Keep in mind that this value would still be quite high, since AI currently seems to perform at a rating of at least 2000 in most contests, which places it at the 95th percentile (https://codeforces.me/blog/entry/126802) of human competitors. Once again, I don't want to speculate about how high this value would eventually be, but it seems safe to say it would be at least 2300, implying that the vast majority of human competitors would benefit in some measure from using AI during contests.
Until the "wall" is hit, the intermediate period would be more-or-less identical to the period of improvement in the first scenario. The meta would have to change rapidly, problem-setters would have to put effort into making sure their problems weren't solvable by the SOTA, there would be a lot of volatility, people would constantly complain about AI cheaters and their effects on rating distributions, etc. This period would be as unpleasant as in the first scenario. There are two possible final destinations here:
- We stop caring about misuse of AI and how it affects ratings, and optimize for elegance and enjoyment when creating problems. This is the same endpoint as in the first scenario, and we would realize that we should have stopped caring about AI cheating earlier in this timeline too.
- The CF meta also asymptotes to a certain place, shaped significantly by the limitations of the then-SOTA AI. It is unlikely that the kinds of problems AI would be bad at would be the ones humans enjoy and find natural. This equilibrium also wouldn't be perfect with respect to cheating: there would still be a lot of it, since AI would still be smarter than most humans at all kinds of problems (just worse than a small percentage of humans at certain kinds). Maybe there would be a weird gap between the kinds of problems at offline events like olympiads and at online contests, since the former would likely always optimize for human enjoyment (or at least not consider AI capability when creating problems). It could just be me, but I certainly wouldn't want this wonderful activity to end up in such a place, and would much prefer having gone the other way earlier.
I shouldn't have started writing this at 6 AM, now the sun is out and I don't feel like sleeping.
not reading all that. go ahead and draft this blog because it sucks
here is a better blog:
Instead of crafting problems specifically to thwart AI and preserve contest ratings—which leads to less enjoyable and natural challenges for humans—we should design problems that maximize human engagement, since AI’s inevitable advancement makes such anti-AI measures ultimately unsustainable.
notice how i didn't need to write a 2000 word essay like you
<3
i ain’t reading allat. I asked ChatGPT to summarize the blog:
The text argues that current contest problem-setting is too focused on making problems unsolvable by AI, which leads to contrived, less enjoyable challenges for human solvers. The author contends that instead of deliberately creating "anti-GPT" problems to preserve contest ratings and leaderboards, problem setters should prioritize crafting elegant, naturally engaging problems for humans—even if that means ignoring how well AI or cheaters might perform. They outline two potential future scenarios:
AI Continues to Improve: As AI becomes increasingly capable, designing problems that AI fails at will become harder and more unnatural for humans. This could lead to constant adjustments, complaints about rating manipulation, and an overall unpleasant meta until a tipping point is reached.
AI Hits a Performance Ceiling: If AI eventually plateaus at a high but fixed performance level, human contests might gradually shift back to focusing on human enjoyment, making AI's impact less significant over time.
Ultimately, the author advocates for embracing the inevitable evolution of AI and redirecting the focus toward human enjoyment in problem creation. They predict that although ratings may initially lose their meaning due to rampant AI-assisted cheating, a future equilibrium could be reached where ratings regain value and cheating diminishes, benefiting genuine human competitors.
there is literally only one line of text in bold… also that summary isn’t accurate lol, ask another model
I think your opinion is quite biased since you dislike anti-GPT problems. I kinda have the opposite view, and suspect this might be the more common one.
Also, I don't think this sentence is true lol: "In other words, the destination is the same- we can either choose to be slowly dragged to it while throwing a hissy fit or accept reality and walk to it ourselves, enjoying every part of the journey."
I don't think the people who would throw hissy-fits would be so quick to accept the inevitability of AI beating them. If many problems were GPT-solvable, I bet there would be a lot more hissy-fits (especially if it happened in div1 :p).
Actually, a problem being good/natural is not inherently tied to whether it is anti-GPT. For example, AtCoder has some of the best problems in ARCs and AGCs, and AI still can't solve most of them. So there is definitely something to be said for the possibility of being anti-GPT while still having good problems. But I don't think the quality of Codeforces (Div. 2 at least) can be made as good as ARCs, because of difficulty concerns; so for Codeforces, problems being anti-GPT probably must come at the cost of being less natural, or seeming more artificial than ones that aren't. For example, in the last round, I think problem D's formulation was quite natural (of course, it was standard as well, but quite natural too), but problems B and C seemed very artificial and not so fun to solve (my opinion).
Wait just to clarify, I do like anti-GPT problems, so your example of ARCs and AGCs supports that :p
Your point about something like B and C from last round being artificial is valid for sure though — I personally still prefer B and C (since I solved D really fast in my head due to its standardness, but then just had to spend a while typing it up...), but I can definitely see why some people would prefer D.
I don't like D per se, because it's so standard, but if I had to choose, just from reading the statements, which problem is probably more "fun" to do, I'd choose D. Of course, this doesn't mean that it's actually more fun to do, but at first glance it looks like a better problem. My experience with problems B and C was like so:
I believe this sort of problem can provide good shock value, for example Buying Jewels, which in my opinion is a very troll but very nice puzzle. But if you put such problems at B, people are just going to metagame the solution based on "it's B" anyway. I definitely prefer problems like Round 1000's B, which require at least some algorithmic mindset, instead of just being ad-hoc puzzles (which are trivial at position B anyway).
I think C was not that bad, except the statement, which did not use the real-life model at all ($$$\text{MEX}([a_1, a_2, \dots, a_n])$$$ is a totally reasonable metric to care about /s). But I don't like B that much, because it's the same idea that has been used in so many other problems before ("can I achieve $$$x$$$?" -- yes; now just find when $$$x - 1, x - 2, \dots$$$ are achievable, and you're done), and especially when it's put at position B, where you know it's going to be something like this, it even loses its shock value. A toy sketch of this pattern follows below.
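To make that pattern concrete, here is a minimal sketch built around a made-up toy rule (purely hypothetical, not the actual problem B): first establish the maximum achievable value, then classify every smaller value with a simple condition.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Hypothetical toy rule for illustration only: starting from n, a value x
// is achievable iff x <= n and n - x is even (say each operation lowers
// the value by exactly 2). The solution shape mirrors the pattern above:
//   1) prove the maximum achievable value (here, n itself),
//   2) characterise when x - 1, x - 2, ... are achievable (here, parity).
bool achievable(long long n, long long x) {
    return 0 <= x && x <= n && (n - x) % 2 == 0;
}

int main() {
    long long n = 10;
    for (long long x = n; x >= 0; --x)
        cout << x << (achievable(n, x) ? ": achievable" : ": not achievable") << '\n';
}
```

Once the two observations are made, the implementation is a one-liner, which is exactly why such problems lose their shock value at position B.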
We've never tried to make problems intentionally anti-GPT. It's just that many problems that aren't anti-GPT are, not coincidentally, also bad.
Obviously there was an attempt to reject GPT-solvable problems even before the new rule about restricting AI was made. I'm not sure if that's still the case, but as you can see, many people still complain when problems are solvable by GPT (and thus many people cheat), so it's normal that some problemsetters would also want to avoid GPT-solvable problems.
I kind of agree with this, but I think most Div. 2 users, and in fact a few Div. 1 users as well, are of the "throwing a hissy fit" type. They already make up a majority of the users, so I suspect a majority won't be happy with what you said above. However, this is only my prediction, and I would be interested to know the statistics on how many people fall into each of the three categories below:
I think most people fall into the third category above, but again, I would be interested to know the statistics on this.
you can use this to turn your comment into a poll
Me too. Let's make one now.
So apparently I was wrong, and most people think otherwise...
Well, that's actually good news! I think the more people believe in the second option, the better it is for the overall future of competitive programming as a whole.
generally agreeable blog. as a tester of some recent rounds, i've seen many good problems get rejected just because AI solves them, which i have had mixed feelings about. on one hand, it somewhat degrades the problemsets' quality, but on the other hand, it feels kind of "necessary" that some action be taken to try and prevent these AI models from performing too well.
that being said, i believe one of the main reasons authors may feel compelled to make their round "ungptable" is the minimal, often nonexistent, transparent action taken by codeforces itself to prevent cheating and maintain contest integrity. we know very little about how the rules are actually being enforced. sometimes it feels unimpactful, sometimes unreasonable, sometimes impossible. there are blogs concerning the use of AI after almost every contest, and it is mostly well known that these blogs do absolutely nothing. it was incredibly disappointing to see some blatantly obvious cheaters with high rankings go unpunished in recent rounds. this is why i put "necessary" in quotations; it only feels so because you barely see any reactive anti-cheating measures.
of course, cheating isn't and will never be 100% preventable, nor should it ever significantly impact the joy of CP, but this doesn't mean efforts to detect or prevent it are useless. that said, cheating prevention should be handled through active moderation and anti-cheating systems, not by authors slamming their heads against the wall, repeatedly coming up with problems until one doesn't get solved by o1 on their attempts.
I second this. Although it's likely that we won't be able to detect most of the cheaters, as someone said, cheaters will keep cheating and they will eventually be caught. As normal contestants, we can only convince ourselves that cheating is bad, and stop spreading the temptation to cheat to others.
I just hope this won't turn into a similar situation as many people still confidently using alts and thinking that alts are not bad.
I third this. I used to be one of those authors who automatically disregarded problems that GPT can solve. Of course, when only 4o was a thing, this was still realistic. Now, with o3-mini, it's a whole different situation.
I've probably rejected countless problems (either my own, or my fellow authors') just because GPT could solve them. I used to uphold this standard, but now I'm convinced it's not worth the time spent and the quality tradeoff. After all, lower division contests are supposed to be educational, and I've always believed that. I don't think my fellow authors deserve the frustration of proposing a perfectly fine problem, only for it to be rejected just because of AI.
If you're an author, I know it can be quite frustrating when you spend so much time making a round, just to get hated on because of complaints about AI. I'm completely in support of ignoring these complaints and making the round's quality the best possible. I hope other coordinators and authors will agree with this.
I fourth this.
I concur with this blog. AI is highly effective for problem creation, solving, and translation. However, cheating is a human tendency driven by the desire to excel, and depending on AI for it ultimately hinders personal growth.
At least you can make it anti-GPT, but you can't make it anti-vjudge!
After reading through some posts here: personally, I am here to learn and enjoy some logical problems, even if I am bad at them. Not everyone has a fast brain, and I know I don't, but I believe it can be cultivated. Those who want to get a higher rating through any means possible will continue to find ways to do it regardless of whether problems are made anti-AI. I would rather not miss out on good problems just because AI can solve them, especially since I struggle with them.
if you want questions to be more enjoyable for human beings then add a pic of sydney sweeney without a top in every question description.
she isn’t even hot, you have poor taste
Different men find different types of girls attractive based on a combination of personal preferences, cultural influences, and psychological factors. Some may be drawn to physical features like body type or facial symmetry, while others prioritize personality traits such as intelligence, kindness, or humor. Cultural norms and societal expectations also play a role in shaping attraction, with some men valuing traditional qualities, and others preferring traits like independence and ambition. Ultimately, attraction is highly subjective, and the qualities that appeal to one man might be different from what another man finds compelling, reflecting the diversity of human experiences and desires.
Hi ChatGPT!
Hey! What's up?
eww ... noooo!
Sydney is definitely hot...
would.
no one here is doubting your general inadequacy in life, there is no need to remind everyone again
yes i am a loser for saying i am attracted to sydney fucking sweeney.
is this the silicon valley everyone keeps talking about?
agreed
hello
orzorzorz
I think it is just a matter of time before rating loses some of its appeal; then people won't care about rating, and it will become just like chess.
It's the only logical way forward. Trying to make anything below Div. 2 D GPT-proof will be extremely hard. Ratings are gonna lose relevance as a consequence, which will make most cheaters leave, while people who genuinely enjoy problem solving are gonna stay.
No one has ever created anti-GPT problems. Artificial intelligence can't even solve the easy problems on codeforces. Your words are inappropriate.
Artificial intelligence can't even solve the easy problems on codeforces.
Have you kept up with advancements in AI recently? AI can 100% solve CF problems, even ones that are much more difficult.
The "easy" problems in the last two rounds were ok-ish. Before that, 3-4 rounds had some of the worst problem statements for "easy" problems. I mean, one of them kinda had double negative phrased in such a way that it was even painful to think about it. I just read it thought 20 minutes and left the contest. I mean, I love thinking but not in this "artificially difficult" way.
I don't know if it was done on purpose to make it difficult for AI to solve, but you should NEVER do that for easy problems.
The issue with "anti-GPT" problems in Div. 2 and below is that usually the author thinks he has created such a problem, when in reality he just can't write a good prompt, and the problem is actually 1-shot solvable by o1-mini and above-level models.
As for the cheaters: they are a tiny fraction of the total contestant pool. If you take a large sample of contests, your rating should converge to your real one, even if you were unlucky a couple of times, or even if there were 10-20 cheaters above you each time. Statistically speaking, it shouldn't affect your Elo that much in the long run; the rough simulation sketch below illustrates the intuition.
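Here is a rough, self-contained simulation under heavily simplified assumptions: plain two-player Elo with K = 32, a fixed "true" rating, uniformly random opponents, and a 10% chance per contest of an unconditional loss to a cheater. None of this is CF's actual rating system; it only illustrates the long-run behaviour.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Simplified Elo model (an assumption, not CF's system): one opponent per
// "contest", the standard logistic expected score, and a 10% chance that a
// cheater inflicts a loss regardless of skill.
int main() {
    mt19937 rng(12345);
    uniform_real_distribution<double> u(0.0, 1.0);
    const double trueRating = 1800, K = 32;
    const int N = 20000, TAIL = 10000;  // total contests; tail used for the average
    double rating = 1500, tailSum = 0;
    for (int contest = 1; contest <= N; ++contest) {
        double opp = 1400 + 800 * u(rng);  // random opponent in [1400, 2200]
        // Expected score given current rating; actual win chance given true skill.
        double expected = 1.0 / (1.0 + pow(10.0, (opp - rating) / 400.0));
        double pWin = 1.0 / (1.0 + pow(10.0, (opp - trueRating) / 400.0));
        double score = (u(rng) < pWin) ? 1.0 : 0.0;
        if (u(rng) < 0.10) score = 0.0;    // unlucky: a cheater places above us
        rating += K * (score - expected);
        if (contest > N - TAIL) tailSum += rating;
    }
    cout << "true rating: " << trueRating << '\n';
    cout << "long-run average with 10% cheater losses: " << tailSum / TAIL << '\n';
}
```

With these numbers, the rating climbs from 1500 and then settles, on average, a few tens of points below the true 1800: the cheater losses cost a small constant offset, not long-run divergence.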
So yeah, you are right.
I agree. There's nothing we can do at this point; let's not ruin it for those of us who actually enjoy problem solving and getting better.
Maybe I hold an unpopular view, but I do not understand how rated contests provide value other than being well-rated (compared to unrated contests). For enjoyable problems, can't you just solve them from the archives, or grab a bunch of them and do a timed mashup? In fact, there are lots of unrated contests in gyms which you can use for that purpose. Maybe more unrated official Codeforces rounds could be held for people with views like yours. But for people like me, who attempt Codeforces contests for the feeling of competition and the rush it provides, the moment you flip the 'AI Okay' switch, ratings will cease to be something we enjoy.
The point is that, to make Div. 2 D and below GPT-proof, you would have to make them harder than usual.