Let's stop making anti-GPT problems?

#	User	Rating
1	tourist	3857
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3463
9	ecnerwala	3451
10	heuristica	3431

#	User	Contrib.
1	cry	165
2	-is-this-fft-	160
2	Qingyu	160
4	atcoder_official	157
5	Dominater069	155
6	adamant	154
7	djm03178	151
8	luogu_official	149
9	awoo	147
10	Um_nik	146

There seems to be a general trend of problem-setters going out of their way to ensure that their problems aren't solvable by AI. This obviously restricts the meta to the kinds of problems that AI is bad at: which, in practice seem to primarily be ad-hoc problems that require multiple observations. This approach doesn't always (and in my opinion, rarely) yield the most enjoyable/natural problems to humans.

The only motivation for this tendency is to ensure that CF ratings and leaderboards in contests retain some meaning.

What I want to express from this blog:

I think we should stop intentionally making anti-GPT problems and optimize problem-setting for human enjoyment, with a complete disregard for how AI models (and people using said models to cheat) perform on these problems.

I will try to explain why I think so below.

There are only two ways (distinct in their relevance to CP) in which AI gets better in the future:

It becomes smarter than humans and creating problems which are unnaturally hard (with respect to other kinds of problems) for it becomes infeasible because it doesn't have any weaknesses which humans are good at. This seems like a very probable outcome based on how quickly it has improved in the last few years (I can recall discussions in which people made fun of GPT's incompetence and how it would never be able to solve even slightly non-trivial problems due to fundamental issues in how AI models work, back in 2022). However, I don't have any real knowledge about how AI works or what the current AI "scene" looks like so let's not focus too much on whether this is bound to happen.

Let's say it does happen in 5 years, what happens to CP then? Assuming that sites like CF don't go down to due to a lack of revenue (I don't actually know how CF generates revenue), we would realize that since there is no way to prevent misuse of AI, we might as well maximise enjoyment for actual humans (enjoyment for humans + cheating > no enjoyment for humans + cheating).

Now, you may ask: how is the future relevant to the present moment? Well, I think it's because most people are subconsciously thinking of the current decision to make anti-AI problems as going down a path which leads to AI never making cf ratings meaningless, at the cost of some eternal but minor decrease in problem quality (akin to making greedily optimal choices in an optimization problem; they aren't seeing far enough). If one does recognize that ratings will become meaningless in a significantly short amount of time (5 years being a good estimate) then one views the two choices in a different light:
- Keep making anti-GPT problems, which become increasingly hard to create, and less pleasant for humans to solve, as AI keeps getting smarter, cheaper and faster. This also entails dealing with the countless blogs and comments by crying cyans about how a guy using o69 caused his rating delta to go from +2 to -10. Do this for 5 years until there is a pivotal moment when people realize the futility of this charade. If you were to look back upon the last few years from that point, giving up the prestige and meaning we attach to ratings in exchange for preserving the actual enjoyment we derive from this activity would seem like the right thing to have been done.
- Stop caring about misuse of AI and how it affects ratings. Optimize for elegance and enjoyment when creating problems.
In other words, the destination is the same- we can either choose to be slowly dragged to it while throwing a hissy fit or accept reality and walk to it ourselves, enjoying every part of the journey.
Somehow, magically, AI hits a "wall" which it cannot meaningfully improve beyond. For us, this would mean that it's expected performance rating would asymptote to some value. Keep in mind that this value would still be quite high, since it currently seems to be able to perform at a performance rating of at least 2000 in most contests, which places it at the 95th percentile(https://codeforces.me/blog/entry/126802) of human competitors. Once again, I don't want to speculate about how much this score would be in the future, but it seems safe to say that it would at least be 2300, implying that the vast majority of human competitors would benefit in some measure by using AI during contests.

Until the "wall" is hit, the intermittent period would be more-or-less identical to the period where AI gets better in the first situation. The meta would have to rapidly change, problem-setters would have to put effort into making sure that their problems weren't solvable by the SOTA, there would be a lot of volatility, people would constantly complain about AI cheaters and their effects on rating distributions, etc. This period would be as unpleasant as in the first scenario. There are two possible final destinations here:
- We stop caring about misuse of AI and how it affects ratings. Optimize for elegance and enjoyment when creating problems. The same endpoint as the first scenario. We would realize that we should have stopped caring about AI cheating earlier in this timeline too.
- The CF meta also asymptotes to a certain place, and is significantly based around the limitations of the then-SOTA AI. It is obviously unlikely that the kinds of problems AI would be bad at would be the ones that humans enjoy and find natural. It would, also not be perfect wrt cheating and there would still be a lot of it since AI would still be smarter than most humans at all kinds of problems (but worse at certain kinds of problems than a small percentage of humans). Maybe there will be a weird gap between the kinds of problems at offline events like olympiads and online contests, since the former would likely always optimize for human enjoyment (or at least not consider AI capability when creating problems). It could just be me, but I certainly wouldn't want this wonderful activity to end up in such a place and would much prefer having gone the other way earlier.

On a more positive note for people who care about ratings

I shouldn't have started writing this at 6 AM, now the sun is out and I don't feel like sleeping.

Comments (55)

Write comment?

cry

5 weeks ago, # |

+50

→ Reply

coldminded

-43

i ain’t reading allat. I asked ChatGpt to summarize a blog:
The text argues that current contest problem-setting is too focused on making problems unsolvable by AI, which leads to contrived, less enjoyable challenges for human solvers. The author contends that instead of deliberately creating "anti-GPT" problems to preserve contest ratings and leaderboards, problem setters should prioritize crafting elegant, naturally engaging problems for humans—even if that means ignoring how well AI or cheaters might perform. They outline two potential future scenarios:

AI Continues to Improve: As AI becomes increasingly capable, designing problems that AI fails at will become harder and more unnatural for humans. This could lead to constant adjustments, complaints about rating manipulation, and an overall unpleasant meta until a tipping point is reached.
AI Hits a Performance Ceiling: If AI eventually plateaus at a high but fixed performance level, human contests might gradually shift back to focusing on human enjoyment, making AI's impact less significant over time.

Ultimately, the author advocates for embracing the inevitable evolution of AI and redirecting the focus toward human enjoyment in problem creation. They predict that although ratings may initially lose their meaning due to rampant AI-assisted cheating, a future equilibrium could be reached where ratings regain value and cheating diminishes, benefiting genuine human competitors.

negative-xp

5 weeks ago, # ^ |

← Rev. 2 →

+25

there is literally only one line of text in bold… also that summary isn’t accurate lol, ask another model

chief_27

you have become the very thing you swore to destroy

askd

+62

I think your opinion is quite biased since you dislike anti-GPT problems. I kinda have the opposite view, and suspect this might be the more common one.

Also, I don't think this sentence is true lol: "In other words, the destination is the same- we can either choose to be slowly dragged to it while throwing a hissy fit or accept reality and walk to it ourselves, enjoying every part of the journey."

I don't think the people who would throw hissy-fits would be so quick to accept the inevitability of AI beating them. If many problems were GPT-solvable, I bet there would be a lot more hissy-fits (especially if it happened in div1 :p).

hashman

← Rev. 6 →

Actually, problems being good / natural is not something related to being anti GPT, or not being anti GPT. For example, AtCoder has some of the best problems in ARCs and AGCs, and AI still can't solve most of them. So, there is definitely something to be said about the possibility of being anti GPT while still having good problem. But I don't think the quality of Codeforces (Div. 2 at least) can be made as good as ARCs, because of difficulty concerns, so for Codeforces, problems being anti GPT probably must come at the cost of them being not so natural or seeming more artificial than the ones not being anti GPT. For example, in the last round, I think problem D, the formulation was quite natural (of course, it was standard as well, but quite natural too), but problem B and problem C seemed very artificial and not so fun to solve (my opinion).

+21

Wait just to clarify, I do like anti-GPT problems, so your example of ARCs and AGCs supports that :p

Your point about something like B and C from last round being artificial is valid for sure though — I personally still prefer B and C (since I solved D really fast in my head due to its standardness, but then just had to spend a while typing it up...), but I can definitely see why some people would prefer D.

-8

I don't like D per se, because it's so standard, but if I had to choose, just based on reading the statement which problem is probably more "fun" to do, I'd choose D. Of course, this doesn't mean that it's actually more fun to do, but just at first glance, it looks like a better problem. My experience with problems B and C looks like so:

B: Can I achieve $$$2$$$ always? Of course not, sample says I can't. But well, $$$k = n$$$ is a special case, and for $$$k \ne n$$$ I can do some shifting to always achieve $$$2$$$. Can I achieve $$$1$$$? Sometimes, yes. When? Only when I can split into enough subarrays, and now it's trivial.

I believe this sort of problem can serve as a good shock value, for example Buying Jewels, which in my opinion is a very troll but very nice puzzle. But if you put such problems at B, people are just going to metagame the solution based on "it's B" anyway. I definitely prefer problems like Round 1000's B, which require at least some algorithmic mindset, instead of just being adhoc puzzles (which are trivial at position B anyway).

C: Ok, so of course this is going to be a suffix sum. Suffix sum being $$$0$$$ means we consumed the last operation. It being $$$1$$$ means we consumed the second last... Oh wait all values are $$$\ge 1$$$, so for each column I only have some leeway on how much I can take. (Tiny nitpick: It's usually more natural to think that the rows are the days, and the columns are the waiters, so I automatically assumed that, but unfortunately it was not, and I realized this afterwards) Now it's just simple greedy.

I think C was not that bad, except the statement, which did not use the real life model at all ($$$\text{MEX}([a_1, a_2, \dots, a_n])$$$ is a totally reasonable metric to care about /s) but B I don't like that much, because it's just the same idea which has been used in so many other problems before ("can I achieve $$$x$$$?" -- yes. now just find when $$$x - 1, x - 2, \dots$$$ are achievable, and you're done), and especially when it's put on position B, where you know it's gonna be something like this, so it even loses its shock value.

Mindeveloped

+66

We've never tried to make problems intentionally anti-GPT. Just many non anti-GPT problems are also uncoincidentally bad.

djm03178

+10

Obviously there was an attempt to reject GPT-solvable problems before the new rule about restricting AI was made. I'm not sure if that's still the case, but as you can see many people still complain when problems were solvable by GPT (and thus many people cheated) so it's normal that some problemsetters would also want to avoid GPT-solvable problems.

-9

I kind of agree with this, but I think most Div. 2, and in fact a few of the Div. 1 as well are of the "throwing a hissy fit" type. And that already makes up a majority of the users, so I suspect a majority of the users won't be happy with what you said above. However, this is only my prediction, and I would be interested to know the statistics on how many people fall into each of the three categories below:

Does not care about AI cheating, does not think we should make problems AI proof,
Does care about AI cheating, but knows that it's kind of futile to stop it, does not think we should make problems AI proof,
Does care about AI cheating, and thinks it's actually useful to stop it, at least for a while, so does think we should make problems AI proof.

I think most people fall into the third category above, but again, I would be interested to know the statistics on this.

you can use this to turn your comment into a poll

+19

I would be interested to know the statistics on this

Me too. Let's make one now.

Does not care about AI cheating, does not think we should make problems AI proof
Does care about AI cheating, but knows that it's kind of futile to stop it, does not think we should make problems AI proof
Does care about AI cheating, and thinks it's actually useful to stop it, at least for a while, so does think we should make problems AI proof.

So apparently I was wrong, and most people think otherwise...

Well that's actually good news! I think the more people believe in the second option, the better it is for the overall future of competitive programming as a whole, so this is actually good.

temporary1

← Rev. 3 →

+60

generally agreeable blog. as a tester of some recent rounds, ive seen many good problems get rejected just because AI solves them, which i have had mixed feelings about. on one hand, it somewhat degrades the problemsets' quality, but on the other hand, it feels kind of "necessary" that some action be done to try and prevent these AI models from performing too well.

that being said, i believe one of the main reasons authors may feel compelled to make their round "ungptable" is the minimal amount of transparent action being done by codeforces itself to prevent cheating and maintain contest integrity, or the lack thereof. we know very little about how the rules are actually being enforced. sometimes it feels unimpactful, sometimes it feels unreasonable, sometimes it feels impossible. there are blogs concerning the use of AI after almost every contest, and it is mostly well known that these blogs do absolutely nothing. it was incredibly disappointing to see some blatantly obvious cheaters with high rankings go unpunished in recent rounds. this is why i put "necessary" in quotations; it only feels so because you barely see any reactive anti-cheating measure.

of course, cheating isnt and will never be 100% preventable, nor should it ever significantly impact the joy of CP, but this doesnt mean efforts to detect or prevent it are useless. that said, cheating prevention should be handled through active moderation and anti-cheating systems, not by authors slamming their heads against the wall repeatedly coming up with problems until one doesn't get solved by o1 on their attempts.

+43

that said, cheating prevention should be handled through active moderation and anti-cheating systems, not by authors slamming their heads against the wall repeatedly coming up with problems until one doesn't get solved by o1 on their attempts.

I second this. Although it's likely that we won't be able to detect most of the cheats, as someone said, cheaters will keep cheating and they will eventually be caught. As normal contestants, we can only convince ourselves that cheating is bad, and stop spreading the temptation to cheat to others.

I just hope this won't turn into a similar situation as many people still confidently using alts and thinking that alts are not bad.

+71

I third this. I used to be one of those authors that automatically disregard problems that GPT can solve. Of course, when only 4o was a thing, this was still realistic. Now, with o3-mini, I think it's a whole different situation.

I've probably rejected countless problems (either by myself, or by my fellow team of authors) just because GPT can solve it. I used to uphold this standard, but now I'm convinced it's not worth the time spent and the quality tradeoff. After all, lower division contestants are supposed to be educational, and I've always believed that. I don't think my fellow authors deserve to be frustrated when they propose a perfectly fine problem, but it can't be used just because of AI.

If you're an author, I know it can be quite frustrating when you spend so much time making a round, just to get hated because of complaints about AI. I'm completely in support of ignoring these complaints and make the round quality the best possible. I hope other coordinators and authors will agree with this.

-firefly-

+11

I fourth this.

N29

I concur with this blog. AI is highly effective for problem creation, solving, and translation. However, cheating is a human tendency driven by the desire to excel, yet depending on AI for it ultimately hinders personal growth.

vjudge45

At least you can make it anti-GPT, but you can't make it anti-vjudge!

oguzhesenli94

New_and_simple

After reading through some posts here, personally, I am here to learn and enjoy some logical problems, even if I am bad at it. Not everyone has a fast brain, I know I don't, but I believe it can be cultivated. Those who want to get a higher rating through any means possible will continue to find ways to do it regardless of AI based problems. I would rather not miss good problems just because AI can solve it, especially since I struggle at them.

sauron271

-12

if you want questions to be more enjoyable for human beings then add a pic of sydney sweeney without a top in every question description.

-7

she isn’t even hot, you have poor taste

+20

Different men find different types of girls attractive based on a combination of personal preferences, cultural influences, and psychological factors. Some may be drawn to physical features like body type or facial symmetry, while others prioritize personality traits such as intelligence, kindness, or humor. Cultural norms and societal expectations also play a role in shaping attraction, with some men valuing traditional qualities, and others preferring traits like independence and ambition. Ultimately, attraction is highly subjective, and the qualities that appeal to one man might be different from what another man finds compelling, reflecting the diversity of human experiences and desires.

Hi ChatGPT!

Hey! What's up?

_D-One_

eww ... noooo!
Syndey is definitely hot...

Spoiler

would.

no one here is doubting your general inadequacy in life, there is no need to remind everyone again

+15

yes i am a loser for saying i am attracted to sydney fucking sweeney.

123gjweq2

is this the silicon valley everyone keeps talking about?

sleepntsheep

agreed

omsincoconut

hello

orzorzorz

ismoilmirzouz

I think it is just matter of time before rating loses a bit of interest and thus people wouldn't care about rating and it becomes just like chess.

donpaul2718

It's the only logical way forward. Trying to make anything under div2 D GPT proof will be extremely hard. Rating are gonna lose relevance as a consequence which will make most cheaters leave and people who genuinely enjoy problem solving are gonna stay.

aveok

No one has ever created anti-GPT problems. Artificial intelligence can't even solve the easy problems on codeforces. Your words are inappropriate.

nik_exists

Artificial intelligence can't even solve the easy problems on codeforces.

Have you kept up with advancements in AI recently, AI can 100% solve CF problems, even ones that are much more difficult.

nomoreplayer

The "easy" problems in the last two rounds were ok-ish. Before that, 3-4 rounds had some of the worst problem statements for "easy" problems. I mean, one of them kinda had double negative phrased in such a way that it was even painful to think about it. I just read it thought 20 minutes and left the contest. I mean, I love thinking but not in this "artificially difficult" way.

I don't know if it was done on purpose to make it difficult for AI to solve but you should NEVER do that for easy problems.

retard.

The issue with "anti-gpt" problems in div.2 and below is that usually author thinks that he created such a problem, when in reality he just can't write a good prompt and the problem is actually 1-shot solvable by o1-mini and above level models.

As for the cheaters: they are a tiny fraction of the total contestant pool. If you take large sample of contests your rating should converge to your real one, even if you were unlucky a couple of times or even if there were 10-20 cheaters above you each time. It shouldn't affect ELO that much in the long run, statistically speaking.

So yeah, you are right.

jkulanko

I agree. There’s nothing we can do at this point, let’s not ruin it for those of us who actually enjoy problem solving and getting better.

blitztage

Maybe I am of a unpopular view but I do not understand how rated contests provide value other than being well-rated (over unrated contests). For enjoyable problems, can't you just solve them in archives, also grab a bunch of them and solve a timed mashup. In fact, there are lots of unrated contests in gyms which you can use to solve that purpose. Maybe more unrated official codeforces rounds can be held for people with similar views like that of yours. But for the likes of people like mine who attempt codeforces contests for the feeling of competition and the rush that provides, the moment you flip the 'AI Okay' switch, ratings will cease to be something we enjoy.

The point is that, to make div2 D and below gpt proof you will have to make it harder than usual.

Um_nik

The official contest is different for the reason that it is promoted by the website, you receive the email notification, it is shown on calendars like clist etc. To sum it up: a lot of people participate in it. At the same time. I don't think rating is what makes it fun.

4 weeks ago, # ^ |

On further thought, I can concede that the flagship contest of a website receiving additional participants and recognition making it event-like is an important point

I don't think rating is what makes it fun.

This is where I would disagree. I don't claim that it is a problem if someone doesn't think rating induces fun, but making that choice for the community as a whole doesn't seem like the right idea to me. You can instead have both flagship CF rated and unrated rounds (where any kind of problems could go) like the ICPC Mirrors or the older EDU rounds on contests page.

stakudog

I agree with you. If it's just for having fun, the website can just host old problems as contests. Probably 5-10% of participants will have seen any problem before, and even if they have seen 1 or 2, what's the issue, they can enjoy the other problems.

For me personally, solving problems has always been fun. But, as a very important by-product, it's improving my programming skills so investing time in this hobby also helps me in my career while I am just enjoying the process. Otherwise, I'd be trying to learn speedcubing or speedsolving sudoko but that would be a waste of time.

But, if that progress is futile, i.e. after a year of regular practice and participation, if GPT is better than me, then yeah, CP just becomes another timepass like sudoku. People can do it for fun, but not as a skill to improve.

Most importantly, fun is not the reason why Codeforces is free. The fact that this kind of problem solving is desirable to companies and institutes, is why CP is sustaining itself at the moment, it's what keeps Codeforces running. But, if AI becomes the benchmark, these companies or institutes won't need to sponsor or host Competitive Programming contests.

So, I think competitive programming will have to evolve to sustain itself. Here's my prediction, For one, AI will set the base threshold below which anything is unrated. I think Atcoder is doing a good job at the moment. ARC and AGC are pretty difficult for AI right now (That's what I have heard, pardon me, I am not very active on Atcoder). But, the way CF is designed, where there's a whole difficulty gradient, will need to up the minimum difficulty of A,B,C problems like Atcoder.

If CP goes the route of allowing AI (because umm, now AI is here to stay, we can't hide from it), implementation is going to get way easier with AI, the main thing will be finding the right ideas and guiding it to solve the problem. This seems like a reasonable route to me in theory, although it seems very difficult from a problemsetting perspective.

Anyway, those were my thoughts. Thanks if anyone read all the way!

H_a_r_M_i_t

-32

Dual-Round System: Balancing Fair Competition and Enjoyment

There’s a growing trend where problem-setters design ad-hoc challenges specifically to foil AI, all in the name of preserving contest ratings. Unfortunately, this often forces them to sacrifice the natural, engaging problems that we truly enjoy.

What if we could have both fair competition and creative freedom? Here’s a proposal for a Dual-Round System that caters to both:

1. Normal Round

2. Hard Round

By adopting this dual-round approach, we can enjoy an ecosystem where creative exploration and competitive fairness coexist. As AI advances, this system allows us to celebrate human ingenuity while still preserving a meaningful competitive arena.

What do you think of this proposal?

lis05

Thank you ChatGPT!

AksLolCoding

For the hard rounds, a "strict" contest environment isn't going to stop anyone from using telegram or o3 or r1 on another device to cheat (I'm not sure any solution for that exists), so it would be ineffective against most common forms of cheating. For the "normal" rounds, you can just virtual participate and have the same experience. VP is unrated and nothing is stopping you from using aids, and are a good learning experience.

Upvote if you remember the time when the biggest issue with contests was that a lot of problems were simply copied from atcoder/chinese websites. Ahh, good old times...

Six_Seconds

+18

Sorry, I couldn't remember such days. So, I downvoted you.

AGRU

4 weeks ago, # |

FYI thinking models will become the norm and their price will come down by year end as most companies are focused on building better thinking models. So this problem is just going to get worse.

I don't know how to solve it but it does look like cp will require drastic change.

Although I keep asking myself one question though. If a model can perform better then me in some thing, then does it makes sense for me to pursue that thing? It feels similar to jobs getting automated in industrial revolution. Machines replacing humans. And now Ai replacing coders. Maybe its time to find something Ai can't do because thats the only place I see my value.

this feels similar to monkeys swinging in trees because humans can do everything else more optimally

negative-xp's blog

Dual-Round System: Balancing Fair Competition and Enjoyment