GPT-o3 can become a red coder? - Codeforces


By Qingyu, 2 months ago, In English


I've checked: today is not April 1st.

(source: 12 Days of OpenAI: Day 12 https://www.youtube.com/watch?v=SKBG1sqdyIU)

(+549)

Comments (153)

» TwentyOneHundredOrBust, 2 months ago (+273)

Merry Christmas!

» 2 months ago (+113)

thanks for guiding me to become red

» » 2 months ago (0)

Please help me with the same! :')

Merry Christmas! ⛄

» » » 2 months ago (0)

Common Sense

» 2 months ago (+102)

Anyone know why o1 is rated 1891 here? According to https://openai.com/index/learning-to-reason-with-llms/, o1-preview and o1 are rated 1258 and 1673, respectively.

» » 2 months ago (+10)

Benq, do you think it's the end?

» » » 2 months ago (-27)

end for us mortal humans, not for gods...

» » » » 2 months ago (-16)

At this rate, it will be over for these so-called gods soon. It is chess all over again.

» » » » » 2 months ago (+7)

What happened to chess will happen to CP too, but people still enjoy chess played by humans, not by mere machines.

» » » » 2 months ago (0)

gods?? there's just one

» » TwentyOneHundredOrBust, 2 months ago (0)

1891 was o1-ioi I think

» » » 2 months ago (+52)

hm, o1-ioi is only 1807 in the link I shared though

» » » » 2 months ago (0)

it's probably o1 with high-compute like in the pro plan.

» » 2 months ago (0)

Possibly it's "o1 pro mode" or a finetune like o1-ioi, or some other o1 model; idk at this point because there are so many.

» 2 months ago (-9)

in 5 years, there will be no way to pretend that the average human is worth more than a rock

» DisqualifiedAsAHuman, 2 months ago (+80)

I'll wait until it starts participating in live contests and getting red performances.

» DivinePunishment, 2 months ago (+23)

damn im cooked

» 2 months ago (+25)

Not possible...

» 2 months ago (+46)

I doubt that AI will be able to do better math research than humans 5 years from now.

» » 2 months ago (+17)

That's the only thing you're gonna be able to do 5 years later — doubt.

» » 2 months ago (0)

Is this a prediction about humans now vs AIs in 5 years or AI + human in 5 years vs AIs in 5 years?

» » 2 months ago (0)

Just saying: if they gain access to all the information available to humanity and possibly enough computing power, they might find new mathematical identities left and right.

» 2 months ago (+94)

From the presentation we know that o3 is significantly more expensive. o1-pro currently takes ~3 minutes to answer one query. Based on the difference in price, o3 is expected to be something like 40-100 (more???) times slower. A CF contest lasts at most 3 hours. How can o3 get to 2700 if it spends all that time solving problem A? It would be very interesting to read the paper about o3, specifically how they measure its performance.
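The timing objection above can be spelled out with a few lines of arithmetic. This is only a sketch. The inputs are the commenter's own rough guesses (about 3 minutes per o1-pro query, o3 being 40 to 100 times slower, a 3-hour round), and none of them are measured. It also assumes queries run one after another, with no parallelism.

```python
# Back-of-the-envelope check of the estimate above. Every input is the
# commenter's rough guess, not a measured number, and no parallelism is assumed.
O1_PRO_MINUTES_PER_QUERY = 3   # "~3 minutes to answer one query"
CONTEST_MINUTES = 180          # a CF round lasts at most 3 hours


def minutes_per_o3_query(slowdown: float) -> float:
    """Estimated o3 latency if it is `slowdown` times slower than o1-pro."""
    return O1_PRO_MINUTES_PER_QUERY * slowdown


def sequential_queries(slowdown: float) -> int:
    """How many back-to-back o3 queries fit into one contest."""
    return int(CONTEST_MINUTES // minutes_per_o3_query(slowdown))


if __name__ == "__main__":
    for slowdown in (40, 100):
        print(f"{slowdown}x slower: {minutes_per_o3_query(slowdown):.0f} min/query, "
              f"{sequential_queries(slowdown)} sequential queries per contest")
```

Under these guesses one query takes 120 minutes at a 40x slowdown and 300 minutes at 100x. So at most one sequential query fits into a round, which is why running many samples in parallel is the obvious counter-argument.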

» » 2 months ago (+8)

It must be parallelized. Surely there is something like MCTS involved

» » 2 months ago (0)

Do you think it will remain 40-100x? It will improve over time, and within 3-4 years there will be another version, maybe o9-o11, with a rating near tourist's that can solve problems within minutes.

» 2 months ago (+72)

I will personally volunteer myself as the first human coder to participate in the inevitable human vs AI competitive programming match.

» 2 months ago (+166)

I'll only believe it if it's tested in a live contest.

» » 2 months ago (+49)

Maybe Codeforces should allow some accounts from OpenAI to participate unrated in contests? MikeMirzayanov, what do you think?

» » 2 months ago (+16)

o1-pro was tested live in this contest, https://codeforces.me/contest/2040, and solved E and F (the blog has since been deleted).

» » » 2 months ago (+11)

The comment removed because of Codeforces rules violation

» » » » 2 months ago (0)

It also couldn't solve B after multiple attempts, so keep that in mind as well (still, it's really impressive)

» » » » » 2 months ago (0)

It feels comfortable until your last line

» » » » » » 2 months ago (+6)

I mean, I can't deny it, these new AI models are really impressive for what is, in essence, a "which word is likely to come next" model. With that being said, and I'm paraphrasing from what I've heard others say since I'm nowhere at the level to solve those problems, F was a knowledge problem of Burnside's lemma with a bit of a twist.

I can't say for certain how these models will evolve; o3 got a super high score on ARC-AGI (a general reasoning task set), which could help its performance on problems like B. On the other hand, we have no idea if these results are embellished or how exactly they're calculating this, so only time will tell.

» » 2 months ago (0)

Humans feel time pressure jitters, algorithms don't, so I think it'll be just fine. And it doesn't matter whether it's a live contest or not: if it has learned the algorithm pattern and the associated description pattern, then it's all the same.
Although multiple wrong submissions can potentially bring the ratings of these models down. :)
Also, tricky language (which only human intellect can understand) can fool the models. :D

Merry Christmas! ⛄
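On the point about wrong submissions: in a standard-style round, every rejected attempt costs points, on top of the problem's value decaying with time. The sketch below assumes the scoring formula given in the statement of Codeforces problem 604A ("Uncowed Forces"): max(0.3x, (1 - m/250)x - 50w) for a problem worth x points, accepted at minute m after w wrong attempts. It ignores the finer print of the real contest rules.

```python
def problem_score(max_points: int, minute: float, wrong: int) -> float:
    """Points for one accepted problem, using the formula from the statement of
    Codeforces 604A (assumed here to approximate standard-round scoring):
    the value decays linearly with the submission minute, every wrong attempt
    costs 50 points, and the score never drops below 30% of the maximum."""
    decayed = max_points - max_points * minute / 250 - 50 * wrong
    floor = 3 * max_points / 10
    return max(floor, decayed)


if __name__ == "__main__":
    # The same 1000-point problem solved at minute 30, without and with 4 wrong tries.
    print(problem_score(1000, 30, 0))
    print(problem_score(1000, 30, 4))
```

A 1000-point problem loses 4 points per minute under this formula, so each wrong attempt costs as much as 12.5 minutes of delay. A model that needs many tries therefore loses places, and its rating follows.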

» 2 months ago (0)

Dude, I feel a big threat.

» 2 months ago (+27)

If o3 really has a deep understanding of core competitive programming principles, I think it also means it can become a great problemsetting assistant. Of course it won't be able to make AGC-level problems, but imagine having more frequent solid Div. 2 contests; that would be great.

» 2 months ago (+49)

Is this the real life?

» » 2 months ago (+34)

Is this just fantasy?

» » » 2 months ago (+16)

caught in a landslide

» » » » 2 months ago (+7)

no escape from reality

» » » » » 2 months ago (0)

Open your eyes

» » » » » » 2 months ago (+7)

look up to the skies and see

» » » » » » » 2 months ago (+6)

I'm just a poor boy

» » » » » » » » 2 months ago (+1)

I need no sympathy

» » » » » » » » » 2 months ago (+24)

Because I'm easy come

» » » » » » » » » 2 months ago (+5)

easy go

» » » » » » » » » 2 months ago (+6)

Little high

» » » » » » » » » 2 months ago (+6)

Little low

» » » » » » » » » 2 months ago (+6)

Any way the wind blows, doesn't really matter

» » » » » » » » » 2 months ago (+7)

To me

» » » » » » » » » I_Will_nOT_LoSe, 2 months ago (+4)

Mamaaa, just killed a man

» » » » » » » » » 5 weeks ago (0)

Put a gun against his head

» » » » » » » » » 5 weeks ago (0)

Pulled my trigger, now he's dead

» » » » » » » » » 5 weeks ago (0)

Mama, life had just begun.

» » » » » » » » » 5 weeks ago (0)

But now I've gone and thrown it all away

» » » » » » » » » CF_contest_practice_2, 2 weeks ago (+8)

Mama, oh, didn't mean to...

» 2 months ago (0)

How do these things perform on marathon tasks? Psyho

» » 2 months ago (0)

Last visit: 2 months ago

» » 2 months ago (+8)

https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/

This paper covers it. TL;DR: it outperforms top teams on Hash Code because it can come up with really good scoring functions, which is the focus of the parent paper, FunSearch: https://www.nature.com/articles/s41586-023-06924-6
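Here is roughly what "scoring functions" means in this context. As I understand the FunSearch paper, the LLM does not write a whole solver. It evolves a small priority function that a fixed greedy skeleton calls, for example to choose a bin in online bin packing. The sketch below is only such a skeleton, with the classic best-fit rule plugged in as a baseline. It is not code from either paper, and all names are illustrative.

```python
from typing import Callable, List, Optional

# A priority function scores every open bin for the incoming item; the greedy
# skeleton puts the item into the highest-scoring bin. In a FunSearch-style
# setup, this function is the piece the search would rewrite.
Priority = Callable[[float, List[float]], List[float]]


def best_fit(item: float, remaining: List[float]) -> List[float]:
    """Baseline heuristic: prefer the bin the item leaves fullest;
    bins the item does not fit into get -inf."""
    return [-(r - item) if r >= item else float("-inf") for r in remaining]


def pack(items: List[float], capacity: float, priority: Priority) -> int:
    """Online bin packing driven by `priority`; returns the number of bins used."""
    remaining: List[float] = []  # free space left in each open bin
    for item in items:
        scores = priority(item, remaining)
        best: Optional[int] = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or scores[best] == float("-inf"):
            remaining.append(capacity - item)  # nothing fits: open a new bin
        else:
            remaining[best] -= item
    return len(remaining)


if __name__ == "__main__":
    print(pack([4, 8, 1, 4, 2, 1], 10, best_fit))
```

Swapping `best_fit` for a different function changes the packing without touching `pack`. That separation is what lets a search procedure focus on the scoring function alone.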

» » 2 months ago (0)

Oh hey, hi Vitalya!

» » » 2 months ago (0)

Fancy meeting you here!

» 2 months ago (-21)

I don't see why people are paranoid about those insane ratings claimed by OpenAI. I guess they're worried about cheaters, but why? Competitive programming isn't only about Codeforces; it's a whole community. In every school and country, we know each other personally, we see each other solve problems live, and we compete against each other in onsite contests. So we know each other's level. When we see someone we know isn't a strong competitive programmer suddenly ranking in the top 5 of a Codeforces contest, it doesn't mean much. We just feel sorry that they've started cheating. It will be more funny when we see a red coder who can't qualify for ICPC nationals from their university.

» » 2 months ago (+49)

I think you're not seeing the bigger picture; the implications for competitive programming are huge:
1) We might lose sponsors/sponsored contests, because contest performance would no longer be a signal for hiring, or even for skill?
2) Let's not kid ourselves: a lot of people are here just to grind CP for a job/CV, and that's totally fine. Now they will be skewing the ratings for literally everyone.
3) From 2, it may follow that the Codeforces Elo system completely breaks and we'll have no rating? Then the incentive to compete is gone, which will further shrink the active community.
There are many more; I bet you could even prompt ChatGPT for them :D

» » » 2 months ago (-27)

we'll have no rating
And then we will have no cheaters. Happy ending

» » 2 months ago (+97)

It will be more funny when we see a red coder who can't qualify for ICPC nationals from their university.

It's not funny; it happens quite often, for example at our university :(

» » » 2 months ago (0)

Red was just an example. A more accurate example would be a team of newbies qualifying while a team of reds fails to. Don't tell me that's still not funny.

» » » 2 months ago (+3)

Then their (grandmasters') life choice is wrong lmao

» » 2 months ago (+5)

I think it has major implications for the whole world, not only competitive programming. For example, the pace of mathematical research could easily double almost overnight (realistically, over a period of about a year).

» 2 months ago (+3)

According to this article, it does not seem practical for the average user to run?

Quoting, "Granted, the high compute setting was exceedingly expensive — in the order of thousands of dollars per task, according to ARC-AGI co-creator Francois Chollet."

However, this is indeed a large step forward for AI.

» » 2 months ago (0)

It doesn't matter: SSDs weren't a common choice for the average user 15 years ago. Remember, technology develops exponentially. The cost of chips and electricity isn't the main issue; the key point is that it's possible. BTW, even if running that thing cost $1 million per task, people would pay even a billion if it could solve an open problem like P vs NP.

» 2 months ago (0)

o1: I'm faster than humans

o3: I'm better, pal

;(

» 2 months ago (+31)

Do I still have a chance to reach LGM before AI?

» 2 months ago (+136)

OpenAI is lying. I bought 1 month of o1 and it is nowhere near a 1900 rating. It is as bad as me. I think they lie on purpose because they are burning a lot of money and they want people to buy their model.

» » 2 months ago (+36)

True. I have tested o1, and it could barely solve most 1500~1600 tasks. I thought that maybe, since it's a language model, it would be better at solving more standard problems. But well, it also failed miserably on some (note: some, not all) quite well-known problems. From what I've seen, o1 can easily solve "just do X" type problems and is pretty decent at guessing greedy solutions (when there is one). My guess is that OpenAI ran o1 in virtual participations on a bunch of different contests and reported the rating of its best performance among all of them.

» » 2 months ago (0)

I hope so, but have you seen it solve Div2 E, F in the recent contests?

» » 2 months ago (0)

I think they mean o1-pro here. Yes it's not quite honest to say "o1" here. o1 is something like 1650 IIRC.

» » 2 months ago (0)

It's just like Devin AI...

» BatmanWhoLaughs, 2 months ago (0)

Day by day I am getting mindfucked with these latest AI updates so much that I might lose my sanity.

» 2 months ago (+36)

I'm a bit skeptical. o1 is claimed to have a rating around 1800 and I've seen it fail on many div2Bs.

» 2 months ago (+3)

If I already have lower rating than o1-preview, why should I be concerned?

» 2 months ago (+6)

After we got a "Tourist" rank for 4000 rating, maybe we'll have a "GPT" rank for 4500 or so in the near future.

» 2 months ago (+6)

WYSI

» » 2 months ago (+3)

Cheers

» » 2 months ago (+3)

727 no way

» 2 months ago (+26)

What does the light blue part on o3 mean here? Doesn't seem like the video explained it.

» » 2 months ago (+3)

A lot more compute?

» » 2 months ago (0)

It means that AI didn't get it right in the first attempt.

» » » 2 months ago (0)

Source: https://www.youtube.com/watch?v=YAgIh4aFawU&t=348s

» 2 months ago (0)

Amazing and unbelievable!

» 2 months ago (+162)

I recently subscribed to o1 (not the pro version) in the hope of clearing out some undesirable problems in BOJ mashups, and I became skeptical that this AI is even close to 1600. It can solve some known problems, which some Googling would probably also do. However, in general, GPT still gets stuck on incorrect solutions a lot and has trouble understanding why its solution is incorrect at all.

So, how did GPT get a gold medal at IOI? Probably because it was able to submit many times. So if I give it 10,000 counterexamples, it will eventually solve my problem. Maybe I could also get GPT to produce 1600-level results if I fed it counterexamples all the time.

In other words, GPT generates solutions decently well, but it is bad at fact-checking. Yet fact-checking should be the easiest part of this game: you only need to write a stress test. Then why isn't this built into the GPT model? I assume they are just not able to meet the computational requirements.

I don't think the results are fabricated at all (unlike Google, which I believe does fabricate its results), and I believe that even the o1 model can find a good spot, especially with the recent CF meta emphasizing "ad-hoc" problems whose patterns are easy to find and verify. But this is an empty promise if it can't be replicated at the consumer level. I wonder if o3 is any different.
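The stress test mentioned above usually works like this. You generate many small random inputs, run a slow but trusted brute force next to the candidate solution, and stop at the first disagreement. This is a minimal sketch. The toy problem (maximum subarray sum), the deliberately buggy `candidate`, and all names are illustrative stand-ins, not anything from the comment.

```python
import random
from typing import List, Optional


def brute(a: List[int]) -> int:
    """Trusted O(n^2) reference: maximum sum over all non-empty subarrays."""
    return max(sum(a[i:j]) for i in range(len(a)) for j in range(i + 1, len(a) + 1))


def candidate(a: List[int]) -> int:
    """Solution under test (Kadane's algorithm) with a deliberate bug: the
    running best starts at 0, so all-negative arrays give the wrong answer."""
    best = cur = 0
    for x in a:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def stress(trials: int = 1000, seed: int = 1) -> Optional[List[int]]:
    """Return the first random input on which the two solutions disagree, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randint(-5, 5) for _ in range(rng.randint(1, 6))]
        if brute(a) != candidate(a):
            return a  # a small counterexample, e.g. to paste back into the chat
    return None


if __name__ == "__main__":
    print("counterexample:", stress())
```

The planted bug only triggers when every element is negative. That is exactly the kind of corner case a plausible-looking solution misses, and the harness hands back a tiny counterexample quickly.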

» » TwentyOneHundredOrBust, 2 months ago (-55)

You can write the code yourself to prompt it to stress-test. I don't think that should be part of the default model served to users: it would add too much computation, while 99% of the time in dev use cases users will just feed it untestable snippets.

People have already submitted o1-mini solutions in contest and gotten 2200 performance multiple times.

» » 2 months ago (+1)

I have the o1 pro mode. It can solve problems with difficulty 1600-1700 and can solve some 1800s.

There are cases where it can't solve an 1800 problem but its solution is heading in the right direction.

» 2 months ago (+22)

You all are missing a very important thing, o3 takes $100+ per task to compute

» » 2 months ago (+24)

Since the scale is logarithmic, o3 high is pretty close to $10000
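A log scale changes the reading a lot because equal distances on a logarithmic axis correspond to equal ratios, not equal differences. A point that looks "a bit past" a gridline can therefore be several times larger. This is a minimal sketch of reading a value off such an axis. The tick values and the 0.5 position are hypothetical, not measured from OpenAI's chart.

```python
import math


def value_on_log_axis(lo: float, hi: float, t: float) -> float:
    """Value at fraction t (0..1) of the way from tick `lo` to tick `hi`
    on a logarithmic axis: interpolate linearly in log space."""
    return 10 ** (math.log10(lo) + t * (math.log10(hi) - math.log10(lo)))


if __name__ == "__main__":
    # Halfway between hypothetical $1,000 and $10,000 ticks is about $3,162
    # on a log axis, not the $5,500 a linear reading would suggest.
    print(round(value_on_log_axis(1_000, 10_000, 0.5)))
```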

» » » 2 months ago (-7)

Yep, they're not replacing us any time soon

» » » » aviralarpan3301, 2 months ago (+30)

your comment gave me beaaaan

» » » » » 2 months ago (0)

it will only last until 'soon' is over

» » » » » » aviralarpan3301, 2 months ago (0)

Have you watched the AlphaGo documentary? It defeated Lee Sedol, who later retired, saying "there is an entity that can't be defeated": it made Go meaningless for him.

I kinda feel the same way (though I'm not good, but still). If there is an entity that cannot be defeated no matter how hard I try, then it kinda makes it meaningless. The way I counter it is by thinking: no matter how good the AI is, it can ALWAYS be defeated by humans.

So that 'soon' will never come, at least for me. Even if it reaches a 5000 rating, I would still believe it can be defeated.

Sorry for the essay.

» » » » » » » 2 months ago (0)

Well that certainly is an interesting way to look at it. I don't think it should become meaningless if AI becomes better than tourist. I think that, no matter how good AI becomes at math or chess, it will never have the type of awareness that would put it on the same level as people. So like, maybe it will always be worse at pure ad hoc contests.

» » » » » » » » aviralarpan3301, 2 months ago (0)

Yeah, I also think you can always come up with easy ad-hoc problems that AI won't be able to solve. But with the recent developments, I'm not sure about that either.

But either way, i would continue to believe that AI can always be defeated by humans, even in Go and Chess.

» » » » » AttractorsTheory, 5 weeks ago (0)

it no longer makes sense

» » » » » » aviralarpan3301, 5 weeks ago (0)

yeah lol his earlier username was eternal_happiness

» » » » » » » AttractorsTheory, 4 weeks ago (0)

it's just like 123gjweq2 said:

it will only last until 'soon' is over

just like your eternal happiness turned into a beaaaan lol

» » 2 months ago (+48)

This is not for the Codeforces benchmark but for the ARC-AGI benchmark, where they set a new state of the art. Please watch from https://youtu.be/SKBG1sqdyIU?t=304 for more details. Here's the actual benchmark we're looking for:

According to https://youtu.be/SKBG1sqdyIU?t=670, o3 is actually more cost-effective than o1. Here's a comparison based on OpenAI's benchmark (estimated by eye):

Model	Rating	Cost per prompt
o1-mini	1650	`$0.22`
o3-mini (low)	1697	`$0.1`
o1	1891	`$0.82`
o3-mini (medium)	1997	`$0.17`
o3-mini (high)	2073	`$0.3`
o3	2400 — 2727	`$2.15 + Δ`
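One way to make "more cost-effective" precise is to check which models are Pareto-optimal, meaning no other model is both cheaper and higher rated. The sketch below uses the eyeballed numbers from the table above. For o3 it assumes a rating of 2727 and a cost of $2.15, since the table gives "2400 — 2727" and "$2.15 + Δ".

```python
from typing import List, Tuple

# (model, rating, cost per prompt in USD): eyeballed values from the table above.
# For o3 we assume rating 2727 and cost 2.15 (the table gives a range and "+ Δ").
MODELS: List[Tuple[str, int, float]] = [
    ("o1-mini", 1650, 0.22),
    ("o3-mini (low)", 1697, 0.10),
    ("o1", 1891, 0.82),
    ("o3-mini (medium)", 1997, 0.17),
    ("o3-mini (high)", 2073, 0.30),
    ("o3", 2727, 2.15),
]


def pareto_front(models: List[Tuple[str, int, float]]) -> List[str]:
    """Names of models that no other model dominates, i.e. no other model has
    rating >= and cost <= with at least one of the two strictly better."""
    front = []
    for name, rating, cost in models:
        dominated = any(
            r >= rating and c <= cost and (r > rating or c < cost)
            for n, r, c in models
            if n != name
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    print(pareto_front(MODELS))
```

With these numbers, o1 and o1-mini drop out. o3-mini (medium) is both cheaper and higher rated than o1, which is the commenter's point.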

» » » 2 months ago (+15)

cp is P2W at this point

» » » 2 months ago (0)

The cost scale is actually logarithmic

» » » » 2 months ago (-10)

I wonder why you got downvoted. It is logarithmic. It costs something like $3000 per query for o3 with max compute (the one that scores 2727, I believe). They didn't publish numbers for CP queries, but they did for ARC, and I assume the CP ones are not that different.

» » » » » 2 months ago (0)

they did publish the numbers, read the comment above

» 2 months ago (+10)

Even LGMs are afraid of this. What should I do? Raise pigs on a farm?

» » 2 months ago (0)

AI is the farmer and we are the pigs.

» 2 months ago (+4)

My efforts look like a joke to the AI.

» » TwentyOneHundredOrBust, 2 months ago (+3)

very nice username, it was a good novel

» 2 months ago (+1)

So why are there two colors above o3 in the chart? I don't understand.

» 2 months ago (0)

I have exactly the same rating as o1-mini, wow!

» » 2 months ago (+1)

Not for long

» » » aviralarpan3301, 2 months ago (+10)

yeah takopi will become far better than AI trash

» » » » AttractorsTheory, 2 months ago (-16)

AI trash

how dare you say that?

» 2 months ago (+13)

I don't think o1 is really 1891. I gave it a 1400 problem just now, and it failed to solve it after 20 tries.

» » 2 months ago (0)

It can solve 2400s but sometimes fails to solve 1200s; it's quite weird.

» » » 2 months ago (+24)

I think some 2400 problems are similar to other problems that GPT has already learned

» » » » 2 months ago (0)

That's right. I think GPT can't solve ARC and AGC problems by itself right now (unless it has learned them); there are some ad-hoc problems GPT can't solve.

» » 2 months ago (+21)

I think the o1 shown in the benchmark is actually o1 pro which I don't have access to. Here's the o1-mini result from yesterday's contest tho:

2049A - MEX Destruction [800] — 297608660

2049B - pspspsps [1300] — 297609098

2049C - MEX Cycle [1500] — 297609475

2049D - Shift + Esc [1900] — 297611240

Prompt used: Solve in C++, USE MORE REASONING TOKEN, DO NOT GIVE UP, DO NOT STOP THINKING

» » » 2 months ago (+8)

How many tries did it take? Did you correct it when (if) it failed on some test cases?

» 2 months ago (0)

Is it the end of an era? What should new aspiring competitive programmers do now?

» 2 months ago (+57)

So what supports their claim of "Elo 2727"? (Apologies if it's covered in the video; I don't have easy access to YouTube.) Last time they claimed o1-mini was CM level, but it could only solve extremely classic problems.

» » 2 months ago (0)

I am wondering too. Which account is OpenAI using to participate in the contests?

» » 2 months ago (0)

Nothing. "Just trust me bro". Although personally, I do trust them, given that it's very expensive (around $3000 per query, I believe). They say o3-mini will be released in late January, so we will be able to check at least the small model's results soon.

» » » 2 months ago (0)

So 1 prompt costs $3000????

» » » » 2 months ago (+5)

TBH cost will become irrelevant; it's all about chips and the cost of electricity. Chips will get cheaper and they will build dedicated nuclear power plants for training. Then it will be as cheap as o1.

» 2 months ago (+5)

insane

» 2 months ago (-8)

Why are people so scared of the AI situation, though? These contests/platforms were mostly born for us to study CP and to compete against our peers.

The A.I being good, then it most likely the same situations with a student and his mentor ?

I don't really understand whether this is any threat at all. Feel free to correct me if I'm wrong; I'd appreciate it a lot!

» » 2 months ago (+54)

The A.I being good, then it most likely the same situations with a student and his mentor?

No. Over half of the people here (and much more in the whole society) are not genuine CP lovers. They will use AI for malicious purposes and we cannot stop them.

» » » 2 months ago (0)

True, in terms of society, a good reasoning AI is indeed a genuine threat.

I said this because I've seen cases where using AI to learn CP could benefit people a lot, since there are also people selling courses without an actual understanding of CP.

Thanks for letting me know.

» o-Sakurajimamai-o, 2 months ago (0)

I'm not sure I would still be passionate about CP if this is true... But I think I will be, even though I can't accept someone using it to get red (Master is my dream...)

» AttractorsTheory, 2 months ago (+1)

I don't think ai will ever reach king title

» 2 months ago (0)

Should I start Codeforces? Or leave?

» » 2 months ago (0)

I guess we should until......

» 2 months ago (-23)

Reddit post with source i think?

Idk, checked the submissions and it seems quite human, but we'll wait and see. Also, please keep in mind that it's only confirmed if it actually performs at 2700 in a real contest, because learned problems are well... part of the training set.

Edit: Nvm, did not watch video.

» 2 months ago (0)

According to the account that o3 uses, it participated in just 10 contests spread across 4 years.

And currently on Codeforces, if you don't submit any code during a contest, the contest is unrated for you.

So if a strong person monitors the model, lets it finish its code first, and simply doesn't submit whenever it isn't performing well, it's easy to get a high rating.

Maybe we should wait for a more reasonable benchmark, like 10 consecutive contests in which it performs well.
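Here is a toy illustration of the selection effect described above. If a contest only counts when you submit, and you only submit when things are going well, then the contests that count are a biased sample. The performance numbers below are made up for illustration. A plain average stands in for the real rating formula, which is more involved.

```python
from statistics import mean
from typing import List

# Hypothetical per-contest performances of some solver (made-up numbers).
PERFORMANCES: List[int] = [1900, 2600, 2100, 2800, 1700, 2500, 2000, 2700, 1800, 2400]


def counted(perfs: List[int], threshold: int) -> List[int]:
    """Contests that 'count' if the operator submits only when the solver is
    on track to perform at least `threshold`; the rest stay unrated."""
    return [p for p in perfs if p >= threshold]


if __name__ == "__main__":
    print("all contests:       ", mean(PERFORMANCES))
    print("only submitted ones:", mean(counted(PERFORMANCES, 2400)))
```

The same solver looks about 350 points stronger when only the contests it chose to submit in are counted. That is why a run of consecutive contests is a fairer test.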

» bkdn24.tuankaka, 2 months ago (0)

can gpt o4 do it better?

» 2 months ago (0)

Who knows, maybe the next version will be able to do problem setting as well, up to a certain level :|

» 2 months ago (0)

In chess, AI engines outperform humans, but that doesn't mean people have stopped participating.

Similarly, the world of competitive programming will adjust.

» » 4 weeks ago (0)

chess is a sort of entertainment that people can watch so it survived, not sure about cp

» 2 months ago (0)

I think it's a good tool for someone who wants to study on their own, but don't cheat during contests (sorry for my bad English)

» 2 months ago (+18)

My birthday!

» » kayumov_muhammad, 2 months ago (0)

Congrats!

» Deaf-Guy-Bad-English, 2 months ago (0)

What is DeepSeek AI's Elo? deepseek.com

» 2 months ago (-25)

We tested the free model of ChatGPT on a recent Div. 4 round: it solved A and C but failed to solve B, even after we suggested the proper solution (it responded that it was "not needed"). I would suggest taking those numbers with a heavy dose of salt.

» » 5 weeks ago (-22)

DeepSeek R1 was able to solve the whole contest

» 5 weeks ago (+13)

I believe that although GPT can solve problems as hard as 2700, it might be just as stupid as me. It seems it can only solve problems similar to ones it has learned and is never able to solve ad-hoc ones.

» 2 weeks ago (+153)

OpenAI recently published a paper where they shared their Codeforces benchmark details.

You can view the PDF here: https://arxiv.org/abs/2502.06807

Here's their simulated contest participation on Codeforces:

Pinely Round 3 (Div. 1 + Div. 2)

Problem	A	B	C	D	E	F1	F2	G	H	I	Scores	Performance
Rating	800	1200	1400	1900	2400	2200	2500	3000	3500	1900	7220	2473
Verdict	AC	AC	AC	AC	AC	AC	WA	WA	WA	WA	231st	GM

Problem	A	B	C	D	E	F	G	H1	H2	Scores	Performance
Rating	800	1000	1200	1700	2300	2900	3500	2700	2700	8920	3152
Verdict	AC	AC	AC	AC	AC	AC	WA	AC	AC	27th	LGM

(10 more contests omitted.)

Average performance (according to Carrot) seems to be around 2836 over these 12 contests.

» » 13 days ago (+7)

Horrible, I don't want to see CP dying from AI...

They made a CP benchmark to measure models' reasoning ability. And because of the benchmark, they will collect massive amounts of data from the platform and improve CP performance... Given the competitive atmosphere in generative AI, it wouldn't be shocking news if AI reached LGM within months.

» » 13 days ago (+10)

Amazing! Can't wait until all cheaters start using it!
