beaaaan's blog

By beaaaan, 3 hours ago, In English

Apparently, its Codeforces performance beats o1-mini, o1, and DeepSeek R1 in Codeforces rating:

Its price is on par with o1-mini's. I guess they felt the heat from DeepSeek R1:


» 3 hours ago

o1 is claimed to be rated 1800, yet it fails Div. 2 B problems, so I'm suspicious of this.

  • » » 3 hours ago

    Which round is it from, may I ask?

    • » » » 3 hours ago

      I remember someone posting a blog about how o1-pro (the $200-a-month version) couldn't solve 2040B - Paint a Strip, but o3-mini can solve it :(

    • » » » 75 minutes ago

      Most of the rounds.

» 3 hours ago

I just tested it on every problem I had tested that DeepSeek R1 failed (from https://codeforces.me/blog/entry/138735), and it solved all of them, though it took 2 attempts on Maximum AND Queries (Easy Version). I also tested it on Paint a Strip, which o1-pro (the $200 plan) wasn't able to solve.

I'm also on the free plan, meaning my o3-mini runs on low compute (if it means anything, it also has much shorter wait times).

» 2 hours ago, Rev. 2

nvm

  • » » 88 minutes ago

    the edit...

    It's nice to know that the lower-rated problems aren't entirely screwed, though it's a bit nerve-wracking to see it solve, on the free plan, problems that o1-pro couldn't solve even on the $200 plan.