Let's look at two problems from the last round, Round 657 (Div. 2):
Problem D 1379D - New Passenger Trams currently has 367 solvers and had 77 solvers in the official contest (among rated participants).
Problem E 1379E - Inverse Genealogy currently has 44 solvers and had 0 solvers in the official contest.
Yet both problems have the same difficulty rating of 2400. How does that make any sense?
On top of that, problem F1 1379F1 - Chess Strikes Back (easy version) currently has 97 solvers and had 10 solvers in the official contest; by those numbers it is still clearly easier than problem E, yet it has a difficulty rating of 2700.
Another problem that is certainly not worth a 2900 rating: 1372E — Omkar and Last Floor.
I also have examples of the opposite case. Problems https://codeforces.me/contest/1385/problem/E and https://codeforces.me/contest/1385/problem/F are rated 2000 and 2300, which doesn't make sense. E is a small variation on a classic problem, and F is a little harder but really not that hard. They should be more like 1700-1800 and 2000-2100.
Edit:
Wow, I'm sorry if I offended anybody; I didn't mean to diminish anyone's achievement!
Good news: 1379D and E are now 2300 and 2800, per your suggestion.
Thanks. Occasionally some heuristics don't work well. In such cases, I need to change the ratings manually.
Could you also fix this one? https://codeforces.me/problemset/problem/86/D
Its rating is 2900, and like... that should not be true, right?
Yes, of course, it should be around 2000-2100.
Imho this explains very well why a 2700+ rating is correct for 86D.
But bro, in the current situation I don't think that it is a non-standard thing.
In that case, you should ask Mike to take upsolved submissions into account as well, i.e., the ratings people had when they upsolved it. click click2
The problem's rating is correct as far as the spirit of the rating formula is concerned, and it shouldn't be changed just for the sake of it.
Can you explain what the heuristics are, or what the overall system is? I used to think problem difficulties were calculated by a single formula with a simple interpretation.
Thanks!
I guess the mystery is somewhere here, in the UPD2 "coefficients".
Yeah, the original blog announcing problem difficulties (https://codeforces.me/blog/entry/62865) said they are calibrated so that if your rating is $$$R$$$ and the problem rating is $$$r$$$, your probability of solving it during an official round is $$$f(R - r)$$$ for some function $$$f$$$ with $$$f(0) = 0.5$$$, analogous to Elo-style ratings for a game between two players.
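For concreteness, here is a minimal sketch of what such a calibration could look like, assuming $$$f$$$ is the standard Elo logistic curve with a 400-point scale. The blog does not publish the actual function, so the curve below is purely an assumption:

```python
def f(diff: float) -> float:
    """Assumed solve probability as a function of diff = R - r.
    Standard Elo logistic curve; the real f used by Codeforces
    is not published."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

# f(0) = 0.5: a contestant rated exactly at the problem's rating
# solves it half the time during an official round.
assert abs(f(0.0) - 0.5) < 1e-9

# Under this assumed curve, a contestant rated 400 points above
# the problem solves it about 91% of the time.
print(f(400.0))  # ~0.909
```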
But I don't really understand where they come from, like... is it just based on fitting to the ratings of the participants during the official contest? And if so, is it pre-contest ratings, post-contest ratings, or per-contest "performance ratings"?
Plus, as you said, it seems from Mike's comments like there are probably some ad-hoc heuristics/hacks on top of this basic formula, but we don't know what they are.
I can't easily explain all the details. In a perfect world, the problem rating would be the rating of an opponent such that your probability of beating them equals your probability of solving the problem. But in the real world the data is dirty: suppose tourist enters a Div. 3 round but problem A is too boring for him, so he skips it. Statistically he "didn't solve" it, which gives a great boost to the problem's rating. I tried counting only official submissions, but, for example, for hard Div. 3 problems official submissions give less information than unofficial ones. So my current way of calculating problem ratings is full of weights, coefficients and heuristics. You can try it yourself using the API, but I don't think there is a silver bullet that calculates ratings much better. I think about 98% of the ratings are now quite good, and the rest can be tuned manually.
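To illustrate only the "perfect world" core of that description (the weights and heuristics Mike mentions are not public), one could solve for the rating $$$r$$$ at which the expected number of official solvers matches the observed number. A hedged sketch, assuming the Elo curve from above and assuming that participant ratings and solve flags have been fetched from the Codeforces API (e.g., via contest.standings, preprocessing not shown):

```python
def win_probability(R: float, r: float) -> float:
    # Standard Elo curve (an assumption; Codeforces' exact f is unpublished).
    return 1.0 / (1.0 + 10.0 ** ((r - R) / 400.0))

def estimate_problem_rating(ratings: list[float], solved: list[bool]) -> float:
    """Find r such that the expected solver count over all official
    participants equals the observed solver count.

    ratings[i] / solved[i] describe participant i; building these lists
    from the API is assumed here, not shown."""
    observed = sum(solved)
    lo, hi = 0.0, 5000.0
    # Expected solves decrease monotonically as r grows, so binary search.
    for _ in range(60):
        mid = (lo + hi) / 2.0
        expected = sum(win_probability(R, mid) for R in ratings)
        if expected > observed:
            lo = mid  # too many expected solves: the problem must be harder
        else:
            hi = mid
    return (lo + hi) / 2.0

# Toy data: 3 of 5 participants solved it; the estimate lands near 1800.
print(estimate_problem_rating([1500, 1700, 1900, 2100, 2400],
                              [False, False, True, True, True]))

# Note: with 0 observed solvers the search just runs to the upper bound,
# which is exactly the dirty-data problem described in the comment above.
```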
Thanks! I was mostly curious for my own understanding, rather than trying to suggest I could come up with a better system. It would be nice to know the system behind these ratings when looking through the problemset.
Is it common for highly-rated participants to skip easy problems in low divisions? Anecdotally, when I look at Div3 results, I see GMs and IGMs at the top of the rankings, usually having done the problems in order. I guess I don't see the GMs/IGMs who do the problems out of order though, since they aren't at the top of the rankings. :P
I was also curious whether the "rating" of a participant in a contest is taken to be their pre-contest, post-contest, or per-contest performance rating.
Can you just open-source the formula (like you did with the rating formula) and let other interested people do a PhD on it?
1384B2 - Koa and the Beach (Hard Version) had 307 in-contest solves in Div. 2, 1384D - GameGame had 144, and 1384B1 - Koa and the Beach (Easy Version) had 846. But currently their difficulties are 2200, 1900 and 1900. Please fix them.
What about these: 1183E (easy version) is rated 2000 while 1183H (hard version) is rated 1900?
http://codeforces.me/blog/entry/80016
I think there's a big difference between "unrated" and "rated", more so than between "rated low" and "rated high".
Unrated is typically either people who
Also, the new account was probably made so that whatever it posts cannot be traced to the original one, for example when it is used to post "Reveal how xxx cheats" stuff.