Let's look at two problems from the last round, Round 657 (Div. 2):
Problem D 1379D - New Passenger Trams currently has 367 solvers and had 77 solvers in the official contest (among rated participants).
Problem E 1379E - Inverse Genealogy currently has 44 solvers and had 0 solvers in the official contest.
Yet both problems have the same difficulty rating of 2400. How does that make any sense?
On top of that, problem F1 1379F1 - Chess Strikes Back (easy version) currently has 97 solvers and had 10 solvers in the official contest; by those numbers it is still clearly easier than problem E, yet it has a difficulty rating of 2700.
Another problem that is certainly not worth a 2900 rating: 1372E — Omkar and Last Floor.
I also have examples of the opposite case. Problems https://codeforces.me/contest/1385/problem/E and https://codeforces.me/contest/1385/problem/F are rated 2000 and 2300, which doesn't make sense. E is a small variation on a classic problem, and F is a little harder but really not that hard. They should be more like 1700-1800 and 2000-2100.
Edit:
Wow, I'm sorry if I offended anybody; I didn't mean to diminish anyone's achievement!
Good news: 1379D and E are now 2300 and 2800, per your suggestion.
Thanks. Occasionally some heuristics don't work well. In such cases, I need to change the ratings manually.
Could you also fix this one? https://codeforces.me/problemset/problem/86/D
Its rating is 2900, and like... that should not be true, right?
Yes, of course, it should be around 2000-2100.
Imho this explains very well why a 2700+ rating is correct for 86D.
But bro, in the current situation I don't think that it is a non-standard thing.
In that case, you should ask Mike to take upsolved submissions into account as well, i.e., the ratings people had when they upsolved it. click click2
The problem's rating is correct as far as the spirit of the rating formula is concerned, and it shouldn't be changed just for the sake of it.
Can you explain what the heuristics are, or what the overall system is? I used to think problem difficulties were calculated by a single formula with a simple interpretation.
Thanks!
I guess the mystery is somewhere here, in the UPD2 "coefficients".
Yeah, the original blog announcing problem difficulties (https://codeforces.me/blog/entry/62865) said they are calibrated so that if your rating is $$$R$$$ and the problem rating is $$$r$$$, your probability of solving it during an official round is $$$f(R - r)$$$ for some function $$$f$$$ with $$$f(0) = 0.5$$$, analogous to Elo-style ratings for a game between two players.
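For concreteness, here is a minimal sketch of what such a calibration could look like, assuming $$$f$$$ is the standard Elo logistic curve with a 400-point scale. The blog does not publish the actual function, so the curve below is purely an assumption:

```python
def f(diff: float) -> float:
    """Assumed solve probability as a function of diff = R - r.
    Standard Elo logistic curve; the real f used by Codeforces
    is not published."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

# f(0) = 0.5: a contestant rated exactly at the problem's rating
# solves it half the time during an official round.
assert abs(f(0.0) - 0.5) < 1e-9

# Under this assumed curve, a contestant rated 400 points above
# the problem solves it about 91% of the time.
print(f(400.0))  # ~0.909
```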
But I don't really understand where they come from, like... is it just based on fitting to the ratings of the participants during the official contest? And if so, is it pre-contest ratings, post-contest ratings, or per-contest "performance ratings"?
Plus, as you said, it seems from Mike's comments like there are probably some ad-hoc heuristics/hacks on top of this basic formula, but we don't know what they are.
I can't easily explain all the details. In a perfect world, the problem rating would be the rating of an opponent such that your probability of beating them equals your probability of solving the problem. But in the real world the data is dirty: suppose tourist enters a Div. 3 round but problem A is too boring for him, so he skips it. Statistically he "didn't solve" it, which gives a great boost to the problem's rating. I tried counting only official submissions, but, for example, for hard Div. 3 problems official submissions give less information than unofficial ones. So my current way of calculating problem ratings is full of weights, coefficients and heuristics. You can try it yourself using the API, but I don't think there is a silver bullet that calculates ratings much better. I think about 98% of the ratings are now quite good, and the rest can be tuned manually.
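To illustrate only the "perfect world" core of that description (the weights and heuristics Mike mentions are not public), one could solve for the rating $$$r$$$ at which the expected number of official solvers matches the observed number. A hedged sketch, assuming the Elo curve from above and assuming that participant ratings and solve flags have been fetched from the Codeforces API (e.g., via contest.standings, preprocessing not shown):

```python
def win_probability(R: float, r: float) -> float:
    # Standard Elo curve (an assumption; Codeforces' exact f is unpublished).
    return 1.0 / (1.0 + 10.0 ** ((r - R) / 400.0))

def estimate_problem_rating(ratings: list[float], solved: list[bool]) -> float:
    """Find r such that the expected solver count over all official
    participants equals the observed solver count.

    ratings[i] / solved[i] describe participant i; building these lists
    from the API is assumed here, not shown."""
    observed = sum(solved)
    lo, hi = 0.0, 5000.0
    # Expected solves decrease monotonically as r grows, so binary search.
    for _ in range(60):
        mid = (lo + hi) / 2.0
        expected = sum(win_probability(R, mid) for R in ratings)
        if expected > observed:
            lo = mid  # too many expected solves: the problem must be harder
        else:
            hi = mid
    return (lo + hi) / 2.0

# Toy data: 3 of 5 participants solved it; the estimate lands near 1800.
print(estimate_problem_rating([1500, 1700, 1900, 2100, 2400],
                              [False, False, True, True, True]))

# Note: with 0 observed solvers the search just runs to the upper bound,
# which is exactly the dirty-data problem described in the comment above.
```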
Thanks! I was mostly curious for my own understanding, rather than trying to suggest I could come up with a better system. It would be nice to know the system behind these ratings when looking through the problemset.
Is it common for highly-rated participants to skip easy problems in low divisions? Anecdotally, when I look at Div3 results, I see GMs and IGMs at the top of the rankings, usually having done the problems in order. I guess I don't see the GMs/IGMs who do the problems out of order though, since they aren't at the top of the rankings. :P
I was also curious whether the "rating" of a participant in a contest is taken to be their pre-contest, post-contest, or per-contest performance rating.
Can you just open-source the formula (like you did with the rating formula) and let other interested people do a PhD on it?
1384B2 - Koa and the Beach (Hard Version) had 307 in-contest solves in Div. 2, 1384D - GameGame had 144, and 1384B1 - Koa and the Beach (Easy Version) had 846. But currently their difficulties are 2200, 1900 and 1900. Please fix them.
What about these: 1183E (easy version) is rated 2000 while 1183H (hard version) is rated 1900?
http://codeforces.me/blog/entry/80016
I think there's a big difference between "unrated" and "rated", more so than between "rated low" and "rated high".
Unrated is typically either people who
Also, the new account was probably made so that whatever it posts cannot be traced to the original one, for example when it is used to post "Reveal how xxx cheats" stuff.