After Round 512 finished, I took a look at the rating changes from Div 1 and ended up calculating their average.
To my surprise, the sum of the rating changes was -5383 (anyone want to help double check the math?). This means that for the 500 participants the average rating change was -10.77. I know this contest is probably an outlier, but this seems way too extreme to be reasonable.
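Since I asked for a double check, here is the division itself, trivial to verify:

```python
print(-5383 / 500)  # -10.766, which rounds to the -10.77 above
```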
If anything, I think the average rating change ought to be slightly positive, in order to reward participation over time. For example, if the average change per contest were +0.5, then someone who participates in 100 contests over two years (which is some serious dedication) would gain at most +50 from it, which seems perfectly fine. This would also serve as encouragement for relatively inactive people to be more active.
However, averaging more than a 10-point loss in a round is unreasonable and likely to discourage people over time from participating if it keeps happening; i.e., people who maintain a similar level of performance will see their rating go down over time, and people who improve slightly will see their rating stay flat. If my calculations all check out, the rating algorithm likely deserves some reconsideration.
EDIT: After some more observation, it looks like what's happening is that new accounts who do just a few contests, lose rating to everyone else, and go inactive are potentially offsetting this effect overall. It's hard to say for sure without more specific data.
Most contests have a negative average, and I was bothered by this too, because it means the system loses some of its total points every contest. The only way to keep the total points growing is to create new accounts: each one adds 1500 points to the pool even if its rating never increases. It also means that for every participant with a positive rating change, there are multiple accounts that lost more than that participant gained. I think this has something to do with the inflation-fighting strategy (which is that the sum of the rating changes of the top participants is not larger than 0).
I think the inflation-fighting strategy is the reason for the bias: on average, the top x participants have performed better than they usually do, and therefore gain rating. Since the constraint forces that positive sum back down to 0, everyone's change gets shifted downward, so participants who placed worse than expected lose even more points than they should. A fairer approach would be to make the sum of the rating changes of every contestant (not just the top x) equal to 0.
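Concretely, what I'm proposing is just a uniform shift of everyone's delta; a sketch:

```python
def zero_sum(deltas: list[int]) -> list[float]:
    # Shift every contestant's rating change by the same amount so that
    # the total over ALL participants (not just the top x) is exactly 0.
    shift = sum(deltas) / len(deltas)
    return [d - shift for d in deltas]
```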
I think MikeMirzayanov should look into this.
While it's probably a minor effect, don't forget that churn between divisions will introduce some bias: those losing points and going down to division 2 will be included in your average, while those gaining points and moving up to division 1 won't.
That’s true. Another factor that is harder to measure is the effect of new users who do one or two contests and then go inactive.
This effect doesn't matter very much because the sum of the rating changes in any contest should be zero regardless of who's participating. Besides, division 2 winners don't take points from division 1 losers, but rather they take them away from other division 2 losers.
In any case, I think that we would see a negative value if we computed average rating change for division 2.
I'm curious if anyone has looked into it. I never explored that system in detail, but from what I can see and read here, it seems to work like this:
CF has a huge problem of people adding more and more rating into the system, either by repeatedly creating new accounts after performing poorly on the old one, or by simply abandoning their accounts (and CP altogether) after their first several bad contests. There are some weird ways of trying to fight the resulting rating inflation, like formulas which produce a significantly negative sum of rating changes after each contest, to take that rating back out of the system. It doesn't work well, as we still have a rating inflation issue.
Is this impression reasonable, or am I completely missing something?
In case it is, maybe it would be better to try resolving the other part of the issue, especially by doing something to decrease the number of people who violate the rules by creating multiple accounts? Though that bad initial user experience of "my friends have higher rating than me because they do fewer contests, and I just keep losing more and more rating" isn't great either.
I think that the rating inflation issue comes from the fact that you can get a MASSIVE amount of rating points by doing very well (something like top 30) in a single contest which wasn't possible under the old system. However, I don't have an issue with this because it's reasonable to reward participants who did very well.
About people abandoning their accounts after a bad performance... well the solution to that is a system like AtCoder, where you start at 0 rating. It solves this problem because if you abandon your account you have to start all over, and it would also solve the bad initial user experience because hey, you will always gain rating on your first contest if you start at 0.
In any case, rating inflation really only affects those at the top of the leaderboard, so subtracting points from everyone equally is not the greatest solution. What bothers me the most about this is that, if you get exactly your expected rank, then you should lose 0 points, but what actually happens is that you lose 20 points.
Starting at 0 rating just punishes people who don't have time to do a lot of contests.
I always considered the function f in the AtCoder rating system to be the size of a particular confidence interval. The rating system can, based on the measurements of your performance, conclude that your actual skill is in the interval (APerf - f(n), APerf + f(n)) with some high probability. For various reasons (incl. stability), only the lower bound of the interval is reported.
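For reference, here is my reading of AtCoder's published formulas, rescaled so that f(1) = 1200 and f(n) shrinks toward 0 as you do more contests; treat this as a sketch of my understanding, not an official reference:

```python
import math

def F(n: int) -> float:
    # F(n) = sqrt(sum of 0.81^i) / (sum of 0.9^i), for i = 1..n
    num = math.sqrt(sum(0.81 ** i for i in range(1, n + 1)))
    den = sum(0.9 ** i for i in range(1, n + 1))
    return num / den

F_INF = math.sqrt(0.81 / (1 - 0.81)) / (0.9 / (1 - 0.9))  # limit of F(n)

def f(n: int) -> float:
    # Rescaled so f(1) = 1200 and f(n) -> 0 as n grows; under the
    # confidence-interval reading above, this is the interval's half-width.
    return (F(n) - F_INF) / (F(1) - F_INF) * 1200
```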
If anything, AtCoder punishes people that compete and improve. I would have higher rating if I abandoned my account after half of my contests. Still seems better than the high volatility of CF rating, though.
How about a slow rating creep towards 1500 for inactive accounts? Yes, even for those with very low rating: if someone tries a contest once, fails, and tries again in 3 years, it's fine to let them start from 1500 again. The rating gained or lost this way could balance out the net rating loss in a contest (or it could be spread over the course of several contests, etc.).
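For example, something like this, where the pull factor k is made up purely for illustration:

```python
def creep(rating: float, idle_contests: int,
          base: float = 1500.0, k: float = 0.02) -> float:
    # For each contest an account sits out, pull its rating a small
    # fraction k of the way back toward the 1500 starting point.
    # k = 0.02 is an arbitrary illustrative value.
    for _ in range(idle_contests):
        rating += k * (base - rating)
    return rating

# Someone at 900 who skips ~250 contests (roughly 3 years) would come
# back at creep(900, 250), which is about 1496.
```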
Perhaps we should stop introducing arbitrary hacks to fix the effects of our previous arbitrary hacks.
I thought all rating systems were just throwing shit on a wall and seeing what sticks...
I believe someone calculated that average rating loss in div1 was ~10 and average rating loss in div2 was ~6 on a number of contests in the recent past (six months to a year). That's a lot of deflation.
Still looking for the source.
I made a small script to show the average rating loss/gain from the last few contests:
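Roughly along these lines, using the public contest.ratingChanges endpoint (the contest IDs below are placeholders, substitute whichever rounds you want to check):

```python
import requests  # pip install requests

def average_change(contest_id: int) -> float:
    # Official post-contest rating deltas, straight from the Codeforces API.
    url = "https://codeforces.com/api/contest.ratingChanges"
    rows = requests.get(url, params={"contestId": contest_id}).json()["result"]
    return sum(r["newRating"] - r["oldRating"] for r in rows) / len(rows)

for cid in (1100, 1101, 1102):  # placeholder contest IDs
    print(cid, round(average_change(cid), 2))
```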
Nice work!
Wow, it turns out -10 is not an outlier at all.
Interesting. It clearly contradicts what I wrote below. So apparently we observe rating deflation at the macro scale, but rating inflation at the top places.
Rich getting richer, poor getting poorer, yeah, capitalism!
Is it only me who feels denied close to 500 points (73 contests * 10 * 0.8)? Otherwise I'd be nutella :(
On the contrary, I see rating inflation all over the place, and I was convinced it was pretty obvious to everyone. When the nutella color was introduced (~2 years ago), 4 people had it and the cutoff was 2900. Now 36 people are at >=2900. Maybe at a small scale we can find some weird-looking data, but the overall tendency can't be denied. (Or maybe ... does that apply to top places only?)
I also have the impression that the inflation is still there (at least at ratings >2000..2100). The stats in the comment above, covering the last 25+ contests, clearly show a negative change in the rating sum after pretty much every contest.
These two statements don't necessarily contradict each other, because the set of participants is not constant over time.
I described one possible issue in my other comment: I suspect that the distribution of inactive accounts may give us a hint about where the rating comes from. Does anyone have any stats on it? Some numbers to estimate how much "free rating" we got over the last year from accounts which were abandoned...
It is also possible that talking about users with ratings 2900+, or even 2100+, doesn't really describe the overall picture. These contestants are only a subset of all users, and I wouldn't even be very surprised if all our hacked formulas worked somewhat differently for different parts of the range.
I doubt that there could be rating inflation above 2100, because Div 1 contests also have a -10 average rating change. The 1900-2100 people can't absorb that much of the negative sum.
It is possible if rating is being transferred from Div 2 to Div 1.
Well, the average rating is about 1450 according to https://codeforces.me/blog/entry/52470, but you start at 1500. To say rating goes down over time just because the average decreases each contest seems wrong.
For rating to be stable, the average account must quit after about 5 contests. It seems that rating is actually inflating, so, probably, the average person does not stay for more than 5 contests. I don't know if anyone has ever calculated that number, though.
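Back-of-envelope for where that 5 comes from, using the ~1450 average from the link above and the ~10-point average loss measured earlier in the thread:

```python
# A typical account enters at 1500 and on average settles near 1450,
# so it leaves roughly 50 points behind in the pool when it quits.
donated = 1500 - 1450
drain_per_contest = 10  # average per-contest loss reported in this thread
print(donated / drain_per_contest)  # 5.0 contests covered per new account
```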
Yes, after observing for a while longer I think the effect of many people who do just a few contests, lose rating, and go inactive (something I mentioned above) is actually quite large and offsets the deflationary effect of the contests. I'll add an edit to the post to note this.
I noticed this post was on the sidebar again. Annoyingly, nobody ever released their code for obtaining these data (I just accepted them as true).
So I slapped together a really hacky method, in case you want to check it yourself or find your own data for other contests:
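Essentially: pull the official deltas and average only the participants whose old rating was in the purple band (I'm taking purple to mean 1900-2099). A sketch:

```python
import requests

def average_change_in_band(contest_id: int,
                           lo: int = 1900, hi: int = 2099) -> float:
    # Same contest.ratingChanges API as above, restricted to one color
    # band by old rating (defaults to the purple range, 1900-2099).
    url = "https://codeforces.com/api/contest.ratingChanges"
    rows = requests.get(url, params={"contestId": contest_id}).json()["result"]
    deltas = [r["newRating"] - r["oldRating"] for r in rows
              if lo <= r["oldRating"] <= hi]
    return sum(deltas) / len(deltas)
```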
In this case I used it on the latest contest (the Lyft final round mirror, div1) and discovered that the average rating change for purples was -18.5. I also discovered that for div2-only rounds purples gain on average anywhere between +8 and +15 points.
Thanks for the data. This helps provide some evidence for the theory that new accounts generally donate rating to Div 2, and then some of that rating gain balances out in Div 1 via competitors newly promoted to Div 1.
I decided that one data point wasn't enough, and I took a good look at about 3 other contests. Turns out the average rating change for purples in those contests was slightly positive, between +0.8 and +3. Maybe the idea that div2 gains carry over into div1 isn't entirely solid.
That's interesting; I didn't expect the number to vary so much from contest to contest. Sounds like we'd need a lot more data to draw good conclusions.