We broke Topcoder ratings

It's long been known that certain rating systems, namely Glicko-2 and Topcoder, are not monotonic. In other words, there are cases where losing can eventually result in a higher rating. We wanted to know just how severe the issue can be. In joint work with inutard at WWW 2021, we computed how tourist's rating would evolve according to both Topcoder and our custom rating system. The dataset consists of Codeforces rounds up to Looksery Cup 2015, accessed via the Codeforces API. Here, we see that tourist's Topcoder rating is 3284, but could have been as high as 3807 if he were willing to lose on purpose!

[Plot: tourist's simulated rating over time]

More details on the adversarial strategy: for his first 45 rounds, we simulate tourist playing normally, following historical data. In the next 45 rounds, he deliberately finishes in last place whenever his Topcoder rating is above 2975, but plays normally otherwise. Finally, he returns to playing normally for an additional 15 rounds.
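
The three-phase schedule above can be sketched in a few lines of Python. This is only an illustration of the decision rule, not the code used for the paper: the names `should_lose_on_purpose`, `simulate`, and the `update_rating` callback are hypothetical, and the callback stands in for whatever rating system is being tested (a real Topcoder update also depends on the other participants and on volatility, which the callback would have to track itself).

```python
from typing import Callable, List, Sequence

# Phase lengths and threshold as stated in the post.
NORMAL_ROUNDS_BEFORE = 45
ADVERSARIAL_ROUNDS = 45
NORMAL_ROUNDS_AFTER = 15
TOTAL_ROUNDS = NORMAL_ROUNDS_BEFORE + ADVERSARIAL_ROUNDS + NORMAL_ROUNDS_AFTER
RATING_THRESHOLD = 2975


def should_lose_on_purpose(round_index: int, rating: float) -> bool:
    """Return True if the player takes last place in this 0-indexed round."""
    start = NORMAL_ROUNDS_BEFORE
    end = NORMAL_ROUNDS_BEFORE + ADVERSARIAL_ROUNDS
    return start <= round_index < end and rating > RATING_THRESHOLD


def simulate(
    historical_ranks: Sequence[int],
    last_place_ranks: Sequence[int],
    update_rating: Callable[[float, int, int], float],
    initial_rating: float,
) -> List[float]:
    """Replay the rounds, swapping in a last-place finish when the rule fires.

    update_rating(rating, round_index, rank) is a hypothetical stand-in for
    the rating system under test; it returns the rating after the round.
    """
    rating = initial_rating
    history = []
    for i, (rank, last) in enumerate(zip(historical_ranks, last_place_ranks)):
        played_rank = last if should_lose_on_purpose(i, rating) else rank
        rating = update_rating(rating, i, played_rank)
        history.append(rating)
    return history
```

Keeping the rating update behind a callback means the same schedule can be replayed against any rating system to compare how each one responds to the deliberate losses.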

A similar strategy recently broke the Pokemon Go Battle League rankings, which seem to be based on Glicko-2: https://www.reddit.com/r/TheSilphRoad/comments/hwff2d/farming_volatility_how_a_major_flaw_in_a/.

Comments (14)

TwentyOneHundredOrBust

Does it work on lichess?

EbTech

If it uses Glicko-2 then I suspect the same exploit will work. The trick is to massively inflate your volatility by alternating between losing and winning.

sam000013

Does Codeforces use Glicko-2 (or similar) too?

The Codeforces system is monotonic: https://codeforces.me/blog/entry/20762

Although I think Elo-MMR produces nicer ratings, both systems are free from such exploits! Glicko-1 should also be monotonic I think, though I haven't verified it.

dllu

+28

I raised the issue on the GitHub for Lila (the implementation of Lichess). However, the maintainers dismissed it. https://github.com/ornicar/lila/issues/7862

dalex

I don't think so. When you reach your_normal_rating + 100, your opponents will crush you and will not let you climb higher. And weak opponents (your_normal_rating - 300) will not accept your challenges.

It's easier to get this your_normal_rating + 100 with fluctuations.

I'll just try it myself.

https://lichess.org/@/VolatilePlayer

Most players who have played a lot of games have a deviation of 45-46. I'll stay at a 2100 rating until I reach a deviation of 45, then I'll start playing at full strength. Let's see if I will exceed my typical 2350-2400. I think not.

inutard

I would be very curious too! Lichess uses Glicko-2: https://i.imgur.com/bOjm17e.png

But given how many players are on the platform, it would be weird if they haven't hacked a fix for this attack.

Please feel free to email / message us your results! We can include it in our repo and credit you :)

Lichess has an easy API and open source code, so you can actually experiment with other players' results.

Also, to get maximum profit, I should return to this account after some time (maybe 1 year), when the volatility greatly increases and I'll be getting +100 for my first games. But that's not the case of your study.

aryanc403

+48

rainboy for topcoder 2021.

malvika.shalvika

-6

You theoretically broke Topcoder ratings. It's still a long way to actually see this happen.

proudnerd99

This blog seems really informative. I just have a small question and hope you won't be offended. Why did you put effort into this? I mean, was this some sort of academic research or just something you're passionate about? I ask because your work seems genuine and tough, and I've never had the drive to do something similar. Once again, I don't mean any offense.

+25

It started as a fun project a few years ago, out of a curiosity to see whether good theoretical foundations would solve some of the issues with programming contest rating systems. The more recent work was undertaken to turn that project into an academic publication. Hope this helps!

dantrag

Well, it is kind of a good measure of the reliability of a rating system.

EbTech's blog