Discussion of a mojibake bug (especially with Chinese) when uploading Ghosts to Gym
Hello guys, I am the contest uploader for the 2021 Jiangxi Provincial Collegiate Programming Contest. When I uploaded it, I ran into a serious mojibake bug with Ghosts containing Chinese characters. Since we are in the middle of the Chinese algorithm contest season, I think it is worth discussing this bug in public and sharing a temporary workaround for Chinese uploaders and anyone else suffering from it.
Discovery of the bug
On October $$$25^{th}$$$, after I uploaded the Ghosts for the 2021 Jiangxi Provincial Collegiate Programming Contest over FTP, all the Chinese characters in the team names turned into mojibake, as shown in the screenshot below.
Since I was sure that Gym supports Unicode team names in Ghosts, I consulted another uploader who had successfully uploaded team names with Chinese characters, only to find that the bug had not shown up until recently. I then tried other upload methods, including pasting the content directly into the text box and using other Unicode formats, and they all failed.
However, I accidentally got it working by turning off the proxy server on my computer, after someone suggested that the data might have been corrupted in transit. (This does not really make sense, since the file size never changed when I checked it over FTP.) The weirdest part is that I tried multiple times afterwards, with both FTP and paste and with the proxy off, and it worked every time! So I stuck with the proxy server explanation.
Reappearance of the bug
On November $$$1^{st}$$$, when 2021年中国大学生程序设计竞赛女生专场 (China Collegiate Programming Contest for Girls) was uploaded to Gym, the same problem appeared again.
At the time I simply thought they had run into the same data transmission problem. That idea was invalidated when The 2021 CCPC Guilin Onsite (Grand Prix of EDG) hit the same bug on upload. Clearly it was not as simple as I had thought, since both uploaders, chenjb and Claris, are two of the most prestigious contestants in China. Unlike a newbie like me, this was definitely not their first time uploading Ghosts. So I tried to fix the issue with my same old solution, and this time it failed.
Temporary solution
Since the problem looked more likely to be a decoding error, I started looking for the encoding that was being used to decode the Chinese text. The result was ISO 8859-2, an encoding used in Central Europe. The most counterintuitive part is that ISO 8859-2 does not support Russian characters, which suggests that Russian text is decoded as Unicode. I scanned through the contests in Gym and re-uploaded the Ghosts file of a Russian contest to my test contest, and this confirmed my conjecture.
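To make the mis-decoding concrete, here is a minimal Python sketch of what happens when UTF-8 bytes are read with a single-byte codec. The helper names `mojibake` and `can_encode` and the sample team name are only for illustration, and this says nothing about how Codeforces actually reads the file.

```python
def mojibake(text: str, wrong_encoding: str) -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong codec."""
    # errors="replace" keeps this from raising on codecs with undefined bytes.
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


team = "江西"  # hypothetical team name, not taken from a real Ghosts file
garbled = mojibake(team, "iso8859_2")

print(repr(garbled))                      # Latin-2 letters and control characters
print(can_encode("Привет", "iso8859_2"))  # Cyrillic is not part of ISO 8859-2
```

Each UTF-8 byte becomes one unrelated character, so a two-character Chinese name turns into six garbage characters. In this case the damage is reversible: `garbled.encode("iso8859_2").decode("utf-8")` gives the original name back.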
So the temporary solution for now is to add some fake contestants (fewer than 30 is enough) with Russian names at the bottom of the contestant list. As they have no submissions, they will not show up on the standings page, but they do help Codeforces recognize Unicode. The only drawback is that if you check the contest carefully, you will find that the real number of Ghosts does not match the standings.
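The padding step can be scripted. The sketch below assumes team lines look like `@t id,0,1,name`, which is the layout of the line quoted later in this thread. The function name `add_padding_teams` and the placeholder name "Команда" are made up for illustration, and the sketch does not touch any other line of the file, so if your file has a header that counts teams you would need to adjust it yourself.

```python
def add_padding_teams(lines, count=30, name="Команда"):
    """Insert `count` teams without submissions after the last '@t' line."""
    team_rows = [i for i, line in enumerate(lines) if line.startswith("@t ")]
    if not team_rows:
        return list(lines)  # no team list found, leave the file unchanged
    last = team_rows[-1]
    # Assumption: the first field after '@t ' is a numeric team id.
    next_id = max(int(lines[i][3:].split(",", 1)[0]) for i in team_rows) + 1
    padding = [f"@t {next_id + k},0,1,{name} {k + 1}" for k in range(count)]
    return lines[: last + 1] + padding + lines[last + 1 :]


# Hypothetical miniature Ghosts file, just to show where the lines land.
sample = ["@t 1,0,1,队伍一", "@t 2,0,1,队伍二", "@s 1,A,1,100,OK"]
for line in add_padding_teams(sample, count=2):
    print(line)
```

Write the result back as UTF-8 (for example `open(path, "w", encoding="utf-8")`) so that the Chinese and Russian names end up in the same encoding.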
Closing remarks
The standings of those contests are now fixed, although the bug itself is not. While working on this issue, I found another rather annoying bug/feature: each time I delete the Ghosts, the overall solved counts for the problems remain.
I am more than grateful to MikeMirzayanov for this amazing platform, which has helped me a lot. I hope the bug will be fixed soon and that Codeforces keeps getting better in the future!
Thank you so much for solving our problems! Actually, Chinese Ghosts could be uploaded to Gym normally before (e.g. one year ago, GP of Nanjing). I think a Codeforces update must have triggered some latent bugs.
CCPC Guilin is one of the best contests I have ever attended, with high-quality and well-balanced problems. I am such a fan of yours and wish you guys the best! XD (Sadly we didn't reach the precision required for F in the end, but it was indeed a good lesson in Computational Geometry)
You hacked codeforces successfully!
Orz Claris~
Thanks! Please share direct links to standings that can't be parsed correctly now. The issue is that I switched to another way to guess file encoding, which works worse.
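For readers wondering what "guessing the file encoding" can look like, here is a minimal sketch of one common approach: try strict UTF-8 first and only then fall back to a legacy codec. This is an illustration, not the actual Codeforces code; the function name `decode_ghost_bytes` and the fallback list are assumptions.

```python
def decode_ghost_bytes(data: bytes, fallbacks=("cp1251",)) -> str:
    """Decode uploaded bytes, preferring UTF-8 over legacy single-byte codecs."""
    try:
        # Strict UTF-8 rejects most legacy-encoded text; "utf-8-sig" also
        # strips a leading byte order mark if one is present.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        pass
    for encoding in fallbacks:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what is readable and mark the rest.
    return data.decode("utf-8", errors="replace")


print(decode_ghost_bytes("队伍".encode("utf-8")))
print(decode_ghost_bytes("Команда".encode("cp1251")))
```

The order matters: bytes that are valid UTF-8 are almost never legacy text by accident, while nearly any byte sequence decodes "successfully" under a single-byte codec.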
I just created a private test contest in Gym and uploaded the exact same .dat Ghosts file that I used for the 2021 Jiangxi Provincial Collegiate Programming Contest.
Here is the invitation link https://codeforces.me/contestInvitation/ecdae1975313aaccd65ba72070f251483c21933c
The standings are now a mess, even though the .dat file is exactly the same as the one I used for the 2021 Jiangxi Provincial Collegiate Programming Contest.
I fixed it. Please, check.
Yes, it is all fixed! Thanks!
Hi, it seems that the problem still exists. In this mashup https://codeforces.me/gym/559844/ , I uploaded the ghost data ( https://paste.mozilla.org/1kMRTFpR ), and the encoding is wrong. Please take a look.
It seems that those team names are being read as Windows 1251, although they are Chinese characters.
I fixed it by adding a magic line "@t 2003,0,1,有有有有有有有有有有有有有有有有有有有有有有有有" at the bottom of the team list. This is quite weird. Changing it to any other character does not work.