Hello, Codeforces!
I am one of the moderators of Luogu, the largest competitive programming online judge and online community in China. We hold online contests frequently.
To uphold academic integrity, we run an anti-plagiarism system after every contest to catch cheaters.
Traditionally, all cases of cheating looked like “person A takes person B’s code and submits it directly during the contest”. But as more and more people learn to use ChatGPT and other AI language models, the situation is becoming more and more complex.
Now, after every sufficiently easy contest (we also have Div. 3 and Div. 4), the easy problems that ChatGPT is capable of solving attract a number of AI-generated submissions.
I kindly ask other contest-hosting platforms: what is your solution to this problem? How do you deal with AI-generated submissions? Should we regard them as cheating and disqualify the submitters? If so, is there a risk of many false positives, where code that participants wrote themselves is misjudged as AI-generated?
Here on Codeforces, does anyone know how Codeforces handles this? Or how AtCoder does, given that the first few problems of AtCoder Beginner Contests are easy enough? If you help organize another online platform and have some experience, please comment below! Thanks to everyone who discusses constructively!
Off-topic: why not make Luogu an international platform, just like AtCoder? AtCoder was also only for Japanese users at first, then it turned into an international platform.
In fact, people from anywhere can use Luogu at www.luogu.com.cn.
But they probably can't read Chinese, LOL.
Actually, Luogu, as a company, has plans to internationalize, but this depends on our CEO’s thinking; I’m just a student moderator. Since the community and user base would change completely overseas, this plan is currently suspended. You may not know this, but Luogu’s CEO has visited Japan, where he had a friendly conversation with, and asked for advice from, AtCoder’s CEO, Takahashi-kun (his handle on Codeforces is chokudai).
chokudai
The Luogu Dev Team is working on it.
The new international site is at https://www.luogu.com, but it is only half-finished.
My opinion: using AI should not disqualify users. Why bother with it? Too much work, with false positives. Let's just hope AI doesn't get good enough for Div1/Div2 :)
PinkieRabbit, why don't you make it English?
It's going to be English.
I don't think codeforces does anything about AI generated code, because its rules state that generated code is allowed if:
And ChatGPT definitely counts as one of those tools.
That’s very convincing; this rule is quite tolerant. Though I wonder whether this “pre-ChatGPT” rule should be considered outdated now?
Also, on Luogu we have a different rule,
which means: “during the contest, participants may use their own code written before the contest, as long as the code was written by themselves; it is prohibited to use code written by others, whether it was written before the contest or not.”
This rule forbids participants from directly using others’ code templates (tourist’s template, for example, unless you, the reader, are tourist himself). USACO actually has similar rules. Whether or not you agree with this rule, Codeforces’s rule and its consequences are clearly not applicable here.
Even if the rule is out of touch with current reality, there is little you can do about ChatGPT. Tools that predict whether text was AI-generated have existed for half a year, and what do we have? They flag the US Constitution or the Bible as AI-generated. They are very unreliable: any semi-official text is likely to be labeled as AI-generated, simply because of how polished ChatGPT's output is.
It will probably be even worse for code, since there are far fewer individual quirks in programming and they are much more subtle; training a model to distinguish them is a hell of a job.
The situation is that ChatGPT may produce very similar code for similar prompts, which triggers our anti-plagiarism system; that's the subtle part. Caught participants may complain that they weren’t copying others’ code, only ChatGPT’s. Though if we interpret the word “others” to include ChatGPT, those participants have nothing to say.
It's not a problem. In Div3/Div4 rounds, most people and AI models can solve A, B, C, but only a real person can solve D, E, F. In my opinion, the problem setters can add a small trick to the problems that makes AI models unable to solve them.
Make problems difficult enough, or include corner cases (or something else), so that ChatGPT won't get AC.
The sole purpose of such rounds is to encourage newcomers to participate in contests.
I don't see making them harder as a good solution, but maybe that's the best we have...
If ChatGPT can solve the problem correctly, then even a monkey can.
That is not the case. Today a Luogu Monthly round (Div.1 and Div.2) ended. The Div.2 B problem is
And ChatGPT solved this problem.
You can think about the problem for a while. I don’t think it is easy enough for a monkey to solve.
I just fed this to ChatGPT. It threw up a bunch of code and a completely nonsensical explanation, and because the problem is a trick question, it just happened to be right...
Maybe the lesson is not to give trick questions?
If you are worried, why not just check whether the problem is solvable by ChatGPT before the contest? You can use the API, which they claim is not used as training data and is deleted after 30 days. (Of course, if someone who works at OpenAI knows your account, that's another problem.)
Can a language model do an anti-plagiarism check?
I am not sure if that is possible with code
How do you think it is done now?
There even was a contest where participants had to write their own anti-plagiarism check: https://codeforces.me/contest/537
When it generates sentences, it is easy to detect, but when someone generates code, they can easily change things and detection wouldn't be accurate. Plus, they can still get the idea from GPT and code it on their own.
PinkieRabbit, would you share which algorithm you implement in the anti-plagiarism module? I googled but could not find a promising solution. We can't use Jaccard or cosine similarity, as they do not capture sequences, leading to more false positives. I could only think of LCS on tokenized code submissions. How do you deal with false positives? There are thousands of submissions, and there is a high chance that two users think of the same sequence.
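I don't know what Luogu actually runs, but the classic published approach for this kind of check (used by MOSS) combines identifier normalization with winnowed k-gram fingerprints: collapsing identifiers defeats variable renaming, and fingerprinting k-grams preserves sequence information that bag-of-words Jaccard or cosine similarity loses. A minimal sketch, where the keyword list, `k`, `w`, and all function names are illustrative assumptions, not anyone's actual implementation:

```python
import hashlib
import re

# Illustrative keyword set; a real tool would use the full language grammar.
CPP_KEYWORDS = {
    "int", "long", "for", "while", "if", "else", "return",
    "include", "using", "namespace", "std", "void", "char", "bool",
}

def normalize(code: str) -> str:
    """Strip comments and collapse every user identifier to one symbol,
    so renaming variables does not evade detection."""
    code = re.sub(r"//[^\n]*|/\*.*?\*/", " ", code, flags=re.S)
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    out = []
    for t in tokens:
        if (t[0].isalpha() or t[0] == "_") and t not in CPP_KEYWORDS:
            out.append("V")  # any user identifier becomes 'V'
        else:
            out.append(t)
    return "".join(out)

def fingerprints(text: str, k: int = 5, w: int = 4) -> set:
    """Hash all k-grams, then keep the minimum hash in each window of w
    consecutive hashes (winnowing), giving position-robust fingerprints."""
    hashes = [
        int(hashlib.md5(text[i:i + k].encode()).hexdigest(), 16)
        for i in range(len(text) - k + 1)
    ]
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}

def similarity(code_a: str, code_b: str) -> float:
    """Jaccard similarity over winnowed fingerprints of normalized code."""
    fa = fingerprints(normalize(code_a))
    fb = fingerprints(normalize(code_b))
    return len(fa & fb) / max(1, len(fa | fb))
```

Because the similarity is computed over shared fingerprint *sequences* rather than shared words, two submissions that legitimately use the same standard idioms score low, while a renamed copy scores near 1.0; the false-positive threshold still has to be tuned per contest.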
I've been contemplating sharing my observations on this topic, and I believe it's time to share them anyway.
I have conducted several experiments using LLMs to assess their performance in various programming contests. Among the models I tested, GPT-4 consistently stood out, so I predominantly used it. To enhance its capabilities, I built an agent on top of GPT-4 that automates the process of writing, testing, and evaluating code. Additionally, I used prompt engineering to elicit a chain of thought.
The results have been quite astounding. I let the model participate in three contests autonomously, and it achieved impressive performance ratings of +1900 and +1600 in Div3 and Div4 contests respectively (it fully solved a Div4). I've also observed that the way the problem statement is presented to the model significantly impacts its performance. On occasion, minor interventions are required to guide the model to the correct solution.
For those interested in replicating my findings, I recommend using ChatGPT Plus combined with the code interpreter. The outcomes should be commendable. As a side note, my current configuration of the model has a rating of +2000 on LeetCode and even achieved a +2400 performance rating in one round.
I'm sharing this to highlight the potential of such models. While many believe these models don't make a significant difference, I speculate that a model more advanced than GPT-4 could potentially solve even Div1 problems.
Congrats, this is the best troll I have read in weeks :)
This is one of the reasons I didn't want to post such a comment. I'm not here to argue, but anyway, I will record a screencast of GPT-4 solving a Div3 or Div4 contest, share it publicly, and we'll see whether it's a troll.
How long does it take to solve before there is no further improvement? A couple of minutes at most? If you can show a video of GPT-4 autonomously fully solving a Div3/4 in a live contest in 15 minutes (and not "I solved the problem in my brain as a human and now I'm handholding it through writing the code"), I'll gift you 20 bucks.
I don't recall claiming that it can solve the contest in 15 minutes. Are you challenging whether it can solve the problems at all, or questioning the speed at which it solves them? Either way, at least last time, it managed to ace a Div4 contest in around an hour.
Ok, then just show me it autonomously fully solving a live contest with no time limit and I'll gift you 20 bucks. Also, claim your 1000 free citations and international fame.
proof or it didn't happen
No, really. You see OpenAI themselves saying that GPT-4 got a whopping 400 rating on Codeforces, and now someone comes along claiming it reaches a 1600+ rating? I don't (necessarily) accuse you of lying, but such bold claims must be supported by equally robust evidence.
I agree that these are bold claims that require proof, but I don't think OpenAI is interested in investing time to make their model more optimal for programming contests. As for the 400 rating, I'm sure it's based on the raw output from OpenAI's API. If you read my comment, you'll see that I'm not claiming it always produces the correct solution on the first try. However, in many cases, the way the user feeds the model samples and prompts helps it debug or fix the code. In any case, I will post proof as soon as I can.
I admit that I was too quick to judge. Though, it still sounds to me as too good (or too bad?) to be true, so I'm eager to see your results in more detail.
Ok... I don’t think this is reliable...
My main concern is that ChatGPT may give different participants similar code, which will be caught by the system. In this case, should we accept appeals from participants complaining that they weren't cheating, just using AI?
PinkieRabbit, there is a YouTuber who uploads solutions in YouTube Shorts during contest time. Many cheaters take help from them. Can't we stop them?
You should contact MikeMirzayanov for this.
I guess you could use an AI model to check a contestant's submission against how he/she wrote code in previous submissions, including in all previous contests he/she participated in.
ChatGPT can solve some problems that most people can solve, but I think in the future it will be able to solve difficult problems that need advanced techniques, so we must resolve these issues before that happens.
My opinion is that using language models to generate code is not cheating. They are legitimate tools for code generation, even if their performance is lacking at the moment. All they do is lower the "skill floor" of programming.
If you think about it, higher-level languages like Python already do that: you can write powerful code without thinking about memory allocation or integer overflow or ...
They can be used just as well in contests as in real-world problems outside contests.
Thanks PinkieRabbit for taking the steps to prevent cheating, truly amazing!
I just used ChatGPT to help solve UVa problem 1600.
https://onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&page=show_problem&category=0&problem=4475&mosmsg=Submission+received+with+ID+29688775
Prompt:
ChatGPT:
but case 2 is wrong.
Prompt:
Prompt again,
Then it gave correct code that passed.
In summary, I tried solutions on my own for several hours, then used ChatGPT for some fixes. After a few prompts, it gave correct code that passed.
Full conversation:
https://chatgpt.com/share/f24e2525-0318-4c46-8f77-743d664330c0