yet another competitive programming AI

there have been quite a few attempts to create language models able to solve competitive programming problems. i'm nowhere near this ambitious, but i think quality problem analyses could have a more positive direct impact on competitive programmers than behemoths like AlphaCode. by that i mean providing more detailed tags than Codeforces currently offers, showing which people took which approach to solving a problem, and creating problem embeddings to enhance problem search (e.g. finding a problem similar to the intersection of two problems). don't you wish you knew what proportion of submissions to a problem used push-relabel vs ford-fulkerson, or whether a problem tagged data structures used lazy segment trees or simply a fenwick tree?
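
to make the "intersection of two problems" search a bit more concrete, here is a minimal sketch, assuming every problem already has an embedding vector. the problem names, the toy vectors, and the choice to average the two vectors and rank by cosine similarity are all made up for illustration; the actual site may do something different.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_to_blend(embeddings, first, second, k=1):
    """Return the k problems closest to the average of two problems' embeddings.

    `embeddings` maps a (hypothetical) problem name to its vector.
    The two query problems themselves are excluded from the results.
    """
    query = [(x + y) / 2 for x, y in zip(embeddings[first], embeddings[second])]
    scored = [
        (cosine(query, vec), name)
        for name, vec in embeddings.items()
        if name not in (first, second)
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy 3-dimensional embeddings, invented purely for this example.
toy = {
    "flow_problem": [1.0, 0.0, 0.0],
    "segment_tree_problem": [0.0, 1.0, 0.0],
    "flow_with_segment_tree": [0.7, 0.7, 0.1],
    "geometry_problem": [0.0, 0.0, 1.0],
}

print(nearest_to_blend(toy, "flow_problem", "segment_tree_problem"))
```

averaging is only one way to read "intersection"; any other way of combining the two vectors would slot into `nearest_to_blend` the same way.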

if you think this is a good idea -- i might be going insane with tunnel vision because i've spent a lot of time on this concept -- you can help me by annotating your own solutions on my site. if you find anything missing there (tags or features) or have other concerns, please let me know. i've trained an AST-based ~30M BERT-like transformer on masked language modeling and just need some data to fine-tune it to predict specific tags. if just 50 people annotate 20 submissions each, that would definitely be enough! this will also enhance the embedding, helping it encode salient aspects of approaches to a problem. i've tried using ChatGPT for this task, and the results were pretty bad. i think better, or at least human, data could fix this.

tldr: please consider donating data to my little project here

thank you! :>)

Comments (12)

formidablechief27

3 months ago

i was submitting some solutions and noticed some tags missing (which we need to choose)

Two Pointers, Maps, Priority Queue, Sorting with Custom Comparators. There are many kinds of dp, but only a few are listed.

Maybe these additions might help !!

canine

3 months ago

thanks for your feedback! i added some more dp tags and two pointers, but i don't think STL container usage or sorting w/ custom comparators should be input by a human, since they can easily be automatically identified from source code / AST.
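
as a rough illustration of why these don't need human input, here is a minimal sketch that spots STL container usage and a lambda comparator passed to `sort` straight from C++ source text. it uses plain regular expressions as a stand-in for a real AST walk, and the container list and function names are my own choices for the example, not what the site actually runs.

```python
import re

# Containers to look for; this list is illustrative, not exhaustive.
STL_CONTAINERS = (
    "priority_queue", "unordered_map", "map",
    "multiset", "set", "deque", "vector",
)


def detect_stl_containers(source):
    """Return the set of listed container names used as templates, e.g. `map<...>`."""
    found = set()
    for name in STL_CONTAINERS:
        # \b keeps `map` from matching inside `unordered_map`, since `_` is a word character.
        if re.search(r"\b" + name + r"\s*<", source):
            found.add(name)
    return found


def uses_lambda_comparator(source):
    """Heuristic: a `sort(` call whose arguments contain a lambda introducer `[...](`."""
    return re.search(r"\bsort\s*\([^;]*\[[^\]]*\]\s*\(", source) is not None


example = """
#include <bits/stdc++.h>
using namespace std;
int main() {
    vector<int> a = {3, 1, 2};
    priority_queue<int> pq;
    unordered_map<int, int> cnt;
    sort(a.begin(), a.end(), [](int x, int y) { return x > y; });
}
"""

print(sorted(detect_stl_containers(example)))
print(uses_lambda_comparator(example))
```

a regex like this will miss comparators passed as named functions or structs and can be fooled by comments or macros, which is exactly where a real AST would do better.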

cry

orz

terracottalite

2 months ago

The "why are you even here" thing should only appear once I think.

2 months ago

you are absolutely right! should be fixed

nor

Nice project! I am considering tagging some of my own solutions (and hopefully revisiting some tricks I have used in the past), but I got the following error:

Spoiler

Edit: seems to have been because a round was ongoing during that time. Maybe CF becomes completely different when there is a round going on?

thanks -- i got blocked by cloudflare and that's probably why. i added some throttling to respect CF's ratelimit, should be better now
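
for anyone curious what that kind of throttling can look like, here is a minimal sketch that enforces a minimum gap between outgoing requests. the `Throttle` class and the 2-second interval are placeholders invented for this example; they are not the site's actual code or CF's real limit.

```python
import time


class Throttle:
    """Enforce a minimum number of seconds between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock      # injectable so the class can be tested without real sleeping
        self.sleep = sleep
        self.last_call = None   # time of the previous call, None before the first one

    def wait(self):
        """Block until min_interval has passed since the last call; return seconds slept."""
        waited = 0.0
        if self.last_call is not None:
            remaining = self.min_interval - (self.clock() - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                waited = remaining
        self.last_call = self.clock()
        return waited


# Demo with a fake clock so nothing actually sleeps.
fake_now = [0.0]
throttle = Throttle(
    2.0,
    clock=lambda: fake_now[0],
    sleep=lambda seconds: fake_now.__setitem__(0, fake_now[0] + seconds),
)
first = throttle.wait()    # first call never waits
fake_now[0] += 0.5         # pretend half a second of work happened
second = throttle.wait()   # has to wait out the rest of the interval
print(first, second)
```

in real use you would call `throttle.wait()` right before each request and leave `clock` and `sleep` at their defaults.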

KrrishBhakt

Just a random suggestion, but wouldn't it be better to create embeddings over editorials? Like embeddings are anyways clustered over a 3D space by virtue of their existence, so wouldn't it be better to create embeddings of edi's and then label clusters using the tag data you're collecting?
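
A minimal sketch of the "label clusters using the tag data" step, assuming the editorials have already been embedded and clustered by some other tool. The editorial names, cluster ids, and the majority-vote rule are all invented for illustration.

```python
from collections import Counter, defaultdict


def label_clusters(cluster_of, tags_of):
    """Label each cluster with the tag that appears most often among its members.

    cluster_of: maps an item (e.g. an editorial id) to its cluster id.
    tags_of:    maps an item to the list of human-provided tags, if any.
    Clusters with no tagged members are left out of the result.
    """
    votes = defaultdict(Counter)
    for item, cluster in cluster_of.items():
        for tag in tags_of.get(item, []):
            votes[cluster][tag] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}


# Hypothetical data: three editorials in two clusters, one editorial untagged.
cluster_of = {"edi_a": 0, "edi_b": 0, "edi_c": 1, "edi_d": 2}
tags_of = {
    "edi_a": ["flows"],
    "edi_b": ["flows", "dp"],
    "edi_c": ["fenwick tree"],
}

print(label_clusters(cluster_of, tags_of))
```

A simple majority vote throws away minority tags, so a cluster that mixes two techniques would need a softer rule, such as keeping every tag above some share of the votes.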

meow.

usernames which contain periods are perceived as invalid

thanks for pointing that out, fixed!

SilverTongue1729

Could you add the tags for Chinese Remainder Theorem, or perhaps for extended Euclid's algorithm?

whoops, i don't know how i missed those! added, thanks
