My Quest to Answer the Ultimate Question Through Machine Learning: If a hamster was born in aquarium, does it become a fish?

Revision en1, by codetiger927, 2021-08-04 08:31:44

If a hamster was born in aquarium, does it become a fish? — Rotavirus 2021

When the hour hand is way past twelve and the sky is dark, I often find myself pondering this question, alone.

Of course, a hamster is a hamster, and it cannot just become a fish, right? But then again, if its mother gave birth to it inside an aquarium, who's to say the hamster didn't acquire a critical mutation that gifted it the ability to breathe underwater, transforming it into a fish?

I would be awake at 3 am, frustrated and defeated by this cruel dilemma that I cannot seem to solve. But Rotavirus must know, right? After all, he was the person who posed this rhetorical question, so he must have the answer to it! If I could just ask him, maybe, just maybe...

Yet fate cheated me once again: Codeforces has permanently banned Rotavirus, so I can no longer ask him. All I have left are clues and hints scattered around his comments. If only I could somehow rebuild him out of his words, I could just ask him and he would tell me. The answer feels so close, yet just far enough to be out of my reach.

On August 4th, 2021, 11:08 PM EDT, I did the impossible: I trained an LSTM recurrent neural network on all of Rotavirus' previous comments and resurrected him (well ... kinda? :clown:). But before we get to the juicy parts, we must first rewind 8 hours.

I first wrote a simple Python scraper to get all of Rotavirus' comments. It's just a short script built on an HTML parser:

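import requests
from html.parser import HTMLParser

class Scraper(HTMLParser):
	"""Writes the text found inside <div class="ttypography"> comment bodies."""
	def __init__(self, out):
		super().__init__()
		self.out = out
		self.comment_mode = False
	def handle_starttag(self, tag, attrs):
		# A comment body starts at a div whose class is "ttypography".
		if tag == "div" and dict(attrs).get("class") == "ttypography":
			self.comment_mode = True
	def handle_endtag(self, tag):
		# Any closing div ends comment mode (nested divs are not tracked).
		if tag == "div":
			self.comment_mode = False
	def handle_data(self, data):
		# Skip whitespace-only chunks; write everything else as one line.
		if self.comment_mode and data.rstrip() != "":
			self.out.write((data + "\n").encode("utf-8"))

# Only one page is fetched per run; change page_number to fetch another page.
page_number = 1
url = "https://codeforces.me/comments/with/rotavirus/page/{}".format(page_number)

# Append mode, so repeated runs keep adding to the same file.
with open("comments.txt", "ab") as f:
	parser = Scraper(f)
	r = requests.get(url)
	r.raise_for_status()
	parser.feed(r.text)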
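
The scraper above turns comment mode off at the first closing div, so text that follows a nested div inside a comment would be dropped. Here is a minimal sketch of a variant that counts div nesting depth instead. The class name `CommentExtractor` and the sample markup are made up for illustration and are not taken from a real Codeforces page, so treat this as one possible approach rather than what I actually ran.

```python
from html.parser import HTMLParser

class CommentExtractor(HTMLParser):
    """Collects text inside <div class="ttypography">, tracking nested divs."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # 0 means we are outside any comment body
        self.comments = []    # one string per non-empty text chunk

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1   # a div nested inside a comment body
        elif "ttypography" in (dict(attrs).get("class") or "").split():
            self.depth = 1    # entering a comment body

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.comments.append(data.strip())

# Hypothetical sample markup (an assumption, not real page source).
sample = (
    '<div class="comment"><div class="ttypography">'
    '<p>is it rated?</p><div class="spoiler">hidden text</div><p>after</p>'
    '</div><div class="footer">ignore me</div></div>'
)

extractor = CommentExtractor()
extractor.feed(sample)
extractor.close()
print(extractor.comments)  # ['is it rated?', 'hidden text', 'after']
```

The text after the nested div ("after") is kept, while the footer outside the comment body is ignored.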

However, there were still a lot of emojis and Russian characters that the network couldn't recognize, so I wrote a cleaner script that keeps only ASCII characters.

# Keep only ASCII characters and wrap each comment line in double quotes.
with open('comments.txt', 'rb') as f, open('comments_cleaned.txt', 'w') as fout:
	for line in f:
		text = line.strip().decode("utf-8").encode("ascii", "ignore").decode("ascii")
		fout.write('"{}"\n'.format(text))
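
To see what the ASCII filter actually does to a line, here is a small sketch that applies the same encode/decode round trip to a few sample strings. The helper name `clean_line` and the sample lines are made up for illustration.

```python
def clean_line(raw: bytes) -> str:
    """Decode UTF-8 bytes, drop every non-ASCII character, and quote the result."""
    text = raw.strip().decode("utf-8")
    # errors="ignore" silently discards anything that is not ASCII.
    ascii_only = text.encode("ascii", "ignore").decode("ascii")
    return '"{}"'.format(ascii_only)

# Hypothetical sample lines, encoded the way the scraper writes them.
samples = [
    "is it rated?\n".encode("utf-8"),
    "привет, hamster\n".encode("utf-8"),
    "good night ✿\n".encode("utf-8"),
]
for raw in samples:
    print(clean_line(raw))
# "is it rated?"
# ", hamster"
# "good night "
```

Note that the filter deletes characters rather than whole lines, so a line made up only of non-ASCII characters ends up as an empty pair of quotes in the cleaned file.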

Next, I plugged everything into char-rnn, the neural network written by my ML senpai Andrej Karpathy <3. And after 7 hours of excruciating waiting, I got this

Now, let's get to the juicy text generation part :p. Unfortunately, due to the extremely small sample size (12k comments), the generation wasn't as good as I had hoped D:, but I still found a few funny comments that at least somewhat resemble Rotavirus:

===================================================================================

"is it rated?"

"I hate mike"

"cheaters are noob"

"segment tree is very indian" (interesting take O_o, talk about the importance of being careful with training data to avoid AI racism)

"Greendians retarned." (okay this is a bit too dangerous :sweat:, just a reminder that I made 0 subjective modifications to the model or the dataset, so I give 0 endorsement to any of these comments)

"http://codeforces.me/blog/entry/408po-" (:O it seemed to pick up on how to generate codeforces blog links as well, albeit a nonexistent one)

"I believe that means stupid must complain about the segment tree." (The AI seemed to really pick up on Rotavirus' obsession with segment tree XD)

I generated 10,000 characters' worth of text, so if you want to see the uncherrypicked version, here it is

===================================================================================

Finally, time for the ultimate question that we've all been waiting for :eyes:. I primed the AI with the text, "If a hamster was born in aquarium, does it become a fish? The answer"

THE RESPONSE IS [drumrolls please]: If a hamster was born in aquarium, does it become a fish? The answer is NO

THANK YOU GOOD NIGHT ✿✿ヽ(゚▽゚)ノ✿
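
For anyone curious what "priming" means mechanically: the model is fed the prime text first and then keeps predicting one character at a time, appending each prediction to the output. Below is a toy sketch of that loop. It swaps the LSTM for a plain character bigram table with greedy decoding, so it is only an illustration of the idea and not how char-rnn works internally. The names `train_bigram` and `generate` and the toy corpus are made up.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each character, which characters follow it in the text."""
    counts = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    return counts

def generate(counts, prime_text, length):
    """Start from a non-empty prime_text and greedily append likely characters."""
    out = prime_text
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last character was never followed by anything
        # Most frequent follower wins; ties are broken alphabetically.
        best = min(followers, key=lambda ch: (-followers[ch], ch))
        out += best
    return out

corpus = "abababac"  # hypothetical toy corpus
model = train_bigram(corpus)
print(generate(model, prime_text="ca", length=4))  # cababa
```

The bigram table only looks at the last character of the prime text, whereas a recurrent network carries a hidden state across the whole prime, which is what lets a long question influence the answer. Real samplers also usually draw randomly from the predicted distribution instead of always taking the most likely character.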

Overall, I believe the disappointing results are caused by the small dataset size. The GitHub repository recommends a minimum of 1 MB, whereas the data I collected is only 93 KB, more than 10 times smaller.
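
To give a feel for why size matters for a character-level model, here is a rough sketch of how raw text can be turned into training examples: every distinct character gets an integer index, and the model learns to predict each next character from the ones before it. char-rnn does its own preprocessing in Lua, so the function names and the `seq_length` value below are illustrative assumptions, not its actual code.

```python
def build_vocab(text):
    """Map each distinct character to an integer index (sorted for stability)."""
    chars = sorted(set(text))
    return {ch: i for i, ch in enumerate(chars)}

def encode(text, vocab):
    """Turn a string into a list of character indices."""
    return [vocab[ch] for ch in text]

def training_pairs(ids, seq_length):
    """Yield (input, target) windows where target is input shifted by one."""
    for start in range(0, len(ids) - seq_length, seq_length):
        inputs = ids[start:start + seq_length]
        targets = ids[start + 1:start + seq_length + 1]
        yield inputs, targets

text = '"is it rated?"\n'   # one cleaned comment line as a tiny corpus
vocab = build_vocab(text)
ids = encode(text, vocab)
pairs = list(training_pairs(ids, seq_length=5))

print(len(vocab), len(ids), len(pairs))
for inputs, targets in pairs:
    print(inputs, "->", targets)
```

The number of windows grows linearly with the number of characters, so a corpus roughly ten times smaller gives the network roughly ten times fewer examples to learn from.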

BUT, I actually did find a new source to retrieve a lot more of Rotavirus' comments in the famous AC Discord Server :O

Part 2? To Be Continued...

Note that everything in this post is completely for comedic purposes. Please don't take anything here seriously >.< GOOD NIGHT FOR REALSIES

If you like this post, please consider checking out my blog https://codetiger.me/ :p

Tags: funny, rotavirus, hamster
