Bioinformatics Contest 2021 Announcement

#	User	Rating
1	tourist	3856
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3462
9	ecnerwala	3451
10	heuristica	3431

#	User	Contrib.
1	cry	167
2	-is-this-fft-	162
3	Dominater069	160
4	Um_nik	158
5	atcoder_official	156
6	Qingyu	153
7	djm03178	152
7	adamant	152
9	luogu_official	150
10	awoo	147

Hello everybody!

After a long break I would like to announce the fourth Bioinformatics contest organized by ITMO University, Bioinformatics Institute, Rosalind and Stepik. The contest is sponsored by JetBrains, Yandex, Serokell, and Genotek.

The contest is hosted on Stepik platform. To participate you have to sign the rules.

The contest consists of two rounds:

The Qualification round is already started today. It will be open until June 19 11:59 UTC+0.
The Final round will be held for 24 hours starting at June 26 13:00 UTC+0.

The contest is prepared by the following team: Alexey Sergushichev, Nikita Alexeev, budalnik, demon1999, tourist, cdkrot, GShark, doreshnikov (all ITMO University), German Demidov (Center for Genomic Regulation), Andrey Prjibelski (CAB SPbU).

Are there any prizes?

1st place – Whole Genome Sequencing
2nd and 3rd – Whole Exome Sequencing
4th and 5th – 23andMe or Genotek DNA Service
1st–30th – Honorable Bioinformatics T-Shirt

Do I need to know biology?

The knowledge of biology is not necessary to solve the problems!

We wish that you will like the problems!

Good luck!

Comments (30)

Show archived | Write comment?

kpw29

4 years ago, # |

What is the requirement for being qualified to the final round? It's not stated how maby participants advance, and only 'score the right number of points' is mentioned on the website :(

→ Reply

frogcatcher

4 years ago, # ^ |

The only requirement is 'scoring the right number of points', it is stated clearly :)

Everybody who will get at least 952 points (including one point from the "Welcome!" step with the Honor Code) will advance to the Final Round.

orz

Daniil Oreshnikov is doreshnikov, you could have made him a colorful link too. Now it looks like he isn't registered on Codeforces.

doreshnikov

Thanks :)

Monogon

← Rev. 2 →

+157

Is it ethical to sequence the genome of one of the greatest programmers of our time? What if a bad agent gets access to it, creating a million clones, thereby causing rating deflation?

mingzhiluo

-101

Ok why the downvotes?

What Monogon said is COMPLETELY dumb. Your DNA is "legally" flying around, everything you touch, your hair, saliva, typically anything. If a bad agent needs your DNA, they already have it.

So why the downvotes? cz unrated? Elitist/racist ruskis again?

daviyan

+69

Because his comment was a joke and you are treating as it was a serious inference, genius.

bjy

Welcome back! You were missed, even if my CPU is dreading what cursed implementations I come up with.

ShmilyTY

Hope Benq(or someone else) can share us something about his genome after the contest:-)

+31

By the way, does "Whole Genome Sequencing" really mean the service?

mengineer

One of it's kind!

WitchOfTruth

The link for time and date says "19 June", but leads to "19 July"

Aksenov239

Thank you for the remark! Fixed!

B_iShu

-7

Contest prepared by tourist

Dragnoid99

+12

Wow insightful! How did you know that?

cf-coder

Follow the link/links. You too will know.

Dude, that was sarcasm.

xorswap

-42

is this contest rated? if so, how do you know which user is which on the other platform? edit 1: so, let me get things clear, in this website people downvote posts where there's a genuine question and mods do nothing to prevent this?

Joker2333

+13

I think, NO.

You have explained the reason,aren't you? XD

mango_lassi

t is a substring of s if t is contained as a contiguous collection of symbols in contiguous collection of symbols in s

Is this a non-standard definition of a substring? It's very confusing.

Suiseiseki

← Rev. 3 →

+36

Could anyone tell me how to solve Diagnosis without running 2.5 hours for all the tests?

I only have solultions with time complexity $$$O(m(n+q))$$$ or $$$O((\sum cq_i)(\sum cm_i))$$$ or $$$O(m(\sum cq_i)+q(\sum cm_i))$$$.

heavenhunter

+29

I solved it by realizing that several test-cases are weak and having a certain pattern e.g.:

There is one input file with $$$|Q_p|=1$$$ and $$$|D_m|=1$$$. I solved it using a linear $$$O(n+m+q)$$$ solution to associate every vertex with a disease.
On another input file, the vertices for each patient are a subset of a certain disease, i.e. $$$Q_p \subseteq D_m$$$. Hence for each patient, I only need to find a disease that is a superset of patient vertices. With a decent filtering trick, my code ran for less than 1 minute for this test-case.
On another input file, I need to draw a portion of the tree to realize that the vertices that belong to the same disease or patient is not randomly placed within the tree, e.g. there is a certain pattern on it. The solution for this test-case becomes pretty trivial.

The fact that the constraint for each test-case isn't explicitly stated made me think that the clue is hidden within the test-case itself.

crimsonred

How did you solve tests 6 and 7? Are those the last two cases you described?

+11

test #7 is $$$Q_p \subseteq D_m$$$ the vertices for each patient are a subset of a certain disease

test #6 is the one with the funny pattern (I need to draw a portion of the tree on a paper).

dorijanlendvaj

Another way to solve test 6 is to notice that depths are $$$\leq 5$$$(let's call this number $$$d$$$) and the sizes are $$$\leq 3$$$(let's call this number $$$s$$$). Now, for each (disease, patient) pair the solution is the maximum sum of nodes such that the $$$i$$$-th node is a parent of some disease's node and the $$$i$$$-th patient's node. For each disease, the number of possible parents is $$$s \cdot d$$$, so we can just precompute for each disease all possible ways its parents can contribute to the solution; there are up to $$$(s \cdot d)^s$$$ ways. Then, for each patient, try all $$$d^s$$$ ways to choose the parents and see if there is a disease which can also have that way of choosing parents.

Thanks!

Maybe I am not good at finding the patterns of tests...

Igorjan94

I think that (is this task and competetion) finding patterns + thinking about + impelemnting multiple solutions per test takes not less time than running $$$O(m (n + q))$$$. I'd better spent this time on actualizing template from old GCJ to run output only tasks in multiple threads (but I had spent it on optimizing running time from 2h+ to 1h...)

spar5h

Problem 2 was B(inary)S(earch)

Traduttore

Thanks a lot for challenge, I really enjoyed it (especially binary search on Stepik on B).

1). What was the optimal strategy for problem A?

2). OK, I received some points on problem D. But I don't understand, how was counting the rate of minor haplotype? Was some read counted if it differed with minor haplotype by less than k polymorphisms?

ashmelev

+19

When you got 299.97 of 300 points trying to reconstruct the minor haplotype in Problem 4:

Aksenov239's blog