Hello everybody!
After a long break I would like to announce the fourth Bioinformatics contest organized by ITMO University, Bioinformatics Institute, Rosalind and Stepik. The contest is sponsored by JetBrains, Yandex, Serokell, and Genotek.
The contest is hosted on Stepik platform. To participate you have to sign the rules.
The contest consists of two rounds:
- The Qualification round is already started today. It will be open until June 19 11:59 UTC+0.
- The Final round will be held for 24 hours starting at June 26 13:00 UTC+0.
The contest is prepared by the following team: Alexey Sergushichev, Nikita Alexeev, budalnik, demon1999, tourist, cdkrot, GShark, doreshnikov (all ITMO University), German Demidov (Center for Genomic Regulation), Andrey Prjibelski (CAB SPbU).
Are there any prizes?
- 1st place – Whole Genome Sequencing
- 2nd and 3rd – Whole Exome Sequencing
- 4th and 5th – 23andMe or Genotek DNA Service
- 1st–30th – Honorable Bioinformatics T-Shirt
Do I need to know biology?
The knowledge of biology is not necessary to solve the problems!
We wish that you will like the problems!
Good luck!
What is the requirement for being qualified to the final round? It's not stated how maby participants advance, and only 'score the right number of points' is mentioned on the website :(
The only requirement is 'scoring the right number of points', it is stated clearly :)
Everybody who will get at least 952 points (including one point from the "Welcome!" step with the Honor Code) will advance to the Final Round.
Daniil Oreshnikov is doreshnikov, you could have made him a colorful link too. Now it looks like he isn't registered on Codeforces.
Thanks :)
Is it ethical to sequence the genome of one of the greatest programmers of our time? What if a bad agent gets access to it, creating a million clones, thereby causing rating deflation?
Ok why the downvotes?
What Monogon said is COMPLETELY dumb. Your DNA is "legally" flying around, everything you touch, your hair, saliva, typically anything. If a bad agent needs your DNA, they already have it.
So why the downvotes? cz unrated? Elitist/racist ruskis again?
Because his comment was a joke and you are treating as it was a serious inference, genius.
Welcome back! You were missed, even if my CPU is dreading what cursed implementations I come up with.
Hope Benq(or someone else) can share us something about his genome after the contest:-)
By the way, does "Whole Genome Sequencing" really mean the service?
One of it's kind!
The link for time and date says "19 June", but leads to "19 July"
Thank you for the remark! Fixed!
Contest prepared by tourist
Wow insightful! How did you know that?
Follow the link/links. You too will know.
Dude, that was sarcasm.
is this contest rated? if so, how do you know which user is which on the other platform? edit 1: so, let me get things clear, in this website people downvote posts where there's a genuine question and mods do nothing to prevent this?
I think, NO.
You have explained the reason,aren't you? XD
Is this a non-standard definition of a substring? It's very confusing.
Could anyone tell me how to solve Diagnosis without running 2.5 hours for all the tests?
I only have solultions with time complexity $$$O(m(n+q))$$$ or $$$O((\sum cq_i)(\sum cm_i))$$$ or $$$O(m(\sum cq_i)+q(\sum cm_i))$$$.
I solved it by realizing that several test-cases are weak and having a certain pattern e.g.:
There is one input file with $$$|Q_p|=1$$$ and $$$|D_m|=1$$$. I solved it using a linear $$$O(n+m+q)$$$ solution to associate every vertex with a disease.
On another input file, the vertices for each patient are a subset of a certain disease, i.e. $$$Q_p \subseteq D_m$$$. Hence for each patient, I only need to find a disease that is a superset of patient vertices. With a decent filtering trick, my code ran for less than 1 minute for this test-case.
On another input file, I need to draw a portion of the tree to realize that the vertices that belong to the same disease or patient is not randomly placed within the tree, e.g. there is a certain pattern on it. The solution for this test-case becomes pretty trivial.
The fact that the constraint for each test-case isn't explicitly stated made me think that the clue is hidden within the test-case itself.
How did you solve tests 6 and 7? Are those the last two cases you described?
test #7 is $$$Q_p \subseteq D_m$$$
the vertices for each patient are a subset of a certain disease
test #6 is the one with the funny pattern (I need to draw a portion of the tree on a paper).
Another way to solve test 6 is to notice that depths are $$$\leq 5$$$(let's call this number $$$d$$$) and the sizes are $$$\leq 3$$$(let's call this number $$$s$$$). Now, for each (disease, patient) pair the solution is the maximum sum of nodes such that the $$$i$$$-th node is a parent of some disease's node and the $$$i$$$-th patient's node. For each disease, the number of possible parents is $$$s \cdot d$$$, so we can just precompute for each disease all possible ways its parents can contribute to the solution; there are up to $$$(s \cdot d)^s$$$ ways. Then, for each patient, try all $$$d^s$$$ ways to choose the parents and see if there is a disease which can also have that way of choosing parents.
Thanks!
Maybe I am not good at finding the patterns of tests...
I think that (is this task and competetion) finding patterns + thinking about + impelemnting multiple solutions per test takes not less time than running $$$O(m (n + q))$$$. I'd better spent this time on actualizing template from old GCJ to run output only tasks in multiple threads (but I had spent it on optimizing running time from 2h+ to 1h...)
Problem 2 was B(inary)S(earch)
Thanks a lot for challenge, I really enjoyed it (especially binary search on Stepik on B).
1). What was the optimal strategy for problem A?
2). OK, I received some points on problem D. But I don't understand, how was counting the rate of minor haplotype? Was some read counted if it differed with minor haplotype by less than k polymorphisms?
When you got 299.97 of 300 points trying to reconstruct the minor haplotype in Problem 4: