Text Editor — Facebook Hacker Cup Qualification Round 2016

#	User	Rating
1	jiangly	3898
2	tourist	3840
3	orzdevinwang	3706
4	ksun48	3691
5	jqdai0815	3682
6	ecnerwala	3525
7	gamegame	3477
8	Benq	3468
9	Ormlis	3381
10	maroonrk	3379

#	User	Contrib.
1	cry	168
2	-is-this-fft-	165
3	Dominater069	161
4	Um_nik	159
4	atcoder_official	159
6	djm03178	157
7	adamant	153
8	luogu_official	150
9	awoo	149
10	TheScrasse	146

Abridged Problem Statement : You have a list of $N$ distinct words, and you want to print $K$ of them. You have a very basic text editor : It supports 3 types of operations: typing a letter, deleting the previous letter, and printing the current document. You need to leave the document empty after printing $K$ words. What's the minimum number of operations required to get the job done?↵
↵
Constraints : $1 \le K \le N \le 300 $↵
The total length of all $N$ words in each set will be at most $100,000$ characters.↵
↵
Solution :↵
↵
Name the words $s_1, s_2, ..., s_N$. Let $s_0 = ""$. We initially have $s_0$ on the document. After some operations, we want to print $k$ of $s_1, s_2, ..., s_N$ and change the document to $s_0$. For some pair of string $s_i, s_j$, how many operations do we need to print $s_j$ if we have a document with $s_i$? Let $p$ be the length of the **Longest Common Prefix** of $s_i$ and $s_j$. We can compute this naively in $O(\min(|s_i|, |s_j|))$. Then, we need to delete $|s_i| - p$ characters, type $|s_j| - p$ characters, and print it ($1$ operation), for a total of $|s_i| + |s_j| - 2p + 1$ operations. We can precalculate this value for each pair of words $s_i, s_j$ in a short time.↵
↵
Now, suppose we have chosen our $K$ words $s_1, s_2, ..., s_K$. In which order do we print them in order to use a minimum amount of operations? We claim that in fact, if we print the strings in lexicographical order, we will use the minimum amount of operations. Suppose not, let $s_1, s_2, ..., s_K$ be arranged in lexicographical order. Suppose we print the words in the sequence $s_1, s_2, ..., s_{i - 1}, s_j, s_{i + 1}, ..., s_{j - 1}, s_i, s_{j + 1}, ..., s_K$, with $i < j$. Let $a_1, a_2, ..., _K$ be the length of word $s_1, s_2, ..., s_K$ respectively and denote the length of the **Longest Common Prefix** of $s_a$ and $s_b$ by $L(a, b)$. Note that $L(i, j) \ge L(i, k)$ if $i < j < k$. Suppose to the contrary the longest common prefix of $s_i$ and $s_k$ have longer length than the longest common prefix of $s_i$ and $s_j$. Then, let $p$ be the longest common prefix of $s_i$ and $s_k$. The prefix of $s_j$ does not contain $p$ but this contradicts the fact that $s_i, s_j, s_k$ are increasing in lexicographical order. Thus, the inequality holds. We'll use this inequality many times in the calculations below. It can be easily seen that the total number of operations used is $2(a_1 + a_2 + ... + a_K) - 2(L(1, 2) + L(2, 3) + ... + L(i - 1, j) + L(j, i + 1) + ... + L(j - 1, i) + L(i, j + 1) + ... + L(K - 1, K))$. Compare this to when the words are printed in lexicographical order. Firstly, if $j > i + 1$, we just need to compare $L(i - 1, j) + L(j, i + 1) + L(j - 1, i) + L(i, j + 1)$ and $L(i - 1, i) + L(i, i + 1) + L(j - 1, j) + L(j, j + 1)$. We want the former to be at most the latter. Indeed, $L(i - 1, i) \ge L(i - 1, j), L(i, i + 1) \ge L(i, j - 1), L(j - 1, j) \ge L(i + 1, j), L(j, j + 1) \ge L(i, j + 1)$, since the words are arranged in lexicographical order. So, summing up yields the desired inequality. Now suppose $j = i + 1$. We want to compare $L(i - 1, i + 1) + L(i + 1, i) + L(i, i + 2)$ and $L(i - 1, i) + L(i, i + 1) + L(i + 1, i + 2)$. Again, we have $L(i - 1, i) \ge L(i - 1, i + 1), L(i + 1, i + 2) \ge L(i, i + 2)$, as desired. So, printing the words lexicographically is the optimal way.↵
↵
Now, we already have our cruical step. It remains to select the $K$ words that minimize the amount of operations. We can do this by **dynamic programming**. First sort the $N$ words in lexicographical order. Let $dp[end][parts]$ be the minimum number of operations needed to print $parts$ number of words all of whose index is at most $end$ and the last word printed has index $end$. We want $min(dp[1][k], dp[2][k], ..., dp[n][k])$. If $end < parts$, we let $dp[end][parts]$ be $INF$. ↵
↵
Let $cost(i, j)$ be the number of operations needed to print word $j$ if our current document is word $i$. Recall that we already know how to compute $cost(i, j)$ in $O(1)$ since we already precomputed the values.↵
↵
Now, how do we find $dp[end][parts]$? It is the minimum of $dp[i][parts - 1] + cost(i, end)$ among $1 \le i \le end - 1$. Time complexity is $O(n^2k)$. Finally, don't forget to add the cost from $s_0$ to $s_i$. (in the beginning and end). ↵
↵
↵
↵
↵
↵

Rev.	By	When	Δ	Comment
en9	zscoder	2016-01-12 04:06:49	0	(published)
en8	zscoder	2016-01-10 16:53:19	2	Tiny change: '$L(i, j) \le L(i, k)$' -> '$L(i, j) \ge L(i, k)$'
en7	zscoder	2016-01-10 16:52:55	471
en6	zscoder	2016-01-10 14:20:10	9
en5	zscoder	2016-01-10 14:15:03	841
en4	zscoder	2016-01-10 14:02:48	641
en3	zscoder	2016-01-10 13:56:30	1227
en2	zscoder	2016-01-10 13:32:56	553	Tiny change: 'cters.\n\nName t' -> 'cters.\n\nSolution :\n\nName t'
en1	zscoder	2016-01-10 13:25:03	645	Initial revision (saved to drafts)

History