Question : https://codeforces.me/contest/1183/problem/E
I want to know how it can be done using 1.Graphs 2.Dynamic Programming
Also there is a harder version of the problem. [problem:https://codeforces.me/contest/1183/problem/H] What changes do i need to make for this in my approach?
P.S my rating is 1600. Any help is appreciated Thanks
I will only cover the dynamic programming part. It can also be used to solve: https://codeforces.me/contest/1183/problem/H
Let's define
cnt[i][x]
as the number of distinct subsequences of length $$$x$$$ by only taking characters from index $$$i$$$ to $$$n-1$$$.The problem can be solved by greedily taking the longest subsequences first from
cnt[0][x]
(becausecnt[0][x]
already contains all subsequences).To compute
cnt[i][x]
, we can use dp in $$$O(A \cdot n^{2})$$$ time, where A is the alphabet size. Letcnt_help[i][x][c]
be the number of distinct subsequences ending with character c and only taking characters from index $$$i$$$ to $$$n-1$$$. Initially, all values are 0, exceptcnt[n][0] = 1
.The base case is at index n (no characters). Here's the recurrence for n-1..1:
cnt_help[i][j][c] = cnt[i+1][j-1]
Take all subsequences of 1 length shorter and append character c. Now you have all subsequences ending with character c. You can see that every subsequences here will be distinct as long as all subsequences fromcnt[i+1][j-1]
is distinct.cnt[i][j] = cnt[i+1][j] + (cnt[i+1][j-1] - cnt_help[i+1][j][c])
Add the newly made subsequences tocnt
. But why do we have to subtractcnt_help[i+1][j][c]
? It's to avoid duplication. With simple observation, you can tell why all the subsequences incnt_help[i+1][j][c]
are duplicates. With this,cnt[i][j]
stays distinctAfter that, all the answers can be taken from
cnt[0][x]
as described previously.You can further memory-optimize this by kinda flying the table. Store only
cnt[x]
andcnt[x][c]
, and when iterating for every size j, iterate in decreasing order so that no conflicts happen. This is also easier to code.By the way, check out my dumb comic >:) https://www.webtoons.com/en/challenge/cp-series/list?title_no=365151
Thanks a lot malfple Great explanation! Thumbs Up
Using graphs solution: Let our implicit graph be nodes representing string s and all possible subsequences with edges between all possible transitions, that is between string a and string b where b can be made by removing one character of a. We can't generate this graph explicitly as the total possible number of subsequences is 2^n. The key observation is we need to visit only 100 of these nodes, as k <= 100. So run a BFS with starting node being string s. Go to all strings which can be made by removing only one character. Make sure you only go to distinct strings. If BFS terminates with number of nodes visited being less than k. Answer is -1. Else answer is the total cost computed by BFS.
Sample solution: https://codeforces.me/contest/1183/submission/74256424
Your solution is easy to understand. So we basically do greedy thing and keep on generating biggest possible substring. I got it. How to analyse time complexity of such algorithm? Since there can me all different alphabets or maybe duplicates? Till what N , it won't face TLE?
There's hard version of this question as well. How to deal with it? https://codeforces.me/contest/1183/problem/H
When we're looking at one node/string, we take O((n^2)logn)(logn factor for checking set) time to find which subsequences we can go to which aren't visited. The total number of nodes we visit is min(distinct number of subsequences, k). As BFS terminates when we visit k nodes or there are no more subsequences. So time complexity is O(k(n^2)logn)/O(100(n^2)logn). This approach works only as k is very small <= 100. The harder version of the question can't be solved with this approach as k <= 10^12. Which forces you to use the dynamic programming approach explained by malfple.
Thanks a lot CoderPenguin Got AC with you method with ease.
Glad to be of help. :))
Auto comment: topic has been updated by omggg (previous revision, new revision, compare).