kitsune's blog

By kitsune, history, 4 years ago, In English

You are given a string $$$s$$$ of length $$$N$$$. For every length, find the number of distinct subsequences (not substrings) of $$$s$$$ of that length, i.e. the number of distinct subsequences of length $$$1$$$, then of length $$$2, 3, \ldots, N$$$. The answer can be large, so print it modulo $$$10^9+7$$$. If the statement is confusing, please see the explanation below. I found this post on GeeksforGeeks, but it only counts the total number of distinct subsequences.

Length of the string: $$$N \le 1000$$$

Example:

Input:

$$$s = dpdp$$$

Output:

$$$2$$$ $$$4$$$ $$$4$$$ $$$1$$$

Explanation:

There are 2 subsequences of length 1: $$$\{d, p\}$$$

4 subsequences of length 2: $$$\{dp, dd, pd, pp\}$$$

4 subsequences of length 3: $$$\{dpd, pdp, dpp, ddp\}$$$

1 subsequence of length 4: $$$\{dpdp\}$$$
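For small strings, the expected output can be checked by brute force. The snippet below is only a sketch for testing purposes (the function name `brute_force_counts` is made up for illustration); it enumerates all index subsets, so it is exponential and far too slow for $$$N = 1000$$$.

```python
from itertools import combinations


def brute_force_counts(s):
    """Entry k-1 of the result is the number of distinct subsequences of length k."""
    counts = []
    for k in range(1, len(s) + 1):
        # combinations() keeps characters in their original order, so every
        # tuple is a subsequence; the set comprehension removes duplicates.
        distinct = {"".join(chars) for chars in combinations(s, k)}
        counts.append(len(distinct))
    return counts


print(brute_force_counts("dpdp"))  # [2, 4, 4, 1]
```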

I think it's a very easy problem, but I have no idea how to solve it. Thanks in advance.

Sorry for my poor English.

»
4 years ago, # |
Rev. 4   Vote: I like it +1 Vote: I do not like it

Auto comment: topic has been updated by kitsune.

Update: Maybe it's solvable without DP, but any kind of help is appreciated.

Update 2: Can we modify the code given in the link so that it works for this problem?

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Ok, any $$$O(n^3)$$$ solutions?

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

If you want, I can explain the solution to the problem described on that GeeksforGeeks page.

»
4 years ago, # |
Rev. 2   Vote: I like it +1 Vote: I do not like it

If a subsequence appears in the string several times, let's count only its "leftmost" occurrence. That is: for a chosen set of indices, if we could get the same subsequence by moving some indices to the left (while preserving the order of the indices), we don't count that choice. This way every distinct subsequence is counted exactly once (try to prove this!). I'll call the subsequences we count "leftmost".

Denote by $$$\mathrm{dp}[i][j]$$$ the number of leftmost subsequences that have length $$$i$$$ and end at position $$$j$$$. How to update this DP table? For every $$$i$$$, $$$j$$$ and character $$$c$$$ do the following: let $$$k > j$$$ be the leftmost position such that $$$s[k] = c$$$ (you can precalculate those). Then add $$$\mathrm{dp}[i][j]$$$ to $$$\mathrm{dp}[i + 1][k]$$$. The answer for length $$$i$$$ is the sum of $$$\mathrm{dp}[i][j]$$$ over all $$$j$$$.

Why does this work? If we have a leftmost subsequence of length $$$i$$$ ending at $$$j$$$ and $$$k$$$ is the first appearance of $$$c$$$ after $$$j$$$, then after appending $$$k$$$ the subsequence remains leftmost. If we appended any other appearance of $$$c$$$, the subsequence would no longer be leftmost. (Again, try to prove those two facts.)
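Here is a minimal sketch of the leftmost DP described above, under a few assumptions of mine: positions are 1-based, position $$$0$$$ is a virtual start that holds the empty subsequence, and only one row of the table is kept at a time. The names `count_by_length_leftmost` and `nxt` are made up for illustration. It runs in $$$O(N^2 \cdot \sigma)$$$, where $$$\sigma$$$ is the number of distinct characters.

```python
MOD = 10**9 + 7


def count_by_length_leftmost(s):
    """Entry i-1 of the result is the number of distinct subsequences of length i."""
    n = len(s)
    # nxt[j][c]: smallest 1-based position k > j holding character c
    # (the key is simply absent when c does not occur after position j).
    nxt = [dict() for _ in range(n + 1)]
    for j in range(n - 1, -1, -1):
        nxt[j] = dict(nxt[j + 1])
        nxt[j][s[j]] = j + 1  # s[j] (0-based) sits at 1-based position j + 1

    # cur[j]: number of leftmost subsequences of the current length ending at j.
    cur = [0] * (n + 1)
    cur[0] = 1  # assumption: the empty subsequence "ends" at virtual position 0
    answer = []
    for _ in range(n):
        new = [0] * (n + 1)
        for j in range(n + 1):
            if cur[j]:
                # Extend by each character, always using its first occurrence after j.
                for k in nxt[j].values():
                    new[k] = (new[k] + cur[j]) % MOD
        cur = new
        answer.append(sum(cur) % MOD)
    return answer


print(count_by_length_leftmost("dpdp"))  # [2, 4, 4, 1]
```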

»
4 years ago, # |
Rev. 2   Vote: I like it +5 Vote: I do not like it

It can be solved by adapting the approach from the GFG link mentioned above: let $$$dp[i][j]$$$ be the number of distinct subsequences of length $$$j$$$ among the first $$$i$$$ characters of the string.

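Then the transition is $$$dp[i][j]=dp[i-1][j]+dp[i-1][j-1]$$$, followed by $$$dp[i][j]=dp[i][j]-dp[last[s[i-1]]][j-1]$$$ if $$$last[s[i-1]] \ne -1$$$. Here $$$last[c]$$$ is the 0-based index of the previous occurrence of character $$$c$$$ (or $$$-1$$$ if there is none). The subtraction removes the subsequences that end in that previous occurrence, which would otherwise be counted twice. All values are taken modulo $$$10^9+7$$$.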

The base cases are $$$dp[i][0]=1$$$ for every $$$i$$$ (the empty subsequence, so in particular $$$dp[0][0]=1$$$) and $$$dp[i][j]=0$$$ for every $$$j>i$$$.

Example for the string $$$ababd$$$:

Length 1: 3 subsequences -> $$$a,b,d$$$

Length 2: 6 subsequences -> $$$ab,aa,ad,ba,bd,bb$$$

Length 3: 8 subsequences -> $$$aab,aad,abb,abd,bab,aba,bbd,bad$$$

Length 4: 5 subsequences -> $$$aabd,abab,abad,babd,abbd$$$

Length 5: 1 subsequence -> $$$ababd$$$
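A minimal sketch of this recurrence is given below. It assumes the duplicate correction is a subtraction, as in the GFG total-count solution, and that `last` stores 0-based indices; the function name `count_by_length` is made up for illustration. It uses $$$O(N^2)$$$ time and memory and reproduces the counts listed for $$$ababd$$$.

```python
MOD = 10**9 + 7


def count_by_length(s):
    """Entry j-1 of the result is the number of distinct subsequences of length j."""
    n = len(s)
    # dp[i][j]: distinct subsequences of length j within the prefix s[:i].
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = 1  # the empty subsequence
    last = {}  # last[c]: 0-based index of the previous occurrence of c
    for i in range(1, n + 1):
        ch = s[i - 1]
        for j in range(1, i + 1):
            # Either skip s[i-1] or append it to a subsequence of length j-1.
            dp[i][j] = dp[i - 1][j] + dp[i - 1][j - 1]
            if ch in last:
                # Subsequences ending at the previous occurrence were counted twice.
                dp[i][j] -= dp[last[ch]][j - 1]
            dp[i][j] %= MOD  # Python's % always returns a non-negative value
        last[ch] = i - 1
    return dp[n][1:]


print(count_by_length("ababd"))  # [3, 6, 8, 5, 1]
```

Since row $$$i$$$ depends on row $$$i-1$$$ and on the row at $$$last$$$, the full table is kept here for simplicity rather than being compressed to one row.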

Code for reference