chreh's blog

By chreh, history, 3 hours ago, In English

Hey everyone,

I don't know if this is the right place to put this, but I made up this problem and am not sure how to solve it. I'm hoping someone with more experience might be able to help.

Setup: I have some code in a coding language. My code consists of N tokens, each associated with a length. I want to minimize my code by adding a macro definition. A macro definition allows me to define a sequence of tokens as a single token with length K, K being the smallest valid unclaimed identifier length. Then, I can replace all occurrences of said sequence of tokens with the defined token.

Now, let's arbitrarily define each distinct token with a distinct nonnegative number. We can then create a mapping W that maps the number associated with each distinct token to the length of that token. Finally, by replacing each token in our source code with its associated number, we get an array A of N nonnegative numbers.

Problem:

Given an array A of N nonnegative numbers (each distinct number corresponds to a type of token in the original code), a map of positive weights W mapping each number in A to some positive weight (this represents the length of the token in the original code), and some positive number K, find a contiguous subarray S1, that maximizes its worth. Its worth is defined as (C-1)*(sum_{i=0}^{i=len(S1)}{W[S1[i]]}-K) ((C-1) * (the weighted sum of S1 — K)) where C is the count of how many times S1 appears in A, without overlaps.

Constraints:

1 <= K

1 <= N <= 1e6

Examples: solution([1, 2, 1, 2, 1, 2], {1:1, 2:2}, 1) -> [1, 2]

[1, 2] appears 3 times without overlaps, so C = 3. (3-1)*((1*1+1*2)-1) = 4. It can be shown that for all other subarrays, (C-1)*(sum_{i=0}^{i=len(S1)}{W[S1[i]]}-K) is less

Follow up: How would you optimize your solution if you had to replace S1 with a new nonnegative number with a weight K and then repeat until the most valuable subarray had a worth <= 0? Would this even be possible in a couple of seconds?

Note: I said that this was a string matching problem because I think it could be a string matching problem if you consider the nonnegative numbers to just be characters in an alphabet. After all, there's no inherent meaning to the number associated with a token.

Thanks in advance to anyone that has any idea on how to solve this.

  • Vote: I like it
  • 0
  • Vote: I do not like it

»
3 hours ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by chreh (previous revision, new revision, compare).

»
3 hours ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by chreh (previous revision, new revision, compare).

»
3 hours ago, # |
  Vote: I like it 0 Vote: I do not like it

So the brute force way would be to just generate all contiguous subarrays in O(N**2) time. We can evaluate each subarray's worth in O(N) time. In total, this would be O(N**3). The follow-up could be done by simply rerunning the algorithm on the edited sequence up to N times, leading to O(N**4) runtime for the follow-up question. I'm sure there's a better way than this...