I ran into a clever trick while reading the solutions here https://codeforces.me/blog/entry/133382 (look at problem 2006C, "Implementation — Two Pointers")
The problem is: let's say there's an array $$$a$$$ and an associative operation $$${*}$$$ and we are interested in computing $$${f(l, r) := a_l * a_{l+1} * \ldots * a_r}$$$ , where we want to be able to increment $$$l$$$ or $$$r$$$ and easily recompute $$$f$$$. If we can do that, then we can solve all kinds of two-pointer-style problems.
If $$$f$$$ is invertible, e.g. addition, we can easily do this by just keeping track of a single aggregated value. But what if $$$f$$$ is something like $$$\gcd$$$? In this case, this simple solution won't work. But there is a trick.
With the simple solution, the hard part is incrementing $$$l$$$ because if we only keep the aggregated value for $$$f(l, r)$$$, we can't recover the value for $$$f(l+1, r)$$$. To do that we would ideally need to keep a list of values $$$f(l, r), f(l+1, r), \ldots, f(r,r)$$$ and then incrementing $$$l$$$ would just involve popping the front of the list. But then, incrementing $$$r$$$ is hard because we'd need to recompute the whole list.
To combine these two approaches, we keep $$$f(l, m), f(l+1, m), \ldots, f(m, m)$$$ for some intermediate $$$m$$$, and also $$$f(m+1, r)$$$. In the beginning, $$$l=m=r=0$$$ . Then:
- to increment $$$r$$$ we simply recompute the tail value by taking $$$f(m+1, r+1)=f(m+1, r) * a_{r+1}$$$
- to increment $$$l$$$:
- if $$$l < m$$$ we pop the list.
- otherwise, $$$l = m$$$. We set $$$m := r$$$, and recompute the whole list from $$$l$$$ to $$$r$$$. This may seem slow but the subarrays recomputed in this manner don't overlap and therefore this has amortized constant cost.
- to query $$$f(l, r)$$$ we simply compute $$$f(l, m) * f(m+1, r)$$$.
- EDIT: we can decrement l too! Just compute $$$f(l-1, m)$$$ and push to the front of the list. This doesn't break the amortized performance because we never decrement $$$m$$$.
That's it!
Note that this requires we never increment $$$r$$$, so it doesn't fit every problem.
Does this trick have a name? Is it "lore"? I haven't seen it before. But it's been a while since I've done a contest (hard to find time these days...) so maybe it's already well known. For those who haven't seen it, hopefully you find it interesting!
This trick is called "Implementation of queue using 2 stacks". General case problem: https://judge.yosupo.jp/problem/queue_operate_all_composite
That's a great way of thinking about it!
I've known about the two stack trick for a long time, but it's disguised enough here that it doesn't immediately come to mind.
Also see https://codeforces.me/blog/entry/122003
The summary is that this can be also made into deque by only moving half of the elements over every time.(it says minimum but it clearly works for any (associative) monoid). Beware of slower constant though.
Cool trick.
Left bound can be decremented too.
I think this would break the amortized O(1) cost because we would no longer have the guarantee that the recomputed intervals don't overlap.
The amortized cost of recomputation comes from the fact that $$$m$$$ strictly increases, and every time we recompute it takes us $$$O(f(m' - m))$$$ time where $$$m'$$$ is the new value of $$$m$$$. That would hold here too.
The total time complexity depends on (movements + recomputation + queries). Recomputation cost would remain the same, only cost of movements would be affected.
I think you're right, it's the $$$(m, r)$$$ interval that is key here, not $$$(l, r)$$$. We never decrement $$$m$$$ so we can change $$$l$$$ however we want.
I'll update my post.
Auto comment: topic has been updated by Svlad_Cjelli (previous revision, new revision, compare).
I would add to this with the following boring but simple option: this is my preferred method:
If things you are adding are static, you can just build a (disjoint) sparse table and two pointer over it. If O(n log n) is not to your liking you can build one of those blocked O(n log n/B) and O(B) table, and say B=16. If your operation is fast the O(B) part is very similar to O(1) as it is dominated by memory read. You can even use further layers if the operations are pretty slow and you really don’t want O(B) (more layers = more constant factor, 3 layers should be plenty).
(It is stupid to do this in any onsite competition. This is an exclusively prewritten library thing )
I am not saying this trick isn’t useful, it definitely is, but it’s real value can be showcased when the above simple thing doesn’t work: you want even better constant, no prewritten code, online add, some extremely expensive monoid operations. The dumb sparse table method, well, it is dumb and requires far less thinking and checking for conditions.
In Codeforces EDU, pashka talked about this using 2 stacks as queue technique with two pointers in video Segments with Small Spread.
Related problems:
https://codeforces.me/edu/course/2/lesson/9/2/practice/contest/307093/problem/F https://codeforces.me/edu/course/2/lesson/9/2/practice/contest/307093/problem/G https://codeforces.me/edu/course/2/lesson/9/3/practice/contest/307094/problem/I