There once was a man named Hugo Ryckeboer, and he came up with a neat way to do static fixed-length RMQ queries in $O(1)$.↵
↵
After playing around with it, I was able to extend it to do range queries of arbitrary associative operations in $O(1)$, with $O(N \log N)$ precomputation. This is the same time complexity achieved by sparse table with idempotent operations, but my data structure reaches this bound for any associative operation.↵
↵
EDIT: this data structure is called "disjoint sparse table"↵
↵
Let's go through some background first.↵
↵
## Prefix sums↵
↵
Take some array A and compute prefix sums in array P.↵
↵
A = [ 1, 3, 6, 5, 2, 7, 1, 4]↵
P = [ 1, 4, 10, 15, 17, 24, 25, 29]↵
↵
Now we can compute range sum queries in $O(1)$ by substracting a prefix from another.↵
↵
For example, query `[2, 7)` is $25 - 4 = 21$.↵
↵
<spoiler summary="implementation details">↵
In real implementation I add leading zero for convenience:↵
↵
P = [ 0, 1, 4, 10, 15, 17, 24, 25, 29]↵
↵
Then we can compute the sum of `A[i]+...+A[j-1]` in $O(1)$ by `P[j]-P[i]`↵
</spoiler>↵
↵
Very neat, but we need an inverse operation (substracting, in this case).↵
↵
## Postfix-Prefix Sums↵
↵
Instead, take the array A and compute a suffix sums array S for the first half and a prefix sums array P for the second↵
↵
A = [ 1, 3, 6, 5, 2, 7, 1, 4]↵
S,P = [15, 14, 11, 5] [2, 9, 10, 14]↵
↵
Now we can compute queries by adding a suffix of the first half and a prefix of the second half.↵
↵
For example, query `[2, 7)` is $11 + 10 = 21$.↵
↵
<spoiler summary="implementation details">↵
In a real implementation I would add leading zeros and also reverse the suffix sums:↵
↵
S = [0, 5, 11, 14, 15]↵
P = [0, 2, 9, 10, 14]↵
↵
Now, if $i \leq \frac{n}{2} \leq j$, we can compute the sum of `A[i]+...+A[j-1]` in $O(1)$ by doing `S[n/2-i] + P[j-n/2]`↵
</spoiler>↵
↵
Very cool, but queries must cross the halfway point of the array.↵
↵
## Hugo Trick↵
↵
Instead, take the array A, cut it up into intervals of length k, and compute the prefix and suffix sums of each one.↵
↵
k = 3↵
A = [ 1, 3, 6, 5, 2, 7, 1, 4]↵
P = [[ 1, 4, 10][ 5, 7, 14][ 1, 5]]↵
S = [[10, 9, 6][14, 9, 7][ 5, 4]]↵
↵
Now we can answer any query that crosses from one interval to the next by adding a suffix of the first interval with a prefix of the second↵
↵
In particular, notice that we can compute all queries of length k using that idea.↵
↵
For example, we can now answer `[5, 8)`, whereas before we could not, as it doesn't cross the halfway point of the array.↵
↵
<spoiler summary="implementation details">↵
Again, in the real implementation I add leading zeros and reverse the suffix sums.↵
↵
P = [[ 0, 1, 4, 10][ 0, 5, 7, 14][ 0, 1, 5]]↵
S = [[ 0, 6, 9, 10][ 0, 7, 8, 14][ 0, 4, 5]]↵
↵
Now, if $j-i = k$, we can find the right interval by taking `l=floor(j/k)` and `cut=l*k` then the sum `A[i]+...+A[j-1]` can be computed in $O(1)$ by doing `S[l-1][cut-i]+P[l][j-cut]`↵
↵
</spoiler>↵
↵
And that is the entire trick that I was taught years ago. In my country we call it "DP Hugo" or "Hugo Trick".↵
↵
It is vVery rad, but it is limited to fixed-length queries.↵
↵
## Enhanced Hugo Trick↵
↵
Let's add the capability for some different-length queries. Take the array A, and cut it up into overlapping intervals of length 2k, spaced by k positions. Then, compute prefix and suffix sums for each one.↵
↵
k = 3↵
A = [ 1, 3, 6, 5, 2, 7, 1, 4]↵
P = [[ 1, 4, 10, 15, 17, 24][ 1, 5]]↵
[ 1, 4, 10][ 5, 7, 14, 15, 19]↵
S = [[24, 23, 20, 14, 9, 7][ 5, 4]]↵
[10, 9, 6][19, 14, 12, 5, 4]↵
↵
Now, we can still answer any query that crosses from one interval to another, but that now includes all intervals of length between k and 2k (inclusive).↵
↵
<spoiler summary="implementation details">↵
Again, in the real implementation I add leading zeros and reverse the suffix sums.↵
↵
A this point, writing the example is getting too tedious, so I won't do it.↵
↵
Keep in mind that now the intervals `l` and `l-1` are overlapping, so we actually have to use `l` and `l-2`. Also, the calculation of `l` is a bit different, but I put some example code at the end, and I did it in a different way that side-steps this problem quite nicely.↵
</spoiler>↵
↵
It is vVery cute, but still limited to some specific lengths of queries.↵
↵
## Multilevel Hugo Trick (Disjoint Sparse Table)↵
↵
Instead, create $\log N$ instances of the data structure described above, with k=1,2,4,8,16,..., then each one can handle queries lengths from one power of two to the next.↵
↵
Now, given some query, you can take the logarithm of its length to know which instance will be able to handle it. The logarithm can be taken in $O(1)$ using some bit manipulation. Then, we can answer the queries in a single operation.↵
↵
## Conclusion↵
↵
The examples use range-sum-queries, but this will actually work for any associative operation (I also use identity element to make implementation easier, but it is not strictly required).↵
↵
Therefore, we can answer arbitrary range queries in $O(1)$. More precisely, we can answer range queries with exactly one call to the operation per query.↵
↵
code:↵
↵
↵
~~~~~↵
#define OPER add // associative operation↵
#define IDEN 0 // identity element↵
↵
int add(int a, int b) { return a + b; }↵
↵
struct hugo_block {↵
vector<int> left, right;↵
int cut;↵
↵
hugo_block(vector<int>& data, int k, int _cut) : cut{_cut} {↵
right.push_back(IDEN);↵
for (int i = 0; i < 2*k && cut+i < sz(data); ++i)↵
right.push_back(OPER(right.back(), data[cut+i]));↵
↵
left.push_back(IDEN);↵
for (int i = 0; i < 2*k && cut-i-1 >= 0; ++i)↵
left.push_back(OPER(data[cut-i-1], left.back()));↵
}↵
↵
int query(int l, int r) {↵
return OPER(left[cut-l], right[r-cut]);↵
}↵
};↵
↵
struct hugo {↵
vector<hugo_block> blocks;↵
int k;↵
↵
hugo(vector<int>& data, int _k) : k{_k} {↵
for (int b = 0; b*k <= sz(data); ++b) {↵
blocks.emplace_back(data, k, b*k);↵
}↵
}↵
↵
int query(int l, int r) {↵
return blocks[r/k].query(l, r);↵
}↵
};↵
↵
struct DisjointSparseTable {↵
vector<hugo> hugos;↵
int levels;↵
↵
DisjointSparseTable(vector<int>& data, int _levels) : levels{_levels} {↵
for (int level = 0; level < levels; ++level) {↵
hugos.emplace_back(data, 1<<level);↵
}↵
}↵
↵
int query(int l, int r) {↵
if (r == l) return IDEN;↵
int level = bit_width(unsigned(r-l))-1;↵
return hugos[level].query(l, r);↵
}↵
};↵
~~~~~↵
↵
<spoiler summary="A Nicer way?">↵
↵
A perhaps nicer way do this is in divide-and-conquer style.↵
↵
To build a data structure that can answer any range query with a single call to the operation, you must:↵
↵
- Build a data structure that can answer range queries that are fully contained in the first half↵
- Build another one that can answer queries contained in the second half.↵
- Build a Hugo DP over the entire array to answer queries that cross from one half to the other.↵
↵
This works, but finding the correct Hugo DP to do the query requires a $O(\log N)$ tree traversal, as far as I can tell.↵
↵
EDIT: You can actually find the right node in the tree in O(1) using some bit manipulation — It's essentially computing LCA on a complete binary tree, which can be done using the longest common prefix of two root-to-leaf paths, which can be computed using `CLZ(path_u ^ path_v)` on bitmasks. This is explained more clearly on one of the links in the comments.↵
↵
</spoiler>↵
↵
↵
After playing around with it, I was able to extend it to do range queries of arbitrary associative operations in $O(1)$, with $O(N \log N)$ precomputation. This is the same time complexity achieved by sparse table with idempotent operations, but my data structure reaches this bound for any associative operation.↵
↵
EDIT: this data structure is called "disjoint sparse table"↵
↵
Let's go through some background first.↵
↵
## Prefix sums↵
↵
Take some array A and compute prefix sums in array P.↵
↵
A = [ 1, 3, 6, 5, 2, 7, 1, 4]↵
P = [ 1, 4, 10, 15, 17, 24, 25, 29]↵
↵
Now we can compute range sum queries in $O(1)$ by substracting a prefix from another.↵
↵
For example, query `[2, 7)` is $25 - 4 = 21$.↵
↵
<spoiler summary="implementation details">↵
In real implementation I add leading zero for convenience:↵
↵
P = [ 0, 1, 4, 10, 15, 17, 24, 25, 29]↵
↵
Then we can compute the sum of `A[i]+...+A[j-1]` in $O(1)$ by `P[j]-P[i]`↵
</spoiler>↵
↵
Very neat, but we need an inverse operation (substracting, in this case).↵
↵
## Postfix-Prefix Sums↵
↵
Instead, take the array A and compute a suffix sums array S for the first half and a prefix sums array P for the second↵
↵
A = [ 1, 3, 6, 5, 2, 7, 1, 4]↵
S,P = [15, 14, 11, 5] [2, 9, 10, 14]↵
↵
Now we can compute queries by adding a suffix of the first half and a prefix of the second half.↵
↵
For example, query `[2, 7)` is $11 + 10 = 21$.↵
↵
<spoiler summary="implementation details">↵
In a real implementation I would add leading zeros and also reverse the suffix sums:↵
↵
S = [0, 5, 11, 14, 15]↵
P = [0, 2, 9, 10, 14]↵
↵
Now, if $i \leq \frac{n}{2} \leq j$, we can compute the sum of `A[i]+...+A[j-1]` in $O(1)$ by doing `S[n/2-i] + P[j-n/2]`↵
</spoiler>↵
↵
Very cool, but queries must cross the halfway point of the array.↵
↵
## Hugo Trick↵
↵
Instead, take the array A, cut it up into intervals of length k, and compute the prefix and suffix sums of each one.↵
↵
k = 3↵
A = [ 1, 3, 6, 5, 2, 7, 1, 4]↵
P = [[ 1, 4, 10][ 5, 7, 14][ 1, 5]]↵
S = [[10, 9, 6][14, 9, 7][ 5, 4]]↵
↵
Now we can answer any query that crosses from one interval to the next by adding a suffix of the first interval with a prefix of the second↵
↵
In particular, notice that we can compute all queries of length k using that idea.↵
↵
For example, we can now answer `[5, 8)`, whereas before we could not, as it doesn't cross the halfway point of the array.↵
↵
<spoiler summary="implementation details">↵
Again, in the real implementation I add leading zeros and reverse the suffix sums.↵
↵
P = [[ 0, 1, 4, 10][ 0, 5, 7, 14][ 0, 1, 5]]↵
S = [[ 0, 6, 9, 10][ 0, 7, 8, 14][ 0, 4, 5]]↵
↵
Now, if $j-i = k$, we can find the right interval by taking `l=floor(j/k)` and `cut=l*k` then the sum `A[i]+...+A[j-1]` can be computed in $O(1)$ by doing `S[l-1][cut-i]+P[l][j-cut]`↵
↵
</spoiler>↵
↵
And that is the entire trick that I was taught years ago. In my country we call it "DP Hugo" or "Hugo Trick".↵
↵
↵
## Enhanced Hugo Trick↵
↵
Let's add the capability for some different-length queries. Take the array A, and cut it up into overlapping intervals of length 2k, spaced by k positions. Then, compute prefix and suffix sums for each one.↵
↵
k = 3↵
A = [ 1, 3, 6, 5, 2, 7, 1, 4]↵
P = [[ 1, 4, 10, 15, 17, 24][ 1, 5]]↵
[ 1, 4, 10][ 5, 7, 14, 15, 19]↵
S = [[24, 23, 20, 14, 9, 7][ 5, 4]]↵
[10, 9, 6][19, 14, 12, 5, 4]↵
↵
Now, we can still answer any query that crosses from one interval to another, but that now includes all intervals of length between k and 2k (inclusive).↵
↵
<spoiler summary="implementation details">↵
Again, in the real implementation I add leading zeros and reverse the suffix sums.↵
↵
A this point, writing the example is getting too tedious, so I won't do it.↵
↵
Keep in mind that now the intervals `l` and `l-1` are overlapping, so we actually have to use `l` and `l-2`. Also, the calculation of `l` is a bit different, but I put some example code at the end, and I did it in a different way that side-steps this problem quite nicely.↵
</spoiler>↵
↵
↵
## Multilevel Hugo Trick (Disjoint Sparse Table)↵
↵
Instead, create $\log N$ instances of the data structure described above, with k=1,2,4,8,16,..., then each one can handle queries lengths from one power of two to the next.↵
↵
Now, given some query, you can take the logarithm of its length to know which instance will be able to handle it. The logarithm can be taken in $O(1)$ using some bit manipulation. Then, we can answer the queries in a single operation.↵
↵
## Conclusion↵
↵
The examples use range-sum-queries, but this will actually work for any associative operation (I also use identity element to make implementation easier, but it is not strictly required).↵
↵
Therefore, we can answer arbitrary range queries in $O(1)$. More precisely, we can answer range queries with exactly one call to the operation per query.↵
↵
code:↵
↵
↵
~~~~~↵
#define OPER add // associative operation↵
#define IDEN 0 // identity element↵
↵
int add(int a, int b) { return a + b; }↵
↵
struct hugo_block {↵
vector<int> left, right;↵
int cut;↵
↵
hugo_block(vector<int>& data, int k, int _cut) : cut{_cut} {↵
right.push_back(IDEN);↵
for (int i = 0; i < 2*k && cut+i < sz(data); ++i)↵
right.push_back(OPER(right.back(), data[cut+i]));↵
↵
left.push_back(IDEN);↵
for (int i = 0; i < 2*k && cut-i-1 >= 0; ++i)↵
left.push_back(OPER(data[cut-i-1], left.back()));↵
}↵
↵
int query(int l, int r) {↵
return OPER(left[cut-l], right[r-cut]);↵
}↵
};↵
↵
struct hugo {↵
vector<hugo_block> blocks;↵
int k;↵
↵
hugo(vector<int>& data, int _k) : k{_k} {↵
for (int b = 0; b*k <= sz(data); ++b) {↵
blocks.emplace_back(data, k, b*k);↵
}↵
}↵
↵
int query(int l, int r) {↵
return blocks[r/k].query(l, r);↵
}↵
};↵
↵
struct DisjointSparseTable {↵
vector<hugo> hugos;↵
int levels;↵
↵
DisjointSparseTable(vector<int>& data, int _levels) : levels{_levels} {↵
for (int level = 0; level < levels; ++level) {↵
hugos.emplace_back(data, 1<<level);↵
}↵
}↵
↵
int query(int l, int r) {↵
if (r == l) return IDEN;↵
int level = bit_width(unsigned(r-l))-1;↵
return hugos[level].query(l, r);↵
}↵
};↵
~~~~~↵
↵
<spoiler summary="A Nicer way?">↵
↵
A perhaps nicer way do this is in divide-and-conquer style.↵
↵
To build a data structure that can answer any range query with a single call to the operation, you must:↵
↵
- Build a data structure that can answer range queries that are fully contained in the first half↵
- Build another one that can answer queries contained in the second half.↵
- Build a Hugo DP over the entire array to answer queries that cross from one half to the other.↵
↵
This works, but finding the correct Hugo DP to do the query requires a $O(\log N)$ tree traversal, as far as I can tell.↵
↵
EDIT: You can actually find the right node in the tree in O(1) using some bit manipulation — It's essentially computing LCA on a complete binary tree, which can be done using the longest common prefix of two root-to-leaf paths, which can be computed using `CLZ(path_u ^ path_v)` on bitmasks. This is explained more clearly on one of the links in the comments.↵
↵
</spoiler>↵
↵