Blog entries - Codeforces

#	User	Rating
1	jiangly	3846
2	tourist	3799
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3590
6	Ormlis	3533
7	Benq	3468
8	Radewoosh	3463
9	ecnerwala	3451
9	Um_nik	3451

#	User	Contrib.
1	cry	165
2	-is-this-fft-	161
3	Qingyu	160
4	atcoder_official	157
5	Dominater069	156
6	adamant	154
7	Um_nik	151
7	djm03178	151
9	luogu_official	149
10	awoo	147

steveonalex's blog

My Solution to IOI17 P6 — Ancient Books

By steveonalex, history, 6 weeks ago, In English

The official solution isn't exactly the clearest, so I guess I will present own solution to this problem.

HackMD blog if you prefer it: HackMD: My Solution to IOI17 P6 — Ancient Books

1. Abridged Statement:

Given $$$n$$$ tables, each table has a book. The table is numbered from $$$0$$$ to $$$n-1$$$, and each book has a distinct value from $$$0$$$ to $$$n-1$$$ (let's denote the value of the book on the $$$i^{th}$$$ table as $$$p_i$$$). Initially, lil Johnson starts at table number $$$s$$$ ($$$0 \leq s < n$$$), and is not holding any book.

Lil Johnson has four possible operations:

If lil Johnson isn't holding a book, and there is a book in the current table, then he can pick it up.
If lil Johnson is holding a book, and there is no book in the current table, then he can put that book down.
If lil Johnson is holding a book, and there is also a book in the current table, then he can replace the book.
Lil Johnson go to the left or to the right i.e. if he is current at position $$$s$$$, then he can either go to table number $$$s-1$$$ or $$$s+1$$$ (obviously, he has to maintain $$$0 \leq s < n$$$ at all time.

The first three operations can be done instantaneously, and the fourth operation will take 1 second. What is the minimum time lil Johnson need to sort the array $$$p$$$ and come back to where he start?

Example: $$$p = [1, 0], s = 0$$$. Here, Lil Johnson picks up the book at table 0, moves to table 1 to replace the book, returns to table 0, and places the book down. The total time is 2 seconds.

Constraint:

$$$p$$$ is a permutation consisting of distinct integers from $$$0$$$ to $$$n-1$$$.
$$$n \leq 10^6$$$, $$$0 \leq s < n$$$.
Time limit: 2 seconds. That means the complexity must be $$$O(n)$$$ or $$$O(n * log(n))$$$, or more if you are an optimization connoisseur.

2. The lower bound of the answer

For the sake of simplicity, we will assume that $$$p_0 \neq 0$$$, and $$$p_{n-1} \neq n-1$$$.

You can imagine that each time we do the operation (4), we are moving at most one book closer to its intended destination. Therefore, the answer would be at least $$$\sum^{n-1}_{i = 0}|p_i - i|$$$.

So can we always achieve this lower bound? No, but it is helpful to know when can this be achieved.

To make my life easier, just interpret "wasted time" as any additional time over the lower bound of the answer. For example, if the answer is 420 while the lower bound is 400, then we call those 20 seconds "wasted time".

One obvious case is that if we draw the edge $$$i \rightarrow p_i$$$, it forms a cycle (or circle for uncultured peasants). That is because you can bring the book at table number $$$s$$$ to table number $$$p_s$$$, then the book at table number $$$p_s$$$ to the table number $$$p_{p_s}$$$, and so on. This gives us an insight: Just sort the cycles, and if you can conveniently switch cycle while you are at it, without incurring any cost, then great, just do it!

Figure 1: How would the cycle looks like.

Another case is that you would come across another cycle when you are sorting the current cycle. Then, you can jump to that cycle, then get back later.

Figure 2: How would the "intersecting" cycles would looks like. As you can see, when lil Johnson was on the $$$[0, 2]$$$ cycle, he can actually hop on the $$$[1, 3]$$$ cycle and sort that cycle, then come back. The operation would look like this.

Pick up $$$p_0$$$ book.
Go to table number $$$1$$$ and replace the book. Note that we are switching to cycle $$$[1, 3]$$$ here.
Go to table number $$$3$$$, replace the book, then go back to table number $$$1$$$ and replace the book. Now we are back to the $$$[0, 2]$$$ cycle.
Go to table number $$$2$$$, replace the book, then go back to $$$0$$$ and place the book down. The entire thing cost $$$8$$$ seconds.

So, we know that we do not have to waste anytime switching to a different cycle, as long as we encounter a vertex of that cycle while traversing.

Therefore, our algorithm for checking whether the answer reached our lower bound would look something like this:

// get the maximum and minimum element of each cycle
vector<pair<int,int>> range(n, {-1, -1});
for(int i = 0; i < n; ++i){
    if (range[i].first != -1) continue;
    pair<int,int> mi = {i, i};
    int j = p[i];
    while(j != i){
        // maximize_range means that it takes in two range, then minimize the left boundary and maximize the right boundary.
        maximize_range(mi, make_pair(j, j));
        j = p[j];
    }
    do{
        range[j] = mi;
        j = p[j];
    } while(j != i);
}

// check if as we are sorting the current cycle, whether we can go outside
pair<int, int> cur = {s, s};
pair<int, int> target = range[s];
while(cur.first > target.first || cur.second < target.second){
    if (cur.first > target.first){
        maximize_range(target, range[--cur.first]);
    }
    else{
        maximize_range(target, range[++cur.second]);
    }
}

if (cur.first == 0 && cur.second == n-1) cout << "Yes\n":
else cout << "No\n":

Explaination of the algorithm:

Consider the "0 wasted movement" range, which is initially $$$[s, s]$$$. As we can reach every vertex inside this range, we also have access to every cycle that each vertex in the range belongs to i.e. we can switch to these cycles while we are sorting the current cycle, without incurring additional time.
Since we can jump to those cycles, we can travel to the minimum and maximum elements of the cycles that these vertices belongs to as well. Just imagine that our range "eat" every cycles of each vertices, which makes it bigger, which allows it to "eat" more. The process ends when it cannot expand any more.

3. Snipe some easy subtask.

From the above code, we can determine how far to the left or right Lil Johnson can move without incurring additional wasted time.

Let's call a range "stretched" if it cannot be expanded further without incurring additional wasted time. For example, $$$p = [2, 3, 4, 1, 0, 5]$$$. The range $$$[1, 1]$$$ is not stretched, because Lil Johnson can expand the range to $$$[1, 3]$$$. Similarly, $$$[1, 3]$$$ is not stretched, because we can expand it to $$$[0, 4]$$$, but $$$[0, 4]$$$ and $$$[5, 5]$$$ are stretched.

When Lil Johnson arrives at the border of the "travellable" range, he can either move left or right and extend the current range at the cost of 2 seconds of wasted walking time (you have to go back too). That is, if the current range is $$$[l, r]$$$, then he can expand it to $$$[l-1, r]$$$ or $$$[l, r+1]$$$.

Let not forget what our goal is. We need to sort the permutation while incurring the fewest cost, which means we need to expand our "travellable" range to $$$[0, n-1]$$$, while ensuring the cost is minimized.

This let us arrive at a range DP solution, featuring three operations: expand to the left, expand to the right, and stretch the current range:

ll dp[n][n]; memset(dp, 63, sizeof dp);
dp[s][s] = 0;
for(int i = n-1; i >= 0; --i) 
    for(int j = i; j <= n; ++j){
        // calculate the stretched range, I omitted the detail of the function for brevity.
        pair<int, int> stretched = stretch_range(make_pair(i, j));
        minimize(dp[stretched.first][stretched.second], dp[i][j]);
        
        if (i > 0) minimize(dp[i-1][j], dp[i][j] + 2);
        if (j + 1 < n) minimize(dp[i][j+1], dp[i][j] + 2);
    }   
}

cout << dp[0][n-1] + ans_lower_bound << "\n";

Note that the number of distinct states in our dp table is $$$(s+1) * (n - s)$$$, therefore, this would easily solve the $$$s = 0, n \leq 10^6$$$ and the $$$n \leq 10^3$$$ subtasks. That is an easy 70 points right there.

4. The last subtask (the only hard one)

Let's call the current range (stretched) we are working with $$$[l_0, r_0]$$$. Let us also assume there exists a cycle whose span $$$[x_1, x_2]$$$ strictly contains the current range, i.e., ($$$x_1 < l_0 \leq r_0 < x_2$$$).

Claim: Consider any "stretched" range $$$[l_x, r_x]$$$, such that it contains $$$[l_0, r_0]$$$, and it also contains a cycle whose span strictly contains $$$[l_0, r_0]$$$. Let's call them "beautiful". Now consider the smallest of them, let's call it $$$[l_1, r_1]$$$. We claim that regardless of which direction you expand $$$[l_0, r_0]$$$, $$$[l_1, r_1]$$$ will be the first "beautiful" range you reach.

Proof:

This means we just need to calculate the fastest way to get to $$$[l_1, r_1]$$$. Observe that since they are the first range in the way that strictly contains $$$[l_0, r_0]$$$, that means that expanding left will not make expanding right faster, and vice versa.

Therefore, it is best to expand in 1 direction until we encounter $$$[l_1, r_1]$$$. So, we can just simulate the process and choose which direction incurs the fewest costs.

If no cycle strictly contains the current range, then the range can only expand independently to the left or right. In this scenario, we calculate the cost of moving entirely to the left and to the right separately, then sum these costs to determine the total time.

Time complexity: $$$O(n)$$$ or $$$O(n * log(n))$$$, depending on the implementation.

My code

Full text and comments »

steveonalex
6 weeks ago
0

Convex Hull Trick, but slightly cooler?

By steveonalex, 4 months ago, In English

I didn't see any documentary and blog on this topic, so I guess it's Codeforce blog time .

Blog in HackMD if you prefer it

Prerequisite:

Convex Hull Trick.

1) The OG Convex Hull Trick:

This blog is not a tutorial to Convex Hull Trick, so I would assume that you guys already know about it before reading this. However, I'll briefly recap for the sake of completeness.

Problem Statement: Given $$$n$$$ lines of the form $$$y = a_i * x + b_i$$$ and $$$q$$$ random queries $$$x_j$$$. For each query, calculate the value $$$max(a_1*x_j + b_1, a_2*x_j + b_2, ..., a_n*x_j + b_n)$$$.

Constraint: $$$n, q \leq 3*10^5$$$, $$$a_i, b_i, x_j \leq 10^9$$$.
Time limit: 1s

This is a well-known problem, and you can find tutorial for this basically everywhere. In short, we will sort all the lines in ascending order of their slope, and remove all of the redundant lines, as shown in my beautiful painting below (A line is redundant if it lies entirely "below" its two adjacent lines). This is solvable in $$$O(n)$$$ using stacks and some geometry.

Figure 1: How a redundant line looks.

From this, we observe that the resulting function is convex (since slopes are sorted). Each slope is optimal for one continuous segment, starting from and ending with its intersection with its two adjacent lines in the hull.

Figure 2: How the convex hull should looks, alongside its intersections.

Once the convex hull is constructed, the problem is basically just binary searching over the intersections of the convex hull to find the optimal line for queried point $$$x_j$$$.

Sample Code

struct CHT{ // casual CHT
    #define Node pair<ll, ll>
    vector<Node> hull, suboptimal;
    vector<double> inter;
 
    double getInter(Node a, Node b){return (double) (b.second - a.second) / (a.first - b.first);}
    ll f(Node s, ll x){return s.first * x + s.second;}
 
    void add(Node x){
        if (hull.empty()) {hull.push_back(x); return;}
        if (hull.back().first == x.first){
            suboptimal.push_back(hull.back());
            hull.pop_back();
            if (inter.size()) inter.pop_back();
        }
        while(hull.size() >= 2){
            double x1 = getInter(hull.back(), x), x2 = inter.back();
            if (x1 >= x2) break;
            suboptimal.push_back(hull.back());
            hull.pop_back(); inter.pop_back();
        }
        if (hull.size()) inter.push_back(getInter(hull.back(), x));
        hull.push_back(x);
    }
 
    ll get_val(ll x){
        if (size() == 0) return -1e18;
        int idx = lower_bound(ALL(inter), x) - inter.begin();
        return f(hull[idx], x);
    }
 
    int size(){return hull.size();}
};

Time complexity: $$$O(n * log_2(n))$$$ for the preprocessing, and $$$O(log_2(n))$$$ for each query.

2) Extended CHT problem:

Problem Statement: Given $$$k$$$ and $$$n$$$ lines of the form $$$y = a_i * x + b_i$$$ and $$$q$$$ random queries of the form $$$x_j$$$. First, we denote $$$c_i = a_i * x_j + b_i$$$. For each query $$$x_j$$$, find the $$$k$$$ largest values of the array $$$c$$$.

Constraint: $$$n, q \leq 3*10^5$$$, $$$k \leq 10$$$, $$$a_i, b_i, x_j \leq 10^9$$$.
Time limit: 5s.

Since the lines in the convex hull are sorted by slope, we observe that the further a line from the queried point, the less relevant it is.

But why is that? Let's consider two adjacent lines $$$(b)$$$ and $$$(c)$$$ to the left of $$$x_j$$$ ($$$(c)$$$ is further from $$$x_j$$$). These two lines intersect at the same point, but because $$$(b)$$$ slope is greater than $$$(c)$$$, the value of $$$(b)$$$ at $$$x_j$$$ ends up being greater.

Figure 3: Illustration of the lines $$$(b)$$$ and $$$(c)$$$ to the left of $$$x_j$$$, and how $$$(b)$$$ is more relevant than $$$(c)$$$.

Thus, we only need to focus on the $$$k$$$ nearest lines from $$$x_j$$$, both to the left and to the right.

Figure 4: How the algorithm may work.

However, there is a flaw to this approach. For example, for $$$k = 2$$$, what if the "redundant line" is actually the second largest line?

Figure 5: How the second largest line might not be on the convex hull.

There is an easy fix! We will keep track of all of the "redundant lines" from our first run of constructing the CHT data structure, and we will use these lines to make a second CHT. So for the previous example, it would look like this.

Figure 6: How the 2-layer CHT would look like.

Then we will do the same thing for the second CHT i.e. brute forcing through the $$$k$$$ nearest lines from the queried point, both to the left and to the right.

Extending to the general case was pretty simple. We can just make $$$k$$$ CHT, with each one using all of the redundant lines from the previous CHT. We know that this is optimal, because on each layer, we only need to go to the left and right at most $$$k$$$ times, and we only need to dive down at most $$$k$$$ layers (Anything on the $$$k+1^{th}$$$ layer is just not needed, since all of the lines on the previous layers are better).

Sample Code


// The implementation is not optimized. I left it as it is for readability.
    
struct CHT{ // casual CHT
    #define Node pair<ll, ll>
    vector<Node> hull, suboptimal;
    vector<double> inter;
 
    double getInter(Node a, Node b){return (double) (b.second - a.second) / (a.first - b.first);}
    ll f(Node s, ll x){return s.first * x + s.second;}
 
    void add(Node x){
        if (hull.empty()) {hull.push_back(x); return;}
        if (hull.back().first == x.first){
            suboptimal.push_back(hull.back());
            hull.pop_back();
            if (inter.size()) inter.pop_back();
        }
        while(hull.size() >= 2){
            double x1 = getInter(hull.back(), x), x2 = inter.back();
            if (x1 >= x2) break;
            suboptimal.push_back(hull.back());
            hull.pop_back(); inter.pop_back();
        }
        if (hull.size()) inter.push_back(getInter(hull.back(), x));
        hull.push_back(x);
    }

    void get_val(ll x, int r, vector<ll> &ans){
        int idx = lower_bound(ALL(inter), x) - inter.begin();
        for(int i = idx - r; i <= idx + r; ++i) if (i >= 0 && i < size()){
            ans.push_back(f(hull[i], x));
        }  
    }

    int size(){return hull.size();}
};


struct CHT_extended{ // mlg super pro vip CHT
    #define Node pair<ll, ll>
    int k;
    vector<CHT> hull_max;

    CHT_extended(int k): k(k){
        for(int i = 0; i < k; ++i) hull_max.push_back(CHT());
    }

    void add(vector<Node> a){
        for(int i = 0; i < k; ++i){ // use the suboptimal lines from the previous CHT run
            sort(ALL(a));
            for(Node j: a) hull_max[i].add(j);
            a = hull_max[i].suboptimal; hull_max[i].suboptimal.clear();
        }
    }

    vector<ll> get(ll x){
        vector<ll> ans; ans.reserve(k * k);
        for(int i = 0; i < k; ++i){ // get O(k^2) lines
            hull_max[i].get_val(x, k - 1, ans);
        }
        if (ans.size() > k) 
            nth_element(ans.begin(), ans.begin() + k, ans.end(), greater<ll>());
        while(ans.size() > k) ans.pop_back();
        return ans; // spit back k best lines in arbitrary order. You can call sort if you want.
    }
};

Time complexity: $$$O(n * k * log_2(n))$$$ for the preprocessing, and $$$O(k^2 + k*log_2(n))$$$ for each query.

The complexity in both the preprocessing and querying could be further optimized, but I'll leave it as an exercise for readers.

Are there any problem that feature this algorithm? Well uhh... I don't know, this is like mythical stuff that you will probably never encounter all your life. But now you have :))

Full text and comments »

geometry, convex hull trick, data structure

steveonalex
4 months ago
5

My solution to 2003F — Turtle and Three Sequences

By steveonalex, 4 months ago, In English

I know this round is from like 2-3 months ago but screw that, let's dig it up because I just solved it recently. Also I cannot see any people who are doing the same thing as me.

1) The problem statement:

Problem link: 2003F — Turtle and Three Sequences

Given two positive integers $$$n$$$, $$$m$$$ ($$$n \geq m$$$), and three sequences $$$a$$$, $$$b$$$, $$$c$$$ of size $$$n$$$. Find a sequence of length $$$m$$$ $$$p_1, p_2, ..., p_m$$$, such that $$$p_1 < p_2 < ... < p_m$$$, and $$$a_{p_1} \leq a_{p_2} \leq ... \leq a_{p_m}$$$, and $$$b_{p_i} \neq b_{p_j}$$$, $$$\forall i \neq j$$$.

Constraint:

$$$n \leq 3000$$$, $$$m \leq 5$$$.
$$$1 \leq a_i, b_i \leq n$$$, $$$1 \leq c_i \leq 10^4$$$.
Time Limit: 3 seconds.

You can read the intended solution in this link: Editorial.

Spoiler:

I would recommend you to try the problem out before scrolling further (or just don't, because the problem is rated 2800, so if you are a fellow blue hardstuck-er then you don't really stand much of a chance anyway).

2) My solution:

Prerequisite:

Fenwick Tree.
Constant optimization skill.

It would be best if we start from something manageable first. $$$m \leq 2$$$ is pretty simple, you literally just have to brute, it literally cannot get any easier than this, so let's move on.

For $$$m = 3$$$, it is more challenging, but you can solve it by iterating through all pair $$$(i, j)$$$ $$$(i < j)$$$, and for each pair, find the best index $$$k$$$ to the left of $$$j$$$ in $$$O(n)$$$, so the total complexity is $$$O(n^3)$$$. To optimize this, observe that we don't really need to keep track of that many $$$k$$$. We only need to maintain an array $$$suff_j$$$, containing up to $$$m$$$ candidate tuples $$$(a_k, b_k, c_k)$$$ with the largest $$$c_k$$$, such that $$$k > j$$$, $$$a_k \geq a_j$$$, and all $$$b_k$$$ are distinct (because at worst, we only have to skip $$$m-1$$$ candidates of the $$$m$$$ tuples). We can precalculate $$$suff_j$$$ in $$$O(n^2)$$$, so the final complexity for $$$m = 3$$$ is $$$O(n^2*k)$$$.

$$$m = 4$$$ is pretty much the same thing, except you have to also maintain the array $$$pref_i$$$, which contains up to $$$m$$$ tuples $$$(a_k, b_k, c_k)$$$ with the largest $$$c_k$$$, such that $$$k < i$$$, $$$a_k \leq a_i$$$, and all $$$b_k$$$ are distinct, then iterate through all pair $$$(i, j)$$$ just like the previous algorithm and iterate through all the candidate tuples in $$$suff_j$$$ and $$$pref_i$$$. The complexity of this is $$$O(n^2*k^2)$$$.

$$$m = 5$$$ is pretty tough, however. Previously, calculating $$$m^{th}$$$ best candidate to the left of $$$i$$$ and to the right of $$$j$$$ is pretty easy, but how do we take it a step further? Iterating through all the possible pairs of candidates (not just candidates, pairs of candidates) to the left of $$$i$$$ or to the right of $$$j$$$ is pretty infeasible, so we only have one choice: maintaining the array $$$between_{i, j}$$$, containing up to $$$m$$$ tuples with the largest $$$c_k$$$, such that $$$i < k < j$$$, $$$a_i \leq a_k \leq a_j$$$, and all $$$b_k$$$ are distinct.

We can use data structures used to solve range maximum queries like Fenwick Tree to calculate this array. Here's how: for each $$$i$$$, iterate $$$j$$$ from $$$i+1$$$ to $$$n$$$. For each $$$j$$$, get $$$m$$$ candidate tuples such that $$$a_k \leq a_j$$$, then update the Fenwick Tree with the tuple $$$(a_j, b_j, c_j)$$$. Once you obtained the three arrays, you just do the same this as the previous subtask, except you also iterate through the $$$between_{i, j}$$$. The complexity of this is $$$O(n^2*k*(k^2+log(n))$$$.

As the complexity might suggest, you indeed have to go pretty crazy on the constant optimization here (probably anything other than Fenwick Tree won't even pass, who knows, my program runs in 2.93s, which is like one Minecraft tick away from getting TLE).

Can we extend this for $$$m = 6$$$? Nope, I think I've pushed this idea to its limit already. Which is scary to think about, because author set the constraint on $$$m$$$ precisely equal to $$$5$$$, while the intended solution can go beyond that. Does the author know about this solution, so they set $$$m = 5$$$? And if yes, why don't they just write this solution into the editorial? Guess we'll never know.

Full text and comments »

tutorial, awesome, informative, inspirational

steveonalex
4 months ago
5

[Tutorial] Divide and Conquer Offline Query — A Niche Way to solve Static Range Query

By steveonalex, 19 months ago, In English

Hi guys, this is my first blog on Codeforces. So if there were any mistakes or suggestions, feel free to correct me down in the comment section. Anyway, I discovered a nice way to solve static range query problems using "Divide and conquer", and I'm eager to share it with you guys.

Pre-requisites:
• Prefix Sum.

Problem 1:

Given an array $$$A$$$ of $$$N (N \leq 10^{5})$$$ integers, your task is to answer $$$q (q \leq 10^{5})$$$ queries in the form: what is the minimum value in the range $$$[l, r]$$$?

For now, let's forget about Segment Tree, Square Decomposition, Sparse Table and such. There's a simple way to solve this problem without any use of these fancy data structure.

First, let's start with $$$L_{0} = 1$$$, $$$R_{0} = n$$$, and $$$M_{0} = \left\lfloor { \frac{L_{0} + R_{0}}{2} } \right\rfloor$$$. Let's just assume that every query satisfy $$$L_{0} \leq l \leq M_{0} < r \leq R_{0}$$$. We maintain two prefix sum arrays:
• $$$X[i] = min(A[i], A[i+1], ..., A[M_{0}-1], A[M_{0}])$$$
• $$$Y[i] = min(A[M_{0}+1], A[M_{0}+2], ..., A[i-1], A[i])$$$

The answer to the query $$$ [ l_{0} , r_{0} ] $$$ is simply $$$min(X[l_{0}], Y[r_{0}])$$$. But what about those queries that doesn't satisfy the aforementioned condition? Well we can recursively do the same thing to $$$L_{1} = L_{0}, R_{1} = M_{0}$$$, and $$$L_{2} = M_{0} + 1, R_{2} = R_{0}$$$, hence the name "Divide and conquer". The recursive tree is $$$log N$$$ layers deep, each query exists in no more than $$$log N$$$ layers, and in each layer you do $$$O(N)$$$ operation. Therefore this algorithm runs in $$$O((N + q) * log N)$$$, and $$$O(N + q)$$$ memory.

So... Why on earth should I use it?

While this technique has practically the same complexity as Segment Tree, there is an interesting property: You only perform the "combine" operation once per query. Here's a basic example to show how this property can be exploited.

Problem 2:

Define the cost of a set the product of its elements. Given an array $$$A$$$ of $$$N (N \leq 2*10^{5})$$$ integers and a positive integer $$$k$$$ ($$$k \leq 20$$$). Define $$$g(l, r, k)$$$ as sum of cost of all subsets of size $$$k$$$ in the set {$$$ A[l], A[l+1], ..., A[r-1], A[r] $$$}. your task is to answer $$$q (q \leq 5 * 10^{5})$$$ queries in the form: what is $$$g(l, r, k)$$$ modulo $$$10^{9} + 69$$$?.

Naive Idea

Now I'll assume that you've read the naive idea above (you should read it). Notice how combining the value of two ranges runs in $$$O(k^2)$$$. However, if one of the two ranges has the length of $$$1$$$, then they can be combined in $$$O(k)$$$. This means a prefix-sum can be constructed in $$$O(N * k)$$$.
Why is this important? Let's not forget that in the naive Segment Tree idea, the bottle-neck is the convolution calculation, and we wish to reduce that in exchange for less expensive operations, which is what our aforementioned divide & conquer technique can help with, since you only do the expensive "combine" operation once per query. And besides, unlike Segment Tree, you can calculate the answer right away using the formula $$$\sum_{t=1}^k X[l][t] * Y[r][k - t]$$$

This will runs in $$$O(N * log N * k + q * (k + log N))$$$, and $$$O(N + q + k)$$$ memory, which is much better than the Segment Tree idea.

steveonalex's blog

1. Abridged Statement:

2. The lower bound of the answer

3. Snipe some easy subtask.

4. The last subtask (the only hard one)

1) The OG Convex Hull Trick:

2) Extended CHT problem:

1) The problem statement:

2) My solution:

Problem 1:

So... Why on earth should I use it?

Problem 2:

Related problem: