Attempting to Extend Segment Trees to Arbitrary Subsets

#	User	Rating
1	tourist	3856
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3462
9	ecnerwala	3451
10	heuristica	3431

#	User	Contrib.
1	cry	168
2	-is-this-fft-	162
3	Dominater069	160
4	Um_nik	159
5	atcoder_official	156
6	adamant	153
6	djm03178	153
8	luogu_official	149
9	awoo	147
10	TheScrasse	146

Throughout the last several weeks I have tried to invent, and tackle, a generalization of segment trees (although now it doesn't fit to name it "segment trees"... see for yourself). The results are very few (and not good, if you ask me), but the problem was very appealing and fun to think about.

The purpose of this blogpost is to share my experience with this problem, in order to strike curiousity in at least a few people, and also to let bright-minded users tackle this problem together, and possibly achieve better results. Since I want to share my experience, I will also put some possibly challenging parts under spoilers, if you want to attempt it yourself.

So, there is no tl;dr. Sorry :)

The Motivation and The Problem

As the title suggests, the motivation was from inspection of segment trees. We can define the following problem:

Given $$$n$$$ elements, initially $$$0$$$, and $$$m$$$ subsets $$$\forall i \in [m], S_i \subseteq [n]$$$ (where $$$[n] = \lbrace 0, \ldots, n-1 \rbrace$$$), we need to support the following:

Update $$$(i, x)$$$: add $$$x$$$ to $$$a_i$$$.
Query $$$v$$$: return $$$\sum_{i \in S_v} a_i$$$.

Segment trees define the $$$m = \binom{n+1}{2}$$$ subsets implicitly — as ranges. For some "magical" reason, when we have ranges, we can define $$$n' = O(n)$$$ elements, and subsets $$$S_i, T_i \subseteq [n']$$$, such that an update $$$(v, x)$$$ requires adding $$$x$$$ to all $$$a_j, j \in S_v$$$, and a query $$$v$$$ is equal to $$$\sum_{j \in T_v} a_j$$$ — but with the bonus that $$$|S_i|, |T_i|$$$ are all $$$O(\log n)$$$, so all operations can be done in $$$O(\log n)$$$.

The construction is just setting $$$n'$$$ as the number of nodes in the segment tree, setting $$$S_i$$$ for every element as the set of $$$O(\log n)$$$ ranges that include it, and setting $$$T_i$$$ for every range as the set of ranges that cover it.

I wanted to figure out whether there is a generalization that allows such optimizations for problems where we don't have ranges. Analysis of cases where the subsets are too many and implicit, is too difficult. So the problem is as above, where the subsets $$$S_i$$$ are all given explicitly, and then queries and updates are made. So the input size is $$$O(n + \sum |S_i|)$$$.

Interpreting as a Graph

We can define a similar problem, and see that it's equivalent (up to a constant): Given an undirected graph, each node has a value $$$a_i$$$, initially $$$0$$$. An update requires us to add to a node's value, and a query $$$v$$$ asks for the sum of values of all neighbors of $$$v$$$.

The reduction from this problem to the aforementioned problem is simple — just define every subset as the set of neighbors of a node. In the opposite direction, given subsets we can define a bipartite graph of size $$$n + m$$$, where each of the $$$m$$$ vertices on one side, has neighbors on the other side corresponding to each of the $$$m$$$ subsets.

I found that, by interpreting the problem as a graph, I was able to make progress.

Solving for a tree

Suppose the graph is a tree. Can we solve this problem efficiently? You can attempt this yourself before reading.

Found Complexity

Solution

That was awesome, let's try a general graph

Found Complexity

Solution

Let's analyze what happened here

The tree solution is significantly better than the general graph solution. Applying the general solution to a tree gives us $$$O(\sqrt{V})$$$ instead of $$$O(1)$$$. So what was this magic in trees that allowed us to cheat like this?

The magic boils down to relating vertices and their parents. Let's generalize this to any graph:

Let's direct every edge in our graph in some direction. Define for every vertex, the vertices to which it has an outgoing edge, as $$$T_v$$$. Think of it as $$$v$$$ "parents". The sum of neighbors of a vertex can be split to the sum of outgoing neighbors, and the sum of incoming neighbors. Now, the tree solution is generalizable:

Maintain $$$a_v$$$ for every vertex as usual, and maintain $$$b_v$$$ for every vertex, as the sum of its incoming neighbors. Then on an update $$$(v, x)$$$ we add $$$x$$$ to $$$a_v$$$, and to $$$b_u$$$ for all $$$u \in T_v$$$. On a query $$$v$$$, we compute $$$b_v$$$ + $$$\sum_{u\in T_v} a_u$$$.

The time complexity becomes $$$O(\max |T_i|)$$$. While this result doesn't improve our situation in dense graphs, it can be useful in different ones — for example, trees.

Minimizing $$$\max |T_i|$$$

Given a graph, let's say we want to find the minimum value of $$$\max |T_i|$$$ among all possible ways to direct the edges. Turns out that it is computable in polynomial time!

Hint

Solution

Okay, so while this is polynomial time, we would probably be happier with a better

Rev.	By	When	Δ	Comment
en17	Noam527	2022-06-12 23:51:32	0	(published)
en16	Noam527	2022-06-12 23:50:46	3484	Tiny change: 'nteresting right now, but I do' -> 'nteresting, but I do'
en15	Noam527	2022-06-12 22:50:21	4507	Tiny change: 'pressing $O \left( \f' -> 'pressing $\Omega \left( \f'
en14	Noam527	2022-06-09 22:25:20	124
en13	Noam527	2022-06-05 14:09:50	216	Tiny change: ' you want.' -> ' you want.\n\n'
en12	Noam527	2022-06-05 13:41:06	3258
en11	Noam527	2022-06-05 13:08:49	1115
en10	Noam527	2022-06-05 11:59:13	168	Tiny change: ' is:\n\n$$\left\lcei' -> ' is:\n\n$$d(G) \coloneqq \left\lcei'
en9	Noam527	2022-06-05 01:15:15	25	Tiny change: 'er upto a constant factor of' -> 'er upto a factor of'
en8	Noam527	2022-06-05 01:12:15	197	Tiny change: 'ices $S = {v_1, \ldots, v_m}$ then it' -> 'ices $S = \{v_1, \ldots, v_m\}$ then it'
en7	Noam527	2022-06-05 01:00:35	2316
en6	Noam527	2022-06-04 02:55:54	2592	Tiny change: 'n$$\lceil \rceil$$\' -> 'n$$\lceil \max_{\empty \neq S \subseteq V} \rceil$$\'
en5	Noam527	2022-06-04 01:20:52	909	Tiny change: 'cs (well, I don't c' -> 'cs (well, at least I don't c'
en4	Noam527	2022-06-04 00:49:42	4705	Tiny change: 'very good.\n</spoil' -> 'very good. Generally this can get up to $O(V)$.\n</spoil'
en3	Noam527	2022-06-03 23:28:23	2738	Tiny change: ' $\forall i \in [m]$' -> ' $\forall 0 \leq i < m, S_i \subseq [n]$'
en2	Noam527	2022-06-03 22:48:32	4
en1	Noam527	2022-06-03 22:48:19	700	Initial revision (saved to drafts)

The Motivation and The Problem

Interpreting as a Graph

Solving for a tree

That was awesome, let's try a general graph

Let's analyze what happened here

Minimizing $$$\max |T_i|$$$

History