cjtoribio's blog

By cjtoribio, history, 7 years ago, In English

Motivation & Problem

Recently saw this problem. For some this problem might seem like a segment tree problem and it is indeed one. However this problem and others where segment tree does not apply can be solved using another approach. I will rephrase the problem in a simpler way. We want a data structure capable of doing three main update-operations and some sort of query. The three modify operations are:

add: Add an element to the set.

remove: Remove an element from the set.

updateAll: This one normally changes in this case subtract X from ALL the elements. For this technique it is completely required that the update is done to ALL the values in the set equally.

And also for this problem in particular we may need one query:

getMin: Give me the smallest number in the set.

Observing this operations we want to handle, SegmentTree seems legit except for the remove, but we can wave it around by assigning INF to the position we want to remove. However if we don't have the updateAll then we can just use a MULTISET. And in fact if the remove is only to the minimum element then we can use a HEAP. However the four operations without any "workaround" can only be managed by a DynamicSegmentTree which in fact exists but is like using a bazooka to kill a simple mosquito.

Interface of the Data Structure

struct VeniceSet {
	void add(int);
	void remove(int);
	void updateAll(int); 
	int getMin(); // custom for this problem
	int size();
};

Usage of this methods

So imaging we have this four operations, how can we tackle this problem with it. Well i will illustrate it with a simple code in C++ which i hope is self explanatory.

VeniceSet mySet;
for (int i = 0; i < N; ++i) {
	mySet.add(V[i]);
	mySet.updateAll(T[i]); // decrease all by T[i]
	int total = T[i] * mySet.size(); // we subtracted T[i] from all elements
	
	// in fact some elements were already less than T[i]. So we probbaly are counting 
	// more than what we really subtracted. So we look for all those elements
	while (mySet.getMin() < 0) {
		// get the negative number which we really did not subtracted T[i]
		int toLow = mySet.getMin(); 
		
		// remove from total the amount we over counted
		total -= abs(toLow);
		
		// remove it from the set since I will never be able to substract from it again
		mySet.remove(toLow);
	}
	cout << total << endl;
}
cout << endl;

Implementation of the Technique

As the title says I don't call it a data structure itself but this is a technique almost always backed up by a data structure. And now I am going to explain why I call it Venice technique. Imagine you have an empty land and the government can make queries of the following type: * Make a building with A floors. * Remove a building with B floors. * Remove C floors from all the buildings. (A lot of buildings can be vanished) * Which is the smallest standing building. (Obviously buildings which are already banished don't count)

The operations 1,2 and 4 seems very easy with a set, but the 3 is very cost effective probably O(N) so you might need a lot of workers. But what if instead of removing C floors we just fill the streets with enough water (as in venice) to cover up the first C floors of all the buildings :O. Well that seems like cheating but at least those floor are now vanished :). So in order to do that we apart from the SET we can maintain a global variable which is the water level. so in fact if we have an element and want to know the number of floors it has we can just do height - water_level and in fact after water level is for example 80. If we want to make a building of 3 floors we must make it of 83 floors so that it can tough the land. So the implementation of this technique might look like this.

struct VeniceSet {
	multiset<int> S;
	int water_level = 0;
	void add(int v) {
		S.insert(v + water_level);
	}
	void remove(int v) {
		S.erase(S.find(v + water_level));
	}
	void updateAll(int v) {
		water_level += v;
	}
	int getMin() {
		return *S.begin() - water_level;
	}
	int size() {
		return S.size();
	}
};

Variations

Trick With Trie

Probably you have seen a problem of having a set of numbers and handle queries such as what is the greatest XOR than can be achieved by using one number of the set and the given number. This can be handled with a BinaryTrie. However if we add the operation XOR all the numbers in the set this would be very heavy. But if we maintain a global water_level that would XOR any number in and any number out, then it would virtually be XORING all the numbers in the set. Don't have an example problem yet, but I know I have used it, so any is welcome.

So as my experience is, this technique is a lightweight modification that can be applied to all DataStructures that support (add, remove and query) and only need the operation (updateAll).

Other Approaches For This Problem

Binary Search: This is very fast and simple, however is an offline solution. (Not apply for other operations such as sort)

Event Processing: Easy to code and fast but still offline and does not apply for other operations such as XOR.

Dynamic Segment Tree: A little slow in practice, same complexity in theory, and requires heavy code. This is online but requires heavy code. (Dynamic means can insert and delete nodes from the segment tree). Does not apply to other variations of the problem such as XOR.

Implicit Segment Tree: Relativly easy to code, but theoretical complexity of Nlog(MAXVAL) and still requires this technique to work. (Does not apply to other variations such as XOR)

Example Problems

Full text and comments »

  • Vote: I like it
  • +80
  • Vote: I do not like it

By cjtoribio, history, 8 years ago, In English

Block Tree

Story

I was trying to solve this problem from the gym and I struggled to find the solution. Finally, I came up with a very interesting data structure capable of handling any subtree update and really don't know if someone else has seen it before, but I will post it here since I could not find its name (if it has) in google. My solution was able to solve the problem with O(N) memory, O(N) in creation, per query. When I saw the editorial I saw that it had another interesting approach with an update buffer which solved the problem in , for my surprise my solution was summing the creation and queries, and I saw conveniently the author used Q <  = 104 so probably the author didn't know about my approach.

After analyzing my approach and reading one of the comments in the same problem, I saw that the problem could be also solvable using Square Root Decomposition in the Euler Tour Technique. Then I analyzed my approach and it seems that they are 90% the same, but the difference is in how the decomposition is stored. As some can see the relationship between KMP and AhoCorasick (Matching in Array — Matching in Tree), I see this as (Sqrt Decomposition Array — Sqrt Decomposition in a Tree). So, after posting this structure I search in what cases is my data structure better than the Euler Trick, then I thought of a new feature that can be implemented. I found it, it can dynamically change (be cut, and joined). Well, I probably have bored with this story, so let's go to the point.

Update — After posting this algorithm I received a comment with the name of this structure, it is called block-tree for those who don't know. I tried to find something on the internet but with no success.

Construction

The idea is not very hard, first do a dfs to compute subtree size and depth of each node. Order all nodes by depth in reverse order, this can be done in O(N) using counting sort. We will now create what I called fragment, we will collect small tree-like portions of the tree and assign them to fragments. The subtree size will be modified in each iteration it will count how many subtree nodes are not inside assigned to a fragment. So, each time we start with a node we shall update its subtree size, after that iterate its children again and count how many children are not assigned to a fragment, after this count reaches , we create a special node and assign the children to this new node and remove the children from the previous parent.

for(int u : orderedNodes){
    SZ[u] = 1;
    for(int v : adj[u]){
        SZ[u] += SZ[v];
    }
    newAdj = []
    temp = []
    spNodes = []
    int sum = 0;
    for(int v : adj[u]){
        if( sum > sqrt(N) ){
            int sp = createSpecialNode(u, temp);
            spNodes.push_back(sp);
            temp = [];
            SZ[u] -= sum;
            sum = 0;
        }
    }
    adj[u] = concat(temp, spNodes);
}

Special Node creation

After this I will end up with groups.

Built Fragmented Tree

here each special node needs to be able to answer any query or update in O(1). Since the fragment is very small if O(1) is not possible, any other structure used in that subtree will be relatively efficient since it will be or , although the time constraint will be very tight.

So, after this decomposition, we create a new adjacency list adjacencyList with these new nodes in O(N), and there will be extra nodes. After that another adjacency list subAdjacencyList will be created in with the tree only consisting of the special nodes.

Proof of correct segmentation

I don't have a formal proof for correct segmentation, but in the construction above the ONLY way that I create a segment is only if by adding the subtree to the list it goes over , and since the two groups of subtrees I am adding are less than (if not they would have merged before) then each time I create a fragment it will be of size since ALL fragments have size then the total number of fragments will be

Unable to parse markup [type=CF_TEX]

. So segmentation is proven. If I am wrong in this informal proof, I am open to comments.

Query

After the construction each query of subtree can be easily answered will be answered with a specialized dfs. The dfs will start on u and do one of two. If the node is special, then query the fragment in O(1) and recursively add the answers returned by dfs(specialChild) using the subAdjacencyList. If the node is non-special node, then apply any pending update in the fragment this node belong and query each individual node recursively using adjacencyList. It is not hard to prove that the fragment u belongs will be updated completely in [as lazy propagation] then it will visit at most nodes in that same fragment and then it will query at most all the fragments. This leads to 3 time leaving a complexity of . This code is something like this:

// G         : fragment u belongs
// leader()  : returns the leader of the fragment
Long getSum(int u){
	if(leader(u) == u){
                int fragId = fragmentOf[u];
		Long w = frags[ fragId ].queryFragment();
		for(int g : subAdjacencyList[ fragId ]){
			w += getSum( frags[g].leader );
		}
		return w;
	}else{
		frags[ G[u] ].updateAllNodesIfPendingChanges();
		Long w = u >= N ? 0 :V[ L[u] ] ;
		for(int v : adjacencyList[u]){
			w += getSum(v);
		}
		return w;
	}
}

Update

If the update is in one subtree it can be done similar as the Query function with lazy updates in the fragments and active updates in nodes. If the update is in all the nodes that are at distance L from the root (as in this problem), it is slightly different. For that we only need to iterate each fragment and update a value per level we have in that fragment (at most of size ) like this valueInLevel[L - leader.level] and update the value representing the whole fragment, in this case fragmentSum +  = nodesInLevel[L - leader.level]. this will be done in O(1) time, for each fragment. This later update will be seen like this.

void update(int LVL, Long v){
	V[LVL] += v;
	for(Fragment &f : frags){
		f.update(LVL, v);
	}
}

Cut/Join

Everything above is per query, cut/join in a node is not that hard to do. For cut we will only split one fragment and add a new parentless special node. And for join we will assign a parent to that special node. This sounds as O(1), but since we need to update the old(in cut) or new(in join) parent of the cut node this will probably cost . This sounds amazing, but there is one problem, each cut creates one special node. and the complexity of this whole structure is O(specialNodeCount) for query and update. So probably we will lose the efficiency with many cuts. To solve this, we take advantage of the O(N) creation time. Each time the whole tree has been cut times, we will rebuild the tree in O(N), so that it will recover again the optimal amount of special nodes.

Notes:

  • The whole code can be seen in my submission here or here to the problem. Any tip or something wrong in this post, just leave a comment..
  • Was said in comments that mugurelionut paper was the same, i thought that but then i read it carefully and they solve different problems.
  • It seems there are people that knows this, waiting for some sources.

Full text and comments »

  • Vote: I like it
  • +115
  • Vote: I do not like it