A general approach to solve subree distinct values queries!

Revision en7, by Haidora, 2024-05-01 11:21:56

Hello Codeforces ,today I want to introduce a technique that can solve some problems related to distinct values queries on a subtree .The only way I know to solve array distinct values queries with updates is using MO's algorithm .The approach I will introduce can solve the subtree version of this problem online with updates in $$$O(Nlog^2(N))$$$.So let's get started.

Prerequisites:

1.Heavy light decomposition (it's enough to know how it works and how to implement it).

2.Lowest common ancestor.

3.Euler tours (not necessary for this specific problem but I will use it to prove some fact).

Problem Statement:

You are given a tree rooted at vertex 1. You have to perform two types of queries:

1 $$$i val$$$ : update the value of node $$$i$$$ to $$$val$$$.

2 $$$u$$$ : output the sum of distinct values in the subtree of node $$$u$$$.

Solution:

Whenever you encounter such a strange problem try to make a shift in perspective .What is the first thing you thought when you have seen this problem? Euler tour? Persistent Segment trees? Isn't it similar to the array distinct values queries? But wait ,I just said it's hard to solve this problem. Didn't I?

Can we change the problem into path queries problem?

Wait what are the nodes that are affected by an update?

I turns out that the affected nodes are all in a range on the ancestry path of the updated node .Don't it seem like a standard HLD problem?

Not yet .Here comes the idea .Consider our new problem:

We want to update the answers in a range on the ancestry path of a node .So ,where does this range end? If we can figure out how to do this then the problem becomes a standard path update and queries problem that can be easily solved using HLD .Now, here is the fun part.

Isn't it obvious the endpoint is the first node on the ancestry path that doesn't have an occurrence of $$$val$$$?

How can we find such a node? well this is the same as finding the first node on the path that have an occurrence then going down by one edge.

It turns out that this node is the first node that have the closest occurrence of $$$val$$$ in its subtree. But what does it mean for a node to be be the closest?

Well ,let's consider this node to be the $$$lca$$$ of the updated node and the closet node since the $$$lca$$$ is the first node on the ancestry path from the updated node to the root .Now the problem boils down to finding the maximum depth $$$lca$$$ among all possible $$$lca$$$ s ,but how can we do this fast?

Well, if we maintain a map of sets that stores for each number the $$$time in$$$ s in DFS order for the nodes with a value equal to that number then the node we are searching for is the node with maximum depth among $$$lca($$$updated node ,the first node with time in bigger than the $$$time in$$$ of the updated node $$$)$$$ and $$$lca($$$updated node ,the first node with time in smaller than the $$$time in$$$ of the updated node $$$)$$$ .But why is this true in general?

Proof of the above fact:

Let's consider the case of (the first node with time in bigger than the time in of the updated node(let's call this node $$$u$$$)) then the second case will go through an identical reasoning.

Consider taking a node with a bigger time in than $$$u$$$ (let's call it $$$v$$$).

Let $$$x$$$ be the first node on the ancestry path then:

If a node $$$d$$$ is the closest node to $$$i$$$ then $$$d$$$ must be in the subtree of $$$x$$$. Because $$$in[i]<in[u]<in[v]$$$ if $$$v$$$ was in the subtree of $$$x$$$ and $$$i$$$ was in the subtree of $$$x$$$ then also $$$u$$$ is in the subtree of $$$x$$$ .Thus taking $$$u$$$ is at least as optimal as taking node v.

Note that there is a corner case when there is at least one occurrence of $$$val$$$ in the subtree of $$$i$$$ we don't update anything.

Also note that when we update a node we should remove the old value and add the new value but this is can be done in the same way the adding process is done thus I leave it as an exercise to the reader to check this.

A recap for what we did:

You could reflect on how we transformed a subtree queries problem into a path queries problem which turns out to be much easier if we think that way. Then we used a fact that facilitates using the bruteforce approach by finding optimal values to try .

Implementation:

There are many implementation details that I skipped above but if you are curious here is my code:

https://codeforces.me/gym/521454/submission/258943402

I couldn't find any link for a problem that uses this technique so I made one .

link: [problem:521454A]

https://codeforces.me/contestInvitation/02c14ae6e9e06b8f0c6a3ce7eb957a1f3ba2f865

You can also optimize heavy light decomposition see https://codeforces.me/blog/entry/104997.

Thank you for reading my blog.

I spent a lot of time preparing this so I hope you loved it.

P.S. This is my first educational blog .Sorry for long text :) .

Tags hld, data structures, trees

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en14 English Haidora 2024-05-01 21:49:55 1 Tiny change: ' resources.' -> ' resources .'
en13 English Haidora 2024-05-01 16:42:24 2 Tiny change: '\n\n**Note** that th' -> '\n\n**Note :** that th'
en12 English Haidora 2024-05-01 15:06:42 4
en11 English Haidora 2024-05-01 14:26:30 355
en10 English Haidora 2024-05-01 12:13:53 6346 Tiny change: '\n...\n```\n#include' -> '\n...\n```c++\n#include'
en9 English Haidora 2024-05-01 11:34:08 98
en8 English Haidora 2024-05-01 11:27:03 4 Tiny change: 'tes in $O(Nlog^2(N))$' -> 'tes in $O((N+Q)log^2(N))$'
en7 English Haidora 2024-05-01 11:21:56 48
en6 English Haidora 2024-05-01 11:06:09 85
en5 English Haidora 2024-05-01 10:44:27 2
en4 English Haidora 2024-05-01 10:43:39 1 Tiny change: 'ed node$)$.But why i' -> 'ed node$)$ .But why i'
en3 English Haidora 2024-05-01 10:39:32 40
en2 English Haidora 2024-05-01 10:36:24 790 Tiny change: 'bles $lca$s ,but how' -> 'bles $lca$ s ,but how' (published)
en1 English Haidora 2024-05-01 10:08:59 4269 Initial revision (saved to drafts)