How to solve this?
You have queries of the following three types:
$$$add(i, v)$$$: add an element with value $$$v$$$ at index $$$i$$$.
$$$query(l, r)$$$: count the number of distinct values of elements on the range from $$$l$$$ to $$$r$$$ (inclusive).
$$$remove(i, v)$$$: remove one of the elements with value $$$v$$$ from the index $$$i$$$.
Notes:
Let $$$Q$$$ be the number of queries of the first and third types, and $$$Q'$$$ the number of queries of the second type; then $$$1 \le Q \le 10^5$$$ and $$$1 \le Q' \le Q \log Q$$$.
$$$1 \le v \le 10^5$$$.
$$$0 \le i, l, r \le 10^8\space\space(l \le r)$$$.
In $$$add$$$ queries, an index might hold multiple elements at the same time, and some of them may share the same value.
In $$$remove$$$ queries, it is guaranteed that at least one element at index $$$i$$$ has value $$$v$$$.
$$$query$$$ queries must be performed online, but the other two types can be preprocessed if needed.
Notice the unusual constraints on $$$l$$$, $$$r$$$ and $$$i$$$.
Is there any way to do this in $$$O(\log n)$$$ time for queries of the second type and $$$O(\log^2 n)$$$ or faster for the first and third types?
What is the limit on a[i]?
If a[i] is in the hundreds or so, then you can just iterate over every possible value and, for each one, binary search whether it occurs within the range.
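For instance, that brute force could be sketched as below, assuming values fit in a small range (all names here are illustrative, not from the thread):

```cpp
#include <cstdint>
#include <map>
#include <set>

// One sorted multiset of positions per value (names are illustrative).
std::map<int, std::multiset<std::int64_t>> pos_of;

void add(std::int64_t i, int v) { pos_of[v].insert(i); }

void remove_one(std::int64_t i, int v) {
    pos_of[v].erase(pos_of[v].find(i));
}

// Does value v occur anywhere in [l, r]?
bool occurs(int v, std::int64_t l, std::int64_t r) {
    auto it = pos_of.find(v);
    if (it == pos_of.end()) return false;
    auto p = it->second.lower_bound(l);  // first position >= l
    return p != it->second.end() && *p <= r;
}

// query(l, r): try every value from 1 to maxv.
int count_distinct(int maxv, std::int64_t l, std::int64_t r) {
    int cnt = 0;
    for (int v = 1; v <= maxv; ++v) cnt += occurs(v, l, r);
    return cnt;
}
```

This is $$$O(V \log Q)$$$ per $$$query$$$ for $$$V$$$ distinct values, so it only helps while $$$V$$$ stays small.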
If you can't, my next thought would be to use a segment tree to answer queries. A data structure called a "merge-sort tree" (a segment tree in which every node stores the elements of its range in sorted order, requiring $$$n \log n$$$ memory) lets you binary search in each node to answer queries; fractional cascading can be used as an alternative speedup (see Fractional Cascading).
For remove queries it is much harder, as you would have to remove that number from $$$\log n$$$ sorted subarrays. It might just be better to use a boolean array to check whether that number has been used.
These are some of the ideas that I have, but they're not definitive. Do you know what the time limit of this problem is? Normally the size of the array would not be so large.
All the constraints are listed in the notes.
The time limit is about 2 seconds.
Can you post a link to the problem?
It is not a problem in itself.
This is what I was trying to solve: Problem A.
The best algorithm I can come up with is $$$\mathcal{O}(\log^2 n)$$$ per query (which should still pass).
First, compress indices so that we have exactly one add and exactly one remove operation at every index. Say the value that is added at position $$$i$$$ is $$$v_i$$$.
Call a value active if it has been added but not deleted yet. Maintain an array where position $$$i$$$ holds $$$-1$$$ if it is not active, and otherwise the next active position $$$j > i$$$ with $$$v_i = v_j$$$ (or $$$\infty$$$ if no such position exists). You can maintain these values in $$$\mathcal{O}(\log n)$$$ per query.
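A sketch of that bookkeeping, assuming compressed positions and a large sentinel standing in for "infinity" (the struct and its names are mine, not from the comment above):

```cpp
#include <iterator>
#include <map>
#include <set>
#include <vector>

const long long INF = 1000000000000000000LL;  // stands in for "infinity"

// nxt[i] is -1 while position i is inactive; otherwise it is the next
// active position j > i with the same value, or INF if there is none.
struct NextPointers {
    std::vector<long long> nxt;
    std::vector<int> val;                 // v_i for each compressed index
    std::map<int, std::set<int>> active;  // value -> active positions

    explicit NextPointers(const std::vector<int>& values)
        : nxt(values.size(), -1), val(values) {}

    void activate(int i) {  // the add(i, v_i) operation
        auto& s = active[val[i]];
        auto it = s.insert(i).first;
        auto nx = std::next(it);
        nxt[i] = (nx == s.end()) ? INF : *nx;          // link i forward
        if (it != s.begin()) nxt[*std::prev(it)] = i;  // relink predecessor
    }

    void deactivate(int i) {  // the remove(i, v_i) operation
        auto& s = active[val[i]];
        auto it = s.find(i);
        if (it != s.begin()) nxt[*std::prev(it)] = nxt[i];  // bypass i
        nxt[i] = -1;
        s.erase(it);
    }
};
```

Each operation touches only a neighbor or two in one `std::set`, hence the $$$\mathcal{O}(\log n)$$$ bound.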
Now, you can count the number of distinct active values $$$v_i$$$ in the range $$$[a, b]$$$ by counting the positions in $$$[a, b]$$$ whose stored value is greater than $$$b$$$: each distinct value is counted exactly once, at its last active occurrence within the range.
To do this, maintain a segment tree with, at every node, an indexed set of the stored values of the positions below it. A range query decomposes into $$$\log n$$$ segment tree nodes; in each one you ask how many values in its set are greater than $$$b$$$, which takes $$$\log n$$$ time, for $$$\mathcal{O}(\log^2 n)$$$ time per query.
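A rough sketch of that structure, with one simplification: each node keeps a plain sorted vector, so point updates here cost up to $$$O(n)$$$; swapping each vector for a real indexed set (e.g. a balanced BST with subtree sizes) gives the $$$\mathcal{O}(\log^2 n)$$$ update bound described above.

```cpp
#include <algorithm>
#include <vector>

// Iterative segment tree over n positions; every node stores the values
// held at the positions of its range as a sorted vector.  Names are
// illustrative, not from the comment above.
struct RangeGreaterCount {
    int n;
    std::vector<std::vector<long long>> t;

    explicit RangeGreaterCount(int n_) : n(n_), t(2 * n_) {}

    void insert_at(int pos, long long val) {
        for (pos += n; pos >= 1; pos /= 2) {
            auto& v = t[pos];
            v.insert(std::lower_bound(v.begin(), v.end(), val), val);
        }
    }

    void erase_at(int pos, long long val) {  // val must be present at pos
        for (pos += n; pos >= 1; pos /= 2) {
            auto& v = t[pos];
            v.erase(std::lower_bound(v.begin(), v.end(), val));
        }
    }

    // How many positions in [l, r] hold a value strictly greater than b?
    long long count_greater(int l, int r, long long b) const {
        long long res = 0;
        for (l += n, r += n + 1; l < r; l /= 2, r /= 2) {
            if (l & 1) res += above(t[l], b), ++l;
            if (r & 1) --r, res += above(t[r], b);
        }
        return res;
    }

    static long long above(const std::vector<long long>& v, long long b) {
        return v.end() - std::upper_bound(v.begin(), v.end(), b);
    }
};
```

With the next-pointer array from before, `count_greater(a, b, b)` is exactly the distinct count for $$$[a, b]$$$ (inactive positions store $$$-1$$$ and never count).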
Well, I thought of something similar, but the problem with this (and the reason why I asked for an $$$O(\log n)$$$ solution) is that I run the queries of the second type inside a binary search, which makes it $$$O(\log^3 n)$$$ (I reworked the constraints so that this is clearer now).
I also couldn't find a way to compress the indices so that exactly one add and one remove operation happens at each index, even when the $$$query$$$ queries are offline, because the queries have to be mapped to the correct new indices (i.e. their ranges have to be extended or shrunk), which was hard for me. Can you describe a way to do this compression?
And can we replace the segment tree of indexed sets with another data structure that answers the same queries (how many values are greater than $$$b$$$) faster, while supporting the same updates (even if the updates become slower, say $$$O(\log^2 n)$$$)?
Here's code for compressing the indices. It's basically just a bunch of binary searching. The second value in comp indicates whether we have added at that position and whether we have deleted at that position.
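The linked code itself isn't reproduced in the thread; the following is only my guess at its shape, pairing each add event's original position with an event id rather than the add/delete flags described above:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using Comp = std::vector<std::pair<std::int64_t, int>>;

// Guesswork sketch: collect the original position of every add event,
// then stable-sort by position; the k-th entry of comp becomes
// compressed index k, so each compressed index gets exactly one add.
Comp build_comp(const std::vector<std::int64_t>& add_positions) {
    Comp comp;
    for (int id = 0; id < (int)add_positions.size(); ++id)
        comp.push_back({add_positions[id], id});
    std::stable_sort(comp.begin(), comp.end(),
                     [](const auto& a, const auto& b) { return a.first < b.first; });
    return comp;
}

// Map a query's l to the first compressed index at position >= l.
int lower_index(const Comp& comp, std::int64_t l) {
    return (int)(std::lower_bound(comp.begin(), comp.end(), l,
               [](const std::pair<std::int64_t, int>& e, std::int64_t x) {
                   return e.first < x;
               }) - comp.begin());
}

// Map a query's r to the last compressed index at position <= r
// (one less than lower_index(comp, l) when the range is empty).
int upper_index(const Comp& comp, std::int64_t r) {
    return (int)(std::upper_bound(comp.begin(), comp.end(), r,
               [](std::int64_t x, const std::pair<std::int64_t, int>& e) {
                   return x < e.first;
               }) - comp.begin()) - 1;
}
```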
I didn't think about it this way at all!! With this I can also do the compression for the queries online, right?
I can store only the $$$add$$$ and $$$remove$$$ queries in comp and do a binary search to find where $$$l$$$ and $$$r$$$ should be at each query?

Yes, you only need to know where the add operations will be.