[Tutorial] Finding N bits using O(N / log(N)) sums

Правка en17, от SlavicG, 2022-07-29 12:40:12

Introduction

Hi everyone, because we noticed there was no good tutorial in English explaining this idea, KrowSavcik and I have decided to write a tutorial blog on it!

This idea finds the values of each element from a binary array using $$$O(N / log(N))$$$ sum queries, more specifically it solves problems of the type:

Given a binary array $$$b$$$ (an array consisting only of zeroes and ones) of $$$n$$$ elements, you can ask queries of the following format:

What is the sum of the subsequence of the values on positions: $$$p_1, p_2, ..., p_k$$$. In other words you will give the subsequence $$$p_1, p_2, ..., p_k$$$ to the interaction and it will return you the value of ($$$b_{p_1} + b_{p_2} + ... + b_{p_k}$$$).

And after asking $$$O(N / log(N))$$$ queries we should be able to tell the value of each index.

The technique was mentioned here too but it was not as detailed and not easy to find.

Concept

The idea is to use a divide and conquer-like approach. Because it's not an intuitive concept, firstly I will have to add some simple notations, so it will be easier to understand it.

  • $$$x_i$$$ — refers to the position $$$i$$$
  • $$$A$$$ — any capital letter (except S) refers to a set of points $$$x_i$$$
  • $$$v_{A} = \sum b_{x_i}$$$ for $$$x_i \in A$$$
  • $$$k$$$ — the layer we are currently considering
  • $$$S_i$$$ — Set of queries

Also you should keep in mind that indices are enumerated from 0.

Also keep in mind that even though it's said the queries are used, we actually only query at the end, and we build up our set of queries first for multiple layers first.

As it is a divide and conquer-like approach we will work with layers. Let's say that for layer $$$k$$$ we need $$$2^k$$$ queries and that from it we can get the value of $$$f_k$$$ elements. Then $$$f_{k+1} = 2 \cdot f_k + 2^k - 1$$$.

First of all, let's set our base case, which is $$$k = 0$$$. So, for $$$k = 0, f_0 = 1$$$ and the query set will only be {$$$0$$$}, so we know a single element using a single query.

The block $$$f_{k+1}$$$ is formed from $$$2$$$ blocks of size $$$f_k$$$ and $$$2^k -1$$$ additional elements in this order. Let's say $$$k_1$$$ represents the first block of $$$f_k$$$ elements and the $$$k_2$$$ the second such block.

The first query is used to get the sum on $$$[f_k, 2 \cdot f_k)$$$ — the sum of the second block. Then we add two new queries for each non last query in $$$S_{k_1}$$$ and $$$S_{k_2}$$$. First one is $$$S_{k_1}[i] \cup S_{k_2}[i]$$$. Second one is $$$S_{k_1}[i] \cup ([f_k, 2 \cdot f_k) / S_{k_2}[i]) \cup x_{(2 \cdot f_k + i)}$$$.

The last query is for entire range $$$[0, f_{k+1})$$$.

I think it's easy to see why we now have $$$2^{k+1}$$$ queries. The harder part is to understand why we don't lose any value in this process, and how we can solve it recursively. In fact, having answered all the $$$S_{k+1}$$$ queries, we can calculate all the $$$v_{S_{k_1}[i]}$$$ and $$$v_{S_{k_2}[i]}$$$.

Now, when we reach a $$$k$$$ with a value of $$$f_k >= n$$$ we can stop there. Let's assume $$$n = f_k$$$ since it will be easier to work with it (when $$$n$$$ is smaller than $$$f_k$$$ we can just think of it as appending $$$f_k - n$$$ zeroes at the end since they won't influence the sum at all). There is a more elegant way to form $$$n$$$ from blocks of different sizes, but it doesn't enter the scope of this blog and it's not that hard to implement using offsets. Using the set of queries responsible for the $$$k$$$-th layer we can in fact now reconstruct the whole sequence, now recursively going from the $$$k$$$-th layer to the ($$$k - 1$$$)-th one (but consider each layer can have multiple blocks).

First of all, the only things we care about when we are in the $$$k$$$-th layer are:

  • The query set for the corresponding block of the corresponding layer.
  • The block we are currently at (can be dealt with using an offset value in the recursion).

So we will store them when going recursively.

Firstly, let's again set our base case: $$$k = 0$$$. We are now sure that only one element is responsible for this block from this layer, so we can just set the value of the $$$b_x$$$-th bit (where $$$x$$$ is some offset value we use to keep track of the block) to $$$v_{S_0}$$$ (since that's the sum for a single element which we've seen at the build-up).

Now, since the base case is already dealt with, here's how we will go to the ($$$k - 1$$$)-th layer:

We will need to reconstruct the previous query sets for the first block and the second block of size $$$f_{k - 1}$$$, and set the values for the other $$$2^{(k - 1)} - 1$$$ values respectively (since they aren't part of any of the blocks they shouldn't be part of the recursion either).

Let's denote the numbers of $$$1$$$-s in $$$[f_k, 2 \cdot f_k)$$$ with $$$c$$$. It's obvious that $$$c = v_{S[0]}$$$. We will now go through every pair of queries, starting from 1. That means we will be analyzing queries $$$S_1[i] \cup S_2[i]$$$ and $$$S_1[i] \cup ([f_{k-1}, 2 \cdot f_{k-1}) / S_2[i]) \cup x_{(2 \cdot f_{k-1} + i)}$$$.

  • $$$v_{S[2 \cdot i + 1]} = v_{S_1[i]} + v_{S_2[i]}$$$
  • $$$v_{S[2 \cdot i + 2]} = v_{S_1[i]} + c - v_{S_2[i]} + b_{2 \cdot f_{k-1} + i}$$$

In this case we have to calculate 3 values: $$$v_{S_1[i]} , v_{S_2[i]} , b_{2 \cdot f_{k-1} + i}$$$.

  • $$$v_{S_1[i]} = \lfloor\frac{v_{S[2 \cdot i + 1]} + v_{S[2 \cdot i + 2]} - c}{2}\rfloor$$$
  • $$$v_{S_2[i]} = \lceil\frac{v_{S[2 \cdot i + 1]} - v_{S[2 \cdot i + 2]} + c}{2}\rceil$$$
  • $$$b_{2 \cdot f_{k-1} + i} = (v_{S[2 \cdot i + 1]} + v_{S[2 \cdot i + 2]} - c)$$$ & $$$1$$$

The only remaining queries to answer are $$$v_{S_1[2^{k-1}]}$$$ and $$$v_{S_2[2^{k-1}]}$$$ as we didn't add them in $$$S$$$.

  • $$$v_{S_2[2^{k-1}]} = c = v_{S[0]}$$$
  • $$$v_{S_1[2^{k-1}]} = v{S[2^k]} - c - \sum_{i = 0}^{2^{k-1}-1} b_{2 \cdot f_{k-1} + i} $$$

So, with all this calculated we are ready to go down to the previous layer's blocks.

Visual representation

Here's a visual representation of the queries we use for going up to the next layer:

  • The green squares represent the queries responsible for the first block of the previous layer.
  • The orange squares represent the queries responsible for the second block of the previous layer.
  • The blue squares represent the queries that query the whole second block excluding the elements from the query of the second block.
  • The red squares represent the last $$$2^k - 1$$$ bits.

Count of queries (a bit optimized)

Here's a visual representation of how the count of queries grows as $$$n$$$ increases, we notice that the count of queries is around $$$2.67 \cdot n / log2(n)$$$.

Some Code

Keep in mind this code is pretty inefficient and is just used to explain the solution better (it's still functional though)

Firstly, let's set some helper functions to work easier with our query sets:

Helper Functions

Now, here's how we do the build-up part:

Build-up

And the final part — recursively going from the $$$k$$$-th layer to the ($$$k - 1$$$)-th till each element is visited.

The build-down

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en18 Английский SlavicG 2022-07-29 12:41:24 8 Tiny change: '+ 2]} - c)$ & $1$\n\nThe ' -> '+ 2]} - c) \land 1$\n\nThe '
en17 Английский SlavicG 2022-07-29 12:40:12 12 Tiny change: 'that $c = S[0]$. We ' -> 'that $c = v_S[0]$. We '
en16 Английский KrowSavcik 2022-07-24 13:28:54 62 Minor correction
en15 Английский SlavicG 2022-07-24 12:37:17 0 (published)
en14 Английский SlavicG 2022-07-24 12:35:10 46 Tiny change: 'in A$\n* $(A)$ — a set in brackets is a query\n* $k$ &m' -> 'in A$\n* $k$ &m'
en13 Английский SlavicG 2022-07-24 12:34:34 389
en12 Английский KrowSavcik 2022-07-24 12:29:11 253
en11 Английский SlavicG 2022-07-24 11:55:40 21 Tiny change: 's detailed.\n\n### ' -> 's detailed and not easy to find.\n\n### '
en10 Английский SlavicG 2022-07-24 11:54:53 4711 Tiny change: 'nd $1.85 \ cdot n / l' -> 'nd $1.85 \cdot n / l'
en9 Английский KrowSavcik 2022-07-24 11:07:59 258 Added gifs
en8 Английский KrowSavcik 2022-07-24 10:53:28 379 Tiny change: 'if)\n![ ](https://codeforces.me/2fca27/op2.gif)\n![ ](ht' -> 'if)\n![ ]()\n![ ](ht'
en7 Английский KrowSavcik 2022-07-24 08:57:55 1261 Tiny change: ' i)}$.\n\n' -> ' i)}$.\n\n* $v_{S[2 \cdot i + 1]} = v_{S_1[i]} + v_{S_2[i]}$\n\n'
en6 Английский SlavicG 2022-07-23 22:02:47 19
en5 Английский SlavicG 2022-07-23 21:20:21 1540 Tiny change: 'cdot f_k) \cap S_{k_2}[i' -> 'cdot f_k) / S_{k_2}[i'
en4 Английский SlavicG 2022-07-23 20:53:50 766 Tiny change: 'ated from $0$*\n\nAs it' -> 'ated from 0*\n\nAs it'
en3 Английский KrowSavcik 2022-07-23 20:17:23 783 Tiny change: 's $S_{k_1} \cup S_{k' -> 's $S_{k_1}[i] \cup S_{k'
en2 Английский KrowSavcik 2022-07-23 19:51:40 732 Tiny change: '.\n\n```\nq_i\n```\n\n' -> '.\n\n```\n$q_i$\n```\n\n'
en1 Английский SlavicG 2022-07-23 19:21:02 1001 Initial revision (saved to drafts)