Integer Binary Search – proper implementation

Правка en7, от rembocoder, 2023-03-01 07:27:04

It seems, there are so many materials on binary search already that everyone must know how to code it, however every material seems to tell its own approach, and I see people being lost and still making bugs on such a simple thing. That's why I want to tell you about an approach which I find the most elegant. Of all the variety of articles, I saw my approach only in this comment (but less generalized).

UPD: Also in the nor's blog.

The problem

Suppose you want to find the last element less than x in a sorted array and just store a segment of candidates for the answer [l, r]. If you write:

int l = 0, r = n - 1; // the segment of candidates
while (l < r) { // there are more than 1 candidate
    int mid = (l + r) / 2;
    if (a[mid] < x) {
        l = mid;
    } else {
        r = mid - 1;
    }
}
cout << l;

...you will run into a problem: suppose l = 3, r = 4, then mid = 3. If a[mid] < x, you will end up in a loop (the next iteration will be again on [3, 4]).

Okay, it can be fixed with rounding mid up – mid = (l + r + 1) / 2. But we have another problem: what if there are no such elements in the array? We would need an extra check for that case. As a result, we have a pretty ugly code that is not generalized very well.

My approach

Let's generalize the problem we want to solve. We have a statement about an integer number n that is true for integers smaller than some bound, but then always stays false once n exceeded that bound. We want to find the last n for which the statement is true.

First of all, we will use half-intervals instead of segments (one border is inclusive, another is non-inclusive). Half-intrevals are in general very useful, elegant and conventional in programming, I recommend using them as much as possible. In that case we will choose some small l for which we know in before the statement is true and some big r for which we know it's false. Then the range of candidates is [l, r).

My code for that problem would be:

int l = -1, r = n; // a half-interval [l, r) of candidates
while (r - l > 1) { // there are more than 1 candidate
    int mid = (l + r) / 2;
    if (a[mid] < x) {
        l = mid; // now it's the largest for which we know it's true
    } else {
        r = mid; // now it's the smallest for which we know it's false
    }
}
cout << l; // in the end we are left with a range [l, l + 1)

The binary search will only do checks for some numbers strictly between initial l and r. It means that it will never check the statement for l and r, it will trust you that for l it's true, and for r it's false. Here we consider -1 as a valid answer, which will correspond to no numbers in array being less than x.

Notice how there are no "+ 1" or "- 1" in my code and no extra checks are needed, and no loops are possible (since mid is strictly between the current l and r).

Reverse problem

The only variation that you need to keep in mind is that half of the times you need to find not the last, but the first number for which something is true. In that case the statement must be always false for smaller numbers and always true starting from some number.

We will do pretty much the same thing, but now r will be an inclusive border, while l will be non-inclusive. In other words, l is now some number for which we know the statement to be false, and r is some for which we know it's true. Suppose I want to find the first number n for which n * (n + 1) >= x (x is positive):

int l = 0, r = x; // a half-interval (l, r] of candidates
while (r - l > 1) { // there are more than 1 candidate
    int mid = (l + r) / 2;
    if (mid * (mid + 1) >= x) {
        r = mid; // now it's the smallest for which we know it's true
    } else {
        l = mid; // now it's the largest for which we know it's false
    }
}
cout << r; // in the end we are left with a range (r - 1, r]

Just be careful to not choose a r too large, as it can lead to overflow.

An example

1201C - Maximum Median You are given an array a of an odd length n and in one operation you can increase any element by 1. What is the maximal possible median of the array that can be achieved in k steps?

Consider a statement about the number x: we can make the median to be no less than x in no more than k steps. Of course it is always true until some number, and then always false, so we can use binary search. As we need the last number for which this is true, we will use a normal half-interval [l, r).

To check for a given x, we can use a property of the median. A median is no less than x iff at least half of the elements are no less than x. Of course the optimal way to make half of the elements no less than x is to take the largest elements.

Of course we can reach the median no less than 1 under given constraints, so l will be equal to 1. But even if there is one element and it's equal to 1e9, and k is also 1e9, we still can't reach median 2e9 + 1, so r will be equal to 2e9 + 1. Implementation:

#define int int64_t

int n, k;
cin >> n >> k;
vector<int> a(n);
for (int i = 0; i < n; i++) {
    cin >> a[i];
}
sort(a.begin(), a.end());
int l = 1, r = 2e9 + 1; // a half-interval [l, r) of candidates
while (r - l > 1) {
    int mid = (l + r) / 2;
    int cnt = 0; // the number of steps needed
    for (int i = n / 2; i < n; i++) { // go over the largest half
        if (a[i] < mid) {
            cnt += mid - a[i];
        }
    }
    if (cnt <= k) {
        l = mid;
    } else {
        r = mid;
    }
}
cout << l << endl;

Conclusion

Hope I've made it clearer and some of you will switch to this implementation. To clarify, occasionally other implementations can be more fitting, for example with interactive problems – whenever we need to think in terms of an interval of searching, and not in terms of the first/last number for which something is true.

I remind that I do private lessons on competitive programming, the price is $30/h. Contact me on Telegram, Discord: rembocoder#3782, or in CF private messages.

Теги binary search

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en7 Английский rembocoder 2023-03-01 07:27:04 4 Tiny change: 'price is $25/h. Contac' -> 'price is $30/h. Contac'
ru6 Русский rembocoder 2023-03-01 07:26:37 4 Мелкая правка: ', цена – $25/ч. Пишите' -> ', цена – $30/ч. Пишите'
en6 Английский rembocoder 2022-05-29 14:21:23 0 (published)
en5 Английский rembocoder 2022-05-29 14:20:53 93 (saved to drafts)
ru5 Русский rembocoder 2022-05-29 14:20:25 90 Мелкая правка: '[user:nor]а в [блоге]' -> '[user:nor] в [блоге]'
en4 Английский rembocoder 2022-05-28 21:53:08 23 Tiny change: '\n\n~~~~\nint n, k' -> '\n\n~~~~\n#define int int64_t\n\nint n, k'
ru4 Русский rembocoder 2022-05-28 21:51:10 23 Мелкая правка: '\n\n~~~~\nint n, k' -> '\n\n~~~~\n#define int int64_t\n\nint n, k'
en3 Английский rembocoder 2022-05-28 20:37:33 9 Tiny change: 'e _first_ element for which' -> 'e _first_ number for which' (published)
ru3 Русский rembocoder 2022-05-28 20:36:26 0 (опубликовано)
ru2 Русский rembocoder 2022-05-28 20:31:27 1
ru1 Русский rembocoder 2022-05-28 20:31:12 6379 Первая редакция перевода на Русский (сохранено в черновиках)
en2 Английский rembocoder 2022-05-28 19:39:46 24 Tiny change: 'h the same, but now ' -> 'h the same thing, but now '
en1 Английский rembocoder 2022-05-28 19:28:15 6356 Initial revision (saved to drafts)