Hi everyone!
This time I'd like to write about what's widely known as "Aliens trick" (as it got popularized after 2016 IOI problem called Aliens). There are already some articles about it here and there, and I'd like to summarize them, while also adding insights into the connection between this trick and generic Lagrange multipliers and Lagrangian duality which often occurs in e.g. linear programming problems.
Familiarity with a previous blog about ternary search or, at the very least, definitions and propositions from it is expected.
Great thanks to mango_lassi and 300iq for useful discussions and some key insights on this.
Tldr.
Let $$$f : X \to \mathbb R$$$ and $$$g : X \to \mathbb R^c$$$. You need to solve the constrained optimization problem
Let $$$t(\lambda) = \inf_x [f(x) - \lambda \cdot g(x)]$$$. Finding $$$t(\lambda)$$$ is unconstrained problem and is usually much simpler.
It may be rewritten as $$$t(\lambda) = \inf_y [h(y) - \lambda \cdot y]$$$ where $$$h(y)$$$ is the minimum possible $$$f(x)$$$ subject to $$$g(x)=y$$$.
This is applicable if $$$h(y)$$$ is convex, as $$$t(\lambda)$$$ defines a supporting hyperplane of $$$h(y)$$$'s graph with normal vector $$$(-\lambda, 1)$$$.
For convex $$$h(y)$$$, there is a certain monotonicity between $$$\lambda_i$$$ and $$$y_i$$$, so you can find $$$\lambda$$$ corresponding to $$$y=0$$$ with a binary search.
If $$$g(x)$$$ and $$$f(x)$$$ are integer functions, a binary search is sometimes doable over integers with $$$\lambda_i$$$ corresponding to $$$h(y_i) - h(y_i-1)$$$.
Boring and somewhat rigorous explanation is below, problem examples are belower.
Lagrange duality
Let $$$f : X \to \mathbb R$$$ be the objective function and $$$g : X \to \mathbb R^c$$$ be the constraint function. The constrained optimization problem
in some cases can be reduced to finding stationary points of the Lagrange function
Here $$$\lambda \cdot g(x)$$$ is a dot product of $$$g(x)$$$ and a variable vector $$$\lambda \in \mathbb R^c$$$.
Def. 1. A function $$$L(x, \lambda)$$$ defined above is called the Lagrange function, or the Lagrangian.
Def. 2. A vector $$$\lambda \in \mathbb R^c$$$ defined above is called a Lagrange multiplier. Its components are collectively called Lagrange multipliers.
Mathematical optimization focuses on finding stationary points of $$$L(x,\lambda)$$$. However, we're more interested in its infimal projection
Def. 3. A function $$$t(\lambda)$$$ defined above is called the Lagrange dual function.
Note that $$$t(\lambda)$$$, as a point-wise infimum of concave (linear in $$$\lambda$$$) functions, is always concave, no matter what $$$X$$$ and $$$f$$$ are.
If $$$x^*$$$ is the solution to the original problem, then $$$t(\lambda) \leq L(x^*,\lambda)=f(x^*)$$$ for any $$$\lambda \in \mathbb R^c$$$. This allows us to introduce
Def. 4. The unconstrained optimization problem $$$t(\lambda) \to \max$$$ is called the Lagrangian dual problem.
Def. 5. If $$$\lambda^*$$$ is the solution to the dual problem, the value $$$f(x^*) - t(\lambda^*)$$$ is called the duality gap.
Def. 6. A condition when the duality gap is zero is called a strong duality.
Typical example here is Slater's condition, which says that strong duality holds if $$$f(x)$$$ is convex and there exists $$$x$$$ such that $$$g(x)=0$$$.
Change of domain
In competitive programming, the set $$$X$$$ in definitions above is often weird and very difficult to analyze directly, so Slater's condition is not applicable. As a typical example, $$$X$$$ could be a set of all possible partitions of $$$\{1,2,\dots, n\}$$$ into non-intersecting segments. Besides, instead of specific equation $$$g(x)=0$$$, you are often asked to minimize $$$f(x)$$$ subject to $$$g(x)=y$$$ where $$$y$$$ is a part of problem input.
To mitigate this, we define $$$h(y)$$$ as the minimum value of $$$f(x)$$$ subject to $$$g(x)=y$$$. In this notion, the dual function is written as
where $$$Y=\{ g(x) : x \in X\}$$$. The set $$$Y$$$ is usually much more regular than $$$X$$$, as just by definition it is already a subset of $$$\mathbb R^c$$$. The strong duality condition is also very clear in this terms: it holds if and only if $$$0 \in Y$$$ and there is a $$$\lambda$$$ for which $$$y=0$$$ delivers infimum.
Def. 7. The epigraph of a function $$$f : Y \to \mathbb R$$$ is a set $$$\text{epi }f = \{(y, z) : z \geq f(y)\} \subset Y \times \mathbb R$$$.
Def. 8. A supporting hyperplane of a set $$$X \subset \mathbb R^d$$$ with a normal vector $$$\lambda \in \mathbb R^d$$$ is a surface defined as $$$\lambda \cdot x = c$$$, where $$$c$$$ is the largest possible number such that $$$\lambda \cdot x \geq c$$$ for all $$$x \in X$$$. Equivalently, $$$c$$$ is the infimum of $$$\lambda \cdot x$$$ among all $$$x \in X$$$.
Geometrically, $$$t(\lambda)$$$ defines a level at which the epigraph of $$$h(y)$$$ has a supporting hyperplane with the normal vector $$$(-\lambda, 1)$$$. Indeed, the half-space bounded by such hyperplane on the level $$$c$$$ is defined as $$$\{(y, z) : z-\lambda \cdot y \geq c\}$$$.
All the points $$$(y, h(y))$$$ at which the hyperplane touches the epigraph minimize the $$$t(\lambda)$$$. Please, refer to the picture below. Lower $$$c$$$ would move the hyperplane lower, while higher $$$c$$$ would move it higher. With $$$c=t(\lambda)$$$, the hyperplane is lowest possible while still intersecting the epigraph of the function in the point $$$(y^*, h(y^*))$$$ where $$$y^*$$$ delivers the minimum of $$$h(y) - \lambda \cdot y$$$.
For strong duality to hold for all inputs, all $$$y \in Y$$$ should have a supporting hyperplane that touches the epigraph at $$$(y, h(y))$$$. This condition is essentially equivalent to $$$h(y)$$$ being convex-extensible, that is, there should exist a convex function on $$$\mathbb R^c$$$ such that its restriction to $$$Y$$$ yields $$$h(y)$$$.
$$$1$$$-dimensional case
For now, let's assume that the problem has a single constraint, thus only one dimension to deal with.
Due to general properties of convex functions, larger $$$y$$$ would require larger $$$\lambda$$$ for the supporting line in the point $$$y$$$ and vice versa — larger $$$\lambda$$$ would be able to define a supporting line on larger $$$y$$$ only. This monotonicity means that we can find optimal $$$\lambda$$$.
Algorithm for finding $$$\lambda$$$ that delivers optimal $$$t(\lambda)$$$:
- Do the binary search on $$$\lambda$$$. Assume that you work in $$$[\lambda_l, \lambda_r)$$$ and test $$$\lambda_m$$$ with $$$m \approx \frac{l+r}{2}$$$.
- Compute optimal $$$x_\lambda$$$ for $$$f(x)-\lambda_m g(x)$$$ function and $$$y_\lambda=g(x_\lambda)$$$ corresponding to it.
- When there are several possible $$$x_\lambda$$$, choose the one that delivers minimum $$$y_\lambda$$$.
- If $$$y_\lambda > 0$$$, you should reduce working range to $$$[\lambda_l, \lambda_m)$$$, otherwise reduce it to $$$[\lambda_m,\lambda_r)$$$.
Third step is essential, as $$$\lambda_m$$$ can correspond to several consequent $$$y$$$ such that the points $$$(y, h(y))$$$ lie on the same line and, therefore, have a common supporting line. However, if we look on the smallest $$$y$$$ for every $$$\lambda_m$$$, we will guarantee that the values we find are non-decreasing as a function of $$$\lambda_m$$$. If finding minimal $$$y_\lambda$$$ is cumbersome, one might use ternary search instead to solve dual problem directly.
Note: This approach doesn't guarantee that we find $$$x_\lambda$$$ such that $$$g(x_\lambda)=0$$$, however we will find $$$\lambda$$$ that corresponds to the optimum, which would mean that $$$(y_\lambda, f(x_\lambda))$$$ and $$$(y^*, f(x^*))$$$ lie on the same line with a slope coefficient $$$\lambda$$$, thus you can get the answer as
Integer search
What are the possible $$$\lambda$$$ for specific $$$y$$$? When $$$h(y)$$$ is continuously differentiable, it essentially means that $$$\lambda$$$ corresponds to $$$y$$$ such that $$$\lambda = h'(y)$$$. On the other hand, when $$$Y$$$ is the set of integers, $$$y$$$ optimizes $$$t(\lambda)$$$ for all $$$\lambda$$$ such that
So, if $$$h(y)$$$ is an integer function, we may do the integer search of $$$\lambda$$$ on possible values of $$$h(k)-h(k-1)$$$ only.
In a very generic case, when the function is not continuously differentiable and $$$y_i$$$ are not necessarily integer, the set of possible $$$\lambda_i$$$ for a given $$$y_i$$$ is defined as $$$\partial h(y_i)$$$, the so-called sub-differential of $$$h$$$ in $$$y_i$$$, formally defined as $$$[a,b]$$$ where
The concept of sub-derivatives and sub-differentials can be generalized to multi-dimensional case as well with sub-gradients.
$$$2$$$-dimensional case
When $$$c>1$$$, the original problem might be reduced to a sequence of nested single-dimensional problems. Let's start with $$$c=2$$$.
The function $$$t(\lambda_1)$$$ can also be rewritten as a Lagrangian dual:
Thus, rather than solving joint problem on $$$\lambda_1,\lambda_2$$$, we need to solve two Lagrangian dual problems:
Rewriting them in $$$h$$$-terms, we obtain
where $$$h$$$-functions are defined in the following way:
For both problems to have strong duality, both $$$h(y_1) = h(y_1, 0)$$$ and $$$h_{\lambda_1}(y_1)$$$ must be convex.
Multidimensional case
Let's for the sake of clarity write down the full sequence of nested $$$1$$$-dimensional optimization problems in $$$c$$$-dimensional case.
We need to optimize $$$f(x)$$$ w.r.t $$$g_1(x) = g_2(x) = \dots = g_n(x)=0$$$. We introduce $$$\lambda_1, \dots, \lambda_n$$$. and solve the optimization problem
We decompose it in a sequence of nested problems in the following way:
where
With the help of $$$h(y_1, \dots, y_n)$$$, the same could be rewritten as
For nested binary search on $$$\lambda$$$ components to work, every function $$$h_{\lambda_1 \dots \lambda_{k-1}}(y_k)$$$ must be convex, including $$$h(y_1) = h(y_1, 0, \dots, 0)$$$.
The whole procedure could roughly look as follows:
void adjust(double *lmb, int i, int c) {
if(i == c) {
return;
}
double l = -inf, r = +inf; // some numbers that are large enough
while(r - l > eps) { // Might be unsafe on large numbers, consider fixed number of iterations
double m = (l + r) / 2;
lmb[i] = m;
adjust(lmb, i + 1, c);
auto [xl, yl] = solve(lmb, i); // returns (x_lambda, y_lambda) with the minimum y_lambda[i]
if(yl[k] > 0) {
r = m;
} else {
l = m;
}
}
lmb[i] = (l + r) / 2;
}
Alternatively, one might consider nested ternary search to solve joint dual problem directly.
Testing convexity
When $$$Y$$$ is continuous set, convexity may be tested by local criteria. For one-dimensional set, $$$h(y)$$$ is convex if and only if $$$h'(y)$$$ is non-decreasing. If it has a second derivative, we might also say that the function is convex if and only if $$$h' '(y) \geq 0$$$ for all $$$y$$$.
In multidimensional case, the local criterion is that the Hessian matrix $$$\frac{\partial h}{\partial y_i \partial y_j}$$$ is a positive semi-definite.
In discrete case, derivatives can be substituted with finite differences. A function $$$h(y)$$$ defined on $$$\mathbb Z$$$ is convex if and only if
is non-decreasing or, which is equivalent,
Another possible way to prove that $$$1$$$-dimensional $$$h(y)$$$ is convex in relevant problems is to reduce it to finding min-cost flow of size $$$y$$$ in some network. The common algorithm to find such flow pushes a flow of size $$$1$$$ through the shortest path in the residual network, which becomes smaller when $$$y$$$ rises. This method was revealed to me by 300iq.
In multi-dimensional case testing convexity might be challenging. Instead of this, one may decompose the problem into a nested set of $$$1$$$-dimensional problems such that for each problem $$$\lambda$$$-optimization is applicable. For this, you would need to prove that TODO TODO TODO
Problem examples
For simplicity, we will use $$$g(x)=y_0$$$ in primal problem instead of $$$g(x)-y_0=0$$$, while also omitting the constant part in dual function. It leads to the following changes with respect to the written above:
- Binary search target is set to $$$y=y_0$$$ instead of $$$y=0$$$;
- The primal solution is determined by $$$f(x^*) = t(\lambda^*) + \lambda \cdot y_0$$$ rather than $$$f(x^*) = t(\lambda^*)$$$.
Sometimes the problem is stated with $$$\max$$$ rather than $$$\min$$$. In this case, $$$t(\lambda)$$$ and $$$h(y)$$$ are defined as supremum and maximum rather than infimum and minimum. Correspondingly, $$$h(y)$$$ needs to be concave rather than convex to allow the usage of the trick.
Gosha is hunting. You're given $$$a$$$, $$$b$$$, $$$p_1, \dots, p_n$$$ and $$$q_1,\dots, q_n$$$. You need to pick two subsets $$$A$$$ and $$$B$$$ of $$$\{1,2,\dots,n\}$$$ of size $$$a$$$ and $$$b$$$ correspondingly in such a way that the following sum is maximized:
Formulating it as a constrained optimization problem, we obtain
Lagrangian dual here is
Let $$$h(\alpha,\beta) = \max f(A, B)$$$ subject to $$$|A| = \alpha$$$ and $$$|B| = \beta$$$, then
References
- Duality (optimization) — English Wikipedia
- The Trick From Aliens — Serbanology
- My Take on Aliens' Trick — Mamnoon Siam's Blog
- Incredibly beautiful DP optimization from N^3 to N log^2 N
- Comment on Codeforces by _h_
- Brief conversation with 300iq
- Part of the article was once revealed to me in a dream