Blog entries - Codeforces

platelet's blog

CodeTON Round 5 Editorial

By platelet, 20 months ago, In English

1842A - Tenzing and Tsondu

Tutorial

Code

#include <iostream>

using namespace std;

int main() {
    ios::sync_with_stdio(false), cin.tie(nullptr);
    int T;
    cin >> T;
    while (T--) {
        int n, m, a[50], b[50];
        long long sumA = 0, sumB = 0;
        cin >> n >> m;
        for (int i = 0; i < n; i++)
            cin >> a[i], sumA += a[i];
        for (int i = 0; i < m; i++)
            cin >> b[i], sumB += b[i];
        if (sumA > sumB) cout << "Tsondu\n";
        if (sumA < sumB) cout << "Tenzing\n";
        if (sumA == sumB) cout << "Draw\n";
    }
}

1842B - Tenzing and Books

Tutorial

Alternative Solution

Code

#include <iostream>

using namespace std;

int main() {
    ios::sync_with_stdio(false), cin.tie(nullptr);
    int T;
    cin >> T;
    while (T--) {
        int n, x, a[100000];
        cin >> n >> x;
        int s = 0;
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < n; j++) cin >> a[j];
            for (int j = 0; j < n; j++) {
                if ((x | a[j]) != x) break;
                s |= a[j];
            }
        }
        if (s == x) cout << "YES\n";
        else cout << "NO\n";
    }
}

Alternative Code

#include <bits/stdc++.h>

using namespace std;

int main() {
    ios::sync_with_stdio(false), cin.tie(nullptr);
    int T;
    cin >> T;
    while (T--) {
        int n, x, ai;
        cin >> n >> x;
        vector<int> pre[3];
        for (int i = 0; i < 3; i++) {
            int s = 0;
            pre[i].push_back(s);
            for (int j = 0; j < n; j++) {
                cin >> ai;
                if ((s | ai) != s)
                    s |= ai, pre[i].push_back(s);
            }
        }
        bool ans = 0;
        for (int A : pre[0]) for (int B : pre[1]) for (int C : pre[2])
            ans |= (A | B | C) == x;
        cout << (ans ? "YES\n" : "NO\n");
    }
}

1842C - Tenzing and Balls

Tutorial

Code

#include <bits/stdc++.h>

using namespace std;

int main() {
    ios::sync_with_stdio(false), cin.tie(nullptr);
    int T;
    cin >> T;
    while (T--) {
        const int N = 200000 + 5;
        int n, a[N], dp[N], buc[N];
        cin >> n;
        dp[0] = 0;
        for (int i = 1; i <= n; i++) buc[i] = 0x3f3f3f3f;
        for (int i = 1; i <= n; i++) {
            cin >> a[i];
            dp[i] = min(dp[i - 1] + 1, buc[a[i]]);
            buc[a[i]] = min(buc[a[i]], dp[i - 1]);
        }
        cout << n - dp[n] << '\n';
    }
}

1842D - Tenzing and His Animal Friends

Tutorial

Code

#include <bits/stdc++.h>

using namespace std;

int n, m;
long long dis[100][100];

int main() {
    ios::sync_with_stdio(false), cin.tie(nullptr);
    cin >> n >> m;
    memset(dis, 0x3f, sizeof dis);
    while (m--) {
        int u, v, w;
        cin >> u >> v >> w, u--, v--;
        dis[u][v] = dis[v][u] = w;
    }
    for (int i = 0; i < n; i++) dis[i][i] = 0;
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                dis[i][j] = min(dis[i][j], dis[i][k] + dis[k][j]);
    if (dis[0][n - 1] > 1e18)
        cout << "inf", exit(0);
    int ord[100];
    iota(ord, ord + n, 0);
    sort(ord + 1, ord + n, [](int a, int b) {
        return dis[0][a] < dis[0][b];
    });
    string s(n, '0');
    vector<pair<string, int>> ans;
    for (int i = 0; i < n - 1; i++) {
        int u = ord[i], v = ord[i + 1];
        s[u] = '1';
        ans.emplace_back(s, dis[0][v] - dis[0][u]);
        if (v == n - 1) break;
    }
    cout << dis[0][n - 1] << ' ' << ans.size() << '\n';
    for (auto [s, t] : ans)
        cout << s << ' ' << t << '\n';
}

1842E - Tenzing and Triangle

Please read the approach in the tutorial first. A triangle $$$[L, R]$$$ covers a point $$$(x, y)$$$ iff $$$L\le x$$$ and $$$k-y\le R$$$, so the point $$$(x,y)$$$ can be regarded as the interval $$$[x,k-y]$$$. The problem then becomes the following: there are some intervals $$$[l,r]$$$, each with a cost $$$w$$$, and you may place some non-overlapping intervals $$$[L,R]$$$, each costing $$$A\cdot(R-L)$$$; for every $$$[l,r]$$$ that is not contained in any placed interval, you pay its cost $$$w$$$. Equivalently, you may pay $$$A$$$ to cover each unit segment $$$[i,i+1]$$$, and you pay $$$w$$$ for every interval $$$[l,r]$$$ that is not completely covered. This can be solved with a minimum cut:

Build a bipartite graph with $$$n$$$ left nodes, one for each point of the original problem, and $$$k$$$ right nodes, one for each unit segment $$$[i,i+1]$$$.
Connect the source to each left node with capacity equal to that point's cost $$$w$$$.
Connect each right node to the sink with capacity $$$A$$$.
Connect each left node, representing the interval $$$[l,r]$$$, to the right nodes $$$l,l+1,\dots,r-1$$$ with infinite capacity.

Cutting a source edge corresponds to paying $$$w$$$ for a point, cutting a sink edge corresponds to paying $$$A$$$ to cover a segment, and the infinite edges ensure that every point that is not paid for has all the segments of its interval covered.
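To sanity-check the reduction on tiny inputs, the quantity this minimum cut computes can also be found by brute force. The sketch below is our own illustration, not part of the original solution: it tries every set of covered unit segments $$$[i,i+1]$$$, pays $$$A$$$ per covered segment, and pays $$$w$$$ for each interval that is not completely covered. Seg and bruteCost are hypothetical names, points are assumed to be already mapped to intervals $$$[x,k-y]$$$, and __builtin_popcount is a GCC/Clang builtin.

```cpp
#include <cassert>
#include <vector>

// Hypothetical representation: a point mapped to the interval [l, r] with cost w.
struct Seg { int l, r; long long w; };

// Brute force over all subsets of the k unit segments [i, i+1], i = 0..k-1.
// Covering a segment costs A; an interval whose unit segments l..r-1 are not
// all covered costs its w. Exponential in k, so only for tiny inputs.
long long bruteCost(int k, long long A, const std::vector<Seg>& segs) {
    assert(0 <= k && k < 31);
    long long best = -1;
    for (unsigned mask = 0; mask < (1u << k); ++mask) {
        long long cost = A * __builtin_popcount(mask);
        for (const Seg& s : segs) {
            unsigned need = ((1u << s.r) - 1) & ~((1u << s.l) - 1);  // bits l..r-1
            if ((mask & need) != need) cost += s.w;
        }
        if (best < 0 || cost < best) best = cost;
    }
    return best;
}
```

For example, with $$$k=3$$$, $$$A=2$$$ and a single interval $$$[0,2]$$$ of cost $$$5$$$, covering both of its segments (cost $$$4$$$) is cheaper than paying $$$5$$$, so bruteCost returns $$$4$$$.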

By the max-flow min-cut theorem, it suffices to compute the maximum flow of this graph.

This is essentially a bipartite matching problem with capacities: each left node may be matched to any right node inside its interval, and every node has a limit on how many times it can be matched. A greedy algorithm works: sort all intervals in increasing order of right endpoint, then process them in turn, matching each one to positions inside it from left to right. Specifically, let $$$cnt_i$$$ be how many more times the $$$i$$$-th right node can be matched; initially $$$cnt_{0\dots k-1}=A$$$. For an interval $$$[l,r]$$$ that can be matched at most $$$w$$$ more times, repeatedly find the smallest $$$i$$$ in $$$[l,r-1]$$$ with $$$cnt_i\ne 0$$$ and match it $$$\min(cnt_i,w)$$$ times, decreasing both $$$cnt_i$$$ and $$$w$$$, until $$$w=0$$$ or no such $$$i$$$ remains.
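A direct, unoptimised sketch of this greedy is shown below. It scans positions one by one instead of using a DSU, so it runs in $$$O(nk)$$$ and is only meant for checking small cases; Interval and greedyFlow are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical input: interval [l, r] (a left node) with capacity w.
struct Interval { int l, r; long long w; };

// Greedy maximum flow for the network above, with a naive O(n*k) scan in
// place of the DSU. Right node i (unit segment [i, i+1]) has capacity A.
long long greedyFlow(int k, long long A, std::vector<Interval> iv) {
    // Process intervals by increasing right endpoint.
    std::sort(iv.begin(), iv.end(),
              [](const Interval& x, const Interval& y) { return x.r < y.r; });
    std::vector<long long> cnt(k, A);  // remaining capacity of each right node
    long long flow = 0;
    for (const Interval& it : iv) {
        assert(0 <= it.l && it.r <= k);
        long long w = it.w;
        // Fill positions l, l+1, ..., r-1 from left to right.
        for (int i = it.l; i < it.r && w > 0; ++i) {
            long long t = std::min(w, cnt[i]);
            cnt[i] -= t, w -= t, flow += t;
        }
    }
    return flow;
}
```

With $$$k=2$$$, $$$A=1$$$ and unit-cost intervals $$$[0,2]$$$ and $$$[0,1]$$$, this returns $$$2$$$; processing $$$[0,2]$$$ first would use up position $$$0$$$ and reach only $$$1$$$, which is why the intervals are sorted by right endpoint.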

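Use a Disjoint Set Union to find the smallest $$$i\ge l$$$ with $$$cnt_i\ne 0$$$: once $$$cnt_i$$$ drops to $$$0$$$, link $$$i$$$ to $$$i+1$$$. The intervals can be bucketed by right endpoint, so the time complexity is $$$O(n\alpha(n))$$$.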
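A minimal sketch of this "next position with $$$cnt_i\ne 0$$$" structure, assuming positions $$$0..n-1$$$ plus a sentinel $$$n$$$. It uses iterative path halving rather than the recursive path compression in the code below, and NextFree is a hypothetical name.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// find(i) returns the smallest j >= i that has not been erased;
// position n is a sentinel that is never erased.
struct NextFree {
    std::vector<int> pa;
    explicit NextFree(int n) : pa(n + 1) { std::iota(pa.begin(), pa.end(), 0); }
    int find(int x) {
        while (pa[x] != x) {
            pa[x] = pa[pa[x]];  // path halving
            x = pa[x];
        }
        return x;
    }
    // Call once cnt_i has dropped to 0: i now forwards to i + 1.
    void erase(int i) {
        assert(0 <= i && i < (int)pa.size() - 1);
        pa[i] = i + 1;
    }
};
```

In the greedy, after reducing $$$cnt_i$$$ to $$$0$$$ you call erase(i), and the next candidate for an interval starting at $$$l$$$ is find(l), as long as it is below $$$r$$$.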

The Method of Four Russians can also be used to achieve $$$O(n)$$$ time. Divide the sequence into blocks of $$$64$$$, search within a block using bit operations and __builtin_ctzll, and use a Disjoint Set Union to skip blocks in which all $$$cnt_i$$$ are $$$0$$$. This way, only $$$O(\frac n{\log n})$$$ union operations and $$$O(n)$$$ find operations are performed, and it can be proved that in this case the Disjoint Set Union takes $$$O(n)$$$ total time instead of $$$O(n\alpha(n))$$$.
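The within-block part of this idea can be sketched as follows: positions with $$$cnt_i\ne 0$$$ are kept as set bits in 64-bit words, and __builtin_ctzll (a GCC/Clang builtin) finds the lowest one at or after $$$l$$$. As a simplification, this sketch skips empty words with a plain linear scan instead of the DSU over blocks described above, so on its own it does not reach the $$$O(n)$$$ bound; BitBlocks is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Positions with cnt_i != 0 are stored as set bits, 64 per word.
struct BitBlocks {
    std::vector<std::uint64_t> w;
    explicit BitBlocks(int n) : w((n + 63) / 64, 0) {}
    void set(int i) {
        assert(i >= 0 && (i >> 6) < (int)w.size());
        w[i >> 6] |= std::uint64_t{1} << (i & 63);
    }
    void reset(int i) { w[i >> 6] &= ~(std::uint64_t{1} << (i & 63)); }
    // Smallest set position >= l, or -1 if there is none.
    int nextSet(int l) const {
        int b = l >> 6;
        if (b >= (int)w.size()) return -1;
        std::uint64_t s = w[b] & (~std::uint64_t{0} << (l & 63));  // drop bits below l
        while (true) {
            if (s) return b * 64 + __builtin_ctzll(s);
            if (++b == (int)w.size()) return -1;  // linear scan over words
            s = w[b];
        }
    }
};
```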

Code

#include <bits/stdc++.h>
 
using namespace std;
 
const int N = 2e5 + 8, M = 1 << 18;
 
int n, k, A, c[M << 1][2], dp[N];
vector<pair<int, int>> p[N];
 
void insert(int i, int v) {
    for (i += M; i; i >>= 1) c[i][0] = v, v = max(v, c[i ^ 1][0]);
}
void add(int i, int v) {
    for (i += M + 1; i > 1; i >>= 1) {
        int v1 = i & 1 ? v : 0;
        c[i ^ 1][0] += v1, c[i ^ 1][1] += v1;
        c[i >> 1][0] = max(c[i][0], c[i ^ 1][0]) + c[i >> 1][1];
    }
}
int main() {
    ios::sync_with_stdio(false), cin.tie(nullptr);
    cin >> n >> k >> A;
    int sum = 0;
    while (n--) {
        int x, y, w;
        cin >> x >> y >> w;
        p[k - y].emplace_back(x, w);
        sum += w;
    }
    for (int i = 1; i <= k; i++) {
        add(i - 1, -A);
        for (auto [x, w] : p[i]) add(x, w);
        dp[i] = max(dp[i - 1], c[1][0]);
        insert(i, dp[i]);
    }
    cout << sum - dp[k] << '\n';
}

Alternative Code O(n alpha n)

#include <bits/stdc++.h>

using namespace std;

const int N = 2e5 + 8;

int n, k, A, cnt[N], pa[N];
vector<pair<int, int>> v[N];

int find(int x) {
    return !pa[x] ? x : pa[x] = find(pa[x]);
}
int main() {
    ios::sync_with_stdio(false), cin.tie(nullptr);
    cin >> n >> k >> A;
    while (n--) {
        int x, y, w;
        cin >> x >> y >> w;
        v[k - y].emplace_back(x, w);
    }
    for (int i = 0; i < k; i++) cnt[i] = A;
    int ans = 0;
    for (int r = 1; r <= k; r++)
        for (auto [l, w] : v[r])
            while (l < r) {
                int tmp = min(w, cnt[l]);
                ans += tmp, cnt[l] -= tmp, w -= tmp; 
                if (!cnt[l]) l = pa[l] = find(l + 1);
                if (!w) break;
            }
    cout << ans << '\n';
}

Alternative Code O(n)

#include <bits/stdc++.h>

struct IO {
    static const int inSZ = 1 << 17;
    char inBuf[inSZ], *in1, *in2;
    inline __attribute((always_inline))
    int read() {
        if(__builtin_expect(in1 > inBuf + inSZ - 32, 0)) {
            auto len = in2 - in1;
            memcpy(inBuf, in1, len);
            in1 = inBuf, in2 = inBuf + len;
            in2 += fread(in2, 1, inSZ - len, stdin);
            if(in2 != inBuf + inSZ) *in2 = 0;
        }
        int res = 0;
        unsigned char c = *in1++;
        while(res = res * 10 + (c - 48), (c = *in1++) >= 48);
        return res;
    }
    IO() {
        in1 = inBuf;
        in2 = in1 + fread(in1, 1, inSZ, stdin);
    }
} IO;
inline int read() { return IO.read(); }

using namespace std;

const int N = 2e5 + 8, N2 = N / 64 + 8;

int n, k, A, pts[N][3], buc[N], LW[N][2];
int cnt[N], pa[N2], Rp[N];
uint64_t mask[N2];

int find(int x) {
    return pa[x] < 0 ? x : pa[x] = find(pa[x]);
}

int main() {
    n = read(), k = read(), A = read();
    for (int i = 0; i < n; i++) {
        pts[i][1] = read();
        pts[i][0] = k - read();
        pts[i][2] = read();
        buc[pts[i][0]]++;
    }
    for (int i = 1; i <= k; i++) buc[i + 1] += buc[i];
    for (int i = 0; i < n; i++) {
        int t = --buc[pts[i][0]];
        memcpy(LW[t], pts[i] + 1, 8);
    }
    for (int i = 0; i < k; i++) cnt[i] = A;
    memset(mask, 0xff, sizeof mask);
    memset(pa, -1, sizeof pa);
    iota(Rp, Rp + N2, 0);
    int ans = 0;
    for (int r = 1; r <= k; r++)
        for (int i = buc[r]; i < buc[r + 1]; i++) {
            int l = LW[i][0], w = LW[i][1];
            ans += w;
            int lb = Rp[find(l >> 6)], rb = r >> 6;
            auto S0 = mask[lb];
            if (lb == l >> 6) S0 &= ~0ULL << (l & 63);
            while (lb < rb) {
                auto S = S0;
                for (; S; S &= S - 1) {
                    int p = lb * 64 + __builtin_ctzll(S);
                    int tmp = min(w, cnt[p]);
                    cnt[p] -= tmp, w -= tmp;
                    if (!w) break;
                }
                mask[lb] ^= S0 ^ S;
                if (!w) break;
                int nxt = find(lb + 1);
                if (!mask[lb]) {
                    lb = find(lb);
                    if (pa[nxt] > pa[lb]) swap(nxt, lb), Rp[nxt] = Rp[lb];
                    pa[nxt] += pa[lb], pa[lb] = nxt;
                }
                lb = Rp[nxt], S0 = mask[lb];
            }
            if (w != 0 && lb == rb) {
                S0 &= (1ULL << (r & 63)) - 1;
                auto S = S0;
                for (; S; S &= S - 1) {
                    int p = lb * 64 + __builtin_ctzll(S);
                    int tmp = min(w, cnt[p]);
                    cnt[p] -= tmp, w -= tmp;
                    if (!w) break;
                }
                mask[lb] ^= S0 ^ S;
            }
            ans -= w;
        }
    cout << ans << '\n';
}

1842F - Tenzing and Tree

Tutorial

Code

#include <bits/stdc++.h>

using namespace std;

const int N = 5000 + 8;

int n, ans[N];
vector<int> G[N];

void bfs(int u) {
    static int q[N], dis[N];
    memset(dis, -1, sizeof dis);
    q[1] = u, dis[u] = 0;
    for (int l = 1, r = 2; l < r; l++) {
        u = q[l];
        for (int v : G[u]) if (dis[v] < 0)
            dis[v] = dis[u] + 1, q[r++] = v;
    }
    int sum = 0;
    for (int i = 1; i <= n; i++) {
        sum += dis[q[i]]; 
        ans[i] = max(ans[i], (n - 1) * i - sum * 2);
    }
}
int main() {
    ios::sync_with_stdio(false), cin.tie(nullptr);
    cin >> n;
    for (int i = 0; i < n - 1; i++) {
        int u, v;
        cin >> u >> v, u--, v--;
        G[u].push_back(v), G[v].push_back(u);
    }
    for (int i = 0; i < n; i++) bfs(i);
    for (int i = 0; i <= n; i++)
        cout << ans[i] << ' ';
}

1842G - Tenzing and Random Operations

Tutorial

Before starting to solve this problem, let's establish two basic properties:

For two completely independent random variables $$$x_1,x_2$$$, we have $$$E(x_1x_2) = E(x_1)E(x_2)$$$.
For $$$(a+b)\times (c+d)$$$, we have $$$E((a+b)\times (c+d)) = E(ac) + E(ad) + E(bc) + E(bd)$$$.

Returning to this problem, let $$$x_{i,j}$$$ be a random variable: its value is $$$v$$$ when the $$$i$$$-th operation sets $$$a_j$$$ to $$$a_j + v$$$, otherwise it is $$$0$$$.

Then note that the answer is the expected value of $$$\prod_{i=1}^{n}(a_i+\sum_{j=1}^{m}x_{j,i})$$$.

Expanding the product with the second property, the answer becomes a sum of expectations of terms, each of which is a product of some $$$a_i$$$ and some $$$x$$$.

Specifically, each term has $$$n$$$ factors, and for each $$$i\in [1,n]$$$, either $$$a_i$$$ is one of its factors, or some $$$x_{j,i}$$$ is one of its factors.

Let's investigate the expectation of a specific term. Note that if $$$i_1\lt i_2$$$, then $$$E(x_{j,i_1}\times x_{j,i_2}) = E(x_{j,i_1})\times v$$$, that is, if $$$x_{j,i_1}$$$ is $$$0$$$ then the whole product is $$$0$$$, and if $$$x_{j,i_1}$$$ is $$$v$$$ then $$$x_{j,i_2}$$$ must be $$$v$$$.

Therefore, we group the $$$x$$$ factors in a term by their first subscript: all $$$x_{j,\dots}$$$ go into category $$$j$$$ (the $$$j$$$-th operation). Within a category, only the variable with the smallest position matters: if it equals $$$v$$$, all the remaining variables in the category equal $$$v$$$ as well; otherwise the product is $$$0$$$. Variables in different categories are completely independent (their values are determined by different operations), so the expected product over two categories splits into the product of the expected products within each category.

We want the sum of the expectations of all terms, which combines nicely with a DP:

Let $$$dp(i,j)$$$ be the sum of the expectations of all partial terms in which the first $$$i$$$ factors have been chosen and exactly $$$j$$$ categories have appeared at least once. When choosing the factor for position $$$i+1$$$: appending $$$a_{i+1}$$$ multiplies by $$$a_{i+1}$$$; appending $$$x_{c,i+1}$$$ for one of the $$$j$$$ categories that have already appeared multiplies by $$$v$$$ (the first variable of that category has already been accounted for); appending $$$x_{c,i+1}$$$ for one of the other $$$m-j$$$ categories multiplies by its expectation $$$\frac{i+1}{n}\times v$$$ and increases $$$j$$$ by one. So $$$dp(i,j)\cdot(a_{i+1}+jv)$$$ goes to $$$dp(i+1,j)$$$ and $$$dp(i,j)\cdot(m-j)\cdot\frac{i+1}{n}v$$$ goes to $$$dp(i+1,j+1)$$$, each transition in $$$O(1)$$$, and the answer is $$$\sum_j dp(n,j)$$$. Since $$$j\le\min(i,m)$$$, the time complexity is $$$O(n\times \min(n,m))$$$.
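For small inputs the DP can be checked against exhaustive enumeration. The sketch below is our own illustration with hypothetical names: it runs the same recurrence in double precision instead of modulo $$$10^9+7$$$, and compares it with a brute force over all $$$n^m$$$ equally likely outcomes, assuming (as in the code below) that each operation picks an index uniformly from $$$1..n$$$ and adds $$$v$$$ to that element and every later one.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Same recurrence as the tutorial, in doubles. a is 1-indexed: a[1..n].
double expectedProductDP(int n, int m, double v, const std::vector<double>& a) {
    std::vector<double> dp(n + 2, 0.0);  // dp[j]: j operations seen so far
    dp[0] = 1;
    for (int i = 1; i <= n; ++i) {
        std::vector<double> nd(n + 2, 0.0);
        double pNew = double(i) / n * v;  // E[x_{c,i}] for an operation c not seen yet
        for (int j = 0; j < i && j <= m; ++j) {
            nd[j] += dp[j] * (a[i] + j * v);                 // a_i, or x_{c,i} of a seen c
            if (j < m) nd[j + 1] += dp[j] * (m - j) * pNew;  // first x of a new operation
        }
        dp = nd;
    }
    double s = 0;
    for (double x : dp) s += x;
    return s;
}

// Brute force: enumerate all n^m choices of the operations' indices.
double expectedProductBrute(int n, int m, double v, const std::vector<double>& a) {
    std::vector<int> pick(m, 1);
    double total = 0;
    long long outcomes = 0;
    while (true) {
        std::vector<double> b(a);
        for (int c = 0; c < m; ++c)
            for (int j = pick[c]; j <= n; ++j) b[j] += v;  // suffix add
        double prod = 1;
        for (int j = 1; j <= n; ++j) prod *= b[j];
        total += prod, ++outcomes;
        int c = 0;  // advance the "odometer" of choices
        while (c < m && pick[c] == n) pick[c++] = 1;
        if (c == m) break;
        ++pick[c];
    }
    return total / outcomes;
}

bool approxEqual(double x, double y) {
    return std::fabs(x - y) <= 1e-9 * (1 + std::fabs(y));
}
```

For instance, with $$$n=m=2$$$, $$$v=1$$$ and $$$a=(1,1)$$$, the four outcomes give products $$$9,6,6,3$$$, and both functions return $$$6$$$.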

Code

#include <bits/stdc++.h>

using namespace std;

const int N = 5000 + 8, P = 1e9 + 7;

int n, m, v, a[N], coef[N], dp[N][N];

long long Pow(long long a, int n) {
    long long r = 1;
    while (n) {
        if (n & 1) r = r * a % P;
        a = a * a % P, n >>= 1;
    }
    return r;
}

int main() {
    ios::sync_with_stdio(false), cin.tie(nullptr);
    cin >> n >> m >> v;
    for (int i = 1; i <= n; i++) cin >> a[i];
    dp[0][0] = 1;
    for (int i = 1; i <= n; i++) {
        auto coef = i * Pow(n, P - 2) % P * v % P;
        for (int j = 0; j < i; j++) {
            dp[i][j + 1] = dp[i - 1][j] * coef % P * (m - j) % P;
            dp[i][j] = (dp[i][j] + dp[i - 1][j] * (a[i] + 1LL * j * v % P)) % P;
        }
    }
    int ans = 0;
    for (int i = 0; i <= n; i++)
        (ans += dp[n][i]) %= P;
    cout << ans << '\n';
}

1842H - Tenzing and Random Real Numbers

Hint 1

Hint 2

Tutorial

Assume that no variable equals $$$0.5$$$, since this happens with probability $$$0$$$. Call the vertices whose variables are less than $$$0.5$$$ white and those greater than $$$0.5$$$ black. Every black-and-white coloring is equally likely, so we can compute, for each coloring, the probability that all conditions are satisfied, and then take the average.

If both variables are less than $$$0.5$$$, a $$$\le 1$$$ condition always holds and a $$$\ge 1$$$ condition never holds; if both are greater than $$$0.5$$$, it is the other way round. So a condition between two same-colored vertices is either satisfied automatically or makes the coloring impossible, and it never constrains the actual values. For a white vertex $$$u$$$ and a black vertex $$$v$$$, the condition $$$x_u+x_v\le 1$$$ holds iff $$$x_u\le 1-x_v$$$. Let $$$y_u=\min(x_u,1-x_u)=\begin{cases}x_u&(u\text{ is white})\\1-x_u&(u\text{ is black})\end{cases}$$$; then $$$y_u$$$ can be regarded as a uniform random variable on $$$[0,0.5)$$$. For a $$$\le 1$$$ condition, the white vertex's $$$y$$$ must be at most the black vertex's $$$y$$$, so we add an edge from the white vertex to the black vertex; for a $$$\ge 1$$$ condition, we add an edge from the black vertex to the white vertex.

This gives a directed graph that restricts the relative order of $$$y_{1\cdots n}$$$. Suppose that sorting $$$y_{1\cdots n}$$$ in increasing order gives $$$y_{p_1},y_{p_2},\cdots,y_{p_n}$$$. Every permutation $$$p$$$ is equally likely, and all conditions are satisfied if and only if $$$p$$$ is a topological order of the graph, so the probability that all conditions are satisfied is the number of topological orders divided by $$$n!$$$.

The problem is now a counting problem: over all colorings, sum the number of topological orders. Instead of enumerating colorings, we enumerate the topological order directly, i.e. a permutation $$$p$$$ with $$$y_{p_1}<y_{p_2}<\cdots<y_{p_n}$$$, and count the colorings compatible with it. For a $$$\le 1$$$ condition, the vertex that comes earlier in $$$p$$$ must be less than $$$0.5$$$ (white); for a $$$\ge 1$$$ condition, the vertex that comes earlier in $$$p$$$ must be greater than $$$0.5$$$ (black).

Then we can use a bitmask DP. Let $$$dp_{mask}$$$ be the number of ways to place exactly the vertices in $$$mask$$$ at the front of the order, together with their colors. For a transition, we enumerate the newly added vertex $$$u$$$: if every vertex sharing a $$$\le 1$$$ condition with $$$u$$$ is already in $$$mask$$$, $$$u$$$ can be colored black; if every vertex sharing a $$$\ge 1$$$ condition with $$$u$$$ is already in $$$mask$$$, $$$u$$$ can be colored white. The probability is $$$dp_{full}/(n!\cdot 2^n)$$$, where $$$full$$$ contains all vertices.

The time complexity is $$$O(2^nn)$$$.
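To check the counting argument on small cases, the sketch below computes the same total (the sum over colorings of the number of topological orders) in two ways: by brute force over all $$$2^n$$$ colorings, and by the bitmask DP just described. The encoding is hypothetical: Cond{le, u, v} with le == true meaning $$$x_u+x_v\le 1$$$ and le == false meaning $$$x_u+x_v\ge 1$$$. Dividing either total by $$$n!\cdot 2^n$$$ gives the probability.

```cpp
#include <cassert>
#include <vector>

// Hypothetical encoding: le == true means x_u + x_v <= 1, otherwise x_u + x_v >= 1.
struct Cond { bool le; int u, v; };

// Number of topological orders; pred[v] is the bitmask of vertices that must precede v.
long long topoCount(int n, const std::vector<unsigned>& pred) {
    std::vector<long long> ways(1u << n, 0);
    ways[0] = 1;
    for (unsigned S = 0; S < (1u << n); ++S)
        for (int v = 0; v < n; ++v)
            if (!(S >> v & 1) && (pred[v] & S) == pred[v])
                ways[S | 1u << v] += ways[S];
    return ways[(1u << n) - 1];
}

// Brute force: for each coloring (bit v set => x_v > 0.5, i.e. black),
// build the graph on y and add its number of topological orders.
long long sumOverColorings(int n, const std::vector<Cond>& cs) {
    long long total = 0;
    for (unsigned black = 0; black < (1u << n); ++black) {
        std::vector<unsigned> pred(n, 0);
        bool ok = true;
        for (const Cond& c : cs) {
            bool bu = black >> c.u & 1, bv = black >> c.v & 1;
            if (bu == bv) {                  // same color: always true or never true
                if (c.le == bu) ok = false;  // <= with two blacks, >= with two whites
                continue;
            }
            int wh = bu ? c.v : c.u, bl = bu ? c.u : c.v;  // white / black endpoint
            if (c.le) pred[bl] |= 1u << wh;  // y_white <= y_black
            else      pred[wh] |= 1u << bl;  // y_black <= y_white
        }
        if (ok) total += topoCount(n, pred);
    }
    return total;
}

// The tutorial's bitmask DP, counting (order, coloring) pairs directly.
long long sumByMaskDP(int n, const std::vector<Cond>& cs) {
    std::vector<unsigned> leMask(n, 0), geMask(n, 0);
    for (const Cond& c : cs) {
        std::vector<unsigned>& g = c.le ? leMask : geMask;
        g[c.u] |= 1u << c.v;
        g[c.v] |= 1u << c.u;
    }
    std::vector<long long> f(1u << n, 0);
    f[0] = 1;
    for (unsigned S = 0; S < (1u << n); ++S)
        for (int i = 0; i < n; ++i)
            if (!(S >> i & 1)) {
                unsigned T = S | 1u << i;
                if ((leMask[i] & S) == leMask[i]) f[T] += f[S];  // i can be black
                if ((geMask[i] & S) == geMask[i]) f[T] += f[S];  // i can be white
            }
    return f[(1u << n) - 1];
}
```

For a single condition $$$x_0+x_1\le 1$$$ on two variables, both functions return $$$4$$$, and $$$4/(2!\cdot 2^2)=\frac12$$$ as expected.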

Code

#include <iostream>
 
const int P = 998244353;
 
long long Pow(long long a, int n) {
    long long r = 1;
    while (n) {
        if (n & 1) r = r * a % P;
        a = a * a % P, n >>= 1;
    }
    return r;
}
inline void inc(int& a, int b) {
    if((a += b) >= P) a -= P;
}
 
int n, m, G[20][2], f[1 << 20];
 
int main() {
    std::cin >> n >> m;
    while (m--) {
        int t, i, j;
        std::cin >> t >> i >> j;
        i--, j--;
        G[i][t] |= 1 << j;
        G[j][t] |= 1 << i;
    }
    f[0] = 1;
    for (int S = 0; S < 1 << n; S++)
        for (int i = 0; i < n; i++) if (~S >> i & 1) {
            if ((G[i][0] | S) == S) inc(f[S | 1 << i], f[S]);
            if ((G[i][1] | S) == S) inc(f[S | 1 << i], f[S]);
        }
    long long t = 1;
    for (int i = 1; i <= n; i++) t = t * i * 2 % P;
    std::cout << f[(1 << n) - 1] * Pow(t, P - 2) % P << '\n';
}

1842I - Tenzing and Necklace

Hint 1

Hint 2

Tutorial

Add a constraint: "you must cut off $$$m$$$ edges".

Consider enumerating the smallest cut edge, in increasing order.

Suppose the smallest cut edge is fixed to $$$a_1$$$, and the optimal choice of the remaining edges is $$$a_2, a_3, ..., a_m$$$.

Suppose instead that the smallest cut edge is fixed to $$$b_1$$$, and the optimal choice of the remaining edges is $$$b_2, b_3, ..., b_m$$$.

Assume that $$$a_i<a_{i+1}, b_i<b_{i+1}, b_1>a_1$$$.

1. It is possible to only adjust $$$b_2,b_3, ..., b_m$$$, so that $$$\forall_{1\leq i\leq m} b_i\geq a_i$$$, and the total cost after adjustment remains unchanged.

The adjustment method is as follows:

Find the smallest $$$i$$$ such that $$$b_i<a_i$$$, and the first $$$j>i$$$ such that $$$b_j\geq a_j$$$; if no such $$$j$$$ exists, let $$$j=m+1$$$.

It can be observed that $$$(b_i,b_{i+1},b_{i+2},...,b_{j-1})$$$ can be replaced with $$$(a_i,a_{i+1},a_{i+2},...,a_{j-1})$$$, which is still a valid solution. Moreover, the solution $$$(a_i,a_{i+1},a_{i+2},...,a_{j-1})$$$ can also be replaced with $$$(b_i,b_{i+1},b_{i+2},...,b_{j-1})$$$, because $$$b_{i-1}\geq a_{i-1}$$$ and $$$b_j\geq a_j$$$.

Since $$$a$$$ and $$$b$$$ are both optimal solutions with fixed $$$a_1$$$ and $$$b_1$$$, $$$w_{b_i}+w_{b_{i+1}}+...+w_{b_{j-1}}=w_{a_i}+w_{a_{i+1}}+...+w_{a_{j-1}}$$$. Therefore, replacing $$$(b_i,b_{i+1},b_{i+2},...,b_{j-1})$$$ with $$$(a_i,a_{i+1},a_{i+2},...,a_{j-1})$$$ does not increase the total cost.

Repeat the above adjustment until there is no $$$b_i<a_i$$$.

Similarly, it can be proven that only adjusting $$$a_2,a_3,...,a_m$$$ is feasible, so that $$$\forall_{1\leq i\leq m} b_i\geq a_i$$$, and the total cost after adjustment remains unchanged.

2. If $$$\forall_{1\leq i\leq m} b_i\geq a_i$$$ is already satisfied, it is possible to only adjust $$$b_2,b_3, ..., b_m$$$, so that $$$\forall_{1\leq i<m}a_i\leq b_i\leq a_{i+1}$$$, and the total cost after adjustment remains unchanged. Assume that $$$a_1<b_1\leq a_2$$$.

The adjustment method is as follows:

Find the smallest $$$i$$$ such that $$$b_i> a_{i+1}$$$, and the first $$$j>i$$$ such that $$$b_j\leq a_{j+1}$$$ (let $$$a_{m+1}=+\infty$$$).

$$$(b_i,b_{i+1},b_{i+2},...,b_{j-1})$$$ can be replaced with $$$(a_{i+1},a_{i+2},a_{i+3},...,a_{j})$$$, which is still a valid solution. Moreover, the solution $$$(a_{i+1},a_{i+2},a_{i+3},...,a_{j})$$$ can also be replaced with $$$(b_i,b_{i+1},b_{i+2},...,b_{j-1})$$$, because $$$b_{i-1}\leq a_{i}$$$ and $$$b_j\leq a_{j+1}$$$.

Since $$$a$$$ and $$$b$$$ are both optimal solutions with fixed $$$a_1$$$ and $$$b_1$$$, $$$w_{b_i}+w_{b_{i+1}}+...+w_{b_{j-1}}=w_{a_{i+1}}+w_{a_{i+2}}+...+w_{a_j}$$$. Therefore, replacing $$$(b_i,b_{i+1},b_{i+2},...,b_{j-1})$$$ with $$$(a_{i+1},a_{i+2},a_{i+3},...,a_{j})$$$ does not increase the total cost.

Similarly, it can be proven that only adjusting $$$a_2,a_3,...,a_m$$$ is feasible, so that $$$\forall_{1\leq i<m}a_i\leq b_i\leq a_{i+1}$$$, and the total cost after adjustment remains unchanged.

3. If $$$b_1> a_2$$$, it is possible to adjust $$$b_1,b_2,...,b_m$$$, so that $$$b_1\leq a_2$$$, and the total cost does not increase.

The adjustment method is as follows:

Find the smallest $$$j$$$ such that $$$b_j\leq a_{j+1}$$$ (let $$$a_{m+1}=+\infty$$$).

It can be observed that $$$(a_{2},a_{3},a_{4},...,a_{j})$$$ can be replaced with $$$(b_1,b_{2},b_{3},...,b_{j-1})$$$, which is still a valid solution. Moreover, the solution $$$(b_1,b_{2},b_{3},...,b_{j-1})$$$ can also be replaced with $$$(a_{2},a_{3},a_{4},...,a_{j})$$$, because $$$b_j\leq a_{j+1}$$$.

Since $$$a$$$ is an optimal solution with $$$a_1$$$ fixed, $$$w_{b_1}+w_{b_{2}}+...+w_{b_{j-1}}\geq w_{a_{2}}+w_{a_{3}}+...+w_{a_j}$$$. Therefore, replacing $$$(b_1,b_{2},b_{3},...,b_{j-1})$$$ with $$$(a_{2},a_{3},a_{4},...,a_{j})$$$ does not increase the total cost.

Combining the above conclusions, we can obtain a solution that must cut off $$$m$$$ edges:

Fix $$$a_1=1$$$ and find the optimal solution $$$a_1,a_2,a_3,...,a_m$$$.

Then, it can be assumed that all $$$b_i$$$ satisfy $$$a_i\leq b_i\leq a_{i+1}$$$.

A divide-and-conquer algorithm can be used. Let $$$solve((l_1,r_1),(l_2,r_2),(l_3,r_3),...,(l_m,r_m))$$$ compute the optimal solutions for all choices with $$$l_i\leq b_i\leq r_i$$$.

If $$$l_1>r_1$$$, we are done. Otherwise, let $$$x=\lfloor\frac{l_1+r_1}{2}\rfloor$$$; a DP computes the cost and the solution for $$$b_1=x$$$ in $$$O(\sum (r_i-l_i+1))$$$ time. Then recursively compute $$$solve((l_1,b_1-1),(l_2,b_2),(l_3,b_3),...,(l_m,b_m))$$$ and $$$solve((b_1+1,r_1),(b_2,r_2),(b_3,r_3),...,(b_m,r_m))$$$.

Time complexity analysis: $$$\sum (r_i-l_i+1)=\sum (r_i-l_i)+m$$$. If the total length of two adjacent parts is at most $$$k$$$, they can be merged by removing the cut between them, so we may assume this never happens in an optimal solution; therefore $$$m\leq 2\lceil\frac{n}{k}\rceil$$$. Assuming that the length of the first segment is $$$r_1-l_1+1=O(k)$$$, the time complexity is $$$O(n\log k+mk)=O(n\log k)$$$.

Finally, we need the solutions for all possible $$$m$$$ and take the $$$\min$$$ as the final answer. Suppose that, with the first cut edge fixed, the optimal solution cuts $$$m'$$$ edges; similarly to the previous proof, any other solution can be adjusted to satisfy $$$|m-m'|\leq 1$$$ without increasing the total cost.
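Before implementing the divide-and-conquer, it can help to have a slow reference to test against. The sketch below is our own and not the method above; it assumes the setup implied by the code below: edge $$$i$$$ joins beads $$$i$$$ and $$$i+1$$$ (cyclically) with cost $$$a_i$$$, and every remaining piece must contain at most $$$k$$$ beads. It compares an exhaustive search with a plain $$$O(nk^2)$$$ DP that fixes one cut among the first $$$k$$$ edges. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Exhaustive search over all subsets of edges (n <= 20).
long long necklaceBrute(const std::vector<long long>& a, int k) {
    int n = (int)a.size();
    assert(n < 31);
    long long best = LLONG_MAX;
    for (unsigned mask = 0; mask < (1u << n); ++mask) {
        std::vector<int> cuts;
        long long cost = 0;
        for (int e = 0; e < n; ++e)
            if (mask >> e & 1) cuts.push_back(e), cost += a[e];
        bool ok;
        if (cuts.empty()) ok = n <= k;
        else {
            ok = cuts[0] + n - cuts.back() <= k;  // the piece that wraps around
            for (size_t j = 1; j < cuts.size(); ++j)
                ok = ok && cuts[j] - cuts[j - 1] <= k;
        }
        if (ok) best = std::min(best, cost);
    }
    return best;
}

// When n > k, some edge among 0..k-1 must be cut (otherwise beads 0..k stay
// together), so try each of them as a fixed cut and run a DP around the cycle.
long long necklaceFixedFirstDP(const std::vector<long long>& a, int k) {
    int n = (int)a.size();
    if (n <= k) return 0;
    long long best = LLONG_MAX;
    for (int f = 0; f < k; ++f) {
        // dp[t]: minimum cost with cuts at edge f and at edge (f + t) % n
        std::vector<long long> dp(n, LLONG_MAX);
        dp[0] = a[f];
        for (int t = 1; t < n; ++t)
            for (int s = std::max(0, t - k); s < t; ++s)
                dp[t] = std::min(dp[t], dp[s] + a[(f + t) % n]);
        for (int s = n - k; s < n; ++s)  // the last piece wraps back to edge f
            best = std::min(best, dp[s]);
    }
    return best;
}
```

For example, with costs $$$(1,2,3,4)$$$ and $$$k=2$$$, the cheapest valid choice cuts edges $$$0$$$ and $$$2$$$, and both functions return $$$4$$$.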

Code

#include <bits/stdc++.h>

using namespace std;

const int N = 5e5 + 8;

int n, K, a[N], pre[N * 2];
long long dp[N * 2], ans;

vector<int> trim(vector<int> a, int L, int R) {
    return vector(a.begin() + L, a.end() - R);
}
vector<int> init() {
    static int q[N];
    q[1] = 0;
    for (int i = 1, l = 1, r = 1; i <= n; i++) {
        if (q[l] < i - K) l++;
        dp[i] = dp[q[l]] + a[i], pre[i] = q[l];
        while (l <= r && dp[i] < dp[q[r]]) r--;
        q[++r] = i;
    }
    ans = dp[n];
    vector<int> res;
    for (int i = n; i; i = pre[i]) res.push_back(i);
    res.push_back(0), reverse(res.begin(), res.end());
    return res;
}
vector<int> solve(int first, vector<int> L, vector<int> R) {
    dp[first] = a[first];
    int l = first, r = first;
    long long val; int p;
    auto checkMin = [&](int i) {
        if (dp[i] < val) val = dp[i], p = i;
    };
    for (int i = 0; i < L.size(); i++) {
        val = 1e18, p = 0;
        for (int j = R[i]; j >= L[i]; j--) {
            for (; r >= max(l, j - K); r--) checkMin(r + i);
            dp[j + i + 1] = val + a[j];
            pre[j + i + 1] = p;
        }
        l = L[i], r = R[i];
    }
    val = 1e18, p = 0;
    for (int i = max(L.back(), n - K + first); i <= R.back(); i++)
        checkMin(i + L.size());
    ans = min(ans, val);
    vector<int> res;
    for (int i = L.size(); i; i--) res.push_back(p - i), p = pre[p];
    reverse(res.begin(), res.end());
    return res;
}
void divide(int l, int r, vector<int> L, vector<int> R) {
    if (l > r) return;
    int mid = (l + r) >> 1;
    auto M = solve(mid, L, R);
    divide(l, mid - 1, L, M), divide(mid + 1, r, M, R);
}
void divide(vector<int> p) {
    p.push_back(n), divide(1, p[0], trim(p, 0, 1), trim(p, 1, 0));
}
int main() {
    ios::sync_with_stdio(false), cin.tie(nullptr);
    int T;
    cin >> T;
    while (T--) {
        cin >> n >> K;
        for (int i = 1; i <= n; i++) cin >> a[i];
        a[0] = a[n];
        auto p = init();
        divide(trim(p, 1, 1));
        divide(solve(0, trim(p, 0, 1), trim(p, 1, 0)));
        if ((p.size() - 2) * K >= n)
            divide(solve(0, trim(p, 1, 2), trim(p, 2, 1)));
        cout << ans << '\n';
    }
}

Tutorial of CodeTON Round 5 (Div. 1 + Div. 2, Rated, Prizes!)

CodeTON Round 5 (Div. 1 + Div. 2, Rated, Prizes!)

By platelet, 20 months ago, In English

Note the unusual start time of the round.

Hello, Codeforces!

Now that Gaokao is over, we are very glad to invite you to participate in CodeTON Round 5 (Div. 1 + Div. 2, Rated, Prizes!), which will start at Jun/24/2023 17:05 (Moscow time). You will be given 9 problems and 3 hours to solve them. The round will be rated for everyone.

All problems are written and prepared by Gary2005, Asuka, Crying, sjcsjcsjc, MonkeyKing, DerekFeng, KbltQaQ, ShmilyTY and me.

Statements and editorials will be available in Chinese (Simplified) after the contest.

We would like to give our sincere thanks to:

errorgorn for his wonderful coordination!
Alexdat2000 for translating problem statements.
gyh20, wangziji, MagicalFlower, SeeleVollerei, qiuzx, Gellyfish, ntherner, nvmdava, bkbtpout, AlperenT, ffao, AquaMoon, mir, zengminghao, constructive, Arraiter, FengzhuJian, He_Ren, LZDQ, CrTsIr, Pineapplello, feecle6418, 1.618, Alexdat2000, tzc_wk, WAtoAC2001 and tibinyte2006 for testing this round and providing valuable feedback.
MikeMirzayanov for the great codeforces and polygon platform.
Lastly, we would like to express our gratitude to you for participating in the round.

The main character of the problems will be Tenzing Tsondu.

We hope that everyone can enjoy the round! As this round is sponsored, everyone will have an opportunity to win some prizes!

Good luck!

UPD1: Here is the score distribution:

250 — 500 — 1000 — 1500 — 2000 — 2500 — 3000 — 3750 — 5000

UPD2: Tutorial is available.

UPD3: Congratulations to the winners.

UPD4: Congratulations to the first solver of each problem.

A: alexwice
B: ksun48
C: PinkieRabbit
D: Um_nik
E: Qingyu
F: amenotiomoi
G: tourist
H: rain_boy
I: maroonrk (after contest)

UPD5: Chinese statements

UPD6: Chinese editorials

Some information from our title sponsor:

Hello, Codeforces!

We, the Ton Foundation team, are pleased to support CodeTON Round 5.

The Open Network (TON) is a fully decentralized layer-1 blockchain designed to onboard billions of users to Web3.

Since July 2022, we have been supporting Codeforces as a title sponsor. This round is another way for us to contribute to the development of the community.

The winners of CodeTON Round 5 will receive valuable prizes.

The first 1,023 participants will receive prizes in TON cryptocurrency:

1st place: 1,024 TON
2–3 places: 512 TON each
4–7 places: 256 TON each
8–15 places: 128 TON each
…
512–1,023 places: 2 TON each

We wish you good luck at CodeTON Round 5 and hope you enjoy the contest!

Announcement of CodeTON Round 5 (Div. 1 + Div. 2, Rated, Prizes!)

Why does "ios::sync_with_stdio(false)" cause a "fread" error?

By platelet, 2 years ago, In English

Wrong answer submission and Accepted submission

You can compare them: the only difference is that the Wrong answer submission has an extra cin.tie(0)->sync_with_stdio(0);

I didn't use cin or cout at all, only fread and fwrite.

Can someone tell me whether there is something wrong with my usage, or whether this is a compiler bug?

A modulo multiplication method that is 2x faster than compiler implementation

By platelet, 2 years ago, In English

Given $$$k,m$$$ and $$$n$$$ numbers $$$a_i$$$, compute each $$$a_i\times k\bmod m$$$.

I came up with a method that is almost twice as fast as the compiler's implementation (when $$$m$$$ is a constant), which can effectively speed up NTT and some DPs.

First of all

$$$ a_i\times k\bmod m=\{a_i\times \frac km\}\times m\tag1 $$$

where $$$\{x\}=x-\lfloor x\rfloor$$$ is the fractional part function. Equation $$$(1)$$$ holds because $$$a_i\times k\bmod m=a_ik-m\lfloor\frac{a_ik}m\rfloor=m\left(\frac{a_ik}m-\lfloor\frac{a_ik}m\rfloor\right)$$$, so $$$a_i\times k\bmod m$$$ depends only on the fractional part of $$$\frac{a_i\times k}m$$$.
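A tiny worked instance of equation $$$(1)$$$, with $$$a_i=3$$$, $$$k=5$$$ and $$$m=7$$$ chosen only for illustration:

$$$ 3\times 5\bmod 7=1,\qquad \left\{3\times\frac57\right\}\times 7=\left\{\frac{15}7\right\}\times 7=\frac17\times 7=1 $$$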

Let $$$\frac km\approx\frac p{2^{64}}$$$, where $$$p$$$ is either $$$\lfloor\frac km\times2^{64}\rfloor$$$ or $$$\lceil\frac km\times2^{64}\rceil$$$ (we decide which one later). Then

$$$ \begin{aligned} a_i\times k\bmod m&\approx\{a_i\times \frac p{2^{64}}\}\times m\\ &=\frac{a_ip\bmod 2^{64}}{2^{64}}\times m\\ &=\frac{a_ip\bmod 2^{64}\times m}{2^{64}} \end{aligned} \tag2 $$$

There are two places where we need to choose whether to round up or down:

$$$p=\lfloor\frac km\times2^{64}\rfloor$$$ or $$$p=\lceil\frac km\times2^{64}\rceil$$$.
The final formula in (2) isn't always an integer, so we need to decide whether to round it up or down.

We choose $$$p=\lceil\frac km\times2^{64}\rceil$$$, which slightly enlarges the result, and round the final formula in (2) down, which slightly reduces it. We'll prove that this pair of choices gives the correct answer as long as $$$a_i\le\frac{2^{64}}m$$$ (with $$$k<m$$$, so that $$$p$$$ fits in 64 bits).
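One way to see why this works (a sketch of our own; the author's proof may be organised differently): write $$$p=\frac km 2^{64}+e$$$ with $$$0\le e<1$$$, $$$q=\lfloor\frac{a_ik}m\rfloor$$$ and $$$r=a_ik\bmod m$$$. Since $$$e<1$$$ and $$$a_i\le\frac{2^{64}}m$$$, we have $$$a_ie<\frac{2^{64}}m$$$, and $$$r\le m-1$$$, so

$$$ \begin{aligned} a_ip&=q\cdot 2^{64}+\frac{r\cdot 2^{64}}m+a_ie,\\ 0\le\frac{r\cdot 2^{64}}m+a_ie&<\frac{(m-1)\cdot 2^{64}}m+\frac{2^{64}}m=2^{64},\\ a_ip\bmod 2^{64}&=\frac{r\cdot 2^{64}}m+a_ie,\\ \frac{(a_ip\bmod 2^{64})\times m}{2^{64}}&=r+\frac{a_iem}{2^{64}},\qquad 0\le\frac{a_iem}{2^{64}}<1. \end{aligned} $$$

Hence rounding the final formula in $$$(2)$$$ down gives exactly $$$r=a_i\times k\bmod m$$$.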

Proof

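const int P = 998244353;

// Replaces each a[i] (0 <= a[i]) with a[i] * k mod P.
// Requires 0 <= k < P so that p = ceil(k * 2^64 / P) fits in 64 bits.
void calc(int n, int k, int a[]) {
    unsigned long long p = (((unsigned __int128)k << 64) + P - 1) / P;
    for(int i = 0; i < n; i++)
        a[i] = (unsigned)a[i] * p * (unsigned __int128)P >> 64;
}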

A few notes.

The code uses unsigned __int128 (a GCC/Clang extension), so it only works on 64-bit targets.
(unsigned)a[i] * p is automatically taken modulo $$$2^{64}$$$, because unsigned 64-bit arithmetic wraps around.
Multiplying by (unsigned __int128)P and shifting right by 64 is a 64-bit multiplication that keeps only the high 64 bits (the rdx register on x86-64), so it is as fast as multiplying two unsigned long longs.
a[i] is converted to unsigned first because going from int to unsigned and then to unsigned long long is faster than converting int directly to unsigned long long, which requires sign extension.

Speed test.

code

#include <bits/stdc++.h>

using namespace std;

const int N = 5e4, P = 998244353;

int a[N];

void ThroughputTest() {
    int checkSum1 = 0, checkSum2 = 0, checkSum3 = 0;

    auto start = chrono::steady_clock::now();
    for(int i = 0; i < N; i += 2)
        for(int j = 0; j < N; j++) {
            checkSum1 ^= (int64_t)a[i] * a[j] % P;
            checkSum1 ^= (int64_t)a[i + 1] * a[j] % P;
        }
    auto end = std::chrono::steady_clock::now();
    cout << "Compiler's signed modulo:   " << (end - start).count() * 1e-6 << " ms" << endl;

    start = chrono::steady_clock::now();
    for(int i = 0; i < N; i += 2)
        for(int j = 0; j < N; j++) {
            checkSum2 ^= (uint64_t)(uint32_t)a[i] * (uint32_t)a[j] % P;
            checkSum2 ^= (uint64_t)(uint32_t)a[i + 1] * (uint32_t)a[j] % P;
        }
    end = std::chrono::steady_clock::now();
    cout << "Compiler's unsigned modulo: " << (end - start).count() * 1e-6 << " ms" << endl;

    start = chrono::steady_clock::now();
    for(int i = 0; i < N; i += 2) {
        uint64_t x = (((__uint128_t)a[i] << 64) + P - 1) / P;
        uint64_t y = (((__uint128_t)a[i + 1] << 64) + P - 1) / P;
        for(int j = 0; j < N; j++) {
            checkSum3 ^= (uint32_t)a[j] * x * (__uint128_t)P >> 64;
            checkSum3 ^= (uint32_t)a[j] * y * (__uint128_t)P >> 64;
        }
    }
    end = std::chrono::steady_clock::now();
    cout << "My modulo:                  " << (end - start).count() * 1e-6 << " ms" << endl;

    assert(checkSum1 == checkSum2 && checkSum2 == checkSum3);
}
void LatencyTest() {
    int checkSum1 = 0, checkSum2 = 0, checkSum3 = 0;

    auto start = chrono::steady_clock::now();
    for(int i = 0; i < N; i += 2)
        for(int j = 0; j < N / 2; j++) {
            checkSum1 = (int64_t)a[i] * (a[j] ^ checkSum1) % P;
            checkSum1 = (int64_t)a[i + 1] * (a[j] ^ checkSum1) % P;
        }
    auto end = std::chrono::steady_clock::now();
    cout << "Compiler's signed modulo:   " << (end - start).count() * 1e-6 << " ms" << endl;

    start = chrono::steady_clock::now();
    for(int i = 0; i < N; i += 2)
        for(int j = 0; j < N / 2; j++) {
            checkSum2 = (uint64_t)(uint32_t)a[i] * (uint32_t)(a[j] ^ checkSum2) % P;
            checkSum2 = (uint64_t)(uint32_t)a[i + 1] * (uint32_t)(a[j] ^ checkSum2) % P;
        }
    end = std::chrono::steady_clock::now();
    cout << "Compiler's unsigned modulo: " << (end - start).count() * 1e-6 << " ms" << endl;

    start = chrono::steady_clock::now();
    for(int i = 0; i < N; i += 2) {
        uint64_t x = (((__uint128_t)a[i] << 64) + P - 1) / P;
        uint64_t y = (((__uint128_t)a[i + 1] << 64) + P - 1) / P;
        for(int j = 0; j < N / 2; j++) {
            checkSum3 = (uint32_t)(a[j] ^ checkSum3) * x * (__uint128_t)P >> 64;
            checkSum3 = (uint32_t)(a[j] ^ checkSum3) * y * (__uint128_t)P >> 64;
        }
    }
    end = std::chrono::steady_clock::now();
    cout << "My modulo:                  " << (end - start).count() * 1e-6 << " ms" << endl;

    assert(checkSum1 == checkSum2 && checkSum2 == checkSum3);
}
int main() {
    mt19937 gen;
    for(int i = 0; i < N; i++) a[i] = gen() % P;
    cout << "Throughput test (50000 * 50000):" << endl;
    ThroughputTest();
    cout << endl;
    cout << "Latency test (50000 * 25000):" << endl;
    LatencyTest();
}

It contains two parts.

The first part measures reciprocal throughput: the multiplications are independent, so the CPU can execute many of them in parallel (modern CPUs exploit instruction-level parallelism within a single core). It performs $$$50000\times50000$$$ modular multiplications in total.
The second part measures latency: each modular multiplication depends on the result of the previous one, so they run sequentially without parallelism. It performs $$$50000\times25000$$$ modular multiplications in total.

Possible output:

Throughput test(50000 * 50000):
Compiler's signed modulo:   1954.83 ms
Compiler's unsigned modulo: 1746.73 ms
My modulo:                  1160.47 ms

Latency test(50000 * 25000):
Compiler's signed modulo:   4329.33 ms
Compiler's unsigned modulo: 3945.29 ms
My modulo:                  2397.97 ms

By the way, a few general facts.

Modular multiplication by a constant modulus is roughly 4 times faster when the operations can run in parallel than when they run serially (the same holds for my method).
Converting int to unsigned and then to long long is faster than converting directly to long long, but it gives wrong results for negative numbers.
Taking an unsigned long long modulo a constant is faster than taking a long long modulo a constant.
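A tiny illustration of the second fact above; the function names are hypothetical, and the code only shows why the int to unsigned to long long route is wrong for negative inputs, not how fast it is.

```cpp
#include <climits>

// Direct int -> long long conversion: sign extension, so negative values survive.
long long widen_direct(int x) { return (long long)x; }

// int -> unsigned -> long long: zero extension. Correct only when x >= 0,
// because a negative x first wraps to a large unsigned value (-1 becomes UINT_MAX).
long long widen_via_unsigned(int x) { return (long long)(unsigned)x; }
```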

Comparison with other methods

Comparison with Barrett reduction and Montgomery multiplication:

My method computes $$$a\times b\bmod m$$$ for fixed $$$m$$$ and fixed $$$b$$$, whereas Barrett reduction and Montgomery multiplication compute $$$a\times b\bmod m$$$ with only $$$m$$$ fixed. However, my method is faster than both of them.
The derivation of my method is similar to that of Barrett reduction, so both work for $$$m<2^{32}$$$, while Montgomery multiplication works for any odd $$$m<2^{64}$$$.
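For comparison, here is a minimal sketch of textbook Barrett reduction for a fixed 32-bit modulus (an illustration, not code from this post; the struct name `Barrett` is hypothetical). Only $$$m$$$ is precomputed, so the quotient estimate can be one too small and a final conditional subtraction is needed; the method above precomputes $$$p$$$ from both $$$k$$$ and $$$m$$$ and has no such correction step.

```cpp
#include <cassert>
#include <cstdint>

// Textbook Barrett reduction for a fixed modulus 2 <= m < 2^32.
struct Barrett {
    uint64_t m, mu;  // mu = floor(2^64 / m)
    explicit Barrett(uint32_t mod) : m(mod) {
        assert(mod >= 2);  // mod = 1 would make mu = 2^64, which does not fit
        mu = (uint64_t)(((unsigned __int128)1 << 64) / mod);
    }
    // Returns a * b mod m for 0 <= a, b < m.
    uint32_t mul(uint32_t a, uint32_t b) const {
        uint64_t x = (uint64_t)a * b;                              // x < 2^64
        uint64_t q = (uint64_t)((unsigned __int128)x * mu >> 64);  // floor(x/m) or one less
        uint64_t r = x - q * m;                                    // 0 <= r < 2m
        return (uint32_t)(r >= m ? r - m : r);                     // one correction step
    }
};
```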

Extensions

It is also possible to compute $$$(a_1b_1+a_2b_2+\cdots+a_nb_n)\bmod m$$$ for fixed $$$b_i$$$ in the same way, provided that $$$\sum a_i$$$ does not exceed $$$\frac{2^{64}}m$$$.

Let $$$p_i=\lceil\frac{b_i}m\times2^{64}\rceil$$$.

$$$ (\sum a_ib_i)\bmod m=\lfloor\frac{(\sum a_ip_i)\bmod 2^{64}\times m}{2^{64}}\rfloor $$$



platelet
2 years ago
6

speed test code

By platelet, history, 2 years ago, In English

Speed test code for the modular multiplication post above (identical to the benchmark code included in that post).


platelet
2 years ago
1

proof

By platelet, history, 2 years ago, In English

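Proof of the correctness claim from the modular multiplication post above.

Theorem: Let $$$p=\lceil\frac km\times2^{64}\rceil$$$. If $$$a_i\le \frac{2^{64}}m$$$, then rounding $$$\{a_i\times \frac p{2^{64}}\}\times m$$$ down gives exactly $$$a_i\times k\bmod m$$$.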

Proof: Let $$$\frac p{2^{64}}=\frac km+\epsilon$$$, where $$$\epsilon\in[0,\frac1{2^{64}})$$$.

$$$ \begin{aligned} \lfloor\{a_i\times \frac p{2^{64}}\}\times m\rfloor&=\lfloor\{a_i\times (\frac km+\epsilon)\}\times m\rfloor\\ &=\lfloor\{a_i\times \frac km+a_i\epsilon\}\times m\rfloor \end{aligned} $$$

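Here, if $$$\lfloor a_i\times\frac km+a_i\epsilon\rfloor>\lfloor a_i\times \frac km\rfloor$$$, the fractional part wraps around and the result is certainly wrong. Since $$$a_ik$$$ is an integer, $$$a_i\times \frac km$$$ is a multiple of $$$\frac1m$$$, so it is at least $$$\frac1m$$$ away from $$$\lfloor a_i\times \frac km\rfloor+1$$$. Hence this cannot happen as long as $$$a_i\epsilon<\frac1m$$$. Assuming that, we continue the derivation: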

$$$ \begin{aligned} &=\lfloor\{a_i\times \frac km+a_i\epsilon\}\times m\rfloor\\ &=\big\lfloor\left(\{a_i\times \frac km\}+a_i\epsilon\right)m\big\rfloor\\ &=\lfloor\{a_i\times \frac km\}\times m+a_im\epsilon\rfloor \end{aligned} $$$

Since $$$\{a_i\times\frac km\}\times m$$$ is an integer, the result is exact as long as $$$a_im\epsilon<1$$$.

The conditions $$$a_i\epsilon<\frac1m$$$ and $$$a_im\epsilon<1$$$ are equivalent, and since $$$\epsilon<\frac1{2^{64}}$$$, both hold whenever $$$a_im\le2^{64}$$$, that is, whenever $$$a_i\le\frac{2^{64}}m$$$.
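The proof only uses $$$\epsilon<\frac1{2^{64}}$$$ and $$$a_im\le2^{64}$$$, so it goes through unchanged when $$$2^{64}$$$ is replaced by any $$$2^W$$$. A minimal sketch (the function name `check_theorem_scaled` is hypothetical) checks this scaled-down version exhaustively for small $$$W$$$ and $$$m$$$, which is feasible where the 64-bit case is not; it is a sanity check, not a substitute for the proof.

```cpp
#include <cassert>
#include <cstdint>

// Scaled-down theorem: 2^64 is replaced by 2^W.
// Checks every m in [1, maxM], k in [0, m), a in [0, floor(2^W / m)].
// Returns true if rounding down always gives exactly a * k mod m.
bool check_theorem_scaled(int W, uint64_t maxM) {
    assert(W >= 1 && W <= 20 && maxM >= 1 && maxM <= (1ull << 20));  // no overflow
    const uint64_t two_w = 1ull << W, mask = two_w - 1;
    for (uint64_t m = 1; m <= maxM; m++)
        for (uint64_t k = 0; k < m; k++) {
            uint64_t p = ((k << W) + m - 1) / m;  // ceil(k * 2^W / m)
            for (uint64_t a = 0; a <= two_w / m; a++) {
                uint64_t frac = (a * p) & mask;    // a * p mod 2^W
                uint64_t got = (frac * m) >> W;    // round down
                if (got != a * k % m) return false;
            }
        }
    return true;
}
```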


platelet
2 years ago
3