Right after the end of Facebook Hacker Cup 2021 Round 1, wjomlex posted the following comment in the discussion section under the announcement blog:
A common issue was people writing O(N^2) solutions to problem A when O(N) solutions are necessary. You won't be able to run an O(N^2) solution in 6 minutes when N = 800,000.
But is this really true? Challenge accepted!
Problem A1
First of all, here is an $$$O(N)$$$ solution for problem A1:
#include <iostream>
#include <vector>
#include <stdint.h>
using namespace std;
/*
* This is a solution for problem A1. Time complexity: O(N)
* Characters in the input buffer are remapped: 'F' => 0, 'X' => 1, 'O' => 2
*/
static int32_t solve_a1(const int8_t *arr, int n)
{
int32_t ans = 0, hand = 0;
for (int i = 0; i < n; i++) {
int32_t ch = arr[i];
hand |= ch;
if (hand == 3) {
ans++;
hand = ch;
}
}
return ans;
}
int main()
{
ios_base::sync_with_stdio(false); cin.tie(NULL);
int t;
cin >> t;
for (int casenum = 1; casenum <= t; casenum++) {
int n;
string s;
cin >> n >> s;
vector<int8_t> buf(n);
for (int i = 0; i < n; i++) {
if (s[i] == 'X')
buf[i] = 1;
else if (s[i] == 'O')
buf[i] = 2;
}
int32_t ans = solve_a1(&buf[0], n);
cout << "Case #" << casenum << ": " << ans << "\n";
}
}
A bit of preprocessing is done to the input data for remapping letters to numbers 0, 1 and 2. This allows to use bitwise OR operation for updating the current state of hands: 0 — undecided, 1 — left hand, 2 — right hand. If the hands state variable becomes equal to 3, then it's a hands change situation and the answer is increased by 1. The reason for implementing it this way is that the elementary operation in the innermost loop needs to be very fast. It's going to be reused in A2 solutions too.
Problem A2, solution v0
Reusing code from the problem A1 solution allows to easily create an $$$O(N^3)$$$ solution for problem A2 by just iterating over all the possible substrings, calling function "solve_a1" for each of them and accumulating the result. But we need $$$O(N^2)$$$ to satisfy the conditions of the challenge and this can be achieved by applying a minor adjustment, transforming "solve_a1" into "solve_a2l":
#include <iostream>
#include <vector>
#include <string>
#include <stdint.h>
using namespace std;
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx,avx2,sse,sse2,sse3,sse4,popcnt,fma")
/*
* The data type used for the array. Can be int8_t, int16_t or int32_t. Smaller
* data types reduce memory usage and cache misses, but may introduce a bit of
* conversion overhead.
*/
#ifndef ETYPE
#define ETYPE int32_t
#endif
const int32_t mod = 1000000007;
/*
* This is a solution for modified problem A2: account only substrings, which are
* aligned at the left side of the original string. Example: valid left aligned
* substrings of "OXXF" are "O", "OX", "OXX", and "OXXF". Time complexity: O(N)
*
* Note: characters in the input buffer are remapped: 'F' => 0, 'X' => 1, 'O' => 2
*/
static int32_t solve_a2l(const ETYPE *arr, int n)
{
int32_t partial_ans = 0, ans = 0, hand = 0;
for (int i = 0; i < n; i++) {
int32_t ch = arr[i];
hand |= ch;
if (hand == 3) {
partial_ans++;
hand = ch;
}
ans += partial_ans;
if (ans >= mod)
ans -= mod;
}
return ans;
}
/*
* This is a solution for problem A2. Time complexity: O(N^2)
*
* Note: characters in the input buffer are remapped: 'F' => 0, 'X' => 1, 'O' => 2
*/
static int32_t solve_a2(const ETYPE *arr, int n)
{
int32_t ans = 0;
for (int i = 0; i < n; i++) {
ans += solve_a2l(arr + i, n - i);
if (ans >= mod)
ans -= mod;
}
return ans;
}
int main()
{
ios_base::sync_with_stdio(false); cin.tie(NULL);
int t;
cin >> t;
for (int casenum = 1; casenum <= t; casenum++) {
int n;
string s;
cin >> n >> s;
vector<ETYPE> buf(n);
for (int i = 0; i < n; i++) {
if (s[i] == 'X')
buf[i] = 1;
else if (s[i] == 'O')
buf[i] = 2;
}
int32_t ans = solve_a2(&buf[0], n);
cout << "Case #" << casenum << ": " << ans << "\n";
}
}
Unsurprisingly, this solution is way too slow and needs 28 minutes to process the practice mode data even on my desktop computer with a quad-core Intel Core i7-860 processor @2.8GHz:
# g++-11.2.0 -O3 -march=native -funroll-loops solution0.cpp
# time ./a.out < weak_typing_chapter_2_input.txt > output.txt
real 28m13.998s
user 28m12.630s
sys 0m1.280s
Problem A2, solution v1 (OpenMP pragma)
A good thing is that the performance can be very easily improved by just taking advantage of more CPU cores:
#include <iostream>
#include <vector>
#include <string>
#include <stdint.h>
using namespace std;
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx,avx2,sse,sse2,sse3,sse4,popcnt,fma")
/*
* The data type used for the array. Can be int8_t, int16_t or int32_t. Smaller
* data types reduce memory usage and cache misses, but may introduce a bit of
* conversion overhead.
*/
#ifndef ETYPE
#define ETYPE int32_t
#endif
const int32_t mod = 1000000007;
/*
* This is a solution for modified problem A2: account only substrings, which are
* aligned at the left side of the original string. Example: valid left aligned
* substrings of "OXXF" are "O", "OX", "OXX", and "OXXF". Time complexity: O(N)
*
* Note: characters in the input buffer are remapped: 'F' => 0, 'X' => 1, 'O' => 2
*/
static int32_t solve_a2l(const ETYPE *arr, int n)
{
int32_t partial_ans = 0, ans = 0, hand = 0;
for (int i = 0; i < n; i++) {
int32_t ch = arr[i];
hand |= ch;
if (hand == 3) {
partial_ans++;
hand = ch;
}
ans += partial_ans;
if (ans >= mod)
ans -= mod;
}
return ans;
}
/*
* This is a solution for problem A2. Time complexity: O(N^2)
*
* Note: characters in the input buffer are remapped: 'F' => 0, 'X' => 1, 'O' => 2
*/
static int32_t solve_a2(const ETYPE *arr, int n)
{
int64_t ans = 0;
#pragma omp parallel for reduction(+:ans) schedule(guided)
for (int i = n - 1; i >= 0; i--)
ans += solve_a2l(arr + i, n - i);
return ans % mod;
}
int main()
{
ios_base::sync_with_stdio(false); cin.tie(NULL);
int t;
cin >> t;
for (int casenum = 1; casenum <= t; casenum++) {
int n;
string s;
cin >> n >> s;
vector<ETYPE> buf(n);
for (int i = 0; i < n; i++) {
if (s[i] == 'X')
buf[i] = 1;
else if (s[i] == 'O')
buf[i] = 2;
}
int32_t ans = solve_a2(&buf[0], n);
cout << "Case #" << casenum << ": " << ans << "\n";
}
}
Just a single line #pragma omp parallel for
before a single loop helps a lot. The reduction(+:ans)
part is also needed to tell OpenMP how to combine results from each loop iteration (without it we would need to store these results in a temporary buffer and then add them together in an extra $$$O(N)$$$ loop). The accumulator variable is upgraded to int64_t to avoid overflows. Additionally, the schedule(guided)
part of the pragma and the reversed order of processing is there to keep all CPU cores busy and improve performance. The compiler command line needs to include -fopenmp
option.
Below are benchmarks on my desktop computer, old laptop and Samsung Galaxy Tab S6 Lite tablet:
# g++-11.2.0 -O3 -march=native -funroll-loops -fopenmp solution1.cpp
# time ./a.out < weak_typing_chapter_2_input.txt > output.txt
real 6m31.028s
user 52m6.475s
sys 0m0.020s
# g++-11.2.0 -O3 -march=core2 -funroll-loops -fopenmp solution1.cpp
# time ./a.out < weak_typing_chapter_2_input.txt > output.txt
real 25m39.187s
user 51m15.532s
sys 0m0.156s
# aarch64-unknown-linux-gnu-g++-11.2.0 -static -O3 -mtune=cortex-a73.cortex-a53 -funroll-loops -fopenmp solution1.cpp
# adb push a.out /data/local/tmp/
# time adb shell /data/local/tmp/a.out < weak_typing_chapter_2_input.txt > output.txt
real 8m34.429s
user 0m0.000s
sys 0m0.005s
Solution | Device | Processor | Time |
---|---|---|---|
solution v0 | desktop computer | quad-core Intel Core i7-860 @2.8GHz | 28m13.998s |
solution v1 (OpenMP) | desktop computer | quad-core Intel Core i7-860 @2.8GHz | 6m31.028s |
old laptop | dual-core Intel Core2 Duo T7200 @2.0GHz | 25m39.187s | |
android tablet | octa-core 4x ARM Cortex-A73 @2.3GHz + 4x ARM Cortex-A53 @1.7GHz | 8m34.429s |
Running it all on a homemade "cluster"
I have multiple devices at home and none of them is fast enough by itself. But what if they all work together? It isn't very difficult to make a script, which dissects the input data into separate testcases, runs them on different devices and combines their output together:
require 'open3'
# This script splits a big test into individual testcases. And then runs
# them on multiple computers in parallel.
# Each entry in the 'workers' array describes a command line, which can be
# executed to process a single testcase. Testcases can be run on Android
# phones/tablets via 'adb' or on other Linux computers via 'ssh'.
workers = [
"./a.out", # Desktop computer
"ssh ssvb@t61 /tmp/a.out", # Laptop
"adb shell /data/local/tmp/a.out", # Android tablet
]
# If true, then print debugging information and statistics to stderr
VERBOSE = true
#########################################################################
jobs_todo = {}
jobs_in_progress = {}
jobs_done = {}
# read all testcases from standard input
testcases = gets.to_i.times.map {|i| gets ; [i + 1, gets.strip] }
# sort testcases by lengths of strings, longest first
testcases.sort! {|x, y| y[1].length <=> x[1].length }
# initialize the 'jobs_todo' hash (key: testcase number, value: string)
testcases.each {|x| jobs_todo[x[0]] = x[1] }
workers_contribution = {}
workers.each {|w| workers_contribution[w] = {:cases => 0, :work => 0} }
mutex = Mutex.new
# Spawn a thread for each worker
threads = workers.map do |cmdline|
Thread.new do
while true
id, data = nil, nil
mutex.synchronize do
queue = jobs_todo
queue = jobs_in_progress if queue.empty?
if queue.empty?
# Nothing left to do? Then print results and exit.
if VERBOSE
total_work = workers_contribution.to_a.map {|x| x[1][:work] }.sum
workers_contribution.each do |k, x|
x[:work_percentage] = x[:work].to_f / total_work * 100
end
sorted_wc = workers_contribution.to_a.sort {|x, y| y[1][:work] <=> x[1][:work]}
STDERR.printf("\n== Workers contribution: ==\n")
sorted_wc.each do |worker, x|
STDERR.printf(" '%s' : %.2f%% (%d cases)\n",
worker, x[:work_percentage], x[:cases])
end
end
jobs_done.to_a.sort {|x, y| x[0] <=> y[0] }.each do |x|
printf("Case #%d: %s\n", x[0], x[1])
end
exit
end
id, data = queue.shift
jobs_in_progress[id] = data
end
STDERR.printf("'%s' working on case #%d\n", cmdline, id) if VERBOSE
ans, status = Open3.capture2(cmdline, :stdin_data=>format("1\n%d\n%s\n", data.size, data))
unless status.success? && ans =~ /^Case\s+\#1\:\s*(\d+)/
STDERR.printf("Error: something failed when running '%s'\n", cmdline)
break
end
ans = $1.to_i
STDERR.printf("'%s' finished case #%d (ans = %d)\n", cmdline, id, ans) if VERBOSE
mutex.synchronize do
if jobs_in_progress.delete(id)
# Account only useful work (the other worker hasn't completed it first)
workers_contribution[cmdline][:cases] += 1
workers_contribution[cmdline][:work] += data.size * data.size
end
jobs_done[id] = ans
end
end
end
end
# Wait until threads completion
threads.each {|thread| thread.join }
# time ruby a2_dispatcher.rb < weak_typing_chapter_2_input.txt > output.txt
'ssh ssvb@t61 /tmp/a.out' working on case #17
'./a.out' working on case #19
'adb shell /data/local/tmp/a.out' working on case #18
'adb shell /data/local/tmp/a.out' finished case #18 (ans = 148472118)
'adb shell /data/local/tmp/a.out' working on case #16
'adb shell /data/local/tmp/a.out' finished case #16 (ans = 834058048)
'adb shell /data/local/tmp/a.out' working on case #37
'adb shell /data/local/tmp/a.out' finished case #37 (ans = 56032428)
'adb shell /data/local/tmp/a.out' working on case #84
'adb shell /data/local/tmp/a.out' finished case #84 (ans = 58222271)
'adb shell /data/local/tmp/a.out' working on case #52
'adb shell /data/local/tmp/a.out' finished case #52 (ans = 50657009)
'adb shell /data/local/tmp/a.out' working on case #101
'adb shell /data/local/tmp/a.out' finished case #101 (ans = 42545057)
'adb shell /data/local/tmp/a.out' working on case #91
'adb shell /data/local/tmp/a.out' finished case #91 (ans = 56100830)
'adb shell /data/local/tmp/a.out' working on case #112
'adb shell /data/local/tmp/a.out' finished case #112 (ans = 59108176)
'adb shell /data/local/tmp/a.out' working on case #65
'adb shell /data/local/tmp/a.out' finished case #65 (ans = 27529065)
'adb shell /data/local/tmp/a.out' working on case #79
'adb shell /data/local/tmp/a.out' finished case #79 (ans = 41898492)
'adb shell /data/local/tmp/a.out' working on case #67
'adb shell /data/local/tmp/a.out' finished case #67 (ans = 12521059)
'adb shell /data/local/tmp/a.out' working on case #29
'adb shell /data/local/tmp/a.out' finished case #29 (ans = 64753838)
'adb shell /data/local/tmp/a.out' working on case #85
'adb shell /data/local/tmp/a.out' finished case #85 (ans = 7280734)
'adb shell /data/local/tmp/a.out' working on case #50
'adb shell /data/local/tmp/a.out' finished case #50 (ans = 52510928)
'adb shell /data/local/tmp/a.out' working on case #68
'adb shell /data/local/tmp/a.out' finished case #68 (ans = 36776368)
'adb shell /data/local/tmp/a.out' working on case #113
'adb shell /data/local/tmp/a.out' finished case #113 (ans = 31104515)
'adb shell /data/local/tmp/a.out' working on case #96
'adb shell /data/local/tmp/a.out' finished case #96 (ans = 54221446)
'adb shell /data/local/tmp/a.out' working on case #57
'adb shell /data/local/tmp/a.out' finished case #57 (ans = 24358475)
'adb shell /data/local/tmp/a.out' working on case #60
'adb shell /data/local/tmp/a.out' finished case #60 (ans = 19820558)
'adb shell /data/local/tmp/a.out' working on case #41
'adb shell /data/local/tmp/a.out' finished case #41 (ans = 29639004)
'adb shell /data/local/tmp/a.out' working on case #47
'adb shell /data/local/tmp/a.out' finished case #47 (ans = 44802909)
'adb shell /data/local/tmp/a.out' working on case #102
'adb shell /data/local/tmp/a.out' finished case #102 (ans = 15457955)
'adb shell /data/local/tmp/a.out' working on case #51
'adb shell /data/local/tmp/a.out' finished case #51 (ans = 17078071)
'adb shell /data/local/tmp/a.out' working on case #20
'adb shell /data/local/tmp/a.out' finished case #20 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #38
'adb shell /data/local/tmp/a.out' finished case #38 (ans = 17236422)
'adb shell /data/local/tmp/a.out' working on case #82
'adb shell /data/local/tmp/a.out' finished case #82 (ans = 20074945)
'adb shell /data/local/tmp/a.out' working on case #74
'adb shell /data/local/tmp/a.out' finished case #74 (ans = 39100184)
'adb shell /data/local/tmp/a.out' working on case #53
'adb shell /data/local/tmp/a.out' finished case #53 (ans = 33315064)
'adb shell /data/local/tmp/a.out' working on case #31
'adb shell /data/local/tmp/a.out' finished case #31 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #105
'adb shell /data/local/tmp/a.out' finished case #105 (ans = 24581824)
'adb shell /data/local/tmp/a.out' working on case #64
'adb shell /data/local/tmp/a.out' finished case #64 (ans = 6801215)
'adb shell /data/local/tmp/a.out' working on case #71
'adb shell /data/local/tmp/a.out' finished case #71 (ans = 25461827)
'adb shell /data/local/tmp/a.out' working on case #58
'adb shell /data/local/tmp/a.out' finished case #58 (ans = 12450431)
'adb shell /data/local/tmp/a.out' working on case #69
'adb shell /data/local/tmp/a.out' finished case #69 (ans = 18137497)
'adb shell /data/local/tmp/a.out' working on case #43
'adb shell /data/local/tmp/a.out' finished case #43 (ans = 14613562)
'adb shell /data/local/tmp/a.out' working on case #70
'adb shell /data/local/tmp/a.out' finished case #70 (ans = 16770582)
'adb shell /data/local/tmp/a.out' working on case #59
'adb shell /data/local/tmp/a.out' finished case #59 (ans = 10269141)
'adb shell /data/local/tmp/a.out' working on case #95
'adb shell /data/local/tmp/a.out' finished case #95 (ans = 18002252)
'adb shell /data/local/tmp/a.out' working on case #26
'adb shell /data/local/tmp/a.out' finished case #26 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #90
'adb shell /data/local/tmp/a.out' finished case #90 (ans = 2042265)
'adb shell /data/local/tmp/a.out' working on case #23
'adb shell /data/local/tmp/a.out' finished case #23 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #109
'adb shell /data/local/tmp/a.out' finished case #109 (ans = 3227509)
'adb shell /data/local/tmp/a.out' working on case #110
'adb shell /data/local/tmp/a.out' finished case #110 (ans = 17228709)
'adb shell /data/local/tmp/a.out' working on case #72
'adb shell /data/local/tmp/a.out' finished case #72 (ans = 10254945)
'adb shell /data/local/tmp/a.out' working on case #61
'adb shell /data/local/tmp/a.out' finished case #61 (ans = 3391455)
'adb shell /data/local/tmp/a.out' working on case #77
'adb shell /data/local/tmp/a.out' finished case #77 (ans = 6142681)
'adb shell /data/local/tmp/a.out' working on case #116
'adb shell /data/local/tmp/a.out' finished case #116 (ans = 9898960)
'adb shell /data/local/tmp/a.out' working on case #21
'adb shell /data/local/tmp/a.out' finished case #21 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #78
'adb shell /data/local/tmp/a.out' finished case #78 (ans = 10133048)
'adb shell /data/local/tmp/a.out' working on case #75
'adb shell /data/local/tmp/a.out' finished case #75 (ans = 8831706)
'adb shell /data/local/tmp/a.out' working on case #80
'adb shell /data/local/tmp/a.out' finished case #80 (ans = 11639385)
'adb shell /data/local/tmp/a.out' working on case #103
'adb shell /data/local/tmp/a.out' finished case #103 (ans = 10890701)
'adb shell /data/local/tmp/a.out' working on case #32
'adb shell /data/local/tmp/a.out' finished case #32 (ans = 5747095)
'adb shell /data/local/tmp/a.out' working on case #88
'adb shell /data/local/tmp/a.out' finished case #88 (ans = 6963374)
'adb shell /data/local/tmp/a.out' working on case #22
'adb shell /data/local/tmp/a.out' finished case #22 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #98
'adb shell /data/local/tmp/a.out' finished case #98 (ans = 8240529)
'adb shell /data/local/tmp/a.out' working on case #100
'adb shell /data/local/tmp/a.out' finished case #100 (ans = 8808845)
'adb shell /data/local/tmp/a.out' working on case #89
'adb shell /data/local/tmp/a.out' finished case #89 (ans = 1542116)
'adb shell /data/local/tmp/a.out' working on case #63
'adb shell /data/local/tmp/a.out' finished case #63 (ans = 4589254)
'adb shell /data/local/tmp/a.out' working on case #104
'adb shell /data/local/tmp/a.out' finished case #104 (ans = 2110139)
'adb shell /data/local/tmp/a.out' working on case #81
'adb shell /data/local/tmp/a.out' finished case #81 (ans = 4156428)
'adb shell /data/local/tmp/a.out' working on case #48
'adb shell /data/local/tmp/a.out' finished case #48 (ans = 960469)
'adb shell /data/local/tmp/a.out' working on case #24
'adb shell /data/local/tmp/a.out' finished case #24 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #62
'adb shell /data/local/tmp/a.out' finished case #62 (ans = 3086380)
'adb shell /data/local/tmp/a.out' working on case #42
'adb shell /data/local/tmp/a.out' finished case #42 (ans = 330330)
'adb shell /data/local/tmp/a.out' working on case #27
'adb shell /data/local/tmp/a.out' finished case #27 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #25
'adb shell /data/local/tmp/a.out' finished case #25 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #97
'adb shell /data/local/tmp/a.out' finished case #97 (ans = 2311248)
'adb shell /data/local/tmp/a.out' working on case #92
'adb shell /data/local/tmp/a.out' finished case #92 (ans = 3409902)
'adb shell /data/local/tmp/a.out' working on case #87
'adb shell /data/local/tmp/a.out' finished case #87 (ans = 3018037)
'adb shell /data/local/tmp/a.out' working on case #44
'adb shell /data/local/tmp/a.out' finished case #44 (ans = 437384)
'adb shell /data/local/tmp/a.out' working on case #106
'adb shell /data/local/tmp/a.out' finished case #106 (ans = 1645320)
'adb shell /data/local/tmp/a.out' working on case #34
'adb shell /data/local/tmp/a.out' finished case #34 (ans = 1615237)
'adb shell /data/local/tmp/a.out' working on case #111
'adb shell /data/local/tmp/a.out' finished case #111 (ans = 2286510)
'adb shell /data/local/tmp/a.out' working on case #56
'adb shell /data/local/tmp/a.out' finished case #56 (ans = 671796)
'adb shell /data/local/tmp/a.out' working on case #49
'adb shell /data/local/tmp/a.out' finished case #49 (ans = 2486902)
'adb shell /data/local/tmp/a.out' working on case #33
'adb shell /data/local/tmp/a.out' finished case #33 (ans = 1626785)
'adb shell /data/local/tmp/a.out' working on case #73
'adb shell /data/local/tmp/a.out' finished case #73 (ans = 944403)
'adb shell /data/local/tmp/a.out' working on case #46
'adb shell /data/local/tmp/a.out' finished case #46 (ans = 1871972)
'adb shell /data/local/tmp/a.out' working on case #76
'adb shell /data/local/tmp/a.out' finished case #76 (ans = 1036408)
'adb shell /data/local/tmp/a.out' working on case #54
'adb shell /data/local/tmp/a.out' finished case #54 (ans = 1223607)
'adb shell /data/local/tmp/a.out' working on case #55
'adb shell /data/local/tmp/a.out' finished case #55 (ans = 1109920)
'adb shell /data/local/tmp/a.out' working on case #30
'adb shell /data/local/tmp/a.out' finished case #30 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #114
'adb shell /data/local/tmp/a.out' finished case #114 (ans = 697327)
'adb shell /data/local/tmp/a.out' working on case #66
'adb shell /data/local/tmp/a.out' finished case #66 (ans = 907592)
'adb shell /data/local/tmp/a.out' working on case #94
'adb shell /data/local/tmp/a.out' finished case #94 (ans = 778303)
'adb shell /data/local/tmp/a.out' working on case #108
'adb shell /data/local/tmp/a.out' finished case #108 (ans = 316631)
'adb shell /data/local/tmp/a.out' working on case #35
'adb shell /data/local/tmp/a.out' finished case #35 (ans = 207097)
'adb shell /data/local/tmp/a.out' working on case #45
'adb shell /data/local/tmp/a.out' finished case #45 (ans = 729879)
'adb shell /data/local/tmp/a.out' working on case #40
'adb shell /data/local/tmp/a.out' finished case #40 (ans = 220988)
'adb shell /data/local/tmp/a.out' working on case #39
'adb shell /data/local/tmp/a.out' finished case #39 (ans = 154206)
'adb shell /data/local/tmp/a.out' working on case #86
'adb shell /data/local/tmp/a.out' finished case #86 (ans = 230182)
'adb shell /data/local/tmp/a.out' working on case #107
'adb shell /data/local/tmp/a.out' finished case #107 (ans = 132748)
'adb shell /data/local/tmp/a.out' working on case #99
'adb shell /data/local/tmp/a.out' finished case #99 (ans = 274610)
'adb shell /data/local/tmp/a.out' working on case #36
'adb shell /data/local/tmp/a.out' finished case #36 (ans = 107906)
'adb shell /data/local/tmp/a.out' working on case #83
'adb shell /data/local/tmp/a.out' finished case #83 (ans = 32833)
'adb shell /data/local/tmp/a.out' working on case #115
'adb shell /data/local/tmp/a.out' finished case #115 (ans = 16016)
'adb shell /data/local/tmp/a.out' working on case #93
'adb shell /data/local/tmp/a.out' finished case #93 (ans = 3280)
'adb shell /data/local/tmp/a.out' working on case #28
'adb shell /data/local/tmp/a.out' finished case #28 (ans = 1390)
'adb shell /data/local/tmp/a.out' working on case #5
'adb shell /data/local/tmp/a.out' finished case #5 (ans = 146)
'adb shell /data/local/tmp/a.out' working on case #10
'adb shell /data/local/tmp/a.out' finished case #10 (ans = 188)
'adb shell /data/local/tmp/a.out' working on case #4
'adb shell /data/local/tmp/a.out' finished case #4 (ans = 36)
'adb shell /data/local/tmp/a.out' working on case #9
'adb shell /data/local/tmp/a.out' finished case #9 (ans = 32)
'adb shell /data/local/tmp/a.out' working on case #11
'adb shell /data/local/tmp/a.out' finished case #11 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #12
'adb shell /data/local/tmp/a.out' finished case #12 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #13
'adb shell /data/local/tmp/a.out' finished case #13 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #14
'adb shell /data/local/tmp/a.out' finished case #14 (ans = 165)
'adb shell /data/local/tmp/a.out' working on case #15
'adb shell /data/local/tmp/a.out' finished case #15 (ans = 165)
'adb shell /data/local/tmp/a.out' working on case #3
'adb shell /data/local/tmp/a.out' finished case #3 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #8
'adb shell /data/local/tmp/a.out' finished case #8 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #2
'adb shell /data/local/tmp/a.out' finished case #2 (ans = 1)
'adb shell /data/local/tmp/a.out' working on case #7
'adb shell /data/local/tmp/a.out' finished case #7 (ans = 2)
'adb shell /data/local/tmp/a.out' working on case #1
'adb shell /data/local/tmp/a.out' finished case #1 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #6
'adb shell /data/local/tmp/a.out' finished case #6 (ans = 0)
'adb shell /data/local/tmp/a.out' working on case #17
'./a.out' finished case #19 (ans = 553484346)
'./a.out' working on case #17
'./a.out' finished case #17 (ans = 735866676)
== Workers contribution: ==
'./a.out' : 66.68% (2 cases)
'adb shell /data/local/tmp/a.out' : 33.32% (114 cases)
'ssh ssvb@t61 /tmp/a.out' : 0.00% (0 cases)
real 4m45.626s
user 38m2.469s
sys 0m0.111s
Now the time is down to 4m45.626s and this is within the 6 minutes limit. Assigning jobs to different devices could be surely improved. Right now I'm just sorting them by the lenghts of strings and processing more difficult cases first. The old laptop was totally useless and couldn't handle even a single big testcase #17. The other devices came to the rescue in the end and finished it faster.
TL;DR;
Facebook Hacker Cup has an unusual format. The contestants have 6 minutes to run their solutions on their own computers. This provides a lot of opportunities to speed up even poor time complexity solutions by taking advantage of multiple CPU cores and multiple computers. OpenMP is easy to use.
To be continued
This is not the end, but the blog is already becoming too large and it's necessary to wrap it up. The next part will feature Clang vs. GCC, different compiler optimization options and taking advantage of SSE2/AVX2.
It's possible to process the true worst case (5 x 800000) using just a single thread of an Intel Core i7-860 processor in less than 6 minutes by a vectorized $$$O(N^2)$$$ solution.