Hello everyone,
Since the practice round doesn't seem to provide a scoreboard, I'd like to open this thread to discuss scores and possible approaches for the problem. We got the following scores after some ideas:
A: 2
B: 5
C: 5
D: 1802
E: 2083
Total: 3897
We also tryied to calculate an upper bound for some of the tasks:
D upper bound: 1900
E upper bound: 2804 (probably can be significantly improved)
I will post my ideas in the comments. Did anyone managed to get a better sccore? or had an insight to the problem?
Auto comment: topic has been updated by kingmoshe (previous revision, new revision, compare).
Auto comment: topic has been updated by kingmoshe (previous revision, new revision, compare).
Upper bound explanation(I worked on the problem with kingmoshe):
First we need to explain the graph of clients. We represent each client as a vertex in a graph. Two clients or vertexs has an edge between them if there is no pizza that can satisfy both clients(for exmple one cllients like cheese and the other doesnt like cheese). It is easy to see that if you take a group of vertexs with no edges between each two vertex, then there is a pizza that satisfy does clients vertexes. Also it is very easy to see, that no pizza could satisfy 2 vertexes with an edge between them.
Therefore we can make a reduction to an max anti-clique problem(which is NP).
We did the upper bound by: 1. Search for the biggest clique that we can find. 2. We deduce that only one vertex from the clique could be chosen. 3. Remove the clique from the clients and start again. Then the number of cliques we find is a good upper bound.
Where in d we found 1900 cliques. And in e we found 2804 cliques.
This is approach for a good upper bound that could obviously be reduced more.
Static solution for e that reached 2025 points(to get to 2078 we did a lot of dynamic approach of approving a given output). 1. Calculate the degree of each vertex. 2. Chosen the vertex of the lowest degree. 3. Remove the vertex and his neighbors of the graph. 4. go to step 1.
How did you improve the score from 2025 to 2078?
We had two main *dynamic improvments
1) We allready have a function for finding a large anti-clique so lets try the next improvment: a) take randomly half of you anti-clique b) remove from the graph all nodes in that half and all of their neighbor c) calculate the large anti-clique of the remaining graph d) if the new anti-clique plus the old half is larger than the old anti-clique keep it, otherwise remain with old anti-clique e) go back to part a for 1000 times
2) lets call the selected anti-clique C, and the rest of the graph G a) create a new list, lets call it K b) pick at random a node from G that has a minimal amount of neighbors in C, and add it to K c) remove that node, and any of its neighbors (in G) from G d) remove all of the node neighbors from C e) if the amount of nodes in K is larger than the amount of nodes removed from C than K is better than the removed nodes in C ( and we increased the solution) f) repeat those steps 1000 times
both of this improvments helped allot, but using both of them is what really improved, also after trieng a little bit more random, we manged to increase our score to 2803
how the function that calculate large anti-clique works ?
its already an NP problem
basicly, every time take the node with the least amount of neighbors and after taking it delete it's neigbors, if two nodes exists with same amount of nodes, then look at the degrees of its neighbors, you want does degrees to be as big as posible (or more precisley we maximized on the minum degree of the neighbor, the idea is that a neighbor with a small degree as a good chance to be picked later), and of course we added some random along the line, so we could run the process many times and get different outcomes.
hey so I am a beginner and I have absolutely no idea what we have to do so if you don't mind could you explain? I am a student from first year so I don't know dsa or dynamic programming yet.
Hey, no dynamic programming or complicated dsa required yet (although in the qualification round some may take advantage of it) but little knowledge on graph might help.
First, if it's your question and you are interested, if you didn't already do it find a team of between 2 and 4 people (including you) and register for google Hash code.
Second, you will be able to submit to the training problem which I explain below in spoiler in order that the post appears shorter.
You have to design a unique pizza that satisfy the most people possible. Someone A is satisfied if you put all the ingredient you like and none of which he dislikes, there is many ingredient he neither like nor dislike, regarding A you can do whatever you like about those. every people like L between 1 and 5 ingredients and dislike D between 0 and 5 ingredient. reread together with the subject it should become clear. I explain below how to read the dataset, compute a score and output a result in python2 (I recommend pypy for speed) although not the optimal way. It uses dictionnaries which you will gain to learn about, but try to do it yourself
from future import print_function from random import randint
I don't know how to add tabulation
n=int(input()) description=[] for k in range(n): .description.append((raw_input().split()[1:],raw_input().split()[1:])) choix={} for a,b in description: .for k in a: ..choix[k]=randint(0,1) .for k in b: ..choix[k]=randint(0,1) def score(): .sc=0 .for a,b in description: ..satisfied=True#supposed satisfied until a contradiction ..for k in a: ...if choix[k]==0: ....satisfied=False ..for k in b: ...if choix[k]==1: ....satisfied=False ..if satisfied: ...sc+=1 .return sc print(sum(choix[k] for k in choix),"".join([k for k in choix if choix[k]])
this code can become more efficient but I wanted it to be understandable
you can in particular try to convert ingredient in integer in order to make
the internal process faster and convert back at the end for the output
try to think to a better strategy alone, if you fail read the following for
explanation
first hint : think greedy (take what seems locally optimal even if it's not part of a globally optimal solution,it's optimisation, it's hard and often impossible to find and prove the true optimal second hint : randomize, repeat and take the best third hint : think graph (although not part of all strategy a first strategy (explained below by somebody else) is to keep two python sets of ingredient accepted and refused and take the people from the less annoying (minimal L+D) to most (maximal L+D) see (you can do this with a heap although it's an overkill). For the person you look at if you have already refused something he like or accepted something he dislike then you won't satisfy him. Else you decide to satisfy him in which case you add or keep in accepted everything he likes and the same for refused and everything he dislikes. An improvement to this algorithm is to decrement the value of the annoyance of A when an ingredient that A like or dislike is added because of B liking it as well, but I won't explain how to implement it. by running it several and taking the best one you should achieve something like 1830 on dataset E.
previous stategy looked at the people. Why not look at the ingredient. For each ingredient store how many times it's like and dislike if it's more liked than disliked take it so you achieve 1600 on dataset D. To improve on that replace +1 and -1 by a function f of L and D (for instance 1/(L+D)), to randomize replace f by a random function of L and D but fixed on an iteration. so I achieved 1749 on D. an improvement is to keep a part of the best solution found so far and improve only the rest and keep the change only if it's an improvement (very near from opti 1 of kingmoshe) but I won't tell my hyperparameters I achieved so 1798 while the best possible according to Rhodoks is 1805
Far more technical, but more beautiful and effective with it I achieved 2081 on dataset E Consider the people two by two. A is incompatible with B if A dislike something B like or the reverse. Let's build a graph (a graph is a set of points (nodes) with link between them (edges)which can have weight but that's not the case here) Nodes are people and edges are drawn between incompatible people. The problem is exactly finding most people possible without two having an edge between them that's what we call a maximal anticlique. Again think greedily go from the least to most annoying (and use a heap), people with smaller degree (number of neighbors) first and remove people B neighbor of already selected people A and in turn decrease degree of neighbors C of B. You should reach 2000-2040 to optimize keep a part of the best solution so far and reapply algorithm on the remaining graph. second optimization : look at unselected people A with only one selected neighbor B you can swap A and B and keep unchanged the number of selected people but B might have a neighbor C which can be now selectable or unblocked for exchange with D which in turn might have a freeer neighbor E, that's near augmenting path something you'll see with flow algorithm
I wrote 500 lines by trying this problem Hope it'll help you and that I did not too much state the obvious, don't hesitate if you have any further question which are not complete solution of the problem
what a great explanation!
Insighets on the problem. Test case e biggest clique is only 3. And after some removel of clique of size 3(~60), the biggest clique is of size 2. Also there are 68 clients of degree 0 which should allways be chosen.
A: 2
B: 5
C: 5
D: 1804
E: 2083
Total: 3899
cool, what was your solution?
Update: D is 1805 points now.
Nothing interesting. A lot of random swaps in 10 threads for ~30 minutes
A — 2 b- 5 C — 4 D — 1697 E — 799 total — 2507
The optimal solution of D is 1805, which can be calculated using Binary Linear Programming.
However, E is too large. It cost me about 1h to solve D but I failed to solve E with a night.
How do you calculate it with binary linear programming? And can you calculate this for c and e as well?
$$$choose[i]$$$: if we choose ingredient $$$i$$$.
$$$satistied[i]$$$: if person $$$i$$$'s conditions are satisfied.
maximize $$$\sum satistied[j]$$$
s.t. $$$\sum_{j~\text{like}~i} choose[i] + \sum_{j~\text{dislike}~i}~(1-choose[i]) \geq cnt_j \times satisfied[j]$$$ for $$$j = 1...n$$$, $$$cnt_j$$$ is number of $$$j$$$'s like and dislike.
when $$$j$$$ is served($$$satisfied[j]=1$$$), that means $$$\sum_{j~\text{like}~i} choose[i] + \sum_{j~\text{dislike}~i}~(1-choose[i]) \geq cnt_j$$$ so for $$$j~\text{like}~i,choose[i]=1$$$,for $$$j~\text{dislike}~i,choose[i]=0$$$ must hold.
C's answer is 5. E has too many ingredients so it will consume too much time...
How did you specifically solved it ? I mean, in BIP you cannot drop some constraints as some of your constraints are going to be unsatisfied (we can't serve all the clients). And also, how did you calculate your objective in terms of variables ? Could you please give us more details about your approach ?
Sorry, there are some mistakes in my formula. I have fixed it and added some explanation. Thank you very much.
Our Scores: 2, 5, 5, 1687, 1849.
Our Approach: Randomized Greedy.
Our scores: A: 2 B: 5 C: 5 D: 1805 E: 2085 Total: 3902
We first did a greedy graph algorithm for finding an approximate independent set (picking minimum degree node). Then we iteratively improved the solution with simulated annealing (basically random swapping and checking if the result becomes better). It found the solution of 2084 in E in 10 minutes, while A through D were found in a minute. During a testrun of the code, we scored 2085 in E, but we didn't implement writing the best solutions to a file, so this solution is lost forever.
Edit: Found another 2085 solution and saved the solution this time.
How do you do swaps? Randomly picking a vertex you selected and one you did not select probably will give you too often a combination that is invalid?
For the swaps we actually did two different things:
We did a 50/50 split between these two kinds of swaps, which seemed to get the best result.
A: 2 B: 5 C: 5 D: 1805 E: 2047
Total: 3864
My hashcode team required one more person. If anyone wants to participate google hashcode but till now not getting any team. So DM me!
I accidentally misread the statement and thought of a variant of the problem where one person likes the pizza if it contains at least one of its favourite ingredients. Is this version easier or has an interesting solution?
Looking for 1-2 last-minute replacements who can actually participate during the timeslot (apparently, timezones are hard). Python preferred but ultimately optional, visualization skills a plus! Got a score of 8,926,023 for last year's qualifying round which is definitely not a brag, but maybe that can speak to expectations (upper bound on stress level, let's say). Get in touch however you like, I will edit/update here if/when the spots are filled.
Care to explain how?
If you're familiar with the problem (archived on kaggle?), then you'll know that despite the number of digits, it's a fairly minimal score. I'd have more useful to say about what not to do...
Swung a bit too far the opposite way this year, reached out early to people I knew, and welp...
Thank you!