The submission time is over for VeeRoute Marathon. Let us discuss the contest — the problem itself and related stuff — while the final testing takes place. Which solutions work best? Were the problem statement and materials sufficient to start solving, or was something lacking? What tools did you implement yourself?
I used a greedy approach to solve the problem: first, sort all customers by time; then, for each customer, find the driver that gives the lowest 'cost' of driving the customer to their destination.
Here, the cost is made of four parts (sketched in code below):
If the driver hasn't been used before, the cost includes something in terms of base.
The extra distance the driver needs to drive relative to the ideal case.
The extra time the driver spends driving relative to the ideal case (base/32400 per second).
The waiting time when the driver has nothing to do (base/43200 per second).
And finally, I got around 8400 on the ten cases, with ~600 lines of code: 16703064
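To make that cost function concrete, here is a minimal, self-contained sketch of the selection rule; the function and parameter names are my own, and the "ideal case" quantities are assumed to be precomputed per (driver, order) pair, so treat it as an illustration rather than the commenter's actual code.

```cpp
// Sketch of the greedy cost described above. The inputs are assumed to be
// precomputed per (driver, order) pair; only the weighting is shown here.
#include <cstdio>

double orderCost(bool driverUsed,     // has this driver been assigned anything yet?
                 double extraDist,    // extra distance vs. the ideal route
                 double extraTimeSec, // extra driving time vs. the ideal route, in seconds
                 double waitSec,      // seconds the driver spends idle before the order
                 double base) {       // base penalty for using a driver
    double c = 0;
    if (!driverUsed) c += base;            // cost of opening a new driver
    c += extraDist;                        // detour distance
    c += extraTimeSec * base / 32400.0;    // per-second penalty for extra driving time
    c += waitSec      * base / 43200.0;    // per-second penalty for waiting
    return c;                              // pick the driver minimizing this value
}

int main() {
    // Toy comparison: opening a fresh driver vs. reusing one with a detour.
    double fresh = orderCost(false, 0.0, 0.0, 0.0, 1000.0);
    double reuse = orderCost(true, 500.0, 600.0, 300.0, 1000.0);
    std::printf("fresh=%.1f reuse=%.1f\n", fresh, reuse);
    return 0;
}
```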
I wonder what the solutions that score 9000 or above look like. 0.0
orz #adore
Here is my final submission: http://lpaste.net/8778658654736875520
Score on the test cases:
I used the lemon library (link) to compute minimum-weight maximum matchings and minimum-weight maximum bipartite matchings. (I had to trim the lemon sources to make them fit in a 64 KB zip.)
The main idea of my solution is that the right representation to work with is a partition of the orders into chains, as a min-cost max bipartite matching can then match these chains with the drivers (function tryChains). To decide whether a chain is admissible (function canDoChains), I check that there exists a garage with at least 3 drivers able to do this chain of orders (in practice, all chains can then be matched in most cases).
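A rough sketch of what such an admissibility test can look like; the data structures and the coversChain check are simplified placeholders (the real canDoChains presumably checks the full driver constraints), so this only illustrates the "some garage has at least 3 suitable drivers" rule.

```cpp
// Simplified canDoChains-style test: keep a chain only if some garage has
// at least 3 drivers whose working window covers the whole chain.
#include <vector>

struct Chain  { int startTime, endTime; };             // simplified
struct Driver { int garage, shiftStart, shiftEnd; };   // simplified

// Placeholder feasibility check: ignores travel from the garage to the first
// order, the return trip, capacity, and so on.
bool coversChain(const Driver& d, const Chain& c) {
    return d.shiftStart <= c.startTime && c.endTime <= d.shiftEnd;
}

bool chainAdmissible(const Chain& c, const std::vector<Driver>& drivers,
                     int numGarages) {
    std::vector<int> able(numGarages, 0);
    for (const Driver& d : drivers)
        if (coversChain(d, c) && ++able[d.garage] >= 3)
            return true;   // found a garage with 3 drivers able to serve the chain
    return false;
}
```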
A possible algorithm would be to try local optimization on a small number (2-4) of chains (swap orders between chains, or go from AAABBB CCCDDD to AAADDD BBBCCC, etc.), but it is better to optimize all chains at the same time. To do that, I split each chain (function splitChains) into two parts at random and compute (function mergeChains) a min-cost maximum matching where the nodes are the chains and the edges are possible ways to merge them. Because the matching from before the split is still admissible, the cost can only decrease. We need to optimize both the number of chains and the total distance, so the cost depends on both time and distance at the beginning of the algorithm, when we want to reduce the number of chains, and only on the distance at the end.
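A heavily simplified sketch of one such iteration: every chain is split at a random point and the fragments are then re-merged pairwise when that lowers the cost. The greedy merge here stands in for the min-cost maximum matching of the real solution, and chainCost/chainAdmissible are toy stand-ins (the per-chain penalty mimics the early-phase incentive to reduce the number of chains).

```cpp
// Heavily simplified split/merge iteration over chains of orders.
// chainCost/chainAdmissible are toy stand-ins; the real version merges the
// fragments with a min-cost maximum matching instead of the greedy pass below.
#include <cstdlib>
#include <random>
#include <vector>

using Chain = std::vector<int>;          // a chain = sequence of order ids

const double CHAIN_PENALTY = 100.0;      // early-phase incentive to have fewer chains

// Toy cost: a per-chain penalty plus a fake "distance" |a - b| between
// consecutive orders.
double chainCost(const Chain& c) {
    if (c.empty()) return 0.0;
    double d = CHAIN_PENALTY;
    for (size_t i = 1; i < c.size(); ++i) d += std::abs(c[i] - c[i - 1]);
    return d;
}

// Toy admissibility: just cap the length (the real check looks for a garage
// with enough drivers able to serve the whole chain).
bool chainAdmissible(const Chain& c) { return c.size() <= 30; }

void splitMergeStep(std::vector<Chain>& chains, std::mt19937& rng) {
    // 1) split every chain into two fragments at a random cut point
    std::vector<Chain> frags;
    for (const Chain& c : chains) {
        int k = std::uniform_int_distribution<int>(0, (int)c.size())(rng);
        frags.emplace_back(c.begin(), c.begin() + k);
        frags.emplace_back(c.begin() + k, c.end());
    }
    // 2) greedily merge pairs of fragments whenever the concatenation is
    //    admissible and cheaper than keeping the two parts separate
    std::vector<bool> used(frags.size(), false);
    std::vector<Chain> result;
    for (size_t i = 0; i < frags.size(); ++i) {
        if (used[i]) continue;
        int best = -1;
        double bestGain = 0.0;
        for (size_t j = i + 1; j < frags.size(); ++j) {
            if (used[j]) continue;
            Chain merged = frags[i];
            merged.insert(merged.end(), frags[j].begin(), frags[j].end());
            if (!chainAdmissible(merged)) continue;
            double gain = chainCost(frags[i]) + chainCost(frags[j]) - chainCost(merged);
            if (gain > bestGain) { bestGain = gain; best = (int)j; }
        }
        Chain cur = frags[i];
        if (best >= 0) {
            cur.insert(cur.end(), frags[best].begin(), frags[best].end());
            used[best] = true;
        }
        if (!cur.empty()) result.push_back(cur);
    }
    chains.swap(result);   // with a true min-cost matching, the cost can only decrease here
}
```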
That's the basic idea, but the fact that we can serve two orders at the same time makes it more complex. In an initialization step, I compute a good set of merged orders, again with a min-cost max matching and a suitable cost. However, we may want to change this set later. To do so, when merging chains, we compute several new chains with different orderings of the inner orders (function orderChain::merge). Also, when we split a chain, we can split inside a merged order. I run a local optimization pass (function optimizeDoubles) as well to improve this further.
I can do between 300 and 10000 (split, merge) iterations in 30 seconds, depending on the test case.
I had roughly similar ideas, but didn't have time to implement the split-merge optimization. I did have the pairwise optimization, though. Also, does "split each chain randomly" mean picking a point in time at which to split the chain, or something else?
How exactly did you choose the initial chains? I processed the (unpaired) orders by time and then tried to pair the current order with another order and a driver so that the total added distance is minimum — so basically greedy.
When will we have the verdict on the large set of tests?
Looks like large tests are beginning now. Lots of problemset submissions have been queued and I'm guessing this is because of VeeRoute. You can see when the systests started (I think) here. They will take a while though: 30 seconds per test * 500 tests = 15,000 seconds or a little over 4 hours.
If you care, Lost's finale is not a dream.
Edit #1: Maybe I should move it to a separate blog post, considering the amount of space it takes? :)
Edit #2: Link to solution: 16709816
Provisional Score: 9863.098
Summary:
Simple greedy for initialization, SA #1 for minimizing the number of drivers, SA #2 for minimizing the total distance (which is equivalent to optimizing the final score).
Intro:
Originally, I decided to participate because I had done almost no coding in 2016, and whenever I did, I spent an extraordinary amount of time on debugging. Since this contest was very heavy on implementation, it looked like a good excuse to exercise a bit. My goal was just to implement all of the "basic" ideas, do a little bit of tweaking, and be done with it. But the problem turned out to be so complex (mostly due to not enforcing the triangle inequality) that my solution is longer than in most TC Marathons, even though it's not very polished.
Details:
My greedy phase could almost be skipped, because the later SA phases are good enough to start from a random (or empty) state. I kept it only because I didn't want to add another round of bug hunting.
SA has two phases, and the main difference between them is that they use different scoring functions. The first tries to minimize the number of drivers and the second tries to maximize the score. The reason for two phases is that improving the score in small steps tends to spread orders evenly among the drivers. This means that in order to get rid of a single driver, the local search has to perform a long run of "uphill" moves in a row, which most probably won't happen.
There were two workarounds for this: either create a very complex transition function that tries to get rid of a single driver entirely by splitting/merging existing chains of orders (something along the lines of what Rafbill did), or create an alternative scoring function that achieves the same goal using a simple (and, in my case, already implemented) set of transitions.
The scoring function for the first phase is as simple as it gets:
pow(number_of_orders * X, Y) * Z
, where X, Y and Z are constants. When Y is smaller than 1.0 (in my case it's 0.3), removing an order from a driver that already has few orders reduces the score much more than removing one from a busy driver. In practice, this just tries to take orders away from drivers that already have very few of them.
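A small sketch of how such a phase-1 cost could look, assuming the term is summed over drivers and minimized; the constants and the toy example are illustrative only.

```cpp
// Phase-1-style scoring: a concave per-driver term (Y < 1) makes emptying an
// almost-empty driver far more attractive than unloading a busy one.
// X, Y, Z values here are illustrative, not the actual constants.
#include <cmath>
#include <cstdio>
#include <vector>

double phase1Cost(const std::vector<int>& ordersPerDriver,
                  double X = 1.0, double Y = 0.3, double Z = 1.0) {
    double s = 0;
    for (int k : ordersPerDriver)
        if (k > 0) s += std::pow(k * X, Y) * Z;   // concave in k because Y < 1
    return s;   // treated as a cost: lower is better, an empty driver contributes nothing
}

int main() {
    // Removing one order from the driver with 2 orders helps much more
    // than removing one from the driver with 20.
    std::printf("busy: %.3f  almost empty: %.3f\n",
                phase1Cost({20, 2}) - phase1Cost({19, 2}),
                phase1Cost({20, 2}) - phase1Cost({20, 1}));
    return 0;
}
```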
State Representation:
Despite the complex statement, representing states is surprisingly easy. We either serve a single order at a time, or a double order (two different people who share either the arrival or the destination point). This means it's enough to just remember the list of orders for each driver.
Moves (Transitions):
I decided to keep only simple types of moves. The reason is that SA works pretty badly with moves of very different complexity (and thus very different ranges of score changes).
All of them, except for moveSplitOrders(), are pretty important. Most of the work is done by moveSwapOrders() and movePushOrders(), but the other moves matter for escaping local optima and for increasing the number of double orders.
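For readers less familiar with the technique, here is a generic simulated-annealing skeleton of the kind this description implies; the state, the moves, the temperatures and the schedule are placeholders, not the actual solution.

```cpp
// Generic simulated-annealing skeleton: pick a random simple move, accept or
// roll back depending on the cost change and the current temperature.
// State, moves and constants are placeholders, not the actual solution.
#include <chrono>
#include <cmath>
#include <random>

struct State {
    double cost() const { return 0.0; }          // placeholder scoring function
    void applyRandomMove(std::mt19937&) {}       // placeholder simple move
    void undoLastMove() {}                       // placeholder rollback
};

void anneal(State& s, double seconds, double tStart, double tEnd) {
    std::mt19937 rng(12345);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    auto begin = std::chrono::steady_clock::now();
    double cur = s.cost();
    for (;;) {
        double elapsed = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - begin).count();
        if (elapsed >= seconds) break;
        // geometric temperature schedule from tStart down to tEnd
        double temp = tStart * std::pow(tEnd / tStart, elapsed / seconds);
        s.applyRandomMove(rng);
        double next = s.cost();
        if (next <= cur || unit(rng) < std::exp((cur - next) / temp))
            cur = next;                          // accept (always for improvements)
        else
            s.undoLastMove();                    // reject and roll back
    }
}
```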
Important Optimizations (in no particular order):
Random Stuff:
And finally, thanks to Gassa, KOTEHOK and the other people behind organizing this contest. It was an interesting problem (I don't think I've ever had more than one SA phase in my code before), and I was really impressed that there were no bugs in the tester, considering the complexity of the problem. My only small suggestion would be to have something more approachable for beginners ;)
Sorry about the stupid questions, but what are SA and HC? And what were Phase 1 SA and Phase 2 SA in your solution?
SA means simulated annealing and HC means hill climbing I think.
Thanks, but the question of what Phase 1 and Phase 2 are still remains.
That's just me not sticking to the chosen terminology. By saying "SA has 2 phases" I just meant those two separate runs of SA. In practice, they are completely separate: the first one runs for roughly 15 seconds and then the other one follows. But since the only major difference is the scoring function (and the temperature schedule), it's easy to think about them as two phases.
I wonder if anyone else, like me, didn't notice that the seed is an unsigned int instead of an int.
The judging program seems to treat the number as an unsigned int; using scanf and printf with %d is still considered correct, while using cin and cout causes WA.
And I'm just asking: will you consider making any adjustment to the seed so that it fits the int type? It is sad to lose around half of the points because of this.
Anyway, I need to read the input format more carefully next time :(
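For reference, a minimal way to read and echo back the seed that survives values in [2^31, 2^32); the only point is the 64-bit type.

```cpp
// Minimal way to read and echo the seed without losing values in [2^31, 2^32):
// keep it in a 64-bit integer (an unsigned 32-bit type works as well).
#include <cstdio>

int main() {
    long long seed = 0;                 // fits the whole [0, 2^32) range
    std::scanf("%lld", &seed);
    std::printf("%lld\n", seed);        // echoed back unchanged
    return 0;
}
```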
Lol, I hadn't noticed it either :)
Looking at the code of others, almost no one is using unsigned int.
I guess people overlooked it because it's a very strange requirement, and also because the partial tests don't check it. I remember thinking something like "yeah, whatever, probably it's an old workaround for something they fixed already, and they don't even read this number" :)
The intent of having the seed printed was to open more possibilities for tooling.
The seed is included in the input to enable re-generating the test instead of reading it, which is generally faster. Reading a whole 40 MB file may consume a second or two unless you specifically optimize for input speed. That is not too important for a solution which has 30 seconds to run, but it matters more for the checking program or other tools which are otherwise fast.
The seed is included in the output to enable using the output without the input.
But in order to tell that the printed seed is wrong, you have to know the original seed beforehand? :)
Or do you just generate the whole test and look for any wrong data (too many drivers/orders, etc.)?
I mean, if you have a local tool which re-generates the input based on the seed, you don't even have to store the whole inputs.
For example, the checking program reads the seed from the input file, and then it has two possibilities:
Read the whole input.
Generate the input based on the seed.
To put it another way: locally, I've used one-liner input files containing just the seed. It saves disk space and disk access time.
Edit: Of course, all of this makes sense only if we don't consider custom inputs and use generated inputs exclusively. However, the test generator specification is part of the problem statement anyway, so there's little use in solving tests of other kinds.
My comment was about the server-side testing code, because I assumed that the line you wrote was about it. I just found it funny that we have to output the seed even though the tester knows it anyway. In any case, that was a slight misunderstanding on my part.
Otherwise, I understand the idea. I think it's pretty neat, but I wonder how many people used it. In my case, adding faster I/O (reading whole lines and parsing from memory) took maybe 2-3 minutes and cut the input reading time down to around 0.5 s in the worst case on my non-SSD system.
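A sketch of that kind of fast input: read the whole file in one fread and parse the integers from memory. Error handling and negative numbers are omitted, and the function name is mine.

```cpp
// Sketch of "read everything, then parse from memory": one fread of the whole
// file, then a hand-rolled parser for non-negative integers.
#include <cstdio>
#include <vector>

std::vector<long long> readAllInts(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    std::vector<char> buf(size > 0 ? (size_t)size : 0);
    if (size > 0 && std::fread(buf.data(), 1, (size_t)size, f) != (size_t)size)
        buf.clear();                    // error handling kept minimal
    std::fclose(f);

    std::vector<long long> out;
    long long cur = 0;
    bool inNumber = false;
    for (char c : buf) {
        if (c >= '0' && c <= '9') { cur = cur * 10 + (c - '0'); inNumber = true; }
        else if (inNumber)        { out.push_back(cur); cur = 0; inNumber = false; }
    }
    if (inNumber) out.push_back(cur);   // last token if the file has no trailing newline
    return out;
}
```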
I'm using long long for the seed, because I read the problem statement carefully :)) And I don't think I'm the only person who handled this situation correctly.
We are aware of the problem and will discuss whether we can restart the testing using only seeds that fit into [0, 2^31). I'd expect the decision to be made within 12 hours, but unfortunately, not right now.
I think that, since the code of the test generator was available to us, some participants could have evaluated the quality of their solution on tests where 2^32 > Seed >= 2^31 and decided which parameters to use for their optimization methods; such a solution could, for example, work better on those test cases. That's why I think the restriction on Seed should remain "< 2^32", as it was in the problem statement.
P.S. All participants could re-read the problem statement during the 2 weeks; how many times can you do that in such a time frame (more than several thousand :)))? So the restriction on Seed should have been read as well. It is purely a matter of someone's carelessness, and the other participants, who handled this case correctly, should not be affected.
I used unsigned, but if most of the top participants fail due to such a silly bug, it will make the competition less interesting. So I think the jury needs to change the test set (but please don't rejudge everything again, just remove some tests).
The final decision is to use the reduced interval [0, 2^31) for choosing seeds.
We considered the following points in favor of this decision:
The seed line does not relate to the problem domain in any way.
Seeds not fitting into a signed int32 were not present in the preliminary test set, while most other aspects of the problem were covered.
Adding to the confusion, all other input values fit into a signed int32 quite well.
While we recognize that careful reading of the problem statement is an important skill, in this case, correct behavior also depended on the choice of programming language and other factors.
I've published the current version of the checking program on Pastebin. The checker uses testlib.h. Please report if you find anything wrong with it!
What is the final decision about the Seed range?
I wonder, what does the verdict JUDGEMENT FAILED mean?
The platform is not yet very comfortable with testing Marathon-type problems, so issues happen. Don't worry, we'll rejudge all such cases.
Psyho Great job!
Thanks!
What will you choose? iPhone or Nexus.
Huh, asking the important questions.
This weekend is Google Hash Code, and I'm blindly guessing that they will give out Nexuses as prizes. Not that it matters, since I'm already drowning in gadgets ;)
My approach consisted of constructing a time-expanded network. Basically, each node is a (time, place) pair, and there is an arc from (t1, p1) to (t2, p2) if an order or a pair of orders can be satisfied starting from time t1 and place p1, moving to do the pickup(s), then doing the dropoff(s). So (t2, p2) was always the time and place of a drop-off. In order to keep the size of this network feasible, I did the following (a rough sketch of the resulting representation appears below):
- I rounded up all times to multiples of 30 seconds (so, essentially, there were far fewer distinct times);
- I introduced "waiting" nodes and arcs in order to avoid creating too many arcs (otherwise there would have been an arc from the end of each order/pair of orders to the beginning of each potential subsequent order/pair of orders);
- I "collapsed" all the arcs with fixed starting times (e.g. pickup from the airport) into one;
- I limited the number of 2-order arcs to a small-ish number if the test was large (keeping only those with the best "cost", where the cost was the difference in travelled distance compared to satisfying the orders independently).
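A rough sketch of what such a node/arc representation might look like; the field layout and the 30-second rounding helper are my reading of the description, not the actual code.

```cpp
// Rough sketch of the time-expanded network described above; illustrative only.
#include <vector>

constexpr int TIME_STEP = 30;    // round times up to multiples of 30 seconds
inline int roundUp(int t) { return (t + TIME_STEP - 1) / TIME_STEP * TIME_STEP; }

struct Node { int time, place; };   // a (time, place) pair

struct Arc {
    int from, to;          // node indices; 'to' is always a drop-off (time, place)
    int order1, order2;    // orders served on this arc (-1 = none / single order)
    int distance;          // distance driven along the arc
    bool waiting;          // pure waiting arc, no order served
};

struct TimeExpandedNetwork {
    std::vector<Node> nodes;   // tens of thousands of nodes
    std::vector<Arc>  arcs;    // up to millions of arcs
};
```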
The number of nodes in this network was on the order of tens of thousands and the number of arcs on the order of millions (up to 19-20 million arcs on the larger tests). Still, I easily exceeded the 512 MB memory limit, which was my biggest source of pain. I had to remove fields from my structs and otherwise optimize things in ways I didn't necessarily intend, just to squeeze my solution under the memory limit.
Then I ran a greedy solution on this time-expanded network. At each step I considered each starting time of a driver (at most 13 different values) and tried to compute a path through the graph containing only arcs with yet-unsatisfied orders, such that the average cost per order is as low as possible (i.e. minimizing (base penalty + total distance) divided by the number of satisfied orders). Of course, I didn't do this optimally (that would have required at least keeping the best cost for each number of orders, which is not that large, since I noticed drivers couldn't satisfy more than about 30 orders), but heuristically. I also had to avoid picking up the same order twice (for orders where the travel time is very low, the time-expanded network, despite being a DAG, may contain paths with arcs handling the same order at different starting times).
I also had to make this path computation somewhat independent of the destination garage, since it was too time-consuming to try it out for every (starting time, starting garage) pair. So I just guaranteed that the path could end at some garage with drivers still available there at the given starting time, at most 12 hours after the starting time and with at most 9 hours of travel time.
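For illustration, here is the "keep the best cost for each number of orders" variant mentioned above, written as a DP over the topologically ordered DAG. It is a sketch of the idea the commenter decided not to run exactly (and it ignores the duplicate-order issue), so the structures are assumptions.

```cpp
// DP over the time-expanded DAG keeping, for every node, the best path cost
// for each possible number of satisfied orders (capped at 30 here). Node
// indices are assumed topological (index grows with time) and the arcs are
// assumed sorted by their 'from' node.
#include <algorithm>
#include <limits>
#include <vector>

struct Arc { int from, to, ordersServed; double cost; };

std::vector<std::vector<double>> bestCostPerOrderCount(
        int numNodes, int startNode, const std::vector<Arc>& arcs,
        int maxOrders = 30) {
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<std::vector<double>> dp(numNodes,
                                        std::vector<double>(maxOrders + 1, INF));
    dp[startNode][0] = 0.0;
    for (const Arc& a : arcs)                     // processed in topological order of 'from'
        for (int k = 0; k + a.ordersServed <= maxOrders; ++k)
            if (dp[a.from][k] < INF)
                dp[a.to][k + a.ordersServed] = std::min(
                    dp[a.to][k + a.ordersServed], dp[a.from][k] + a.cost);
    return dp;   // then minimize (base penalty + dp[node][k]) / k over nodes and k > 0
}
```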
I noticed that the initial drivers were picking up quite a lot of orders (20+), but as I added new drivers, they became less and less efficient (many of the later drivers picked up only 1 or 2 orders).
This is what I did in a few days during the first week of the contest, and it gave me about 8374 points (if I remember correctly). In the second week I spent almost no time on this problem, because I participated in another contest (the Codechef monthly long contest), so I only added a step which, while time permits, clears the schedules of a few randomly selected drivers (at least 10% of the used drivers and at least 5% of the satisfied orders) and tries to rebuild their schedules by considering the arcs in some order (depending on total distance travelled, but giving preference to arcs with 2 orders) and inserting each arc into the schedule of the driver where it adds the least additional cost (as long as all the other constraints are still satisfied). This gave me +150 points (so about 8524 points in total).
I have a question for the other competitors, though. I just looked at my results on the 500 test cases and was surprised to notice that my solution very rarely ran up to the time limit I set for it (29 seconds). I saw many cases where it ran for less than 20 seconds (sometimes as little as 14 seconds). So, unless I have some stupid bug in the code, I'm now wondering if I measured time incorrectly. I used gettimeofday, which is what I use (and need to use) in Topcoder Marathon matches. But this measures "real" wall-clock time, while the time spent actually running my solution may be less. Was the 30-second time limit imposed on just the time spent in my code? (E.g., if multiple solutions were run on the same machine and mine was sometimes preempted, then in a physical interval of 30 seconds my solution could have actually run much less than that, e.g. only 20 seconds.) Was this the case? And if so, was this mentioned anywhere? (I am wondering if I just missed it, since it seems to be a pretty important thing.)
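For reference, a minimal way to see the difference the question is about: gettimeofday (wall-clock time) versus the CPU time the process actually received. Whether the judge's 30-second limit applies to one or the other is exactly what is being asked; this snippet only shows how to measure both.

```cpp
// Wall-clock time (gettimeofday) vs. per-process CPU time
// (CLOCK_PROCESS_CPUTIME_ID). If the process is preempted, the two diverge.
#include <cstdio>
#include <ctime>
#include <sys/time.h>

double wallSeconds() {
    timeval tv;
    gettimeofday(&tv, nullptr);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

double cpuSeconds() {
    timespec ts;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main() {
    double w0 = wallSeconds(), c0 = cpuSeconds();
    volatile double x = 0;
    for (int i = 0; i < 100000000; ++i) x += i;   // busy work
    std::printf("wall=%.3f cpu=%.3f\n", wallSeconds() - w0, cpuSeconds() - c0);
    return 0;
}
```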
What did you use for measuring time here on Codeforces?
Regarding the time questions, look at the last announcement on the contest page.
Thanks, I hadn't noticed it (as I said, I mostly stopped working on the problem after the first week, and this announcement was posted one day before the end of the contest?). Anyway, it only cost me one place in the final standings, due to my solution not using the full amount of time that it could have, so it's no big deal. But it would have been nice to find out about such things much earlier.
Take a look at the last announcement. I had the same issues with using gettimeofday()
It was a pleasure to participate. Thanks to the organizers. I wish Codeforces had more marathons.