[2015 day 9][Rust] Solving TSP faster than permutations
2015 day 9, day 13, and 2016 day 24 are all problems involving a variation of finding the longest/shortest path/cycle with 8 nodes that form a complete graph (every node has a distance to every other), also known as the Travelling Salesperson Problem (TSP). This is an NP-complete problem: there is no general-purpose polynomial-time algorithm known to solve it, and if you can come up with one, YOU will be rich for having proven P=NP; alternatively, if you can definitively prove the counter-conjecture P!=NP you will still be famous, even if the result is a letdown.
So the best you can do is either using an exponential-time algorithm, or find specialized algorithms for certain subclasses of the TSP problem (for example, relying on sparseness for pruning, or being satisfied with an approximate answer without proving it is the global answer). The saving grace for all three of these puzzles is that n=8 is small enough for the global answer to be easily tractable with an exponential-time algorithm (contrast that to things like 2019 day 18 where finding the shortest path with n=26 is prohibitively expensive if you were to attempt to do naive exponential searching).
That said, MOST solutions in the megathreads for these days just whip out a permutation engine (either one built into your language, or rolling your own, with Heap's algorithm being a common choice), performing O(n!) work to just check every possible path, or solve by recursion (where your call stack does the same factorial effort as a permutation engine). The naive approach says that once you have a working permutation engine, you then test 8! paths, or 40,320 different variations - still lightning fast for modern computers and programming languages. And if you get your star in less than a second, do you really need to try anything else?
But I wouldn't be writing this post if that were the best you could do. The first major optimization is to realize that if you just visit 7! permutations, you can still check all cycles with 8 elements by connecting your last item with your first, and turn that into a check of the longest path by creating the cycle and then tossing out the shortest edge. This adds complexity to each permutation checked (you are now doing 8 times as much work per permutation to find the shortest edge that will need to be tossed), but reduces the number of permutations visited to 5,040, and the work to find a shortest edge is probably more localized and less overhead than the work being done in your permutation engine to advance to the next permutation, so it is an overall win.
The next observation is that when your graph is undirected (the cost from A to B is the same as the cost from B to A - true for all 3 of these days), you can again cut the work in half if you avoid a permutation that is lexically reversed from an earlier permutation you already visited (path a->b->c->d will have the same weight as path d->c->b->a). Heap's algorithm does NOT make this easy, but there are other permutation engines that DO make it easy to stop iterating after just half the permutations, avoiding all lexical reverses in the resulting output. Among them is Steinhaus-Johnson-Trotter, which I implemented in Rust to get 2015 day 9 down to 2,520 permutations visited. But remember, even that is still doing 8 checks per permutation to cast out the shortest edge when looking for the longest path.
So today, I finished up a patch that goes for an entirely different approach. The Held-Karp algorithm solves TSP in O(n^(2)*2^(n)) - still exponential, but better than the super-exponential O(n!) that other solutions were using. It is based on a dynamic programming technique of using a table of n * 2^(n) distances; each distance g(set,k) represents the set of nodes in a path from the chosen start point, and a node k that is an element of that set and the last node visited along the path so far. When set is a singleton, the distance is obvious: the distance from your start point to that node. For any larger set, there are O(n) elements in the set, and so you need to generate O(n) distances to be stored as n values of g(set,k); computing any one of those distances is also O(n) work to find which out of n-1 g(set\k,m) for another point m!=k in the set added to d(m,k) produces the best path one element longer. The dynamic programming aspect says that if you iterate from 0 to 2^(n)-1, you will visit every possible set size, and all recursively-smaller sets that you depend on will already be populated by the time you need to reference them in building your larger set values. So even though the algorithm is defined by a recursive relationship, it is implemented with straightforward looping.
I was impressed that the result was still fairly legible. Below is my implementation for 2015 day 9. It produces 8*7/2 comparisons (the O(n^(2)) each pairing of bit m with k) /2 (even though there are 8*256 bits across all sets, on average half of them are not set) * 256 sets (the O(2^(n)) number of sets visited), or 3,584 total comparisons. That's more than the 2,520 permutations of the 7!/2 approach, but remember, the permutation approach required an additional 8 comparisons per permutation to find the edge to discard, whereas Held-Karp does not. Plus, permutations requires the overhead of moving to the next permutation (often work hidden behind your library interface), while set iteration is lighter-weight (admittedly, this code uses maneatingape's biterator() function to encode access the next bit index in a bitmask). Timing-wise, my patch sped up the solution from 40µs to 12µs on my laptop.
// Initialize a table for each part: 2ⁿ sets with n distances per set. Default 0 matches
// g({k},k) of zero for all singleton sets. Initial value of other sets does not matter.
let mut table_one = vec![0_u16; stride * (1 << stride)];
let mut table_two = vec![0_u16; stride * (1 << stride)];
// Visit each non-empty set in order, with no work to do for singleton sets.
for set in 3_usize..(1 << stride) {
if set & !(set - 1) == set {
continue;
}
// For a given set, compute each g(set,k) for all k in the set.
for k in set.biterator() {
let subset = set ^ (1 << k);
let mut shortest = u16::MAX;
let mut longest = 0;
// For a given destination k, find which other bit m gives the best path from the
// subset to m, and then m to k. All table[subset] references were filled in prior
// iterations of the outer loop or the singleton base cases.
for m in subset.biterator() {
shortest = shortest.min(table_one[subset * stride + m] + distances[m * stride + k]);
longest = longest.max(table_two[subset * stride + m] + distances[m * stride + k]);
}
table_one[set * stride + k] = shortest;
table_two[set * stride + k] = longest;
}
}
// With the sets now built, we have stride candidates for each answer.
(
*table_one.iter().skip(((1 << stride) - 1) * stride).min().unwrap(),
*table_two.iter().skip(((1 << stride) - 1) * stride).max().unwrap(),
)
2016 day 24 is even better. Days 9 and 13 must visit 255 sets (n=8) to ensure that you find the right path regardless of which point started the path. That is because Held-Karp forces you to pick a starting node; but while an arbitrary starting node will always be part of the longest cycle, it is easy enough to generate a graph where the longest path is not part of the longest cycle if you picked the wrong starting node (the 8! down to 7! optimization for permutations did not have that problem, because there you are casting out the short edge from every possible path, rather than just the one longest cycle). To find the longest path among 8 nodes, you can either run 8 iterations of Held-Karp on sets up to 127, or more efficiently do 1 iteration of Held-Karp on sets up to 255. But day 24 explicitly pins the path to start at node 0, so you only need to visit 127 sets. For that day, being able to use n=7 lets the number of comparisons drops to 1344 (=7*6/2*128/2), which is hands-down better than 7!/2 permutations.