I benchmarked Cartesian product implementations in Haskell, then compared them with C
I wrote a small article around implementing Cartesian products, starting from Haskell’s sequence.
The article goes through a naive Haskell implementation, a more idiomatic list-comprehension version, native sequence, then versions using Data.Vector.Unboxed, mutable vectors, runST, unsafeFreeze to try a different memory representation.
The second half compares those designs with C implementations, mostly to look at what changes when the memory layout and allocation model are made explicit.
The most interesting result for me was that changing representation in Haskell reduced allocations a lot without automatically improving runtime. In some cases, fusion helped a bit (no temporary indices).
I’d be happy to get feedback on the Haskell side, especially the vector/ST implementations and whether there are more idiomatic or faster ways to express this.
Here is the article link:
If you want to share any optimizations, you can do a PR at this repo:
https://github.com/julienlargetpiet/PerfLabs
It will be mentioned in the next article update.
PS: We now managed to find (currently) best version which is this one (in C) running at 160ms for 100k iterations to 5^5 lists here:
https://github.com/julienlargetpiet/PerfLabs/blob/main/CartesianProduct/contrib2/contrib2.c
I'm updating after trying some new Haskell impl, thanks everyone for the help and the intuition on where to dig !