u/EducationalTackle819 — reddlx

I had a hot path in .NET where ~90% of CPU time was being spent inside the Levenshtein distance function. I looked at the fastest implementations of Levenshtein and found that the fastest, Myers Bit Parallel (MBP), had never been ported to C#. So I did so, and added a few more optimizations that really sped up my workflow. Note: MBP has limitations (see below) but it is 100% accurate. It's a lossless optimization

I just open-sourced the library and created a NuGet package:

Github Repo

NuGet Package

In addition to the speedups from MBP, I also added some clever pruning tricks that give me another 2x-5x in speed ups. Namely, using character masks and max distance for pruning.

The library is hyper-optimized for the use case of 1 query against a set of candidates

MyersBitParallel64 engine = MyersBitParallel64.AsciiCaseInsensitive;
using MyersPattern64 pat = engine.Prepare("James Cmaeron Junior");
ulong requiredCharMask = engine.BuildCharMask("Junior");

foreach (string candidate in haystack)
{
    int d = engine.Distance(in pat, candidate, maxDist: 2, requiredCharMask);
    if (d != int.MaxValue)
    {
        // candidate is within 2 edits of "James Cmaeron Junior"
        // and is guaranteed to have all the characters in "Junior"
    }
}

Limitations

Query max length of 64 characters
Maximum of 256 possible characters (Usually ASCII, full unicode wouldn't work)

Benchmark

(1 query × 1000 candidates, MaxDist = 3, .NET 10, BenchmarkDotNet):

Method	Mean	Ratio (vs library)
Naive Levenshtein (no maxDist)	202.9 µs	37x slower
Naive Levenshtein (maxDist=3)	63.1 µs	12x slower
Ukkonen (maxDist=3)	51.1 µs	9x slower
Wagner-Fischer (maxDist=3)	42.5 µs	8x slower
MyersBitParallelDotnet	5.4 µs	1.00

For anyone who wants the long-form story with animated DP visualizations and explanations of why each pruning step works, I wrote it up on my blog here.

Let me know what you think!

Edit:

I just updated the package to also include MyersSubstringBitParallel64. The Myers Bit-Parallel engine can be modified (like 3 lines) so that it finds the best substring distance for a query anywhere within a text. Think of it as approximate substring search. For example, Query = "CSHARM", Text = "ISN'T CSHARP AWESOME" would be 1, because when aligned with "CSHARP" it is 1 substition. This is approximately 5x-10x faster than traditional Wagner-Fischer like Levenshtein algorithms