u/Emotional_Mention337

▲ 0 r/dotnet

What "calculate Excel formulas without Excel" actually involves in .NET

A common problem in .NET: you have .xlsx files with formulas, you need the computed results on a server, and Excel itself is not an option.

COM interop is not something you want in a web app, it does not help on Linux, and it is hard to scale safely. A lot of .NET Excel libraries are excellent at reading and writing files, but formula calculation is a different problem. Some libraries support parts of it, some rely heavily on cached values, and once you get into real workbooks, the edge cases start showing up quickly.

I spent the last several months building a formula engine from scratch, and I wanted to share what turned out to be harder than expected. "Just evaluate the formula" hides a lot.

Excel-compatible coercion is a rabbit hole. SUM, SUMIF, and SUMIFS sound simple until you deal with blanks, booleans, text that looks numeric, dates, ranges, direct arguments vs referenced cell values, wildcard criteria, error values, and criteria like ">=10" or "<>Apple". A mathematically clean implementation is not enough. You have to match the behavior spreadsheet users already depend on.

Dates are worse than they look. Excel's 1900 date system includes the famous phantom leap day. Serial 60 exists as 1900-02-29, even though that date did not exist. Functions like DATE, WORKDAY.INTL, and holiday-aware calculations have to handle serial conversion, weekend masks, negative workdays, invalid serials, blanks, text dates, and the 1900 edge cases. I ended up writing Excel-generated test workbooks just for date behavior.

Dynamic arrays change the shape of evaluation. Once formulas can return more than one value, evaluation is no longer "one formula equals one scalar." Spill ranges, #SPILL!, #CALC!, array dimensions, broadcasting, and preserving errors all matter. You need the engine to understand not just the result, but where that result is allowed to land.

References are not just A1. Cross-sheet references, named ranges, structured table references, 3D references, whole-row/whole-column references, implicit intersection, and spill references all affect evaluation. Tokenizing Excel formulas is much more annoying than tokenizing normal programming expressions.

The function long tail is real. The first batch of functions gives you confidence. Then you get to financial/date/lookup functions and the edge cases start multiplying. WORKDAY.INTL was one of the functions that made me realize I needed a proper Excel-oracle test strategy instead of "this seems right."

Verification matters more than optimism. My approach has been to generate workbooks with Excel-compatible expected results and then run the engine against those cases. That includes normal cases, error cases, weird coercion cases, date boundary cases, cross-sheet references, and randomized tests where useful.

I am not posting this to sell anything — mostly I wanted to share the engineering side, because "evaluate an Excel formula" sounds simple until you try to match Excel closely enough for real business workbooks.

Happy to answer questions about the parser, evaluator, date system, dynamic arrays, testing strategy, or anything else.

reddit.com
u/Emotional_Mention337 — 24 days ago