u/Altruistic-Bug-3145

Deflux - streaming ODS parser for .NET, pause/resume across process restarts, parallel work across sheets, instant sheet switching in large ODS files via checkpointable streaming.
▲ 4 r/dotnet

Built this because I needed to parse large ODS files (up to 1 GB in practice) with the ability to stop, persist progress to disk, restart the process, and continue exactly where I left off - without re-decompressing or re-parsing anything.

ODS keeps all sheets inside a single content.xml entry, so jumping to a specific sheet normally means decompressing everything before it. Deflux does one ScanSheets() pass that snapshots a checkpoint at the start of each sheet (45 KB each), and OpenSheet(name) then restores instantly without re-decompressing.

Checkpoints are fully self-contained byte arrays - they outlive the process, and any reader (in the same process or a different one) can restore from them. Parallel processing is just "open the file N times, restore each reader to a different checkpoint" - no shared state, no coordination.
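To make the "scan once, then restore N independent readers" idea concrete, here is a minimal sketch in Python. It is not Deflux's API: it uses the standard library's `zlib.decompressobj().copy()` as a stand-in for a checkpoint, and the function names (`scan`, `read_segment`, `parallel_read`) are made up for illustration. Unlike Deflux's byte-array checkpoints, these snapshots live only in memory, so the sketch uses threads rather than separate processes, and it checkpoints at fixed compressed-chunk boundaries rather than at sheet starts.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def deflate_raw(data: bytes) -> bytes:
    """Compress to a raw DEFLATE stream (no zlib/gzip header), as ZIP entries use."""
    compressor = zlib.compressobj(level=6, wbits=-15)
    return compressor.compress(data) + compressor.flush()


def scan(compressed: bytes, chunk: int):
    """Single sequential pass: snapshot the inflater before every input chunk."""
    reader = zlib.decompressobj(wbits=-15)
    checkpoints = []
    for offset in range(0, len(compressed), chunk):
        checkpoints.append((offset, reader.copy()))  # (compressed offset, snapshot)
        reader.decompress(compressed[offset:offset + chunk])  # output discarded
    return checkpoints


def read_segment(compressed: bytes, start: int, end: int, snapshot) -> bytes:
    """Restore a private reader from the snapshot and inflate one input range."""
    reader = snapshot.copy()  # each worker gets its own state, nothing is shared
    return reader.decompress(compressed[start:end])


def parallel_read(compressed: bytes, chunk: int = 4096, workers: int = 4) -> bytes:
    """Inflate every segment from its own checkpoint and join the results in order."""
    checkpoints = scan(compressed, chunk)
    ends = [offset for offset, _ in checkpoints[1:]] + [len(compressed)]
    jobs = [(start, end, snap) for (start, snap), end in zip(checkpoints, ends)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda job: read_segment(compressed, *job), jobs)
        return b"".join(parts)
```

Each worker only needs its snapshot and its slice of the compressed input, which is why no locking or coordination shows up anywhere in the sketch.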

The checkpoint captures the full vertical state: DEFLATE sliding window + Huffman trees + bit buffer, plus the XML parser's element stack and namespace bindings. Restore seeks the compressed stream to the saved bit position and rebuilds state.
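As a rough picture of what such a checkpoint might hold, here is a sketch in Python. Every field name and the JSON encoding are assumptions made for illustration; only the categories of state (sliding window, Huffman trees, bit buffer, bit position, element stack, namespace bindings) come from the description above. The Huffman trees are stored as code-length lists, since DEFLATE's canonical codes can be rebuilt from lengths alone.

```python
import base64
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Checkpoint:
    # All field names are hypothetical; this is not Deflux's actual layout.
    bit_position: int   # offset into the compressed stream, in bits
    bit_buffer: int     # bits already fetched from the stream but not yet consumed
    bit_count: int      # number of valid bits in bit_buffer
    window: bytes       # DEFLATE sliding window (at most 32 KiB)
    litlen_code_lengths: list[int] = field(default_factory=list)  # literal/length tree
    dist_code_lengths: list[int] = field(default_factory=list)    # distance tree
    element_stack: list[str] = field(default_factory=list)        # open XML elements, outermost first
    namespaces: dict[str, str] = field(default_factory=dict)      # prefix -> namespace URI

    def to_bytes(self) -> bytes:
        """Serialize to a self-contained byte array that can be written to disk."""
        payload = asdict(self)
        payload["window"] = base64.b64encode(self.window).decode("ascii")
        return json.dumps(payload).encode("utf-8")

    @classmethod
    def from_bytes(cls, blob: bytes) -> "Checkpoint":
        """Rebuild a checkpoint from bytes produced by to_bytes()."""
        payload = json.loads(blob.decode("utf-8"))
        payload["window"] = base64.b64decode(payload["window"])
        return cls(**payload)
```

A real implementation would likely use a compact binary layout; JSON is used here only to keep the round trip easy to read.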

Invariant:

Read(0→P) + Save(P) + [restart] + Restore(P) + Read(P→end) === Read(0→end)
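The invariant can be checked on the DEFLATE layer alone with a small Python sketch. This is an analogy, not Deflux code: `zlib`'s `copy()` snapshot exists only in memory, so the "restart" is simulated by discarding the original reader, and P here is a byte offset into the compressed input rather than a bit position. The helper names are hypothetical.

```python
import zlib


def deflate_raw(data: bytes) -> bytes:
    """Compress to a raw DEFLATE stream (no zlib/gzip header)."""
    compressor = zlib.compressobj(level=6, wbits=-15)
    return compressor.compress(data) + compressor.flush()


def read_all(compressed: bytes) -> bytes:
    """Read(0→end): inflate the whole stream in one go."""
    reader = zlib.decompressobj(wbits=-15)
    return reader.decompress(compressed) + reader.flush()


def read_with_checkpoint(compressed: bytes, p: int) -> bytes:
    """Inflate up to compressed offset p, snapshot, drop the reader, resume from the snapshot."""
    reader = zlib.decompressobj(wbits=-15)
    head = reader.decompress(compressed[:p])   # Read(0→P)
    saved = reader.copy()                      # Save(P)
    del reader                                 # stand-in for the process restart
    restored = saved.copy()                    # Restore(P)
    tail = restored.decompress(compressed[p:]) + restored.flush()  # Read(P→end)
    return head + tail
```

If the invariant holds, `read_with_checkpoint(compressed, p)` equals `read_all(compressed)` for any split point `p` inside the stream.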

Forked SharpZipLib's inflater to access internal state - no algorithmic changes, just exposed the fields needed for serialization.

Pure C#, .NET 8+.

https://github.com/daniilvaino/Deflux

Happy to answer questions. Curious if this co-checkpointing approach would be of interest to authors of perf-focused readers like Sylvan.Data.Excel.

u/Altruistic-Bug-3145 — 7 days ago