u/IlPresidente995

r/databasedevelopment

The case for Direct I/O - why it matters for high performance storage

Hello everyone,

I recently published HedgeDB, my high-performance persistent key-value store, on GitHub.

Internally, it uses Direct I/O (O_DIRECT) almost everywhere. In this article I explain the reasons behind this choice, which was also motivated by some fun experiments I ran with fio (you can find them in the article), and I share some considerations about the page cache.
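For readers who have not used O_DIRECT before, here is a minimal sketch of what it asks of the caller on Linux: the buffer address, the transfer length, and the file offset generally all have to be aligned to the device's logical block size. This is not HedgeDB's code. The names `kBlockSize`, `align_up`, `alloc_aligned`, and `read_direct` are hypothetical, and the 4096-byte block size is an assumption (the real requirement depends on the device and filesystem).

```cpp
#include <fcntl.h>     // open, O_DIRECT (Linux-specific flag)
#include <stdlib.h>    // posix_memalign, free
#include <unistd.h>    // pread, close
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed logical block size; query the device for the real value in practice.
constexpr std::size_t kBlockSize = 4096;

// Round n up to the next multiple of kBlockSize.
constexpr std::size_t align_up(std::size_t n) {
    return (n + kBlockSize - 1) / kBlockSize * kBlockSize;
}

// Allocate a buffer whose address and size are both block-aligned.
// Returns nullptr on failure; the caller releases it with free().
inline void* alloc_aligned(std::size_t bytes) {
    void* p = nullptr;
    if (posix_memalign(&p, kBlockSize, align_up(bytes)) != 0) return nullptr;
    return p;
}

// Read `len` bytes at `offset`, bypassing the page cache.
// Returns the number of bytes read, or -1 on error (for example when the
// filesystem does not support O_DIRECT).
inline ssize_t read_direct(const char* path, void* buf,
                           std::size_t len, off_t offset) {
    // O_DIRECT rejects misaligned requests, so check the preconditions early.
    assert(len % kBlockSize == 0);
    assert(offset % static_cast<off_t>(kBlockSize) == 0);
    assert(reinterpret_cast<std::uintptr_t>(buf) % kBlockSize == 0);

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) return -1;
    ssize_t n = pread(fd, buf, len, offset);
    close(fd);
    return n;
}
```

A caller would allocate with `alloc_aligned(align_up(size))`, pass that buffer to `read_direct`, and do any caching itself, since the kernel page cache is no longer involved.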

fede-vaccaro.github.io
u/IlPresidente995 — 9 days ago

I built HedgeDB, a high-performance and persisted Key Value store

Hello fellow people from r/databasedevelopment! After many months of late-night experiments, I'm happy to share with you the first version of HedgeDB, a high-performance persistent key-value store (loosely) inspired by RocksDB!

The project was born because, while working with RocksDB, I grew a bit unhappy with its code bloat and with how hard a time it has keeping up with modern NVMe devices. So I decided to have a go at reinventing the wheel.

Here is the repo on GitHub. I also spent some time preparing a few articles on hedgedb.github.io about architecture design trade-offs, including a performance comparison between HedgeDB and RocksDB (hopefully the bundled benchmark is "standard enough").

Features and core design

HedgeDB is an LSM-Tree engine designed to saturate the NVMe device. Inspired by RocksDB, the engine targets write-heavy workloads with uniformly-distributed keys (UUIDs, hashes), and is structured around:

  • Asynchronous execution. io_uring + C++20 coroutines via TooManyCooks, a fast work-stealing coroutine threadpool.
  • Partitioned LSM-tree. The key space is sharded into 2^N independent partitions (default 16). Compactions on different partitions run fully in parallel.
  • Size-tiered compaction. Lower write amplification than leveled, with a quotient filter on the read path to skip SSTs that can't contain a key.
  • Per-thread WAL. Each writer thread owns its own WAL file, so inode contention is avoided.
  • Direct I/O. O_DIRECT everywhere on the SST path: predictable latencies and transparent memory usage, avoiding IO stalls from page-cache pressure.
  • MVCC. Snapshot isolation over range scans.
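
To make the partitioning idea concrete, here is a minimal sketch of one way to route a key to one of 2^N partitions: take the top N bits of the key's first 8 bytes. This is an illustration only, not HedgeDB's routing code. The helpers `key_prefix` and `partition_of` are hypothetical, and using the key's leading bits (rather than, say, a hash) is an assumption that only spreads load evenly when keys are uniformly distributed, as with UUIDs or hashes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper: interpret the first 8 bytes of a key as a big-endian
// integer. Keys shorter than 8 bytes are zero-padded on the right.
inline std::uint64_t key_prefix(std::string_view key) {
    std::uint64_t prefix = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        prefix <<= 8;
        if (i < key.size()) prefix |= static_cast<unsigned char>(key[i]);
    }
    return prefix;
}

// Map a key to one of 2^bits partitions using the top `bits` bits of its
// prefix. Each partition then covers one contiguous slice of the key space.
inline std::uint32_t partition_of(std::string_view key, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    return static_cast<std::uint32_t>(key_prefix(key) >> (64 - bits));
}
```

With `bits = 4` there are 16 partitions, and a key starting with byte `0xA5` lands in partition 10 (`0xA`). Because this mapping is monotonic in the key, partitions stay ordered relative to each other, which is convenient for range scans; a hash-based mapping would balance skewed keys better but lose that ordering.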

Before you ask, this is not some auto-generated AI slop. I did use coding agents and chatbots for research, prototyping, testing support, and help with proving the correctness of some sections, but generated code always went through a phase of heavy manual refinement and refactoring.

I hope you will find it interesting!

If you're interested in the project, want to know more, or need anything, we can keep in touch on the Discord channel!

u/IlPresidente995 — 12 days ago