u/One-Novel1842

pg-status 2.1.0 — HTTP discovery for PostgreSQL streaming replication, now with read-your-writes

Hi r/PostgreSQL!

I've been working on pg-status, a tiny C microservice that polls your PostgreSQL hosts and exposes their status over HTTP. It answers questions like "Who's the primary?", "Which replica is lagging by less than 100 ms?", and "Which replica has already replayed this specific LSN?"

I wrote about version 1.6.1 here; 2.1.0 is now out, and my framing of what it's good at has become sharper, so I wanted to share an update.

TL;DR — what it is: a sidecar that lives next to your app, polls a static list of PG hosts in the background, and answers HTTP requests in sub-millisecond time. It is not a SQL proxy — your app still connects to Postgres directly, pg-status just tells it which host.

The headline feature: read-your-writes via min_lsn

This is the thing I'd ask you to look at even if you ignore the rest.

After a write to the primary, capture pg_current_wal_lsn() (returns something like 0/3000060). Pass it to pg-status as a query param:

GET /replica?min_lsn=0/3000060

pg-status returns a replica that has provably replayed at least up to that LSN. If none has, it falls back to the primary. You can combine this with lag_ms/lag_bytes:

GET /replica?min_lsn=0/3000060&lag_ms=100
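
To make the min_lsn check concrete, here is a minimal Python sketch of the comparison involved. It is an illustration only, not pg-status's actual C code: the function names and host names are made up, and the only fact it relies on is that a PostgreSQL LSN is printed as two hexadecimal numbers (high and low 32 bits) separated by a slash.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000060' to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_replica(replayed: dict[str, str], primary: str, min_lsn: str) -> str:
    """Return a replica whose replayed LSN is >= min_lsn, else the primary.

    `replayed` maps a (hypothetical) replica name to its last polled
    replay position, e.g. the result of pg_last_wal_replay_lsn().
    """
    target = parse_lsn(min_lsn)
    for host, lsn in replayed.items():
        if parse_lsn(lsn) >= target:
            return host
    return primary  # no replica has caught up yet, so fall back


positions = {"replica-a": "0/2FFFFF0", "replica-b": "0/3000100"}
print(pick_replica(positions, "primary-1", "0/3000060"))  # replica-b
print(pick_replica(positions, "primary-1", "0/4000000"))  # primary-1
```

Comparing the parsed integers rather than the strings matters: "0/FFFFFFFF" sorts after "1/0" as text, but it is the earlier position.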

This is real read-your-writes:

On the application side: capture the LSN immediately after the write (pg_current_wal_lsn()) and carry it to the next read, via a session, cookie, or header, or via Redis if the write and the read happen on different application nodes. Any read-your-writes approach needs this step.

What pg-status does: it keeps fresh replica positions in memory from background polling. On a read, the application makes a single HTTP call instead of a round trip to each replica with pg_last_wal_replay_lsn(), and gets back the name of a host that has already replayed that LSN. As far as I know, none of pgpool-II, HAProxy, or the Patroni REST API offers this particular lookup primitive.
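
The application half of this flow could look roughly like the Python sketch below. Treat it as a sketch under assumptions: the pg-status address (http://localhost:8000) and the helper names are hypothetical, and the cursor is any DB-API style cursor opened on the primary. Fetching the URL and reading the response are left out because the response format isn't described in this post.

```python
from urllib.parse import urlencode

PG_STATUS_URL = "http://localhost:8000"  # hypothetical sidecar address


def capture_lsn(cursor) -> str:
    """Run right after the write, on the same connection to the primary."""
    cursor.execute("SELECT pg_current_wal_lsn()")
    return str(cursor.fetchone()[0])


def replica_lookup_url(min_lsn=None, lag_ms=None) -> str:
    """Build the /replica lookup URL for the next read."""
    params = {}
    if min_lsn is not None:
        params["min_lsn"] = min_lsn
    if lag_ms is not None:
        params["lag_ms"] = lag_ms
    # safe="/" keeps the slash in the LSN readable, as in the examples above
    query = urlencode(params, safe="/")
    return f"{PG_STATUS_URL}/replica" + (f"?{query}" if query else "")


# The LSN would normally travel in a session, cookie, header, or Redis.
print(replica_lookup_url("0/3000060", lag_ms=100))
# http://localhost:8000/replica?min_lsn=0/3000060&lag_ms=100
```

The app then issues a GET to that URL and opens its read connection to whichever host pg-status names.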

What's new

  • min_lsn query param (above)
  • New endpoint /most_sync_by_bytes — deterministic pick of the most-current replica
  • Per-request lag thresholds: ?lag_ms=&lag_bytes=. As before, you can also set global thresholds through environment variables.
  • max_fails / possible_dead: a host is marked dead only after N consecutive failed checks, but routing immediately avoids possible_dead primaries if a healthier one exists
  • Concurrent non-blocking polling of all hosts through a single poll() syscall (was sequential before — a slow host blocked the rest)
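
For the max_fails / possible_dead item, here is one way to picture the behaviour in Python. This is my reading of the rule as a toy model, not the real C implementation: the state names "alive" and "dead", the MAX_FAILS value, and the Host class are all assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_FAILS = 3  # hypothetical threshold (the "N" in the bullet above)


@dataclass
class Host:
    name: str
    is_primary: bool = False
    fails: int = 0  # consecutive failed polls

    def record(self, ok: bool) -> None:
        """A successful poll resets the counter; a failed one increments it."""
        self.fails = 0 if ok else self.fails + 1

    @property
    def state(self) -> str:
        if self.fails == 0:
            return "alive"
        return "dead" if self.fails >= MAX_FAILS else "possible_dead"


def pick_primary(hosts):
    """First non-dead primary in list order, preferring fully alive ones."""
    candidates = [h for h in hosts if h.is_primary and h.state != "dead"]
    healthy = [h for h in candidates if h.state == "alive"]
    chosen = healthy or candidates
    return chosen[0].name if chosen else None


a, b = Host("pg-a", is_primary=True), Host("pg-b", is_primary=True)
a.record(False)              # one failed poll: pg-a is only possible_dead
print(a.state)               # possible_dead
print(pick_primary([a, b]))  # pg-b
```

The point of the middle state is that a single missed poll doesn't evict a host outright, yet traffic still moves away from it as soon as a healthier option is available.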

Limitations

  • MAX_HOSTS = 10 is a compile-time cap. If you hit it, please open an issue; it's easy to bump
  • Streaming replication only
  • Static host list — adding hosts means restart
  • No split-brain quorum. The first live primary listed in pg_status__hosts wins.

Numbers

  • 9 MiB RSS
  • 1600–2000 RPS on 0.1 CPU; 8600–9000 RPS on 1 CPU
  • Fast enough to call on every request

Try it

GitHub: https://github.com/krylosov-aa/pg-status

I'd be very grateful for a star. Issues and comments are welcome too.

Thanks for reading!

u/One-Novel1842 — 3 days ago