r/PostgreSQL

Open source data governance compiler for PostgreSQL
▲ 18 r/PostgreSQL+1 crossposts

Open source data governance compiler for PostgreSQL

I never thought of data governance as a sexy topic. My main focus has always been on performance, insights, cost reduction.

That is, until I joined a startup as the sole data engineer. Dealing with tons of PII/PHI, I realized just how much effort it was to write all these custom tools to handle everything: infinite GRANTs, trigger functions for versioning, cron jobs for retention - and it all needed so much attention and maintenance. Or I could go with an off-the-shelf product that's a complete black-box with a learning curve.

Always one to prefer spending 10x longer automating the task than just doing it, I built a CLI tool that lets users build their DB/governance specs in declarative yamls, and writes all the SQL code for you. And it's open source, fully transparent, as secure as I could conceive of making it, and hopefully super user-friendly too.

I've linked the first release in my repo. Anyone want to try it out?

In the interest of transparency: I did code this with assistance from Claude, but I've been in data engineering for almost 20 years and manually debugged every line. I also got it to build me a suite of over 300 tests that run through GitHub Actions automatically on each commit.

github.com
u/River_Bass — 1 day ago
▲ 2 r/PostgreSQL+1 crossposts

What's your preferred way of migrating postgres databases?

Hi, as I learn about and work with Postgres I will have to do some db migrations.

Most of my servers are managed Postgres but also some are on-prem on vms.

Anyway, when you are asked to migrate Postgres databases, what's your goto procedure?

Backup/restore?

Logical replication?

Cloud based migration services?

reddit.com
u/A55Man-Norway — 1 day ago

TypeGraph: graph queries that compile to a single recursive CTE on Postgres/SQLite (no graph DB needed)

I'm a huge fan of graphs, they tend to simplify a lot of problems (permissions that inherit, content that relates, entities for RAG, etc.) and preserve optionality when the data modeling is uncertain. Most of the time you need a graph, you don't need a graph database. Your app already has a perfectly good SQL DB and you can get pretty far with recursive CTEs. After building this a few dozen times I decided to package it all up with a nice DX and open source it.

TypeGraph (open source, I'm the author) is a graph modeling + query layer that compiles to SQL. It is explicitly not a graph database — you keep your Postgres/SQLite, your transactions, your backups, and you inherit your DB's performance. Comes with some tradeoffs but also a lot of power, like being able to connect from relational to graph.

  • Each algorithm (shortestPath, reachable, canReach, neighbors, degree) compiles to one recursive CTE with cycle detection and depth limits — identical semantics on SQLite and Postgres
  • Postgres CTEs emit NOT MATERIALIZED hints; LIMIT is pushed past GROUP BY in safe aggregation cases
  • Server-side prepared statements (named) cache plans — ~6× faster on multi-hop traversals in my benchmarks
  • refreshStatistics() wraps ANALYZE (per-table on PG) for post-bulk-load plan stability
  • withTransaction(externalTx) shares one transaction across TypeGraph and your existing Drizzle/relational writes — atomic across both models, no data syncing
  • Multi-driver Postgres: node-postgres, postgres-js, Neon WS + HTTP; SQLite via better-sqlite3, libsql, and Cloudflare Durable Objects

Embraces TypeScript and related libraries: a single Zod schema per node/edge is the source of truth for runtime validation, storage, and type inference. Result types come from your select clause. Traversal autocomplete only shows valid target kinds.

It ships a basic ontology with the ontology as data so you can do things like admin.implies(editor) and every query expands automatically, no getEffectivePermissions() duplicated across services.

It does all the fancy AI stuff too (semantic search, fulltext, hybrid retrieval w/ RRF) but it's really just graphs done right on top of a DB you're probably already using

Honest feedback welcome, especially on the type ergonomics.

GitHub: https://github.com/nicia-ai/typegraph · Docs: https://typegraph.dev

reddit.com
u/pdlug — 1 day ago

pg_deltax (δx): Fast time-series extension for PostgreSQL

Hi,

There's something I've been working on for a few months and wanted to share here:

pg_deltax, a fast time-series extension for PostgreSQL. Basically an Apache-licensed Timescale alternative.

I've started working on this project because I wanted a pure-OSS Postgres extensions for time-series. Then I got competitive on ClickBench and wanted to see how fast I can make it.

It’s very early days, but the benchmarks already look really good (see in the README).

It's still slower than analytics-focused DBs like clickhouse or duckdb, but actually not that far behind. Consider getting that kind of performance without ever having to take the data out of Postgres. This is the part that I'm most excited about.

github.com
u/tee-es-gee — 2 days ago

pg-status 2.1.0 — HTTP discovery for PostgreSQL streaming replication, now with read-your-writes

Hi r/PostgreSQL!

I've been working on pg-status, a tiny C microservice that polls your PostgreSQL hosts and exposes their status over HTTP — answers questions like "who's the primary?", "which replica is lagging less than 100 ms?", "which replica has already replayed this specific LSN?".

Wrote about version 1.6.1 here; 2.1.0 is out and the framing of what it's good at became sharper, so I wanted to share an update.

TL;DR — what it is: a sidecar that lives next to your app, polls a static list of PG hosts in the background, and answers HTTP requests in sub-millisecond time. It is not a SQL proxy — your app still connects to Postgres directly, pg-status just tells it which host.

The headline feature: read-your-writes via min_lsn

This is the thing I'd ask you to look at even if you ignore the rest.

After a write to the primary, capture pg_current_wal_lsn() (returns something like 0/3000060). Pass it to pg-status as a query param:

GET /replica?min_lsn=0/3000060

pg-status returns a replica that has provably replayed up to that LSN. If none has, it returns the primary as fallback. You compose this with lag_ms/lag_bytes:

GET /replica?min_lsn=0/3000060&lag_ms=100

This is real read-your-writes:

On the application's side: catch the LSN immediately after write (pg_current_wal_lsn()) and drag it to the next read — through session/cookie/header or Redis, if write and read occur on different nodes of the application. This is the same job as any other read-your-writes approach.

What pg-status does: it keeps fresh replica positions in memory from background polling. When reading, the application makes a single HTTP call instead of round trips to each replica with pg_last_wal_replay_lsn() — and gets the name of the host that has successfully rolled. As far as I know, neither pgpool-II, HAProxy, nor the Patroni REST API have this particular lookup primitive.

What's new

  • min_lsn query param (above)
  • New endpoint /most_sync_by_bytes — deterministic pick of the most-current replica
  • Per-request lag thresholds: ?lag_ms=&lag_bytes=. Although, as before, you can set global thresholds through environment variables.
  • max_fails / possible_dead — host marked dead only after N consecutive fails, but routing immediately avoids possible_dead primaries if a healthier one exists
  • Concurrent non-blocking polling of all hosts through a single poll() syscall (was sequential before — a slow host blocked the rest)

Limitations

  • MAX_HOSTS = 10 is a compile-time cap. If you hit it, please open an issue, easy to bump
  • Streaming replication only
  • Static host list — adding hosts means restart
  • No split-brain quorum. First alive master in pg_status__hosts wins.

Numbers

  • 9 MiB RSS
  • 1600–2000 RPS on 0.1 CPU; 8600–9000 RPS on 1 CPU
  • Fast enough to call on every request

Try it

GitHub: https://github.com/krylosov-aa/pg-status

I will be very grateful if you put a star. Issues and comments are all welcome as well.

Thanks for reading!

u/One-Novel1842 — 2 days ago
▲ 10 r/PostgreSQL+3 crossposts

Database monitoring utilities

We had some bugs in our web application and it led to bad data entering the database and eventually crashing our service. It was happening intermittently and had low priority so it wasn't patched quickly enough. To catch the errors I scripted a quick monitoring utility which ran the needed queries and compared the result sets. This became pretty useful so I created a more refined version and made it open source. It supports Postgres, SQL Server, MySQL and SQLite. Its also available as a docker image if you wanted to try it.

Source code: https://github.com/leoCorso/DBGuard-Web/releases/tag/1.0.0

Do you guys think something like this is useful? Are there any pitfalls or improvements you think of? Looking to improve the utility.

u/taohz — 3 days ago
▲ 17 r/PostgreSQL+2 crossposts

Ultimate guide to POSETTE: An Event for Postgres, 2026 edition

Now in its 5th year, POSETTE: An Event for Postgres 2026 is a free & virtual developer event with 44 talks across 4 livestreams—organized by the Postgres team at Microsoft in partnership with AMD.

No travel budget required. You could just check out the PosetteConf.com schedule & speakers page (and then mark your calendars if it looks useful to you) but this ultimate guide is mean to give you a map to help you find which talks and which livestreams are right for you.

Let me know what you think.

techcommunity.microsoft.com
u/clairegiordano — 3 days ago

Where can I find a course that teaches PostgreSQL DBA fundamentals (and beyond) for a SQL Server DBA?

Hi all,

Our systems are slowly starting to move to PostgreSQL, at least for some services. I'm currently a senior SQL Server DBA and I want to learn PostgreSQL from the DBA side.

Over the past few days I've been trying to find a course that walks through the database step by step, but I can't find anything good or consistent. I've tried Udemy and YouTube and it's all over the place. I can't find a structured course on this.

Does anyone have a recommendation? I'm willing to pay a good amount of money for a quality course, I'm not looking for something free. But it has to be DBA-focused, not developer-focused. I've already picked up most of the new syntax from W3Schools, and after years of T-SQL it's not too hard to adjust to.

Thanks.

reddit.com
u/Dimonzr — 5 days ago

How we turned a 40+ minute startup query into 5 seconds on a 10-schema, 80k-table Postgres setup

Story behind a fix that just shipped in JobRunr 8.6.0 that other folks doing JDBC-side table validation might find useful.

A user reported that on their Postgres setup (10 schemas, ~8000 tables each, so roughly 80k tables total) just validating which JobRunr tables existed during application startup was taking over 40 minutes.

The offending code was straightforward:

ResultSet tables = conn.getMetaData().getTables(catalog, null, "%", null);

That "%" pattern pulls metadata for every table in every schema in the catalog. On a small Postgres database it's instant but on a database with tens of thousands of tables, it's a pg_class / pg_namespace scan can take a while...

The fix: filter by name pattern at the metadata-query level instead. We let DatabaseMetaData tell us how identifiers are stored (upper / lower / mixed case) and pass a narrowed pattern:

DatabaseMetaData md = conn.getMetaData();
String pattern = "%";
if (md.storesMixedCaseIdentifiers()) pattern = "%";
else if (md.storesUpperCaseIdentifiers()) pattern = "%JOBRUNR%";
else if (md.storesLowerCaseIdentifiers()) pattern = "%jobrunr%";
ResultSet tables = md.getTables(catalog, null, pattern, null);

On the same 80k-table Postgres setup, validation went from 40+ minutes to under 5 seconds.

The identifier-casing dance matters because Postgres folds unquoted identifiers to lowercase, Oracle / DB2 fold to uppercase, and mixed-case databases like MS SQL keep them as written. A blanket %JOBRUNR% pattern would miss tables on Postgres, and %jobrunr% would miss them on Oracle.

For anyone hitting similar slowness on getMetaData().getTables() in their own apps, this is the lever to pull.

PR: https://github.com/jobrunr/jobrunr/pull/1539
Original issue with the user's reproduction: https://github.com/jobrunr/jobrunr/issues/1538

(JobRunr is an open-source background job library for Java if you haven't heard of it. This post isn't really about JobRunr, just a Postgres / JDBC startup-perf pattern that's worth sharing.)

reddit.com
u/JobRunrHQ — 7 days ago

Another tui for postgres

I was recently laid off and that gave me time to finally fix a headache I had for quite a while. There is already a lazysql but its UI felt a bit clunky and having dabbled in bubbletea before the UI felt way too complicated which pushed me to write my own thing and why not I love go and postgres

https://github.com/umairabid/lazysql

u/progodevil — 5 days ago
▲ 11 r/PostgreSQL+2 crossposts

Where do I refresh my skills after some years

I'm working as a Software Engineer (Frontend) and didn't work with Databases that deep for so many years. Last time was 6-7 years ago in University and I was working with MySQL

Now I'm gonna work on my side projects and I want to know about everything on the Database side and also use them at my work. I totally forgot so many topics.

Right now the tool which I'm gonna use is Supabase and PostgreSQL
I found this course on FrontendMaster (Which I have subscription) but it's for 7 hours.

Does anyone recommend any other courses or better ways so I can go through all the topics and not miss anything? (I generally like watching courses)

u/Big-Walrus6334 — 8 days ago
▲ 20 r/PostgreSQL+5 crossposts

My Orange Pi 5 Plus just ran a full SEO campaign for a local business, $0 in hosting costs making $100/month for a pilot program!

The Orange Pi 5 Plus in my stack serves as the Ollama model server, running qwen3.5:4b at about 7GB RAM. This week it powered something I'm pretty proud of.

My AI crew ran a complete local SEO campaign for a tattoo shop in San Diego. Here's what it generated automatically:

→ 8 keyword-optimized landing pages targeting San Diego

tattoo searches

→ 3 blog posts with proper meta descriptions and local

keyword targeting

→ 4 weeks of Google Business Profile posts

→ Review request SMS templates for the artists to send

after every appointment

→ Competitor research logged to PostgreSQL

The Orange Pi handled all the inference. Total AI cost for the content generation: about 30 cents in Claude API calls for the final polish pass.

A traditional SEO agency charges $1,500-2,500/month for this. The whole pipeline runs automatically every Monday morning via cron job.

Just posted a full video walkthrough showing exactly how it works, the agent architecture, the PostgreSQL schema, the Squarespace implementation, everything.

https://www.youtube.com/watch?v=a0NXVsqu5jQ

What are you all running on your Orange Pi?

u/Weird_Night_2176 — 11 days ago
▲ 25 r/PostgreSQL+4 crossposts

paradedb/benchmarker: a workload agnostic, multi-backend benchmarking tool.

Hi r/postgresql!

We just open sourced ParadeDB Benchmarker, a multi-backend benchmarking framework built on top of the excellent Grafana k6 (blog post).

One of the goals was avoiding a shared query abstraction layer. PostgreSQL queries stay PostgreSQL queries, with their own driver and native SQL.

Supports PostgreSQL, Elasticsearch, OpenSearch, ClickHouse, MongoDB, and ParadeDB with:

  • mixed read/write workloads
  • support for docker-compose profiles per backend
  • dataset loader
  • config and setup capture
  • live metrics + exported reports

One of the ah-ha moments I had building this was using the pgx Go driver in anger for the first time, I'm a Rust guy, but I'm seriously impressed with pgx and what it can do.

Any comments welcome, we will be using this to benchmark ParadeDB, but you can write your own datasets and workloads which have nothing to do with full-text search.

github.com
u/jamesgresql — 9 days ago