r/PostgreSQL

▲ 34 r/PostgreSQL+1 crossposts

Understanding Postgres 19 Property Graphs

▲ 14 r/PostgreSQL+1 crossposts

Thoughts on new LTAP/Lakebase arch

Read a Databricks piece on Lakebase/LTAP and wrote a short note on the idea that clicked for me: maybe OLTP and OLAP should meet at storage, not inside one engine.

https://ameeer.in/posts/ltap-storage-layer/

reddit.com

u/ExmachinaCoffee — 16 hours ago

▲ 3 r/PostgreSQL+1 crossposts

Noob here

I've been self-teaching and practicing SQL, Power BI, Python, for about three months now. I also know how to use Excel as well.

Is there anything else I should add to my skill set, or are these enough to help me land an entry level job? I don't have any professional experience with (Postgre)SQL, PB, or Python, and I don't have any formal education in those subjects.

reddit.com

u/Putrid_Winn — 15 hours ago

▲ 5 r/PostgreSQL+2 crossposts

Graphical Version: Rethinking Database Storage: From Monolith to Lakebase and LTAP by Reynold Xin

Using NotebookLM, I turned Reynold Xin's blog into a nice deck. I hope this graphical version is more consumable to some of the folks who prefer to read infographics like myself. Enjoy!

https://medium.com/@jasonyip_77999/rethinking-database-storage-from-monolith-to-lakebase-and-ltap-by-reynold-xin-graphical-version-0d362e382142

Original blog:

https://www.databricks.com/blog/lakebase-ltap-rethinking-database-storage

reddit.com

u/CelebrationSea9296 — 21 hours ago

▲ 0 r/PostgreSQL

a single NUL byte (0x00) in a TEXT column rolled back our entire 500-row batch insert

third-party webhook payloads we ingest occasionally contained a literal NUL byte (U+0000). Postgres text and varchar columns reject it outright, and we were inserting via a single multi-row statement of roughly 500 rows at a time.

one bad row failed the whole statement, so the entire batch rolled back. the symptom was intermittent, cryptic insert errors with no clean pattern, because it only fired on the batches where a payload happened to carry a NUL.

the fix was a strip-and-normalize pass at the data boundary, before the batch insert, not at the database. by the time Postgres rejects the byte, you've already attempted and lost the whole batch.

broader takeaway I took from it: sanitize anything from an external system between "it arrived" and "it enters the pipeline," because the DB is too far downstream to catch this without collateral damage.

curious how others handle invalid bytes in text columns in practice. do you strip upstream like we did, lean on a domain or check constraint, or insert row-by-row so one bad row can't take the batch with it?

reddit.com

u/kumard3 — 1 day ago

▲ 113 r/PostgreSQL

Postgres DB TUI

If not allowed, please delete.

Decided to create a fully featured DB TUI client for Postgres.

Fully customisable vim-style keybindings + a heap of other features.

https://github.com/davesavic/pgsavvy

Hopefully you find it as useful as I have!

u/Savalonavic — 3 days ago

▲ 46 r/PostgreSQL+1 crossposts

10x smaller vector indexes in pgvector

I added the TurboQuant algorithm published by Google to pgvector as part of my discovery and learning process with RAG systems. Just this past weekend, I ran a test with the 100M row Wikipedia dataset from Cohere where I observed a 10x reduction in index size relative to HNSW. I figure with the direction RAM and storage prices have been going, we could use some more ways to save space!

github.com

u/Zirh — 3 days ago

▲ 8 r/PostgreSQL+1 crossposts

Bitemporal time-travel + truth-maintenance-style provenance retraction on Postgres/SQLite (open-source TS graph library)

I just shipped bitemporal provenance for TypeGraph, my open-source graphs-on-SQL library. Three pieces, usable independently but most powerful together:

Valid time: when a fact was true in the world (an invoice's effective date, a role grant's window).
Recorded/system time: when the system captured that fact (what you knew, as of a commit instant; the SQL:2011 FOR SYSTEM_TIME / Datomic system-time axis).
Provenance: why the system still believes a derived fact, and what happens downstream when a source it depended on turns out to be wrong.

Derived facts are the annoying case that surfaces the issue(s) these primitives solve. For example, a Vulnerability node exists because a scanner and a vendor advisory both pointed at it. The graph concluded it; nobody asserted it directly.

ScannerSource ──┐
                ├──▶ Vulnerability (CVE-2026-1234, libvector)
VendorSource  ──┘

So when the scanner turns out to be garbage, you can't treat retracting it as a delete. The vendor might still back that vulnerability. The scanner might have been the only thing propping up a bunch of other facts. You want the graph to sort out which.

What you want: retract a source and it recomputes which derived facts still have grounded support. Retract the vendor too and the vulnerability finally goes non-current, and a "block the deploy" decision sitting on top of it goes with it.

The behavior, then the theory

A fact stays believed while it has at least one justification whose premises are all still supported. Premises bottom out at sources. Retract a source and every justification that leaned on it stops counting; a fact loses currency only once it runs out of surviving justifications.

const provenance = createRetractionCapability(store, {
  source: { kinds: ["ScannerSource", "VendorSource"] },
  justification: { kind: "Justification" },
  fact: { kinds: ["Vulnerability", "DeployDecision"] },
  premiseOf: { kind: "premiseOf" },
  derives: { kind: "derives" },
});

const report = await provenance.retract({ kind: "VendorSource", id: vendorId });
// report.died:        facts that lost all grounded support
// report.survivedVia: facts that still have an alternate justification

This is modeled on truth-maintenance systems. The storage follows the JTMS shape (Doyle 1979, "A Truth Maintenance System"): AND-justifications over premises, sources at the bottom, a fact in the well-founded support set only if some justification has all its premises supported. I use the monotonic, inlist-only fragment, so this is the easy part of Doyle's system; the hard part, non-monotonic belief revision, isn't here. The question retract actually answers, "which facts survive because an alternate justification still holds," is the ATMS question (de Kleer 1986): which combinations of sources hold each fact up. So it's JTMS-shaped storage with an ATMS-flavored query.

Retraction is a normal write, so you get replay for free

Retraction doesn't hard-delete. It recomputes support and flips unsupported facts to non-current, leaving the justification edges in place so you can still see why something used to be believed. Because that write lands on TypeGraph's recorded-time (system-time) substrate, you can replay the belief transition:

const before = await store.recordedNow();
await provenance.retract(badSource);
const after = await store.recordedNow();

await store.asOfRecorded(before).nodes.Vulnerability.getById(id); // believed
await store.asOfRecorded(after).nodes.Vulnerability.getById(id);  // not current

TypeGraph tracks both temporal axes as explicit read lenses, valid time ("when true in the world") and recorded time ("when the database learned it"), and because they're lenses they compose:

store.asOf(validTime).asOfRecorded(recordedTime)

Architecture

No engine-native temporal tables. Postgres needs an extension for system-versioning and SQLite has nothing, so TypeGraph stores history explicitly and reconstructs point-in-time views in the query compiler. That's why one implementation runs on both backends.

Limits

Only TypeGraph-managed writes are captured. Raw SQL bypasses it; this isn't a database-level CDC/audit layer.
No backfill. Enable history on a fresh graph.
Point-in-time reads reconstruct from history relations, so they're slower than current-state reads. It's an audit tool, keep it off hot paths.
Per-write overhead runs ~2.5–6x unless you batch writes in one transaction, where it drops to ~1–1.5x.

A naming note

My asOf is valid time, the reverse of SQL:2011 FOR SYSTEM_TIME AS OF and Datomic (d/as-of db t), where a bare as-of is system time. Valid-time reads are the common case here so they took the short name; system time is asOfRecorded.

I'd love to compare with other systems that handle provenance retraction, or truth maintenance generally, modeled directly on ordinary SQL tables instead of a dedicated reasoning engine. There's plenty of JTMS/ATMS literature but not much on mapping it onto relational storage. Pointers welcome.

GitHub: https://github.com/nicia-ai/typegraph Docs: https://typegraph.dev/provenance

Examples: https://typegraph.dev/examples/provenance-retraction/ https://typegraph.dev/examples/bitemporal-time-travel/

u/pdlug — 3 days ago

▲ 5 r/PostgreSQL

Veteran SQL writer, noob DBA, need help restoring a PostgreSQL DB

Hello all - I'm new here. Please be kind and excuse any transgressions of customs here that I'm ignorant of.

My problem in a nutshell - I reimaged my boot drive with Windows 11 (was windows 10). Now I want to convince Postgres server running on my machine to find / use my old existing data which is on a separate drive.

I need help with a couple things:

There are a couple places on the data drive which might be the instance I'm looking for.
- Is there a quick way to look at something in the folder/subfolder to identify the data at a high level (like a list of schemas, for example).
- Otherwise, I can just try each of them until I say "eureka!"
I think I was running version 16 before the change. But I could be wrong, it might have been 18.
I've installed both 16 and 18 Postgres on my new Windows 11 OS and they are listening on different ports. (5432 and 5433 respectively)
I have no idea what to do to configure the server instance(s) to "go look over there for your data".
I have installed pgAdmin in both 16 and 18, if that helps.
I'm pretty competent at using the DBeaver client, if that helps.
Relatively low importance - but it does represent a ton of lost work if I can't recover.

If anyone thinks they can talk me through it, I would appreciate the help.

Thanks in advance!

reddit.com

u/Emptor66 — 3 days ago

▲ 11 r/PostgreSQL+1 crossposts

plPHP v2.0 released

PL/php is a procedural-language handler that lets you write database functions in PHP, stored and executed inside PostgreSQL. You get the convenience of PHP's standard library with the full power of a native PostgreSQL function — plain functions, set-returning functions, triggers, event triggers, and procedures with transaction control.

github.com

u/linuxhiker — 4 days ago

▲ 0 r/PostgreSQL

PgAdmin helped me learn actual SQL and how DBeaver or Beekeeper Studio actually hurts

PGAdmin is so confusing and obnoxious that 99.9% of the time you will actually want to write SQL by hand. This tremendiously helps learn SQL and its many secrets that you otherwise will never get to see

This is why I feel hurt by DBeaver and Beekeeper Studio. Their UI is so good, so elegant, its easy and non confusing to do most things you actually never see the beauty of raw SQL.

reddit.com

u/earthscorpioanchapie — 5 days ago

▲ 0 r/PostgreSQL

What postgre db provider (EU) for 20GB expanding every month?

Two users obly. Data scraping project.

reddit.com

u/Appropriate_Papaya_7 — 5 days ago

▲ 8 r/PostgreSQL

I need help migrating a MYSQL database to Postgres

I have been handed a project that uses mysql and the db already has some data that cant be lost. i want to migrate the database to postgres because i'm more comfortable working with mysql and it doesn't have some annoying nuances. i have a dump file from the db and i have explored a lot of options like using pgloader, but pgloader doesnt work with this error

2026-06-30T15:59:44.235006Z ERROR mysql: Failed to connect to mysql at "localhost" (port 3306) as user "pgloader": Condition QMYND:MYSQL-UNSUPPORTED-AUTHENTICATION was signalled. (pgloader is a user i created just to by pass the caching_sha2_password issue ).

i have tried to manually modify the dump sql file and alter all the conflicts that may arise from things like datetime, bigint, enums and the id's not incrementing from the max id etc. i'm pretty sure there's going to be a bug and i need a better way to achieve this. Are there any services like pgloader that can make this easier. thanks in advance!

Edit: I found a solution with pgloader v4.0.0 and it worked by creating a migration.load file to specify the type casting I wanted etc. and running it as specified in the docs

reddit.com

u/MrGuam — 6 days ago

▲ 10 r/PostgreSQL

Scaling PgBouncer across every core with SO_REUSEPORT and peering

clickhouse.com

u/Admirable_Morning874 — 4 days ago

▲ 79 r/PostgreSQL+2 crossposts

I looked into how Lakebase LTAP works exactly, to save you some research

Databricks spent most of the Summit keynote telling us LTAP means "no more pipelines, no more ETL, one copy of data." Fine. I've heard "no more ETL" enough times to be suspicious of it on reflex. But I got curious about the one part nobody really spells out in the press releases: if your app is writing plain Postgres rows, how does that same data show up in Iceberg as columns, fast enough to query, without a pipeline you can actually see? So I went reading. Here's the mechanism as best I can piece it together.

First, why this is even a problem. Postgres stores data by row. Everything about one record sits together on disk, which is exactly what you want for "give me user 48213 and update their balance", since you touch one row and you're done. Analytics wants the opposite. "Average order value over the last 90 days" only needs one column out of forty, but in a row store you still drag every row (and every other column) off disk to get at it. Columnar formats like Parquet flip the layout so each column is stored together:

Row store (Postgres):

[id=1, name=Ana, amount=50]

[id=2, name=Ben, amount=80]

[id=3, name=Cy, amount=20]

Column store (Parquet / Iceberg):

id: 1, 2, 3

name: Ana, Ben, Cy

amount: 50, 80, 20

Now "sum the amounts" reads one tidy contiguous list and skips everything else. It also compresses far better, because similar values end up sitting next to each other. Old Lakebase basically kept Postgres data in Postgres format on object storage, so you still needed a conversion step before the analytical engines could do anything useful with it. LTAP's whole pitch is killing that step.

The thing actually doing the work is Moonlink, a component from the Mooncake team Databricks bought last year. It's a replication engine (written in Rust, for what it's worth). It taps Postgres's logical replication stream, the same change feed Postgres already emits for replicas. Every insert, update and delete flows out of that feed. Moonlink consumes it and mirrors the changes into Iceberg, but rewritten as columns, with sub-second lag.

The detail I found genuinely clever is where the row-to-column conversion happens. Object storage (S3 and friends) is slow: response times in the seconds, way too slow to serve actual transactions. So Postgres keeps a fast caching tier in front of it, and that tier usually has spare CPU sitting idle. Databricks does the transcode right there, on that idle CPU, before the data ever lands in object storage. And because going row-to-column compresses something like 10x, you've also shrunk what you have to push down to S3 in the first place. You're not paying for a separate conversion job later; you're paying with cycles that were going to waste anyway, and you ship less data for the trouble. (Reynold Xin walked through this in a VentureBeat interview if you want the source.)

The other half is the freshness trick, because there's normally an annoying tradeoff here. If Moonlink wrote every single change straight into a new Iceberg file, you'd get freshness but also a blizzard of tiny files and constant metadata commits, which is miserable to live with. If it batches writes up to be efficient instead, your analytics goes stale. Moonlink dodges this with what they call union reads. Newly arrived rows sit in an in-memory columnar buffer (Arrow). A query then reads the committed Parquet files on object storage, that in-memory buffer, and any pending updates or deletes, all stitched together as one logical table. That means an analytical query can see data that hasn't even been written into an Iceberg snapshot yet. That's how they claim sub-second freshness without drowning in small files.

One thing worth being clear-eyed about: this isn't HTAP in the old "one engine does both" sense, no matter how the slides read. Postgres is still the transactional engine; Spark/Photon and the new Reyden engine are still the analytical ones. A commit in Postgres is not running your analytical query inside the same transaction. What's actually unified is the storage. One logical copy of the data, one write path (Postgres to Moonlink to Iceberg), and several read paths sitting on top. "One copy for everything", not "one engine for everything". Which, honestly, is probably the more achievable version of the dream anyway.

Caveat: LTAP was just announced and is rolling out as part of Lakebase, so most of this is stitched together from the announcement, interviews, and a couple of good technical breakdowns rather than me running it in prod. If anyone's got it live in their workspace and I've got a detail wrong, please correct me.

TL;DR: Your app writes rows to Postgres like normal. Moonlink tails Postgres's replication stream, converts those changes to columnar Parquet on idle CPU in the caching tier (which also compresses around 10x before anything hits S3), and writes them into Iceberg. Queries read the committed columnar files plus an in-memory buffer of brand-new rows together (union reads), so analytics stays fresh within a second. Separate engines, one shared copy of the data, not classic single-engine HTAP.

reddit.com

u/CautiousUse8597 — 8 days ago

▲ 0 r/PostgreSQL

Looking for Postgresql 9.6.x ?

Hello.

As I mentioned in the title, I'm looking for Postgresql 9.6.x. X being the last version of the 9.6 mainline for that program .

I did look online but I couldn't find the 64 bits exe/MSI file anywhere, only the binaries but I don't need them.

I did some research online and apparently it's the last supported DB version for windows 7.

Please, if anyone can point me to where I can find it or if anyone has the latest installer for 9.6 and could upload to a file sharing service and share it with me, I would greatly appreciate it!

**PS: the exact version I'm looking for is PostgreSQL 9.6.24 64 bits. ( I'm using windows 7 64 bits ultimate edition)**

Thank you in advance

reddit.com

u/Weirdhipster294 — 5 days ago

▲ 1 r/PostgreSQL+1 crossposts

Challenge my permission-aware RAG: denormalized ACLs in the vector payload

Building a self-hosted enterprise wiki + RAG backend (C++/Drogon, Postgres, Qdrant). Schema is done; about to build retrieval. Want this torn apart before I commit.

The model:
- Postgres is authoritative. Qdrant is rebuildable.
- Ingest: upload -> Redis queue -> chunk -> PG txn (chunks + audit + outbox) -> outbox worker -> embed -> Qdrant upsert.
- Each chunk's Qdrant payload carries the permitted org-unit IDs (inherited from the org hierarchy at ingest time) + sensitivity label + lifecycle state.
- At query: compute the user's effective org scope, hit Qdrant with a MatchAny filter, then re-validate top-K against PG (lifecycle, sensitivity clearance, tenant) before handing to the LLM.

Considered and rejected:
- One Qdrant collection per OU: cardinality explosion.
- PG-only filtering post-search: latency.
- Hashed scope-key: loses set-membership semantics.

What I'm worried about: grant revocation and org-tree edits. Re-encoding every affected chunk via the outbox feels right, but at scale (10k+ docs under a moved subtree) this is a thundering herd. Eventual consistency is fine for me (seconds), but I haven't proven it doesn't rot.

Where does this break that I haven't seen yet?

u/Willy__Wonka__ — 7 days ago

▲ 5 r/PostgreSQL

HA Postgresql on docker

I'm reasonably experienced with managing postgresql clusters but I'm fairly new to dockers. I have a requirement to run Postgresql on a 3 node docker swarm cluster. I want to run a Patroni cluster in containers on top of this swarm cluster.

Should I build my own image or are there any production ready images that can be trusted?

I need Patroni + pgbackrest and postgres 17 or 18

reddit.com

u/PreakyPhrygian — 8 days ago

▲ 75 r/PostgreSQL+2 crossposts

Building an API with Zig and Postgres

Hey everyone!

I'm following up on a post I made a couple weeks ago where I made a tutorial on building a basic CRUD API in Zig with just the standard library.

A lot of people seemed to enjoy the video and I got a lot of feedback on extending the video and turning it into a mini-series where we build a more full-fledged API.

So today I extended it from an in-memory datastore to using Postgres! The next video in the mini-series will be about scaling the API to handle thousands of concurrent requests but if you have any other ideas please let me know!

youtu.be

u/Pokelego11 — 8 days ago

▲ 50 r/PostgreSQL

The database that refused to die: How Postgres survived its own creators

theregister.com

u/yourbasicgeek — 8 days ago