r/Database

Databases keep the data, but not always the access history

One messy part of data governance is that once data lands in a database or warehouse, a lot of the original access context gets lost. A customer export from Salesforce, a spreadsheet from Google Drive, or a file pulled from another SaaS tool might eventually become clean structured tables. But the database usually does not show who had access to that data before ingestion, whether the file was overshared, or whether external collaborators had visibility before it moved downstream. So even when the database layer has good roles, encryption, audit logs, and retention policies, there can still be a blind spot upstream. The data may be protected now, but its earlier exposure path is not always clear.

That makes source-level access history and SaaS permission context a bigger part of governance than people usually account for.

reddit.com
u/PersimmonFar9942 — 10 hours ago
▲ 42 r/Database+6 crossposts

Hey folks! A post on cache types, but from a different angle than the classic one tied to where the cache lives.

u/teivah — 2 days ago

redis is not a database no matter how many times you SET something

yes it has persistence. yes you can technically store data in it and have it survive a restart. no that does not make it your source of truth and im tired of pretending the AOF/RDB thing makes this an actual debate

watched a team lose a few hours of user sessions last year because someone decided redis was the session store, no postgres behind it, nothing. box got restarted during a routine deploy, the snapshot was however many minutes stale, everyone got logged out mid-checkout. the postmortem treated it like some freak event and not the completely predictable result of using an in-memory cache as your only copy of something you cared about

the thing is redis is genuinely incredible at what its for. caching, pub/sub, rate limiting, ephemeral counters, a leaderboard, a lock. its so good at being fast that people start reaching for it for everything and forget the fast comes from it living in memory, which is the exact property that makes it a bad place to keep the only copy of anything

and i get why it happens. its right there, its already in the stack, adding a real write to postgres feels like more work than just SET and moving on. but durability isnt a feature you bolt on later when it bites you, its the whole reason databases are annoying to work with in the first place. the annoying parts are the point

persistence is not durability. a snapshot every few minutes is not a transaction log. eventual-on-a-good-day is not the same as committed

use it for what its great at. put the stuff you cant lose somewhere that was built to not lose it. this isnt even a hot take its just what the docs have been saying the whole time and somehow we're still here
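A minimal sketch of the "durable store first, cache second" pattern described above. To stay self-contained it uses Python's `sqlite3` as a stand-in for postgres and a plain dict as a stand-in for redis; the `SessionStore` class and its method names are hypothetical, not any library's API.

```python
import sqlite3


class SessionStore:
    """Durable table is the source of truth; the cache is only a fast copy."""

    def __init__(self, db_path=":memory:"):
        # Assumption: sqlite3 stands in for postgres in this sketch.
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(sid TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )
        # Assumption: a dict stands in for redis.
        self.cache = {}

    def put(self, sid, data):
        # Commit to the durable store first. Only then populate the cache.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO sessions (sid, data) VALUES (?, ?)",
                (sid, data),
            )
        self.cache[sid] = data

    def get(self, sid):
        # Fast path: serve from the cache when it has the key.
        if sid in self.cache:
            return self.cache[sid]
        # Slow path: fall back to the durable copy and refill the cache.
        row = self.db.execute(
            "SELECT data FROM sessions WHERE sid = ?", (sid,)
        ).fetchone()
        if row is None:
            return None
        self.cache[sid] = row[0]
        return row[0]

    def simulate_cache_restart(self):
        # Models the cache box restarting and losing everything in memory.
        self.cache.clear()
```

With this shape, a cache restart costs a slower read, not a logged-out user: after `simulate_cache_restart()`, `get()` still finds the session in the durable table.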

reddit.com
u/Motor_Ordinary336 — 3 days ago

SAP PowerDesigner is going end of life, erwin Data Modeler can be a replacement

Hey,

I work at Quest, so I'll be upfront about that.

I don't want to violate the rules, so I will just make a case and leave it at that. If this post is in violation, let me know.

PowerDesigner is being discontinued, and there's a case to be made for erwin Data Modeler

Reason 1: Direct import bridges for PowerDesigner models, which means your existing work comes over as-is

Reason 2: erwin supports Databricks, Google BigQuery, and graph databases, along with NoSQL native support and DevOps integrations with Git, GitHub, and Bitbucket.

Reason 3: Once PowerDesigner stops receiving security patches, companies will struggle with compatibility with newer databases.

That's what I'm proposing.

reddit.com
u/MikeAtQuest — 2 days ago

Architecting a 3-stage framework for cross-engine DB synchronization and migration. I'd love to get some architectural feedback.

I’ve spent a lot of time dealing with the friction of modernizing legacy systems, specifically the headaches that come with database schema evolution and cross-engine synchronization.

Instead of treating database migration as a series of manual, one-off scripts, I’ve been working on a theoretical 3-stage framework designed to automate the pipeline across several of the most common database engines. I’m sharing the core architecture here because I’d really value some raw engineering feedback on this approach.

Phase 1: The "X-Ray" Component (Blueprint Extraction)

The whole process starts with a deep inspection—what I call an "X-Ray"—of the source database. Instead of just copying raw, dialect-specific schemas, the goal here is to extract a completely unified, agnostic semantic representation of the entire infrastructure.

This intermediate blueprint standardizes tables, data types, indexes, and constraints into an engine-agnostic core, i.e. a central schema definition. It strips away the syntax noise between legacy and modern engines before any data even moves.

Phase 2: Schema Orchestration (The Sync Engine)

Once you have a universal blueprint, the orchestrator handles the heavy lifting of schema synchronization against a completely different destination backend.
The real engineering challenge here is handling type-mapping anomalies and structural translation without breaking relational integrity. The sync engine calculates the differences and generates the exact DDL required to align the destination with the blueprint state.
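A rough sketch of the "calculate the differences and generate the DDL" step, under heavy assumptions: the blueprint and the destination are both modeled as plain dicts of `table -> {column: type}`, and `diff_to_ddl` is a hypothetical name. It covers only missing tables and columns; indexes, constraints, drops, and dialect-specific type mapping are left out.

```python
def diff_to_ddl(blueprint, destination):
    """Return DDL statements that move `destination` toward `blueprint`.

    Both arguments are dicts: table name -> {column name: column type}.
    """
    statements = []
    for table, columns in blueprint.items():
        if table not in destination:
            # Whole table is missing: create it with every blueprint column.
            cols = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols});")
            continue
        existing = destination[table]
        for name, ctype in columns.items():
            if name not in existing:
                statements.append(
                    f"ALTER TABLE {table} ADD COLUMN {name} {ctype};"
                )
            elif existing[name] != ctype:
                # Type changes can lose data, so flag them for review
                # instead of emitting an automatic ALTER.
                statements.append(
                    f"-- REVIEW: {table}.{name} is {existing[name]}, "
                    f"blueprint wants {ctype}"
                )
    return statements


blueprint = {
    "users": {"id": "BIGINT", "email": "TEXT"},
    "orders": {"id": "BIGINT"},
}
destination = {"users": {"id": "INTEGER"}}

for statement in diff_to_ddl(blueprint, destination):
    print(statement)
```

Type mismatches are emitted as comments rather than executable DDL on purpose: that is where the type-mapping anomalies mentioned above would need a per-engine decision.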

Phase 3: The Migration Engine (Data Streaming)

The final layer is a data transfer engine built to move actual records from the legacy environment to the new backend.
By decoupling the data streaming from the schema definition, this phase focuses entirely on high-throughput extraction, on-the-fly data transformation, and post-migration consistency checks.
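One possible shape for the post-migration consistency check, as a sketch only: compare a row count and an order-independent fingerprint of each table on both sides. `table_fingerprint` is a hypothetical helper, and it assumes both engines' drivers return rows as the same Python types (an `int` on one side and a `Decimal` on the other would produce different fingerprints).

```python
import hashlib

MODULUS = 1 << 256


def table_fingerprint(rows):
    """Return (row_count, checksum) for an iterable of row tuples.

    The checksum does not depend on row order, so source and destination
    can be scanned in whatever order each engine returns rows.
    """
    count = 0
    checksum = 0
    for row in rows:
        # Assumption: repr() of the row tuple is a stable encoding on both sides.
        digest = hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest()
        # Summing modulo 2**256 keeps the result order-independent while
        # still counting duplicate rows (XOR would cancel them out).
        checksum = (checksum + int.from_bytes(digest, "big")) % MODULUS
        count += 1
    return count, checksum


source_rows = [(1, "a"), (2, "b")]
migrated_rows = [(2, "b"), (1, "a")]
print(table_fingerprint(source_rows) == table_fingerprint(migrated_rows))
```

Because it consumes any iterable, the same function can be fed from a streaming cursor without loading a table into memory.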

reddit.com
u/slavkomatanovic — 3 days ago

New to databases - need advice on prices.

Hey everyone!

I'm spontaneously involved in financing/managing a project that requires a database. I've never dealt with databases before and based on what I was able to talk out of AI agents, the architecture for the database of the project is something like:

Workers

PostgreSQL + TimescaleDB

FastAPI Backend

REST/GraphQL API

Next.js Frontend

Client Dashboard

What I want is advice on what something like this costs when built from scratch. I don't have anyone I can ask for a ballpark figure, as I've never worked in an industry that deals with databases. The point is simple: I don't want to get scammed on the price, and I don't want to underpay and get a half-baked database. I've already approached 2 people about this, but the price they quoted seems oddly high to me (north of 5k, and it's not US-based development).

Obviously, I can share some more details, if needed, but not deep details about the project.

reddit.com
u/LarysaB — 3 days ago

Legal sent us an eDiscovery request for emails from 2009 on a Friday afternoon

I have blocked most of the following three weeks from memory, but I'll reconstruct it as accurately as possible.

The request came in at 4:47pm: emails from a specific date range in 2009, specific custodians, 90-day response window. That sounds generous until you remember what "2009 email archive" means for us: a mix of Exchange backup tapes from three different generations of infrastructure, some labeled properly, some not, stored across two locations, one of which had flooded in 2016 and been "dealt with."

Week one was mostly inventory: finding the tapes, identifying which format they were in, and realizing our current tape hardware couldn't actually read two of the formats without sourcing legacy drives. The drives cost more to rent than I expected.

Week two was mostly successful. A few of the tape drives worked well and some did not. One tape turned out to be empty, and we never learned whether the cause was the data or the labeling.

Week three was all about ingestion, deduplication, and privilege review. By now, Legal was demanding updates every other day.

We got there. But the margin was not comfortable, and the cost (staff time, hardware rental, and emergency vendor work from Tape Ark for the problematic tapes) was not budgeted anywhere.

The thing is, none of this was unusual. eDiscovery from decade-old tape archives is a known problem, and organizations keep not solving it until the request arrives. The can gets kicked because the tapes are "fine," and the scenario feels theoretical.

It stops feeling theoretical pretty fast.

reddit.com
u/rmoreiraa — 2 days ago
▲ 0 r/Database+1 crossposts

Does AI/ML-specific work need a persistent KV database?

I'm curious: does AI/ML-specific work need to store anything in a persistent KV database? For example, does it need something like LMDB for any of the work? If I had to design a persistent KV database in 2026 for AI/ML workloads, what should I keep in mind? And do they even need one in the first place?

reddit.com
u/ankush2324235 — 3 days ago

Complete noob looking for easy software

Hey, I know almost nothing about how database software works, and do not have the time currently to learn it, but I do have a need to keep track of something.

I'm looking for an (open source/free) application that would let me make "profiles" of people, and then link pictures to those profiles. Preferably this would be local and not need to be hosted in any way - I just want to put everything needed on a hard drive, open the application and then open the database from there.

It sounds to me like I need something like a local contact database? But I'm not knowledgeable about databases in the slightest, so I have no idea how to begin looking. I would greatly appreciate any recommendations that might fit my need.

Thanks a lot in advance!

reddit.com
u/flameinthepinkpan — 4 days ago
▲ 10 r/Database+1 crossposts

How are DragonflyDB or Redis different from persistent KV storage?

So as we know, databases like DragonflyDB can persist data on disk and also use modern async IO techniques like io_uring.

Then why would someone choose a persistent key-value database/storage engine like FoundationDB, TiKV, ScyllaDB, or LMDB-style systems instead?

What architectural or workload differences make those systems preferable over something like DragonflyDB with persistence enabled?

Trying to understand the deeper storage-engine tradeoffs here.

reddit.com
u/ankush2324235 — 4 days ago
▲ 3 r/Database+1 crossposts

Should I still use CreatedAt & UpdatedAt on the main table if I also have Audit tables?

Say I have a table with users:

USERS
-----
ID
EMAIL
PASSWORD
CREATED_AT
UPDATED_AT

then I also create a table to track changes:

USERS_LOGS
----------
USER_ID
TIMESTAMP
ID
EMAIL
PASSWORD
CREATED_AT
UPDATED_AT

Does it make sense to have CreatedAt and UpdatedAt on the USERS entity if there is already a "TIMESTAMP" field in USERS_LOGS?
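To make the difference between the two kinds of timestamp concrete, here is a small sketch using Python's built-in `sqlite3`. The trigger, the `log_id` and `logged_at` column names, and the trimmed-down log table are assumptions for illustration, not the schema above verbatim. The main table keeps `created_at`/`updated_at`, while the log table gets its own timestamp for each recorded change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL,
    password TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- One row per change; logged_at is when the old values were replaced.
CREATE TABLE users_logs (
    log_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    logged_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    email TEXT NOT NULL,
    password TEXT NOT NULL
);

-- Copy the previous values into the log and bump updated_at.
CREATE TRIGGER users_audit AFTER UPDATE OF email, password ON users
BEGIN
    INSERT INTO users_logs (user_id, email, password)
    VALUES (OLD.id, OLD.email, OLD.password);
    UPDATE users SET updated_at = CURRENT_TIMESTAMP WHERE id = OLD.id;
END;
""")

conn.execute(
    "INSERT INTO users (email, password) VALUES (?, ?)",
    ("a@example.com", "hash1"),
)
conn.execute("UPDATE users SET email = ? WHERE id = 1", ("b@example.com",))

# Current state, including both timestamps, comes from one row of users.
current = conn.execute(
    "SELECT email, created_at, updated_at FROM users WHERE id = 1"
).fetchone()
# The history of previous values lives only in users_logs.
history = conn.execute("SELECT user_id, email FROM users_logs").fetchall()
```

In this sketch, reading `created_at` or `updated_at` is a single-row lookup on `users`, whereas deriving the same values from `users_logs` would need a `MIN`/`MAX` over the log and would miss users that were never updated. Whether that convenience is worth the extra columns is the trade-off the question is asking about.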

reddit.com
u/Loud_Wrangler1255 — 5 days ago
▲ 7 r/Database+1 crossposts

MySQL InnoDB Cluster with multiple instances

I have a VM (for ease, let's call it ALPHA) running multiple instances of MySQL. Each instance has its own user, my.cnf, and datadir. I manage them with systemd rather than Docker because that suits the requirement: lower resource use, and all instances use the same version of MySQL.

I have a question. How do I set up a cluster in which ALPHA, which holds all the instances, is the primary node, and the two secondary nodes hold the same instances and replicate all of them? Is it even possible to do this natively?

The reason I set it up like this is that the requirement specifically asks for each system to manage its own instance and have its own encryption and configuration.

reddit.com
u/BugAdministrative357 — 6 days ago
▲ 17 r/Database+2 crossposts

Ultimate guide to POSETTE: An Event for Postgres, 2026 edition

Now in its 5th year, POSETTE: An Event for Postgres 2026 is a free & virtual developer event with 44 talks across 4 livestreams—organized by the Postgres team at Microsoft in partnership with AMD.

No travel budget required. You could just check out the PosetteConf.com schedule & speakers page (and then mark your calendars if it looks useful to you), but this ultimate guide is meant to give you a map to help you find which talks and which livestreams are right for you.

Let me know what you think.

techcommunity.microsoft.com
u/clairegiordano — 5 days ago

What is the core concept of two phase locking protocol?

For example when I say strict 2pl, I understand that the core concept is that:

- exclusive locks cannot be released until the end of the transaction.

When I say rigorous 2pl core is:

- all locks cannot be released until the end of the transaction.

When I say conservative 2pl, core is:

- transaction acquires all required locks before the start of transaction.

But when I say 2pl, I do not quite get what the core is:

- is it "the growing and shrinking phase"

- is it "after the first lock is released, more locks cannot be acquired"

what is the core?
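For basic 2PL, the two phrasings above describe the same rule: a transaction may acquire locks until its first release (growing phase), and after that it may only release (shrinking phase). A toy sketch of just that rule follows; the `TwoPhaseTransaction` class is hypothetical, tracks a single transaction, and leaves out lock conflicts between transactions and shared/exclusive modes.

```python
class TwoPhaseTransaction:
    """Tracks one transaction's locks and enforces the basic 2PL rule."""

    def __init__(self):
        self.held = set()
        # False while growing; flips to True at the first release.
        self.shrinking = False

    def acquire(self, item):
        if self.shrinking:
            # The defining rule: no new locks after any lock was released.
            raise RuntimeError("2PL violation: acquire after first release")
        self.held.add(item)

    def release(self, item):
        self.held.remove(item)
        # The first release ends the growing phase for good.
        self.shrinking = True


txn = TwoPhaseTransaction()
txn.acquire("A")
txn.acquire("B")   # still growing
txn.release("A")   # shrinking phase begins here
try:
    txn.acquire("C")
except RuntimeError as err:
    print(err)
```

Strict, rigorous, and conservative 2PL then read as extra restrictions on top of this rule: they constrain when releases or acquisitions may happen, but they never allow an acquire after a release.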

reddit.com
u/2082_falgun_21 — 5 days ago

How can I connect a SQLite Database to NetBeans?

I've been searching and can't find much information about how to do it, and the little I've found is either too confusing, too old, or for Windows when I'm using Linux Mint.

I also tried with LibreOffice Database but nothing. And trying to use MySQL Workbench results in failure. I've asked on various Discord servers, Facebook and different subreddits but no one seems to give me better insight.

What should I do?

reddit.com
u/The_Meme_Lady_69 — 6 days ago

Looking for an open source cloud database

Hey data folks, I'm looking for an open source cloud database to store telecom distributor data.

This project is both personal and professional: the distributor I'm building this for is my uncle, so I want to help him generate insights from his distribution data and get a clearer picture of his business. I'll be using Power BI for the dashboard and visualization.

The challenge is I don't know which open source database to go with. Azure and AWS are off the table since their free tiers only last 30 days, and I need something long-term.

I also want to avoid Google Sheets or Drive: it doesn't feel like a proper database, and honestly when explaining the tech stack later, it won't sound great. I'm looking for something more structured and scalable.

In short, my requirements are:

  1. Open source database that a non-tech person can easily use to insert data

  2. Can connect with Power BI

  3. At least 1 GB of free storage

reddit.com
u/Desi__Popeye — 7 days ago

How are you handling sensitive data once it moves from SaaS apps into databases?

In many environments, data constantly flows from SaaS platforms like Google Workspace, Slack, Salesforce, and similar tools into internal databases, warehouses, or BI pipelines through exports, integrations, and automated syncs. The difficult part seems to be understanding whether the original access and sharing permissions around that data were already too broad before ingestion even happened.

What makes this especially messy is that SaaS permissions tend to change gradually over time. External collaborators get added temporarily, links remain active longer than expected, and inherited access quietly expands visibility without anyone intentionally creating a security issue.

reddit.com
u/DOMANIMUNGA — 8 days ago