u/oleg_mssql

▲ 79 r/devops

We rebuilt infrastructure from backups as a DR test. The restore worked. The environment didn’t.

Recently we rebuilt infrastructure from backups while setting up a new environment.

Part of the idea was also just seeing how recovery would actually go in a real disaster situation and what kind of hidden problems would show up along the way.

Luckily this wasn’t a production outage, so nobody was panicking and we could take our time digging through issues properly.

We thought it would take maybe a couple of days. It ended up taking weeks...

Every few hours we discovered something new: forgotten settings, incompatible software versions, undocumented dependencies, random unexplained errors, or some component nobody had touched in years.

The good part is that the next test restore was dramatically faster because we already understood most of the weak spots and had documentation for the recovery process.
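For anyone who wants to try a small-scale version of this kind of drill, here is a minimal sketch, not the process used in the rebuild described above. It assumes the backup is a plain tar archive you created yourself and that you kept a manifest of SHA-256 hashes at backup time; `build_manifest` and `restore_drill` are hypothetical names made up for illustration.

```python
import hashlib
import tarfile
import tempfile
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def restore_drill(archive: Path, expected: dict[str, str]) -> dict:
    """Restore the archive into a scratch directory and compare it to the manifest.

    Assumption: the archive is a trusted backup of your own, so it is
    extracted without extra path filtering.
    """
    started = time.monotonic()
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive) as tar:
            tar.extractall(scratch)
        restored = build_manifest(Path(scratch))
    shared = expected.keys() & restored.keys()
    return {
        "missing": sorted(set(expected) - set(restored)),
        "unexpected": sorted(set(restored) - set(expected)),
        "corrupted": sorted(k for k in shared if expected[k] != restored[k]),
        "seconds": time.monotonic() - started,
    }
```

This only proves the files come back intact and how long that takes. It says nothing about forgotten settings, version mismatches, or undocumented dependencies, which is where most of the time went in the rebuild above.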

u/oleg_mssql — 9 days ago
▲ 1 r/Cloud

Another real-world story about an AI tool making a mess in production

Lately I’ve been seeing more people use AI tools for SQL, migrations, infra changes, automation, etc.

Honestly, AI making mistakes isn’t the surprising part. Humans have always made mistakes.

What worries me more is how many teams still don’t know whether their backups are actually restorable under pressure.

This was a pretty good example of the discussion around it:

https://x.com/lifeof_jer/status/2048103471019434248

u/oleg_mssql — 10 days ago

The problem isn’t that AI might break something. It’s that you didn’t have backups.

Recently there was yet another real-world case of an AI assistant generating a destructive command during a workflow.

The mistake itself wasn’t the scary part.

The problem isn’t that AI might break something. It’s that your backups weren’t usable when you needed them.

https://sqlbak.com/blog/the-problem-isnt-that-ai-might-break-something-its-that-you-didnt-have-backups/

u/oleg_mssql — 11 days ago