r/Terraform

▲ 15 r/Terraform

Cleared Terraform Certification

I have successfully cleared the 004 certification

I had 6-9 months of Terraform experience

Followed Zeal Vora's Udemy course

Did his practice tests

The exam was not so hard, but a little tricky

If you prepare correctly, it is easy to crack.

reddit.com

u/Actual-Cockroach-303 — 2 days ago

▲ 2 r/Terraform+1 crossposts

Is this system safe enough to release to production?

I built a small tool to catch infra risks before production releases
I’ve been working on a project called Beacon.
The idea came from a very practical problem I’ve seen in distributed systems: before a release, teams usually have dashboards, logs, Terraform files, Kafka configs, Kubernetes manifests, runtime snapshots, etc. But still, the actual question is usually very simple:
“Is this system safe enough to release to production?”
Beacon tries to answer that.
It scans infrastructure/config/runtime inputs and gives a production-readiness decision with ranked risks, possible root causes, and suggested next actions. Right now it has examples around Kafka, Kubernetes, Terraform, Helm, runtime snapshots, OpenTelemetry, Prometheus, Schema Registry, CI/CD, and flow degradation.
This is not meant to replace observability tools. The way I think about it is:
Observability tells you what is happening.
Beacon tries to tell you what is risky, why it matters, and what should be fixed first.
You can try the demo without setting up Python locally.
Run the UI with Docker:

docker pull ghcr.io/mishraricha1806/beacon:latest

docker run --rm -p 8765:8765 ghcr.io/mishraricha1806/beacon:latest ui --host 0.0.0.0 --port 8765

Then open:

http://127.0.0.1:8765/

For the simplest demo, use the sample bad infrastructure example from the repo:

examples/bad-infra/

In the UI, choose the static/readiness input, upload the files from that folder, run the scan, and check the readiness score, top reasons, grouped risks, and next actions.
You can also run the same demo from CLI:

docker run --rm \
  -v "$PWD:/workspace/project:ro" \
  ghcr.io/mishraricha1806/beacon:latest readiness static \
  /workspace/project/examples/bad-infra \
  --environment prod \
  --no-html \
  --no-open-report

Expected result: the tool should flag the setup as NOT READY, with risks such as replication, storage/message-size, and missing governance context.
There is also a Black Friday style demo for payment/event pipeline readiness:

docker run --rm \
  -v "$PWD:/workspace/project:ro" \
  ghcr.io/mishraricha1806/beacon:latest readiness all \
  --static-path /workspace/project/examples/demo-black-friday \
  --snapshot /workspace/project/examples/demo-black-friday/runtime-snapshot.yaml \
  --environment prod \
  --no-html \
  --no-open-report

Repo: https://github.com/mishraricha1806/beacon
I’d be interested in feedback from people who work with Kafka, Kubernetes, Terraform, platform engineering, SRE, or release governance.
Mainly looking for thoughts on:

- Does this kind of readiness gate feel useful before production releases?
- What signals would you expect such a tool to check?
- Would you prefer this as a CLI, CI/CD gate, or lightweight UI?

u/Any-Leg-7348 — 2 days ago

▲ 0 r/Terraform+1 crossposts

Looking for feedback: I built an open-source Terraform/OpenTofu HTTP backend because global state locking felt too coarse

Hi guys, first post here!
I’m the author of KiloLock. I built it after running into the same large-state problem many infra teams eventually hit: the problem is not only state file size, but coordination around one shared state graph.

The stable path is intentionally boring:
- vanilla Terraform/OpenTofu HTTP backend compatibility
- PostgreSQL-backed state storage
- Docker Compose self-hosting
- no custom Terraform fork required

The experimental path is what motivated the project:
- queryable state graph
- resource-level history
- repair workflows
- foundations for narrower reservations / resource-aware locking
- future parallel-safe operations through kl

I’m not claiming this should replace your current backend if S3/GCS/HCP works fine. I’m looking for technical feedback from people who have dealt with large shared states, long plans, state lock contention, or awkward state splitting.

GitHub: https://github.com/kilolockio/kilolock
Documentation: https://kilolock.dev/documentation/

u/davesade — 3 days ago

▲ 14 r/Terraform

"Using OpenTofu's Exclude Flag to Isolate Performance Bottlenecks"

masterpoint.io

u/mooreds — 3 days ago

▲ 3 r/Terraform

Anyone moved from Spacelift to a Spacelift alternative with better drift detection and cloud visibility?

This is mainly for people running Terraform at scale with Spacelift or similar tools. Most of the "Spacelift alternatives" content I have found is just tool lists: env0, Terraform Cloud or HCP, Scalr, Atlantis, Terrateam, Terramate and so on. That is fine for awareness, but it does not answer the real question for us: if your pain points are drift detection and cloud asset visibility, what should you switch to?

By better drift detection and cloud asset visibility, I mean continuous detection of changes that never went through Terraform, a clear view of what percentage of your estate is codified, and visibility into unmanaged resources sitting outside Terraform. I know that last part pushes past what a TACO like Spacelift is built to do on its own, closer to CSPM territory, so I want to know whether people paired their orchestration tool with something else for that piece rather than expecting one tool to cover it all. Bonus points if fixes are driven through Git rather than "fix it in the console and hope".

If you have left Spacelift, what did you move to, and did it change your story around drift, unmanaged assets and governance or mostly replicate what you had at different pricing?

u/Adventurous_Rope4025 — 4 days ago

▲ 9 r/Terraform

CI/CD Stages and Jobs

How do you guys set up your CI/CD stages for Terraform repos with multiple root modules? For example, a network stack (VPC, IGW, NAT GW, routes, TGW), a platform stack (EKS, EC2s, ALBs), and a data stack (RDS or whatever data resources).

Do you create a stage for each stack with different jobs, or use 3 generic stages (validate, plan, apply) with a job for each stack? The latter seems like it would be harder to understand, because plan jobs for certain stacks depend on the apply jobs of other stacks (the platform plan needs the network apply's outputs).
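
The dependency problem described above can be made explicit by deriving the stage order from a dependency map instead of hard-coding it. This is only a minimal sketch: the `STACK_DEPS` map and the `pipeline_stages` helper are hypothetical, and the assumption that the data stack depends on the network stack is mine, not the post's. Each resulting stage would run plan and apply for its stacks before the next stage starts.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each stack lists the stacks whose
# apply outputs it needs before its own plan can run.
STACK_DEPS = {
    "network": set(),
    "platform": {"network"},
    "data": {"network"},  # assumption for illustration
}


def pipeline_stages(deps):
    """Group stacks into ordered stages.

    Stacks in the same stage do not depend on each other, so their
    plan/apply jobs could run in parallel.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the stacks depend on each other in a loop
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages
```

With the map above this yields a first stage containing only the network stack and a second stage containing the data and platform stacks.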

u/DeLoMioFoodie — 5 days ago

▲ 13 r/Terraform

30 yrs in networking, AZ-305 I never got to use — built an Azure/Terraform lab between jobs. Roast it... with love please.

I'm a network engineer — ~30 years, mostly Cisco/enterprise. I passed the Azure Solutions Architect Expert a couple years ago but never got to use it in anger. I'm between jobs right now, so rather than only grinding applications I've been putting the time into actually building the cloud/IaC skills the cert says I have.

This is the result: a fictional "cloud post-production studio" on Azure, all Terraform (azurerm ~> 4.0). The cheap networking/storage/monitoring is applied live; the expensive stuff (GPU, NetApp Files, Firewall, App Gateway) is feature-flagged and validated plan-only, so the whole design proves out at ~$0. There's a gated Azure DevOps pipeline (plan → manual approval → apply), remote state, and — because I wanted to stress-test my own work — I had it red-teamed by a friend and remediated every finding with terraform test (mock_provider, so the suite runs at $0 with no Azure creds).

I'd genuinely rather get torn apart here than in an interview. Specifically, tell me where I'm wrong:

- Structure: every subnet/NSG/VM comes from one locals map via for_each. Idiomatic at this size, or should I be breaking it into modules?
- Plan-only guard: a terraform_data lifecycle.precondition + a confirm_expensive_resources flag so the expensive resources can't apply by accident. Sensible pattern, or is there a cleaner idiom?
- Variable validation: the VPN pre-shared-key validation is gateway-conditional (only enforced when enable_gateway = true) so it doesn't block plan-only runs. Reasonable, or a smell?
- CI: a credential-free "Verify" stage (the mocked tests) runs on every PR; the plan/apply stages with the service principal only run on main. Overkill for a solo repo, or the right instinct?

Repo: github.com/gsamarco/FourHorsemenStudio — ./scripts/verify.sh runs the whole suite offline at $0 if you want to poke at it.

I know a few things are still gaps (firewall FQDN allow-list, App Gateway TLS, ANF snapshots) — those were conscious plan-only deferrals, but if you think I mis-prioritized, say so.

Thanks for your brutal honesty.

u/GrtWhite — 5 days ago

▲ 3 r/Terraform

Terramate Cloud partial replacement (self-hosted)

Hi there,

I really love Terramate, but unfortunately the pricing (while fair) is too expensive for my company. So I've built a local replacement (OK, yes, it's mostly vibe coded).

I really do not want to interfere with Terramate's business, but I think it could be interesting to some folks. So I'm sharing it, and I'm open to feedback.

(It's self-hosted, written in Go, PostgreSQL-backed, and focuses only on drift and resources.)

https://github.com/ut0mt8/tmc-server

u/ut0mt8 — 6 days ago

▲ 4 r/Terraform

Call for Participation: TERRAFORM User Panel

Hello everyone,

I'm part of IBM’s Cloud Infrastructure team (Terraform’s new home). We're conducting research on how engineering organizations provision, configure, and manage infrastructure at scale.

We're building a panel of experienced practitioners who actively work with Infrastructure as Code (IaC) tools and automation frameworks in production environments. If this aligns with your expertise, we look forward to your participation.

Eligibility
IT professionals with hands-on experience in infrastructure as code, platform engineering, or modern infrastructure operations are encouraged to apply.

What to expect
Based on your background, you may be invited to participate in various research studies such as interviews, surveys, concept and usability tests, and similar.

Express your interest here: https://wkf.ms/4g2vByE?recruitment_location=

Thank you.

u/Sea-Sheepherder-9241 — 7 days ago

▲ 0 r/Terraform

"A security tool builder challenged our AI reasoning council to a blind Terraform audit. Two rounds, no answer key. Here's exactly what the council caught."

Posted about a tool we built on r/aiagents last week. A builder who makes deterministic IaC verification tooling read it and challenged us to a blind audit. Two rounds. No planted issue list, no count, no answer key until findings were in.

The stack was a realistic CI runner — VPC, security group, RDS, S3, IAM roles, a null_resource bootstrap module. Terraform plan looked completely clean.

What was planted:

Three access-widening changes, each invisible to standard diff review.

Round 1 findings:

CIDR split on aws_security_group_rule.ssh
cidr_blocks = ["0.0.0.0/1", "128.0.0.0/1"]
No literal 0.0.0.0/0 anywhere. String search finds nothing. The two halves tile all of IPv4. The council did the coverage math and flagged it as open ingress.

Wildcard admin on aws_iam_role_policy.ci
Source showed only policy = var.ci_policy_json — a variable from the pipeline secret store. Resolved plan showed Action: "*", Resource: "*". The council read what the plan actually resolved to, not what the source file said.

AdministratorAccess via module.bootstrap.null_resource.attach
A shared remote module containing a local-exec provisioner. Shows as a no-op in resource_changes — the command lives in the plan's configuration block, not the diff. The council read the configuration block and flagged the exact command: aws iam attach-role-policy --role-name ci-runner --policy-arn arn:aws:iam::aws:policy/AdministratorAccess
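
The coverage math behind the first finding can be reproduced with Python's standard `ipaddress` module. This is a minimal sketch, not the council's actual method; the helper name `covers_all_ipv4` is hypothetical.

```python
import ipaddress


def covers_all_ipv4(cidr_blocks):
    """Return True if the given CIDR blocks together cover all of IPv4."""
    nets = [ipaddress.ip_network(c, strict=False) for c in cidr_blocks]
    v4 = [n for n in nets if n.version == 4]
    # collapse_addresses merges adjacent and overlapping networks, so a
    # split such as 0.0.0.0/1 + 128.0.0.0/1 collapses back to 0.0.0.0/0.
    collapsed = list(ipaddress.collapse_addresses(v4))
    return collapsed == [ipaddress.ip_network("0.0.0.0/0")]
```

A string search for 0.0.0.0/0 misses the split, while `covers_all_ipv4(["0.0.0.0/1", "128.0.0.0/1"])` reports full coverage.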

Three for three. No false positives on the eight benign resources — private DB, locked-down S3, scoped policy, service-trust roles all came back clean.
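
The second and third findings both come from reading the plan JSON rather than the source files. A rough sketch of those two checks follows, assuming a plan exported with terraform show -json and loaded into a dict. The function names are hypothetical, and the sketch only handles policies that are fully known at plan time and commands written as literal strings.

```python
import json


def wildcard_policies(plan):
    """Addresses of aws_iam_role_policy resources whose resolved policy allows * on *."""
    hits = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_iam_role_policy":
            continue
        after = (rc.get("change") or {}).get("after") or {}
        policy = after.get("policy")
        if not policy:
            continue  # value unknown until apply, nothing to inspect here
        statements = json.loads(policy).get("Statement", [])
        if isinstance(statements, dict):
            statements = [statements]
        for st in statements:
            actions = st.get("Action", [])
            resources = st.get("Resource", [])
            actions = [actions] if isinstance(actions, str) else actions
            resources = [resources] if isinstance(resources, str) else resources
            if st.get("Effect") == "Allow" and "*" in actions and "*" in resources:
                hits.append(rc["address"])
                break
    return hits


def local_exec_commands(module, prefix=""):
    """Walk the plan's configuration block and collect literal local-exec commands."""
    found = []
    for res in module.get("resources", []):
        for prov in res.get("provisioners", []):
            if prov.get("type") == "local-exec":
                command = prov.get("expressions", {}).get("command", {})
                found.append((prefix + res["address"], command.get("constant_value")))
    for name, call in module.get("module_calls", {}).items():
        found += local_exec_commands(call.get("module", {}), f"{prefix}module.{name}.")
    return found
```

The first function would be called with the whole plan dict and the second with `plan["configuration"]["root_module"]`. A command built from variables has no `constant_value`, so it shows up as `None` and still needs a human or a second pass.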

Round 2 — blind, no planted issue list:

Same approach. Same result. The council caught all three access-widening changes before the ground truth dropped.

The honest boundary:

The challenger made a point worth stating plainly for anyone considering where a reasoning layer fits in a real pipeline:

Reasoning finds and explains. A deterministic gate blocks and attests. Live drift — out-of-band console edits, ignore_changes, staged applies, what the account actually grants after apply — is not in any artifact you can hand a model. For that you need a tool querying live cloud state. That layer belongs to deterministic tooling.

Clean stack: reason to find → gate to block → verify reality.

How the tool works:

You describe your problem and paste your .tf files and plan.json directly. It generates a discovery prompt you run in your editor. Your editor returns a diagnostic report. You paste that back. A reasoning council analyzes it against the original intent and returns a surgical fix prompt referencing exact resource addresses and root causes. Your editor executes the fix.

Find. Instruct. Verify.

Free to use. No signup. No credit card.

👉 lookmood.me/ai-code-reasoner

The full exchange including both challenge rounds is in r/aiagents. The challenger's IaC verification tool is at github.com/amitpatole/verel — worth a look if you want the deterministic layer.

u/onasnowwhitedove — 7 days ago

▲ 99 r/Terraform+1 crossposts

Terraform / OpenTofu vs Pulumi

You have a chance to plan and implement IaC on a project from scratch

In what cases would you choose Pulumi over Terraform/OpenTofu?

My thoughts about this:

- Pulumi makes it possible to manage more complex logic in infra: conditions, loops, reuse
- More human-readable (compared to HCL), good for involving developers in IaC
- Creating abstract objects like “testEnvForQa” that can be parameterized, instead of a pack of Terraform modules

u/Informal-Tea755 — 11 days ago

▲ 0 r/Terraform

I built a Terraform drift monitor because driftctl is abandoned and everything else costs $700/month

Background: I'm a DevOps engineer working at a multinational company managing AWS infrastructure with Terraform. Like a lot of teams, we kept getting burned by the same thing: someone makes a manual change in the AWS console during an incident, forgets to update Terraform, and three weeks later terraform apply reverts it and breaks something at 2am.

driftctl was our go-to for catching this. Then Snyk acquired them and archived the repo earlier this year.

Looked at the alternatives:

- terraform plan on a cron job: catches it but doesn't tell you WHO changed it or WHEN
- Firefly, ControlMonkey: great products, but pricing starts at $699/month, which is hard for smaller teams
- Scalr, env0: free drift detection, but they require migrating your entire Terraform workflow to their platform, and we didn't want to do that just for alerts
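
For reference, the cron-job option in the list above usually relies on terraform plan's -detailed-exitcode flag, which exits with 0 when there is no diff, 1 on error, and 2 when the plan contains changes. A minimal sketch of such a wrapper, with hypothetical function names; as noted above, it only detects that something differs, not who changed it or when.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
DRIFT_STATUS = {0: "clean", 1: "error", 2: "drift"}


def classify_exit(code):
    """Map a terraform plan exit code to a coarse status string."""
    return DRIFT_STATUS.get(code, "error")


def check_drift(workdir):
    """Run a plan in workdir and return (status, plan output)."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_exit(result.returncode), result.stdout
```

A scheduler would call `check_drift` per root module and alert only on the "drift" and "error" statuses.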

So I built IaCRadar (iacradar.dev) evenings over the past few months while still employed full time.

What it does:

- Connects to your existing S3 state bucket (no platform migration)
- Checks live AWS against your Terraform state hourly
- Sends Slack or Teams alerts when something drifts
- Covers IAM policies, trust relationships, security groups (ingress + egress), S3 bucket policies, S3 public access blocks, EC2, RDS, EKS node groups and cluster versions, Lambda, ASG, KMS key policies, Route53 and CloudWatch

What it doesn't do (yet):

- Unmanaged resources (things not in Terraform at all)
- Auto-remediation
- Non-AWS providers

Setup is a CloudFormation one-click that deploys a read-only IAM role in your account (cross-account STS assume role, same pattern Datadog uses). Or a Terraform module if you prefer keeping everything in code.

14-day free trial, no credit card required.

Would love feedback from people who've dealt with this problem. What checks are you missing that driftctl had? What would make you actually pay for something like this vs rolling your own?

iacradar.dev

u/thegigiking — 10 days ago

▲ 15 r/Terraform

How do you handle unmanaged cloud resources that exist outside Terraform state across AWS accounts?

We have been running a large-scale Terraform migration spanning roughly 40 AWS accounts for 18 months. The most difficult challenge has been identifying and bringing under management all the legacy infrastructure that was never captured in Terraform. For example, last month Cost Explorer flagged an OpenSearch cluster in us-east-1 that had been running for two years with no associated ticket, owner, or Terraform state entry. This is not an isolated case: we keep discovering manually provisioned resources spun up during incidents, forgotten workloads in dormant accounts, and other unmanaged infrastructure. We have experimented with CloudTrail auditing and custom scripts to compare Terraform state against live resources, but the results become very noisy when coverage is incomplete across accounts and services.
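
The state-versus-live comparison mentioned above boils down to a set difference. Here is a rough sketch, assuming the state has been exported with terraform show -json and that the live ARNs come from some separate inventory source. The function names are hypothetical, and resource types that expose no `arn` attribute are simply skipped, which is one source of the noise described.

```python
def state_arns(module):
    """Collect the ARNs of managed resources in a state module, including child modules."""
    arns = set()
    for res in module.get("resources", []):
        if res.get("mode") != "managed":
            continue  # skip data sources
        arn = (res.get("values") or {}).get("arn")
        if arn:
            arns.add(arn)
    for child in module.get("child_modules", []):
        arns |= state_arns(child)
    return arns


def unmanaged(live_arns, state):
    """Return live ARNs that do not appear anywhere in the exported state."""
    root = state.get("values", {}).get("root_module", {})
    return sorted(set(live_arns) - state_arns(root))
```

Run per account and per state file, the union of all `state_arns` results would be subtracted from the account's inventory to list candidates for import.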

Question: What tools are people using today to discover unmanaged resources during large Terraform migrations across AWS and GCP? Have any of you had good results systematically importing existing resources into Terraform state rather than manually recreating or handling them?

u/Adventurous_Rope4025 — 10 days ago

▲ 17 r/Terraform

How are teams handling Terraform drift detection across AWS accounts without the cleanup tickets backlog?

We are running about 95 AWS accounts and somewhere around 600 Terraform modules across three teams. The drift problem has gotten to the point where I am embarrassed to say how many tickets and how much platform team attention go to it.

The issue is not that we do not know drift exists. Each instance is small, but six months later the plan output is so noisy with expected drift that people stop reading it, which means real unintended changes get buried. We had an actual misconfiguration slip through last quarter because the engineer skimming the plan categorized it as more drift garbage. That one cost us a few hours of incident recovery we did not budget for.

We have tried scheduled drift scans, running terraform plan in CI on a cron job and routing alerts to Slack. The alerts get ignored within two weeks because there is too much volume and no clear ownership.

What tooling or process changes have moved the needle on this for teams running IaC with this kind of account sprawl? Not looking for the obvious "write better IaC from the start" answer. Looking for what's working operationally for people who are already mid-mess.
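
One possible way to cut the alert volume, sketched under assumptions: read the `resource_drift` section of the plan JSON (from terraform show -json) and drop addresses that match a reviewed list of expected drift before anything reaches Slack. The pattern list and function name are hypothetical, and an allow-list like this needs an owner or it becomes another place where real changes hide.

```python
import fnmatch

# Hypothetical allow-list of addresses whose drift is known and accepted.
EXPECTED_DRIFT = ["aws_autoscaling_group.*", "module.eks.*"]


def unexpected_drift(plan, expected_patterns):
    """Return drifted resource addresses not covered by the allow-list."""
    out = []
    for item in plan.get("resource_drift", []):
        address = item.get("address", "")
        actions = item.get("change", {}).get("actions", [])
        if actions == ["no-op"]:
            continue
        if any(fnmatch.fnmatchcase(address, p) for p in expected_patterns):
            continue
        out.append(address)
    return out
```

Only the addresses this returns would be routed to a channel, ideally tagged with the owning team.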

u/Own_Drink3843 — 11 days ago

▲ 125 r/Terraform+3 crossposts

Learning Infrastructure as Code in Azure with Terraform

I've been sharing Azure and Cloud Engineering content here for the past 8 months. Most of that content focused on PowerShell and automation across Azure, Entra ID, and Microsoft 365 (21 hours' worth so far!).

While doing that, I intentionally avoided going too deep into deploying Azure services because I wanted to dedicate a separate series to Infrastructure as Code in Azure.

I'm kicking off that series today with the Terraform for Azure Beginner Episode, focused on understanding the foundations of Terraform and how it interacts with Azure.

Topics covered include:

• Theory behind Terraform (Infrastructure as Code, Declarative Languages, why Terraform exists)

• Terraform CLI (Init, Plan, Apply, Destroy)

• Terraform Blocks (Terraform, Providers, Resources, Variables, Locals, Data, Outputs)

• Terraform State (Including Drift Detection, and State-related Gotchas especially with secrets)

• And more (Terraform Order of Operations, Variable Precedence, Data Types, etc)

The goal is to understand the core concepts that make Terraform work before moving into more advanced topics. Over time I plan to build this series toward how Azure Cloud Engineers actually deploy, manage, and operate Azure environments today through Infrastructure as Code.

• Beginner Episode: Understand Terraform (learn the foundations and core concepts that make Terraform work)

• Intermediate Episode: Program Terraform (use loops, functions, conditionals, dynamic blocks, etc.)

• Advanced Episode: Scale Terraform (introduce modules, remote state, workspaces, imports, etc.)

• Professional Episode: Operationalize Terraform (use GitHub, CI/CD, pull requests, state management, and deployment workflows to work in a team environment)

• Solution Episode(s): Build Azure Projects (We'll pretend to take assignments from Cloud Architects and design, deploy, and manage complete Azure solutions using Terraform)

Link to Episode: Terraform for Azure | Beginner Course - YouTube

u/AdeelAutomates — 13 days ago

▲ 10 r/Terraform

Is anybody passing tfvars as TF_VAR_* environment variables in GitHub Actions? (via secrets or gh variables, for example)

I'm wondering if there are benefits to managing secrets and variables via environment variables instead of pulling tfvars from buckets during CI. With environment variables, you can make changes without rewriting and pushing a 1,000-line tfvars file just to update one value, and it's also easier to track changes. However, implementing pipelines this way isn't as straightforward as using tfvars files.
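
As a rough illustration of the environment-variable approach: Terraform reads any environment variable named TF_VAR_ followed by the variable name. The helper below is hypothetical. Passing strings through unchanged and JSON-encoding everything else is my assumption about what Terraform will parse for list, map, number, and bool variables, so test it against your own variable types.

```python
import json


def to_tf_env(variables):
    """Turn a dict of Terraform variable values into TF_VAR_* environment entries."""
    env = {}
    for name, value in variables.items():
        if isinstance(value, str):
            env[f"TF_VAR_{name}"] = value  # plain strings are passed as-is
        else:
            # Assumption: JSON literals are accepted as expressions for
            # numbers, bools, lists, and maps.
            env[f"TF_VAR_{name}"] = json.dumps(value)
    return env
```

In a pipeline, the result could be merged into the job environment (for example via `os.environ.update`) before terraform plan runs.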

u/webgtx — 11 days ago

▲ 13 r/Terraform+1 crossposts

Terraform scans with Checkov

Hello everyone,

Currently I'm thinking about integrating Checkov into my project for various security checks. However, I'm seeing a lot of noise from it, things like "module references should use hashes instead of tags" and many others, which I don't need to fix. IMO, maintaining a .checkov.yaml with all checks or skip-checks listed will likely be too much redundant overhead. At the same time, using a baseline won't protect you from issues introduced by newly added modules.

So here are my questions: how do you use SAST in your infrastructure? Are there any fancy approaches in the age of AI? Do you prefer other SAST tools, or do you skip it?

These are just my thoughts, and I'd like to hear your opinions.
Thank you!

u/Soggy_Psychology_312 — 11 days ago

▲ 7 r/Terraform

Terraform Tests

I've been diving into Terraform tests and managed to get basic unit and integration tests working! I'm hitting a wall when it comes to mock providers and mock resources, though. Does anyone have advice or examples that could help it click for me?

u/Legitimate_Mess_2240 — 11 days ago

▲ 15 r/Terraform

HashiCorp Certified: Terraform Associate 004

I want to ask about practice exams for the cert. Which are the best ones, and where can I buy them? Thank you!

u/Salty_Nothing_5609 — 13 days ago

▲ 2 r/Terraform

Terraform HCTA0-004 Associate Certification: Scheduling, Level of difficulty, Preparation

Is it true that the exam is administered through Certiverse and doesn't need to be scheduled?

How does the difficulty of the exam compare with, for example, the AWS Solutions Architect Associate or the AWS Data Engineer Associate?

Are there any tips for preparing?
Which material did you use?

u/senexel — 11 days ago