I've deployed infrastructure with all four tools, broken production with all of them, and debugged the aftermath at ungodly hours. Here's what really happens, not the polished bullshit on their marketing sites.
Terraform: The Boring Choice That Actually Works
Terraform is like driving a Honda Civic - not exciting, but it gets you there. I've deployed everything from 3-server startups to 500-node Kubernetes clusters with it.
What works: HCL is easy to read during code reviews. Your ops team can learn it in a week. The provider ecosystem is massive - if it has an API, there's probably a Terraform provider for it. Over 3,000 providers covering everything from AWS to GitHub to PagerDuty.
What doesn't: Try doing anything dynamic and you'll want to scream. I spent 4 hours debugging why `count` wouldn't work with computed values. The error was "Error: value of 'count' cannot be computed" - super fucking helpful, right? This is why `for_each` exists, but good luck explaining that to someone learning HCL.
The state file will get corrupted eventually. I've seen it happen to every team. You'll be running `terraform state pull | jq` at 2am trying to figure out why your load balancer disappeared from the state but not from AWS. The state corruption GitHub issue has 500+ comments because this happens constantly.
Recent bullshit: Terraform 1.7+ finally fixed some of the `count`/`for_each` crap that's been broken since 2019. The "deferred actions" feature helps, but we found out the hard way that it breaks if you use `depends_on` with computed values. Broke our CI pipeline three fucking times in two weeks.
Pulumi: For When You're Tired of HCL's Bullshit
Pulumi lets you write actual code. After years of fighting HCL's limitations, being able to use real loops and conditionals feels like freedom. You can write infrastructure in TypeScript, Python, Go, C#, and even Java.
What works: TypeScript autocompletion is a game changer. Unit testing your infrastructure code with Jest feels natural. Complex logic that took 200 lines of HCL becomes 20 lines of TypeScript. The Pulumi Registry has native providers that are way faster than the bridged ones.
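Here's a minimal sketch of the "real loops, testable logic" point. It assumes you keep the planning logic in a plain function and only hand its output to Pulumi resources afterward. `planSubnets`, the `SubnetPlan` shape, and the 10.0.x.0/24 layout are all made up for illustration; none of them are Pulumi APIs.

```typescript
// Hypothetical shape for illustration; not a Pulumi type.
interface SubnetPlan {
  name: string;
  availabilityZone: string;
  cidrBlock: string;
  isPublic: boolean;
}

// Plan one public and one private /24 per availability zone.
// Assumes a 10.0.0.0/16 VPC; capped at 128 zones so the third octet stays below 256.
function planSubnets(zones: string[]): SubnetPlan[] {
  if (zones.length === 0 || zones.length > 128) {
    throw new Error(`expected 1-128 zones, got ${zones.length}`);
  }
  const plans: SubnetPlan[] = [];
  zones.forEach((zone, i) => {
    plans.push({
      name: `public-${zone}`,
      availabilityZone: zone,
      cidrBlock: `10.0.${i * 2}.0/24`,
      isPublic: true,
    });
    plans.push({
      name: `private-${zone}`,
      availabilityZone: zone,
      cidrBlock: `10.0.${i * 2 + 1}.0/24`,
      isPublic: false,
    });
  });
  return plans;
}

const demoPlans = planSubnets(["us-east-1a", "us-east-1b"]);
console.log(demoPlans.map((p) => `${p.name} ${p.cidrBlock}`).join("\n"));
```

Because `planSubnets` never touches a cloud API, a Jest test can assert on its return value directly, and the resource-creating code shrinks to a loop over the plan.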
What doesn't: The learning curve is brutal if your team doesn't live in an IDE. I literally watched a senior ops engineer with 15 years of experience stare at a TypeScript promise chain for 20 minutes and then ask me what `.then()` does. When shit breaks, you get stack traces that look like someone vomited Java all over your terminal - especially when the real error is 12 frames deep in some AWS SDK v3 bullshit. And cross-language stack references? Good fucking luck - I've seen grown engineers quit over trying to pass outputs between a TypeScript stack and a Python one.
Recent failure: Our staging went dark for 4 hours because Pulumi decided to set our instance count to zero. Turns out `Number("")` returns 0, not 3 like we expected. TypeScript's compiler was totally fine with it because technically it's valid code. I spent the rest of that Friday setting up every fucking ESLint rule Pulumi recommends.
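A guard like the following would have caught it. This is a sketch, assuming the count arrives as a raw string from an environment variable or stack config; `parseInstanceCount` and the `INSTANCE_COUNT` name are hypothetical, not anything Pulumi ships.

```typescript
// Parse a required positive integer from a raw config string.
// Number("") and Number("   ") both evaluate to 0, so blank input is rejected
// before any numeric conversion happens.
function parseInstanceCount(raw: string | undefined, name: string = "INSTANCE_COUNT"): number {
  if (raw === undefined || raw.trim() === "") {
    throw new Error(`${name} is missing or empty`);
  }
  const trimmed = raw.trim();
  // Only plain decimal digits without a leading zero: no "1e2", no "0x10", no "-3".
  if (!/^[1-9]\d*$/.test(trimmed)) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return Number(trimmed);
}

console.log(Number(""));               // 0: the silent conversion described above
console.log(parseInstanceCount("3"));  // 3
```

Failing loudly on an empty value turns a four-hour outage into a failed preview. Whether zero should ever be a legal count is a design choice; this sketch assumes it should not.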
War story: Pulumi's state encryption saved our ass when a contractor accidentally committed AWS credentials to Git. Traditional Terraform state files would have exposed everything in plaintext. The Pulumi Service encrypts secrets by default.
AWS CDK: AWS Lock-in Disguised as Developer Convenience
AWS CDK is perfect if you're all-in on AWS and never plan to leave. It generates CloudFormation templates, which means you get AWS's change management for free, but also inherit all of CloudFormation's limitations. The CDK Construct Hub has over 7,000 reusable components.
What works: New AWS features are available immediately through the AWS Construct Library. The constructs are well-designed - creating an entire VPC with subnets, route tables, and NAT gateways is 5 lines of code. The generated CloudFormation is actually readable, unlike hand-written templates.
What doesn't: The 500-resource limit per CloudFormation stack will bite you in the ass. We had to split our infrastructure into 8 different CDK apps just to stay under the limit. Cross-stack references become a nightmare when you have circular dependencies.
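One way to see the limit coming is to count resources in the synthesized template before you deploy. The sketch below assumes you've already parsed a template JSON (CDK writes them under `cdk.out/` after `cdk synth`) into an object; `checkResourceBudget` and the 80% warning threshold are my own inventions, not CDK features.

```typescript
// Minimal slice of a CloudFormation template: only the top-level Resources map matters here.
interface TemplateLike {
  Resources?: Record<string, unknown>;
}

type BudgetStatus = "ok" | "warn" | "over";

// Per-stack resource limit cited in the text above.
const STACK_RESOURCE_LIMIT = 500;

// Count the resources in a template and flag stacks that are at or past
// warnRatio of the limit (hypothetical threshold, default 80%).
function checkResourceBudget(
  template: TemplateLike,
  warnRatio: number = 0.8
): { count: number; status: BudgetStatus } {
  const count = Object.keys(template.Resources || {}).length;
  let status: BudgetStatus = "ok";
  if (count > STACK_RESOURCE_LIMIT) {
    status = "over";
  } else if (count >= STACK_RESOURCE_LIMIT * warnRatio) {
    status = "warn";
  }
  return { count, status };
}

// Fake template with 450 resources, standing in for a parsed cdk.out file.
const demoResources: Record<string, unknown> = {};
for (let i = 0; i < 450; i++) {
  demoResources[`Resource${i}`] = { Type: "AWS::SNS::Topic" };
}
console.log(checkResourceBudget({ Resources: demoResources })); // count 450, status "warn"
```

Run in CI on every synthesized stack, a check like this gives you weeks of warning to plan a split instead of discovering the wall mid-deploy.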
Production disaster: CDK spit out an 850KB CloudFormation template. CloudFormation only takes a template body up to 51,200 bytes directly; anything bigger has to come from S3, where the cap is 1MB (it was 450KB until late 2020). CDK tries to be smart and uploads big templates to S3 for you, but our deployment role couldn't write to S3, so our deployment just... stopped working. At 2am on a Tuesday. The error? "Unable to upload template" - that's it. Took me 4 hours of digging through CloudTrail logs to figure out the permissions issue.
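The size rules are simple enough to check before deploying. CloudFormation accepts a template body of up to 51,200 bytes directly, and larger templates have to come from S3, up to 1 MB. A rough sketch follows; `classifyTemplateSize` is a hypothetical helper, and treating "1 MB" as 1,048,576 bytes is my assumption.

```typescript
type TemplateDelivery = "inline" | "needs-s3" | "too-large";

// Largest template body CloudFormation accepts directly in the API call.
const INLINE_LIMIT_BYTES = 51200;
// Largest template accepted from S3; the exact byte count is an assumption (1 MiB).
const S3_LIMIT_BYTES = 1024 * 1024;

// Decide how a template of the given size would have to reach CloudFormation.
function classifyTemplateSize(sizeBytes: number): TemplateDelivery {
  if (sizeBytes <= INLINE_LIMIT_BYTES) {
    return "inline";
  }
  if (sizeBytes <= S3_LIMIT_BYTES) {
    return "needs-s3";
  }
  return "too-large";
}

// The 850KB template from the story needs an S3 upload, so the deploy role
// needs write access to the bucket used for that upload.
console.log(classifyTemplateSize(850 * 1000)); // "needs-s3"
```

If a pre-deploy check reports `needs-s3`, that's the cue to verify the deployment role can actually write to the bucket, before 2am finds out for you.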
Version hell: CDK v1 to v2 migration broke every construct we'd written. The import paths changed, the APIs changed, even the basic app structure changed. It was like rewriting everything from scratch. The migration guide is 50 pages long for a reason.
OpenTofu: Terraform Without the Corporate Overlords
OpenTofu is what Terraform should have stayed - truly open source. It's 100% compatible with existing Terraform code, which means migration is literally `s/terraform/tofu/g`. The Linux Foundation backs it, so no surprise license changes.
What works: Drop-in replacement for Terraform. Recent releases fixed security vulnerabilities and improved state encryption. All your existing modules, providers, and state files work unchanged. Governance sits with the community instead of a single vendor.
What doesn't: It's still Terraform, so all the same gotchas apply. State file corruption, limited HCL capabilities, and debugging nightmares are all still there. The community is smaller, so finding help can be harder - check the OpenTofu Slack instead of Stack Overflow.
The license drama: HashiCorp moved Terraform to the Business Source License (BSL) in 2023, which means you can't use it in products that compete with HashiCorp's own offerings. Most companies don't care, but if you're building infrastructure tooling or SaaS platforms, OpenTofu is your escape hatch. Read the license FAQ if you're paranoid.
Team Reality Check
Your choice depends more on your team than the technology:
- Ops teams that manage infrastructure: Stick with Terraform/OpenTofu. HCL is configuration, not programming.
- Dev teams doing infrastructure: Go with Pulumi or CDK. Being able to write actual code is worth the learning curve.
- Mixed teams: CDK if you're AWS-only, Terraform if you're multi-cloud. Pulumi if half your team are developers.
I've seen companies switch tools three times in two years trying to find the "perfect" solution. The perfect tool is the one your team will actually use correctly.
Alright, you've seen the carnage each tool can create. Now let's talk money and the shit they don't put on their pricing pages.