What Actually Happens When You Pick Each Tool

| Reality Check | Terraform | Pulumi | AWS CDK | OpenTofu |
|---|---|---|---|---|
| What you'll be writing | HCL (looks like JSON had a baby with YAML) | Real code (TypeScript, Python, etc.) | TypeScript that generates CloudFormation | Same HCL as Terraform |
| How often it breaks | State corruption every 6 months | TypeScript errors make you question life | CloudFormation templates hit 500-resource limit | Same issues as Terraform |
| 3am debugging involves | terraform state rm and praying | Stack traces from hell | Reading 10MB CloudFormation errors | OpenTofu forums + old TF docs |
| Cloud support | Works everywhere, slowly | Works most places, faster | AWS only, but actually works | Same as Terraform without vendor lock-in |
| Learning curve | Learn HCL + 200 providers | Learn programming + infrastructure + Pulumi | Learn AWS + programming + CDK + CloudFormation | Copy your Terraform configs |
| State file nightmares | Every fucking week | Handled for you (thank god) | CloudFormation's problem now | Same weekly nightmares |
| When it works | Bulletproof for simple stuff | Great for complex logic | Perfect AWS integration | Terraform without the corporate overlords |
| License situation | Vendor lock-in disguised as BSL | Actually open source | AWS owns your soul anyway | Truly open source |

What I Learned Deploying These Tools in Production

I've deployed infrastructure with all four tools, broken production with all of them, and debugged the aftermath at ungodly hours. Here's what really happens, not the polished bullshit on their marketing sites.

Terraform: The Boring Choice That Actually Works


Terraform is like driving a Honda Civic - not exciting, but it gets you there. I've deployed everything from 3-server startups to 500-node Kubernetes clusters with it.

What works: HCL is easy to read during code reviews. Your ops team can learn it in a week. The provider ecosystem is massive - if it has an API, there's probably a Terraform provider for it. Over 3,000 providers covering everything from AWS to GitHub to PagerDuty.

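What doesn't: Try doing anything dynamic and you'll want to scream. I spent 4 hours debugging why `count` wouldn't work with computed values. The error was "Error: value of 'count' cannot be computed" - super fucking helpful, right? Translation: `count` has to be known at plan time, and mine depended on an attribute that only exists after apply. `for_each` has the same plan-time rule for its keys, but it addresses instances by key instead of by list position, which is usually what you wanted in the first place. Good luck explaining that to someone learning HCL.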
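The addressing difference is the practical reason to reach for `for_each`. `count` instances are keyed by list position and `for_each` instances by a string key. Below is a toy TypeScript model of that. It doesn't call Terraform or reproduce its planner, and `aws_instance.web` is a hypothetical name. It shows what a plan has to do when you delete one item from the middle of the list:

```typescript
// Toy model of how Terraform addresses resource instances. It does not run
// Terraform; it only shows why count and for_each react differently when a
// list changes.
type Addressed = Record<string, string>; // instance address -> item it manages

// count: instances are keyed by position, e.g. aws_instance.web[1]
function addressByCount(resource: string, items: string[]): Addressed {
  const out: Addressed = {};
  items.forEach((item, i) => {
    out[`${resource}[${i}]`] = item;
  });
  return out;
}

// for_each: instances are keyed by value, e.g. aws_instance.web["b"]
function addressByForEach(resource: string, items: string[]): Addressed {
  const out: Addressed = {};
  for (const item of items) out[`${resource}["${item}"]`] = item;
  return out;
}

// Roughly what a plan reports: same address but a different item means that
// instance must be updated or replaced; a vanished address is destroyed; a new
// address is created.
function planDiff(before: Addressed, after: Addressed): { changed: string[]; destroyed: string[]; created: string[] } {
  const changed: string[] = [];
  const destroyed: string[] = [];
  const created: string[] = [];
  for (const addr of Object.keys(before)) {
    if (!(addr in after)) destroyed.push(addr);
    else if (after[addr] !== before[addr]) changed.push(addr);
  }
  for (const addr of Object.keys(after)) {
    if (!(addr in before)) created.push(addr);
  }
  return { changed, destroyed, created };
}

const oldList = ["a", "b", "c"];
const newList = ["a", "c"]; // "b" removed from the middle

// count: web[1] now means "c" (changed) and web[2] is destroyed
console.log(planDiff(addressByCount("aws_instance.web", oldList), addressByCount("aws_instance.web", newList)));
// for_each: only web["b"] is destroyed; "a" and "c" keep their addresses
console.log(planDiff(addressByForEach("aws_instance.web", oldList), addressByForEach("aws_instance.web", newList)));
```

In HCL terms, that's the difference between `count = length(var.names)` and `for_each = toset(var.names)`.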

The state file will corrupt eventually. I've seen it happen to every team. You'll be running `terraform state pull | jq` at 2am trying to figure out why your load balancer disappeared from the state but not from AWS. The state corruption GitHub issue has 500+ comments because this happens constantly.
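When the state and reality disagree, your first job is to learn what the state actually tracks. `terraform state list` does that for you. The sketch below does the same thing from the JSON that `terraform state pull` prints. It assumes the version 4 state layout: a top-level `resources` array whose entries carry `mode`, `type`, `name`, an optional `module`, and `instances` with an optional `index_key`. The state format is internal to Terraform, so treat these field names as something to verify against your own file, not as a supported API. A load balancer that's missing from this list but still exists in AWS is exactly the situation above:

```typescript
// The subset of Terraform's v4 state JSON this sketch reads (assumed layout;
// check it against your own `terraform state pull` output).
interface StateInstance {
  index_key?: string | number; // present when the resource uses count or for_each
}
interface StateResource {
  module?: string; // e.g. "module.network"; absent in the root module
  mode: "managed" | "data";
  type: string; // e.g. "aws_lb"
  name: string; // e.g. "main"
  instances: StateInstance[];
}
interface StateV4 {
  version: number;
  resources: StateResource[];
}

// Rebuild the addresses `terraform state list` would print.
function stateAddresses(state: StateV4): string[] {
  if (state.version !== 4) throw new Error(`expected state version 4, got ${state.version}`);
  const out: string[] = [];
  for (const r of state.resources) {
    const prefix = r.module ? `${r.module}.` : "";
    const base = `${prefix}${r.mode === "data" ? "data." : ""}${r.type}.${r.name}`;
    for (const inst of r.instances) {
      if (inst.index_key === undefined) out.push(base);
      else if (typeof inst.index_key === "number") out.push(`${base}[${inst.index_key}]`);
      else out.push(`${base}["${inst.index_key}"]`);
    }
  }
  return out;
}

// In practice: JSON.parse(text) as StateV4, where text is the saved output
// of `terraform state pull`. This inline sample is hypothetical.
const sampleState: StateV4 = {
  version: 4,
  resources: [
    { mode: "managed", type: "aws_lb", name: "main", instances: [{}] },
    { module: "module.net", mode: "managed", type: "aws_subnet", name: "private", instances: [{ index_key: 0 }, { index_key: 1 }] },
    { mode: "data", type: "aws_ami", name: "ubuntu", instances: [{}] },
  ],
};
console.log(stateAddresses(sampleState).join("\n"));
```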

Recent bullshit: Terraform 1.7+ finally fixed some of the count/for_each crap that's been broken since 2019. The "deferred actions" feature helps, but we found out the hard way that it breaks if you use depends_on with computed values. Broke our CI pipeline three fucking times in two weeks.

Pulumi: For When You're Tired of HCL's Bullshit


Pulumi lets you write actual code. After years of fighting HCL's limitations, being able to use real loops and conditionals feels like freedom. You can write infrastructure in TypeScript, Python, Go, C#, and even Java.

What works: TypeScript autocompletion is a game changer. Unit testing your infrastructure code with Jest feels natural. Complex logic that took 200 lines of HCL becomes 20 lines of TypeScript. The Pulumi Registry has native providers that are way faster than the bridged ones.

What doesn't: The learning curve is brutal if your team doesn't live in an IDE. I literally watched a senior ops engineer with 15 years experience stare at a TypeScript promise chain for 20 minutes and then ask me what .then() does. When shit breaks, you get stack traces that look like someone vomited Java all over your terminal - especially when the real error is 12 frames deep in some AWS SDK v3 bullshit. And cross-language stack references? Good fucking luck - I've seen grown engineers quit over trying to pass outputs between a TypeScript stack and a Python one.

Recent failure: Our staging went dark for 4 hours because Pulumi set our instance count to zero. Turns out Number("") returns 0 (not NaN, and definitely not the 3 we expected), so an empty config value sailed straight through as a valid count. TypeScript's compiler was totally fine with it because technically it's valid code. I spent the rest of that Friday setting up every fucking ESLint rule Pulumi recommends.
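The trap is that a blank string converts to `0`, not `NaN`, so it slips past any `isNaN` check. Here's a minimal defensive parser. It assumes the count arrives as a string, for example from `new pulumi.Config().get("instanceCount")`, which returns `string | undefined`. The `parseInstanceCount` name, the fallback of 3, and the "never zero" rule are hypothetical choices for this sketch, not a Pulumi API:

```typescript
// Number() quirks that make it a bad config parser:
//   Number("")     -> 0   (not NaN!)
//   Number("  ")   -> 0
//   Number("3.5")  -> 3.5
//   Number("0x10") -> 16
// This sketch accepts only plain digit strings and treats blank as "unset".
function parseInstanceCount(raw: string | undefined, fallback: number): number {
  const trimmed = raw === undefined ? "" : raw.trim();
  if (trimmed === "") return fallback; // unset or blank: use the default, never 0
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`instance count must be a whole number, got "${raw}"`);
  }
  const n = parseInt(trimmed, 10);
  // Policy choice for this sketch: scaling to zero should be a deliberate
  // change somewhere else, not the side effect of parsing a config value.
  if (n < 1) throw new Error(`instance count must be at least 1, got ${n}`);
  return n;
}

console.log(parseInstanceCount(undefined, 3)); // 3
console.log(parseInstanceCount("", 3)); // 3
console.log(parseInstanceCount(" 5 ", 3)); // 5
```

Failing loudly at `pulumi preview` time is the point. A thrown error costs you a red CI run, and a silent `0` costs you staging.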

War story: Pulumi's secrets encryption saved our ass when a contractor accidentally committed AWS credentials to Git. Traditional Terraform state files would have exposed everything in plaintext. Pulumi encrypts anything marked as a secret (via `pulumi config set --secret` or `pulumi.secret()`) in both stack config and state.

AWS CDK: AWS Lock-in Disguised as Developer Convenience


AWS CDK is perfect if you're all-in on AWS and never plan to leave. It generates CloudFormation templates, which means you get AWS's change management for free, but also inherit all of CloudFormation's limitations. The CDK Construct Hub has over 7,000 reusable components.

What works: New AWS features show up in CDK as soon as CloudFormation supports them, through the auto-generated L1 (Cfn*) constructs in the AWS Construct Library. The constructs are well-designed - creating an entire VPC with subnets, route tables, and NAT gateways is 5 lines of code. The generated CloudFormation is actually readable, unlike hand-written templates.

What doesn't: The 500-resource limit per CloudFormation stack will bite you in the ass. We had to split our infrastructure into 8 different CDK apps just to stay under the limit. Cross-stack references become a nightmare when you have circular dependencies.
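A cheap guardrail is to count resources in the synthesized template before you deploy. `cdk synth` writes templates as JSON under `cdk.out/` by default, and every entry in a template's top-level `Resources` map counts toward the per-stack limit. This sketch assumes you've already read and parsed that JSON. The 80% warning threshold is an arbitrary choice:

```typescript
// Only the part of a synthesized CloudFormation template this check reads.
interface SynthesizedTemplate {
  Resources?: Record<string, { Type: string }>;
}

const STACK_RESOURCE_LIMIT = 500; // CloudFormation's per-stack resource limit

// Count resources and flag stacks that are close to (or over) the limit.
function checkResourceCount(
  template: SynthesizedTemplate,
  warnFraction = 0.8, // arbitrary early-warning threshold
): { count: number; status: "ok" | "warn" | "over" } {
  const count = Object.keys(template.Resources ?? {}).length;
  if (count > STACK_RESOURCE_LIMIT) return { count, status: "over" };
  if (count >= STACK_RESOURCE_LIMIT * warnFraction) return { count, status: "warn" };
  return { count, status: "ok" };
}

// Break the count down by resource type to see what's eating the budget
// and what might be worth moving into another stack.
function countByType(template: SynthesizedTemplate): Record<string, number> {
  const resources = template.Resources ?? {};
  const counts: Record<string, number> = {};
  for (const id of Object.keys(resources)) {
    const type = resources[id].Type;
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

// Hypothetical stack with 420 queues
const demoResources: Record<string, { Type: string }> = {};
for (let i = 0; i < 420; i++) demoResources[`Queue${i}`] = { Type: "AWS::SQS::Queue" };
const demo: SynthesizedTemplate = { Resources: demoResources };
console.log(checkResourceCount(demo)); // { count: 420, status: 'warn' }
console.log(countByType(demo)); // { 'AWS::SQS::Queue': 420 }
```

Run it over each `*.template.json` in CI and fail the build on `"over"`. Finding out during `cdk deploy` at 2am is the worse option.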

Production disaster: CDK spit out an 850KB CloudFormation template. That's under CloudFormation's 1MB limit for templates stored in S3 (it used to be 450KB before 2020). It's way over the 51,200-byte limit for templates passed inline, though, so CDK has to upload it to S3 first. Our deployment role couldn't write to S3, so our deployment just... stopped working. At 2am on a Tuesday. The error? "Unable to upload template" - that's it. Took me 4 hours of digging through CloudTrail logs to figure out the permissions issue.

Version hell: CDK v1 to v2 migration broke every construct we'd written. The import paths changed, the APIs changed, even the basic app structure changed. It was like rewriting everything from scratch. The migration guide is 50 pages long for a reason.

OpenTofu: Terraform Without the Corporate Overlords


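OpenTofu is what Terraform should have stayed - truly open source. It forked from Terraform right before the 2023 license change, so configs, modules, and state from that era work as-is. Newer Terraform-only features are where the two start to drift. For most teams, migration boils down to replacing `terraform` CLI calls with `tofu`. The Linux Foundation backs it, so no surprise license changes.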

What works: Drop-in replacement for Terraform. Recent releases fixed security vulnerabilities, and OpenTofu 1.7 added built-in client-side state encryption. All your existing modules, providers, and state files work unchanged.

What doesn't: It's still Terraform, so all the same gotchas apply. State file corruption, limited HCL capabilities, and debugging nightmares are all still there. The community is smaller, so finding help can be harder - check the OpenTofu Slack instead of Stack Overflow.

The license drama: HashiCorp changed Terraform to BSL in 2023, which means you can't use it in competing products. Most companies don't care, but if you're building infrastructure tooling or SaaS platforms, OpenTofu is your escape hatch. Read the license FAQ if you're paranoid.

Team Reality Check

Your choice depends more on your team than the technology:

  • Ops teams that manage infrastructure: Stick with Terraform/OpenTofu. HCL is configuration, not programming.
  • Dev teams doing infrastructure: Go with Pulumi or CDK. Being able to write actual code is worth the learning curve.
  • Mixed teams: CDK if you're AWS-only, Terraform if you're multi-cloud. Pulumi if half your team are developers.

I've seen companies switch tools three times in two years trying to find the "perfect" solution. The perfect tool is the one your team will actually use correctly.

Alright, you've seen the carnage each tool can create. Now let's talk money and the shit they don't put on their pricing pages.

The Shit They Don't Tell You: Real Feature Comparison

| Pain Point | Terraform | Pulumi | AWS CDK | OpenTofu |
|---|---|---|---|---|
| State File Corruption | Weekly occurrence | Handled by service | CloudFormation's problem | Same as Terraform |
| Provider Lag | Months behind AWS | Weeks behind AWS | Day-0 AWS support | Same as Terraform |
| Debugging Hell | HCL stack traces are useless | TypeScript stack traces from hell | CloudFormation errors are novels | Same debugging nightmares |
| Resource Limits | None (until your laptop dies) | None (until your wallet dies) | 500 resources per stack | None (until your laptop dies) |
| Import Existing Resources | terraform import works 60% of the time | pulumi import mostly works | cdk import is hit or miss | tofu import same as terraform |
| Plan Takes Forever | 15+ minutes with 1000+ resources | Usually fast | CloudFormation change sets take forever | 15+ minutes with 1000+ resources |
| Parallelism Broken | Default parallelism=10 breaks things | Smart parallelism works | CloudFormation handles it | Default parallelism=10 breaks things |

How to Pick a Tool Without Getting Fired

I've been through three tool migrations in five years.

Here's what I learned about making decisions that won't ruin your career.

Team Skills Trump Everything

The biggest mistake I see is choosing tools based on technical features instead of team capabilities. I watched a CTO pick Pulumi because "TypeScript is the future" while his entire ops team had zero programming experience.

Six months later, they were back to Terraform after wasting $200K in consulting fees.

Reality check: Your senior ops engineer who's been managing infrastructure for 10 years isn't going to become a TypeScript developer overnight.

And your JavaScript developers aren't going to suddenly understand VPC routing just because they can write loops.

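What actually works: match the tool to the people who'll run it. Ops-heavy teams do better with declarative HCL in Terraform/OpenTofu. If your team is mostly developers, let them use real programming languages.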

The AWS vs Multi-Cloud Decision

Most startups claim they're going multi-cloud.

They're usually bullshitting themselves.

Pick CDK if you're AWS-only and admit it. The deep AWS integration is worth the lock-in.

When EKS adds a new feature, CDK picks it up as soon as CloudFormation does. Terraform providers can lag by months.

Pick Terraform/OpenTofu if you actually deploy to multiple clouds. Not because you might someday, but because you already do.

The provider ecosystem is unmatched.

Pick Pulumi if you're multi-cloud and your team codes. The abstraction layer helps when you need to deploy the same app to AWS and GCP.

Migration Costs Are Always Higher Than You Think

I told my CTO our Terraform → Pulumi migration would take 3 months. 8 months later, we were still debugging provider edge cases and our consultant was dodging my calls.

That $200K project became a $500K nightmare.

What we underestimated:

  • Converting modules and shared libraries
  • Retraining the team
  • Debugging provider differences
  • Updating all our documentation and runbooks
  • CI/CD pipeline changes

The 2-week rule: If you can't migrate a representative sample of your infrastructure in 2 weeks, multiply your estimate by 3.

Budget Reality vs Marketing Claims

In my experience, free tiers never stay free at scale.

Terraform Cloud starts at $20/user/month but you'll need the $70/user tier for policy enforcement and advanced features.

With 50 engineers, that's $3,500/month.

Pulumi Cloud starts at $40/month but hits you with 18¢ per resource over 500.

We deployed a medium-sized K8s cluster and our bill jumped to $800/month overnight: every Kubernetes object Pulumi manages (deployments, services, ingresses, and so on) counts as a resource. The "Individual" tier gives you 500 deployment minutes, which sounds generous until you realize a full deployment takes 45 minutes and you deploy 3 times a day during development. That's 135 minutes a day, so the pool is gone in under four days.
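Run this arithmetic with your own numbers before you sign anything. The sketch below uses the figures quoted above as inputs. They're this article's numbers, not a current price list, and it treats the 18¢ as a monthly per-resource charge. It works in integer cents so there's no floating-point noise:

```typescript
// All money in integer cents so the arithmetic stays exact.

// Per-seat pricing: every engineer who needs access costs the same.
function perSeatMonthlyCents(users: number, centsPerUser: number): number {
  return users * centsPerUser;
}

// Base fee plus a per-resource charge for everything above an included allowance.
function perResourceMonthlyCents(resources: number, baseCents: number, included: number, centsPerExtra: number): number {
  return baseCents + Math.max(0, resources - included) * centsPerExtra;
}

// How many full days a monthly pool of deployment minutes lasts.
function daysOfDeployMinutes(poolMinutes: number, minutesPerDeploy: number, deploysPerDay: number): number {
  return Math.floor(poolMinutes / (minutesPerDeploy * deploysPerDay));
}

const dollars = (cents: number): string => `$${(cents / 100).toFixed(2)}`;

// 50 engineers on the $70/user tier
console.log(dollars(perSeatMonthlyCents(50, 7000))); // $3500.00
// $40 base, 500 resources included, 18 cents per extra resource, 2,000 resources
console.log(dollars(perResourceMonthlyCents(2000, 4000, 500, 18))); // $310.00
// 500 minutes, 45-minute deploys, 3 deploys a day
console.log(daysOfDeployMinutes(500, 45, 3)); // 3
```

Per-seat pricing scales with headcount and per-resource pricing scales with infrastructure, so a small team running a big cluster and a big team running a small one will reach opposite conclusions.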

AWS CDK has no platform cost, but your AWS bill will explode if you're not careful.

CDK makes it too easy to create expensive resources.

OpenTofu is actually free, but you pay in operational overhead.

You're responsible for runners, state storage, and backup strategies.

The License Trap

HashiCorp's license change in 2023 blindsided everyone.

Companies that build Terraform into products competing with HashiCorp's own offerings technically need a commercial license now.

Most companies don't care. You're probably fine if you're just managing your own infrastructure.

You should care if:

  • You're building infrastructure tooling as a product
  • You're a managed service provider
  • Your legal team is paranoid about vendor licensing

OpenTofu exists for a reason.

The Linux Foundation backing means no surprise license changes.

Version Hell and Breaking Changes

Terraform: Every major version breaks something. 0.12 → 0.13 → 0.14 → 1.0 each required rewriting parts of our codebase.

Pulumi: Rapid development means frequent breaking changes.

I've seen APIs change between minor versions.

CDK: The v1 → v2 migration was brutal.

Basically rewrote everything.

OpenTofu: Same breaking changes as Terraform since they maintain compatibility.

What Success Actually Looks Like

After five years of tool migrations, successful deployments have the same characteristics:

  1. The team understands the tool
    • Not just senior engineers, everyone who might need to debug at 3am
  2. Clear ownership model
    • Who writes infrastructure code vs who reviews it vs who operates it
  3. Standardized patterns
    • Cookie-cutter templates for common use cases
  4. Disaster recovery procedures
    • How to rebuild from scratch when everything breaks
  5. Gradual adoption
    • Start with non-critical environments, prove it works

My Recommendation Process

When teams ask me what tool to pick, I ask these questions:

  1. Who will be on-call when this breaks? Pick the tool that person is comfortable with.
  2. Are you actually multi-cloud today? Not planning to be, but actually are.
  3. How complex is your infrastructure logic? Simple resources vs dynamic configurations.
  4. What's your team's programming skill level? Be honest about this.
  5. How risk-averse is your organization? Boring solutions are often right.

The most successful migrations I've seen were driven by real pain points, not technology trends. If your current tool works, don't change it just because something newer exists.

Starting fresh? Pick based on what your team actually knows, not what looks cool on Hacker News. You can migrate later when you hit real limits, not imaginary ones.

No matter which tool you pick, shit's going to break. And when it does at 3am on Saturday morning while you're trying to enjoy a beer, here are the questions you'll actually be googling and the answers that might save your weekend.

Questions You'll Actually Ask at 3am

Q

My Terraform state file is corrupted and production is down. What do I do?

A

First, don't panic and don't run terraform apply blindly.

Quick fix:

  1. terraform state pull > backup.tfstate (save what you have)
  2. terraform state list (see what's actually tracked)
  3. terraform import the critical resources that are missing
  4. terraform plan to see the damage before fixing

Nuclear option: terraform state rm everything and re-import, but you'll lose all the metadata.
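The import step is where the hours go, because first you have to work out *which* resources fell out of state. Keep a list of the addresses you expect, either hand-maintained or saved from a known-good `terraform state list`. Diff it against the current `terraform state list` output and you have your import to-do list. That expected-list file is a convention you'd have to adopt yourself; Terraform doesn't produce it for you:

```typescript
// Split `terraform state list` output into addresses (one per line).
function parseStateList(output: string): string[] {
  return output
    .split("\n")
    .map((line) => line.trim()) // also strips \r from Windows line endings
    .filter((line) => line.length > 0);
}

// Compare what you expect to be tracked with what the state actually tracks.
//   missing:    expected but absent from state -> candidates for terraform import
//   unexpected: in state but not expected      -> investigate before touching
function stateDrift(expected: string[], actual: string[]): { missing: string[]; unexpected: string[] } {
  const inActual: Record<string, boolean> = Object.create(null);
  const inExpected: Record<string, boolean> = Object.create(null);
  for (const a of actual) inActual[a] = true;
  for (const e of expected) inExpected[e] = true;
  return {
    missing: expected.filter((e) => !inActual[e]),
    unexpected: actual.filter((a) => !inExpected[a]),
  };
}

// Hypothetical example: the load balancer fell out of state.
const expectedAddresses = ["aws_lb.main", "aws_lb_listener.https", "aws_security_group.lb"];
const listedAddresses = parseStateList("aws_lb_listener.https\naws_security_group.lb\naws_s3_bucket.logs\n");
console.log(stateDrift(expectedAddresses, listedAddresses));
// missing: aws_lb.main, unexpected: aws_s3_bucket.logs
```

Each missing address then needs `terraform import ADDRESS ID`. The ID format depends on the resource type; for `aws_lb` it's the load balancer's ARN.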

Pulumi users: This is why we switched. State corruption is handled by the service, not your laptop.

CDK users: CloudFormation handles state, so this isn't your problem.

Q

Why is my terraform plan taking 25 minutes?

A

Because Terraform queries every single resource to check its current state, and AWS APIs are slow.

Immediate relief:

  • terraform plan -parallelism=20: pump up the concurrency from the pathetic default of 10 (go too high and AWS starts throttling you)
  • terraform plan -target=specific.resource: only check what you actually changed
  • terraform plan -refresh=false: skip the state refresh if you're sure nothing changed

Longer fixes:

  • Break your monolith into smaller configs before you lose your mind
  • Use remote state for shared shit so teams aren't stepping on each other
  • Switch to Pulumi if you're tired of waiting; their engine is way faster

CDK users: CloudFormation change sets are also slow, but at least it's AWS's problem.
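Why does `-parallelism` help, and why does cranking it up eventually stop helping? Here's a back-of-the-envelope model. It assumes every resource takes the same time to read, reads go out in waves of `parallelism`, and the dependency graph never forces serial work. An optional API rate cap stands in for throttling. None of this is Terraform's actual scheduler, and the 5-seconds-per-resource figure is hypothetical:

```typescript
// Idealized refresh-time model (assumptions, not Terraform internals).
function estimateRefreshSeconds(
  resources: number,
  parallelism: number,
  secondsPerResource: number,
  maxResourcesPerSecond: number = Infinity, // API rate cap; Infinity = no throttling
): number {
  if (parallelism < 1) throw new Error("parallelism must be at least 1");
  // Reads happen in waves of `parallelism` concurrent calls.
  const waves = Math.ceil(resources / parallelism) * secondsPerResource;
  // You can't go faster than the API lets you, however many workers you have.
  const rateFloor = resources / maxResourcesPerSecond;
  return Math.max(waves, rateFloor);
}

// 1,000 resources at a hypothetical 5 seconds each
console.log(estimateRefreshSeconds(1000, 10, 5)); // 500
console.log(estimateRefreshSeconds(1000, 20, 5)); // 250
// Same run, but the API only tolerates 5 reads per second
console.log(estimateRefreshSeconds(1000, 50, 5, 5)); // 200
console.log(estimateRefreshSeconds(1000, 100, 5, 5)); // 200
```

Doubling parallelism halves the idealized time until you hit the rate cap. Past that point, a higher `-parallelism` buys nothing and starts producing throttling errors instead.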

Q

Which tool should I pick if I want to sleep at night?

A

Terraform if your team knows HCL and you don't mind state file babysitting. It's boring but predictable.

CDK if you're AWS-only. CloudFormation handles state management and AWS takes the blame when things break.

OpenTofu if you want Terraform without the licensing drama. Same stability, no vendor lock-in.

Avoid Pulumi if your on-call team doesn't write code. TypeScript stack traces at 3am are not fun.

Q

My CDK deployment is failing with "Template too large" errors. WTF?

A

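AWS chokes on CloudFormation templates bigger than 1MB (used to be 450KB before they increased it in 2020). Anything over 51,200 bytes can't be passed inline at all; it has to be uploaded to S3 first.

CDK generated something massive and AWS is having none of it.

Immediate fix: CDK already uploads large templates to its bootstrap S3 bucket, so make sure the target environment is bootstrapped and your deploy role can write to that bucket:

```bash
# Creates or updates the staging bucket and roles that cdk deploy uses
cdk bootstrap aws://ACCOUNT-ID/REGION
cdk deploy
```

If that fails:

  1. Split your stack into multiple smaller stacks
  2. Use cdk synth to see the generated CloudFormation
  3. Look for repeated inline policies or large data sections

Long-term fix: Redesign your constructs to be smaller and more focused.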
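To see which limit you're up against, measure the synthesized template. CloudFormation accepts a template passed inline up to 51,200 bytes and one uploaded through S3 up to 1 MB. This sketch takes 1 MB as 1,048,576 bytes, which is an assumption to check against the current CloudFormation quotas page. It counts UTF-8 bytes rather than string length, because bytes are what get uploaded:

```typescript
const INLINE_LIMIT_BYTES = 51_200; // template body passed directly in the API call
const S3_LIMIT_BYTES = 1_048_576; // template stored in S3 (1 MB; exact byte value assumed)

type SizeVerdict = "inline-ok" | "needs-s3" | "too-large";

// Classify a template by the size of the text you would actually upload.
function templateSizeVerdict(templateText: string): { bytes: number; verdict: SizeVerdict } {
  const bytes = new TextEncoder().encode(templateText).length; // UTF-8 byte count
  if (bytes <= INLINE_LIMIT_BYTES) return { bytes, verdict: "inline-ok" };
  if (bytes <= S3_LIMIT_BYTES) return { bytes, verdict: "needs-s3" };
  return { bytes, verdict: "too-large" };
}

// An 850 KB template, like the one in the war story above
console.log(templateSizeVerdict("x".repeat(850 * 1024)).verdict); // needs-s3
```

A `needs-s3` verdict means the size is fine but the deploy now depends on S3 permissions. `too-large` means you have to split the stack; no amount of IAM tweaking will fix that one.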

Q

Can I migrate from Terraform to OpenTofu without breaking everything?

A

Yes, it's a drop-in replacement.

Migration steps:

  1. brew install opentofu (or grab the binary from GitHub releases)
  2. Find-replace terraform with tofu in all your scripts/CI configs (the CLI calls, not the terraform {} blocks in your .tf files)
  3. tofu init (your existing backend and state carry over as-is; -migrate-state only matters if you're also switching backends, and it asks nicely before copying anything)
  4. tofu plan to make sure nothing's fucked up

Gotcha: Don't forget your CI/CD pipeline, GitHub Actions, Docker images, etc. They all need tofu now.

Time estimate: Half day for the migration if you're organized, 2 weeks to find all the places you forgot to update.
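A blind `s/terraform/tofu/g` also hits things that have to keep their names: the `terraform {}` settings block in your `.tf` files (OpenTofu still reads it), the `.terraform/` directory, `.terraform.lock.hcl`, and `terraform.tfvars`. The safer sketch below only rewrites `terraform` when it's used as a command followed by a subcommand. The subcommand list is an assumption, so extend it to cover whatever your scripts call:

```typescript
// Subcommands we expect in CI scripts (an assumption; extend for yours).
const TF_SUBCOMMANDS = [
  "init", "validate", "fmt", "plan", "apply", "destroy", "output",
  "import", "state", "workspace", "show", "providers", "force-unlock", "test", "version",
];

// Matches the word "terraform" only when it is followed, on the same line, by
// optional global flags (like -chdir=infra) and then a known subcommand. The
// lookahead means only the word itself gets replaced.
const CLI_CALL = new RegExp(
  "\\bterraform(?=(?:[ \\t]+-\\S+)*[ \\t]+(?:" + TF_SUBCOMMANDS.join("|") + ")\\b)",
  "g",
);

function rewriteCliCalls(script: string): string {
  return script.replace(CLI_CALL, "tofu");
}

// Hypothetical CI script
const ciScript = [
  "rm -rf .terraform && terraform init -input=false", // .terraform/ keeps its name
  "terraform -chdir=infra plan -out=tfplan",
  "terraform apply tfplan",
  "cp secrets/prod.tfvars terraform.tfvars", // file names are left alone
].join("\n");

console.log(rewriteCliCalls(ciScript));
// rm -rf .terraform && tofu init -input=false
// tofu -chdir=infra plan -out=tfplan
// tofu apply tfplan
// cp secrets/prod.tfvars terraform.tfvars
```

Setup steps like the `hashicorp/setup-terraform` GitHub Action won't match and need swapping by hand. OpenTofu publishes `opentofu/setup-opentofu` for that.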

Q

Why does Pulumi cost so much compared to the others?

A

Because you're paying for the managed state service, plus compute if you run deployments on Pulumi's infrastructure.

Free tier reality: 2000 resources sounds generous until you deploy one medium K8s cluster and blow through it in a day. Every Kubernetes object Pulumi manages counts: deployments, services, ingresses, configmaps.

Cost breakdown (reality check):

  • Pulumi Team: $50/month base + resource overages
  • Pulumi Business: $100/month + compute costs for deployments
  • Terraform Cloud: $20-70/user/month (20 users = $400-1400/month)
  • OpenTofu: $0 but you manage everything

Hidden cost: If you use Pulumi Deployments, runs execute in Pulumi's cloud and are billed by deployment minute, so complex deployments cost more compute time. Running pulumi up from your own CI avoids that part.

Q

My terraform apply is stuck. How do I force it to continue?

A

Don't force it. Terraform is probably waiting on a resource that's taking forever to create/update.

Safe debugging:

  1. Check AWS Console to see what's actually happening
  2. terraform show to see the current state
  3. Wait it out if AWS is just being slow (ELB creation takes 5+ minutes)

Nuclear options (dangerous):

  • terraform apply -lock=false to bypass locking
  • terraform force-unlock LOCK_ID if you're sure no other process is running
  • terraform apply -replace=resource.name to force recreation (the older terraform taint resource.name followed by terraform apply still works but is deprecated)

Better solution: Set realistic timeouts in your resource configurations, using the timeouts block on resources that support it.
Q

Which tool has the least vendor lock-in?

A

  • **OpenTofu**: Linux Foundation governance, truly open source, no corporate owner.
  • **Pulumi**: Open source with commercial service, but you can self-host the backend.
  • **Terraform**: Open core but controlled by HashiCorp, BSL license limits some uses.
  • **AWS CDK**: Completely locked into AWS, generates CloudFormation templates you can't easily port.

Reality check: All IaC tools create some lock-in through your configuration code. The bigger risk is operational knowledge lock-in with your team.

Q

Should I use workspaces or separate state files?

A

Separate state files. Workspaces are confusing and error-prone.

Why workspaces suck:

  • Easy to accidentally deploy to the wrong workspace
  • State corruption affects all environments
  • Hard to give different teams access to different environments

Better pattern:

    environments/
    ├── dev/
    ├── staging/
    └── prod/

Each directory has its own state file and configuration. More code, but much harder to accidentally destroy prod.
Q

My team wants to switch from Terraform to Pulumi. Should we?

A

**Don't fucking switch unless you have a real problem that's costing you sleep.**

Good reasons to switch:

  • Your infrastructure logic is too complex for HCL
  • Your team is primarily developers who want real programming languages
  • You need better testing and CI/CD integration

Bad reasons to switch:

  • "TypeScript is more modern" (HCL works fine)
  • "The new developer prefers Pulumi" (train the developer)
  • "It looks cooler in demos" (you'll regret this)

Reality: Migration will take 3x longer than estimated and cost more than you think.

Q

How do I debug CloudFormation errors from CDK?

A

Step 1: cdk synth to see the generated CloudFormation template

Step 2: Check the CloudFormation console for the actual error (CDK output is often useless)

Step 3: Look for the most common issues:

  • IAM permissions missing
  • Resource limits exceeded (500 resources per stack)
  • Circular dependencies between resources
  • Names that are too long (63 character limit for many AWS resources)

Step 4: Add more granular error handling in your CDK code

Pro tip: Check CloudTrail event history (on by default for management events) to see exactly which AWS API calls are failing.
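When CloudFormation reports a circular dependency, the cycle lives in the synthesized template, so you can hunt for it there. The sketch below collects edges from `Ref`, `Fn::GetAtt`, and `DependsOn` between resources in a parsed template, then runs a depth-first search to return one cycle if there is one. It deliberately ignores implicit references inside `Fn::Sub` strings and cross-stack `Fn::ImportValue`, so a clean result isn't proof there's no cycle. The example template is hypothetical:

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface CfnResourceDef {
  Type: string;
  DependsOn?: string | string[];
  Properties?: Json;
}
interface CfnTemplateDef {
  Resources: Record<string, CfnResourceDef>;
}

// Collect logical IDs referenced through Ref or Fn::GetAtt anywhere in a value.
function collectRefs(value: Json, out: string[]): void {
  if (Array.isArray(value)) {
    for (const item of value) collectRefs(item, out);
    return;
  }
  if (value === null || typeof value !== "object") return;
  for (const key of Object.keys(value)) {
    const v = value[key];
    if (key === "Ref" && typeof v === "string") {
      out.push(v);
    } else if (key === "Fn::GetAtt") {
      // Either ["LogicalId", "Attribute"] or "LogicalId.Attribute"
      const target = Array.isArray(v) ? v[0] : v;
      if (typeof target === "string") out.push(target.split(".")[0]);
    } else {
      collectRefs(v, out);
    }
  }
}

// resource -> resources it depends on. Refs to parameters or pseudo
// parameters (like AWS::Region) are dropped because they aren't resources.
function dependencyGraph(template: CfnTemplateDef): Record<string, string[]> {
  const ids = Object.keys(template.Resources);
  const isResource: Record<string, boolean> = Object.create(null);
  for (const id of ids) isResource[id] = true;
  const graph: Record<string, string[]> = Object.create(null);
  for (const id of ids) {
    const res = template.Resources[id];
    const refs: string[] = [];
    if (res.Properties !== undefined) collectRefs(res.Properties, refs);
    const dependsOn: string[] =
      res.DependsOn === undefined ? [] : typeof res.DependsOn === "string" ? [res.DependsOn] : res.DependsOn;
    graph[id] = refs.concat(dependsOn).filter((dep) => isResource[dep] === true);
  }
  return graph;
}

// Depth-first search: reaching a node that is still on the current path closes a cycle.
function findCycle(graph: Record<string, string[]>): string[] | null {
  const state: Record<string, "visiting" | "done"> = Object.create(null);
  const path: string[] = [];
  const visit = (node: string): string[] | null => {
    state[node] = "visiting";
    path.push(node);
    for (const next of graph[node]) {
      if (state[next] === "visiting") return path.slice(path.indexOf(next)).concat(next);
      if (state[next] === undefined) {
        const found = visit(next);
        if (found) return found;
      }
    }
    path.pop();
    state[node] = "done";
    return null;
  };
  for (const node of Object.keys(graph)) {
    if (state[node] === undefined) {
      const found = visit(node);
      if (found) return found;
    }
  }
  return null;
}

// The bucket notifies the function, and the function's environment references the bucket.
const exampleTemplate: CfnTemplateDef = {
  Resources: {
    Bucket: {
      Type: "AWS::S3::Bucket",
      Properties: { NotificationConfiguration: { LambdaConfigurations: [{ Event: "s3:ObjectCreated:*", Function: { "Fn::GetAtt": ["Handler", "Arn"] } }] } },
    },
    Handler: {
      Type: "AWS::Lambda::Function",
      Properties: { Role: { "Fn::GetAtt": ["HandlerRole", "Arn"] }, Environment: { Variables: { BUCKET: { Ref: "Bucket" } } } },
    },
    HandlerRole: { Type: "AWS::IAM::Role" },
  },
};
console.log(findCycle(dependencyGraph(exampleTemplate))); // [ 'Bucket', 'Handler', 'Bucket' ]
```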
