Everyone knows terraform init
, plan
, and apply
. But when production breaks and you're staring at a state file corruption error at 2AM, the basic commands won't save your ass. You need the CLI commands they don't teach in tutorials.
State corruption has fucked me over more times than I care to count. Had it take down staging once because someone ran the wrong command. Another time got stuck in a deployment because DynamoDB decided to be DynamoDB. Most recent was my own damn fault - botched an import and spent way too much time fixing it manually.
Here's the CLI arsenal that kept me employed after those disasters.
What Changed That Actually Matters
Terraform 1.14 alpha finally fixed some shit that's been broken since the dawn of time. Testing doesn't randomly panic every other Tuesday, and containers won't murder your CI anymore.
Testing Framework: Went from complete garbage to merely frustrating. Used to fail randomly and take 20 minutes to teardown even simple tests. The parallel cleanup actually works now instead of hanging forever.
Container Performance: Terraform 1.14 alpha detects container resource limits automatically. I used to manually set -parallelism=2
on our GitHub Actions runners because the default would max out CPU and timeout everything. Now it figures this out without me babysitting it.
Import Improvements: Workspace variables and inherited variable sets actually work during imports now. Before this, imports would fail in weird ways if you used Terraform Cloud workspaces.
Error Messages: Type checking errors are slightly less useless now. Still not great, but at least they tell you what went wrong instead of just "Error: Error".
Essential CLI Commands Beyond the Basics
terraform console
- The Underdog Command
Nobody talks about terraform console
but it's saved my ass more times than I can count. When you're debugging some horrific HCL expression and the error message is useless as shit:
$ terraform console
> length(var.availability_zones)
3
> [for zone in var.availability_zones : "${var.region}${zone}"]
["us-west-2a", "us-west-2b", "us-west-2c"]
> exit
Gotcha: If your config has syntax errors, console won't start. You'll get some useless error message and wonder why. Comment out the broken resources first, then test your expressions.
State Surgery (When Everything Goes to Hell)
State corruption happens to everyone. Usually at the worst possible time. Here's how to fix it without making things worse:
## List all tracked resources
terraform state list
## Inspect specific resource state
terraform state show aws_instance.web
## Move resources between state addresses
terraform state mv aws_instance.old aws_instance.new
## Remove resources from state (without destroying)
terraform state rm aws_instance.legacy
Import Hell (Legacy Infrastructure)
Importing existing infrastructure into Terraform is like trying to reverse-engineer someone else's spaghetti code. Here's how to do it without losing your mind:
## Basic import - write the resource config first, THEN import
terraform import aws_instance.web i-1234567890abcdef0
## Import with variables (this used to be broken)
terraform import -var="environment=prod" aws_rds_cluster.main cluster-id
Write the Terraform config first, then import. I wasted 3 hours trying to figure out why my imported resource kept showing drift before I realized I had the resource definition wrong.
For bulk imports, use Terraformer. It's not perfect but beats importing 200 resources manually.
Debugging and Performance Optimization
Debugging When You're Getting Paged at 3AM
Nothing ruins a good night's sleep like getting paged because Terraform broke production. Here's how to debug without losing what's left of your sanity:
## Enable debug logging (prepare for 50MB of logs)
export TF_LOG=DEBUG
export TF_LOG_PATH=terraform.log
terraform apply
## Provider-specific logging (when AWS decides to be special)
export TF_LOG_PROVIDER=DEBUG
export TF_LOG_CORE=ERROR
Debug logs are huge and 99% garbage. But when AWS is returning cryptic 500 errors at 3AM and you're trying not to get fired, they're the only thing that'll show you what's actually happening.
Performance Optimization Techniques
Parallelism Control:
## Reduce parallelism when providers are slow/throttling
terraform apply -parallelism=5
## Increase for small deployments with independent resources
terraform apply -parallelism=20
Skip Refresh When You Know State is Good:
## Skip refresh to save time
terraform apply -refresh=false
## Refresh only specific resources
terraform apply -refresh-only -target=aws_instance.web
Container Performance Issues
Terraform 1.14 alpha finally figured out that containers exist. Used to be if you didn't set -parallelism=2
manually, it would spawn like 20 threads on a 2-CPU GitHub Actions runner and just hang there eating CPU until the job timed out. Took them years to fix this obvious shit.
Testing Framework (Finally Doesn't Suck)
The testing framework used to be complete garbage - tests would randomly fail and take 20 minutes to teardown. Recent versions fixed the worst issues:
File-Level Variable Management
## test/main.tftest.hcl
variables {
environment = "test"
region = "us-west-1"
}
run "validate_vpc" {
command = plan
assert {
condition = aws_vpc.main.cidr_block == "10.0.0.0/16"
error_message = "VPC CIDR must be 10.0.0.0/16 for test environment"
}
}
Parallel Teardown (Finally)
Tests don't take forever to clean up anymore:
## Run tests - teardown is now parallel
terraform test -verbose
Tests still randomly fail sometimes (looking at you, AWS provider), but at least they don't take 20 minutes to tell you they failed.
These are the commands that separate the folks who know what they're doing from the people who just blindly copy-paste Stack Overflow answers. Learn them well, because your infrastructure will break, and when it does, you'll need more than basic terraform apply
to save the day.