Exit code 1 is the computer equivalent of rage-quitting. Your app starts, something pisses it off immediately, and it just gives up. Unlike the more helpful exit codes (137 means the kernel killed it, usually for running out of memory; 143 means it got a SIGTERM and shut down gracefully), exit code 1 just means "something went wrong and I'm done."
I've debugged this nightmare more times than I can count. Your container works perfectly on Docker Desktop, passes all your tests, deploys without errors - then immediately dies with exit code 1 and goes into that maddening CrashLoopBackOff cycle.
The worst part? The error message is usually useless: "Error: Error occurred" or just nothing at all.
The Three Things That Actually Cause Exit Code 1
After years of debugging this crap, I can tell you that 95% of exit code 1 crashes come down to three things:
1. Missing environment variable - Your app expects DATABASE_URL but Kubernetes doesn't have it (sketched just after this list)
2. Can't connect to database/service - Postgres isn't ready yet, or the service name is wrong
3. File permissions are fucked - Your app can't read config files or write logs
Everything else is just variations of these three problems.
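For the first one, how you wire the variable in matters. Referencing the key explicitly, instead of dumping a whole ConfigMap into the environment, means a missing key fails loudly at container creation instead of letting the app start half-configured and die with exit code 1. A sketch - the ConfigMap name, the key, and the image are all hypothetical:
# Sketch only - names are illustrative, adjust to your own manifests.
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: my-api:1.0                   # hypothetical app image
      env:
        - name: DATABASE_URL
          valueFrom:
            configMapKeyRef:
              name: app-config            # ConfigMap that should hold the value
              key: DATABASE_URL           # if this key is missing, kubectl describe
                                          # shows CreateContainerConfigError instead
                                          # of a vague exit code 1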
War Stories: When Exit Code 1 Ruined My Day
The Missing Environment Variable That Took Down Production
Last month, I deployed a Node.js API to production. Worked perfectly in staging. Five minutes after deployment, CrashLoopBackOff with exit code 1.
The logs showed: Error: JWT_SECRET environment variable is required
Turns out the staging ConfigMap had JWT_SECRET but production had JWT_TOKEN. Same fucking value, different key name. App couldn't start without it.
Took me 2 hours to figure this out because I kept assuming the ConfigMap was identical between environments.
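Here's the shape of the problem in manifest form. This is a reconstruction, not the real config, and the envFrom wiring is my guess at how the mismatch slipped through - with envFrom, nothing validates that the keys your app expects actually exist:
# Sketch - names and values are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: my-api:1.0               # hypothetical app image
          envFrom:
            - configMapRef:
                name: api-config          # the app gets whatever keys this defines
---
# Staging's ConfigMap defined JWT_SECRET. Production's defined this instead,
# so the app started with no JWT_SECRET at all and died with exit code 1.
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-config
data:
  JWT_TOKEN: "same-value-different-key"
Now I diff the ConfigMaps between environments before I trust them: kubectl get configmap api-config -o yaml in staging and in production, side by side.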
The Database That "Was Ready" But Actually Wasn't
Init container checked that Postgres was accepting connections. Main app container started immediately after and died with "connection refused".
Postgres was accepting connections but wasn't actually ready to handle queries yet. Init container passed, main app failed. Spent 3 hours debugging this before adding a 10-second delay after the init container.
Learned the hard way: pg_isready doesn't mean "ready for your application queries."
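If I hit this again, I'd reach for an init container that waits for a real query to succeed instead of just a TCP handshake - less fragile than the sleep I bolted on. This is a sketch, not the actual manifest: the database-service Service, the db-credentials Secret, the app/appdb user and database, and the postgres:15 image (used only for its psql client) are all assumptions.
# Sketch: block the app until Postgres answers a trivial query,
# not just until the port accepts connections.
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  initContainers:
    - name: wait-for-postgres
      image: postgres:15                  # only needed for the psql client
      env:
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials        # hypothetical Secret
              key: password
      command:
        - sh
        - -c
        - |
          until psql -h database-service -U app -d appdb -c 'SELECT 1' >/dev/null 2>&1; do
            echo "Postgres is up but not ready for queries yet, retrying..."
            sleep 2
          done
  containers:
    - name: api
      image: my-api:1.0                   # hypothetical app image
pg_isready only tells you Postgres will talk to you; running SELECT 1 tells you it will actually answer.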
The File Permissions Nightmare
Deployed with readOnlyRootFilesystem: true for security. App tried to create a temp file and crashed with "Permission denied". The root filesystem was read-only, but the app needed somewhere to write temp files and logs.
Had to mount an emptyDir volume at /tmp to give the app somewhere to write. Should have been obvious but it wasn't. Cost me a whole afternoon.
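For anyone hitting the same wall, the fix is to keep readOnlyRootFilesystem: true and mount an emptyDir over the paths the app actually writes to. A minimal sketch (the image name is made up):
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: my-api:1.0                   # hypothetical app image
      securityContext:
        readOnlyRootFilesystem: true      # keep the security win
      volumeMounts:
        - name: tmp
          mountPath: /tmp                 # writable scratch space for temp files and logs
  volumes:
    - name: tmp
      emptyDir: {}                        # lives and dies with the pod
Same trick works for /var/log, a cache directory, or wherever else the app insists on writing.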
The Commands That Actually Work
When your app is stuck in CrashLoopBackOff with exit code 1, start with these:
# Get the actual error message (if there is one)
kubectl logs <pod-name> --previous
# Check if environment variables are missing
kubectl exec <pod-name> -- env | grep -E "(DATABASE|API|SECRET)"
# Test connectivity to services your app needs
kubectl exec <pod-name> -- nc -zv database-service 5432
# Check file permissions if your app writes files
kubectl exec <pod-name> -- ls -la /app/
kubectl exec <pod-name> -- id
90% of the time, one of these commands shows you exactly what's wrong. The other 10% of the time, you're fucked and need to add debug logging to your app.
Version-Specific Gotchas That Will Bite You
Node.js 18.x changed environment variable behavior - If you're using dotenv, it now throws errors for missing variables that it used to ignore. Your app might work locally with Node 16 but crash with exit code 1 on Node 18.
Kubernetes 1.24 removed the Docker runtime (dockershim) - If your cluster moved from Docker to containerd, some volume mount permissions can behave differently than they did under Docker.
Python 3.11 import changes - Some packages that worked in Python 3.9 now fail to import in 3.11, causing immediate exit code 1 crashes.
Don't ask me how I know all of this.
The Debug Process Flow That Actually Works
- Look at the logs first - kubectl logs <pod-name> --previous
- Check environment variables - kubectl exec <pod-name> -- env
- Test connectivity - kubectl exec <pod-name> -- nc -zv service-name port
- Check file permissions - kubectl exec <pod-name> -- ls -la /app/
- Nuclear option - Override the command to keep the container alive for debugging (sketched below)
This process catches 95% of exit code 1 issues in under 10 minutes.
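The nuclear option from step 5 looks like this: run the same image with the command swapped for a long sleep, then exec in and start the app by hand to see what it actually prints. A sketch - the image name is whatever's crashing for you, and it assumes the image ships a sleep binary (distroless images won't):
# Debug pod: same image, entrypoint overridden so it stays alive.
apiVersion: v1
kind: Pod
metadata:
  name: api-debug
spec:
  containers:
    - name: api
      image: my-api:1.0                   # the image that keeps crashing
      # copy the real pod's env, envFrom, and volumeMounts here so the
      # failure actually reproduces
      command: ["sleep", "3600"]          # an hour of poking around; bump as needed
Then kubectl exec -it api-debug -- sh, run the app's start command manually, and whatever it spits out is the error the logs weren't giving you.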
Now that you understand what causes exit code 1 crashes and how to diagnose them, let's move on to the actual fixes that work in production.
Essential debugging links:
- Kubernetes Pod troubleshooting - Actually useful, unlike most K8s docs
- ConfigMap debugging guide - For when your env vars are fucked
- Application debugging - The nuclear option debugging guide
- Troubleshooting clusters - When everything is broken
- Container runtime debugging - For the really weird edge cases
- Pod lifecycle docs - Understanding why your pod keeps restarting
- Troubleshooting DNS - When service names don't resolve
- Secret troubleshooting - When your secrets aren't actually secret or available
- Volume mount issues - For permission problems
- Resource limits debugging - When your limits are too restrictive
- Exit code meanings - Official exit code documentation
- Container lifecycle hooks - PreStop and PostStart hook failures
- Init container troubleshooting - When dependency checks fail
- Security context issues - User and permission problems
- Environment variable injection - All the ways env vars can break