## Your 3AM Debugging Toolkit: Commands That Actually Work
When your Helm deployment is broken and you're troubleshooting in production, you need tools that work fast and reliably. Here's your debugging arsenal based on real production experience. The official Helm debugging guide is helpful but lacks the nuclear options you need at 3am.

### The Debugging Flow That Works

**Step 1: Generate the YAML First**

```bash
helm template myapp ./my-chart --debug > rendered.yaml
```

This is your first line of defense. The `--debug` flag shows you exactly what Helm is generating, including all the variable substitutions. If this command fails, you have a template syntax error. The template function reference helps decode complex template failures.
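When a large chart fails to render, narrowing the output to one template at a time saves a lot of scrolling. A small sketch using standard `helm template` flags (`templates/deployment.yaml` is a placeholder path - use your chart's actual layout):

```bash
# Render only one template to isolate the syntax error
helm template myapp ./my-chart --show-only templates/deployment.yaml --debug

# Override a suspect value on the fly to see how it changes the output
helm template myapp ./my-chart --set replicaCount=3 --debug | less
```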
**Step 2: Validate Against Kubernetes**

```bash
helm install myapp ./my-chart --dry-run --debug
```

This connects to your Kubernetes API server and validates the resources without actually creating them. Different from `helm template` because it checks API versions and resource schemas. The Kubernetes API reference shows which fields are valid for each resource type.
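If you'd rather keep the rendered YAML from Step 1 around, you can get similar server-side validation by feeding it to kubectl's server dry run (a standard flag on `kubectl apply`, kubectl 1.18+):

```bash
# Server-side dry run: the API server checks schemas and admission rules,
# but nothing is persisted
kubectl apply --dry-run=server -f rendered.yaml
```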
**Step 3: Check What Actually Got Created**

```bash
helm status myapp --show-resources
kubectl describe all -l app.kubernetes.io/instance=myapp
```

The first command shows Helm's view of the deployment. The second shows what Kubernetes actually created, including error messages. Use the kubectl describe reference to understand the full command syntax options.
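If those two views disagree, it can help to diff what Helm recorded against what's actually live. A quick sketch using standard commands (note that `kubectl diff` exits 1 when differences exist, so don't panic):

```bash
# What Helm stored as the release manifest
helm get manifest myapp > helm-manifest.yaml

# Compare that against live cluster state
kubectl diff -f helm-manifest.yaml
```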
### Template Debugging Hell (And How to Escape It)

The Go templating language in Helm is unnecessarily complex. The Go template documentation explains the syntax, but the Sprig function library adds most of the useful functions. Here's how to debug template issues without losing your sanity:
**Problem: Variables Not Expanding**

```yaml
# This breaks silently
replicas: {{ .Values.replicaCount }}
```

Use the `required` function to catch missing values:

```yaml
# This fails loudly when missing
replicas: {{ required "replicaCount is required" .Values.replicaCount }}
```

Side note: I cannot fucking stress this enough - always use `required` for critical values. I've seen production deployments succeed with 0 replicas because someone forgot to set the value. Your app just... disappears. Silent failures in Helm are the worst kind of failures.
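One caveat, and a sketch that goes beyond the example above: `required` only rejects missing or empty values, so an explicit `replicaCount: 0` still sails through. If zero replicas would be a bug for you, Helm's `fail` function (from Sprig) can enforce it:

```yaml
# Sketch: reject a replicaCount that is missing OR explicitly zero
{{- $replicas := required "replicaCount is required" .Values.replicaCount -}}
{{- if le (int $replicas) 0 }}
{{- fail "replicaCount must be greater than zero" }}
{{- end }}
replicas: {{ $replicas }}
```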
**Problem: YAML Indentation Errors**

Helm templates are whitespace-sensitive. Two common patterns that break:

```yaml
# WRONG - conditional indentation breaks YAML
env:
{{- if .Values.env }}
- name: FOO
  value: bar
{{- end }}

# RIGHT - use proper indentation control
env:
  {{- if .Values.env }}
  - name: FOO
    value: bar
  {{- end }}
```
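Another way to sidestep indentation bugs entirely (not in the original example, but `toYaml` and `nindent` are standard Helm template functions) is to let Helm compute the indentation from a list in values.yaml:

```yaml
# values.yaml (assumed layout):
# env:
#   - name: FOO
#     value: bar

# template: render the whole list and let nindent handle the spaces
env:
  {{- toYaml .Values.env | nindent 2 }}
```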
**Problem: Template Functions Failing**

Use the `default` function to provide fallbacks:

```yaml
# WRONG - breaks if image.tag is empty
image: {{ .Values.image.repository }}:{{ .Values.image.tag }}

# RIGHT - provides fallback
image: {{ .Values.image.repository }}:{{ .Values.image.tag | default "latest" }}
```
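A common variation on the same idea, assuming your Chart.yaml sets `appVersion`: fall back to the chart's appVersion instead of a hard-coded `latest`, so the default tag tracks the chart release:

```yaml
# Fall back to appVersion from Chart.yaml rather than "latest"
image: {{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}
```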
### Production Debugging Strategies

**Check Release History**

```bash
helm history myapp
```

This shows all deployment revisions. Each upgrade creates a new revision number that you can roll back to. The [Helm release management docs](https://helm.sh/docs/intro/using_helm/#helpful-options-for-installupgraderollback) explain how revisions are tracked.
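When you're picking which revision to roll back to, it helps to see what each revision was actually deployed with. `helm get` accepts a `--revision` flag (revision 2 here is just an example number):

```bash
# The user-supplied values for a specific revision
helm get values myapp --revision 2

# The fully rendered manifest for that revision, ready to diff against the current one
helm get manifest myapp --revision 2
```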
**Fast Rollback (Nuclear Option)**

```bash
helm rollback myapp 1
```

Rolls back to revision 1 immediately. This usually works when everything else fails. The rollback typically takes 10-30 seconds.
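If you want the command to block until the rolled-back pods are actually healthy, `--wait` and `--timeout` are standard flags on `helm rollback` (the 5m value is just an example):

```bash
# Block until the rolled-back resources are ready, give up after 5 minutes
helm rollback myapp 1 --wait --timeout 5m
```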
**Debug Resource Creation Issues**

```bash
# Check what resources Helm created
kubectl get all,configmap,secret -l app.kubernetes.io/managed-by=Helm
# Check for resource conflicts
kubectl get events --sort-by=.metadata.creationTimestamp
```
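One specific conflict worth checking: a resource that already exists but belongs to a different release will block your install. Helm 3 records ownership in `meta.helm.sh/*` annotations; a quick sketch (`myapp-config` is a hypothetical name - use whatever resource the error message complains about):

```bash
# Which release does Helm think owns this resource?
# myapp-config is a placeholder - substitute the resource named in the conflict error
kubectl describe configmap myapp-config | grep -i -A 3 "annotations"
```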
**Handle Stuck Upgrades**

When you get "another operation is in progress", someone else's upgrade got stuck:
```bash
# List all Helm releases and their status
helm list -A
# Check what's actually happening
kubectl get secrets -A | grep "sh.helm.release"
# Nuclear option: delete the stuck release lock
kubectl delete secret sh.helm.release.v1.myapp.v3 -n default
```
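Before deleting anything, it's worth confirming which release secret is actually stuck. This sketch assumes the default Secret storage backend and relies on the labels Helm 3 puts on its release secrets (owner, name, status, version):

```bash
# Find the release's secrets and their recorded status
# (pending-install / pending-upgrade / pending-rollback means a stuck lock)
kubectl get secrets -n default -l owner=helm,name=myapp \
  -o custom-columns=NAME:.metadata.name,STATUS:.metadata.labels.status
```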
### Real Production War Stories

**Case 1: Image Pull Errors**

Symptom: Pods stuck in ImagePullBackOff
Reality: Registry authentication failed or wrong image tag
Debug: `kubectl describe pod` shows the exact error

Wait, actually, let me back up and explain why this one's so common. At my current company, we push to a private ECR registry but developers constantly forget to update their local kubectl contexts with the right AWS credentials. So their charts deploy fine, but nothing starts because the nodes can't pull the images. The container image documentation covers all image-related issues, but the real fix is usually just running `aws eks update-kubeconfig` again.
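If you want the error text without hunting through dashboards, a couple of standard kubectl queries (the label selector matches the examples above) usually surface it:

```bash
# Pull the exact pull error out of the pod events
kubectl describe pod -l app.kubernetes.io/instance=myapp | grep -A 10 "Events:"

# Double-check which image tag the deployment is actually requesting
kubectl get deploy -l app.kubernetes.io/instance=myapp \
  -o jsonpath='{.items[*].spec.template.spec.containers[*].image}'
```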
**Case 2: Resource Quota Exceeded**

Symptom: Deployment succeeds but no pods start
Reality: Not enough CPU/memory in cluster
Debug: `kubectl describe pod` shows FailedScheduling events. Check resource management for proper limits.
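Two quick checks that usually pin this down (standard kubectl; nothing here is specific to Helm):

```bash
# The FailedScheduling event names the exact resource the scheduler is short on
kubectl get events --field-selector reason=FailedScheduling

# See how much CPU/memory each node has already committed
kubectl describe nodes | grep -A 5 "Allocated resources"
```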
**Case 3: Service Account Issues**

Symptom: Pods crash with permission errors
Reality: ServiceAccount doesn't exist or lacks RBAC permissions
Debug: Check pod logs with `kubectl logs`. The RBAC documentation explains permission troubleshooting.
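A couple of standard checks help confirm it really is RBAC (the service account name `myapp` and the `list pods` permission are placeholders - adjust to what your app actually needs):

```bash
# Does the ServiceAccount the pod references actually exist?
kubectl get serviceaccount -n default

# Ask the API server whether that account can do what the app is trying to do
kubectl auth can-i list pods --as=system:serviceaccount:default:myapp
```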
### The Commands You Copy-Paste at 3AM

```bash
# See what Helm thinks it deployed
helm get all myapp
# See what Kubernetes actually has
kubectl get all -l app.kubernetes.io/instance=myapp
# Check for obvious failures
kubectl get events --field-selector type=Warning
# Get pod logs when things are broken
kubectl logs -l app.kubernetes.io/instance=myapp --tail=50
# Force-delete everything and start over
helm uninstall myapp
kubectl delete all -l app.kubernetes.io/instance=myapp
```
The last set of commands is your "fuck it, start over" option when debugging takes longer than rebuilding.
Look, I know this sounds extreme, but I've learned that sometimes the nuclear option saves you hours. I once spent 3 hours debugging why a PVC wouldn't mount, trying every kubectl command in the book. Finally said screw it, deleted the entire release, and redeployed. Took 2 minutes and worked perfectly. Sometimes the Kubernetes troubleshooting guide has solutions, but usually you just need to delete everything and start fresh.
The kubectl cheat sheet has more emergency commands for desperate times.

### Survival Strategy
Helm debugging gets easier with experience, but it never stops being frustrating.
The templating language is needlessly complex, error messages are cryptic, and dependencies will break at the worst possible time. Your survival strategy: learn the core debugging commands, pin your dependencies, use `helm template` religiously, and keep rollback as your nuclear option. Most importantly, test everything in staging first - production is not the place to discover that your chart templating breaks with the latest Kubernetes API version.