Why is my container "running" but "unhealthy" at the same time?

Yeah, it's confusing as hell. "Running" describes the container's main process: it started and hasn't exited. "Unhealthy" describes your `HEALTHCHECK` command: it has failed `--retries` times in a row (3 by default). So the process is alive, but Docker can't verify your app actually works. Use `docker inspect --format "{{json .State.Health }}" container_name | jq` to see what's actually failing.
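
To see both states side by side, here's a small sketch. `explain_health` and `container_health` are hypothetical helper names. The `docker inspect` template fields (`.State.Status`, `.State.Health.Status`) are real, and `.State.Health` is absent when the image has no `HEALTHCHECK`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a (process state, health status) pair into a one-line diagnosis.
# Returns 1 for states that need attention.
explain_health() {
  local state="$1" health="$2"
  case "$state/$health" in
    running/healthy)   echo "ok: process up and health check passing" ;;
    running/starting)  echo "starting: no passing check yet" ;;
    running/unhealthy) echo "degraded: process up but health check failing"; return 1 ;;
    running/none)      echo "unknown: no HEALTHCHECK configured" ;;
    *)                 echo "down: container state is $state"; return 1 ;;
  esac
}

# Read both fields for one container; the health part is "none" when no HEALTHCHECK exists.
container_health() {
  local fmt='{{.State.Status}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}'
  local state health
  read -r state health < <(docker inspect --format "$fmt" "$1") || return 2
  explain_health "$state" "$health"
}
```

Usage: `container_health my-container` prints something like `degraded: process up but health check failing` and returns 1.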

How do I see what Docker's health check is actually doing?

Run this: `docker inspect --format "{{json .State.Health }}" container_name | jq '.Log[].Output'`. This shows you what the health check command printed on each recent run (Docker keeps only the last five results), which usually tells you why it failed. Most of the time the error message points right to the problem.

My health check works when I run it manually but fails automatically. WTF?

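You've got an environment mismatch. When you run the command by hand, especially from the host, you're working with a different network namespace, environment variables, user, or working directory than the health check does. Docker runs the check inside the container, so replicate that as closely as you can: `docker exec container_name sh -c 'your-health-check-command'`. That matches a shell-form `HEALTHCHECK CMD ...`, which Docker runs via `/bin/sh -c`. For the exec form (`CMD ["curl", ...]`), pass the arguments straight to `docker exec` instead.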
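
If you'd rather not retype the command, the sketch below pulls the configured check out of `docker inspect` and re-runs it with `docker exec`. It mimics how Docker treats `CMD-SHELL` (run through `/bin/sh -c`) versus `CMD` (run as a plain argument list). `run_like_healthcheck` is a hypothetical name, and the sketch assumes a Linux container that has `/bin/sh`. It is still an approximation of Docker's internal probe, not a guaranteed replica.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-run a container's *configured* HEALTHCHECK through docker exec,
# applying Docker's CMD-SHELL (/bin/sh -c) versus CMD (plain argv) semantics.
run_like_healthcheck() {
  local container="$1" kind="" line
  # One array element per line, e.g. "CMD-SHELL" then "curl -f http://localhost/ || exit 1".
  local tmpl='{{if .Config.Healthcheck}}{{join .Config.Healthcheck.Test "\n"}}{{end}}'
  set --   # reuse the positional parameters as the argument list
  while IFS= read -r line; do
    if [ -z "$kind" ]; then kind="$line"; else set -- "$@" "$line"; fi
  done < <(docker inspect --format "$tmpl" "$container")
  case "$kind" in
    CMD-SHELL) docker exec "$container" /bin/sh -c "$1" ;;
    CMD)       docker exec "$container" "$@" ;;
    *)         echo "no usable HEALTHCHECK configured for $container" >&2; return 2 ;;
  esac
}
```

The function returns the check's own exit status, so `run_like_healthcheck web; echo $?` shows exactly what Docker would have seen.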

What do the different exit codes mean?

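Docker itself only distinguishes zero from non-zero: exit code 0 = healthy, and anything else counts as a failed check. By convention, exit 1 means "unhealthy". Docker's docs reserve exit code 2, so don't use it. A missing binary like `curl` typically shows up as exit code 127 ("not found") when the check runs through a shell, or as an "executable file not found" error when it uses the exec form. Timeouts don't hang forever. If a check runs longer than `--timeout`, Docker kills it and records that run as a failure. Recent Docker versions log it with exit code -1 and a "Health check exceeded timeout" message.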
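
Here is a lookup helper for reading the health log. `describe_exit_code` is hypothetical. The 127 case assumes a shell-form check. The timeout case matches the log message that recent Docker Engine versions write, so treat the wording as a hint, not a contract.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a health-check result (exit code + output) to a likely cause.
# Docker itself only distinguishes 0 (healthy) from non-zero (failed probe).
describe_exit_code() {
  local code="$1" output="$2"
  case "$code" in
    0)   echo "healthy" ;;
    1)   echo "unhealthy: the check ran and reported failure" ;;
    2)   echo "reserved: Docker's docs say not to use exit code 2" ;;
    127) echo "command not found: is the binary (e.g. curl) installed in the image?" ;;
    -1)
      case "$output" in
        "Health check exceeded timeout"*) echo "timeout: probe killed after --timeout" ;;
        *) echo "probe could not run: $output" ;;
      esac ;;
    *)   echo "failed with exit code $code (any non-zero counts as a failure)" ;;
  esac
}
```

For curl-based checks, a few codes land in the catch-all branch. `curl -f` exits 22 on an HTTP error status (400 and up), 7 when the connection is refused, and 28 when its own `--max-time` expires.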

How do I stop getting false alarms during startup?

Set a proper `start-period` so failed checks during boot don't count. Failures inside the start period aren't counted toward `--retries`, though a passing check during it ends the grace period early. Most people set this too short. If your app takes 30 seconds to start, set `--start-period=60s` to be safe. Don't trust the default of 0 seconds.
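
The same timing can be set at `docker run` time with the `--health-*` flags, which is handy for experimenting without rebuilding. Below is a sketch that applies the doubling rule of thumb above. `run_with_grace` is a hypothetical helper, and it assumes the image already defines a `HEALTHCHECK` command. Docker should keep the image's command and only override the timings you pass, but confirm with `docker inspect` on your version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: start a container with a start-period of twice the observed boot time
# (the "30s boot, 60s start-period" rule of thumb); other timings mirror this guide's example.
run_with_grace() {
  local image="$1" boot_secs="$2"
  local grace=$(( boot_secs * 2 ))
  docker run -d \
    --health-start-period="${grace}s" \
    --health-interval=30s \
    --health-timeout=10s \
    --health-retries=3 \
    "$image"
}
```

Usage: `run_with_grace myapp 30` starts `myapp` with `--health-start-period=60s`.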

Can I make Docker automatically restart unhealthy containers?

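Docker won't restart containers just because they're unhealthy, and neither will a restart policy. `--restart` (and `restart:` in Docker Compose) only kicks in when the container's main process exits. Recovery is an orchestrator's job: Docker Swarm replaces unhealthy service tasks, while Kubernetes ignores the Dockerfile `HEALTHCHECK` entirely and uses its own liveness probes.

If you want hacky automatic restarts, make your health check kill the main process on failure, `HEALTHCHECK CMD your-health-check || kill -15 1`, and pair it with a restart policy such as `--restart unless-stopped`. This only works if two things hold. First, PID 1 must actually handle SIGTERM, because PID 1 in a container ignores signals it has no handler for. Second, the check must run as a user allowed to signal it. Janky, but it works.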
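
Without an orchestrator, the other option is a host-side watchdog. Below is a minimal sketch of the same idea docker-autoheal implements (not its actual code). `restart_unhealthy` is a hypothetical name, and the sketch relies on `docker ps --filter health=unhealthy`.

```bash
#!/usr/bin/env bash
# Minimal watchdog sketch: restart every container whose health status is "unhealthy".
restart_unhealthy() {
  local id
  for id in $(docker ps --quiet --filter health=unhealthy); do
    echo "restarting unhealthy container $id"
    docker restart "$id" >/dev/null || echo "restart failed for $id" >&2
  done
}

# Run it from cron or a loop on the host, e.g.:
#   while sleep 30; do restart_unhealthy; done
```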

Why do my health checks randomly timeout?

Usually resource problems. If your container is hitting memory or CPU limits, health checks will randomly fail when the system is under pressure. Check `docker stats` to see if you're maxing out resources. Also, garbage collection pauses can cause timeouts in some applications.
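
To check the numbers instead of eyeballing the live `docker stats` view, you can filter a one-shot snapshot. `flag_hot_containers` is hypothetical, and the 85% threshold in the usage comment is arbitrary. Note that when a container has no memory limit, `MemPerc` is measured against the host's total memory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "name cpu% mem%" lines (see the docker stats command below)
# and print containers at or above a memory threshold (default 90%).
flag_hot_containers() {
  local mem_limit="${1:-90}"
  awk -v limit="$mem_limit" '{
    cpu = $2; mem = $3
    sub(/%$/, "", cpu); sub(/%$/, "", mem)
    if (mem + 0 >= limit + 0) printf "%s mem=%s%% cpu=%s%%\n", $1, mem, cpu
  }'
}

# Usage: snapshot every running container once and flag anything near its memory limit.
#   docker stats --no-stream --format '{{.Name}} {{.CPUPerc}} {{.MemPerc}}' | flag_hot_containers 85
```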

How often should I run health checks?

Every 30 seconds is Docker's default and it's fine for most stuff. Don't get fancy unless you have a reason. High-frequency checks (every 5-10 seconds) eat resources. Long intervals (2-5 minutes) mean you won't detect problems quickly. 30 seconds is the sweet spot.
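
The interval also decides how fast you find out. Below is rough arithmetic that assumes Docker's documented scheduling (the next check starts `interval` after the previous one finishes) and ignores the start period. `detection_window` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Rough seconds from "app breaks" to "Docker reports unhealthy", outside the start period.
# Best case: failures are instant and the first failing check starts right away.
# Worst case: every failing check runs until it hits --timeout.
detection_window() {
  local interval="$1" timeout="$2" retries="$3"
  local best=$(( (retries - 1) * interval ))
  local worst=$(( retries * (interval + timeout) ))
  echo "${best}s to ${worst}s"
}
```

With the 30s interval, 10s timeout and 3 retries used in the configuration section below, `detection_window 30 10 3` prints `60s to 120s`. With Docker's defaults (30s interval, 30s timeout, 3 retries), the worst case stretches to 180s.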

What's the difference between timeouts and failures?

Timeouts: Your health check command hangs and Docker kills it after the timeout period. Usually means your app is overloaded or unresponsive. Failures: Your health check completes but returns a non-zero exit code. Usually means your app returned an error or couldn't connect to something.

How do I debug Docker Compose health checks?

Use `docker compose ps` first (or `docker-compose ps` with the legacy v1 CLI) to see which services are unhealthy. Then get details: `docker inspect --format "{{json .State.Health }}" $(docker compose ps -q service_name) | jq`. Compose service names don't match container names, which is annoying, so let `ps -q` look up the container ID for you.
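
A scaled service can have several containers, in which case `$(docker compose ps -q service_name)` expands to several IDs. The sketch below prints one line per container. `compose_health` is a hypothetical name, and the sketch assumes the Compose v2 plugin (swap in `docker-compose` for v1).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "container-name health-status" for every container of one
# Compose service; "none" means the container has no HEALTHCHECK.
compose_health() {
  local service="$1" id
  for id in $(docker compose ps -q "$service"); do
    docker inspect --format '{{.Name}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$id"
  done
}
```

Usage: `compose_health web`. Note that `docker inspect` reports names with a leading slash.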

My health check works sometimes and fails other times. How do I fix this?

Intermittent failures will make you want to quit programming because you can't reproduce the damn things. Common causes:

- Resource constraints (memory/CPU spikes)
- Database connection pool exhaustion
- External dependencies being flaky
- Race conditions during app startup

Increase your retry count to 5+ and monitor what's happening when failures occur.
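
One way to see what's happening when failures occur is to grab a resource snapshot the moment a container flips to unhealthy. `snapshot_on_unhealthy` and `health.log` are hypothetical names. The sketch assumes the `docker events` format fields shown (`.Actor.Attributes.name`, `.Action`), where the action reads `health_status: unhealthy`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "name action" lines from docker events and, on every
# transition to unhealthy, record a timestamped resource snapshot for that container.
snapshot_on_unhealthy() {
  local name action
  while read -r name action; do
    case "$action" in
      *unhealthy)
        echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $name went unhealthy"
        docker stats --no-stream --format '  cpu={{.CPUPerc}} mem={{.MemUsage}}' "$name"
        ;;
    esac
  done
}

# Usage (leave it running until the failure shows up again):
#   docker events --filter event=health_status \
#     --format '{{.Actor.Attributes.name}} {{.Action}}' | snapshot_on_unhealthy | tee health.log
```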

Should my health check test external dependencies?

Only test dependencies that would make your app completely unusable. Don't test every single external API or you'll get false failures when third-party services have hiccups. Focus on critical stuff like your database.
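
Here is a sketch of that split: fail on the database, and only warn about a nice-to-have API. `healthcheck`, `DB_HOST`, `OPTIONAL_API_URL` and the port 9000 default are all hypothetical. In practice you'd put the body in a script inside the image and point `HEALTHCHECK CMD` at it.

```bash
#!/usr/bin/env bash
# Hypothetical health check: fail only on the critical dependency (the database);
# a non-critical API only produces a warning on stderr, so its hiccups don't flip health.
healthcheck() {
  if ! pg_isready -q -h "${DB_HOST:-localhost}"; then
    echo "database unreachable" >&2
    return 1
  fi
  if ! curl -fsS --max-time 3 "${OPTIONAL_API_URL:-http://localhost:9000/ping}" >/dev/null; then
    echo "warning: optional API unreachable (not failing the check)" >&2
  fi
  return 0
}
```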

My health check passes locally but fails in CI/CD. What gives?

Different environments, different problems. Your local Docker has different resource limits, network config, and timing than your CI runner. Common issues: CI containers get less memory (health checks timeout), different DNS resolution, missing environment variables, or filesystem permissions. Test your health check in the exact same environment where it's failing.

My app takes 5 minutes to start up. How do I handle this?

Set a long `start-period` - like `start-period=300s` for a 5-minute startup. You can also implement progressive health checks that test different things at different startup phases, but honestly that's usually overkill. Just give it enough time to boot.

What monitoring tools actually work for health checks?

For local development: `docker events --filter event=health_status` shows real-time health changes. For production: most monitoring systems (Datadog, New Relic, Prometheus) can track Docker health check metrics. But the hosted ones cost money, Prometheus takes effort to run, and most of the time just checking `docker ps` tells you what you need to know.

How do I write a custom health check script that doesn't suck?

Keep it simple:

1. Test the things that matter for your app to work.
2. Exit with code 0 if everything's fine, 1 if something's broken.
3. Don't do expensive operations every 30 seconds.
4. Make sure the script and its dependencies are in your container.
5. Test it manually before deploying.

Example:

```bash
#!/bin/bash
# Exit 0 only if the health endpoint answers without an HTTP error and Redis replies PONG; else exit 1.
curl -fsS --max-time 5 http://localhost:8080/health > /dev/null || exit 1
[ "$(redis-cli ping)" = "PONG" ] || exit 1
exit 0
```

If you need more detailed resources and documentation beyond these FAQs, check out the links below.


Docker Container Health Check Debugging - AI-Optimized Guide

Configuration

Production-Ready Health Check Settings

```dockerfile
HEALTHCHECK --start-period=60s --interval=30s --timeout=10s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1
```

Critical Timing Parameters:

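start-period=60s: Covers apps that need 30-60 seconds to start; failures in this window don't count toward retries
timeout=10s: Don't wait forever - health checks taking >10s indicate problems (Docker's default is 30s)
retries=3: Marks the container unhealthy only after 3 consecutive failures, so a single random timeout doesn't cause alerts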
interval=30s: Sweet spot between resource usage and detection speed

Database-Specific Health Checks

PostgreSQL: pg_isready -h localhost
MySQL: mysqladmin ping -h localhost
Redis: redis-cli ping

Warning: Never use expensive database queries in health checks (e.g., SELECT COUNT(*) FROM huge_table)
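
If you want one entry point that normalizes these checks to 0/1, here's a sketch. `db_ping` is hypothetical. Two details are worth knowing. `mysqladmin ping` exits 0 even when access is denied, because the server is up and just refused the login. Comparing `redis-cli ping`'s reply with `PONG` treats error replies such as `NOAUTH` as failures, whatever redis-cli's exit status is.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the database checks above; returns 0 (healthy) or 1 (unhealthy).
db_ping() {
  case "$1" in
    postgres) pg_isready -q -h localhost ;;
    mysql)    mysqladmin ping -h localhost --silent ;;
    redis)    [ "$(redis-cli ping 2>/dev/null)" = "PONG" ] ;;  # error replies count as failure
    *)        echo "unknown database: $1" >&2; return 1 ;;
  esac || return 1
}
```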

Common Failure Modes and Solutions

Most Common Failure Causes

Wrong port configuration: Health check hits port 3000, app runs on 8080
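Missing curl in container: Slim and Alpine-based images often don't ship curl; shell-form checks then exit 127 ("not found") and exec-form checks fail with "executable file not found"
localhost resolving to IPv6: `localhost` may resolve to ::1 while the app listens only on IPv4 127.0.0.1 (or the reverse); target 127.0.0.1 explicitly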
Insufficient startup time: Default 0-second start-period kills slow-starting apps

Network Configuration Issues

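Container listening on localhost only: Fine for an in-container HEALTHCHECK, but published ports, other containers and external load-balancer checks can't reach it; change the app to listen on 0.0.0.0:PORT
Port verification: docker exec container netstat -tlnp (or ss -tlnp; many images ship neither)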
DNS problems: Use IP addresses instead of hostnames
IPv6 conflicts: Force IPv4 with curl -4
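
When neither `netstat` nor `ss` is installed, the kernel's `/proc/net/tcp` still tells you which address and port the app is bound to (127.0.0.1 versus 0.0.0.0). The sketch below decodes it with `awk`. `listening_ports` is hypothetical, and it covers IPv4 only (`/proc/net/tcp6` holds IPv6). It also assumes a little-endian host such as x86-64 or arm64, where the kernel prints IPv4 addresses byte-reversed.

```bash
#!/usr/bin/env bash
# List IPv4 listening sockets from /proc/net/tcp-formatted input, without netstat or ss.
listening_ports() {
  awk '
    function hex(s,   i, n) {
      n = 0; s = toupper(s)
      for (i = 1; i <= length(s); i++) n = n * 16 + index("0123456789ABCDEF", substr(s, i, 1)) - 1
      return n
    }
    NR > 1 && $4 == "0A" {            # skip header; state 0A = LISTEN
      split($2, a, ":")               # local_address is HEXIP:HEXPORT
      printf "%d.%d.%d.%d:%d\n", hex(substr(a[1], 7, 2)), hex(substr(a[1], 5, 2)),
             hex(substr(a[1], 3, 2)), hex(substr(a[1], 1, 2)), hex(a[2])
    }'
}
```

Usage: `docker exec container cat /proc/net/tcp | listening_ports` prints lines like `127.0.0.1:8080` (loopback only) or `0.0.0.0:8080` (all interfaces).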

Resource Constraints

Memory limits: Health checks timeout during OOM conditions
Detection: docker stats container - watch for memory approaching limits
CPU starvation: Random timeouts under load
Garbage collection pauses: Can cause intermittent health check failures

Critical Debugging Commands

Health Check Status Investigation

```bash
# Get complete health check state
docker inspect --format "{{json .State.Health }}" container | jq

# View health check logs
docker inspect --format "{{json .State.Health }}" container | jq '.Log[].Output'

# Manual health check execution
docker exec -it container curl -f localhost:8080/health
echo $?
```

Exit Code Meanings

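0: Success (healthy)
1: Failure (unhealthy); Docker treats any non-zero exit code as a failed check
2: Reserved by Docker, don't use it
127: Command not found in a shell-form check (missing dependencies such as curl)
Timeout: Docker kills the hanging check and records a failure (recent versions log exit code -1 with "Health check exceeded timeout")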

Environment Replication

```bash
# Approximate the health check's environment (shell-form checks run via /bin/sh -c)
docker exec container sh -c "curl -f localhost:8080/health"
```

Resource Requirements

Time Investment

Initial debugging: 2 hours typical for first-time issues
Recurring problems: 30 minutes once patterns identified
Prevention setup: 1 hour to configure proper health checks

Expertise Requirements

Basic Docker knowledge: Essential
Application architecture understanding: Critical for meaningful health checks
Network troubleshooting: Required for complex failures

Implementation Reality

Docker Compose Dependency Management

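```yaml
services:
  web:
    build: .  # assumed: a Dockerfile in this directory whose image includes curl
    depends_on:
      database:
        condition: service_healthy  # Critical: prevents startup race conditions
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      start_period: 30s

  database:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: example  # placeholder; the postgres image won't start without a password setting
    healthcheck:
      # -U avoids "role does not exist" log noise from pg_isready's default user
      test: ["CMD", "pg_isready", "-U", "postgres", "-h", "localhost"]
      interval: 10s
      timeout: 5s
      retries: 5
```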

Health Check Endpoint Requirements

Must Test:

Database connectivity (if app can't function without it)
Critical cache availability
Essential external dependencies

Must NOT Test:

Non-critical external APIs (causes false failures)
Expensive operations (slows down app)
File system operations (unreliable timing)

Breaking Points and Failure Modes

False Positive Scenarios

Tuesday 2am mystery: Scheduled jobs (backups, cron tasks, automated maintenance) consuming resources at the same time every week
Load-dependent failures: Health checks work when idle, fail under traffic
Docker version upgrades: Behavior changes between versions
Environment differences: Works locally, fails in CI/CD
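
To test a time-of-day theory like the Tuesday 2am one, tally unhealthy transitions by hour. `failures_by_hour` is hypothetical, and hours are UTC. The daemon only returns a limited backlog of past events, so capture events continuously if you need a long history.

```bash
#!/usr/bin/env bash
# Count transitions to "unhealthy" per UTC hour of day from "epoch-seconds action" lines.
failures_by_hour() {
  awk '/unhealthy/ { h = int(($1 % 86400) / 3600); n[h]++ }
       END { for (h = 0; h < 24; h++) if (n[h]) printf "%02d:00 UTC %d\n", h, n[h] }'
}

# Usage: last week's health events, bucketed by hour.
#   docker events --since 168h --until "$(date +%s)" --filter event=health_status \
#     --format '{{.Time}} {{.Action}}' | failures_by_hour
```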

Critical Warnings

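Container restart loops and stalled startups: Under Docker Swarm, a broken health check gets tasks killed and replaced over and over; under Docker Compose, services that depend on it with condition: service_healthy never start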
Resource monitoring gaps: Health check failures often mask underlying resource issues
Default setting traps: Docker's defaults (30s interval, 30s timeout, 0s start period, 3 retries) give slow-starting apps no startup grace period
Orchestration confusion: Health check success ≠ application readiness

Monitoring Requirements

Essential Metrics

Health check success rate over time
Health check response times
Resource usage correlation with failures
Failure pattern analysis (time, load conditions)

Alert Configuration

Don't alert on: Single health check failure
Alert on: 3+ consecutive failures (matches Docker's retry logic)
Escalate on: Persistent failures >10 minutes
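
The same rules can be written as a function. The consecutive-failure count can come from `docker inspect --format '{{.State.Health.FailingStreak}}'`. `alert_level` is hypothetical, and how you measure how long a container has been failing (the second argument) depends on your monitoring setup.

```bash
#!/usr/bin/env bash
# Hypothetical alert policy: fewer than 3 consecutive failures means no alert,
# 3+ means alert, and 3+ lasting over 10 minutes means escalate.
alert_level() {
  local consecutive="$1" failing_secs="$2"
  if [ "$consecutive" -ge 3 ] && [ "$failing_secs" -gt 600 ]; then
    echo escalate
  elif [ "$consecutive" -ge 3 ]; then
    echo alert
  else
    echo none
  fi
}
```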

Decision Criteria

When Health Checks Are Worth It

Multi-service applications: Dependencies must start in order
Production environments: Automated recovery essential
Load-balanced deployments: Traffic routing decisions

When to Skip Health Checks

Single-container applications: Process monitoring sufficient
Development environments: Manual intervention acceptable
Stateless services: Container restart has no side effects

Production Gotchas

Container State Confusion

Container status "running" + health status "unhealthy" = app process alive but non-functional
Health check failure ≠ container restart (orchestration system dependent)
Docker doesn't restart unhealthy containers automatically

Testing Requirements

```bash
# Local validation before deployment
docker build -t myapp .
docker run -d --name test-container myapp
sleep 60
docker inspect --format "{{json .State.Health }}" test-container | jq
```
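
A fixed `sleep 60` is either too long or too short. Here is a polling sketch instead. `wait_for_healthy` is hypothetical, and the 2-second poll and 120-second default deadline are arbitrary choices.

```bash
#!/usr/bin/env bash
# Poll a container until it reports healthy.
# Returns 0 when healthy, 1 when unhealthy/not running/no HEALTHCHECK, 2 if the deadline passes.
wait_for_healthy() {
  local container="$1" deadline="${2:-120}" waited=0 state health
  while [ "$waited" -lt "$deadline" ]; do
    read -r state health < <(docker inspect --format \
      '{{.State.Status}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$container")
    [ "$state" = running ] || { echo "$container is $state" >&2; return 1; }
    case "$health" in
      healthy)   return 0 ;;
      unhealthy) return 1 ;;
      none)      echo "$container has no HEALTHCHECK" >&2; return 1 ;;
    esac
    sleep 2
    waited=$(( waited + 2 ))
  done
  echo "timed out after ${deadline}s waiting for $container" >&2
  return 2
}
```

In CI, `wait_for_healthy test-container 180 || { docker inspect --format "{{json .State.Health }}" test-container; exit 1; }` fails the job and dumps the health log in one go.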

Monitoring Commands

```bash
# Real-time health status changes
docker events --filter event=health_status

# Resource correlation analysis
docker stats container
```

Migration Pain Points

Common Upgrade Issues

Health check timing behavior changes between Docker versions
Container orchestration system updates modify health check handling
Base image updates may remove health check dependencies (curl, netstat)

Validation Checklist

Test health checks in target deployment environment
Verify health check dependencies exist in container
Confirm timing settings match application startup requirements
Validate network configuration matches health check expectations
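
The second checklist item can be automated. `check_healthcheck_deps` is hypothetical. It assumes the image has a `sh` (distroless images don't, so it can't check those).

```bash
#!/usr/bin/env bash
# Check that the binaries a health check depends on exist inside a running container.
# Prints each missing tool and returns 1 if any are absent.
check_healthcheck_deps() {
  local container="$1" tool missing=0
  shift
  for tool in "$@"; do
    if ! docker exec "$container" sh -c 'command -v "$1" >/dev/null' sh "$tool"; then
      echo "missing in $container: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_healthcheck_deps myapp curl pg_isready`.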

Useful Links for Further Investigation

Resources That Actually Help

| Link | Description |
| --- | --- |
| Docker Health Check Reference | The official docs. Dry as toast but technically accurate. I suffered through these so you have the complete syntax reference. |
| Docker Compose Health Check Configuration | Health check config for Docker Compose. The examples are useless for real-world scenarios, but you need to know the syntax. |
| Docker CLI Health Commands | How to use `docker inspect` and related commands. Actually useful for debugging, unlike most official docs. |
| Lumigo Docker Health Check Guide | Actually helpful practical guide written by people who've been through this hell. Way better than the official docs for real-world scenarios. |
| Last9 Docker Status Unhealthy Guide | Solid troubleshooting guide with actual debugging commands. These people have clearly spent 3am debugging broken health checks in production. |
| Stack Overflow Health Check Logs | Where you'll end up anyway when the official docs fail you. The real solutions are buried in the comments, as usual. |
| Docker Events Documentation | How to watch health status changes in real time. Actually useful for local debugging. `docker events --filter event=health_status` is your friend. |
| Container Logging Best Practices | Log configuration that might help you figure out why health checks are failing. Most of this is common sense but worth skimming. |
| Prometheus Docker Metrics | If you're running Prometheus (and willing to deal with its complexity), this can track health check metrics over time. |
| AWS ECS Health Check Troubleshooting | ECS health checks work differently from vanilla Docker. This explains the gotchas. Still doesn't solve the fundamental problem that AWS error messages suck. |
| Kubernetes Health Checks | K8s has three different types of health checks because apparently one wasn't complicated enough. The concepts carry over from Docker but the configuration is different. |
| Azure Container Apps Troubleshooting | Azure's version of container health checks. The documentation is surprisingly decent, which is unusual for Microsoft. |
| jq JSON Processor | Essential for parsing `docker inspect` output. Install this on every machine for Docker debugging. Its weird syntax is worth learning for digging through JSON logs. |
| docker-autoheal | Third-party tool that automatically restarts unhealthy containers. Useful for automatic recovery without Kubernetes complexity. I've used this in production when proper orchestration wasn't feasible, and it works. |
