Docker Container Health Check Debugging - AI-Optimized Guide
Configuration
Production-Ready Health Check Settings
HEALTHCHECK --start-period=60s --interval=30s --timeout=10s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1
Critical Timing Parameters:
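- start-period=60s: Startup grace period; failed checks during this window don't count toward retries. Size it to your app's measured startup time (30-60 seconds is common for heavier apps)
- timeout=10s: Don't wait forever; a health check that takes more than 10s indicates a problem (Docker's default timeout is 30s)
- retries=3: Requires three consecutive failures, so a single random timeout doesn't mark the container unhealthy or trigger alerts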
- interval=30s: Sweet spot between resource usage and detection speed
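How long a broken app keeps reporting healthy follows from these three numbers. Below is a back-of-the-envelope sketch, assuming Docker's documented scheduling: each check starts interval seconds after the previous one finishes, a hung check is cut off at timeout, and retries consecutive failures flip the status to unhealthy. The helper name is hypothetical, and the model ignores the start period and scheduling jitter.

```python
def worst_case_detection_seconds(interval: float, timeout: float, retries: int,
                                 checks_hang: bool = True) -> float:
    """Rough upper bound on how long a broken app still reports 'healthy'.

    Model (an approximation): the app breaks right after a successful check,
    each check starts `interval` seconds after the previous one finished, and a
    hanging check is killed after `timeout` seconds. Start period not modeled.
    """
    per_check = float(interval) + (float(timeout) if checks_hang else 0.0)
    return retries * per_check


if __name__ == "__main__":
    # Values from the HEALTHCHECK example: interval=30s, timeout=10s, retries=3
    print(worst_case_detection_seconds(30, 10, 3))                     # endpoint hangs
    print(worst_case_detection_seconds(30, 10, 3, checks_hang=False))  # connection refused
    print(worst_case_detection_seconds(30, 30, 3))                     # Docker's 30s default timeout
```

With the settings above this gives roughly 120 seconds when the endpoint hangs and 90 seconds when connections are refused outright. Docker's 30s default timeout stretches the hanging case to about 180 seconds.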
Database-Specific Health Checks
- PostgreSQL:
pg_isready -h localhost
- MySQL:
mysqladmin ping -h localhost
- Redis:
redis-cli ping
Warning: Never use expensive database queries in health checks (e.g., SELECT COUNT(*) FROM huge_table)
Common Failure Modes and Solutions
Most Common Failure Causes
- Wrong port configuration: Health check hits port 3000, app runs on 8080
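- Missing curl in container: The check exits with 127 (command not found); slim and Alpine-based images often ship without curl
- localhost vs 0.0.0.0: The app's bind address doesn't match the address being checked (e.g., app bound to 127.0.0.1 while the check or load balancer targets the container's IP, or localhost resolving to IPv6 ::1 while the app listens only on IPv4)
- Insufficient startup time: Docker's default 0-second start period counts failures from the first check, so slow-starting apps are marked unhealthy before they finish booting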
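When the image lacks curl but does ship a Python interpreter, a small stdlib script can stand in for it. This is a sketch under assumptions: the script path /healthcheck.py, the python3 interpreter name, and the check function are hypothetical. It exits 0 only for a 2xx response and 1 for everything else, which keeps the result inside the 0/1 range Docker expects.

```python
import http.client
import sys
import urllib.request

# An empty ProxyHandler ignores HTTP_PROXY/HTTPS_PROXY so the check always hits the app directly
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check(url: str, timeout: float = 5.0) -> int:
    """Return 0 if `url` answers with a 2xx status, else 1 (Docker's 'unhealthy')."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except (OSError, http.client.HTTPException):
        # HTTPError (4xx/5xx), URLError (refused, DNS) and timeouts are all
        # OSError subclasses; every failure maps to exit code 1.
        return 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only exits when a URL is passed, e.g. from the Dockerfile:
    #   HEALTHCHECK CMD ["python3", "/healthcheck.py", "http://localhost:8080/health"]
    sys.exit(check(sys.argv[1]))
```

The exec form (JSON array) avoids needing a shell in the image; the timeout argument should stay below the HEALTHCHECK --timeout so the script reports the failure itself instead of being killed.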
Network Configuration Issues
- App listening on localhost only: Change the app to listen on 0.0.0.0:PORT
- Port verification: docker exec container netstat -tlnp (or ss -tlnp if netstat isn't installed)
- DNS problems: Use IP addresses instead of hostnames
- IPv6 conflicts: Force IPv4 with curl -4
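To tell a bind-address problem from an IPv4/IPv6 mismatch, it helps to probe each loopback address separately instead of trusting what localhost resolves to. A minimal stdlib sketch, assuming it runs inside the container (e.g., via docker exec) and that 8080 is the app's port; the function names are hypothetical.

```python
import socket
from typing import List


def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable (e.g. no IPv6 loopback), or timed out
        return False


def localhost_addresses(port: int) -> List[str]:
    """Addresses 'localhost' resolves to, in the order the resolver returns them."""
    try:
        infos = socket.getaddrinfo("localhost", port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # dict.fromkeys drops duplicates while keeping resolver order
    return list(dict.fromkeys(info[4][0] for info in infos))


if __name__ == "__main__":
    port = 8080  # assumption: the app's port, as in the HEALTHCHECK example
    print("localhost ->", localhost_addresses(port))
    for host in ("127.0.0.1", "::1"):
        print(f"{host}:{port}", "accepting connections" if probe(host, port) else "closed")
```

If 127.0.0.1 accepts connections but ::1 does not, and localhost resolves to ::1 first, a client that doesn't fall back to IPv4 will fail; targeting 127.0.0.1 or using curl -4 sidesteps it.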
Resource Constraints
- Memory limits: Health checks time out when the container is near its memory limit or hitting OOM conditions
- Detection: docker stats container (watch for memory usage approaching the limit)
- CPU starvation: Random timeouts under load
- Garbage collection pauses: Can cause intermittent health check failures
Critical Debugging Commands
Health Check Status Investigation
# Get complete health check state
docker inspect --format "{{json .State.Health }}" container | jq
# View health check logs
docker inspect --format "{{json .State.Health }}" container | jq '.Log[].Output'
# Manual health check execution
docker exec -it container curl -f localhost:8080/health
echo $?
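Exit Code Meanings
- 0: Success (healthy)
- 1: Failure (unhealthy); Docker reserves 2, so health check commands should map any failure to 1 (e.g., || exit 1)
- 126/127: Command not executable or not found (e.g., curl missing from the image); still counted as a failure
- Timeout: Docker kills the hanging check and records ExitCode -1 with a "Health check exceeded timeout" message in the log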
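When running the check by hand with docker exec ... curl -f and echo $?, you see curl's own exit codes rather than Docker's 0/1. A small lookup table can turn those codes into a first hypothesis. It is based on curl's documented codes (7: could not connect, 22: HTTP error with -f, 28: operation timed out) plus the shell conventions for 126/127. The table and the triage name are an illustrative sketch, not an exhaustive mapping.

```python
HINTS = {
    -1: "Docker killed the check after --timeout (seen in .State.Health.Log)",
    0: "healthy: the endpoint answered with a success status",
    1: "unhealthy: the check command reported failure",
    7: "curl could not connect: wrong port, app not listening, or bound to another address",
    22: "curl -f got an HTTP status >= 400: app is up but the endpoint reports a problem",
    28: "curl timed out: app hung, overloaded, or paused (GC, CPU starvation)",
    126: "command found but not executable",
    127: "command not found: the tool (e.g. curl) is missing from the image",
}


def triage(exit_code: int) -> str:
    """Map an exit code from a manual health check run to a debugging hint."""
    # Docker treats any non-zero result as a failed check
    return HINTS.get(exit_code, f"exit code {exit_code}: non-zero, so it counts as a failure")


if __name__ == "__main__":
    for code in (0, 7, 22, 127):
        print(code, triage(code))
```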
Environment Replication
# Replicate exact Docker health check environment
docker exec container sh -c "curl -f localhost:8080/health"
Resource Requirements
Time Investment
- Initial debugging: 2 hours typical for first-time issues
- Recurring problems: 30 minutes once patterns identified
- Prevention setup: 1 hour to configure proper health checks
Expertise Requirements
- Basic Docker knowledge: Essential
- Application architecture understanding: Critical for meaningful health checks
- Network troubleshooting: Required for complex failures
Implementation Reality
Docker Compose Dependency Management
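services:
  web:
    build: .  # assumption: the app's Dockerfile is in this directory
    depends_on:
      database:
        condition: service_healthy  # Critical: prevents startup race conditions
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      start_period: 30s
  database:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: example  # placeholder; the postgres image won't initialize without a password setting
    healthcheck:
      test: ["CMD", "pg_isready", "-h", "localhost"]
      interval: 10s
      timeout: 5s
      retries: 5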
Health Check Endpoint Requirements
Must Test:
- Database connectivity (if app can't function without it)
- Critical cache availability
- Essential external dependencies
Must NOT Test:
- Non-critical external APIs (causes false failures)
- Expensive operations (slows down app)
- File system operations (unreliable timing)
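Here is a minimal sketch of an endpoint that follows these rules, using only the Python standard library. It checks that one critical dependency accepts a TCP connection within a short timeout and deliberately avoids optional APIs, queries, and file I/O. The DB_HOST/DB_PORT variables, port 8080, and the /health path are assumptions. A real service would more likely use its database driver's cheap ping than a raw TCP connect.

```python
import os
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical configuration: the one dependency this app cannot run without.
DB_HOST = os.environ.get("DB_HOST", "127.0.0.1")
DB_PORT = int(os.environ.get("DB_PORT", "5432"))


def dependency_ok(host: str, port: int, timeout: float = 1.0) -> bool:
    """Cheap connectivity test: open and close a TCP connection, no queries."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        ok = dependency_ok(self.server.db_host, self.server.db_port)
        body = b"ok\n" if ok else b"database unreachable\n"
        self.send_response(200 if ok else 503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep frequent health probes out of the access log


class HealthServer(HTTPServer):
    def __init__(self, port: int = 8080, db_host: str = DB_HOST, db_port: int = DB_PORT):
        # 0.0.0.0 so the endpoint is reachable from other containers, not just loopback
        super().__init__(("0.0.0.0", port), HealthHandler)
        self.db_host = db_host
        self.db_port = db_port
```

HealthServer().serve_forever() starts it. A 503 makes curl -f exit non-zero, so the HEALTHCHECK line shown earlier reports the container unhealthy without any extra logic.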
Breaking Points and Failure Modes
False Positive Scenarios
- Tuesday 2am mystery: Scheduled maintenance (backups, batch jobs) consuming resources at fixed times
- Load-dependent failures: Health checks work when idle, fail under traffic
- Docker version upgrades: Behavior changes between versions
- Environment differences: Works locally, fails in CI/CD
Critical Warnings
- Container restart loops: Broken health checks can prevent proper startup
- Resource monitoring gaps: Health check failures often mask underlying resource issues
- Default setting traps: Docker's defaults (30s interval, 30s timeout, 0s start period, 3 retries) give slow-starting apps no grace period and let a hung check block for 30 seconds
- Orchestration confusion: Health check success ≠ application readiness
Monitoring Requirements
Essential Metrics
- Health check success rate over time
- Health check response times
- Resource usage correlation with failures
- Failure pattern analysis (time, load conditions)
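docker inspect already exposes the raw material for the first metric. The sketch below summarizes the .State.Health JSON, using its Status and FailingStreak fields and the ExitCode and Output of each Log entry. Docker keeps only a handful of recent results in Log, so success rates over time require shipping these samples to a monitoring system. The function name is illustrative.

```python
import json
from typing import Any, Dict


def summarize_health(health_json: str) -> Dict[str, Any]:
    """Summarize the output of: docker inspect --format "{{json .State.Health }}" <container>"""
    # Containers without a health check print "null"
    health = json.loads(health_json) or {}
    log = health.get("Log") or []
    passed = sum(1 for entry in log if entry.get("ExitCode") == 0)
    failures = [entry for entry in log if entry.get("ExitCode") != 0]
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak", 0),
        "samples": len(log),
        "success_rate": passed / len(log) if log else None,
        # Output of the most recent failure is usually the quickest clue
        "last_failure": failures[-1].get("Output", "").strip() if failures else None,
    }
```

Feeding it the same command output used in the debugging section gives a quick snapshot; running it periodically and recording success_rate alongside docker stats output supports the resource-correlation metric.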
Alert Configuration
- Don't alert on: Single health check failure
- Alert on: 3+ consecutive failures (matches Docker's retry logic)
- Escalate on: Persistent failures >10 minutes
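These alerting rules can be sketched as a function over a container's recent check history. Assumptions: results arrive oldest-first as (timestamp, passed) pairs, and "persistent" means the current run of failures has lasted more than 10 minutes. The alert_level name and its return values are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def alert_level(history: Sequence[Tuple[float, bool]],
                min_consecutive: int = 3,
                escalate_after_s: float = 600.0) -> str:
    """Return 'ok', 'watch' (isolated failures), 'alert' or 'escalate'.

    `history` holds (unix_timestamp, passed) pairs, oldest first.
    """
    # Walk back from the newest result to measure the current run of failures
    streak = 0
    first_failure_ts: Optional[float] = None
    for ts, passed in reversed(history):
        if passed:
            break
        streak += 1
        first_failure_ts = ts
    if streak == 0:
        return "ok"
    if streak < min_consecutive:
        return "watch"  # don't page on a single blip
    if history[-1][0] - first_failure_ts > escalate_after_s:
        return "escalate"
    return "alert"
```

Counting only the trailing run of failures mirrors Docker's own FailingStreak, so a success resets the alert the same way it resets the container's status.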
Decision Criteria
When Health Checks Are Worth It
- Multi-service applications: Dependencies must start in order
- Production environments: Automated recovery essential
- Load-balanced deployments: Traffic routing decisions
When to Skip Health Checks
- Single-container applications: Process monitoring sufficient
- Development environments: Manual intervention acceptable
- Stateless services: Container restart has no side effects
Production Gotchas
Container State Confusion
- Container status "running" + health status "unhealthy" = app process alive but non-functional
- Health check failure ≠ container restart (orchestration system dependent)
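- Docker doesn't restart unhealthy containers automatically: restart policies react to the container exiting, not to its health status (Swarm services and other orchestrators may replace unhealthy tasks)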
Testing Requirements
# Local validation before deployment
docker build -t myapp .
docker run -d --name test-container myapp
sleep 60
docker inspect --format "{{json .State.Health }}" test-container | jq
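A fixed sleep 60 either wastes time or turns out too short for a slow start. The polling sketch below assumes the docker CLI is on PATH and the container defines a health check. wait_for_health and docker_health_status are hypothetical names. The status reader is injectable, so the loop can be exercised without Docker.

```python
import subprocess
import sys
import time
from typing import Callable


def docker_health_status(container: str) -> str:
    """Read 'starting', 'healthy' or 'unhealthy' via docker inspect (raises if the command fails)."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_for_health(get_status: Callable[[], str], timeout_s: float = 180.0,
                    poll_s: float = 5.0, sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the status leaves 'starting' or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status != "starting":
            return status  # 'healthy' or 'unhealthy' (or unexpected output)
        if clock() >= deadline:
            return "timeout"
        sleep(poll_s)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 wait_healthy.py test-container
    final = wait_for_health(lambda: docker_health_status(sys.argv[1]))
    print(final)
    sys.exit(0 if final == "healthy" else 1)
```

Returning a non-zero exit code for anything but healthy lets a CI job fail fast right after docker run, instead of after a fixed delay.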
Monitoring Commands
# Real-time health status changes
docker events --filter event=health_status
# Resource correlation analysis
docker stats container
Migration Pain Points
Common Upgrade Issues
- Health check timing behavior changes between Docker versions
- Container orchestration system updates modify health check handling
- Base image updates may remove health check dependencies (curl, netstat)
Validation Checklist
- Test health checks in target deployment environment
- Verify health check dependencies exist in container
- Confirm timing settings match application startup requirements
- Validate network configuration matches health check expectations
Useful Links for Further Investigation
Resources That Actually Help
Link | Description |
---|---|
Docker Health Check Reference | The official docs. Dry as toast but technically accurate. I suffered through these so you have the complete syntax reference. |
Docker Compose Health Check Configuration | Health check config for Docker Compose. The examples are useless for real-world scenarios, but you need to know the syntax. |
Docker CLI Health Commands | How to use `docker inspect` and related commands. Actually useful for debugging, unlike most official docs. |
Lumigo Docker Health Check Guide | Actually helpful practical guide written by people who've been through this hell. Way better than the official docs for real-world scenarios. |
Last9 Docker Status Unhealthy Guide | Solid troubleshooting guide with actual debugging commands. These people have clearly spent 3am debugging broken health checks in production. |
Stack Overflow Health Check Logs | Where you'll end up anyway when the official docs fail you. The real solutions are buried in the comments, as usual. |
Docker Events Documentation | How to watch health status changes in real-time. Actually useful for local debugging. `docker events --filter event=health_status` is your friend. |
Container Logging Best Practices | Log configuration that might help you figure out why health checks are failing. Most of this is common sense but worth skimming. |
Prometheus Docker Metrics | If you're running Prometheus (and willing to deal with its complexity), this can track health check metrics over time. |
AWS ECS Health Check Troubleshooting | ECS health checks work differently from vanilla Docker. This explains the gotchas. Still doesn't solve the fundamental problem that AWS error messages suck. |
Kubernetes Health Checks | K8s has three different types of health checks because apparently one wasn't complicated enough. The concepts carry over from Docker but the configuration is different. |
Azure Container Apps Troubleshooting | Azure's version of container health checks. The documentation is surprisingly decent, which is unusual for Microsoft. |
jq JSON Processor | Essential for parsing `docker inspect` output. Install this on every machine for Docker debugging. Its weird syntax is worth learning for digging through JSON logs. |
docker-autoheal | Third-party tool that automatically restarts unhealthy containers. Useful for automatic recovery without Kubernetes complexity. I've used this in production when proper orchestration wasn't feasible, and it works. |