Docker health checks are supposed to tell you if your app is working. In practice, they mostly tell you that something is broken but not what or why. It's like having a smoke detector that just screams "FIRE!" without telling you which room is burning.
Understanding Docker's health check mechanism requires knowing how container lifecycle management actually works under the hood.
Here's the reality: when your container shows `Status: unhealthy`, Docker isn't actually checking if your container is broken. It's checking if some command you wrote returns exit code 0. By default, Docker runs your test command every 30 seconds, and if it fails 3 times in a row, it marks your container as fucked.
The container keeps running. Your app keeps serving traffic. But now you're getting alerts.
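All of that behavior is configurable on the health check itself. Here's a rough sketch of the behavior just described, using `docker run` flags - the image name, port, and `/health` endpoint are placeholders for whatever your app actually exposes:

```bash
# Probe every 30s, give each probe 5s to finish, mark the container
# unhealthy after 3 consecutive failures. Image, port, and endpoint
# are placeholders.
docker run -d \
  --health-cmd="curl -f http://localhost:8080/health || exit 1" \
  --health-interval=30s \
  --health-timeout=5s \
  --health-retries=3 \
  my-app:latest
```

The same settings exist as a `HEALTHCHECK` instruction in a Dockerfile or under `healthcheck:` in Compose; the mechanics are identical.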
The Three States of Health Check Hell
Here's what actually happens during the container lifecycle:
Starting: Docker gives you a grace period where health check failures don't count. This is supposed to let your app boot up without triggering false alarms. In reality, most people set this too short and wonder why their Postgres container is "unhealthy" 5 seconds after starting when it takes 15 seconds to initialize. (There's a sketch of the fix right after this list.)
Healthy: Your health check command returned 0 a few times in a row. Congratulations, Docker thinks your app works. This doesn't mean your app actually works - just that whatever random endpoint you picked responded with a 200.
Unhealthy: Your health check failed 3 times (or whatever retry count you set). The container is still running and probably working fine, but Docker has decided it's broken. This is where you get paged at 3am.
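For the Postgres case under Starting, the fix is to tell Docker how long startup actually takes. A minimal sketch, assuming the official `postgres` image (which ships `pg_isready`) and the default `postgres` user:

```bash
# Give the container 30s of grace before failed probes count against it,
# then check real readiness with pg_isready instead of just "is it running".
docker run -d \
  --health-cmd="pg_isready -U postgres || exit 1" \
  --health-interval=10s \
  --health-timeout=5s \
  --health-retries=3 \
  --health-start-period=30s \
  postgres:16
```

In Compose, the same knob is `start_period:` under `healthcheck:`.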
I've debugged this scenario about 50 times. 80% of the time it's one of these things: wrong port, missing `curl` in the container, or the health check hitting `localhost` when it should hit `0.0.0.0`. Save yourself 2 hours and check these first.
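The quickest way to check is to ask Docker what the probe actually saw - it keeps the status, failing streak, and the output of the last few runs. Replace `my-app` with your container name or ID:

```bash
# Current health status plus the exit codes and output of recent probes.
docker inspect --format '{{json .State.Health}}' my-app
```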
But first, let's understand what usually goes wrong.
The Usual Suspects
Here's what usually breaks, based on actual production incidents:
Your app crashed: The obvious one. Health check hits your endpoint, gets connection refused, returns exit code 1. At least this one makes sense. Look at your application logs, not the health check logs.
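In this case the health status is just the messenger. The actual crash reason lives in the container's own logs:

```bash
# Last 100 lines of the app's output - stack traces and exit reasons show
# up here, not in the health check status. "my-app" is a placeholder name.
docker logs --tail 100 my-app
```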
Database is down: Your app starts up fine, but can't connect to Postgres/MySQL/whatever. Health check tries to hit your `/health` endpoint, your app returns 500 because it can't query the database. Fix: check if your database is actually running and reachable from the container.
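To confirm this is the problem, check the database from both sides: is the DB container itself up, and can the app container reach it over the Docker network? The container names and `postgres` user here are assumptions - swap in your own:

```bash
# Is the database container running at all?
docker ps --filter "name=my-db" --format '{{.Names}}: {{.Status}}'

# Is Postgres actually accepting connections? (pg_isready ships with the
# official postgres image.)
docker exec my-db pg_isready -U postgres

# Can the app container resolve the database hostname? (getent exists in
# most base images, though not necessarily in every slim one.)
docker exec my-app sh -c 'getent hosts my-db'
```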
Out of memory: Container hits memory limits, processes get killed, health check times out. This one's fun because Docker doesn't tell you it's an OOM kill - you just get timeout errors. Use `docker stats` to see if you're hitting memory limits.
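Two quick checks make the OOM case obvious (the container name is a placeholder):

```bash
# Live memory usage against the configured limit.
docker stats --no-stream my-app

# Whether the kernel OOM-killed the main process the last time it exited.
docker inspect --format 'OOMKilled: {{.State.OOMKilled}}' my-app
```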
Wrong network config: Health check tries to connect to `localhost:8080` but your app is listening on `0.0.0.0:8080`. Or vice versa. Or the port is wrong. Or you're in a different network namespace. Docker's networking makes me want to throw my laptop out the window.
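The fastest way to settle the bind-address question is to look at what the process is actually listening on from inside the container. `ss` and `netstat` aren't guaranteed to exist in slim images, so treat this as a best-effort sketch:

```bash
# Listening sockets inside the container's network namespace.
# 127.0.0.1:8080 and 0.0.0.0:8080 behave very differently once port
# publishing or other containers get involved.
docker exec my-app sh -c 'ss -tlnp 2>/dev/null || netstat -tlnp'
```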
Missing dependencies: Health check script calls `curl` but curl isn't installed in your container. Or it calls some Python script that's not in the PATH. The error message will be "command not found", which is at least helpful.
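The fix here is mechanical: run the health check command by hand inside the container and read the real error instead of the summarized status:

```bash
# Does the tool the health check depends on even exist in the image?
docker exec my-app sh -c 'command -v curl || echo "curl is not installed"'

# Run the exact health check command manually to see its full output.
docker exec my-app curl -f http://localhost:8080/health
```

If `curl` is missing, either install it in the image or switch the check to something the base image already has (busybox `wget`, or your app's own CLI).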
Now that you know what usually breaks, here's how to actually figure out which one is fucking with your containers.