Finding the Real Problem When Docker Lies to You
The debugging process: systematic elimination of possible causes, from authentication to DNS to registry redirects.
Docker Desktop's error messages are about as helpful as a screen door on a submarine. When you get ECONNREFUSED, that could mean anything from DNS failures to policy violations to AWS being down. Here's how to actually figure out what's broken.
Docker Desktop Log Locations (Because They Hide This Shit)
Log file organization: Docker Desktop scatters logs across multiple files, each containing different types of events.
macOS: ~/Library/Containers/com.docker.docker/Data/log/
Windows: %APPDATA%\Docker\log\
WSL2: /mnt/c/Users/<username>/AppData/Roaming/Docker/log/ (substitute your Windows username; %USERNAME% does not expand inside WSL)
The useful logs:
- host/log.log: main Docker daemon logs
- vm/dockerd.log: VM-specific issues (macOS/Windows)
- vm/dns.log: DNS resolution attempts (this is gold for debugging Registry Access Management, RAM for short)
Most people check docker logs and give up. The real debugging happens in these system logs. Took me way too long to figure this out - Docker hides the useful stuff.
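If you get tired of retyping those paths, a tiny helper can pick the right directory for you. This is only a sketch: it assumes the locations listed above are right for your Docker Desktop version, and both the function name `docker_log_dir` and the `WIN_USER` variable are made up for illustration.

```shell
# docker_log_dir [os-name]
# Prints the Docker Desktop log directory for the given OS name
# (defaults to the output of `uname -s`). Paths are the ones listed above.
docker_log_dir() {
  local os="${1:-$(uname -s)}"
  case "$os" in
    Darwin)
      printf '%s\n' "$HOME/Library/Containers/com.docker.docker/Data/log"
      ;;
    Linux)
      # Assumption: Linux here means WSL2, and WIN_USER (hypothetical variable)
      # holds your Windows username. Falls back to $USER if it is unset.
      printf '%s\n' "/mnt/c/Users/${WIN_USER:-$USER}/AppData/Roaming/Docker/log"
      ;;
    *)
      echo "unknown platform: $os" >&2
      return 1
      ;;
  esac
}
```

Typical use would be something like `ls "$(docker_log_dir)/host"` to see which files actually exist on your machine.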
Systematic Debugging Process (That Actually Works)
Step 1: Confirm it's actually a RAM issue
## Test registry connectivity outside Docker (replace with your actual registry URL)
curl -I https://hub.docker.com/
## If this fails, it's not RAM - it's network/DNS
Step 2: Check user authentication
docker info | grep -A 5 "Username"
## Should show org username, not personal account
Step 3: Enable verbose logging
## Docker Desktop > Settings > Docker Engine
{
  "log-level": "debug",
  "log-driver": "local"
}
Step 4: Reproduce and grep logs
## On macOS (swap in the log directory for your platform)
grep -i "registry" ~/Library/Containers/com.docker.docker/Data/log/host/log.log | tail -20
## Look for "policy denied" or "allowlist" messages
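Grepping one file at a time gets old fast, so here is a rough sketch that sweeps all three logs in one go. The function name `scan_policy_logs` is hypothetical, and the search pattern just reuses the two phrases mentioned above; the exact wording Docker writes to its logs may differ on your version, so adjust it to whatever you actually see.

```shell
# scan_policy_logs LOG_DIR [LINES]
# Greps the three log files named earlier for policy-related messages and
# prints the last LINES matches per file (default 20), prefixed by file name.
scan_policy_logs() {
  local dir="$1" lines="${2:-20}" f
  for f in host/log.log vm/dockerd.log vm/dns.log; do
    [ -r "$dir/$f" ] || continue   # skip logs that don't exist on this platform
    # Assumed phrases; tune the pattern to the wording in your own logs.
    grep -i -E 'policy denied|allowlist' "$dir/$f" | tail -n "$lines" | sed "s|^|$f: |"
  done
}
```

On macOS that would be called as `scan_policy_logs ~/Library/Containers/com.docker.docker/Data/log`, right after reproducing the failing pull.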
The DNS Detective Work
RAM blocks at the DNS level, so you need to trace DNS resolution to see what's actually happening. Docker Desktop intercepts DNS requests and checks them against your policy before resolving.
Network tracing on macOS:
sudo tcpdump -l -n -i any port 53 | grep your-registry
## Shows DNS queries being made
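Raw tcpdump output is noisy, so it helps to boil it down to a unique list of hostnames. The sketch below assumes tcpdump's usual one-line text format for DNS queries (shown in the comment); `dns_names` is a made-up helper name, and the sample addresses are placeholders.

```shell
# dns_names: read tcpdump output on stdin, print each queried hostname once.
# Assumes query lines look roughly like:
#   IP 10.0.0.5.51234 > 10.0.0.1.53: 4242+ A? ghcr.io. (25)
# Response lines have no "A?" / "AAAA?" marker, so they are ignored.
dns_names() {
  sed -E -n 's/.* (A|AAAA)\? ([^ ]+)\. .*/\2/p' | sort -u
}
```

Because `sort -u` waits for end of input, capture a bounded sample first, for example `sudo tcpdump -n -i any -c 200 port 53 > dns.txt` while the pull runs, then `dns_names < dns.txt`.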
Check what domains are actually being hit:
## Enable Docker debug logging, then run a pull
docker --debug pull your-registry.com/image:tag 2>&1 | grep -i "resolv\|dns\|policy"
I've seen builds fail because:
- ECR redirected to *.cloudfront.net domains not in allowlist (happened during Black Friday traffic)
- GitHub changed from docker.pkg.github.com to ghcr.io without warning (broke our entire CI for 6 hours)
- Azure added new regional endpoints that weren't whitelisted (thanks Microsoft)
- Artifactory's load balancer started using different backend domains (discovered this at 2 AM)
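Every one of those failures is the same shape: a hostname showed up that nobody put on the list. Once you have the hostnames a pull really touched, you can diff them against a local copy of your allowlist. This is a sketch, not how Docker evaluates policy: it assumes you keep the allowlist as a text file with one entry per line and that treating entries like `*.cloudfront.net` as shell globs is close enough for a sanity check. The name `not_allowlisted` is hypothetical.

```shell
# not_allowlisted ALLOWLIST_FILE
# Reads hostnames on stdin (one per line) and prints those that match no
# pattern in ALLOWLIST_FILE. Patterns are shell globs, one per line,
# e.g. "ghcr.io" or "*.cloudfront.net".
not_allowlisted() {
  local allowlist="$1" host pattern matched
  while IFS= read -r host; do
    [ -n "$host" ] || continue
    matched=no
    while IFS= read -r pattern; do
      [ -n "$pattern" ] || continue
      # $pattern is left unquoted on purpose so case treats it as a glob.
      case "$host" in
        $pattern) matched=yes; break ;;
      esac
    done < "$allowlist"
    [ "$matched" = yes ] || printf '%s\n' "$host"
  done
}
```

Anything this prints is a candidate for the "worked yesterday, blocked today" domain. Feed it a file of hostnames collected during a failing pull, e.g. `not_allowlisted allowlist.txt < hosts.txt`.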
Registry-Specific Debugging Patterns
Authentication flow complexity: Each registry has its own redirect patterns and domain requirements that can trigger RAM blocks.
AWS ECR Hell:
ECR can redirect through 6+ domains during a single pull. CloudTrail records the ECR API calls rather than the domains themselves, but if your trail delivers to CloudWatch Logs you can at least confirm which calls actually reached AWS (the log group name below is a placeholder):
aws logs filter-log-events \
  --log-group-name /aws/ecr/your-registry \
  --filter-pattern '{ $.eventName = "GetAuthorizationToken" }'
GitHub Container Registry Fuckery:
GitHub uses different domains for auth vs actual pulls. Check both:
- ghcr.io: primary endpoint
- pkg-containers.githubusercontent.com: where images actually live
- *.githubusercontent.com: various CDN endpoints
If only one is whitelisted, you'll get weird partial failures.
The Authentication Nightmare
RAM only works when users are signed into the correct Docker organization. But Docker Desktop's sign-in state is fragile as hell.
Check actual org membership:
## Check the registry settings Docker reports (requires Docker CLI login)
docker info | grep -A 5 "Registry Mirrors"
## Or check auth directly (entries may be empty if a credential helper stores the secrets)
jq '.auths' ~/.docker/config.json
Common sign-in failures:
- Personal Docker account cached from before joining org
- Multiple orgs with different policies (only first one applies)
- Personal Access Token expired (these expire!)
- Organization Access Token used instead of PAT (doesn't work with RAM)
Performance Debugging at Scale
With 100+ developers, Docker's policy enforcement becomes a bottleneck. Each DNS lookup hits Docker's servers before the allowlist is checked.
Measure policy lookup latency:
time docker pull nginx:latest 2>/dev/null
## First pull tests policy + registry speed
## Second pull tests just policy (image cached)
time docker pull nginx:latest 2>/dev/null
If the second pull is still slow (>2 seconds), your allowlist is too big or Docker's policy servers are overloaded.
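Eyeballing `time` output works, but if you want to run the check across a team or from a script, something like the following sketch turns it into a pass/fail. The helper names `elapsed_seconds` and `slow_cached_pull` are made up, the 2-second default is just the figure quoted above, and timing is whole seconds only, so treat borderline results with suspicion.

```shell
# elapsed_seconds CMD [ARGS...]
# Runs the command, discards its output, and prints whole seconds elapsed.
elapsed_seconds() {
  local start end
  start=$(date +%s)
  "$@" >/dev/null 2>&1 || true   # only the duration matters here, not the exit status
  end=$(date +%s)
  echo $((end - start))
}

# slow_cached_pull IMAGE [THRESHOLD]
# Pulls IMAGE twice, prints how long the second (cached) pull took, and
# returns 0 if that time exceeds THRESHOLD seconds (default 2).
slow_cached_pull() {
  local image="$1" threshold="${2:-2}" second
  elapsed_seconds docker pull "$image" >/dev/null   # first pull warms the cache
  second=$(elapsed_seconds docker pull "$image")
  echo "cached pull: ${second}s"
  [ "$second" -gt "$threshold" ]
}
```

For example, `slow_cached_pull nginx:latest && echo "policy lookup looks slow"` flags machines where the cached pull is dragging.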
Network analysis:
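## Check where policy lookups are going
## (addresses are numeric, so filter on the Docker process name, not a hostname)
lsof +c 0 -nP -iTCP:443 -sTCP:ESTABLISHED | grep -i docker
## Should show the HTTPS connections Docker processes currently hold open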
Emergency Diagnosis Commands
When production is down and you need answers fast:
## Check current RAM policy status
docker system info | grep -i "registry\|policy"
## Find what registry domains a build is actually trying to hit
docker build --progress=plain . 2>&1 | grep -i "resolv\|dns" | sort -u
## See recent policy enforcement events (macOS path; use your platform's log directory)
grep -i "registry\|allowlist\|policy" ~/Library/Containers/com.docker.docker/Data/log/host/log.log | tail -10
## Test if a specific registry is allowed (replace with the one you're checking)
docker pull your-registry.com/image:tag || echo "Pull failed: check the log output above for a policy denial"
The key is systematic elimination: confirm the user is authenticated correctly, verify the policy is active, trace the actual DNS requests being made, and check what domains the registry is redirecting to.
Most RAM debugging comes down to: "Docker said it tried to connect to X, but it actually tried to connect to Y, and only X is in your allowlist."