When you get that Slack ping at 2am about containers doing weird shit, don't docker kill everything like an idiot. You'll nuke the evidence and learn nothing.
Container escapes happen because Docker's isolation is pretty much theater. Privileged containers with socket access = root on the host. Game over. I've seen this exact scenario four times, and every time it was some developer who mounted /var/run/docker.sock because a Medium article told them to.
Most breakouts aren't sophisticated. They're dumb configuration mistakes that made it past code review. Like that time someone deployed a Jupyter notebook container in privileged mode because matplotlib wouldn't install. Cost us three days and probably some sleep.
The worst part? Docker acts like everything is fine while your container is pwning the host. No warnings, no alerts. Just silent escalation until your Bitcoin mining bill shows up.
Here's what actually works when you're dealing with a potential breakout and your monitoring system is screaming at you:
Incident Classification and Severity Assessment
Oh Shit Indicators (Time to Wake Everyone Up):
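## Check for the classic rookie mistakes that give root access
## (privileged mode is a HostConfig flag, not a label, so a label filter only finds containers someone tagged by hand)
docker inspect $(docker ps -q) | jq -r '.[] | select(.HostConfig.Privileged == true) | .Name + " - PRIVILEGED"'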
## This hangs on Docker 24.0.7 and nobody knows why
docker inspect $(docker ps -q) | jq -r '.[] | select(any(.Mounts[]?; .Source == "/var/run/docker.sock")) | .Name + " - SOCKET MOUNTED = GAME OVER"'
## Host network mode = your firewall is now useless
docker ps --filter "network=host" --format "table {{.Names}} {{.Image}} {{.Ports}}"
## See if Docker is spawning weird processes (spoiler: it always is)
ps auxf | grep -E "(docker|containerd|runc)" | head -20
Docker has this lovely habit of hanging on the exact command you need most. docker inspect will time out on the compromised container, guaranteed.
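Since a hung docker inspect can stall the whole triage, one option is to put a hard time limit on every Docker call. A minimal sketch, assuming GNU coreutils timeout is installed; run_with_timeout is a made-up helper name, not part of Docker:

```shell
# Sketch: run any command with a time limit so a hung Docker call
# cannot block the rest of the triage. Hypothetical helper name.
run_with_timeout() {
    local seconds="$1"
    shift
    local status=0
    timeout "$seconds" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with 124 when the limit was reached
        echo "TIMED OUT after ${seconds}s: $*" >&2
    fi
    return "$status"
}

# Usage (assumes CONTAINER_ID holds a container name):
# run_with_timeout 10 docker inspect "$CONTAINER_ID" > inspect.json
```

If the call times out you at least know within seconds, and can move on to the next container instead of staring at a frozen terminal.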
Secondary Indicators (HIGH - Investigate Within 2 Hours):
## Containers with excessive capabilities
docker inspect $(docker ps -q) | jq -r '.[] | select(.HostConfig.CapAdd != null) | .Name + " - Capabilities: " + (.HostConfig.CapAdd | join(","))'
## User namespace bypass attempts
docker inspect $(docker ps -q) | jq -r '.[] | select(.HostConfig.UsernsMode == "host") | .Name + " - Host User Namespace"'
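## Containers with writable host mounts
## (test RW and Source on the same mount entry; restrict to bind mounts so named volumes under /var/lib/docker don't match)
docker inspect $(docker ps -q) | jq -r '.[] | select(any(.Mounts[]?; .Type == "bind" and .RW == true and (.Source | startswith("/etc") or startswith("/var") or startswith("/usr")))) | .Name + " - Writable Host Mount"'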
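The two tiers above can be written down once so that scripts and pager rules agree on severity. This is only a sketch: the finding keywords (privileged, socket-mount, and so on) are hypothetical labels invented for this example, not anything Docker emits.

```shell
# Sketch: map a finding keyword to the two tiers described above.
# The keywords are hypothetical labels chosen for this example.
classify_finding() {
    case "$1" in
        privileged|socket-mount|host-network)
            echo "CRITICAL" ;;      # wake everyone up
        cap-add|host-userns|writable-host-mount)
            echo "HIGH" ;;          # investigate within 2 hours
        *)
            echo "UNCLASSIFIED" ;;  # unknown finding, needs a human
    esac
}

# Usage:
# classify_finding socket-mount
```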
Immediate Containment Actions
Step 1: Isolate Suspected Containers (Execute in first 5 minutes)
Never immediately stop suspicious containers - you'll lose volatile evidence and running processes. Instead, isolate them:
## Create forensic snapshot before containment
## (docker commit saves the container filesystem only, not memory or running processes)
CONTAINER_ID="suspicious_container_name"
docker commit "$CONTAINER_ID" "forensic-snapshot-$(date +%Y%m%d-%H%M%S)"
## Isolate container network access (preserves running state)
## Disconnect from every attached network, one at a time
for net in $(docker inspect "$CONTAINER_ID" | jq -r '.[0].NetworkSettings.Networks | keys[]'); do
    docker network disconnect "$net" "$CONTAINER_ID" 2>/dev/null || true
done
## Dangerous volume mounts cannot be removed from a running container:
## docker update has no mount options (--mount-rm belongs to docker service update).
## Dropping a mount means recreating the container without it, so do that last.
Step 2: Preserve Evidence State
Critical evidence disappears when containers stop. Capture everything immediately:
#!/bin/bash
## Incident response evidence collection script
INCIDENT_DIR="/var/log/docker-incident-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$INCIDENT_DIR"
## Container runtime state
docker ps -a > "$INCIDENT_DIR/containers.txt"
docker images > "$INCIDENT_DIR/images.txt"
docker network ls > "$INCIDENT_DIR/networks.txt"
docker volume ls > "$INCIDENT_DIR/volumes.txt"
## System state at time of incident
ps auxf > "$INCIDENT_DIR/processes.txt"
netstat -tulpn > "$INCIDENT_DIR/network-connections.txt"
ss -tulpn > "$INCIDENT_DIR/socket-connections.txt"
mount > "$INCIDENT_DIR/mounts.txt"
## Docker daemon state
systemctl status docker > "$INCIDENT_DIR/docker-service.txt"
journalctl -u docker --since "1 hour ago" > "$INCIDENT_DIR/docker-logs.txt"
## Memory snapshots of suspicious containers
for container in $(docker ps --filter "status=running" --format "{{.Names}}"); do
    PID=$(docker inspect --format '{{.State.Pid}}' "$container")
    if [[ "$PID" != "0" ]]; then
        echo "Capturing memory for container: $container (PID: $PID)"
        # gcore fails about 30% of the time for no good reason
        if command -v gcore &> /dev/null; then
            timeout 60 gcore -o "$INCIDENT_DIR/memory-$container" "$PID" 2>/dev/null || echo "gcore shit the bed again for $container"
        fi
    fi
done
echo "Evidence preserved in: $INCIDENT_DIR"
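Because gcore can exit without leaving a usable dump, it is worth checking the evidence directory before you move on. A small sketch, assuming the memory-<container>.<PID> file names that gcore -o produces in the loop above; find_empty_dumps is a hypothetical helper name:

```shell
# Sketch: list memory dumps in an evidence directory that are zero bytes.
# Assumes dumps are named memory-<container>.<pid>, as written by gcore -o.
find_empty_dumps() {
    local dir="$1"
    find "$dir" -maxdepth 1 -type f -name 'memory-*' -size 0 | sort
}

# Usage (assumes INCIDENT_DIR from the collection script):
# find_empty_dumps "$INCIDENT_DIR"
```

An empty result only means no dump is zero bytes. It does not prove the dumps are readable, so opening one in gdb is still the real test.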
Step 3: Host System Security Assessment
Container breakouts often target the host system. Check for immediate compromise:
## Check for suspicious processes spawned by container runtime
ps auxf | grep -E "(docker|containerd|runc)" | grep -v grep
## Look for processes with unusual parent-child relationships
ps -eo pid,ppid,cmd | awk '$2 == 1 && $3 !~ /(kernel|systemd|init)/ {print "Suspicious PID 1 child:", $0}'
## Check for new user accounts (common persistence method)
## Diffing /etc/passwd against a bind mount of the same file always matches, so inspect it directly
awk -F: '$3 == 0 {print "UID 0 account:", $1}' /etc/passwd
ls -l --time-style=full-iso /etc/passwd /etc/shadow
## Monitor for privilege escalation attempts
ausearch -m avc,user_chauthtok,user_acct,user_mgmt -ts recent | head -20
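## Check systemd for new services (backdoor persistence)
## A backdoor unit will not have "docker" in its name, so list recently changed unit files instead
find /etc/systemd/system /usr/lib/systemd/system /lib/systemd/system -name '*.service' -mtime -1 2>/dev/null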
Network Traffic Analysis and Lateral Movement Detection
Container escapes often involve network reconnaissance and lateral movement. Capture network evidence before attackers pivot:
## Active network monitoring
tcpdump -i any -w "/var/log/incident-network-$(date +%Y%m%d-%H%M%S).pcap" &
TCPDUMP_PID=$!
## Container network activity
for container in $(docker ps --format "{{.Names}}"); do
    echo "=== Network activity for $container ==="
    docker exec "$container" ss -tulpn 2>/dev/null || echo "Cannot access $container network"
    docker exec "$container" netstat -rn 2>/dev/null || echo "Cannot access $container routing"
done
## Look for suspicious connections to external IPs
netstat -an | awk '/ESTABLISHED/ && !/127.0.0.1/ && !/::1/ {print "External connection:", $0}'
## Check for reverse shells (common escape technique)
lsof -i | grep -E ":4444|:1234|:8080|:9001" | head -10
## Stop network capture after 5 minutes
sleep 300; kill $TCPDUMP_PID 2>/dev/null
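Grepping for :4444 also matches :44440, so the reverse-shell check above can produce noise. A slightly stricter sketch compares the port field exactly. It assumes input in the column layout of ss -tn (state, two queue columns, local address:port, peer address:port); flag_suspicious_ports is a hypothetical helper, and the port list is the same one used above.

```shell
# Sketch: flag connections whose local or peer port is exactly one of the
# watched ports. Reads ss -tn style lines on stdin. Hypothetical helper.
flag_suspicious_ports() {
    awk '
        BEGIN {
            split("4444 1234 8080 9001", p, " ")
            for (i in p) watch[p[i]] = 1
        }
        {
            # Field 4 is local address:port, field 5 is peer address:port
            for (f = 4; f <= 5; f++) {
                n = split($f, parts, ":")
                if (n > 1 && (parts[n] in watch)) {
                    print "Suspicious port " parts[n] ": " $0
                    break
                }
            }
        }
    '
}

# Usage:
# ss -tn | flag_suspicious_ports
```

The port list is still a guess about attacker habits. Anything listening or connecting where it should not be deserves a look, whatever the number.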
File System Integrity Verification
## Check for unauthorized file modifications in critical directories
find /etc /usr/bin /usr/sbin -type f -mtime -1 | head -20
## Look for new SUID/SGID files (privilege escalation technique)
find / -type f \( -perm -4000 -o -perm -2000 \) -newermt "1 hour ago" 2>/dev/null
## Check Docker-specific directories for tampering
find /var/lib/docker -type f -mtime -1 | grep -v "containers/.*/logs" | head -10
## Verify container image integrity
docker images --digests | while read repo tag digest rest; do
    if [[ "$digest" != "<none>" ]] && [[ "$repo" != "REPOSITORY" ]]; then
        echo "Verifying: $repo:$tag"
        # Note: This requires Docker Content Trust or Cosign for verification
        docker trust inspect "$repo:$tag" 2>/dev/null || echo "No signature data for $repo:$tag"
    fi
done
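The SUID/SGID check above relies on modification times, which anyone with root can reset with touch. If you happen to have a list of SUID files captured before the incident, comparing against it is more reliable. A sketch under that assumption; the baseline file and the new_suid_files helper are hypothetical, and both inputs are plain lists with one path per line:

```shell
# Sketch: print paths present in the current SUID list but not in the baseline.
# Both arguments are text files with one path per line. Hypothetical helper.
new_suid_files() {
    local baseline_sorted
    baseline_sorted=$(mktemp)
    sort "$1" > "$baseline_sorted"
    # comm needs sorted input; -13 keeps lines found only in the second input
    sort "$2" | comm -13 "$baseline_sorted" -
    rm -f "$baseline_sorted"
}

# Usage (file names are placeholders):
# find / -xdev -type f \( -perm -4000 -o -perm -2000 \) 2>/dev/null > current-suid.txt
# new_suid_files baseline-suid.txt current-suid.txt
```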
Communication and Escalation Procedures
CRITICAL: Do not communicate incident details over potentially compromised systems
Immediate notification (within 15 minutes):
- Security Operations Center (SOC) via secure channel
- Incident Response team lead
- Infrastructure team on-call engineer
Initial incident report should include:
- Affected container names and images
- Suspected attack vector (socket mount, privileged mode, capability abuse)
- Evidence of host system compromise (yes/no/unknown)
- Current containment status
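The four report fields above fit in a tiny template, which saves you from composing prose at 2am. A sketch only: render_incident_report is a hypothetical helper, and the field wording is taken from the list above.

```shell
# Sketch: print the initial incident report from four arguments.
# $1 containers and images, $2 suspected vector,
# $3 host compromise (yes/no/unknown), $4 containment status.
render_incident_report() {
    case "$3" in
        yes|no|unknown) ;;
        *)
            echo "host compromise must be yes, no, or unknown" >&2
            return 1 ;;
    esac
    printf 'Affected containers and images: %s\n' "$1"
    printf 'Suspected attack vector: %s\n' "$2"
    printf 'Host compromise: %s\n' "$3"
    printf 'Containment status: %s\n' "$4"
}

# Usage (values are placeholders):
# render_incident_report "web (nginx:1.25)" "socket mount" unknown "network isolated"
```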
Secure communication channels:
- Use out-of-band communication (separate network/device)
- Signal, Telegram (secret chats only, since regular Telegram chats are not end-to-end encrypted), or other end-to-end encrypted messaging apps
- Phone calls for critical coordination
Documentation Requirements for Legal/Compliance
Container incidents often trigger compliance obligations (PCI-DSS, HIPAA, SOX) and can end up involving law enforcement. Preserve evidence in a way that fits your incident response framework and any breach notification requirements, and keep your security operations center in the loop:
## Legal hold evidence preservation
## (INCIDENT_DIR is the directory created by the evidence collection script in Step 2)
LEGAL_HOLD_DIR="/secure/storage/incident-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$LEGAL_HOLD_DIR"
## Hash all evidence files for integrity verification
find "$INCIDENT_DIR" -type f -exec sha256sum {} \; > "$LEGAL_HOLD_DIR/evidence-hashes.txt"
## Create incident timeline
echo "$(date): Container incident detected" > "$LEGAL_HOLD_DIR/incident-timeline.txt"
echo "$(date): Evidence collection initiated" >> "$LEGAL_HOLD_DIR/incident-timeline.txt"
echo "$(date): Containment actions implemented" >> "$LEGAL_HOLD_DIR/incident-timeline.txt"
## Copy critical logs with timestamps preserved
cp -p /var/log/audit/audit.log "$LEGAL_HOLD_DIR/" 2>/dev/null || echo "No audit logs available"
cp -p /var/log/syslog "$LEGAL_HOLD_DIR/" 2>/dev/null || echo "No syslog available"
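Hashes only help if someone checks them later. A minimal sketch of that check, assuming GNU coreutils sha256sum and a hash file in the format written above (absolute paths, one file per line); verify_evidence is a hypothetical helper name:

```shell
# Sketch: re-hash every file listed in the hash file and compare with the
# recorded digest. Exits non-zero if any file is missing or has changed.
verify_evidence() {
    local hash_file="$1"
    # --quiet suppresses the OK line for each file that still matches
    sha256sum --check --quiet "$hash_file"
}

# Usage (assumes LEGAL_HOLD_DIR from the commands above):
# verify_evidence "$LEGAL_HOLD_DIR/evidence-hashes.txt" && echo "evidence intact"
```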
Reality Check: Took me 6 hours last time to get a clean memory dump. Docker kept hanging, gcore crashed twice, and the compromised container decided to restart itself for no reason.
What actually happens:
- First hour: Running docker ps obsessively while trying not to panic
- Hour 2: Finding the mounted socket and realizing how fucked you are
- Hours 3-8: Fighting with gcore while it fails on half the containers
- Rest of day 1: Discovering that overlayfs makes everything 10x harder
- Day 2: Actually reading the logs and understanding what went wrong
Guaranteed problems:
- Docker inspect hangs on the exact container you need
- Memory dumps are either corrupted or empty
- Logs are disabled because "it affects performance"
- Your monitoring decides to take a shit right now
- Every forensic tool assumes normal processes, not containers
Next up is forensic analysis, where you find out what evidence you forgot to collect.