Docker Exit Code 137: OOM Kill Prevention and Debugging
Critical Context
What Exit Code 137 Means:
- Signal: 128 + 9 (SIGKILL - uncatchable termination)
- Cause: Linux kernel OOM killer terminated container for exceeding memory limits
- Timing: Often occurs during traffic spikes or at 3AM when monitoring is minimal
- Impact: No graceful shutdown, no cleanup, immediate service disruption
Common Production Scenario:
Container runs stably for weeks with a 512MB limit → traffic spike pushes usage to 600MB → instant death with no warning
Configuration That Actually Works
Memory Limit Sizing Formula
Memory limit = (Peak RSS × 1.5) + Runtime overhead
Runtime Overhead Requirements:
- JVM applications: +200-400MB for non-heap memory
- Node.js applications: +50-100MB for V8 overhead
- Go applications: +20-50MB for garbage collection
- Python applications: +30-100MB for interpreter overhead
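Worked example, using hypothetical numbers for a Node.js service: a peak RSS of 300MB gives (300 × 1.5) + 100MB V8 overhead = 550MB, rounded up to 600MB:

```bash
# Apply the computed limit; setting memory-swap equal to memory disables swap,
# which makes OOM behavior predictable instead of slow-then-dead
# (image name is illustrative)
docker run --memory=600m --memory-swap=600m my-node-app
```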
JVM Container Configuration
Critical Problem: older JVMs size the default heap from host memory (e.g., 32GB), not the container limit (e.g., 512MB)
Required Flags:
```bash
# Modern JVMs (Java 11+; container awareness is on by default since Java 10 / 8u191)
-XX:+UseContainerSupport
# Legacy JVMs: hardcode the heap
-Xmx400m  # Leave ~112MB for non-heap in a 512MB container
```
Failure Mode: JVM tries to allocate 8GB heap in 512MB container → immediate OOM kill
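A minimal launch sketch for the modern path, assuming a 512MB container limit (flag values are illustrative, not canonical):

```bash
# Size the heap from the cgroup limit instead of host RAM.
# MaxRAMPercentage=75 caps heap at ~384MB of the 512MB limit,
# leaving headroom for metaspace, threads, and direct buffers.
java -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -jar app.jar
```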
Kubernetes Memory Settings
2025 Best Practice: Set requests = limits to prevent overcommitment
Why This Matters:
- The scheduler uses `requests` for node placement
- The OOM killer respects `limits`
- A gap between them causes unpredictable failures when nodes are overcommitted
```yaml
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "512Mi"  # Same value prevents overcommit
```
Overcommitment Failure: Multiple pods scheduled based on low requests, all hit high limits simultaneously → node memory exhaustion → random pod kills
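In Kubernetes, a past OOM kill shows up in the container's last state; for example (pod name is hypothetical):

```bash
# A restarted container reports Reason: OOMKilled and Exit Code: 137
kubectl describe pod my-pod | grep -A 5 "Last State"
```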
Diagnostic Commands
Confirm OOM Kill
```bash
# Check OOM kill flag (can be misleading for child process kills)
docker inspect --format '{{.State.OOMKilled}}' container_name

# Verify exit code
docker inspect --format '{{.State.ExitCode}}' container_name

# Check system logs for actual OOM events
dmesg | grep -i "killed process"
```
Memory Monitoring
```bash
# Real-time monitoring (includes cache/buffers - can be misleading)
docker stats container_name

# Container-internal memory view
docker exec container_name cat /proc/meminfo
```
Critical Warning: `docker stats` shows cache memory that the kernel can reclaim, while the OOM killer looks at RSS (resident set size) and other committed memory. A container can report 200MB of usage and still die at a 512MB limit.
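To see what the OOM killer actually counts, read the cgroup accounting directly; a sketch assuming cgroup v2 (cgroup v1 exposes rss/cache under /sys/fs/cgroup/memory/memory.stat instead):

```bash
# anon ≈ RSS the kernel cannot reclaim; file = page cache it can drop first
docker exec container_name sh -c 'grep -E "^(anon|file) " /sys/fs/cgroup/memory.stat'
```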
Failure Scenarios and Solutions
Memory Leak vs Memory Spike Detection
Memory Leak Pattern:
- Gradual memory growth over time
- No periodic drops from garbage collection
- Consistent upward trend in monitoring
Memory Spike Pattern:
- Stable baseline usage
- Sudden jumps during traffic/processing
- Returns to baseline after load decreases
Diagnostic Difference: Leaks require application profiling, spikes require better resource planning or rate limiting.
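With the Prometheus stack described later, the distinction is queryable; a sketch assuming cAdvisor's standard metric name and a 6-hour window:

```promql
# A sustained positive slope over hours suggests a leak, not a spike
deriv(container_memory_working_set_bytes{name="container_name"}[6h]) > 0
```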
Child Process OOM Confusion
Scenario: Container shows `OOMKilled: false` but exits with code 137
Cause: A child process was OOM killed; the main process (PID 1) survived the kill but then exited itself, so Docker never set the flag
Detection: Check kernel logs for actual OOM events, not just Docker flags
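For example, on a systemd host (the time window is illustrative):

```bash
# The kernel's OOM report names the exact victim PID and its cgroup
journalctl -k --since "1 hour ago" | grep -iE "out of memory|oom-kill"
```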
False OOM Signals
- Windows containers: OOM behavior differs significantly from Linux
- Child process kills: Docker only flags OOMKilled when PID 1 dies directly
- Memory accounting: docker stats and the kernel OOM killer measure memory differently
Production Prevention Strategies
Application-Level Defense
Backpressure Implementation:
- Reject non-essential operations at 80% memory usage (see the sketch after this list)
- Implement circuit breakers for memory-intensive operations
- Stream data processing instead of loading entire datasets
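A minimal sketch of the 80% check, assuming a Linux container with cgroup v2 at the standard mount (cgroup v1 uses memory.usage_in_bytes / memory.limit_in_bytes under /sys/fs/cgroup/memory):

```python
# Shed non-essential work when container memory crosses a threshold
def memory_pressure() -> float:
    with open("/sys/fs/cgroup/memory.current") as f:
        current = int(f.read())
    with open("/sys/fs/cgroup/memory.max") as f:
        limit = f.read().strip()
    if limit == "max":  # no limit configured
        return 0.0
    return current / int(limit)

def handle_request(essential: bool) -> None:
    if not essential and memory_pressure() > 0.8:
        raise RuntimeError("shedding load: memory above 80% of limit")
    # ... normal processing
```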
Connection Pool Limits:
```python
# Memory-aware pool sizing: cap pooled connections per host
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
session.mount('http://', HTTPAdapter(pool_maxsize=50))
```
Monitoring Setup Requirements
Essential Stack:
- cAdvisor for accurate container metrics collection
- Prometheus for storage and alerting
- Alert at 75% memory usage, panic at 90%
- Monitor memory growth rate, not just absolute usage
Critical Metric: track RSS (working set), not docker stats' cache-inclusive numbers
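As an illustration, a Prometheus alert rule along these lines (metric names are cAdvisor's standard ones; the threshold matches the guidance above):

```yaml
# prometheus-rules.yaml (sketch)
groups:
  - name: container-memory
    rules:
      - alert: ContainerMemoryHigh
        expr: |
          container_memory_working_set_bytes{name!=""}
            / container_spec_memory_limit_bytes{name!=""} > 0.75
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} above 75% of its memory limit"
```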
Container Sizing Methodology
- Baseline Measurement: Run the production workload without limits for 48-72 hours (see the sampling sketch after this list)
- Peak Calculation: Monitor actual RSS usage under realistic load
- Safety Margin: Apply 1.5x multiplier plus runtime overhead
- Load Testing: Verify limits under stress conditions that mimic production spikes
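One way to capture the baseline, assuming docker stats is acceptable for a first pass (cgroup RSS is more precise, as noted above; the log path is illustrative):

```bash
# Sample memory usage every 60s for later peak analysis
while true; do
  docker stats --no-stream --format '{{.Name}} {{.MemUsage}}' >> /var/log/mem-baseline.log
  sleep 60
done
```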
Exit Code Reference
| Code | Meaning | Required Action |
|---|---|---|
| 137 | SIGKILL (often OOM) | Check OOMKilled flag; increase memory or fix the leak |
| 125 | Docker daemon error | Verify Dockerfile syntax and image availability |
| 126 | Command not executable | Fix permissions or the command path |
| 127 | Command not found | Check that the binary exists and PATH is configured |
| 1 | Application error | Review application logs for the specific error |
| 0 | Clean exit | Normal termination (investigate if unexpected) |
Resource Requirements
Time Investment
- Initial Sizing: 2-3 days of monitoring plus load testing
- Production Debugging: 30 minutes to 4 hours depending on complexity
- Monitoring Setup: 1-2 days for complete observability stack
Expertise Requirements
- Basic: Understanding of container memory limits and Docker commands
- Advanced: Knowledge of kernel memory management, cgroups, and runtime-specific behavior
- Expert: Application profiling, custom metrics, and distributed system debugging
Breaking Points
- 1000+ concurrent containers: Standard monitoring tools may become inadequate
- Multi-GB containers: Require careful node sizing and network considerations
- High-frequency allocations: May need custom memory management strategies
Critical Warnings
What Documentation Doesn't Tell You
- Docker stats memory reporting includes reclaimable cache
- OOMKilled flag only set when PID 1 dies directly
- Older JVMs' default heap sizing ignores container limits (container awareness is on by default only since Java 10 / 8u191)
- Kubernetes scheduler and OOM killer use different memory values
- Child process OOM kills don't trigger container restart policies
Hidden Costs
- Memory Overcommitment: Appears to save resources but causes unpredictable failures
- Insufficient Monitoring: Delayed detection leads to extended outages
- Undersized Containers: Create cascade failures during traffic spikes
- Legacy Runtime Defaults: Most runtimes ignore container memory limits without explicit configuration
Migration Pain Points
- Kubernetes 1.20+: Changes in memory accounting affect existing deployments
- Docker Desktop vs Production: Different memory management behavior
- Cloud Platform Differences: AWS ECS, Azure Container Apps, GCP Cloud Run have platform-specific OOM handling
This knowledge enables automated detection of memory pressure, proper container sizing, and prevention of production OOM kills through systematic monitoring and application-level defensive programming.
Useful Links for Further Investigation
Essential Resources for Docker Memory Debugging
| Link | Description |
|---|---|
| Docker Resource Constraints | Complete guide to memory, CPU, and other resource limits. Essential reading for understanding how Docker implements cgroups and memory accounting. |
| Docker Runtime Metrics | Official documentation for docker stats and container monitoring. Explains what each metric actually measures and when it's useful. |
| Docker Container Run Reference | Complete reference for docker run, including exit codes. The section on exit status codes explains what 137, 125, and other codes mean. |
| GitHub Issue: Process OOM within Docker Container | Detailed issue about OOM behavior differences between Windows and Linux containers. Shows how the OOMKilled flag can be misleading. |
| Kubernetes Issue: Container marked OOMKilled when non-init process dies | Explains why containers sometimes show OOMKilled: false even with exit code 137. Important for understanding child process kills. |
| Stack Overflow: How to detect Docker memory limit reached | Practical commands for checking whether a container was OOM killed. Community answers with working examples. |
| Kubernetes Issue: pods getting terminated with Exit Code 137 | Recent issue showing how memory pressure affects Kubernetes pods. Good example of a production troubleshooting process. |
| cAdvisor Container Monitoring | Google's container advisor for collecting runtime metrics. Essential for production monitoring; provides accurate memory usage data. |
| Prometheus Container Metrics | How to set up Prometheus monitoring for Docker containers using cAdvisor. Includes example queries for memory alerts. |
| Advanced Container Monitoring Guide | Comprehensive guide to using docker stats effectively. Covers real-time monitoring and automated alerting strategies. |
| Kubernetes OOMKilled Troubleshooting | Complete guide to handling OOM kills in Kubernetes. Covers requests vs limits, proper sizing, and monitoring setup. |
| Memory Resource Management Best Practices | 2025 best practices for Kubernetes memory management. Recommends setting requests = limits to prevent overcommitment. |
| Azure Container Apps Exit Code 137 | Platform-specific troubleshooting for Azure Container Apps. Shows how cloud platforms handle container OOM kills. |
| Tracking Down Invisible OOM Kills | Advanced debugging for when child processes get OOM killed but Kubernetes doesn't detect it. Essential for complex applications. |
| Docker Container Memory Leak Detection | Comprehensive guide to detecting and fixing memory leaks in containerized applications. Includes monitoring setup and debugging techniques. |
| OOM Killer Deep Dive | Technical explanation of how the Linux OOM killer works and how it interacts with container orchestration platforms. |
| JVM Container Support | Oracle's documentation on JVM container awareness. Critical for Java applications that need to respect container memory limits. |
| Node.js Memory Management in Containers | Official Node.js documentation for memory-related command-line options. Essential for sizing Node.js containers correctly. |
| Go Memory Management | Go's garbage collector guide. Helps in understanding memory patterns in Go applications running in containers. |
| Docker System Commands | When everything is broken, these commands can help. docker system prune and related cleanup commands for desperate times. |
| Kubernetes Exit Codes Reference | Official Kubernetes guide to debugging pod failures. Includes exit code meanings and troubleshooting steps. |
| Container Exit Codes Complete Guide | Comprehensive reference for all container exit codes. Bookmark this for 3AM debugging sessions. |