Docker Exit Code 137: OOM Kill Prevention and Debugging

Critical Context

What Exit Code 137 Means:

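  • Signal: 128 + 9 (SIGKILL - uncatchable termination)
  • Cause: Usually the Linux kernel OOM killer terminating a process that pushed the container past its memory limit; docker kill, or docker stop after its grace period expires, produces the same code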
  • Timing: Often occurs during traffic spikes or at 3AM when monitoring is minimal
  • Impact: No graceful shutdown, no cleanup, immediate service disruption

Common Production Scenario:
Container runs stable for weeks with 512MB limit → Traffic spike requires 600MB → Instant death with no warning

Configuration That Actually Works

Memory Limit Sizing Formula

Memory limit = (Peak RSS × 1.5) + Runtime overhead

Runtime Overhead Requirements:

  • JVM applications: +200-400MB for non-heap memory
  • Node.js applications: +50-100MB for V8 overhead
  • Go applications: +20-50MB for garbage collection
  • Python applications: +30-100MB for interpreter overhead
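
Worked example of the sizing formula, as a minimal sketch. The function name memory_limit_mb and the RUNTIME_OVERHEAD_MB table are illustrative assumptions; the overhead values are simply the upper ends of the ranges listed above, not measured figures.

```python
import math

# Upper end of each overhead range from the list above (MB).
RUNTIME_OVERHEAD_MB = {"jvm": 400, "nodejs": 100, "go": 50, "python": 100}


def memory_limit_mb(peak_rss_mb: float, runtime: str, margin: float = 1.5) -> int:
    """Memory limit = (Peak RSS x margin) + runtime overhead, rounded up to whole MB."""
    overhead = RUNTIME_OVERHEAD_MB[runtime]
    return math.ceil(peak_rss_mb * margin + overhead)


# A Node.js service that peaks at 300MB RSS: 300 * 1.5 + 100
print(memory_limit_mb(300, "nodejs"))  # 550
```

Using the upper end of each range is the conservative choice; substitute measured overhead once it is known.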

JVM Container Configuration

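Critical Problem: JVMs without container support (before Java 10 and 8u191) size the default heap from host memory (32GB), not the container limit (512MB)

Required Flags:

# Modern JVMs (Java 10+, 8u191+): container support is on by default,
# so size the heap as a percentage of the container limit
-XX:+UseContainerSupport
-XX:MaxRAMPercentage=75.0

# Legacy JVMs
-Xmx400m  # Leave 112MB for non-heap in 512MB container

Failure Mode: Legacy JVM sets its default max heap to 1/4 of host RAM (8GB on a 32GB host) inside a 512MB container → heap grows past the limit → OOM kill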
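
The legacy -Xmx arithmetic above can be sketched as a small helper. The name xmx_flag is hypothetical, and the non-heap reservation is an input you must measure; this is an illustration of the subtraction, not a tuning recommendation.

```python
def xmx_flag(container_limit_mb: int, non_heap_mb: int) -> str:
    """Derive an -Xmx flag by reserving non-heap memory out of the container limit."""
    heap_mb = container_limit_mb - non_heap_mb
    if heap_mb <= 0:
        raise ValueError("non-heap reservation leaves no room for the heap")
    return f"-Xmx{heap_mb}m"


# 512MB container with 112MB reserved for metaspace, threads, and buffers
print(xmx_flag(512, 112))  # -Xmx400m
```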

Kubernetes Memory Settings

2025 Best Practice: Set requests = limits to prevent overcommitment

Why This Matters:

  • Scheduler uses requests for node placement
  • Kernel enforces limits through the container's cgroup; exceeding the limit triggers the OOM killer
  • Gap between them causes unpredictable failures when nodes are overcommitted

resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "512Mi"  # Same value prevents overcommit

Overcommitment Failure: Multiple pods scheduled based on low requests, all hit high limits simultaneously → node memory exhaustion → random pod kills
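
A rough check for the requests = limits rule, assuming the resources block has already been loaded into a Python dict. parse_memory and overcommit_gap are hypothetical helper names, and the parser handles only a subset of Kubernetes quantity suffixes (plain bytes, k, M, G, Ki, Mi, Gi).

```python
import re

# Subset of Kubernetes quantity suffixes; decimal and binary forms differ.
_UNITS = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9,
          "Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|k|M|G)?", quantity)
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(match.group(1)) * _UNITS[match.group(2) or ""]


def overcommit_gap(resources: dict) -> int:
    """Bytes by which the memory limit exceeds the request (0 means requests = limits)."""
    request = parse_memory(resources["requests"]["memory"])
    limit = parse_memory(resources["limits"]["memory"])
    return limit - request


spec = {"requests": {"memory": "512Mi"}, "limits": {"memory": "512Mi"}}
print(overcommit_gap(spec))  # 0
```

A non-zero result marks a pod that the scheduler placed using less memory than it is allowed to consume.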

Diagnostic Commands

Confirm OOM Kill

# Check OOM kill flag (can be misleading for child process kills)
docker inspect --format '{{.State.OOMKilled}}' container_name

# Verify exit code
docker inspect --format '{{.State.ExitCode}}' container_name

# Check host kernel logs for actual OOM events (run on the host, not inside the container)
dmesg | grep -i "killed process"
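
The checks above can be combined in one script. This is a sketch: classify and inspect_state are hypothetical names, inspect_state assumes the docker CLI is on PATH, and the wording of the verdicts is illustrative.

```python
import json
import subprocess


def classify(state: dict) -> str:
    """Interpret the State object returned by docker inspect."""
    exit_code = state.get("ExitCode")
    if state.get("OOMKilled"):
        return "oom-killed (flag set)"
    if exit_code == 137:
        # SIGKILL without the flag: child-process OOM, docker kill, or stop timeout
        return "sigkill, flag not set: check dmesg on the host"
    return f"not a SIGKILL exit (code {exit_code})"


def inspect_state(container: str) -> dict:
    """Fetch .State as JSON; requires a local Docker daemon."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State}}", container],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


# Example with a hand-written State dict instead of a live container
print(classify({"OOMKilled": False, "ExitCode": 137}))
```

Against a real container, call classify(inspect_state("container_name")).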

Memory Monitoring

# Real-time monitoring (includes cache/buffers - can be misleading)
docker stats container_name

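# Container-internal memory view (/proc/meminfo reports host totals, so read the cgroup instead)
docker exec container_name cat /sys/fs/cgroup/memory.current  # cgroup v2
docker exec container_name cat /sys/fs/cgroup/memory/memory.usage_in_bytes  # cgroup v1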

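Critical Warning: docker stats can count page cache that the kernel is able to reclaim, so the figure may overstate real pressure. The cgroup limit is charged for all memory, cache included, but the kernel reclaims cache before it invokes the OOM killer, so non-reclaimable memory (mostly RSS) decides whether a kill happens. A container that averages 200MB can still die at a 512MB limit if a short allocation burst between samples pushes RSS past the limit.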
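
One way to get a number closer to what matters is to subtract inactive file cache from total usage. The sketch below assumes cgroup v2, where memory.current holds total usage and memory.stat has an inactive_file line; treating usage minus inactive_file as the "working set" follows the convention cAdvisor is generally understood to use, which is an assumption here. Function names are hypothetical.

```python
def parse_memory_stat(text: str) -> dict:
    """Parse 'key value' lines from a cgroup memory.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[key] = int(value)
    return stats


def working_set_bytes(current: int, stat_text: str) -> int:
    """Total usage minus inactive file cache, floored at zero."""
    inactive_file = parse_memory_stat(stat_text).get("inactive_file", 0)
    return max(current - inactive_file, 0)


# Sample values: 500MiB total usage, of which 250MiB is inactive file cache
sample_stat = "anon 209715200\nfile 314572800\ninactive_file 262144000\n"
print(working_set_bytes(524288000, sample_stat))  # 262144000
```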

Failure Scenarios and Solutions

Memory Leak vs Memory Spike Detection

Memory Leak Pattern:

  • Gradual memory growth over time
  • No periodic drops from garbage collection
  • Consistent upward trend in monitoring

Memory Spike Pattern:

  • Stable baseline usage
  • Sudden jumps during traffic/processing
  • Returns to baseline after load decreases

Diagnostic Difference: Leaks require application profiling, spikes require better resource planning or rate limiting.
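
The two patterns can be told apart from a series of evenly spaced memory samples. This is a heuristic sketch only: the quarter-window medians and the 10% tolerance are assumptions chosen for illustration, and classify_memory_pattern is a hypothetical name.

```python
from statistics import median
from typing import Sequence


def classify_memory_pattern(samples_mb: Sequence[float], tolerance: float = 0.10) -> str:
    """Classify evenly spaced memory samples as 'leak', 'spike', or 'stable'."""
    if len(samples_mb) < 4:
        raise ValueError("need at least 4 samples")
    quarter = len(samples_mb) // 4
    start = median(samples_mb[:quarter])   # baseline at the beginning
    end = median(samples_mb[-quarter:])    # level at the end
    peak = max(samples_mb)
    if end > start * (1 + tolerance):
        return "leak"    # usage never returned to baseline
    if peak > start * (1 + tolerance):
        return "spike"   # usage jumped, then came back down
    return "stable"


print(classify_memory_pattern([100, 110, 120, 130, 140, 150, 170, 180]))  # leak
print(classify_memory_pattern([100, 100, 100, 300, 310, 100, 100, 100]))  # spike
```

A slow leak inside a short window can look stable, so feed it samples covering hours or days, not minutes.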

Child Process OOM Confusion

Scenario: Container shows OOMKilled: false but exit code 137
Cause: A child process was OOM killed; the main process (PID 1) survived the kill itself, then exited with the child's status
Detection: Check kernel logs for actual OOM events, not just Docker flags

False OOM Signals

Windows Containers: OOM behavior differs significantly from Linux
Child Process Kills: Docker only flags OOMKilled if PID 1 dies directly
Memory Accounting: Different between Docker stats and kernel OOM killer view

Production Prevention Strategies

Application-Level Defense

Backpressure Implementation:

  • Reject non-essential operations at 80% memory usage
  • Implement circuit breakers for memory-intensive operations
  • Stream data processing instead of loading entire datasets
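
A minimal sketch of the 80% rule, assuming cgroup v2 file names (memory.current and memory.max under /sys/fs/cgroup). read_usage_fraction and should_shed_load are hypothetical names; the demo writes a fake cgroup directory so the example runs outside a container.

```python
import tempfile
from pathlib import Path

CGROUP_V2 = Path("/sys/fs/cgroup")


def read_usage_fraction(root: Path = CGROUP_V2):
    """Return usage/limit from cgroup v2 files, or None if unavailable or unlimited."""
    try:
        current = int((root / "memory.current").read_text())
        raw_limit = (root / "memory.max").read_text().strip()
    except (OSError, ValueError):
        return None
    if raw_limit == "max":  # no memory limit configured
        return None
    return current / int(raw_limit)


def should_shed_load(fraction, threshold: float = 0.80) -> bool:
    """True when non-essential work should be rejected."""
    return fraction is not None and fraction >= threshold


# Demo against a fake cgroup directory: 450 of 500 bytes used
demo = Path(tempfile.mkdtemp())
(demo / "memory.current").write_text("450\n")
(demo / "memory.max").write_text("500\n")
print(should_shed_load(read_usage_fraction(demo)))  # True
```

In a request handler, a True result would translate into an early rejection (for example HTTP 503) for non-essential endpoints.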

Connection Pool Limits:

# Memory-aware pool sizing: cap pooled connections per host
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
session.mount('http://', HTTPAdapter(pool_maxsize=50))

Monitoring Setup Requirements

Essential Stack:

  • cAdvisor for accurate container metrics collection
  • Prometheus for storage and alerting
  • Alert at 75% memory usage, panic at 90%
  • Monitor memory growth rate, not just absolute usage

Critical Metric: RSS memory tracking, not Docker stats cache-inclusive numbers
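
The thresholds and the growth-rate check can be expressed as plain functions before they are translated into alert rules. This is a sketch under stated assumptions: alert_level and minutes_until_limit are hypothetical names, and growth is assumed to be linear.

```python
def alert_level(usage_bytes: int, limit_bytes: int) -> str:
    """Map usage against the limit to 'ok', 'alert' (75%), or 'panic' (90%)."""
    ratio = usage_bytes / limit_bytes
    if ratio >= 0.90:
        return "panic"
    if ratio >= 0.75:
        return "alert"
    return "ok"


def minutes_until_limit(usage_mb: float, limit_mb: float, growth_mb_per_min: float):
    """Linear projection of time left before the limit; None if usage is not growing."""
    if growth_mb_per_min <= 0:
        return None
    return (limit_mb - usage_mb) / growth_mb_per_min


print(alert_level(384, 512))              # alert
print(minutes_until_limit(400, 512, 4))   # 28.0
```

The second function is why growth rate matters: 400MB of 512MB is only an alert by ratio, yet at 4MB per minute the container has under half an hour left.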

Container Sizing Methodology

  1. Baseline Measurement: Run production workload without limits for 48-72 hours
  2. Peak Calculation: Monitor actual RSS usage under realistic load
  3. Safety Margin: Apply 1.5x multiplier plus runtime overhead
  4. Load Testing: Verify limits under stress conditions that mimic production spikes

Exit Code Reference

Code | Meaning | Required Action
137 | SIGKILL (often OOM) | Check OOMKilled flag, increase memory or fix leak
125 | Docker daemon error (docker run itself failed) | Check docker run flags, image availability
126 | Command not executable | Fix permissions or command path
127 | Command not found | Check binary exists, PATH configuration
1 | Application error | Review application logs for specific error
0 | Clean exit | Normal termination (investigate if unexpected)
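
Codes above 128 follow the 128 + signal number convention, so they can be decoded mechanically. The sketch below assumes a POSIX system, where Python's signal module knows SIGKILL; describe_exit_code is a hypothetical helper, and the fixed entries mirror the reference above.

```python
import signal

KNOWN_CODES = {
    0: "clean exit",
    1: "application error",
    125: "docker daemon error",
    126: "command not executable",
    127: "command not found",
}


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a short description."""
    if code in KNOWN_CODES:
        return KNOWN_CODES[code]
    if code > 128:
        number = code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            return f"unknown signal (128 + {number})"
        return f"killed by {name} (128 + {number})"
    return "application-defined"


print(describe_exit_code(137))  # killed by SIGKILL (128 + 9)
print(describe_exit_code(143))  # killed by SIGTERM (128 + 15)
```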

Resource Requirements

Time Investment

  • Initial Sizing: 2-3 days of monitoring plus load testing
  • Production Debugging: 30 minutes to 4 hours depending on complexity
  • Monitoring Setup: 1-2 days for complete observability stack

Expertise Requirements

  • Basic: Understanding of container memory limits and Docker commands
  • Advanced: Knowledge of kernel memory management, cgroups, and runtime-specific behavior
  • Expert: Application profiling, custom metrics, and distributed system debugging

Breaking Points

  • 1000+ concurrent containers: Standard monitoring tools may become inadequate
  • Multi-GB containers: Require careful node sizing and network considerations
  • High-frequency allocations: May need custom memory management strategies

Critical Warnings

What Documentation Doesn't Tell You

  • Docker stats memory reporting includes reclaimable cache
  • OOMKilled flag only set when PID 1 dies directly
  • Legacy JVMs (before Java 10 and 8u191) ignore container limits when sizing the default heap
  • Kubernetes scheduler and OOM killer use different memory values
  • Child process OOM kills don't trigger container restart policies

Hidden Costs

  • Memory Overcommitment: Appears to save resources but causes unpredictable failures
  • Insufficient Monitoring: Delayed detection leads to extended outages
  • Undersized Containers: Create cascade failures during traffic spikes
  • Legacy Runtime Defaults: Most runtimes ignore container memory limits without explicit configuration

Migration Pain Points

  • Kubernetes 1.20+: Changes in memory accounting affect existing deployments
  • Docker Desktop vs Production: Different memory management behavior
  • Cloud Platform Differences: AWS ECS, Azure Container Apps, GCP Cloud Run have platform-specific OOM handling

This knowledge enables automated detection of memory pressure, proper container sizing, and prevention of production OOM kills through systematic monitoring and application-level defensive programming.
