What Exit Code 137 Actually Means (And Why It Happens at 3AM)

Exit code 137 is Docker's way of telling you the Linux kernel just killed your container because it tried to eat more memory than you allocated. The number comes from 128 + 9, where 9 is the signal number for SIGKILL - the nuclear option that can't be caught or ignored.

The Real-World Scenario

Picture this: You're running a Node.js app in production with --memory=512m because that seemed reasonable. Everything works fine for weeks. Then at 3:17 AM on a Tuesday, your monitoring starts screaming. Your container died with exit code 137.

What happened? Your app hit a traffic spike, loaded more data into memory, and suddenly needed 600MB. The Linux kernel's OOM killer said "nope" and killed it instantly. No graceful shutdown, no cleanup, just dead. This is a common production scenario that catches teams off guard.

How to Confirm It's Actually an OOM Kill

Don't guess. Check if Docker flagged it as an OOM kill:

## Check if container was OOM killed
docker inspect --format '{{.State.OOMKilled}}' container_name

## See the actual exit code
docker inspect --format '{{.State.ExitCode}}' container_name

## Get container logs to see what happened before death
docker logs --tail=50 container_name

If OOMKilled is true and exit code is 137, you found your culprit. Watch out, though: the flag can stay false even when memory was the cause - for example on Windows containers, or when a child process was killed instead of the main process (more on that in the FAQ below).

Memory Usage vs Memory Limits: The Gotcha

Here's what trips up most people: Docker's memory reporting includes cache and buffers that the kernel can reclaim under pressure, while the OOM killer cares about the memory your process actually holds (its RSS, or resident set size). That accounting gap is why the numbers often don't seem to add up during debugging.

Use docker stats to see real-time usage:

## Live memory monitoring
docker stats container_name

## One-time snapshot
docker stats --no-stream

The memory column shows current usage vs limit. If you're consistently hitting 80%+ of your limit, you're playing Russian roulette with the OOM killer. For more detailed monitoring, consider using cAdvisor or other monitoring tools.
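
To watch usage as a percentage of the limit without eyeballing it, docker stats takes a format string (the container name is whatever you're running):

## Show memory usage as a percentage of the configured limit
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"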

The JVM Memory Trap

Java applications are notorious for this. Older JVMs size their default heap from the host's memory, not the container's limit. If your container has 512MB but the host has 32GB, the JVM may size its heap at around 8GB and get OOM killed as soon as real usage crosses the container limit. This is a well-documented issue in containerized environments.

Fix it by setting JVM flags to respect container limits:

## Modern JVMs (Java 11+) should detect container limits automatically
-XX:+UseContainerSupport

## Or set explicitly 
-Xmx400m # Leave room for non-heap memory

Same problem exists with other runtimes. Node.js with --max-old-space-size, Python with memory pools, Go with garbage collection - they all need to know about your container's memory constraints. .NET applications have similar considerations.
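
A hedged sketch of passing those limits at docker run time - the image names, file name, and values are placeholders:

## JVM: size the heap as a fraction of the container limit (Java 10+/8u191+)
docker run --memory=512m -e JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=75.0" my-java-image

## Node.js: cap the old-generation heap below the container limit
docker run --memory=512m my-node-image node --max-old-space-size=400 server.js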

Kubernetes Makes It More Complicated

In Kubernetes, you set both requests and limits. The OOM killer respects limits, but Kubernetes scheduling uses requests. This creates a dangerous gap that leads to unpredictable OOM kills.

If you set requests: 256Mi and limits: 512Mi, Kubernetes might schedule your pod on a node assuming it needs 256Mi. But if it actually uses 512Mi and the node is overcommitted, multiple pods can hit OOM simultaneously. This is explained in detail in the official Kubernetes documentation.

Current best practice is setting requests = limits to avoid this surprise. The Kubernetes community increasingly recommends this approach for production workloads.
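
If you're debugging this in Kubernetes, the pod's last terminated state tells you whether the previous container instance was OOM killed (the pod name is a placeholder):

## Prints "OOMKilled" if the previous container instance died for memory
kubectl get pod my-pod -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

## Or read it from the describe output
kubectl describe pod my-pod | grep -i -A3 "last state"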

Memory Leaks vs Memory Spikes

Exit code 137 from a memory leak looks different than from a traffic spike. Leaks show gradual memory growth in monitoring until sudden death. Spikes show stable usage followed by immediate jumps. Proper monitoring helps distinguish between these patterns.
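
If you don't have a metrics stack yet, even a crude sampling loop makes the pattern obvious - a minimal sketch, assuming the container is named my-app:

## Log memory usage every 30 seconds; a leak climbs steadily, a spike jumps
while true; do
  echo "$(date '+%F %T') $(docker stats --no-stream --format '{{.MemUsage}}' my-app)" >> memory.log
  sleep 30
done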

Real GitHub issue: Astro builds were failing with exit code 137 because too many concurrent image optimizations pushed memory usage over limits. Not a leak - just bad resource planning. Similar issues appear across different platforms and applications.

The fix was either increasing memory limits or throttling concurrent operations. Sometimes the answer isn't "give it more memory" but "make it use memory more efficiently." Production debugging techniques help identify the root cause.

OOM Troubleshooting FAQ: The Questions You're Actually Asking

Q: My container shows OOMKilled: false but still exited with 137 - what gives?

A: This happens when a child process gets OOM killed instead of your main process. Docker only flags OOMKilled if PID 1 dies. If your app spawns worker processes and one gets killed, the main process might exit with code 137 but OOMKilled stays false.

Check /var/log/kern.log on the host for actual OOM kill messages:

dmesg | grep -i "killed process"

Q: Docker stats shows 200MB usage but my 512MB container died - why?

A: Docker stats can be misleading because it includes cached memory that the kernel can reclaim. The OOM killer looks at committed memory (RSS + swap). Your app might have allocated more memory than Docker stats indicates.

Use docker exec container_name cat /proc/meminfo to see the detailed memory breakdown inside the container.

Q: How do I set the right memory limit without guessing?

A: Run your app without limits first and monitor it under realistic load:

## Run without memory limit
docker run -d --name test-app your-image

## Monitor for 24-48 hours
docker stats test-app

## Set limit to 2x observed peak usage
docker run --memory=800m your-image # If peak was 400MB

Always leave headroom. Memory usage isn't constant.

Q: Why does my Java app use way more memory than my heap size?

A: JVM memory != heap memory. The JVM needs memory for:

  • Method area/Metaspace
  • Code cache
  • Compressed class space
  • Direct memory (NIO, unsafe operations)
  • Stack space for threads

Rule of thumb: Container limit = heap + 25% for JVM overhead.

Q: Can I prevent OOM kills completely?

A: No, and you shouldn't want to. OOM kills protect your system from runaway processes. Instead:

  • Set appropriate memory limits
  • Handle memory pressure gracefully in your application
  • Use swap carefully (it can mask problems and hurt performance)
  • Monitor and alert on memory usage trends

Q: My container uses 100% CPU after OOM kill - is this normal?

A: No. If your container survives an OOM kill (a child process died), the main process might be stuck in a bad state - possibly trying to allocate memory in a loop or handling the death of child processes poorly. Restart the container and investigate why child processes are getting killed.

Q: How do I handle OOM kills in production gracefully?

A: Application level:

  • Implement graceful degradation when memory is low
  • Use streaming instead of loading large datasets into memory
  • Add circuit breakers for memory-intensive operations

Infrastructure level:

  • Set up monitoring with memory usage alerts
  • Use horizontal scaling (more containers) instead of just increasing memory
  • Implement health checks that detect memory pressure

Q: Why do some containers survive longer than others with the same memory usage?

A: The OOM killer doesn't just look at memory usage - it scores processes based on:

  • How much memory they're using (higher = more likely to die)
  • How long they've been running (newer processes get killed first)
  • Process priority and OOM adjustment scores

Main processes with long uptime are less likely to be killed than recently spawned children using the same amount of memory.
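
You can inspect those scores on the host yourself - higher oom_score means more likely to die, and oom_score_adj nudges it up or down (the PID here is a placeholder):

## Inspect the OOM killer's score for a process
cat /proc/1234/oom_score
cat /proc/1234/oom_score_adj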

Q: Should I use swap in containers?

A: Generally no. Swap can mask memory issues and hurt performance. If you need swap:

  • Set a reasonable limit (--memory=1g --memory-swap=1.5g)
  • Monitor swap usage - high swap usage indicates undersized containers
  • Consider memory.swappiness settings

Q: How can I debug OOM kills after the fact?

A: Check system logs:

## On the Docker host
dmesg | grep -i "killed process"
journalctl -u docker | grep -i oom

Check container inspection:

docker inspect container_name | grep -A5 -B5 Memory

Application logs: Most apps don't log memory pressure, but some frameworks do. Look for garbage collection warnings, allocation failures, or "out of memory" messages.

Q: My memory usage gradually increases then suddenly drops to zero - memory leak?

A: Probably garbage collection, not a leak. Languages with GC (Java, Go, JavaScript) accumulate memory until GC runs, which creates a sawtooth pattern. Real leaks show consistently increasing memory with no drops. Profile your application with tools like:

  • Java: JProfiler, VisualVM
  • Node.js: heapdump, clinic.js
  • Go: pprof
  • Python: memory_profiler

Prevention Strategies That Actually Work in Production

The best OOM kill is the one that never happens. Here's how to engineer your containers to handle memory pressure without dying spectacularly at inconvenient times.

Memory Monitoring That Doesn't Lie

Stop relying on `docker stats` for production monitoring. It shows you what Docker thinks your container is using, not what the OOM killer sees. Use proper monitoring that tracks RSS memory and sends alerts before you hit limits. This monitoring approach prevents surprises in production.
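
For a quick spot check of the number the OOM killer actually cares about, read RSS straight from the process - a minimal sketch, assuming the image includes grep and your app runs as PID 1:

## RSS of the container's main process, as the kernel sees it
docker exec container_name grep VmRSS /proc/1/status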

Monitoring stack that works:

  • cAdvisor for container metrics collection
  • Prometheus for storage and alerting
  • Grafana for visualization
  • Alert at 75% memory usage, panic at 90%

Set up alerts on memory growth rate, not just absolute usage. A container that goes from 100MB to 400MB in 10 minutes is in trouble, even if 400MB is under your limit. This proactive monitoring approach catches problems before they become outages.
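
As a starting point, cAdvisor runs as a container next to your workloads. This is a sketch based on its commonly documented invocation - check the project docs for the current image tag and any extra mounts your version needs:

## Export per-container metrics on :8080 for Prometheus to scrape
docker run -d --name=cadvisor -p 8080:8080 \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  gcr.io/cadvisor/cadvisor:latest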

The Container Right-Sizing Process

Here's the systematic approach that doesn't involve guessing:

Step 1: Run production workload without memory limits for 48-72 hours while monitoring RSS memory usage. Yes, this is scary in prod. Do it in staging first, then gradually roll to production during low-traffic periods. Use proper monitoring tools to track actual memory consumption.

Step 2: Calculate your memory limit with a simple rule of thumb:

Memory limit = (Peak RSS * 1.5) + JVM/runtime overhead

For JVM apps, add 200-400MB overhead. For Node.js, add 50-100MB. For Go, add 20-50MB. This sizing methodology works across different runtimes.
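
Worked example with hypothetical numbers: a JVM service peaking at 400MB RSS lands at roughly (400 * 1.5) + 300 = 900MB:

## Illustrative limit for a JVM app peaking at 400MB RSS
docker run --memory=900m my-java-image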

Step 3: Test your limits under load. Use stress testing tools that mimic your actual usage patterns - API calls, background jobs, whatever causes memory allocation in your app. Load testing reveals memory patterns you won't see during normal operation.

Application-Level Defense

The kernel OOM killer is a last resort. Your application should handle memory pressure gracefully before getting to that point. This defensive programming approach prevents outages.

Implement backpressure: When memory usage hits 80% of your limit, start rejecting non-essential operations. Circuit breakers for memory-intensive operations prevent cascade failures. This pattern is well-documented in production systems.

Stream don't load: Processing a 500MB CSV file? Stream it line by line instead of loading it entirely into memory. This applies to API responses, file processing, database queries - anything that deals with large datasets. Streaming patterns reduce memory footprint dramatically.

Connection pooling limits: Database connection pools, HTTP clients, message queues - they all consume memory per connection. Set reasonable limits based on your memory budget, not just performance requirements. Production deployments require careful pool sizing.

## Connection pools consume memory per connection - size them deliberately
import requests
from requests.adapters import HTTPAdapter

## Bad: pool size left at the library default, unrelated to your memory budget
session = requests.Session()

## Good: memory-aware pool size
session = requests.Session()
session.mount('http://', HTTPAdapter(pool_maxsize=50))

Exit Code Reference for Production

Not all container deaths are OOM kills. Here's what the exit codes actually mean:

| Exit Code | What Happened | Action Required |
|-----------|---------------|-----------------|
| 137 | OOM killed by kernel | Increase memory limits or fix memory leak |
| 125 | Docker daemon error | Check Dockerfile syntax, image exists |
| 126 | Command not executable | Fix file permissions or command path |
| 127 | Command not found | App binary missing or PATH wrong |
| 1 | Application error | Check application logs for actual error |
| 0 | Clean exit | This is good (unless unexpected) |

Pro tip: Exit code 137 doesn't always mean OOM. It's any SIGKILL, which could be from manual docker kill or system shutdown. Always check the OOMKilled flag and system logs to confirm.
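
A quick triage sequence, assuming the container is named app:

## Was it really the OOM killer? Check the flag and the kernel log
docker inspect --format 'exit={{.State.ExitCode}} oom={{.State.OOMKilled}}' app
dmesg -T | grep -i "out of memory"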

The Kubernetes Memory Limit Gotcha

In Kubernetes, memory requests and limits serve different purposes. The scheduler uses requests for placement decisions, but the OOM killer respects limits. This fundamental difference causes confusion and production issues.

This creates a dangerous scenario: If you set requests too low, Kubernetes might schedule too many pods on a node. When they all hit their limits simultaneously, the node runs out of memory and starts killing pods randomly. This overcommitment problem affects cluster stability.

2025 best practice: Set memory.requests = memory.limits. This prevents overcommitment but wastes some memory. For most applications, the reliability is worth the cost. The Kubernetes community increasingly recommends this approach.

resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "512Mi"  # Same value prevents overcommit

When Your App Needs More Memory Than You Have

Sometimes you can't reduce memory usage or increase limits. Here are the escape hatches:

Horizontal scaling: Run multiple smaller containers instead of one large one. This works better with modern orchestration and provides better fault isolation. Microservices patterns naturally support this approach.
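
Both Compose and Kubernetes make scaling out a one-liner (the service and deployment names are placeholders):

## Compose: run four replicas of the worker service
docker compose up -d --scale worker=4

## Kubernetes: scale the deployment out instead of raising its memory limit
kubectl scale deployment/app --replicas=4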

Memory-mapped files: For read-heavy workloads, memory-map large files instead of loading them. The kernel manages the memory and can evict pages when needed. This technique works well for database systems and data processing.

External caching: Move memory usage outside your container. Redis, Memcached, or even file-based caching can offload memory pressure from your application containers. Distributed caching patterns help scale memory usage.
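
For example, a small, capped Redis container can hold the cache so your app containers stay lean - the values here are illustrative:

## Redis with its own hard memory cap and an eviction policy
docker run -d --name cache --memory=256m redis:7-alpine \
  redis-server --maxmemory 200mb --maxmemory-policy allkeys-lru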

Batch processing: Instead of processing everything at once, work in smaller batches that fit in memory. Takes longer but doesn't explode. Stream processing frameworks excel at this pattern.

The goal isn't to eliminate all OOM kills - it's to make them predictable and recoverable. A container that dies cleanly and restarts quickly is better than one that limps along consuming resources. Resilient architecture patterns embrace this reality.
