
Redis Memory Optimization: AI-Optimized Technical Reference

Critical Memory Fragmentation Intelligence

Fragmentation Ratio Thresholds

  • Ratio 1.0-1.3: Healthy memory usage, minimal fragmentation
  • Ratio 1.3-1.5: Moderate fragmentation, monitor closely
  • Ratio 1.5-2.0: Serious fragmentation, performance impact likely
  • Ratio >2.0: Critical fragmentation, immediate action required
  • Ratio >2.5: Restart recommended, fixing without restart extremely difficult
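These bands translate directly into alerting logic; a minimal sketch, with the labels taken from the list above (band boundaries in the list overlap, so the sketch treats each upper bound as exclusive):

```python
def classify_fragmentation(ratio: float) -> str:
    """Map mem_fragmentation_ratio to the severity bands listed above."""
    if ratio > 2.5:
        return "restart recommended"
    if ratio > 2.0:
        return "critical"
    if ratio > 1.5:
        return "serious"
    if ratio > 1.3:
        return "moderate"
    return "healthy"

print(classify_fragmentation(1.2))  # healthy
print(classify_fragmentation(3.4))  # restart recommended
```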

Real-World Fragmentation Impact

  • Production case study: 16GB server, 4.2GB of actual data, fragmentation ratio 3.4
  • Result: 240% memory overhead, a Black Friday incident, and OOM kills despite low data usage
  • Consequence: System becomes effectively unusable despite having sufficient theoretical capacity

Root Causes of Production Fragmentation

Variable-Size Key Expiration Patterns (Most Critical)

# Problem Pattern:
SET large_user_profile:12345 "500KB JSON object"
SET session:abc "small session token"  
SET large_user_profile:67890 "500KB JSON object"
EXPIRE session:abc 300  # Creates unusable gaps
  • Impact: Large allocations cannot fit in small freed gaps
  • Frequency: Occurs in mixed workloads with different TTL patterns
  • Severity: Primary cause of production fragmentation

Hash Resizing Under Load

  • Trigger: Hash tables automatically resize during growth
  • Impact: Temporarily doubles memory usage during rehashing
  • Detection: Monitor latency spikes during high write volume
  • Fragmentation: Leaves unusable blocks after rehashing completes

List and Stream Operations

  • Problem: Memory allocated in chunks, freed chunks often non-reusable
  • Specific Operations: LTRIM, XTRIM operations
  • Workaround: Use consistent trimming patterns
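One such consistent pattern: for streams, approximate trimming with the `~` modifier lets Redis trim whole internal macro-nodes rather than exact entry counts, which keeps allocations more uniform. A sketch (the stream key `events` and the 10000-entry bound are assumptions):

```shell
# Trim at write time to roughly 10000 entries (assumed key: events)
redis-cli XADD events MAXLEN '~' 10000 '*' sensor_id 1234 temperature 19.8

# Or trim periodically with the same approximate bound
redis-cli XTRIM events MAXLEN '~' 10000
```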

Advanced Fragmentation Diagnosis

# Critical diagnostic commands
redis-cli MEMORY STATS
redis-cli MEMORY USAGE key_name
redis-cli --bigkeys
redis-cli --memkeys --memkeys-samples 10000

# Fragmentation visualization metrics
total.allocated:       8589934592  # 8GB allocated
dataset.bytes:         6442450944  # 6GB actual data  
fragmentation.bytes:   2147483648  # 2GB fragmented
fragmentation.ratio:   1.33        # 33% overhead relative to the dataset

Active Defragmentation Dangers

Critical Warning: Active defragmentation often backfires in production

  • Blocks main thread: Pauses command processing during defragmentation
  • Cluster failures: Causes timeouts, other nodes consider defragging node failed
  • CPU intensive: Can trigger thermal throttling on cloud instances
  • Temporary fragmentation increase: Moving memory fragments it more initially
  • Production incident example: Enabled during 2am crisis, increased downtime by 30 minutes

Safe Configuration (Use Carefully):

CONFIG SET activedefrag yes
CONFIG SET active-defrag-ignore-bytes 100mb
CONFIG SET active-defrag-threshold-lower 10
CONFIG SET active-defrag-cycle-min 1
CONFIG SET active-defrag-cycle-max 25

Never Enable During:

  • High traffic periods
  • Cluster operations
  • Memory pressure situations
  • Production incidents

OOM Killer Prevention

OOM Kill Chain of Events

  1. Memory pressure builds: Redis approaches physical RAM limits
  2. System starts swapping: Performance degrades dramatically
  3. OOM killer evaluates: Kernel identifies Redis as memory hog
  4. SIGKILL sent: Process terminates immediately, no graceful shutdown
  5. Data loss occurs: Unsaved data since last persistence point lost
  6. Cascade failures: All Redis-dependent services fail

Critical Memory Configuration

Production Memory Sizing Rule:

  • Physical RAM: Total server memory
  • OS + Other Services: Reserve 15-20% (≈1.2-1.6GB on an 8GB system)
  • Redis maxmemory: 70-80% of physical RAM maximum
  • Safety buffer: Keep 500MB-1GB extra headroom
# DANGEROUS - the default of 0 means "no limit"
# maxmemory 0  # Redis grows unchecked until the kernel OOM killer steps in

# CORRECT - Production configuration
maxmemory 6gb        # On 8GB system (75% utilization)
maxmemory-policy allkeys-lru
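The sizing rule above is simple arithmetic; a sketch, where the 20% OS reserve and 75% Redis cap are the figures from the rule, not universal constants:

```python
def suggest_maxmemory(physical_ram_gb: float,
                      os_reserve_frac: float = 0.20,
                      redis_frac: float = 0.75) -> dict:
    """Apply the sizing rule: reserve ~20% for the OS and other services,
    cap Redis maxmemory at ~75% of physical RAM; the rest is headroom."""
    os_reserve = round(physical_ram_gb * os_reserve_frac, 2)
    maxmemory = round(physical_ram_gb * redis_frac, 2)
    headroom = round(physical_ram_gb - os_reserve - maxmemory, 2)
    return {"os_reserve_gb": os_reserve,
            "maxmemory_gb": maxmemory,
            "headroom_gb": headroom}

print(suggest_maxmemory(8))
# {'os_reserve_gb': 1.6, 'maxmemory_gb': 6.0, 'headroom_gb': 0.4}
```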

Eviction Policies - Production Reality

Recommended for Production:

  • allkeys-lru: Safe default, evicts least recently used keys
  • allkeys-lfu: Only if clear hot/cold data patterns exist
  • volatile-ttl: Only if all keys have TTLs set consistently

Never Use in Production:

  • noeviction: Causes "OOM command not allowed" errors, breaks applications
  • allkeys-random: Poor performance, unpredictable behavior

Noeviction Trap:

# This configuration kills applications
maxmemory 8gb
maxmemory-policy noeviction
# Result: Read-only mode when memory full, users can't login/write

System-Level OOM Prevention

Swap Configuration:

# Disable swap completely (recommended)
sudo swapoff -a
# Or minimize swapping
sudo sysctl vm.swappiness=1
  • Critical: Redis swapping destroys performance (2ms → 2000ms latency)
  • Impact: User experience becomes unusable

Memory Overcommit Settings:

# Recommended for Redis servers (mode 1 = always overcommit)
echo 1 > /proc/sys/vm/overcommit_memory
# Note: vm.overcommit_ratio only applies to mode 2, so it has no effect here
  • Failure Impact: Redis dies even when well-behaved
  • Debug Time: 4+ hours typical debugging time for incorrect settings

Container Memory Configuration

Docker/Kubernetes Requirements:

# Container limit must exceed Redis maxmemory
resources:
  limits:
    memory: "1.5Gi"   # Container limit
  requests:
    memory: "1Gi"
    
# Redis configuration
command: redis-server --maxmemory 1073741824  # 1GB Redis limit
  • Critical: Container memory > Redis maxmemory always
  • Failure: Docker kills container if limit exceeded
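The "container limit > Redis maxmemory" invariant is easy to check mechanically; a sketch with a minimal Kubernetes-quantity parser (Mi/Gi suffixes only; the values match the manifest above):

```python
def k8s_mem_to_bytes(qty: str) -> int:
    """Parse a Kubernetes memory quantity; only Mi and Gi suffixes
    are handled in this sketch."""
    units = {"Gi": 1024 ** 3, "Mi": 1024 ** 2}
    for suffix, mult in units.items():
        if qty.endswith(suffix):
            return int(float(qty[: -len(suffix)]) * mult)
    return int(qty)  # assume plain bytes

container_limit = k8s_mem_to_bytes("1.5Gi")
redis_maxmemory = 1073741824  # 1GB, as passed to redis-server above
assert container_limit > redis_maxmemory, "container limit must exceed maxmemory"
print(container_limit - redis_maxmemory)  # 536870912 bytes (512MiB) of headroom
```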

Memory Monitoring - Essential Metrics

Critical Alerting Thresholds

# Critical alerts (immediate action required)
mem_fragmentation_ratio > 1.5
used_memory / maxmemory > 0.85
used_memory_rss > 80% of system RAM
latest_fork_usec > 10000  # Fork operations slow (memory pressure)

# Warning alerts (monitor closely)
mem_fragmentation_ratio > 1.3
used_memory / maxmemory > 0.75
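The thresholds above translate directly into an evaluation function; a sketch, where the metric names mirror Redis INFO fields but the function itself is an assumption, not part of any exporter (the RSS-vs-system-RAM check is omitted since it needs host-level data):

```python
def evaluate_alerts(m: dict) -> list:
    """Apply the critical/warning thresholds above to a metrics snapshot."""
    alerts = []
    frag = m["mem_fragmentation_ratio"]
    if frag > 1.5:
        alerts.append(("critical", "fragmentation > 1.5"))
    elif frag > 1.3:
        alerts.append(("warning", "fragmentation > 1.3"))
    utilization = m["used_memory"] / m["maxmemory"]
    if utilization > 0.85:
        alerts.append(("critical", "memory utilization > 85%"))
    elif utilization > 0.75:
        alerts.append(("warning", "memory utilization > 75%"))
    if m.get("latest_fork_usec", 0) > 10000:
        alerts.append(("critical", "slow fork - memory pressure"))
    return alerts

snapshot = {"mem_fragmentation_ratio": 1.4,
            "used_memory": 7 * 2 ** 30, "maxmemory": 8 * 2 ** 30}
print(evaluate_alerts(snapshot))
# [('warning', 'fragmentation > 1.3'), ('critical', 'memory utilization > 85%')]
```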

Production Monitoring Script

#!/bin/bash
# Essential Redis memory health check
REDIS_CLI="redis-cli"
INFO=$($REDIS_CLI INFO memory)
USED_MEMORY=$(echo "$INFO" | grep '^used_memory:' | cut -d: -f2 | tr -d '\r')
FRAGMENTATION=$(echo "$INFO" | grep '^mem_fragmentation_ratio:' | cut -d: -f2 | tr -d '\r')
MAX_MEMORY=$($REDIS_CLI CONFIG GET maxmemory | tail -1 | tr -d '\r')

# Guard: maxmemory 0 means "no limit" and would cause a divide-by-zero
if [ "$MAX_MEMORY" -gt 0 ]; then
    UTILIZATION=$((USED_MEMORY * 100 / MAX_MEMORY))
    if [ "$UTILIZATION" -gt 85 ]; then
        echo "CRITICAL: Memory utilization >85% - OOM risk high"
    fi
fi

if (( $(echo "$FRAGMENTATION > 2.0" | bc -l) )); then
    echo "CRITICAL: Fragmentation ratio >2.0 - restart recommended"
fi

Troubleshooting Common Scenarios

Memory Growing Despite TTLs

Root Causes:

  • jemalloc not releasing freed memory to OS
  • Fragmentation preventing reuse of freed blocks
  • Stream accumulation without trimming
  • Keys not actually expiring (check for TTL returning -1, meaning no expiry is set)

Diagnostic Commands:

redis-cli RANDOMKEY
redis-cli TTL key_name  # Should show countdown, not -1
redis-cli XLEN stream_key  # Check stream growth

Redis Slower Than Expected

Primary Causes:

  • Swap thrashing: Check free -h for active swap
  • Active defragmentation: Blocks main thread during operation
  • Memory pressure: Fork operations become slow (>10ms)

Quick Diagnosis:

free -h  # Check swap usage
iostat 1 5  # Look for I/O spikes
redis-cli CONFIG GET activedefrag  # Check if defrag enabled

Uneven Cluster Memory Usage

Indicators of Problems:

  • Significantly different fragmentation ratios between nodes
  • Failed slot migrations leaving orphaned keys
  • Hot keys concentrated on specific nodes

Investigation Commands:

# Check slot distribution
redis-cli CLUSTER NODES | awk '{print $1, $9}' | sort -k2

# Check for stuck migrations
redis-cli CLUSTER NODES | grep "importing\|migrating"

Resource Requirements and Costs

Time Investment for Memory Issues

  • Basic fragmentation fix: 30 minutes (restart approach)
  • Root cause analysis: 4-8 hours typical debugging time
  • Production incident response: 2-4 hours average downtime
  • Monitoring setup: 2-3 days for comprehensive monitoring

Expertise Requirements

  • Basic memory management: Junior DevOps level
  • Fragmentation debugging: Senior Redis knowledge required
  • Production incident response: Expert-level troubleshooting skills
  • Cluster memory issues: Advanced distributed systems knowledge

Hidden Costs

  • Memory waste: Up to 300% overhead with severe fragmentation
  • Performance degradation: 1000x latency increase when swapping occurs
  • Incident response: Average 3am wake-up calls, weekend debugging sessions
  • Application failures: Cascade failures affecting all Redis-dependent services

Decision Criteria for Solutions

When to Restart vs Fix

Restart Immediately When:

  • Fragmentation ratio >2.5 and climbing
  • Memory efficiency <50%
  • Active defragmentation ineffective after 24 hours
  • Can afford downtime (seconds for <1GB, minutes for larger datasets)

Attempt to Fix When:

  • Fragmentation ratio 1.5-2.5 and stable
  • Production system cannot afford restart
  • Replica nodes available for failover
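These criteria can be condensed into a decision sketch; the inputs are assumptions for illustration (efficiency here means dataset bytes divided by allocated bytes):

```python
def restart_or_fix(frag_ratio: float, efficiency: float,
                   can_afford_downtime: bool, has_replicas: bool) -> str:
    """Condense the restart-vs-fix criteria above into one function."""
    if frag_ratio > 2.5 or efficiency < 0.5:
        if can_afford_downtime:
            return "restart immediately"
        if has_replicas:
            return "fail over to a replica, then restart"
        return "schedule restart at next window"
    if frag_ratio >= 1.5:
        return "attempt in-place fixes"
    return "monitor"

print(restart_or_fix(3.0, 0.4, True, False))   # restart immediately
print(restart_or_fix(1.8, 0.7, False, True))   # attempt in-place fixes
```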

Alternative Technologies

KeyDB: Claims 5x better performance, improved memory handling
Dragonfly DB: Modern Redis alternative designed to avoid fragmentation
Redis Enterprise: Better memory management, commercial support

Cost-Benefit Analysis

  • Scheduled weekly restarts: Band-aid solution, indicates design problems
  • Memory monitoring: Essential, prevents 90% of memory-related incidents
  • Active defragmentation: High risk, limited benefit in production
  • Cluster vs single instance: Multiple smaller instances often more stable

Critical Configuration Templates

Production Redis Configuration

# Memory settings (maxmemory takes bytes or a size suffix, not a percentage)
maxmemory 6gb        # ~75% of an 8GB system; scale to your RAM
maxmemory-policy allkeys-lru
save ""  # Disable RDB snapshots for pure cache workloads

# System settings  
vm.overcommit_memory = 1
vm.swappiness = 1

Monitoring Integration

# Prometheus alerting rules
- alert: RedisMemoryHigh
  expr: redis_memory_used_bytes / redis_config_maxmemory_bytes > 0.85
  for: 2m
  
- alert: RedisMemoryFragmentation  
  expr: redis_memory_fragmentation_ratio > 1.5
  for: 5m

Container Deployment

# Kubernetes deployment
resources:
  limits:
    memory: "1.5Gi"    # 50% overhead for container
  requests:  
    memory: "1Gi"      # Guaranteed memory

# Redis args
args: ["redis-server", "--maxmemory", "1073741824", "--maxmemory-policy", "allkeys-lru"]

Recovery Procedures

Post-OOM Kill Recovery

  1. Check system memory: free -h
  2. Verify configuration before restart (redis-server has no config-test flag; review redis.conf manually, or use --test-memory <megabytes> to check the RAM itself)
  3. Start with lower limits: --maxmemory 4gb
  4. Monitor during startup: watch redis-cli INFO memory
  5. Gradually increase limits as system stabilizes

Memory Leak Investigation

  1. Use MEMORY USAGE on suspected keys
  2. Run --memkeys --memkeys-samples 10000 (and --bigkeys for element counts)
  3. Check client output buffers: run CLIENT LIST and inspect the omem= field per client
  4. Verify replication backlog size
  5. Monitor memory trends over time
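Step 3 is easier with a small parser; a sketch that scans CLIENT LIST-style output for clients with large output buffers (the sample lines and the 1MB threshold are illustrative assumptions):

```python
def clients_with_large_buffers(client_list: str, omem_threshold: int = 1_000_000):
    """Return (addr, omem) for clients whose output buffer exceeds the threshold.
    CLIENT LIST emits one space-separated key=value line per client."""
    flagged = []
    for line in client_list.splitlines():
        fields = dict(f.split("=", 1) for f in line.split() if "=" in f)
        if int(fields.get("omem", 0)) > omem_threshold:
            flagged.append((fields.get("addr"), int(fields["omem"])))
    return flagged

sample = ("id=3 addr=10.0.0.5:52100 name= omem=20971520 cmd=subscribe\n"
          "id=4 addr=10.0.0.6:52200 name= omem=0 cmd=get")
print(clients_with_large_buffers(sample))  # [('10.0.0.5:52100', 20971520)]
```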

This technical reference provides the operational intelligence needed to successfully implement Redis memory optimization, avoid common pitfalls, and respond effectively to memory-related incidents.

Useful Links for Further Investigation

The Only Links You Actually Need

  • Redis Memory Optimization Guide: The official memory guide. Dense but comprehensive. Bookmark this one.
  • Memory Usage Command Reference: How to analyze what's eating your memory. I reference this weekly.
  • Redis Configuration: All the config options, including the memory ones that matter.
  • Redis Insight: Official GUI tool. Actually useful for visualizing memory usage. Saved my ass during a fragmentation crisis.
  • Prometheus Redis Exporter: If you're using Prometheus, this exports Redis metrics. Works reliably.
  • AWS ElastiCache Memory Guide: AWS-specific memory settings and monitoring.
  • Google Cloud Memorystore: GCP's managed Redis documentation.
  • Redis Troubleshooting Guide: Official debugging docs. Start here when things go wrong.
  • Stack Overflow Redis+Memory: Real problems from real people. Better than most documentation. Found solutions here that the official docs completely missed.
  • KeyDB Performance Comparison: KeyDB claims better memory handling than Redis.
  • Dragonfly DB: Modern Redis alternative designed to avoid fragmentation issues.
