Why Background Tasks Matter (And Why Django Sucks at Them)

Django wasn't built for background work. Try processing a 10MB file upload inline and watch your server burn. I learned this when our report generation endpoint regularly hit 30-second timeouts during lunch rush - turns out accounting people all export reports at exactly 12:15 PM.

What Goes Wrong With Synchronous Django

The stuff that breaks everything:

  • Sending emails - blocks HTTP thread for 2-8 seconds depending on SMTP
  • Image resizing - 50MB photos will eat your CPU alive
  • PDF generation - memory usage spikes to 2GB+ for complex reports
  • Data exports - CSV with 100k rows takes 45 seconds to build
  • Third-party API calls - external services go down, your requests hang

Real failure story: Our customer uploaded a 47MB product photo during peak traffic. Django tried to resize it inline, consumed 3GB RAM, triggered OOMKilled, took down the whole container. 200 users got 502 errors because one person uploaded a massive image.

Error log looked like this:

[2025-08-15 12:23:45] ERROR django.request: Internal Server Error: /upload/
[2025-08-15 12:23:47] CRITICAL gunicorn.error: WORKER TIMEOUT (pid:1847)
[2025-08-15 12:23:48] WARNING kernel: [15234.567890] Memory cgroup out of memory: Killed process 1847 (gunicorn) total-vm:3145728kB, anon-rss:2097152kB

Why Django Async Views Don't Fix This

Django has had async views since 3.1 (with async ORM interfaces arriving in 4.1+), but they're useless for CPU-intensive work. async def only helps while you're waiting on I/O - database queries, HTTP requests, file reads. Image processing, PDF generation, data crunching? Still blocks the event loop.

Plus async Django is a pain in the ass to debug. Stack traces get weird, database connections act funny, and most third-party packages don't support it anyway.

The Background Task Solution That Actually Works

Distributed Task Queue Architecture

Split your work into two phases:

  1. Web request: Accept the job, return immediately
  2. Background worker: Process the job separately, update database when done

User uploads file → Django saves it, queues a task, returns "Processing..." → Celery worker handles resize → Updates database → User gets notification.

Architecture Overview: Django web servers handle HTTP requests while Celery workers process background tasks through Redis message broker.

Tech stack I use:

  • Redis: Message queue (because it's simple and we already use it for caching)
  • Celery: Task runner (despite its networking bullshit, it works)
  • PostgreSQL: Database (shared between web and workers)
  • Docker: Because deployment without containers is suffering

Why Redis Instead of RabbitMQ

I tried RabbitMQ first. Spent two days fighting Erlang dependencies, cluster configuration, management UI permissions, and memory management issues. Said fuck it and went with Redis after reading the Redis vs RabbitMQ comparison.

Redis advantages:

  • Already running for cache and sessions - one less service to operate
  • Single binary, config fits on one line
  • Fast enough for our workload (we're not pushing millions of tasks)

Redis gotchas:

  • No persistence by default - queued tasks vanish on restart unless you enable AOF
  • Memory grows unbounded unless you set maxmemory
  • It's a message queue by convention, not design - no real routing, weaker delivery guarantees than RabbitMQ

Celery Integration Hell

Celery talks to Django through shared database connections and settings import. Works great until it doesn't.

Connection problems: Workers can't find Django models, import errors everywhere, database connections timeout. Fixed by making sure `PYTHONPATH` and `DJANGO_SETTINGS_MODULE` are identical between web and worker containers. Check the Celery Django integration docs and Django deployment checklist for common configuration issues.

Database connection limits: PostgreSQL defaults to 100 connections. With 8 web workers + 6 Celery workers + monitoring, you hit limits fast. Had to bump `max_connections = 200` and add connection pooling.

Memory leaks: Celery workers grow memory over time. Set `CELERY_WORKER_MAX_TASKS_PER_CHILD = 1000` to recycle them before they eat all your RAM.

Docker Networking Pain Points

Docker networking will fuck you. Use service names, not localhost. This took me 4 hours to figure out because local development worked fine but Docker containers couldn't talk to each other.

Wrong:

CELERY_BROKER_URL = 'redis://localhost:6379/1'

Right:

CELERY_BROKER_URL = 'redis://redis:6379/1'  # 'redis' is the service name

Also, mount volumes consistently or workers can't access uploaded files. Both web and worker containers need the same volume mounts for media files.

Performance Reality Check

Before: Report generation blocked web requests for 30+ seconds, users got timeout errors, server CPU spiked to 90%+ during peak hours.

After: Report requests return instantly with "Processing..." message, background workers handle the heavy lifting, web server stays responsive even during export rushes.

Actual numbers from production:

  • Response time for report requests: 850ms → 45ms
  • Peak CPU usage: 85% → 35% (work spread over time)
  • User timeout errors: ~50/day → 0
  • Concurrent user capacity: roughly doubled

When You Don't Need This

Don't over-engineer simple apps. If your Django app handles basic CRUD operations and nothing takes longer than 200ms, stick with synchronous code.

Skip background tasks for:

  • Basic blogs, portfolios, simple CMSes
  • Apps with <1000 daily active users
  • Operations that complete in under 1 second
  • Prototypes and MVPs (add complexity later)

The setup overhead isn't worth it unless you're actually hitting performance walls.

Docker Compose Config That Won't Fuck You Over

Took me 3 months to get a Docker setup that doesn't randomly break in production. Here's the docker-compose.yml I actually use - it handles the crashes, memory issues, and networking problems that tutorials skip.

Docker Compose Architecture

Multi-Container Setup: PostgreSQL database, Redis broker, Django web servers, and Celery workers running as separate Docker containers. Check the Docker Compose documentation for configuration reference and multi-container app patterns.

## docker-compose.yml - Production Configuration
version: '3.8'

services:
  # PostgreSQL Database
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: ${DB_NAME:-djangodb}
      POSTGRES_USER: ${DB_USER:-postgres} 
      POSTGRES_PASSWORD: ${DB_PASSWORD:-postgres}
    volumes:
      - postgres_data:/var/lib/postgresql/data/
      - ./backups:/backups
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DB_USER:-postgres}"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  # Redis Message Broker & Cache
  redis:
    image: redis:7.4-alpine
    command: redis-server --appendonly yes --maxmemory 512mb --maxmemory-policy volatile-lru  # volatile-lru evicts only keys with a TTL, never queued tasks
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5
    restart: unless-stopped

  # Django Web Application
  web:
    build: 
      context: .
      dockerfile: Dockerfile
    command: gunicorn core.wsgi:application --bind 0.0.0.0:8000 --workers 3 --worker-class gevent --worker-connections 1000
    environment:
      - DATABASE_URL=postgres://${DB_USER:-postgres}:${DB_PASSWORD:-postgres}@db:5432/${DB_NAME:-djangodb}
      - REDIS_URL=redis://redis:6379/0
      - CELERY_BROKER_URL=redis://redis:6379/1
      - CELERY_RESULT_BACKEND=redis://redis:6379/2
      - DEBUG=False
    volumes:
      - ./staticfiles:/app/staticfiles
      - ./media:/app/media
    ports:
      - "8000:8000"
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health/"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped

  # Celery Worker Pool
  worker:
    build:
      context: .
      dockerfile: Dockerfile
    command: celery -A core worker --loglevel=info --concurrency=4 --max-tasks-per-child=1000
    environment:
      - DATABASE_URL=postgres://${DB_USER:-postgres}:${DB_PASSWORD:-postgres}@db:5432/${DB_NAME:-djangodb}
      - REDIS_URL=redis://redis:6379/0
      - CELERY_BROKER_URL=redis://redis:6379/1
      - CELERY_RESULT_BACKEND=redis://redis:6379/2
      - DEBUG=False
    volumes:
      - ./media:/app/media
      - ./logs:/app/logs
    depends_on:
      - db
      - redis
    healthcheck:
      test: ["CMD", "celery", "-A", "core", "inspect", "ping"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped
    deploy:
      replicas: 2

  # Celery Beat Scheduler
  beat:
    build:
      context: .
      dockerfile: Dockerfile  
    command: celery -A core beat --loglevel=info --scheduler django_celery_beat.schedulers:DatabaseScheduler
    environment:
      - DATABASE_URL=postgres://${DB_USER:-postgres}:${DB_PASSWORD:-postgres}@db:5432/${DB_NAME:-djangodb}
      - REDIS_URL=redis://redis:6379/0
      - CELERY_BROKER_URL=redis://redis:6379/1
      - CELERY_RESULT_BACKEND=redis://redis:6379/2
      - DEBUG=False
    volumes:
      - ./logs:/app/logs
    depends_on:
      - db
      - redis
    restart: unless-stopped

  # Celery Flower Monitoring
  flower:
    build:
      context: .
      dockerfile: Dockerfile
    command: celery -A core flower --port=5555 --broker=redis://redis:6379/1
    environment:
      - CELERY_BROKER_URL=redis://redis:6379/1
      - CELERY_RESULT_BACKEND=redis://redis:6379/2
    ports:
      - "5555:5555"
    depends_on:
      - redis
    restart: unless-stopped

volumes:
  postgres_data:
  redis_data:

Why these settings matter:

  • Healthchecks plus depends_on: condition: service_healthy - web and workers wait until Postgres and Redis actually answer, not just until their containers start
  • --maxmemory 512mb on Redis - without a cap, Redis grows until the OOM killer takes it (and your queue) down
  • --max-tasks-per-child=1000 - recycles workers before memory leaks accumulate
  • restart: unless-stopped - crashed containers come back without a 3am page

Django Configuration Integration

## settings/production.py
import os

## Database Configuration
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': os.environ.get('DB_NAME', 'djangodb'),
        'USER': os.environ.get('DB_USER', 'postgres'),
        'PASSWORD': os.environ.get('DB_PASSWORD'),
        'HOST': os.environ.get('DB_HOST', 'db'),
        'PORT': os.environ.get('DB_PORT', '5432'),
        'CONN_MAX_AGE': 60,
    }
}

## Redis Cache Configuration  
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': os.environ.get('REDIS_URL', 'redis://redis:6379/0'),
        'OPTIONS': {
            'CLIENT_CLASS': 'django_redis.client.DefaultClient',
            'IGNORE_EXCEPTIONS': True,  # Graceful degradation if Redis is down
            'CONNECTION_POOL_KWARGS': {
                'max_connections': 50,
                'retry_on_timeout': True,
            }
        }
    }
}

## Celery Configuration
CELERY_BROKER_URL = os.environ.get('CELERY_BROKER_URL', 'redis://redis:6379/1')
CELERY_RESULT_BACKEND = os.environ.get('CELERY_RESULT_BACKEND', 'redis://redis:6379/2')

## Celery Task Settings
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TIMEZONE = 'UTC'
CELERY_ENABLE_UTC = True

## Task Routing
CELERY_TASK_ROUTES = {
    'core.tasks.send_email': {'queue': 'emails'},
    'core.tasks.process_image': {'queue': 'images'},
    'core.tasks.generate_report': {'queue': 'reports'},
}

## Worker Configuration
CELERY_WORKER_PREFETCH_MULTIPLIER = 1  # Prevent worker hoarding
CELERY_TASK_ACKS_LATE = True          # Acknowledge after completion
CELERY_WORKER_MAX_TASKS_PER_CHILD = 1000  # Prevent memory leaks

## Result Backend Settings
CELERY_RESULT_EXPIRES = 3600  # Results expire after 1 hour (CELERY_TASK_RESULT_EXPIRES is just the old alias)

## Error Handling
CELERY_TASK_ANNOTATIONS = {
    '*': {
        'rate_limit': '100/m',
        'time_limit': 300,  # 5 minutes max
        'soft_time_limit': 240,  # 4 minutes warning
    }
}

## Session Storage
SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
SESSION_CACHE_ALIAS = 'default'

Dockerfile Optimization

## Dockerfile - Production build
FROM python:3.12-slim AS base

## Install system dependencies
RUN apt-get update && apt-get install -y \
    postgresql-client \
    curl \
    && rm -rf /var/lib/apt/lists/*

## Create application user
RUN groupadd -r appuser && useradd -r -g appuser appuser

WORKDIR /app

## Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

## Copy application code
COPY . .
RUN chown -R appuser:appuser /app

USER appuser

## Health check endpoint
COPY --chown=appuser:appuser healthcheck.py .

## Default command (override in compose)
CMD ["gunicorn", "core.wsgi:application", "--bind", "0.0.0.0:8000"]

Environment Configuration

## .env - Environment Variables
DB_NAME=djangodb_prod
DB_USER=django_user  
DB_PASSWORD=secure_password_here
DB_HOST=db
DB_PORT=5432

REDIS_URL=redis://redis:6379/0
CELERY_BROKER_URL=redis://redis:6379/1
CELERY_RESULT_BACKEND=redis://redis:6379/2

DEBUG=False
SECRET_KEY=your-secret-key-here
ALLOWED_HOSTS=localhost,127.0.0.1,yourdomain.com

## Production Settings
GUNICORN_WORKERS=3
CELERY_WORKER_CONCURRENCY=4
REDIS_MAXMEMORY=512mb

Scaling Configuration

Horizontal Scaling Commands:

## Scale workers during high load
docker-compose up -d --scale worker=5

## Scale web servers
docker-compose up -d --scale web=3

## Monitor resource usage
docker stats

Scaling reality:

  • Start Small: 1 web, 2 workers - see what breaks first
  • Scale Workers First: Usually workers are the bottleneck, not web servers
  • Monitor Queue Depth: If queue has 100+ tasks for more than 5 minutes, add workers
  • Resource Limits: Without limits, one container can starve others (learned this the hard way)

Docker Compose Commands for Development

## Initial setup
docker-compose up --build -d

## Run migrations
docker-compose exec web python manage.py migrate

## Create superuser
docker-compose exec web python manage.py createsuperuser

## View logs
docker-compose logs -f worker
docker-compose logs -f beat

## Access Django shell
docker-compose exec web python manage.py shell

## Monitor Celery with Flower
## After starting services, visit localhost:5555 in your browser (or your-domain:5555 in production)

## Stop all services
docker-compose down

## Clean rebuild
docker-compose down --volumes
docker-compose up --build

This Docker setup works for our 20k daily active users. It's not perfect but it doesn't randomly break at 3am anymore.

Message Broker Reality Check - What Actually Happens

| Feature | Redis | RabbitMQ | Amazon SQS | PostgreSQL |
|---|---|---|---|---|
| Setup Pain Level | Easy (if you know Redis) | Fuck this, so much config | Zero setup, costs money | Already there |
| Performance | Fast enough for us | Probably faster, didn't test | Slow API limits | Definitely too slow |
| Memory Usage | ~200MB for our workload | Haven't run it long enough | N/A (AWS problem) | Grows forever if you're not careful |
| Message Durability | Lost messages twice during crashes | Should be better (theory) | AWS promises it works | PostgreSQL is rock solid |
| Routing | We just use default queue | Looks complicated | Basic queues work fine | No routing |
| Scaling | Works until it doesn't | Never tried clustering | Scales itself (costs $$) | Don't even think about it |
| Monitoring | redis-cli and pray | Management UI is nice | CloudWatch (more $$$) | Database logs |
| Monthly Cost | ~$15 (t3.small instance) | ~$25 if we switched | Depends on usage | ~$12 (already paying) |
| Production Failures | 2 Redis crashes in 8 months | Haven't used in prod | SQS has been solid | DB queue was disaster |
| Django Integration | Share Redis with cache/sessions | Need separate service | Boto3 dependency hell | ORM queries everywhere |

Production Deployment - What Actually Works

We've been running this Django/Celery setup in production for 8 months. Started with 2 containers, now run 12 during peak hours. Here's what I wish someone told me before I deployed this shit.

Multi-Environment Docker Deployment

Development Environment
## docker-compose.dev.yml
version: '3.8'
services:
  web:
    build: .
    command: python manage.py runserver 0.0.0.0:8000
    volumes:
      - .:/app  # Hot reload for development
    environment:
      - DEBUG=True
      - CELERY_TASK_ALWAYS_EAGER=True  # Synchronous for debugging

  worker:
    command: celery -A core worker --loglevel=debug --concurrency=2
    volumes:
      - .:/app

  redis:
    command: redis-server --appendonly no  # No persistence needed
Staging Environment
## docker-compose.staging.yml  
version: '3.8'
services:
  web:
    image: your-registry.com/app:${VERSION}
    command: gunicorn core.wsgi --bind 0.0.0.0:8000 --workers 2
    environment:
      - DEBUG=False
      - CELERY_BROKER_URL=redis://redis:6379/1

  worker:
    image: your-registry.com/app:${VERSION}
    command: celery -A core worker --loglevel=info --concurrency=2
    deploy:
      replicas: 2

  redis:
    command: redis-server --appendonly yes --maxmemory 256mb
Production Environment
## docker-compose.prod.yml
version: '3.8'
services:
  web:
    image: your-registry.com/app:${VERSION}
    command: gunicorn core.wsgi --bind 0.0.0.0:8000 --workers 4 --worker-class gevent
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M

  worker:
    image: your-registry.com/app:${VERSION}
    command: celery -A core worker --loglevel=warning --concurrency=4 --max-tasks-per-child=500
    deploy:
      replicas: 5
      resources:
        limits:
          cpus: '2.0'
          memory: 2G

  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes --maxmemory 1gb --maxmemory-policy volatile-lru
    volumes:
      - redis_prod:/data
    deploy:
      resources:
        limits:
          memory: 1.2G

Kubernetes Deployment Pattern

Kubernetes Architecture

## k8s/celery-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-workers
spec:
  replicas: 5
  selector:
    matchLabels:
      app: celery-worker
  template:
    metadata:
      labels:
        app: celery-worker
    spec:
      containers:
      - name: worker
        image: your-registry.com/app:latest
        command: ["celery", "-A", "core", "worker", "--loglevel=info", "--concurrency=4"]
        env:
        - name: CELERY_BROKER_URL
          valueFrom:
            secretKeyRef:
              name: redis-credentials
              key: broker-url
        resources:
          limits:
            cpu: "2000m"
            memory: "2Gi"
          requests:
            cpu: "500m" 
            memory: "512Mi"
        livenessProbe:
          exec:
            command:
            - celery
            - -A
            - core
            - inspect
            - ping
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - celery
            - -A 
            - core
            - inspect
            - active
          initialDelaySeconds: 5
          periodSeconds: 5

---
apiVersion: v1
kind: Service
metadata:
  name: celery-flower
spec:
  selector:
    app: celery-flower
  ports:
  - port: 5555
    targetPort: 5555
  type: LoadBalancer

Horizontal Pod Autoscaler (HPA)

## k8s/celery-hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: celery-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: celery-workers
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metric:
        name: redis_queue_length
        selector:
          matchLabels:
            queue: "default"
      target:
        type: AverageValue
        averageValue: "100"  # Scale up when queue length > 100 per worker
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Advanced Scaling Strategies

Queue-Based Scaling

Monitor queue depth and scale workers automatically:

## monitoring/queue_monitor.py
import subprocess

import redis

def get_queue_length(queue_name='celery'):
    # Celery's Redis transport stores each queue as a plain list whose
    # key is the queue name itself ('celery' is the default queue)
    r = redis.Redis(host='redis', port=6379, db=1)
    return r.llen(queue_name)

def scale_workers(queue_length):
    if queue_length > 500:  # Heavy load
        subprocess.run(['docker-compose', 'up', '--scale', 'worker=10', '-d'])
    elif queue_length > 100:  # Medium load
        subprocess.run(['docker-compose', 'up', '--scale', 'worker=5', '-d'])
    elif queue_length < 10:  # Light load
        subprocess.run(['docker-compose', 'up', '--scale', 'worker=2', '-d'])
Specialized Worker Pools

Different queues for different task types:

## docker-compose.specialized.yml
services:
  # Fast tasks (email, notifications)
  worker-fast:
    command: celery -A core worker -Q fast --loglevel=info --concurrency=8
    deploy:
      replicas: 3

  # CPU intensive tasks (image processing) 
  worker-cpu:
    command: celery -A core worker -Q cpu_intensive --loglevel=info --concurrency=2
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '4.0'

  # IO intensive tasks (file downloads, API calls)
  worker-io:
    command: celery -A core worker -Q io_intensive --loglevel=info --concurrency=20
    deploy:
      replicas: 4

Task routing configuration:

## settings/production.py
CELERY_TASK_ROUTES = {
    'core.tasks.send_email': {'queue': 'fast'},
    'core.tasks.send_notification': {'queue': 'fast'},
    'core.tasks.process_image': {'queue': 'cpu_intensive'},
    'core.tasks.resize_video': {'queue': 'cpu_intensive'},
    'core.tasks.download_file': {'queue': 'io_intensive'},
    'core.tasks.call_external_api': {'queue': 'io_intensive'},
}

Multi-Region Deployment

Global distributed task processing:

## Region-specific Redis clusters
services:
  # US East Redis
  redis-us-east:
    image: redis:7-alpine
    command: redis-server --port 6379 --cluster-enabled yes

  # EU West Redis  
  redis-eu-west:
    image: redis:7-alpine
    command: redis-server --port 6379 --cluster-enabled yes

  # Workers route to local Redis
  worker-us:
    environment:
      - CELERY_BROKER_URL=redis://redis-us-east:6379/1
      - REGION=us-east-1

  worker-eu:
    environment:
      - CELERY_BROKER_URL=redis://redis-eu-west:6379/1  
      - REGION=eu-west-1

High Availability Patterns

Redis Sentinel Configuration
## docker-compose.ha.yml - Redis HA setup
services:
  redis-master:
    image: redis:7-alpine
    command: redis-server --appendonly yes --replica-announce-ip redis-master

  redis-replica-1:
    image: redis:7-alpine
    command: redis-server --appendonly yes --replicaof redis-master 6379

  redis-replica-2:
    image: redis:7-alpine
    command: redis-server --appendonly yes --replicaof redis-master 6379

  redis-sentinel-1:
    image: redis:7-alpine
    command: redis-sentinel /etc/redis/sentinel.conf
    volumes:
      - ./sentinel.conf:/etc/redis/sentinel.conf

  worker:
    environment:
      - CELERY_BROKER_URL=sentinel://redis-sentinel-1:26379;sentinel://redis-sentinel-2:26379;sentinel://redis-sentinel-3:26379/mymaster
Database Connection Pooling
## settings/production.py - Production database config
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': os.environ.get('DB_NAME'),
        'USER': os.environ.get('DB_USER'),
        'PASSWORD': os.environ.get('DB_PASSWORD'), 
        'HOST': os.environ.get('DB_HOST'),
        'PORT': os.environ.get('DB_PORT', '5432'),
        'CONN_MAX_AGE': 300,  # Persistent connections, recycled every 5 minutes
        'OPTIONS': {
            # Django's built-in postgres backend has no MAX_CONNS/MIN_CONNS;
            # native pooling needs Django 5.1+ with psycopg 3
            'pool': {'min_size': 5, 'max_size': 20},
        }
    }
}

## Celery database connection settings (these only apply to the
## SQLAlchemy-based database result backend - inert with Redis)
CELERY_DATABASE_ENGINE_OPTIONS = {
    'echo': False,
    'pool_recycle': 3600,
    'pool_pre_ping': True,
}

Monitoring and Alerting

Monitoring Stack: Grafana dashboards for Redis metrics, Prometheus for data collection, and custom alerting for queue depth and worker health.

Prometheus Metrics
## monitoring/docker-compose.monitoring.yml (the prometheus.yml it mounts
## holds the scrape config)
version: '3.8'
services:
  redis-exporter:
    image: oliver006/redis_exporter
    environment:
      - REDIS_ADDR=redis://redis:6379
    ports:
      - "9121:9121"

  celery-exporter:
    image: danihodovic/celery-exporter
    environment:
      - CELERY_BROKER_URL=redis://redis:6379/1
    ports:
      - "9540:9540"

  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
Grafana Dashboard Queries
## Key metrics to monitor
queries:
  - queue_length: redis_list_length{instance="redis:6379"}
  - worker_active_tasks: celery_worker_tasks_active
  - task_failure_rate: rate(celery_task_total{state="FAILURE"}[5m])
  - worker_memory_usage: container_memory_usage_bytes{name=~".*worker.*"}
  - redis_memory_usage: redis_memory_used_bytes

Zero-Downtime Deployment

Rolling Deployment Pattern: Update workers first, then web servers, with health checks ensuring zero downtime during deployments.

#!/bin/bash
## deploy.sh - Rolling deployment script

## 1. Build new image
docker build -t your-app:${NEW_VERSION} .

## 2. Update workers gradually (drain existing tasks)
docker-compose exec worker celery -A core control cancel_consumer default
sleep 30  # Wait for current tasks to finish

## 3. Scale down old workers, scale up new ones
docker-compose up --scale worker-old=0 --scale worker-new=5 -d

## 4. Update web servers (Compose can't do a true rolling update;
##    recreate them on the new image and let health checks gate traffic)
docker-compose pull web
docker-compose up -d --no-deps web
sleep 10  # Give the health check time to pass

## 5. Verify deployment (replace the domain with yours)
curl -f http://your-domain.com/health/ || exit 1
docker-compose exec worker celery -A core inspect active || exit 1

Production Deployment Checklist:
✅ Health checks configured for all services
✅ Resource limits prevent container OOM kills
✅ Redis persistence enabled (AOF + RDB)
✅ Database connection pooling configured
✅ Log aggregation (ELK, Splunk, or CloudWatch)
✅ Monitoring alerts for queue depth, task failures, worker crashes
✅ Backup strategy for Redis data and PostgreSQL
✅ SSL/TLS certificates for production domains
✅ Security scanning of container images

This production deployment foundation scales from small teams to enterprise systems. The next critical component is implementing robust task patterns with proper error handling and retry logic.

Docker - Django, Celery & Redis Docker Compose setup by Very Academy

## Docker Django Celery Redis Setup Tutorial

This 20-minute video from Very Academy walks through setting up Django, Celery, Redis, and PostgreSQL with Docker Compose - exactly the stack we're discussing.

Watch: Docker - Django, Celery & Redis Docker Compose setup

Key timestamps:
- 0:00 - Project structure and requirements setup
- 3:15 - Docker container configuration
- 8:30 - Celery worker and beat setup
- 12:45 - Redis message broker configuration
- 16:20 - Testing the complete integration

Why this video helps: Shows the actual docker-compose.yml file structure, demonstrates real Celery task execution, and covers the Redis networking configuration that trips up most people. The presenter explains the "why" behind each configuration choice instead of just copy-pasting code.

Bonus: Uses the same Redis + Docker approach covered in our implementation guide, so the setup transfers directly to your production environment.


Shit That Will Break Your Celery Setup

Q: Docker networking fuckery - workers can't find Redis

What happens: Workers start up, then immediately crash with "Connection refused" or "Name resolution failed". Everything works fine on your laptop.

Why it's broken: You used localhost:6379 in your Django settings because that's what every tutorial shows. Docker containers can't talk to localhost - they need service names.

Fix:

## This breaks in Docker (but works locally)
CELERY_BROKER_URL = 'redis://localhost:6379/1'

## This actually works
CELERY_BROKER_URL = 'redis://redis:6379/1'  # 'redis' = service name in docker-compose

Time wasted: 4 hours the first time, 30 minutes every time after when I forgot

Q: Tasks run synchronously and defeat the whole fucking point

What happens: You queue a task and it executes immediately in the web process instead of a background worker. Defeats the entire purpose of using Celery.

Why it's broken: Someone set CELERY_TASK_ALWAYS_EAGER = True in Django settings. This makes tasks execute synchronously for "easier debugging" and everyone forgets to turn it off.

Fix:

## In settings.py - make sure this is False (or remove it entirely)
CELERY_TASK_ALWAYS_EAGER = False

How to check: Queue a slow task (like time.sleep(10)). If your web request blocks for 10 seconds, eager mode is on.

Time wasted: 2 hours wondering why performance didn't improve

Q: Redis keeps crashing and eating your queued tasks

What happens: The Redis container randomly restarts, all queued tasks vanish, workers can't connect, everything stops working.

Why it's broken: The default Redis config doesn't persist to disk. When the container restarts (OOM kill, deployment, AWS maintenance), everything in memory disappears.

Fix that actually works:

redis:
  image: redis:7-alpine
  command: redis-server --appendonly yes --save 60 1000 --maxmemory 400m --maxmemory-policy volatile-lru
  volumes:
    - redis_data:/data
  restart: unless-stopped

What this does:

  • --appendonly yes: Writes every command to disk (AOF persistence)
  • --save 60 1000: Backup to disk every 60 seconds if 1000+ keys changed
  • --maxmemory 400m: Prevents Redis from using unlimited memory
  • --maxmemory-policy volatile-lru: Evicts only keys with a TTL (cached results) when memory is full - never your queued tasks, which have no TTL

Still having crashes?: Check Docker logs - probably memory limits. Redis default is unlimited memory which triggers OOMKiller.

Q: Workers grow like cancer and get OOM killed

What happens: Workers start at 100MB, grow to 800MB over a few days, then Docker kills them with "Memory cgroup out of memory". Tasks start failing randomly.

Why it's broken: Python garbage collection isn't perfect. Workers accumulate memory leaks, especially with image processing, file handling, or database connections that don't close properly.

Fixes that work:

## Force worker recycling - nuclear option but it works
CELERY_WORKER_MAX_TASKS_PER_CHILD = 500  # Kill worker after 500 tasks

## Don't hoard tasks in memory
CELERY_WORKER_PREFETCH_MULTIPLIER = 1    # Only grab one task at a time

## Close database connections properly  
CELERY_TASK_ACKS_LATE = True             # Acknowledge only after completion

Docker memory limits (be generous or workers die randomly):

worker:
  deploy:
    resources:
      limits:
        memory: 1.5G  # Generous limit
      reservations:
        memory: 512M

Debug memory usage: docker stats to watch memory grow, docker exec worker ps aux to see individual processes

Q: ImportError hell - tasks can't find your Django models

What happens: Workers start fine, but tasks fail with ImportError: No module named 'myapp.models' or django.core.exceptions.ImproperlyConfigured.

Why it's broken: Worker containers have a different PYTHONPATH or DJANGO_SETTINGS_MODULE than web containers, so Python can't find your Django app code.

Fix: Make sure both web and worker containers have identical environments:

## celery.py - this file needs to be identical in both containers
import os
from celery import Celery

## Same settings module as web container
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')

app = Celery('myproject')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
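The standard companion to celery.py (straight from the Celery first-steps-with-Django docs) loads the app in the project package's __init__.py, so @shared_task functions bind to it whenever Django starts:

```python
# myproject/__init__.py - import the Celery app on Django startup
from .celery import app as celery_app

__all__ = ("celery_app",)
```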

Dockerfile (must be identical for web and worker):

WORKDIR /app
COPY . /app/
ENV PYTHONPATH=/app
ENV DJANGO_SETTINGS_MODULE=myproject.settings

Debug this: docker exec worker python -c "import myapp.models; print('works')" - should not fail

Nuclear option: If imports still break, add this to your task files:

import sys
sys.path.append('/app')
Q: PostgreSQL "too many connections" nightmare

What happens: The app works fine, you add 3 more Celery workers, and suddenly PostgreSQL starts rejecting connections with "FATAL: too many connections for role".

Why it's broken: PostgreSQL defaults to 100 connections total. Your web app uses 8 workers × 5 connections = 40. Add 6 Celery workers × 5 connections = 30 more. Plus monitoring, migrations, admin users. You hit the limit fast.

Fixes:

  1. Increase PostgreSQL connection limit (if you control the DB):
ALTER SYSTEM SET max_connections = 200;
SELECT pg_reload_conf();
  2. Limit how long connections live per Django process:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'CONN_MAX_AGE': 300,  # Reuse connections for 5 minutes
    }
}
Note: Django's postgres backend has no MAX_CONNS option - if you need a hard per-process cap, put a pooler like PgBouncer in front of the database.
  3. Close database connections in tasks (nuclear option):
from django.db import connections

@shared_task
def some_task():
    # Do work
    result = process_data()
    
    # Force close all connections
    connections.close_all()
    return result
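The connection math above is worth scripting as a back-of-envelope check before you scale workers. The numbers below mirror the example; the overhead figure is a rough guess, not a measured value:

```python
# back-of-envelope PostgreSQL connection budget (numbers from the story above)
web_workers = 8        # gunicorn workers
celery_workers = 6     # after adding 3 more
conns_per_process = 5  # typical connections held per Django process
overhead = 10          # rough guess: monitoring, migrations, psql sessions

total = (web_workers + celery_workers) * conns_per_process + overhead
print(total)  # 80 - uncomfortably close to the default max_connections = 100
```

If the total lands anywhere near max_connections, fix it before deploying, not after the first "FATAL: too many connections" page.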
Q: Tasks stuck in PENDING forever (the silent killer)

What happens: Tasks show as queued in monitoring, but never execute. No error messages, just infinite waiting.

Why it's broken: Workers aren't subscribed to the right queues, or task routing is fucked up.

Debug commands that actually help:

## Are workers alive?
docker exec worker celery -A myproject inspect ping

## What queues do workers listen to?
docker exec worker celery -A myproject inspect active_queues

## Any tasks actually running?
docker exec worker celery -A myproject inspect active

Common cause: Task routing config doesn't match worker queues:

## You routed tasks to 'emails' queue
CELERY_TASK_ROUTES = {
    'myapp.tasks.send_email': {'queue': 'emails'},
}

## But workers only listen to 'default' queue
## Fix: start workers with: celery -A myproject worker -Q default,emails
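One way to catch this mismatch before it bites is a quick sanity check comparing the queues your routes point at against the queues workers actually report. A sketch - worker_queues here is a hypothetical value you'd paste from the inspect active_queues output above:

```python
# sketch: find queues that tasks are routed to but no worker listens on
CELERY_TASK_ROUTES = {
    'myapp.tasks.send_email': {'queue': 'emails'},
}
worker_queues = {'default'}  # pasted from: celery -A myproject inspect active_queues

routed_queues = {route['queue'] for route in CELERY_TASK_ROUTES.values()}
orphaned = routed_queues - worker_queues
print(orphaned)  # anything printed here will sit in PENDING forever
```

Running this against the broken config above reports 'emails' as orphaned - exactly the queue whose tasks never execute.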
Q: Memory death spiral - Redis grows until server dies

What happens: Redis memory usage grows from 100MB to 2GB+ over weeks, eventually triggers the OOM killer, and everything stops working.

Why it's broken: Every task result gets stored in Redis, and with a long (or misconfigured) expiry, results pile up faster than they expire.

Fixes that work:

## Don't store results if you don't need them
CELERY_TASK_IGNORE_RESULT = True

## Or expire results aggressively (CELERY_TASK_RESULT_EXPIRES is the deprecated alias)
CELERY_RESULT_EXPIRES = 300  # 5 minutes

## Use separate Redis DB for results (can flush without losing queues)
CELERY_BROKER_URL = 'redis://redis:6379/0'      # Queue storage
CELERY_RESULT_BACKEND = 'redis://redis:6379/1'  # Result storage

Monitor Redis memory: redis-cli info memory shows used memory; redis-cli dbsize shows the key count
