Django Troubleshooting Guide - Fixing Production Disasters at 3 AM

Critical Django Production Failures (The Ones That Wake You Up)

Why does my Django app return "Bad Gateway" errors?

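In most cases the cause is Gunicorn workers dying from memory exhaustion, which leaves the reverse proxy with nothing to forward the request to. Check dmesg | grep -i "killed process" for workers killed by the kernel's OOM killer. Set memory limits in systemd (for example MemoryMax=) and watch per-worker memory with htop. The fix: reduce the number of Gunicorn workers, recycle them periodically, or increase server RAM.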
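A minimal sketch of a Gunicorn config file that applies the advice above, assuming Gunicorn is started with -c gunicorn.conf.py. The setting names are real Gunicorn settings; every value is illustrative and should be tuned against your own memory measurements.

```python
# gunicorn.conf.py - illustrative values, tune for your own server
import multiprocessing

# Fewer workers means less total memory. The common (2 x cores) + 1 rule of
# thumb is capped here because each worker holds a full copy of the Django app.
workers = min(multiprocessing.cpu_count() * 2 + 1, 4)

# Restart each worker after it has handled this many requests, which limits
# how much memory a slow leak can accumulate in any one process.
max_requests = 1000

# Random jitter so that all workers do not restart at the same moment.
max_requests_jitter = 100

# Seconds a worker may stay silent before Gunicorn kills and restarts it.
timeout = 30
```

Recycling workers hides a leak rather than fixing it, but it buys time while you track down the real cause.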

Django admin login always shows "CSRF verification failed"

CSRF_TRUSTED_ORIGINS is missing or misconfigured. Since Django 4.0, each entry must include the scheme: CSRF_TRUSTED_ORIGINS = ['https://yourdomain.com', 'https://www.yourdomain.com']. Also check that ALLOWED_HOSTS includes your domain. If the origin of the request is not trusted, Django rejects the POST even when the CSRF token itself is valid.

"OperationalError: database is locked" on SQLite production

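You're using SQLite in production with concurrent writes. SQLite allows only one writer at a time and locks the entire database while it writes. Switch to PostgreSQL or MySQL immediately. No real workaround exists for high-concurrency SQLite; raising the connection timeout in the database OPTIONS only makes writers wait longer before they fail.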

Django migrations fail with "column already exists"

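Migration state is out of sync with the actual database schema. Run python manage.py showmigrations to see what Django believes is applied. If the failing migration is an app's initial migration and its tables already exist, run python manage.py migrate --fake-initial to mark it as applied, then python manage.py migrate to apply the remaining changes. If a later migration fails because its column was added by hand, mark it as applied with python manage.py migrate <app_label> <migration_name> --fake. Faking a migration does not touch the schema, so confirm the database really matches first.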

Static files return 404 in production with DEBUG=False

STATIC_ROOT not configured or collectstatic not run. Set STATIC_ROOT = '/var/www/static/' and run python manage.py collectstatic. Configure web server to serve static files directly, not Django. Check static files deployment.

Django app consumes 100% CPU constantly

Infinite loop in views or bad database queries causing full table scans. Enable Django Debug Toolbar locally to inspect queries. In production, check SHOW PROCESSLIST on MySQL or pg_stat_activity on PostgreSQL for long-running queries.

"IntegrityError: duplicate key value violates unique constraint"

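Either a race condition between concurrent writes or an auto-increment sequence that has fallen behind the data (common after importing rows with explicit IDs). For races, use get_or_create() on fields backed by a unique constraint: it catches the IntegrityError from the losing writer and returns the existing row. For PostgreSQL, reset the sequence: SELECT setval('your_table_id_seq', (SELECT MAX(id) FROM your_table)); or let Django generate the statements with python manage.py sqlsequencereset <app_label>.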

Django template rendering extremely slow

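Template inheritance loops or missing template caching. Check for circular extends. Django enables the cached loader automatically unless you define OPTIONS['loaders'] yourself, so check whether a custom loaders list dropped it. If it did, wrap your loaders: TEMPLATES[0]['OPTIONS']['loaders'] = [('django.template.loaders.cached.Loader', [...])]. APP_DIRS must not be True when loaders is set.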
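For reference, a sketch of what an explicit cached-loader configuration could look like in settings.py. The loader and backend paths are Django's own; the empty DIRS list and the context processors are placeholders to replace with your project's values.

```python
# settings.py fragment (illustrative): cached loader wrapping the usual loaders
TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [],  # placeholder: add your project-level template directories
        # APP_DIRS is left unset on purpose: Django refuses to start when
        # APP_DIRS is True and "loaders" is also given.
        "OPTIONS": {
            "loaders": [
                (
                    "django.template.loaders.cached.Loader",
                    [
                        "django.template.loaders.filesystem.Loader",
                        "django.template.loaders.app_directories.Loader",
                    ],
                ),
            ],
            "context_processors": [
                "django.template.context_processors.request",
                "django.contrib.auth.context_processors.auth",
                "django.contrib.messages.context_processors.messages",
            ],
        },
    },
]
```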

Celery tasks fail with "No module named" errors

Python path differences between Django and Celery workers. Ensure DJANGO_SETTINGS_MODULE is set for Celery. Use absolute imports in tasks. Check if virtual environment is activated for worker processes.

Django sessions expire immediately

SESSION_COOKIE_AGE set to 0 or SESSION_EXPIRE_AT_BROWSER_CLOSE enabled. Check session backend configuration. For Redis sessions, verify Redis server is running and accessible from Django app.

"KeyError" with composite primary keys in Django 5.2

Composite primary keys were added in Django 5.2 and still have edge cases. If you get a KeyError during queries, check your get_or_create() calls: they need every field of the composite key specified. Django's ticket tracker has ongoing fixes for composite key edge cases.

Async views break with "SynchronousOnlyOperation" errors

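Calling the synchronous ORM from an async view raises SynchronousOnlyOperation, because Django refuses to run blocking database code on the event loop. Wrap sync calls with sync_to_async from asgiref.sync (or database_sync_to_async if you use Channels), or use the async ORM methods such as aget() and acreate() available since Django 4.1. The async documentation explains the gotchas; ignore it at your peril.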

Memory Leaks: The Silent App Killers

Django memory leaks don't announce themselves - they slowly consume RAM until your server runs out and the kernel starts killing processes. Here's how to catch them before they kill your app: QuerySet patterns that hold too much in memory, per-request monitoring with tracemalloc, garbage collection tuning, and heap-level leak hunting in production.

The Django Memory Leak You'll Actually Encounter

Forget the textbook examples. Here's the memory leak that brought down our production API serving 2 million requests per day:

## views.py - This innocent code consumed 8GB RAM in 6 hours
import csv

from django.http import HttpResponse

from .models import User  # assumed import path; use wherever your User model lives


def export_users(request):
    """Export all users to CSV - looks harmless, right?"""
    users = User.objects.all()  # Don't fucking do this

    response = HttpResponse(content_type='text/csv')
    writer = csv.writer(response)

    for user in users:  # Memory leak: entire queryset loaded into RAM
        writer.writerow([user.email, user.created_at])

    return response

What happened: User.objects.all() is lazy on its own, but iterating it evaluates the whole queryset and caches every model instance on it. With 500,000 users, that's 4-6GB of RAM. The server had 8GB total, so this single request killed everything else running.

The fix that actually works:

## views.py - Memory-efficient version that scales
import csv

from django.http import HttpResponse

from .models import User  # assumed import path; use wherever your User model lives


def export_users(request):
    """Export users without killing the server"""
    response = HttpResponse(content_type='text/csv')
    writer = csv.writer(response)

    # Iterator doesn't load everything into memory
    users = User.objects.all().iterator(chunk_size=1000)

    for user in users:
        writer.writerow([user.email, user.created_at])

    return response

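The iterator() method with chunk_size=1000 fetches 1000 records at a time and does not cache them on the queryset, so the memory used by model instances stays roughly constant regardless of dataset size. The CSV itself is still accumulated inside HttpResponse before it is sent; for very large exports, stream the rows with StreamingHttpResponse.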
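The view above still builds the whole CSV body in memory before sending it. A possible next step is to generate the CSV line by line. The sketch below shows only the generator part, using the pseudo-buffer pattern from Django's documentation on streaming large CSV files. The names Echo and stream_csv_rows and the header columns are illustrative, and the Django wiring is left in comments so the block runs on its own.

```python
import csv


class Echo:
    """Pseudo-buffer: write() hands the value back instead of storing it."""

    def write(self, value):
        return value


def stream_csv_rows(rows, header=("email", "created_at")):
    """Yield CSV text one line at a time instead of building the whole file."""
    writer = csv.writer(Echo())
    # writerow() returns whatever Echo.write() returned: the formatted line.
    yield writer.writerow(header)
    for row in rows:
        yield writer.writerow(row)


# In a Django view, the generator would be passed to StreamingHttpResponse:
#   rows = User.objects.values_list("email", "created_at").iterator(chunk_size=1000)
#   return StreamingHttpResponse(stream_csv_rows(rows), content_type="text/csv")
```

With this shape, neither the model instances nor the CSV text pile up in the worker, at the cost of not knowing the response length up front.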

QuerySet Memory Traps That Kill Production

Django's ORM makes it easy to write queries that consume ridiculous amounts of memory. Here are the patterns that will fuck you over:

The Prefetch That Ate Everything

## BAD - Loads 10,000 comments into memory per article
articles = Article.objects.prefetch_related('comments').all()
for article in articles:
    print(f"{article.title}: {article.comments.count()} comments")

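If you have 100 articles with 10,000 comments each, you just loaded 1 million comment objects into RAM only to count them. If the count is all you need, annotate it in SQL with Count('comments'). If you need a few related rows per article, the solution is Prefetch objects with limits (sliced querysets inside Prefetch require Django 4.2 or later):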

## GOOD - Limits memory usage with selective prefetching
from django.db.models import Prefetch

articles = Article.objects.prefetch_related(
    Prefetch(
        'comments',
        queryset=Comment.objects.order_by('-created_at')[:10],
        to_attr='recent_comments'
    )
).all()

for article in articles:
    print(f"{article.title}: {len(article.recent_comments)} recent comments")

The Innocent Aggregate That Wasn't

## This looks safe but it's not
from django.db.models import Count, Sum

user_stats = User.objects.annotate(
    total_orders=Count('orders'),
    total_spent=Sum('orders__total')
).all()

## Memory explodes when you iterate
for user in user_stats:  # Loads entire result set
    print(f"{user.email}: ${user.total_spent}")

The annotations are not the problem; the row count is. Iterating the queryset builds and caches one annotated User per row. Use iterator() for large annotated queries:

## Safe version that won't kill your server
user_stats = User.objects.annotate(
    total_orders=Count('orders'),
    total_spent=Sum('orders__total')
).iterator(chunk_size=500)

for user in user_stats:
    print(f"{user.email}: ${user.total_spent}")

Production Memory Monitoring That Actually Helps

Don't wait for your app to crash. Here's monitoring that catches memory issues early:

## Memory monitoring middleware for production
import logging
import tracemalloc

import psutil
from django.core.cache import cache


class MemoryMonitoringMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response
        # tracemalloc adds CPU and memory overhead to every allocation
        if not tracemalloc.is_tracing():
            tracemalloc.start()  # Enable memory tracking
        self.logger = logging.getLogger(__name__)
        self.process = psutil.Process()

    def __call__(self, request):
        # Snapshot memory before request (snapshots are slow; consider sampling)
        snapshot_before = tracemalloc.take_snapshot()
        memory_before = self.process.memory_info().rss

        response = self.get_response(request)

        # Check memory after request
        memory_after = self.process.memory_info().rss
        memory_delta = (memory_after - memory_before) / 1024 / 1024  # MB

        # Alert on suspicious memory usage
        if memory_delta > 100:  # More than 100MB per request
            snapshot_after = tracemalloc.take_snapshot()
            top_stats = snapshot_after.compare_to(snapshot_before, 'lineno')

            leak_info = []
            for stat in top_stats[:5]:
                size_mb = stat.size_diff / 1024 / 1024
                leak_info.append(f"  {size_mb:.1f}MB: {stat.traceback.format()[-1]}")

            self.logger.error(
                f"Memory leak detected: {request.path} used {memory_delta:.1f}MB\n"
                f"Top allocations:\n" + "\n".join(leak_info)
            )

        # Track memory trends
        if hasattr(request, 'user') and request.user.is_authenticated:
            cache_key = f"memory_usage_{request.user.id}_{request.path}"
            # Store memory usage for trend analysis
            cache.set(cache_key, memory_delta, 300)  # 5 minutes

        return response

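This logs the source lines that allocated the most memory whenever a single request grows the process by more than 100MB. Taking tracemalloc snapshots on every request is expensive, so run it on one worker or for a limited time rather than everywhere. RSS is measured per process, so with threaded workers a spike may belong to a concurrent request. Install it in MIDDLEWARE and check your logs for leak alerts.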

Garbage Collection Tuning for Django

Python's garbage collector can cause performance issues with large Django applications. Here's tuning that actually improves performance:

## settings.py - GC tuning for production Django
import gc
import os

## CPython's long-standing default is (700, 10, 10); check gc.get_threshold()
## on your interpreter before changing anything

## Raise the generation 0 threshold so collections run less often
## (helps with high request volumes, at the cost of garbage living longer)
if os.environ.get('DJANGO_ENV') == 'production':
    gc.set_threshold(1500, 15, 15)  # Less frequent GC; measure before and after

Monitor GC performance in production:

## Add to your monitoring
import gc
import logging

logger = logging.getLogger(__name__)


def log_gc_stats():
    """Log garbage collection statistics"""
    stats = gc.get_stats()  # One dict per generation; counters are cumulative
    logger.info(f"GC Stats: {stats}")

    # Check for excessive garbage collection
    if stats[0]['collections'] > 10000:  # Adjust threshold based on your app
        logger.warning("High GC activity detected - potential memory leak")

The Nuclear Option: Memory Leak Detection in Production

When everything else fails, here's the code that saved our production environment. It identified a Celery task that was leaking 200MB per execution:

## Advanced memory leak detector - use only when desperate
import logging
import tracemalloc

import objgraph

logger = logging.getLogger(__name__)


class MemoryLeakHunter:
    def __init__(self):
        self.baseline = None
        self.request_count = 0

    def take_baseline(self):
        """Call this after app warmup"""
        if not tracemalloc.is_tracing():
            tracemalloc.start()
        self.baseline = tracemalloc.take_snapshot()
        logger.info("Memory baseline established")

    def hunt_leaks(self, request):
        """Call this periodically to hunt for leaks"""
        self.request_count += 1

        if self.request_count % 100 == 0:  # Check every 100 requests
            current = tracemalloc.take_snapshot()

            if self.baseline:
                top_stats = current.compare_to(self.baseline, 'lineno')

                # Log top memory growth
                logger.info("Top memory growth since baseline:")
                for stat in top_stats[:10]:
                    size_mb = stat.size_diff / 1024 / 1024
                    logger.info(f"  {size_mb:+.1f}MB: {stat.traceback.format()[-1]}")

                # Check for concerning growth patterns
                total_growth = sum(stat.size_diff for stat in top_stats) / 1024 / 1024
                if total_growth > 500:  # More than 500MB growth
                    logger.error(f"MEMORY LEAK DETECTED: {total_growth:.1f}MB growth")

                    # Generate object growth report (objgraph prints to stdout)
                    objgraph.show_growth(limit=10)

This saved us when a third-party library was leaking Django model instances. The logs showed exactly which code was causing the growth, making it easy to isolate and fix.

Memory leaks in Django aren't mysterious - they're usually QuerySets loading too much data, circular references in model relationships, or third-party libraries not cleaning up properly. The key is monitoring memory usage per request and investigating anything that grows over time.

Database & Performance Disasters

Django app randomly returns "Connection refused" to database

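The database has run out of connections because too many are held open at once. Django does not pool connections by default; each worker thread keeps its own connection for CONN_MAX_AGE seconds. Check the CONN_MAX_AGE setting: if it is None, connections never close, and every worker thread holds one for its whole lifetime. Set it to 0 for development and 60-300 for production, and keep workers x threads below the database's connection limit. Monitor active connections with SELECT * FROM pg_stat_activity (PostgreSQL) or SHOW PROCESSLIST (MySQL).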
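As a rough sketch, the settings involved could look like the fragment below. CONN_MAX_AGE and CONN_HEALTH_CHECKS (Django 4.1+) are real settings. The environment variable names, the defaults, and the helper max_persistent_connections are assumptions made for illustration.

```python
# settings.py fragment (illustrative values)
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DB_NAME", "app"),        # hypothetical env var
        "HOST": os.environ.get("DB_HOST", "localhost"),  # hypothetical env var
        # Reuse each connection for up to 60 seconds, then close it.
        "CONN_MAX_AGE": 60,
        # Django 4.1+: verify a reused connection still works before using it.
        "CONN_HEALTH_CHECKS": True,
    }
}


def max_persistent_connections(workers, threads_per_worker):
    """Upper bound on connections the app can hold open at the same time."""
    return workers * threads_per_worker
```

For example, 4 workers with 8 threads each can hold up to 32 connections, which has to fit under the database's connection limit alongside every other client.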

"This field cannot be blank" on required=False fields

Form validation and model validation disagree. The form field has required=False, but the model field still has blank=False, so the ModelForm's model validation rejects the empty value. Both must match: either blank=True on the model (plus null=True for non-string columns) and required=False on the form, or required on both.

Django queries take 30+ seconds in production but fast locally

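Usually missing database indexes on fields you filter or order by, combined with production tables far larger than your local data. Django creates an index for every ForeignKey automatically, so look at the other columns in your WHERE and ORDER BY clauses. Run python manage.py dbshell, then EXPLAIN ANALYZE your slow queries, or call .explain(analyze=True) on the QuerySet. Look for "Seq Scan" (table scans) on large tables. Add indexes: class Meta: indexes = [models.Index(fields=['field_name'])].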

"BrokenPipeError: [Errno 32] Broken pipe" during file uploads

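Client disconnected during upload or the file exceeds a size limit somewhere in the chain. Check the web server's upload limit first (nginx client_max_body_size) along with proxy and Gunicorn timeouts. Note that FILE_UPLOAD_MAX_MEMORY_SIZE does not cap upload size; it only decides when an upload is spooled to a temporary file instead of held in memory. Handle disconnects with try/except around file processing and validate file sizes before processing.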

Django admin shows "ProgrammingError: relation does not exist"

Migrations not applied to database or migration state inconsistency. Run python manage.py showmigrations to see unapplied migrations. Apply with python manage.py migrate. If migration conflicts exist, resolve with merge migrations.

Celery tasks fail with "django.core.exceptions.ImproperlyConfigured"

Django settings are not loaded in the Celery worker process. In celery.py, set DJANGO_SETTINGS_MODULE with os.environ.setdefault() before creating the Celery app, and set the same variable in the environment the workers start from. If a script imports models outside the Celery app, call django.setup() before those imports.

"DisallowedHost at /" error in production

ALLOWED_HOSTS doesn't include your domain. Add your production domains: ALLOWED_HOSTS = ['yourdomain.com', 'www.yourdomain.com']. For load balancers, include internal IPs. Check HTTP_HOST header in request to see what Django is receiving.
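One way to keep ALLOWED_HOSTS in step with each environment is to read it from an environment variable. This is only a sketch: the variable name DJANGO_ALLOWED_HOSTS and the helper hosts_from_env are made up for illustration and are not Django settings.

```python
# settings.py fragment (illustrative)
import os


def hosts_from_env(name="DJANGO_ALLOWED_HOSTS", default=""):
    """Split a comma-separated environment variable into a clean host list."""
    raw = os.environ.get(name, default)
    return [host.strip() for host in raw.split(",") if host.strip()]


# Falls back to the two production domains when the variable is not set.
ALLOWED_HOSTS = hosts_from_env(default="yourdomain.com,www.yourdomain.com")
```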

Django tests pass but app fails with ImportError

Python path differences between test and runtime environments. Tests run from project root, but production may run from different directory. Use absolute imports and ensure PYTHONPATH includes your project directory. Check if virtual environment is activated.

"OSError: [Errno 28] No space left on device"

Server disk full from log files, media uploads, or database growth. Check disk usage with df -h and large files with du -sh *. Implement log rotation, media cleanup policies, and database maintenance. Monitor disk space in production monitoring.
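To catch this before writes start failing, a small check like the one below could run from cron or a health endpoint. It is a sketch using only the standard library; the function names and the 90% threshold are arbitrary choices, not a recommendation from any tool.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return how full the filesystem containing `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def is_disk_nearly_full(path="/", threshold_percent=90.0):
    """True when usage is at or above the threshold (90% is an arbitrary default)."""
    return disk_usage_percent(path) >= threshold_percent
```

The percentage can differ slightly from what df -h reports, because df excludes blocks reserved for root from its calculation.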

Django forms return "ManagementForm data is missing"

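The formset's management form was not rendered in the template, so the TOTAL_FORMS and INITIAL_FORMS fields never reach the server. Include {{ formset.management_form }} inside the form before rendering the formset forms. Check for JavaScript that adds or removes forms without updating TOTAL_FORMS, and make sure the formset prefix used in the view matches the one in the posted data.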

Django Error Categories & Debugging Approaches

Error Type	Typical Symptoms	Primary Cause	Debug Strategy	Production Impact
Memory Leaks	Gradual RAM increase, OOM-killed processes	QuerySet.all() loading large datasets, circular references	tracemalloc, memory profiling middleware	Server crashes, cascade failures
Database Deadlocks	"Lock wait timeout exceeded", hanging requests	Concurrent transactions, poor query patterns	SHOW ENGINE INNODB STATUS, query logging	Request timeouts, user-facing errors
Static File 404s	Missing CSS/JS, broken images	STATIC_ROOT misconfigured, collectstatic not run	Check STATIC_URL, web server config	Broken user experience
CSRF Failures	Form submissions rejected, "CSRF verification failed"	CSRF_TRUSTED_ORIGINS missing, token mismatch	Browser dev tools, Django debug page	Users can't submit forms
Migration Conflicts	"Migration not applied", schema inconsistency	Parallel development, manual DB changes	showmigrations, --fake flags	Database corruption, app startup failures
Template Errors	"TemplateDoesNotExist", rendering failures	Path issues, inheritance loops	Template debug mode, TEMPLATES setting	Blank pages, 500 errors
Import Errors	"No module named", AttributeError on startup	PYTHONPATH issues, missing dependencies	sys.path inspection, pip freeze	App won't start
Session Issues	Users logged out randomly, session data lost	SESSION_ENGINE misconfigured, Redis connectivity	Session backend logs, Redis monitoring	User logout loops, lost shopping carts
File Upload Failures	"Connection reset by peer", upload timeouts	Size limits, permission issues, disk space	Web server logs, disk usage monitoring	Users can't upload content
Celery Task Failures	Tasks stuck in PENDING, worker crashes	Django not initialized, memory exhaustion	Celery logs, worker monitoring	Background jobs don't complete
