Critical Django Production Failures (The Ones That Wake You Up)

Q

Why does my Django app return "Bad Gateway" errors?

A

The most common cause is Gunicorn workers dying from memory exhaustion, which leaves the web server with no upstream to answer. Check dmesg | grep -i "killed process" for processes killed by the kernel OOM killer. Set memory limits in systemd and monitor with htop. The fix: reduce the number of Gunicorn workers or increase server RAM.
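
One way to pick a worker count that fits in RAM is sketched below. It starts from the (2 x cores) + 1 rule of thumb in the Gunicorn docs and caps it by a memory budget. The function name, per_worker_mb, and reserved_mb are assumptions for illustration; measure your own per-worker footprint before trusting the number.

```python
def gunicorn_workers(cpu_cores: int, total_ram_mb: int,
                     per_worker_mb: int, reserved_mb: int = 512) -> int:
    """Suggest a worker count that fits in RAM (hypothetical helper).

    per_worker_mb is the measured resident size of one worker under load;
    reserved_mb is what you leave for the OS and other services.
    """
    by_cpu = 2 * cpu_cores + 1                              # Gunicorn's rule of thumb
    by_ram = (total_ram_mb - reserved_mb) // per_worker_mb  # how many workers fit
    return max(1, min(by_cpu, by_ram))


# 4 cores, 8GB RAM, workers that grow to about 1GB each
workers = gunicorn_workers(cpu_cores=4, total_ram_mb=8192, per_worker_mb=1024)
```

Here memory, not CPU, is the limit: the CPU rule suggests 9 workers, but only 7 fit.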

Q

Django admin login always shows "CSRF verification failed"

A

CSRF_TRUSTED_ORIGINS is missing or misconfigured. Since Django 4.0, every entry must include the scheme: CSRF_TRUSTED_ORIGINS = ['https://yourdomain.com', 'https://www.yourdomain.com']. Also check that ALLOWED_HOSTS includes your domain. Without a matching entry, Django rejects the request's Origin header and CSRF validation fails.
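
A quick sanity check for the setting might look like the sketch below. invalid_trusted_origins is a hypothetical helper, not part of Django, and restricting schemes to http and https is an assumption made here for simplicity.

```python
from urllib.parse import urlsplit


def invalid_trusted_origins(origins):
    """Return entries that lack a scheme or host, e.g. 'yourdomain.com'."""
    bad = []
    for origin in origins:
        parts = urlsplit(origin)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            bad.append(origin)
    return bad


candidates = ["https://yourdomain.com", "yourdomain.com", "https://*.yourdomain.com"]
bad_entries = invalid_trusted_origins(candidates)
print(bad_entries)  # prints ['yourdomain.com']
```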

Q

"OperationalError: database is locked" on SQLite production

A

You're using SQLite in production with concurrent writes. SQLite locks the entire database for writes. Switch to PostgreSQL or MySQL immediately. No workaround exists for high-concurrency SQLite.

Q

Django migrations fail with "column already exists"

A

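Migration state is out of sync with the actual database schema. If the tables were created outside Django's migration history, run python manage.py migrate --fake-initial, which marks an app's initial migration as applied when its tables already exist, then python manage.py migrate to apply the remaining changes. Compare python manage.py showmigrations against the real schema before faking anything else.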

Q

Static files return 404 in production with DEBUG=False

A

STATIC_ROOT is not configured or collectstatic was not run. Set STATIC_ROOT = '/var/www/static/' and run python manage.py collectstatic. Configure the web server to serve that directory directly: Django does not serve static files when DEBUG=False.

Q

Django app consumes 100% CPU constantly

A

Infinite loop in views or bad database queries causing full table scans. Enable Django Debug Toolbar locally to inspect queries. In production, check SHOW PROCESSLIST on MySQL or pg_stat_activity on PostgreSQL for long-running queries.

Q

"IntegrityError: duplicate key value violates unique constraint"

A

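Race condition in database writes or a broken auto-increment sequence. Use get_or_create() backed by a unique constraint: it is not atomic on its own, but when two requests race, the loser's insert fails and Django returns the existing row instead. For PostgreSQL, reset the sequence after a bulk import or manual insert: SELECT setval('table_id_seq', (SELECT MAX(id) FROM table));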
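
The idea behind a race-safe get-or-create can be sketched with the standard library's sqlite3 module. This is a simplified illustration, not Django's implementation; the subscriber table and helper name are made up. The key point is that the UNIQUE constraint, not a prior SELECT, decides who wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriber (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")


def get_or_create_subscriber(conn, email):
    """Return (row_id, created), relying on the UNIQUE constraint."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute("INSERT INTO subscriber (email) VALUES (?)", (email,))
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # Another writer inserted the same email first; return their row.
        row = conn.execute(
            "SELECT id FROM subscriber WHERE email = ?", (email,)
        ).fetchone()
        return row[0], False


first = get_or_create_subscriber(conn, "a@example.com")
second = get_or_create_subscriber(conn, "a@example.com")
```

Without the unique constraint, both racing inserts would succeed and you would get duplicates instead of an error.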

Q

Django template rendering extremely slow

A

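Template inheritance loops or missing template caching. Check for circular extends. Django enables the cached loader by default when you don't define loaders yourself; if you set OPTIONS['loaders'] explicitly, wrap them: TEMPLATES[0]['OPTIONS']['loaders'] = [('django.template.loaders.cached.Loader', [...])].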

Q

Celery tasks fail with "No module named" errors

A

Python path differences between Django and Celery workers. Ensure DJANGO_SETTINGS_MODULE is set for Celery. Use absolute imports in tasks. Check if virtual environment is activated for worker processes.

Q

Django sessions expire immediately

A

SESSION_COOKIE_AGE set to 0 or SESSION_EXPIRE_AT_BROWSER_CLOSE enabled. Check session backend configuration. For Redis sessions, verify Redis server is running and accessible from Django app.

Q

"KeyError" with composite primary keys in Django 5.2

A

Composite primary keys were added in Django 5.2 and still have edge cases.

If you get `KeyError` during queries, check your get_or_create() calls: they need every key field specified.

The Django issue tracker has ongoing fixes for composite key edge cases.

Q

Async views break with "SynchronousOnlyOperation" errors

A

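Calling the synchronous ORM directly from an async view raises SynchronousOnlyOperation.

Wrap ORM calls in `sync_to_async` from asgiref (or `database_sync_to_async` if you use Channels), or use the async ORM methods such as `aget()` and `acreate()` on Django 4.1+.

The async documentation explains the gotchas; ignore it at your peril.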
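
The underlying pattern is to push blocking work onto a worker thread so the event loop stays free. The sketch below uses the standard library's asyncio.to_thread as a stand-in; in Django, sync_to_async fills the same role and adds handling for thread-sensitive code. load_report and report_view are hypothetical names.

```python
import asyncio
import time


def load_report(user_id: int) -> dict:
    """Stand-in for a blocking call such as a synchronous ORM query."""
    time.sleep(0.05)
    return {"user_id": user_id, "rows": 3}


async def report_view(user_id: int) -> dict:
    # Run the blocking function in a worker thread instead of on the event loop.
    return await asyncio.to_thread(load_report, user_id)


result = asyncio.run(report_view(7))
```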

Memory Leaks: The Silent App Killers

Django memory leaks don't announce themselves: they slowly consume RAM until the server runs out and the kernel starts killing processes. Here's how to catch them before they kill your app, using QuerySet fixes, tracemalloc-based monitoring, garbage collection analysis, and object-growth reports.

The Django Memory Leak You'll Actually Encounter

Forget the textbook examples. Here's the memory leak that brought down our production API serving 2 million requests per day:

# views.py - This innocent code consumed 8GB RAM in 6 hours
import csv

from django.http import HttpResponse


def export_users(request):
    """Export all users to CSV - looks harmless, right?"""
    users = User.objects.all()  # Don't fucking do this

    response = HttpResponse(content_type='text/csv')
    writer = csv.writer(response)

    for user in users:  # Memory leak: entire queryset loaded into RAM
        writer.writerow([user.email, user.created_at])

    return response

What happened: User.objects.all() loads every user into memory at once. With 500,000 users, that's 4-6GB of RAM. The server had 8GB total, so this single request killed everything else running.

The fix that actually works:

# views.py - Memory-efficient version that scales
import csv

from django.http import HttpResponse


def export_users(request):
    """Export users without killing the server"""
    response = HttpResponse(content_type='text/csv')
    writer = csv.writer(response)

    # iterator() fetches rows in chunks instead of caching every instance
    users = User.objects.all().iterator(chunk_size=1000)

    for user in users:
        writer.writerow([user.email, user.created_at])

    return response

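The iterator() method with chunk_size=1000 fetches 1000 records at a time and skips the QuerySet result cache, so model instances no longer pile up in RAM. HttpResponse still buffers the generated CSV in memory, so for very large exports pair iterator() with StreamingHttpResponse.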
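
HttpResponse accumulates the whole CSV body before sending it, so the output itself can still grow large. A streaming variant can be sketched with a pseudo-buffer whose write() hands each line back instead of storing it. Echo and csv_rows are illustrative names, and the sample records stand in for a queryset iterator.

```python
import csv


class Echo:
    """Pseudo-buffer: write() returns the value instead of storing it."""

    def write(self, value):
        return value


def csv_rows(records):
    """Yield one CSV line per record; nothing accumulates in memory."""
    writer = csv.writer(Echo())
    yield writer.writerow(["email", "created_at"])
    for email, created_at in records:
        yield writer.writerow([email, created_at])


sample = [("a@example.com", "2024-01-01"), ("b@example.com", "2024-01-02")]
lines = list(csv_rows(sample))
```

In a view, the generator could be handed to StreamingHttpResponse(csv_rows(...), content_type='text/csv') so rows are sent as they are produced.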

QuerySet Memory Traps That Kill Production

Django's ORM makes it easy to write queries that consume ridiculous amounts of memory. Here are the patterns that will fuck you over:

The Prefetch That Ate Everything

# BAD - Loads 10,000 comments into memory per article
articles = Article.objects.prefetch_related('comments').all()
for article in articles:
    print(f"{article.title}: {article.comments.count()} comments")

If you have 100 articles with 10,000 comments each, you just loaded 1 million comment objects into RAM. The solution is Prefetch objects with limits (sliced querysets inside Prefetch require Django 4.2 or later):

# GOOD - Limits memory usage with selective prefetching
from django.db.models import Prefetch

articles = Article.objects.prefetch_related(
    Prefetch(
        'comments',
        queryset=Comment.objects.order_by('-created_at')[:10],
        to_attr='recent_comments'
    )
).all()

for article in articles:
    print(f"{article.title}: {len(article.recent_comments)} recent comments")

The Innocent Aggregate That Wasn't

# This looks safe but it's not
from django.db.models import Count, Sum

user_stats = User.objects.annotate(
    total_orders=Count('orders'),
    total_spent=Sum('orders__total')
).all()

# Memory explodes when you iterate
for user in user_stats:  # Loads entire result set
    print(f"{user.email}: ${user.total_spent}")

Annotations can create massive result sets. Use iterator() for large annotated queries:

# Safe version that won't kill your server
from django.db.models import Count, Sum

user_stats = User.objects.annotate(
    total_orders=Count('orders'),
    total_spent=Sum('orders__total')
).iterator(chunk_size=500)

for user in user_stats:
    print(f"{user.email}: ${user.total_spent}")
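
What chunked iteration buys you can be shown without Django. The sketch below uses sqlite3 and cursor.fetchmany() to pull 500 rows at a time while the caller still sees a plain row-by-row loop. iter_rows and the users table are made up for the example; the ORM's iterator() does something comparable internally.

```python
import sqlite3


def iter_rows(cursor, chunk_size=500):
    """Yield rows one at a time while fetching them chunk_size at a time."""
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            return
        yield from chunk


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, total_spent REAL)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(f"user{i}@example.com", i * 1.5) for i in range(1200)],
)

cursor = conn.execute("SELECT email, total_spent FROM users ORDER BY rowid")
row_count = 0
running_total = 0.0
for email, total_spent in iter_rows(cursor):
    row_count += 1
    running_total += total_spent  # only one chunk is held in memory at a time
```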

Production Memory Monitoring That Actually Helps

Don't wait for your app to crash. Here's monitoring that catches memory issues early:

# Memory monitoring middleware for production
import logging
import tracemalloc

import psutil
from django.core.cache import cache


class MemoryMonitoringMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response
        tracemalloc.start()  # Enable memory tracking
        self.logger = logging.getLogger(__name__)

    def __call__(self, request):
        # Snapshot memory before request
        snapshot_before = tracemalloc.take_snapshot()
        process = psutil.Process()
        memory_before = process.memory_info().rss

        response = self.get_response(request)

        # Check memory after request
        snapshot_after = tracemalloc.take_snapshot()
        memory_after = process.memory_info().rss

        memory_delta = (memory_after - memory_before) / 1024 / 1024  # MB

        # Alert on suspicious memory usage
        if memory_delta > 100:  # More than 100MB per request
            top_stats = snapshot_after.compare_to(snapshot_before, 'lineno')

            leak_info = []
            for stat in top_stats[:5]:
                size_mb = stat.size_diff / 1024 / 1024
                leak_info.append(f"  {size_mb:.1f}MB: {stat.traceback.format()[-1]}")

            self.logger.error(
                f"Memory leak detected: {request.path} used {memory_delta:.1f}MB\n"
                "Top allocations:\n" + "\n".join(leak_info)
            )

        # Track memory trends
        if hasattr(request, 'user') and request.user.is_authenticated:
            cache_key = f"memory_usage_{request.user.id}_{request.path}"
            # Store memory usage for trend analysis
            cache.set(cache_key, memory_delta, 300)  # 5 minutes

        return response

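This flags memory-hungry requests as they happen and logs the lines that allocated the most. Install it in MIDDLEWARE and check your logs for leak alerts. tracemalloc and per-request snapshots add real CPU and memory overhead, so enable it on one worker or a sample of traffic rather than everywhere.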
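
The snapshot comparison at the heart of that middleware can be tried on its own, outside Django. This is a minimal sketch: top_growth is a hypothetical helper, and the list of byte strings stands in for whatever your request handler keeps alive.

```python
import tracemalloc


def top_growth(before, after, limit=3):
    """Return (megabytes, 'file:line') pairs for the biggest allocation increases."""
    stats = after.compare_to(before, "lineno")
    return [
        (stat.size_diff / 1024 / 1024, str(stat.traceback[0]))
        for stat in stats[:limit]
    ]


tracemalloc.start()
before = tracemalloc.take_snapshot()
hoard = [bytes(1024) for _ in range(20_000)]  # roughly 20MB kept alive on purpose
after = tracemalloc.take_snapshot()
tracemalloc.stop()

growth = top_growth(before, after)
for size_mb, location in growth:
    print(f"{size_mb:+.1f}MB at {location}")
```

The first entry should point at the line that builds hoard, which is the kind of pointer you want when a real request misbehaves.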

Garbage Collection Tuning for Django

Python's garbage collector can cause performance issues with large Django applications. Here's tuning that actually improves performance:

# settings.py - GC tuning for production Django
import gc
import os

# CPython's long-standing default thresholds are (700, 10, 10);
# leave them alone in development

# For memory-intensive apps, raise the thresholds so collections run less often
if os.environ.get('DJANGO_ENV') == 'production':
    gc.set_threshold(1500, 15, 15)  # Less frequent GC, better performance

Monitor GC performance in production:

# Add to your monitoring
import gc
import logging

logger = logging.getLogger(__name__)


def log_gc_stats():
    """Log garbage collection statistics"""
    stats = gc.get_stats()
    logger.info(f"GC Stats: {stats}")

    # 'collections' is cumulative since process start, so compare it
    # between calls; adjust the threshold based on your app
    if stats[0]['collections'] > 10000:
        logger.warning("High GC activity detected - potential memory leak")
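
Because the counters from gc.get_stats() only ever grow, a fixed threshold trips eventually on any long-lived process. Comparing two samples is more informative. A minimal sketch, with hypothetical helper names:

```python
import gc


def collections_per_generation():
    """Cumulative collection counts, one entry per GC generation."""
    return [gen["collections"] for gen in gc.get_stats()]


def collections_delta(previous, current):
    """How many collections ran in each generation between two samples."""
    return [now - before for before, now in zip(previous, current)]


previous = collections_per_generation()
gc.collect()  # force a collection so the counters move
current = collections_per_generation()
delta = collections_delta(previous, current)
```

In production you would take a sample on a timer and alert on the rate of change, not the absolute count.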

The Nuclear Option: Memory Leak Detection in Production

When everything else fails, here's the code that saved our production environment. It identified a Celery task that was leaking 200MB per execution:

# Advanced memory leak detector - use only when desperate
import logging
import tracemalloc

import objgraph

logger = logging.getLogger(__name__)


class MemoryLeakHunter:
    def __init__(self):
        self.baseline = None
        self.request_count = 0

    def take_baseline(self):
        """Call this after app warmup"""
        tracemalloc.start()
        self.baseline = tracemalloc.take_snapshot()
        logger.info("Memory baseline established")

    def hunt_leaks(self, request):
        """Call this periodically to hunt for leaks"""
        self.request_count += 1

        if self.request_count % 100 == 0:  # Check every 100 requests
            current = tracemalloc.take_snapshot()

            if self.baseline:
                top_stats = current.compare_to(self.baseline, 'lineno')

                # Log top memory growth
                logger.info("Top memory growth since baseline:")
                for stat in top_stats[:10]:
                    size_mb = stat.size_diff / 1024 / 1024
                    logger.info(f"  {size_mb:+.1f}MB: {stat.traceback.format()[-1]}")

                # Check for concerning growth patterns
                total_growth = sum(stat.size_diff for stat in top_stats) / 1024 / 1024
                if total_growth > 500:  # More than 500MB growth
                    logger.error(f"MEMORY LEAK DETECTED: {total_growth:.1f}MB growth")

                    # Generate object growth report (printed to stdout)
                    objgraph.show_growth(limit=10)

This saved us when a third-party library was leaking Django model instances. The logs showed exactly which code was causing the growth, making it easy to isolate and fix.

Memory leaks in Django aren't mysterious - they're usually QuerySets loading too much data, circular references in model relationships, or third-party libraries not cleaning up properly. The key is monitoring memory usage per request and investigating anything that grows over time.

Database & Performance Disasters

Q

Django app randomly returns "Connection refused" to database

A

Database connections are exhausted, usually because too many workers each hold a persistent connection. Check the CONN_MAX_AGE setting: if it is None, connections never close and every worker or thread keeps one open. Set it to 0 for development and 60-300 for production, and keep workers x threads below the database's max_connections. Monitor active connections with SELECT * FROM pg_stat_activity (PostgreSQL) or SHOW PROCESSLIST (MySQL).
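
A settings fragment along these lines keeps connections bounded. Treat it as a sketch: the environment variable names and defaults are placeholders, and CONN_HEALTH_CHECKS requires Django 4.1 or later.

```python
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DB_NAME", "app"),        # hypothetical env var
        "HOST": os.environ.get("DB_HOST", "localhost"),  # hypothetical env var
        "CONN_MAX_AGE": 60,          # reuse a connection for up to 60 seconds
        "CONN_HEALTH_CHECKS": True,  # Django 4.1+: verify the connection before reuse
    }
}
```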

Q

"This field cannot be blank" on required=False fields

A

Form validation vs model validation mismatch. The model field has blank=False but the form field has required=False, so the form accepts an empty value and model validation then rejects it. Both must match: either blank=True on the model (plus null=True for non-string fields) AND required=False on the form, or both required.

Q

Django queries take 30+ seconds in production but fast locally

A

Missing database indexes on filtered or ordered fields, which only hurts once production tables are large. Django indexes ForeignKey columns automatically, but not the other fields you filter on. Run python manage.py dbshell, then EXPLAIN ANALYZE your slow queries. Look for "Seq Scan" (full table scans) on large tables. Add indexes: class Meta: indexes = [models.Index(fields=['field_name'])].
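
The effect of an index on a query plan is easy to see in miniature. The sketch below uses SQLite's EXPLAIN QUERY PLAN from the standard library rather than PostgreSQL's EXPLAIN ANALYZE, so the wording of the plan differs, but the scan-versus-index distinction is the same. The orders table and index name are made up.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_email TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_email, total) VALUES (?, ?)",
    [(f"user{i % 100}@example.com", float(i)) for i in range(1000)],
)

sql = "SELECT * FROM orders WHERE customer_email = ?"
plan_before = query_plan(conn, sql, ("user7@example.com",))  # full table scan
conn.execute("CREATE INDEX idx_orders_email ON orders (customer_email)")
plan_after = query_plan(conn, sql, ("user7@example.com",))   # index lookup

print(plan_before)
print(plan_after)
```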

Q

"BrokenPipeError: [Errno 32] Broken pipe" during file uploads

A

Client disconnected during upload or the file size exceeds a limit. Check web server upload limits (nginx client_max_body_size) and proxy timeouts first; FILE_UPLOAD_MAX_MEMORY_SIZE only controls when Django spools an upload to disk instead of holding it in memory. Handle with try/except around file processing and validate file sizes before processing.

Q

Django admin shows "Programming Error: relation does not exist"

A

Migrations not applied to the database, or migration state is inconsistent. Run python manage.py showmigrations to see unapplied migrations. Apply with python manage.py migrate. If two branches added conflicting migrations, resolve them with python manage.py makemigrations --merge.

Q

Celery tasks fail with "django.core.exceptions.ImproperlyConfigured"

A

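Django is not initialized in the Celery worker process. In your celery.py, set DJANGO_SETTINGS_MODULE with os.environ.setdefault() before creating the Celery app, then call app.config_from_object('django.conf:settings', namespace='CELERY') and app.autodiscover_tasks(). In standalone scripts that import models, call django.setup() first.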

Q

"DisallowedHost at /" error in production

A

ALLOWED_HOSTS doesn't include your domain. Add your production domains: ALLOWED_HOSTS = ['yourdomain.com', 'www.yourdomain.com']. For load balancers, include internal IPs. Check HTTP_HOST header in request to see what Django is receiving.
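
The matching rules are stricter than people expect, which is why www.yourdomain.com needs its own entry. The function below is a simplified re-implementation of the documented rules for illustration only; it is not Django's code and ignores IPv6 literals.

```python
def host_allowed(host: str, allowed_hosts: list[str]) -> bool:
    """Simplified sketch of the documented ALLOWED_HOSTS matching rules."""
    host = host.lower().split(":")[0]  # drop the port; IPv6 literals not handled
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*" or pattern == host:
            return True
        # A leading dot matches the domain itself and any subdomain.
        if pattern.startswith(".") and (host.endswith(pattern) or host == pattern[1:]):
            return True
    return False


allowed = ["yourdomain.com", ".internal.example"]
exact_match = host_allowed("yourdomain.com", allowed)
www_match = host_allowed("www.yourdomain.com", allowed)
subdomain_match = host_allowed("api.internal.example:8000", allowed)
```

Only the leading-dot form covers subdomains, so www_match is False here while subdomain_match is True.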

Q

Django tests pass but app fails with ImportError

A

Python path differences between test and runtime environments. Tests run from project root, but production may run from different directory. Use absolute imports and ensure PYTHONPATH includes your project directory. Check if virtual environment is activated.
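
When an import works in one environment and not another, print what the failing process actually sees. A small diagnostic sketch, meant to be run with the same interpreter and working directory as the failing process; diagnose_import is a hypothetical helper for top-level module names.

```python
import importlib.util
import sys


def diagnose_import(module_name: str) -> str:
    """Describe where a top-level module would be imported from, if anywhere."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return f"{module_name}: NOT FOUND (sys.path has {len(sys.path)} entries)"
    return f"{module_name}: {spec.origin}"


print(sys.executable)  # which interpreter / virtualenv is running
print(diagnose_import("json"))
print(diagnose_import("surely_missing_module_xyz"))
```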

Q

"OSError: [Errno 28] No space left on device"

A

Server disk full from log files, media uploads, or database growth. Check disk usage with df -h and large files with du -sh *. Implement log rotation, media cleanup policies, and database maintenance. Monitor disk space in production monitoring.
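
A disk check is cheap enough to run from a cron job or health endpoint. The sketch below uses shutil.disk_usage from the standard library; the function names and the 90% threshold are arbitrary choices for illustration.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def disk_alert(path: str = "/", threshold: float = 90.0) -> bool:
    """True when usage is at or above the threshold (90% is an arbitrary default)."""
    return disk_usage_percent(path) >= threshold


percent = disk_usage_percent(".")
print(f"Disk usage: {percent:.1f}%")
```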

Q

Django forms return "ManagementForm data is missing"

A

Formset management form not included in template or CSRF token issues. Include {{ formset.management_form }} in template before rendering formset forms. Check for JavaScript that modifies form without updating management form data. Use formset validation.

Django Error Categories & Debugging Approaches

| Error Type | Typical Symptoms | Primary Cause | Debug Strategy | Production Impact |
|---|---|---|---|---|
| Memory Leaks | Gradual RAM increase, OOMKilled processes | QuerySet.all() loading large datasets, circular references | tracemalloc, memory profiling middleware | Server crashes, cascade failures |
| Database Deadlocks | "Lock wait timeout exceeded", hanging requests | Concurrent transactions, poor query patterns | SHOW ENGINE INNODB STATUS, query logging | Request timeouts, user-facing errors |
| Static File 404s | Missing CSS/JS, broken images | STATIC_ROOT misconfigured, collectstatic not run | Check STATIC_URL, web server config | Broken user experience |
| CSRF Failures | Form submissions rejected, "CSRF verification failed" | CSRF_TRUSTED_ORIGINS missing, token mismatch | Browser dev tools, Django debug page | Users can't submit forms |
| Migration Conflicts | "Migration not applied", schema inconsistency | Parallel development, manual DB changes | showmigrations, --fake flags | Database corruption, app startup failures |
| Template Errors | "TemplateDoesNotExist", rendering failures | Path issues, inheritance loops | Template debug mode, TEMPLATES setting | Blank pages, 500 errors |
| Import Errors | "No module named", AttributeError on startup | PYTHONPATH issues, missing dependencies | sys.path inspection, pip freeze | App won't start |
| Session Issues | Users logged out randomly, session data lost | SESSION_ENGINE misconfigured, Redis connectivity | Session backend logs, Redis monitoring | User logout loops, lost shopping carts |
| File Upload Failures | "Connection reset by peer", upload timeouts | Size limits, permission issues, disk space | Web server logs, disk usage monitoring | Users can't upload content |
| Celery Task Failures | Tasks stuck in PENDING, worker crashes | Django not initialized, memory exhaustion | Celery logs, worker monitoring | Background jobs don't complete |
