Look, Gunicorn isn't going to win any "sexiest Python server" awards. But after deploying it dozens of times across everything from startups to enterprise systems, here's the brutal truth: it just works.
While other servers make you read 50-page configuration manuals or debug mysterious worker crashes at 3am, Gunicorn starts up, handles requests, and keeps running. That boring reliability is exactly why it's become the default choice for Python web deployment.
Let me walk you through what actually matters when you're trying to keep your Python web app running in production.
The Real Story About Workers
Gunicorn uses a pre-fork worker model - basically it spawns a bunch of worker processes that can handle requests. The master process just sits there managing the workers, restarting them when they crash (and they will crash eventually).
How the worker model works:
Master Process (PID 1000)
├── Worker 1 (PID 1001) - handles HTTP requests
├── Worker 2 (PID 1002) - handles HTTP requests
├── Worker 3 (PID 1003) - handles HTTP requests
└── Worker 4 (PID 1004) - handles HTTP requests
Start with (2 × cores) + 1 workers. If your server has 4 cores, try 9 workers. Your app might need more or fewer - there's no magic formula despite what everyone tells you. I've seen Django apps that needed 20 workers on a 4-core box because they were hitting the database constantly.
Pro tip: Workers eat memory. If you're running a memory-leaky Django app (and let's be honest, most are), restart workers every few thousand requests with --max-requests 5000. Your memory usage will thank you.
Pro tip: If workers are acting weird during restarts, add --max-requests-jitter 1000 to randomize restart timing. Prevents the thundering herd problem when all workers restart at the same time.
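Those flags can also live in a config file instead of the command line - Gunicorn config files are plain Python. Here's a minimal sketch pulling the tips above together; the numbers are the article's starting points, not gospel:

```python
# gunicorn.conf.py - a minimal sketch; tune every number for your own app.
import multiprocessing

def suggested_workers(cores):
    # The (2 x cores) + 1 starting point - a heuristic, not a law.
    return cores * 2 + 1

bind = "0.0.0.0:8000"
workers = suggested_workers(multiprocessing.cpu_count())
max_requests = 5000          # recycle each worker after ~5000 requests
max_requests_jitter = 1000   # stagger recycling to avoid a thundering herd
```

Gunicorn picks this file up automatically if it sits in the directory you launch from, or you can point at it with `gunicorn -c gunicorn.conf.py myapp.wsgi:application`.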
What Actually Breaks in Production
Now that you understand how workers function, let me save you some 3am debugging sessions by covering the real problems you'll hit in production:
- Memory creep: Workers slowly consume more RAM over time. This is usually your app's fault, not Gunicorn's. I spent 6 hours once tracking down a Pandas DataFrame that wasn't getting garbage collected.
- Worker timeouts: If your view takes longer than 30 seconds, the worker gets killed. Set --timeout 120 if you're doing heavy processing, but seriously, fix your slow code instead.
- Too many workers: Don't just throw more workers at performance problems. You'll run out of memory fast. I watched someone go from 8 to 40 workers and tank their server when they hit the 4GB RAM limit.
- Database connections: Each worker needs its own DB connection. With 20 workers, you need at least 20 database connections available. PostgreSQL defaults to 100 max connections, so do the math.
- File descriptor limits: On Ubuntu 20.04, the default ulimit -n is 1024. Hit that limit and new connections just fail silently. Bump it to 65536 in your systemd service file.
- The Docker Alpine trap: Alpine images sometimes break Gunicorn's signal handling. Your graceful restarts just... don't. Use debian-slim instead, or enjoy debugging why your containers ignore SIGTERM.
- Logs that lie: Gunicorn says "Listening at http://0.0.0.0:8000" but actually dies 5 seconds later because your app import failed. Always tail the logs during startup or you'll think it's working when it's not.
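For the file descriptor fix, the knob lives in the [Service] section of your unit file. A hypothetical unit - every path and name here is illustrative, not from a real deployment:

```ini
# /etc/systemd/system/myapp.service - hypothetical; adjust user and paths.
[Unit]
Description=Gunicorn for myapp
After=network.target

[Service]
User=www-data
WorkingDirectory=/srv/myapp
ExecStart=/srv/myapp/venv/bin/gunicorn myapp.wsgi:application --bind 0.0.0.0:8000
LimitNOFILE=65536
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Run `systemctl daemon-reload` after editing, then restart the service - systemd only reads unit files on reload.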
The nginx + Gunicorn Dance
Speaking of production problems, here's one you'll definitely hit: trying to use Gunicorn by itself. Don't do this. You absolutely need nginx (or similar) in front of Gunicorn. I learned this the hard way when Gunicorn tried to serve a 50MB video file and basically died.
The typical production setup:
Internet → Nginx (Port 80/443) → Gunicorn (Port 8000) → Django App
Nginx handles:
- Static files (CSS, JS, images)
- SSL termination
- Request buffering
- Gzip compression
Gunicorn handles:
- Your Python app
- That's it
This separation saved my ass when someone uploaded a 200MB PDF to our Django admin and shared the direct link on Reddit. Nginx served it just fine while Gunicorn kept handling API requests.
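A bare-bones nginx server block for that split might look like this - a sketch with assumed names (example.com, /srv/myapp/static/), not a production-hardened config:

```nginx
# Hypothetical nginx config - adjust server_name and paths for your setup.
server {
    listen 80;
    server_name example.com;

    # Nginx serves static files directly; Gunicorn never sees these requests
    location /static/ {
        alias /srv/myapp/static/;
    }

    # Everything else gets proxied to Gunicorn on port 8000
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

The X-Forwarded-* headers matter: without them your app sees every request as coming from 127.0.0.1 over plain HTTP.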
Framework Reality Check
Now that we've covered the infrastructure setup, let's talk about what Gunicorn actually works well with. The short answer: pretty much everything Python.
- Django: Just works, been using this combo for years
- Flask: Solid, no surprises
- FastAPI: It's an ASGI app, so Gunicorn's plain sync (and gevent/eventlet) workers can't run it - use an ASGI worker class like uvicorn.workers.UvicornWorker if you want decent performance
Don't expect miracles with async frameworks though. If your code does blocking stuff (like most Django ORM calls), async workers won't help much.
Real experience: I tried switching a Django app to async workers because "async is faster." Spent 3 days debugging why performance got worse before realizing the Django ORM was still blocking on every database call.
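For reference, here's roughly what the two async setups look like on the command line. The module paths (myapp.wsgi, myapp.main:app) are placeholders for your own app:

```shell
# WSGI app (Django/Flask) with gevent workers - only helps if your
# blocking calls are pure-Python I/O that gevent can monkey-patch:
pip install "gunicorn[gevent]"
gunicorn myapp.wsgi:application --worker-class gevent

# ASGI app (FastAPI) needs an ASGI worker class, which uvicorn provides:
pip install uvicorn
gunicorn myapp.main:app --worker-class uvicorn.workers.UvicornWorker
```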
Version Reality (As of September 2025)
Current version is around 23.x. Recent versions fixed some HTTP parsing bugs that could bite you in edge cases. I had one client stuck on 20.1.0 for months because they were scared of upgrading - turned out the newer versions fixed a memory leak they'd been working around.
Requires Python 3.7+, but seriously, if you're still on 3.7 in 2025, that's a bigger problem than choosing a web server.
Deployment pain: The jump from 21.x to 22.x broke some apps because of stricter HTTP parsing. Nothing wrong with the new behavior, but suddenly weird client requests that worked before started getting rejected. Check your logs after upgrading or you'll wonder why mobile apps started failing randomly.
When NOT to Use Gunicorn
Real talk: Sometimes you shouldn't use Gunicorn.
- WebSockets: Use something async-native like Uvicorn
- Massive file uploads: You'll have a bad time
- Windows: It doesn't run on Windows, period
- Tiny containers: The multi-process model might be overkill
My Deployment Script
This is what I actually run in production:
```shell
gunicorn myapp.wsgi:application \
    --bind 0.0.0.0:8000 \
    --workers 9 \
    --worker-class sync \
    --timeout 120 \
    --max-requests 5000 \
    --max-requests-jitter 1000 \
    --preload \
    --access-logfile - \
    --error-logfile -
```
The --preload flag loads your app before forking workers. Saves memory but means you can't gracefully reload code without a restart. I learned this when trying to deploy a hotfix at 2am and wondering why the code wasn't updating.
Memory tip: --preload cuts memory usage by about 30% for typical Django apps, but breaks code reloading. Pick your poison.
Bottom Line
After walking through workers, production gotchas, infrastructure requirements, and deployment scripts, here's what it all comes down to: Gunicorn is the Python server that just works.
Is it the fastest? No. Is it the most feature-rich? Definitely not. But it's reliable, well-documented, and boring in all the right ways. Instagram uses it to serve millions of users, so it's probably fine for whatever you're building.
Just remember the cardinal rule: don't serve static files with it unless you hate yourself and your users. Use nginx for static files instead.
Personal take: After 5 years of running Gunicorn in production, it's the boring choice that lets me sleep at night. Yeah, uWSGI might be 20% faster if you configure it right, but Gunicorn starts up and keeps working. Sometimes that's all you need.