PostgreSQL Performance Optimization - AI Knowledge Base
Memory Configuration
Shared Buffers Configuration
- Standard recommendation: 25% of RAM (the starting value the PostgreSQL documentation still suggests)
- Production reality: On large-memory systems this competes with the OS page cache, which PostgreSQL also relies on for reads
- Actual working configuration:
  - 8-16GB RAM: 25% acceptable (2-4GB)
  - 32GB+ RAM: 15-20% maximum (about 5-6GB on a 32GB system)
  - High-write workloads: 10-15%
- Critical failure point: Above 8GB shared_buffers unless the system has more than 64GB RAM
- Real-world impact: Dropping from 16GB to 8GB shared_buffers improved performance 40% on a 64GB system
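The sizing guidance above can be checked and applied from a SQL session. This is a minimal sketch, assuming superuser access and a 32GB host; the 6GB figure is only the 32GB example from this section, not a universal value.

```sql
-- Inspect the value currently in effect
SHOW shared_buffers;

-- Persist a new value to postgresql.auto.conf (assumed: 32GB RAM host)
ALTER SYSTEM SET shared_buffers = '6GB';

-- shared_buffers only changes at server start. After restarting,
-- confirm that no restart is still pending for it.
SELECT name, setting, unit, pending_restart
FROM pg_settings
WHERE name = 'shared_buffers';
```

The `setting` column is reported in the unit shown in `unit` (8kB blocks on a default build), so `SHOW shared_buffers` is the easier way to read the value.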
Work Memory Settings
- Per-operation multiplier: work_mem is a limit per sort or hash operation, not per connection, so one connection can use work_mem × number of such operations in its query
- Danger zone: Above 64MB without understanding query patterns
- Production settings:
  - OLTP workloads: 2-4MB
  - Analytics/reporting: 16-64MB
  - Mixed workloads: 8-16MB
- Memory calculation: Real memory per connection = 4MB + (work_mem × avg_operations) + temp_buffers
- Failure scenario: 256MB work_mem with concurrent reporting queries = OOM crashes
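As a worked instance of the memory calculation, the sketch below plugs illustrative numbers into the formula and then shows one way to give a single reporting transaction more work_mem without raising the server-wide default. The inputs (8MB work_mem, 3 operations, 8MB temp_buffers) are assumptions chosen for the example.

```sql
-- Per-connection estimate in MB:
-- 4MB base + (8MB work_mem x 3 operations) + 8MB temp_buffers
SELECT 4 + (8 * 3) + 8 AS est_mb_per_connection;  -- 36

-- Raise work_mem for one reporting transaction only.
-- SET LOCAL reverts automatically at COMMIT or ROLLBACK.
BEGIN;
SET LOCAL work_mem = '64MB';
-- ... run the reporting query here ...
COMMIT;
```

Scoping the larger value to a transaction keeps the OLTP default small while still letting known-heavy queries sort in memory.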
Connection Memory Overhead
- Base cost per connection: 2-4MB
- Additional costs: work_mem multipliers, temp_buffers (8MB default)
- Breaking point: Above 100 connections without pooling
- Real calculation example:
  - 100 connections × 40MB each = 4GB before storing data
  - 500 connections × 40MB each = 20GB overhead
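To see where a running server stands against these numbers, a query along the following lines counts client sessions and applies the 40MB-per-connection figure from the example. The 40MB multiplier is this section's illustrative estimate, not a value PostgreSQL reports; `backend_type` assumes PostgreSQL 10 or later.

```sql
-- Count client sessions and estimate their memory overhead
-- using the 40MB-per-connection figure from the example above
SELECT count(*)                                     AS client_connections,
       count(*) FILTER (WHERE state = 'idle')       AS idle,
       count(*) FILTER (WHERE state = 'active')     AS active,
       pg_size_pretty(count(*) * 40 * 1024 * 1024)  AS est_overhead
FROM pg_stat_activity
WHERE backend_type = 'client backend';
```

A large `idle` count relative to `active` is usually the signal that a pooler would reclaim most of that overhead.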
Maintenance Work Memory
- Default inadequacy: 64MB too small for production
- Production setting: 5-10% of RAM or 1-2GB maximum
- Performance impact: 3x faster index creation with 2GB vs 64MB
- Separate autovacuum setting: autovacuum_work_mem = 1GB
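One way to use these settings is to raise maintenance_work_mem only in the session that builds an index, and to set autovacuum_work_mem server-wide. The sketch assumes superuser access; the `orders` table and index name are hypothetical.

```sql
-- Hypothetical table used for illustration
CREATE TABLE IF NOT EXISTS orders (
    id         bigint PRIMARY KEY,
    user_id    bigint NOT NULL,
    status     text NOT NULL,
    total      numeric(12,2) NOT NULL,
    created_at timestamptz NOT NULL
);

-- Session-level override for one index build.
-- CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
SET maintenance_work_mem = '2GB';
CREATE INDEX CONCURRENTLY orders_created_at_idx ON orders (created_at);
RESET maintenance_work_mem;

-- Server-wide autovacuum memory; takes effect on reload, no restart needed
ALTER SYSTEM SET autovacuum_work_mem = '1GB';
SELECT pg_reload_conf();
```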
Query Performance and Indexing
Index Type Selection Matrix
Index Type | Use Case | Size Overhead | Write Impact | Query Types |
---|---|---|---|---|
B-tree | General purpose (80% of cases) | 1x | Low | Equality, ranges, ORDER BY |
GIN | JSONB, arrays, full-text | 3-5x | High | Containment (@>), array ops |
GiST | Geometric, nearest-neighbor | 2-3x | Medium | PostGIS, similarity |
BRIN | Time-series (ordered data) | 0.001x | Very low | Range queries on ordered data |
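The matrix above maps onto `CREATE INDEX ... USING` as sketched below. The `events` table and its columns are hypothetical, invented only to give each index type a suitable column.

```sql
-- Hypothetical table with one column suited to each index type
CREATE TABLE IF NOT EXISTS events (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id    bigint NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    payload    jsonb NOT NULL,
    location   point
);

-- B-tree (the default): equality, ranges, ORDER BY
CREATE INDEX events_user_id_idx ON events (user_id);

-- GIN: JSONB containment, e.g. WHERE payload @> '{"type": "click"}'
CREATE INDEX events_payload_gin ON events USING gin (payload);

-- GiST: nearest-neighbor, e.g. ORDER BY location <-> point '(0,0)'
CREATE INDEX events_location_gist ON events USING gist (location);

-- BRIN: range scans on data inserted in created_at order
CREATE INDEX events_created_at_brin ON events USING brin (created_at);
```

BRIN only pays off when the physical row order tracks the indexed column, which is why it is listed for append-only time-series data.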
Critical Query Patterns That Cause Failures
N+1 Query Problem:
- Symptom: One query per user/record in loops
- Impact: Linear performance degradation with data growth
- Solution: Fetch all rows in a single query (a JOIN, or a window function when you need the latest or top-N row per group)
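As an illustration of the single-query approach, the sketch below fetches the most recent order for a batch of users in one statement instead of one query per user. The `orders` table and the user IDs are hypothetical.

```sql
-- Hypothetical table used for illustration
CREATE TABLE IF NOT EXISTS orders (
    id         bigint PRIMARY KEY,
    user_id    bigint NOT NULL,
    status     text NOT NULL,
    total      numeric(12,2) NOT NULL,
    created_at timestamptz NOT NULL
);

-- One round trip: rank each user's orders newest-first, keep rank 1
SELECT user_id, id AS order_id, total, created_at
FROM (
    SELECT o.*,
           row_number() OVER (PARTITION BY o.user_id
                              ORDER BY o.created_at DESC, o.id DESC) AS rn
    FROM orders o
    WHERE o.user_id = ANY (ARRAY[101, 102, 103]::bigint[])
) ranked
WHERE rn = 1;
```

The application passes the whole list of IDs as one array parameter, so the number of queries stays constant as the list grows.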
Large OFFSET Pagination:
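- Failure point: OFFSET >10,000 becomes unusably slow
- Impact: PostgreSQL must read and discard every skipped row before returning the page
- Solution: Cursor-based (keyset) pagination: filter with a WHERE condition on the last row seen instead of using OFFSET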
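A keyset version of a paginated listing might look like the following sketch. The `orders` table is hypothetical, and the literal timestamp and id stand in for the sort key of the last row on the previous page.

```sql
-- Hypothetical table used for illustration
CREATE TABLE IF NOT EXISTS orders (
    id         bigint PRIMARY KEY,
    user_id    bigint NOT NULL,
    status     text NOT NULL,
    total      numeric(12,2) NOT NULL,
    created_at timestamptz NOT NULL
);

-- Index matching the sort order; id breaks ties between equal timestamps
CREATE INDEX IF NOT EXISTS orders_created_id_idx ON orders (created_at, id);

-- Next page: rows strictly "after" the last row already shown.
-- The row comparison lets the index seek directly to the start of the page.
SELECT id, created_at, total
FROM orders
WHERE (created_at, id) < ('2024-01-15 10:30:00+00'::timestamptz, 98765)
ORDER BY created_at DESC, id DESC
LIMIT 50;
```

The cost of each page stays roughly constant regardless of depth; the trade-off is that clients can only move to the next or previous page, not jump to an arbitrary page number.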
OR Conditions Breaking Indexes:
- Problem: WHERE col1 = ? OR col2 = ? often cannot use indexes efficiently
- Solution: UNION ALL with separate indexed queries (exclude rows already matched by the first branch, or use UNION, to avoid duplicates)
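The rewrite can be sketched as follows, assuming a hypothetical `customers` table with a separate index on each filtered column. The second branch excludes rows the first branch already returned, so UNION ALL does not produce duplicates.

```sql
-- Hypothetical table and indexes used for illustration
CREATE TABLE IF NOT EXISTS customers (
    id    bigint PRIMARY KEY,
    email text,
    phone text
);
CREATE INDEX IF NOT EXISTS customers_email_idx ON customers (email);
CREATE INDEX IF NOT EXISTS customers_phone_idx ON customers (phone);

-- Equivalent to: WHERE email = 'a@example.com' OR phone = '555-0100'
SELECT id, email, phone
FROM customers
WHERE email = 'a@example.com'
UNION ALL
SELECT id, email, phone
FROM customers
WHERE phone = '555-0100'
  AND email IS DISTINCT FROM 'a@example.com';  -- skip rows from branch 1
```

`IS DISTINCT FROM` is used instead of `<>` so that rows with a NULL email are still returned by the second branch. It is worth comparing both forms with EXPLAIN ANALYZE, since the planner can sometimes serve the OR form with a bitmap OR of the two indexes.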
Index Design Principles
- Multi-column order: Most selective columns first (equality-filtered columns before range-filtered ones)
- Partial indexes: 5-10x smaller for filtered queries
- Index prefixes: PostgreSQL can use left prefixes of multi-column indexes
- Over-indexing threshold: >6 indexes per table kills write performance
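These principles translate into DDL and a catalog query roughly as shown below. The `orders` table, its `status` values, and the index names are hypothetical.

```sql
-- Hypothetical table used for illustration
CREATE TABLE IF NOT EXISTS orders (
    id         bigint PRIMARY KEY,
    user_id    bigint NOT NULL,
    status     text NOT NULL,
    total      numeric(12,2) NOT NULL,
    created_at timestamptz NOT NULL
);

-- Partial index: covers only the rows the hot query filters on
CREATE INDEX IF NOT EXISTS orders_pending_idx
    ON orders (created_at) WHERE status = 'pending';

-- Multi-column index: a query filtering on user_id alone can use its left prefix
CREATE INDEX IF NOT EXISTS orders_user_created_idx
    ON orders (user_id, created_at);

-- Over-indexing check: indexes never scanned since statistics were last reset
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

Indexes that back primary key or unique constraints can legitimately show zero scans, so the last query produces candidates to review rather than a drop list.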
Performance Monitoring and Alerting
Critical Metrics and Thresholds
Metric | Warning | Critical | Impact if Exceeded |
---|---|---|---|
Buffer Hit Ratio | <97% | <95% | 5x slower queries from disk I/O |
Connection Usage | >80% | >90% | Connection exhaustion errors |
Dead Tuple Ratio | >15% | >30% | Exponential query slowdown |
Query Duration | >500ms avg | >1000ms avg | Application timeouts |
Disk Space | <20% free | <10% free | Database shutdown |
Lock Wait Time | >30s | >60s | Application deadlocks |
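Two of the metrics in the table can be read straight from `pg_stat_activity`. The queries below are a sketch for PostgreSQL 10 or later; the elapsed time in the second query is measured from statement start, so it is an upper bound on the actual lock wait.

```sql
-- Connection usage as a percentage of max_connections
SELECT count(*) AS connections,
       current_setting('max_connections')::int AS max_connections,
       round(100.0 * count(*) / current_setting('max_connections')::int, 1) AS pct_used
FROM pg_stat_activity
WHERE backend_type = 'client backend';

-- Sessions currently blocked on a lock, longest-running statement first
SELECT pid,
       now() - query_start AS running_for,
       left(query, 80)     AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY running_for DESC;
```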
Buffer Hit Ratio Monitoring
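-- Target: >95% consistently
-- nullif avoids a division-by-zero error before any blocks have been read
SELECT round(sum(heap_blks_hit) * 100 / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) AS buffer_hit_pct
FROM pg_statio_user_tables;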
- Performance impact: Drop from 99% to 92% = 5x slower queries
- Root causes: Insufficient shared_buffers, missing indexes, inadequate RAM
Autovacuum Failure Detection
-- Alert on >15% dead tuples
-- pg_stat_user_tables names the table column relname; round(x, n) needs numeric
SELECT schemaname, relname,
       round(n_dead_tup::numeric / (n_dead_tup + n_live_tup) * 100, 1) AS dead_percentage
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000;
- Failure progression: 100ms queries become 10s queries due to bloat
- Emergency fix: Manual VACUUM (runs alongside normal reads and writes but adds I/O load; only VACUUM FULL takes an exclusive lock and needs a maintenance window)
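Once a bloated table is identified, the follow-up usually has three parts: check when autovacuum last ran, clean the table manually, and make autovacuum trigger earlier on it. The sketch below assumes a hypothetical `orders` table; the 0.02 scale factor is an illustrative choice.

```sql
-- Hypothetical table used for illustration
CREATE TABLE IF NOT EXISTS orders (
    id         bigint PRIMARY KEY,
    user_id    bigint NOT NULL,
    status     text NOT NULL,
    total      numeric(12,2) NOT NULL,
    created_at timestamptz NOT NULL
);

-- When was each table last vacuumed, manually or by autovacuum?
SELECT relname, last_vacuum, last_autovacuum, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;

-- Manual cleanup of one table, refreshing planner statistics as well
VACUUM (VERBOSE, ANALYZE) orders;

-- Trigger autovacuum at about 2% dead rows instead of the default 20%
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.02);
```

A `last_autovacuum` that stays NULL or very old on a busy table often points to a long-running transaction or a stuck replication slot holding back cleanup.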
Production Configuration Templates
32GB RAM System (Mixed Workload)
shared_buffers = 6GB
work_mem = 8MB
maintenance_work_mem = 2GB
effective_cache_size = 24GB
max_connections = 200 # Requires connection pooling
wal_buffers = 32MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
Connection Pooling (PgBouncer)
pool_mode = transaction # Not session
max_client_conn = 300
default_pool_size = 25
reserve_pool_size = 5
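- Critical: Use transaction pooling, not session pooling, unless the application depends on session state (SET, advisory locks, LISTEN/NOTIFY), which does not carry across transactions in this mode
- Breaking point: Session pooling holds a server connection for each client's entire session, so idle clients exhaust the pool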
Scaling Decision Matrix
Current State | Next Action | Resource Cost | Complexity | Expected Gain |
---|---|---|---|---|
<64GB RAM, CPU bound | Vertical scaling | Low (if budget) | Low | 2-3x performance |
>70% reads | Read replicas | Medium | Medium | Offload 70% of queries |
Single table >10TB | Partitioning | High | High | 10x query performance |
>25k writes/sec | Sharding | Very High | Very High | Horizontal scale (3x engineering cost) |
Storage I/O bound | SSD upgrade | Low | Low | 2x performance boost |
Failure Scenarios and Recovery
Connection Exhaustion
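- Symptom: "FATAL: sorry, too many clients already" or "FATAL: remaining connection slots are reserved for ..." (the wording after "for" varies by PostgreSQL version)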
- Immediate fix: Kill idle connections, implement pooling
- Prevention: Alert at 80% connection usage
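The immediate fix can be sketched as two steps: list the idle sessions first, then terminate them. The 10-minute threshold is an assumption for illustration, and terminating sessions requires superuser rights or membership in `pg_signal_backend`.

```sql
-- Step 1: review client sessions idle for more than 10 minutes
SELECT pid, usename, application_name, state_change
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND state = 'idle'
  AND state_change < now() - interval '10 minutes'
  AND pid <> pg_backend_pid();

-- Step 2: terminate the same set of sessions
SELECT pid, pg_terminate_backend(pid) AS terminated
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND state = 'idle'
  AND state_change < now() - interval '10 minutes'
  AND pid <> pg_backend_pid();
```

Sessions in state `idle in transaction` are deliberately left out here; they hold locks and snapshots and deserve a separate look before being killed.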
Memory Exhaustion (OOM)
- Common cause: work_mem too high × concurrent operations
- Calculation: connections × operations × work_mem = total memory
- Recovery: Reduce work_mem, kill heavy queries, add RAM
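The calculation and the first recovery step can be sketched as below. The inputs (200 connections, 3 operations each, 8MB work_mem) are illustrative, and the ALTER SYSTEM call assumes superuser access.

```sql
-- Worst-case estimate in MB: connections x operations x work_mem
SELECT 200 * 3 * 8 AS worst_case_mb;  -- 4800

-- Lower the server-wide default without a restart
ALTER SYSTEM SET work_mem = '4MB';
SELECT pg_reload_conf();

-- Confirm the configured value and where it comes from
SELECT name, setting, unit, source
FROM pg_settings
WHERE name = 'work_mem';
```

Sessions or roles that set work_mem explicitly keep their own value after the reload, so per-role overrides (`ALTER ROLE ... SET work_mem`) are worth checking too.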
Disk Space Exhaustion
- Critical threshold: <10% free space
- WAL accumulation: Stuck replication slots prevent cleanup
- Emergency cleanup: Drop stuck (inactive) replication slots so the WAL they pin can be recycled; note that increasing max_wal_size does not free disk space
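To find out which slot is holding WAL, a query like the following shows how much each slot retains. It is a sketch for PostgreSQL 10 or later, run on the primary; the slot name in the final statement is hypothetical.

```sql
-- WAL retained by each replication slot, largest first
SELECT slot_name, slot_type, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;

-- Drop a slot only after confirming its consumer is gone for good
-- (hypothetical slot name; the slot must be inactive)
SELECT pg_drop_replication_slot('stale_replica_slot');
```

Dropping a slot that a replica still needs forces that replica to be rebuilt, so the `active` column and the consumer's status should be checked first.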
Query Performance Degradation
- Primary cause: Stale statistics (80% of cases)
- Quick fix: ANALYZE table_name
- Prevention: Monitor last_analyze timestamps
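The timestamps mentioned above live in `pg_stat_user_tables`. A monitoring query might look like this sketch, which lists the tables whose statistics are oldest along with how many rows have changed since.

```sql
-- Tables with the oldest (or missing) statistics first
SELECT schemaname, relname,
       last_analyze, last_autoanalyze,
       n_mod_since_analyze
FROM pg_stat_user_tables
ORDER BY greatest(last_analyze, last_autoanalyze) ASC NULLS FIRST
LIMIT 20;
```

A high `n_mod_since_analyze` on a table with an old timestamp is the combination to alert on; a never-analyzed but empty table is harmless.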
Resource Investment Analysis
Optimization Type | Time Investment | Skill Level | Performance Impact | Maintenance Overhead |
---|---|---|---|---|
Memory tuning | 2-4 hours | Intermediate | High (40% gains) | Low |
Index optimization | 4-8 hours | Intermediate | Very High (5-100x) | Medium |
Connection pooling | 1-2 hours | Beginner | High (eliminates OOM) | Low |
Query rewriting | 8-16 hours | Advanced | Very High (45s→1.2s) | High |
Monitoring setup | 4-6 hours | Intermediate | Preventive | Low |
Common Misconceptions
- "More shared_buffers is always better" - False: Competes with OS cache on modern systems
- "Default settings work for production" - False: Defaults are deliberately conservative (shared_buffers defaults to 128MB) so the server starts on small machines
- "Connection pooling is optional" - False: Mandatory above 100 connections
- "Hardware fixes query problems" - False: Bad queries don't improve with better hardware
- "VACUUM FULL fixes performance" - False: Takes exclusive locks, often makes problems worse
Emergency Response Procedures
High CPU Usage
- Identify running queries: pg_stat_activity
- Check for missing indexes: EXPLAIN ANALYZE
- Kill runaway queries: pg_cancel_backend(pid)
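The first and last steps above can be sketched as follows. The pid in the cancel call is a placeholder to be replaced with a value from the first query.

```sql
-- Active statements, longest-running first
SELECT pid,
       now() - query_start AS runtime,
       usename,
       left(query, 80)     AS query
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
ORDER BY runtime DESC;

-- Cancel the current statement of one backend (placeholder pid).
-- The session stays connected; pg_terminate_backend(pid) would close it.
SELECT pg_cancel_backend(12345);
```

Cancelling is the gentler option and is usually tried first, since the client keeps its connection and only the running statement is aborted.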
Memory Pressure
- Check connection count vs limits
- Reduce work_mem immediately
- Implement connection pooling
Storage Issues
- Monitor disk space: <10% = critical
- Check WAL directory size
- Drop stuck replication slots if necessary
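The WAL directory and database sizes can be checked from SQL without shell access. This sketch assumes PostgreSQL 10 or later and a superuser or `pg_monitor` member, since `pg_ls_waldir()` is restricted.

```sql
-- Total size and file count of the WAL directory
SELECT pg_size_pretty(sum(size)) AS wal_dir_size,
       count(*)                  AS wal_files
FROM pg_ls_waldir();

-- Size of each database, largest first
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS db_size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
```

If the WAL directory is far larger than max_wal_size, an inactive replication slot or a failing archive_command is the usual cause.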
This knowledge base prioritizes actionable intelligence and real-world failure scenarios over theoretical optimization.