PostgreSQL's one-process-per-connection model is a relic from the 90s that works great until you hit 200 concurrent connections; after that you need PgBouncer or everything falls apart. The query planner is brilliant 95% of the time and completely wrong the other 5% - usually during your busiest hour, when a single reporting query decides to do a sequential scan on your biggest table.
Memory Configuration: Get This Wrong and You're Fucked
PostgreSQL's memory management is needlessly complex - you've got to tune at least four different memory settings or performance will be dogshit. Miss one setting and you'll spend weekends debugging why your supposedly beefy server is crawling. The PostgreSQL wiki has some guidance, but most of it is outdated bullshit that doesn't account for modern hardware.
Shared Buffers: The "25% of RAM" Lie That Ruins Performance
Everyone parrots the `shared_buffers` = 25% of RAM rule, which is absolute garbage advice that'll kill performance on any serious server. Here's what actually works:
- For systems with 8-16GB RAM: 25% is appropriate (2-4GB)
- For systems with 32GB+ RAM: 15-20% maximum to avoid OS cache competition
- For high-write workloads: Even lower, around 10-15%
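To make the percentages concrete, here's a minimal sketch of how this lands on a hypothetical dedicated 64GB database server (the exact values are illustrative, not a recommendation):
# Hypothetical 64GB dedicated server, high-write mixed workload
shared_buffers = 12GB            # roughly 18% of RAM, per the guidance above
effective_cache_size = 48GB      # a planner hint about the OS cache, not an allocation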
I learned this the hard way when I cranked shared_buffers up to 32GB thinking more cache = better performance. Performance went to absolute shit because PostgreSQL was fighting with the OS cache. Both were caching the same damn data, so I basically created a memory war zone. Took me three hours of debugging at 2:47am to figure out the issue, then dropping it to around 12GB slowly fixed the performance problems over the next 20 minutes as the caches stabilized.
-- Check current buffer hit ratio
SELECT
    sum(heap_blks_hit) AS heap_hit,
    sum(heap_blks_read) AS heap_read,
    round(sum(heap_blks_hit)::numeric
          / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) * 100, 2) AS ratio
FROM pg_statio_user_tables;
Target a buffer hit ratio above 95%. If it's lower, consider increasing shared_buffers or optimizing queries.
Work Memory: The Setting That'll Eat All Your RAM
`work_mem` is pure evil if you don't understand it. It's a per-operation limit, not a per-connection one: every sort, hash join, and hash aggregate in a query can allocate up to work_mem, so a single complex query can use it several times over, and with 100 connections you can suddenly consume gigabytes without realizing it. I learned about this setting when a reporting query consumed 50GB of RAM and brought down production at 2:14am on a Friday. Guess who spent the weekend rebuilding the server.
Critical calculation: Total potential memory = connections × work_mem × operations per query
For a system with 100 connections where queries average 3 operations each:
- At 16MB work_mem: 100 × 16MB × 3 = 4.8GB potential usage
- At 64MB work_mem: 100 × 64MB × 3 = 19.2GB potential usage
Recommended settings by workload:
- OLTP systems: 2-8MB (many simple queries, high concurrency)
- Analytics/reporting: 32-128MB (complex queries, fewer concurrent users)
- Mixed workloads: 8-16MB with connection pooling mandatory
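If you're stuck with a mixed workload, keep the global work_mem small and raise it only for the sessions or transactions that genuinely need more. A minimal sketch (the values are illustrative):
-- Raise work_mem for the current session only
SET work_mem = '128MB';
-- Or scope it to a single transaction so it can't leak into later queries
BEGIN;
SET LOCAL work_mem = '256MB';
-- ... run the heavy aggregate here ...
COMMIT;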
Advanced Query Optimization Techniques
PostgreSQL's query planner makes decisions based on table statistics and config parameters, which sounds great until it fucks up your busiest query because the stats are stale. I've spent way too many nights debugging why a query that ran in 100ms yesterday is now taking 30 seconds because the planner decided to do a full table scan instead of using the perfectly good index. You'll know you're in this hell when you see "LOG: duration: 45123.456 ms" showing up in your logs every few minutes.
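Before blaming the planner, check whether its statistics are actually stale. A quick sketch using the standard statistics views (the ANALYZE target is just an example table):
-- How stale are the planner's statistics?
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
ORDER BY n_mod_since_analyze DESC
LIMIT 10;
-- If a hot table has piled up modifications since its last analyze, refresh it by hand
ANALYZE orders;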
Index Strategy Beyond B-Trees
While B-tree indexes handle 80% of use cases, PostgreSQL offers specialized index types for specific scenarios:
GIN Indexes for JSONB and array operations:
-- For JSONB containment queries
CREATE INDEX CONCURRENTLY idx_metadata_gin ON products USING GIN (metadata);
-- Efficient for: WHERE metadata @> '{"category": "electronics"}'
BRIN Indexes for time-series and other naturally ordered data:
-- Extremely small indexes for naturally ordered data
CREATE INDEX CONCURRENTLY idx_created_brin ON events USING BRIN (created_at);
BRIN indexes can be 1000x smaller than B-tree indexes while maintaining excellent performance for range queries on naturally clustered data.
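You can check the size difference yourself; idx_created_btree below is a hypothetical B-tree on the same column, created purely for comparison:
-- Compare on-disk sizes of the BRIN index and an equivalent B-tree
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS index_size
FROM pg_class
WHERE relname IN ('idx_created_brin', 'idx_created_btree');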
Partial Indexes for filtered access patterns:
-- Only index active records
CREATE INDEX CONCURRENTLY idx_active_orders
ON orders (user_id, created_at)
WHERE status IN ('pending', 'processing');
Query Plan Analysis with EXPLAIN
Use EXPLAIN (ANALYZE, BUFFERS) to understand actual query performance:
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.username
ORDER BY order_count DESC;
Key metrics to monitor:
- Seq Scan on large tables: Usually indicates missing indexes
- High buffer reads: Indicates I/O pressure
- Nested Loop with large datasets: Often needs different join strategy
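For the first item, you don't have to wait for a slow query to catch habitual sequential scans; the statistics views already track them:
-- Tables that get sequentially scanned heavily relative to their size
SELECT relname, seq_scan, seq_tup_read, idx_scan,
    pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_stat_user_tables
ORDER BY seq_tup_read DESC
LIMIT 10;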
Connection and Process Management
PostgreSQL's process-per-connection model requires careful connection management at scale.
PgBouncer: Not Optional If You Want to Sleep At Night
PgBouncer isn't a nice-to-have - it's fucking mandatory. PostgreSQL without connection pooling is like driving without brakes. You'll crash eventually, it's just a matter of when. The official documentation is terrible and transaction pooling breaks half the applications that try to use it, but you need it anyway. Percona's guide explains the trade-offs better than the official docs:
[databases]
production = host=localhost port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction
max_client_conn = 300
default_pool_size = 25
reserve_pool_size = 5
Pool modes:
- Transaction pooling: Best performance, breaks session state
- Session pooling: Maintains session state, lower performance
- Statement pooling: Rarely used, disallows multi-statement transactions entirely
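Whichever mode you pick, watch the pools themselves rather than guessing. PgBouncer exposes an admin console on its virtual pgbouncer database (the psql invocation below assumes the default port 6432 and a user listed in admin_users):
-- Connect to the admin console first, e.g.: psql -p 6432 -U pgbouncer pgbouncer
SHOW POOLS;    -- per-pool client/server connections, plus clients stuck waiting
SHOW STATS;    -- query/transaction throughput and average times per database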
Autovacuum: "Automatic" My Ass
Here's what I figured out about autovacuum after debugging this shit way too many times. Autovacuum is supposed to be automatic but you'll spend hours tuning it for high-write workloads. Out of the box, it's too conservative and your tables will bloat until performance dies. I've seen 40% dead tuples on production tables because autovacuum couldn't keep up. The PostgreSQL autovacuum tuning guide from pganalyze is actually useful, unlike most documentation:
-- Check table bloat
SELECT schemaname, relname,
    n_dead_tup,
    n_live_tup,
    round((n_dead_tup::numeric / (n_dead_tup + n_live_tup)) * 100, 2) AS dead_percentage
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_percentage DESC;
For tables with high update rates, tune autovacuum per-table:
ALTER TABLE busy_table SET (
autovacuum_vacuum_scale_factor = 0.05, -- 5% instead of 20%
autovacuum_analyze_scale_factor = 0.02 -- 2% instead of 10%
);
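After changing these settings, verify that autovacuum is actually keeping up instead of assuming it is. A quick check against the same example table:
-- Confirm autovacuum now runs often enough on the hot table
SELECT relname, last_autovacuum, last_autoanalyze, autovacuum_count, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'busy_table';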
Write-Ahead Log (WAL) Optimization
WAL configuration significantly impacts write performance and crash recovery time.
Checkpoint Configuration
Checkpoint tuning is where I've lost the most weekends - the database runs fine for hours, then suddenly everything blocks for 30 seconds while PostgreSQL flushes dirty pages. The first time this happened in production, I thought we were getting DDoSed because all connections just hung. Turns out the default checkpoint settings are garbage for any serious workload. This comprehensive guide to PostgreSQL checkpoint tuning explains the internals better than the official docs:
# Spread checkpoints over time
checkpoint_completion_target = 0.9
checkpoint_timeout = 15min
# Size WAL appropriately for write load
max_wal_size = 4GB
min_wal_size = 1GB
Setting checkpoint_completion_target = 0.9 spreads checkpoint I/O over 90% of the checkpoint interval instead of creating those massive I/O spikes that make your monitoring dashboards look like someone's having a heart attack.
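To tell whether checkpoints are running on the timer or being forced by WAL growth, check the checkpoint counters (these columns live in pg_stat_bgwriter up through PostgreSQL 16; version 17 moves them to pg_stat_checkpointer):
-- A high checkpoints_req count means max_wal_size is too small for your write load
SELECT checkpoints_timed, checkpoints_req,
    checkpoint_write_time, checkpoint_sync_time
FROM pg_stat_bgwriter;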
Monitoring WAL Generation
-- Snapshot the current WAL position; diff two snapshots taken a known interval
-- apart to measure the WAL generation rate
SELECT pg_current_wal_lsn();
-- Check for stuck replication slots retaining WAL
SELECT slot_name, active,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE restart_lsn IS NOT NULL;
Performance Monitoring with Built-in Tools
PostgreSQL provides extensive performance monitoring capabilities through its statistics collector. The pg_stat_statements extension is essential for production monitoring, and this monitoring checklist from Crunchy Data covers the key metrics to watch.
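If you haven't set it up yet: pg_stat_statements must be listed in shared_preload_libraries in postgresql.conf (restart required) before the extension can be created. After that, one statement per database:
-- Run in each database you want to monitor
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;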
Essential Performance Queries
-- Find slowest queries
SELECT query,
    calls,
    total_exec_time,
    mean_exec_time,
    rows / calls AS avg_rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Monitor lock contention (pg_blocking_pids requires PostgreSQL 9.6+)
SELECT blocked.pid AS blocked_pid,
    unnest(pg_blocking_pids(blocked.pid)) AS blocking_pid,
    blocked.query AS blocked_statement
FROM pg_stat_activity blocked
WHERE cardinality(pg_blocking_pids(blocked.pid)) > 0;
Production Configuration Example
Here's a production-ready configuration for a 32GB system running mixed OLTP/analytics workloads:
# Memory settings
shared_buffers = 6GB
work_mem = 8MB
maintenance_work_mem = 2GB
effective_cache_size = 24GB
# WAL settings
wal_buffers = 32MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
# Connection settings
max_connections = 200 # With PgBouncer
This configuration assumes connection pooling is in place and provides a solid foundation for most production workloads. Always monitor performance metrics and adjust based on your specific usage patterns. For more advanced tuning, check out the PostgreSQL performance best practices from EnterpriseDB and Heroku's PostgreSQL performance guide, which actually covers real-world scenarios.