PostgreSQL Streaming Replication: AI-Optimized Production Guide

Configuration Requirements

Infrastructure Specifications

  • Matching PostgreSQL versions: Primary and standby must run the same major version, and mismatched minor versions (14.8 vs 14.9) invite hard-to-diagnose replication failures
  • Network connectivity: Dedicated NICs recommended for replication traffic
  • Disk space: Minimum 3x primary database size for WAL accumulation during outages
    • Critical failure point: 50GB database can generate 200GB WAL during weekend network outage
    • Production reality: Size for disaster scenarios, not normal operation
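
A quick way to watch WAL accumulation before it becomes a disaster (pg_ls_waldir() needs superuser or the pg_monitor role; available since PostgreSQL 12):

-- Current on-disk size of the WAL directory
SELECT pg_size_pretty(sum(size)) AS wal_dir_size
FROM pg_ls_waldir();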

Version Compatibility

  • PostgreSQL 15+ recommended for monitoring improvements
  • Streaming replication: Same major versions only (14↔15 incompatible)
  • Cross-version needs: Use logical replication, which adds its own complexity
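
A quick sanity check to run on both servers before connecting them:

-- Major versions must match between primary and standby
SHOW server_version;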

Primary Server Configuration

Essential postgresql.conf Settings

wal_level = replica                    # Required for standby WAL data
max_wal_senders = 5                   # Each standby and backup tool consumes one slot
wal_keep_size = 2GB                   # Prevents WAL deletion before standby processing
archive_mode = on                     # Backup plan when streaming fails
listen_addresses = 'specific_ip'      # Never use '*' in production
max_connections = 200                 # Account for replication connections
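
After a restart, it's worth confirming the settings actually took effect rather than trusting the file edit:

-- Verify replication-related settings on the primary
SELECT name, setting
FROM pg_settings
WHERE name IN ('wal_level', 'max_wal_senders', 'wal_keep_size', 'archive_mode');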

Critical Failure Modes

  • pg_basebackup fails three times on average before working, typically due to:
    1. Firewall/IP address errors
    2. pg_hba.conf authentication failures
    3. Permission denied on destination
    4. Network timeouts during large copies
    5. Primary WAL sender slot exhaustion
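
For cause 5, comparing active WAL senders against the configured ceiling shows whether slots are exhausted before pg_basebackup even tries:

-- Active WAL senders vs the configured maximum (run on the primary)
SELECT count(*) AS active_senders,
       current_setting('max_wal_senders') AS max_wal_senders
FROM pg_stat_replication;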

Security Configuration

CREATE ROLE replication_user WITH REPLICATION LOGIN PASSWORD 'secure_password';

pg_hba.conf entry:

host replication replication_user 10.0.1.100/32 scram-sha-256
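
pg_hba.conf changes only apply after a reload, which is a common reason the new replication user still can't connect:

-- Reload configuration without restarting the server
SELECT pg_reload_conf();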

Standby Server Setup

Base Backup Process

sudo -u postgres pg_basebackup \
    -h 10.0.1.99 \
    -p 5432 \
    -U replication_user \
    -D /var/lib/postgresql/15/main \
    -Fp -Xs -P -R -W
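
The flags matter: -Fp keeps a plain file layout, -Xs streams WAL during the copy so the backup stays consistent, -P reports progress, -R writes standby.signal and appends primary_conninfo to postgresql.auto.conf, and -W forces a password prompt.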

Time Requirements

  • 100GB database: 2-6 hours depending on network
  • 1TB database: Cancel weekend plans
  • Production planning: 2-4 hours minimum (not "brief maintenance window")

Standby-Specific Settings

hot_standby = on                      # Enable read-only queries
hot_standby_feedback = off            # Prevents primary bloat
max_connections = 100                 # Lower for read-only server
wal_receiver_timeout = 60s            # Failure detection vs network tolerance
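
On the standby, comparing received and replayed positions separates network problems from local replay problems:

-- Run on the standby: receive vs replay position
SELECT pg_last_wal_receive_lsn() AS received,
       pg_last_wal_replay_lsn() AS replayed,
       pg_last_xact_replay_timestamp() AS last_replay_time;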

Performance Impact and Resource Requirements

Network Bandwidth

  • WAL generation: 1GB/hour becomes 2-3GB/hour network traffic (measurable with the query after this list)
  • Catchup scenarios: Bandwidth spikes during standby recovery
  • Network failures: Can overwhelm "enterprise" connections
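
To measure actual WAL generation instead of guessing, the cumulative counter in pg_stat_wal (PostgreSQL 14+) can be sampled at intervals:

-- WAL bytes generated since the last stats reset (PostgreSQL 14+)
SELECT pg_size_pretty(wal_bytes) AS wal_generated, stats_reset
FROM pg_stat_wal;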

Storage Requirements

  • WAL accumulation: 100GB database can fill 500GB partition during outages
  • Monitoring threshold: Alert when WAL directory >20% of disk space
  • Disaster sizing: Plan for 3x database size minimum

Primary Server Impact

  • Normal operation: Minimal performance impact
  • Network issues: WAL accumulation can crash primary server
  • Disk space exhaustion: Entire primary database goes down

Operational Troubleshooting

Replication Status Verification

-- Primary server check
SELECT * FROM pg_stat_replication;  -- Should show state='streaming'

-- Standby server check  
SELECT pg_is_in_recovery();         -- Should return true
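
Beyond state='streaming', the byte gap between the primary's current WAL position and each standby's replay position is the number worth graphing:

-- Run on the primary: replication gap per standby, in bytes
SELECT application_name, client_addr, state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_gap_bytes
FROM pg_stat_replication;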

Common Failure Scenarios

Connection Failures

  • 90% cause: pg_hba.conf misconfiguration
  • 9% cause: Firewall blocking port 5432
  • 1% cause: Obscure network issues

Replication Lag Growth

  • High flush_lag: Network bottleneck
  • High replay_lag: Underpowered standby server
  • Solution trade-offs: Better hardware vs accepting lag
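
The lag columns in pg_stat_replication (PostgreSQL 10+) make this diagnosis directly:

-- High flush_lag points at the network; high replay_lag at the standby itself
SELECT application_name, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;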

Long Query Conflicts

  • Symptom: Standby queries cancelled by replication
  • Root cause: Long-running reports conflict with primary updates
  • Impact: 2-hour reports terminated by simple UPDATEs
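
One common mitigation is letting the standby delay WAL replay while long queries run, trading replication lag for fewer cancellations; the 30-minute value below is illustrative, not a recommendation:

# On the standby: let queries block replay for up to 30 minutes
# before being cancelled (replay lag grows in the meantime)
max_standby_streaming_delay = 30min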

Failover Procedures

Emergency Promotion

# CRITICAL: Ensure old primary is completely down first
pg_ctl promote -D /var/lib/postgresql/15/main
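
Since PostgreSQL 12, promotion can also be triggered over a SQL connection, which is easier to script from a remote failover controller:

-- Equivalent promotion via SQL (PostgreSQL 12+)
SELECT pg_promote();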

Post-Failover Requirements

  1. Update all application connection strings
  2. Reconfigure monitoring systems
  3. Plan standby replacement strategy

Synchronous vs Asynchronous Trade-offs

Aspect                Synchronous                 Asynchronous
Data Loss Risk        Zero (if network stable)    Some data lost on primary failure
Commit Performance    Slower, network-dependent   Minimal impact
Network Requirements  High reliability required   Tolerates occasional hiccups
Use Cases             Financial/medical data      Most web applications
Complexity            High ongoing tuning         Low until failures occur
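
Switching to synchronous mode is a primary-side change; the name in synchronous_standby_names must match the application_name the standby uses in its primary_conninfo. A minimal sketch, assuming a standby registered as standby1:

# Primary: commits wait until standby1 confirms the WAL flush
synchronous_commit = on
synchronous_standby_names = 'FIRST 1 (standby1)'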

Production Monitoring Requirements

Critical Alerts

  • Replication lag > 30 seconds (alert query after this list)
  • Free disk space on the WAL volume < 20%
  • Missing replication processes
  • Standby connection failures
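
The lag alert maps directly onto pg_stat_replication; replay_lag is an interval, so the comparison is straightforward:

-- Alert when any standby is replaying more than 30 seconds behind
SELECT application_name, replay_lag
FROM pg_stat_replication
WHERE replay_lag > interval '30 seconds';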

Tools Integration

  • Prometheus + postgres_exporter: Production monitoring
  • pg_stat_replication: Built-in status monitoring
  • Log monitoring: PostgreSQL error logs for failure detection

Version Upgrade Constraints

Major Version Limitations

  • Streaming replication: Cannot cross major versions
  • Upgrade options:
    1. Logical replication to new version (complex)
    2. Downtime for primary upgrade + standby rebuild
    3. pg_upgrade + standby resync
  • Reality: All upgrade paths have significant complexity

Resource Investment Requirements

Time Investments

  • Initial setup: 4-8 hours including troubleshooting
  • Large database sync: Hours to days depending on size
  • Failover testing: Plan monthly testing windows
  • Troubleshooting: Network issues can consume entire weekends

Expertise Requirements

  • PostgreSQL administration: Advanced level required
  • Network troubleshooting: Essential for replication issues
  • Monitoring setup: Critical for production stability
  • Disaster recovery: Must be tested and documented

Infrastructure Costs

  • Standby hardware: Size equally to primary (don't cheap out)
  • Network capacity: Plan for 2-3x normal WAL traffic
  • Storage overhead: 3x primary database size minimum
  • Monitoring tools: Budget for proper alerting systems

Critical Warnings

Documentation Gaps

  • "Brief maintenance window": Actually 2-4 hours minimum
  • Network requirements: Underspecified in official docs
  • Disk space planning: WAL accumulation severely underestimated
  • Failure scenarios: Real-world complexity not covered

Breaking Points

  • WAL disk exhaustion: Crashes entire primary database
  • Network instability: Can make replication unusable
  • Standby query workload: Long queries will be cancelled
  • Split-brain scenarios: Requires careful primary shutdown verification

Production Gotchas

  • pg_basebackup timeouts: Test with actual database sizes
  • SSL certificate management: Easy to overlook, hard to fix later
  • Connection pooling: Replication consumes application connections
  • Backup tool conflicts: pg_basebackup competes for WAL sender slots
