Look, logical replication sounds fancy but it's basically this: instead of copying the entire database like streaming replication does, you pick specific tables and only sync those. PostgreSQL reads the WAL records, decodes them into actual SQL operations (INSERT, UPDATE, DELETE), and sends those to your subscriber database.
The catch? Tables need a primary key or unique index to work properly. No primary key means you're stuck with `REPLICA IDENTITY FULL`, which sends the entire row for every update and will destroy your WAL volume on busy tables.
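So before publishing a table, sort out its replica identity. A sketch against a hypothetical `events` table and index:

```sql
-- Preferred: give the table a primary key (or unique index) and use it
ALTER TABLE events ADD PRIMARY KEY (id);

-- If a unique index exists but there's no PK, point replica identity at it
ALTER TABLE events REPLICA IDENTITY USING INDEX events_id_idx;

-- Last resort: ship the whole old row with every UPDATE/DELETE
ALTER TABLE events REPLICA IDENTITY FULL;
```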
## How It Actually Works (Without the Bullshit)
You create a publication on your source database (publisher) listing which tables to replicate. Then you create a subscription on the destination database (subscriber) that connects to the publisher.
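The whole handshake is two statements. A minimal sketch - host, database, and names here are all made up:

```sql
-- On the publisher: pick the tables to replicate
CREATE PUBLICATION orders_pub FOR TABLE orders, order_items;

-- On the subscriber: connect back to the publisher
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=pub.example.com dbname=shop user=repl password=secret'
    PUBLICATION orders_pub;
```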
The publisher starts a `walsender` process that uses the built-in `pgoutput` plugin to decode WAL records into logical replication messages. These get streamed to the subscriber, where apply workers execute the SQL operations.
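Both halves of that pipeline show up in the stats views, which is handy for checking the decode-and-apply loop is actually alive:

```sql
-- On the publisher: one row per walsender, logical ones included
SELECT application_name, state, sent_lsn, replay_lsn
FROM pg_stat_replication;

-- On the subscriber: apply worker status per subscription
SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_subscription;
```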
PostgreSQL 17 allegedly fixed some of the biggest pain points with failover slot synchronization and the new `pg_createsubscriber` utility that converts standby servers to logical subscribers. I'm still testing this shit - too many times PostgreSQL "fixes" have introduced new problems that are worse than the original bugs. But so far, it looks promising.
## Why You'd Actually Want to Use This Thing
Different PostgreSQL versions? No problem. Native logical replication works between any major versions from PostgreSQL 10 onward (the underlying logical decoding machinery dates back to 9.4, but anything older than 10 needs an extension like pglogical for the publish/subscribe part). I've used it to migrate from PostgreSQL 11 to 15 with zero downtime. Streaming replication would've required matching versions and a full outage.
Only need specific tables? Perfect. You can replicate individual tables, specific columns, or even filter rows with WHERE clauses in your publication. Beats copying a 2TB database when you only need the user activity tables.
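Row and column filters look like this - both need PostgreSQL 15 or newer, and the table and column names are hypothetical:

```sql
-- Row filter: only recent activity crosses the wire.
-- Column list on users: replicate just a subset of columns.
CREATE PUBLICATION activity_pub
    FOR TABLE user_activity WHERE (created_at >= '2024-01-01'),
              users (id, email, last_seen);
```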
Want to write to the replica? Logical replication lets you write to subscriber databases. The subscriber can have different indexes, triggers, even completely different schemas. Try doing that with streaming replication (hint: you can't).
Multiple data sources? A subscriber can pull from multiple publishers. I've used this to aggregate customer data from regional databases into a central analytics warehouse.
## PostgreSQL 17 Finally Fixed the Big Issues
PostgreSQL 17 dropped on September 26, 2024 and honestly, it fixed the stuff that's been driving everyone crazy about logical replication:
Failover slot synchronization: Remember how logical replication slots would just disappear after failover? Yeah, that's fixed. Slots now sync between primary and standby, so your replication doesn't die when the primary goes down. This was a massive operational pain point.
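Wiring that up takes two pieces (PostgreSQL 17; the names are made up):

```sql
-- On the subscriber: ask for a failover-capable slot
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=primary.example.com dbname=shop'
    PUBLICATION orders_pub
    WITH (failover = true);

-- On the physical standby, in postgresql.conf:
--   sync_replication_slots = on
--   hot_standby_feedback   = on
```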
pg_createsubscriber utility: You can now convert existing physical standbys to logical subscribers without rebuilding everything from scratch. I've rebuilt way too many logical replication setups because of this missing piece.
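The utility runs against a *stopped* physical standby and converts it in place. An invocation sketch - the data directory path, connection string, and database name are all made up:

```shell
pg_createsubscriber \
    --pgdata /var/lib/postgresql/17/standby \
    --publisher-server 'host=primary.example.com dbname=shop' \
    --database shop
```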
pg_upgrade preservation: Your logical replication setup actually survives major version upgrades now. Before this, upgrading PostgreSQL meant tearing down and rebuilding all your replication relationships.
## What Actually Breaks in Production
Tables without primary keys are a nightmare. You'll need either a unique index or `REPLICA IDENTITY FULL`. The latter sends the entire row for every update and will absolutely murder your WAL volume. I learned this the hard way on a busy events table - WAL usage went completely insane, like 5x or 6x normal volume. The monitoring graphs looked like a hockey stick.
WAL retention will fill up your disk. Logical replication slots prevent WAL cleanup until all subscribers catch up. If a subscriber goes down or gets stuck, your primary server runs out of disk space. I've seen this kill production servers when monitoring wasn't set up properly. Set up alerts on `pg_replication_slots.restart_lsn` lag or you'll get paged at 3am when the server dies.
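A query worth wiring into your alerting - how much WAL each logical slot is pinning (and look at `max_slot_wal_keep_size` as a last-ditch cap, at the cost of invalidating the slot):

```sql
SELECT slot_name, active,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
WHERE slot_type = 'logical';
```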
Network connectivity is critical. Each subscription needs its own connection to the publisher. Unlike streaming replication with one connection, logical replication multiplies your connection requirements. Network hiccups cause apply lag that compounds quickly on busy systems.
Sequences drift like crazy. Logical replication doesn't sync sequences, so your subscriber's sequence values drift from the publisher. If you fail over to a subscriber, you'll get primary key conflicts when new inserts use duplicate sequence values.
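If you do fail over, a blunt but effective fix-up is bumping each subscriber sequence past the table's current maximum. A sketch with a hypothetical `users` table - the +1000 is arbitrary headroom, not a magic number:

```sql
SELECT setval('users_id_seq',
              (SELECT COALESCE(MAX(id), 0) + 1000 FROM users));
```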
## Performance and What Doesn't Work
Logical replication uses less I/O than streaming replication since it only processes actual data changes, but the WAL decoding uses CPU cycles. On a busy system, you'll see higher CPU usage on the publisher from the walsender processes.
Schema changes aren't replicated. You have to apply DDL changes manually on both publisher and subscriber. This includes adding columns, changing types, creating indexes - everything structural needs to be coordinated manually.
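Coordinating that manually means running the same DDL on both sides, in the right order - subscriber first when adding a column, so incoming rows always have somewhere to land:

```sql
-- Run on the subscriber first, then on the publisher
ALTER TABLE orders ADD COLUMN shipped_at timestamptz;
```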
TRUNCATE wasn't supported until PostgreSQL 11. If you're still on 10 or earlier, TRUNCATE operations just don't replicate. Subscribers keep their data while the publisher table gets emptied.
Large objects are fucked. The `pg_largeobject` kind aren't replicated at all, and neither are views, materialized views, or foreign tables. TOAST columns do replicate fine, despite what you might have heard - the gotcha is that unchanged TOASTed values aren't re-sent on updates, which bites people building custom change feeds on logical decoding. Check the restrictions before committing to logical replication.
No conflict resolution. If you write to both publisher and subscriber, conflicts just error out and you're stuck fixing it manually. There's no automatic resolution like "last write wins" or anything useful. I once had a conflict that took me 3 hours to figure out because the error message was something useless like "duplicate key value violates unique constraint" with zero fucking context about which operation or row caused it. You need external tools like Postgres-XL, EnterpriseDB's BDR, or pglogical for multi-master scenarios, but honestly those add their own complexity and new ways to break.
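When an apply worker does hit a conflict, PostgreSQL 15+ at least gives you tools to unstick it by hand. A sketch - the subscription name and LSN are placeholders; the real LSN comes from the apply worker's error in the server log:

```sql
-- Stop the retry loop so the worker doesn't hammer the same error
ALTER SUBSCRIPTION orders_sub SET (disable_on_error = true);

-- Skip exactly the transaction that conflicts, then re-enable
ALTER SUBSCRIPTION orders_sub SKIP (lsn = '0/14C0378');
ALTER SUBSCRIPTION orders_sub ENABLE;
```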
## When You Actually Need This
Cross-version upgrades: I've used logical replication to upgrade from PostgreSQL 11 to 15 with zero downtime. Well, mostly zero - had a brief hiccup when sequences got out of sync, but that's another story. Set up logical replication to the new version, let it sync, then switch over. Way less risky than pg_upgrade on a production system.
Selective data sync: Need to sync just user tables to a reporting database? Logical replication lets you pick specific tables and even filter rows with WHERE clauses. Beats copying a 2TB database when you only need 100GB of actual data.
Multi-region compliance: GDPR requires EU customer data stays in EU? Set up logical replication with row filters based on customer location. Each region gets only the data it's legally allowed to have.
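Row filters make the region split declarative (PostgreSQL 15+; table and value are hypothetical):

```sql
-- The EU subscriber's publication only ever emits EU rows
CREATE PUBLICATION eu_pub
    FOR TABLE customers WHERE (region = 'EU');
```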
Real-time ETL: Stream changes to your data warehouse as they happen. The subscriber can have completely different schemas, indexes, even extra computed columns. Try doing that with streaming replication.
Just remember: logical replication is complex as hell compared to streaming replication. Only use it when you actually need the features it provides. For basic high availability, streaming replication is usually what you want. Check out comprehensive tutorials if you're just getting started.