What pg_basebackup Actually Does (And Why It'll Frustrate You)

pg_basebackup is PostgreSQL's built-in backup tool that copies your entire database cluster while it's running. Sounds simple, right? Well, it is and it isn't.

pg_basebackup creates what's called a "physical backup," which means it's copying the actual data files on disk, not dumping SQL like pg_dump does. This makes it fast as hell for large databases (we're talking 500GB+ where pg_dump would take 12 hours and pg_basebackup finishes in 2), but it also means you're stuck with some annoying limitations.

The Good News

When pg_basebackup works, it fucking works. I've seen it back up a 2TB production database in under 6 hours over gigabit ethernet. Try doing that with pg_dump and you'll be waiting until next Tuesday.

The backup happens while your database is running - no downtime, no locking users out, no "sorry the app is down for maintenance" emails. It uses PostgreSQL's replication protocol (the same one streaming replicas use) so your database thinks it's just sending data to another server.

The Reality Check

But here's where things get interesting. pg_basebackup will absolutely murder your production server's performance if you don't rate-limit it. I learned this the hard way when a 3AM backup brought our entire API to its knees because I forgot the --max-rate flag.

The backup includes WAL (Write-Ahead Log) files, which is what makes it consistent. Without WAL files, your backup is just a bunch of data files that were copied at different times - completely useless. The -X stream option streams WAL alongside the backup, which adds the WAL traffic on top of the data transfer but prevents the "backup looks successful but is actually corrupted" nightmare.
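
Quick sanity check that the WAL actually made it into the backup (the /backup/postgres path is just an example for wherever you pointed -D):

## With plain format and -X stream, the streamed WAL segments land inside the backup itself
ls /backup/postgres/pg_wal/

## On PostgreSQL 13+ you can also check file integrity against the manifest
pg_verifybackup /backup/postgres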

PostgreSQL Architecture Overview

Understanding the PostgreSQL backup architecture is crucial - pg_basebackup leverages the same replication protocol used by streaming replicas, which explains why it needs a role with the REPLICATION privilege, free wal_sender slots, and a pg_hba.conf entry that allows replication connections.
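
If you want the server to hold onto WAL even if the backup connection hiccups, pg_basebackup can create a named replication slot for you. Rough sketch - db-server and the slot name are placeholders:

## -C creates the slot, -S names it; by default -X stream just uses a temporary slot
pg_basebackup -h db-server -U backup_user -D /backup/postgres \
  -C -S nightly_backup_slot -X stream -P -v

## Named slots retain WAL until dropped, so clean up afterwards or your disk fills with WAL
psql -h db-server -U postgres -c "SELECT pg_drop_replication_slot('nightly_backup_slot');"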

Version-Specific Gotchas

PostgreSQL 13 started generating a backup_manifest file by default (and shipped pg_verifybackup to go with it), which broke backup scripts that didn't expect the extra file. Fun times.

PostgreSQL 15 added server-side compression and lz4/zstd options, but the classic client-side gzip (-z) is single-threaded and slower than just piping the output through pigz.

PostgreSQL 17 introduced incremental backups, which sound amazing until you realize they're buggy as hell and the tooling around them is still half-baked. The PostgreSQL 17 release notes detail the new features, but real-world testing shows issues with incremental manifests and block tracking overhead. Stick with pgBackRest if you need reliable incrementals.
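
For the curious, the PostgreSQL 17 incremental workflow looks roughly like this - treat it as a sketch and check the release docs before trusting it:

## The server has to be summarizing WAL first (set summarize_wal = on, then reload or restart)
psql -h db-server -U postgres -c "ALTER SYSTEM SET summarize_wal = on;"

## Take a normal full backup, then an incremental that references its manifest
pg_basebackup -h db-server -U backup_user -D /backups/full -X stream -P -v
pg_basebackup -h db-server -U backup_user -D /backups/incr1 \
  --incremental=/backups/full/backup_manifest -X stream -P -v

## You can't restore the incremental directly - pg_combinebackup has to stitch it together
pg_combinebackup /backups/full /backups/incr1 -o /backups/restored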

What You Actually Get

pg_basebackup gives you either:

  1. A directory that looks exactly like your PostgreSQL data directory (plain format)
  2. A tar file containing the same (tar format)

The plain format is what you want 99% of the time. Tar format is only useful if you're copying backups around, and even then it's a pain in the ass to restore from - you have to extract the entire thing before PostgreSQL can read it.

The backup is only valid for the same PostgreSQL major version. You can't back up PostgreSQL 15 and restore it on PostgreSQL 16 - that's what pg_dump is for.
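
If you do end up with a tar-format backup, restoring means extracting everything first - something like this, with example paths:

## -Ft -X stream produces base.tar plus pg_wal.tar, and both need extracting (add z to tar if you used -z)
mkdir -p /var/lib/postgresql/16/restore
tar -xf /backup/base.tar -C /var/lib/postgresql/16/restore
tar -xf /backup/pg_wal.tar -C /var/lib/postgresql/16/restore/pg_wal

## PG_VERSION in the backup must match the major version of the binaries you restore with
cat /var/lib/postgresql/16/restore/PG_VERSION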

Anyway, here's how to actually use this without destroying your production environment.

How to Actually Make pg_basebackup Work (Without Destroying Production)

The Setup That'll Save Your Job

Before you even think about running pg_basebackup, you need to configure PostgreSQL properly. Miss any of this and you'll get cryptic error messages that'll waste 2 hours of your life. The PostgreSQL documentation on backup configuration covers these settings, but here's what actually matters in production.

First, the user permissions - and this trips up everyone:

-- This looks simple, but there are gotchas
CREATE USER backup_user WITH REPLICATION LOGIN PASSWORD 'secure_password';

-- Physical backups don't need database-level grants, but CONNECT is handy
-- if you also want to run sanity-check queries as this user
GRANT CONNECT ON DATABASE your_db TO backup_user;
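
Quick way to confirm the role actually got the REPLICATION attribute (connection details are examples):

## rolreplication should come back as 't'
psql -h db-server -U postgres -c "SELECT rolname, rolreplication FROM pg_roles WHERE rolname = 'backup_user';"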

postgresql.conf settings that will bite you if wrong:

## This is the big one - if max_wal_senders is too low, pg_basebackup can't get a connection
max_wal_senders = 10   # Already the default on modern versions; -X stream needs two free slots on top of your replicas
## See: https://postgresqlco.nf/doc/en/param/max_wal_senders/

## wal_level MUST be replica or higher 
wal_level = replica

## The default has been replica since PostgreSQL 10, but if someone set this to minimal, you're fucked
## wal_level = minimal  # This will break everything
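
Both settings need a restart to take effect, so check what the running server actually has before kicking off a backup:

## What the server is really running with right now
psql -h db-server -U postgres -c "SHOW wal_level;"
psql -h db-server -U postgres -c "SHOW max_wal_senders;"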

pg_hba.conf - the file that makes grown DBAs cry:

## Allow replication connections (this goes in pg_hba.conf)
host replication backup_user 10.0.0.0/8 md5

## Don't use 'trust' in production unless you enjoy getting hacked
## host replication backup_user 0.0.0.0/0 trust  # NO!

Commands That Actually Work in Production

The basic backup that won't kill your server:

## Rate limiting is NOT optional in production
pg_basebackup -h db-server -D /backup/postgres -U backup_user \
  --max-rate=50M -P -v -X stream

## That 50M rate limit? I learned this after taking down production
## Start conservative, you can always increase it later

For large databases (>500GB), use this or wait forever:

## Compress on the fly and stream WAL
pg_basebackup -h db-server -D /backup/postgres -U backup_user \
  --max-rate=100M -Ft -z -X stream -P -v

## The -z flag uses gzip. It's single-threaded and slow, but built-in
## For better compression, pipe the output through pigz or lz4

The "oh shit, production is down" emergency backup:

## Remove rate limiting when you need speed (and pray your disk I/O holds up)
pg_basebackup -h db-server -D /emergency-backup -U backup_user \
  -P -v -X stream --no-sync

## --no-sync skips the final fsync, saving 10-30 seconds
## Only use when every second counts

What Will Go Wrong (And How to Fix It)

\"FATAL: number of requested standby connections exceeds max_wal_senders\"

This means your `max_wal_senders` is set too low. Check how many replicas you have running:

SELECT * FROM pg_stat_replication;

Each replica uses one wal_sender slot, and pg_basebackup needs one more on top of that (two if you use -X stream). The PostgreSQL monitoring guide explains how to track replication connection usage and slot management.
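
Quick check of how close you are to the ceiling (connection details are examples):

## Active senders vs. the configured maximum
psql -h db-server -U postgres -c "SELECT count(*) AS active_senders, current_setting('max_wal_senders') AS max_wal_senders FROM pg_stat_replication;"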

"ERROR: could not connect to server: Connection refused"

Your pg_hba.conf is probably wrong. Test with:

## This should work if your config is right - replication=1 exercises the same pg_hba.conf "replication" entry pg_basebackup uses
psql "host=db-server user=backup_user replication=1" -c "IDENTIFY_SYSTEM;"

Backup completes but is corrupted

This happens when WAL files get out of sync. Always use -X stream for production backups:

## Good - streams WAL during backup
pg_basebackup -X stream -D backup

## Bad - fetches WAL after backup (race condition prone)
pg_basebackup -X fetch -D backup

The backup fills up your disk

Check your disk space first, genius:

## Get database size
SELECT pg_size_pretty(pg_database_size('your_db'));

## Make sure you have 2x that much free space for the backup
df -h /backup

Real-World Performance Numbers

From our production environment (PostgreSQL 15, 800GB database) with performance monitoring enabled:

  • Without rate limiting: 45 minutes backup time, 100% CPU on database server, API timeouts
  • With 50MB/s limit: 2.5 hours backup time, 30% CPU usage, no user impact
  • With compression: 3.5 hours backup time, 60% space savings, one CPU core maxed out

Your mileage will vary, but start conservative and work your way up.

Of course, pg_basebackup isn't the only game in town. Let's see how it stacks up against the competition when you're actually running this stuff in production.

The Brutal Truth About PostgreSQL Backup Tools

| Reality Check | pg_basebackup | pg_dump | pgBackRest | Barman |
|---|---|---|---|---|
| Backup Time (500GB DB) | 2-4 hours | 8-12 hours | 45 minutes | 2 hours |
| CPU Usage During Backup | 30-60% (rate limited) | 10-20% | 80-90% | 40-60% |
| Network Bandwidth | Saturates link w/o rate limit | Low | Saturates link | High |
| Restore Time | 1-2 hours | 6-10 hours | 30 minutes | 1.5 hours |
| Storage Efficiency | 1x (no compression by default) | 0.3x (SQL is tiny) | 0.4x (excellent compression) | 0.6x |
| Setup Complexity | 30 minutes | 5 minutes | 2-4 hours | 1-2 hours |
| Will It Work Out of Box? | Probably not | Yes | Definitely not | Probably not |
| Debugging When Broken | Good luck | Easy | Documentation hell | Medium |

Questions You'll Actually Ask (When Things Go Wrong)

Q: Why does my backup randomly fail at 90% completion?

A: Usually network timeouts or disk space running out. The backup process streams data continuously, and if your network hiccups or the target disk fills up, pg_basebackup dies. Check network stability with ping -i 0.2 db-server during backup. For disk space, monitor with watch -n 5 df -h /backup. If you're backing up over WAN, add --checkpoint=spread to reduce I/O spikes that trigger timeouts.

Q: How do I know if my backup is actually restorable without testing it?

A: You don't. Seriously. I've seen "successful" backups that were corrupted because WAL files got out of sync. The only way to know is to test restore to a different server. For PostgreSQL 13+, use pg_verifybackup backup_directory but understand this only checks file integrity, not logical consistency. A monthly restore test to a throwaway instance is the only real validation.
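
A bare-bones restore test looks roughly like this - run it as the postgres OS user, on a copy of the backup, because starting PostgreSQL on the directory modifies it (paths, port, and table name are placeholders):

## Spin up a throwaway instance on a copy of the backup
cp -a /backup/postgres /tmp/restore-test
chmod 700 /tmp/restore-test
pg_ctl -D /tmp/restore-test -o "-p 5499" start

## If it comes up and real data is there, the backup was actually restorable
psql -p 5499 -d your_db -c "SELECT count(*) FROM some_important_table;"
pg_ctl -D /tmp/restore-test stop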

Q: Why does pg_basebackup eat all my bandwidth even with rate limiting?

A: The rate limiting only applies to data files, not WAL streaming. If you use -X stream, WAL files transfer at full speed on top of your limit. Monitor WAL generation with:

SELECT pg_current_wal_lsn(), pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0')/1024/1024 AS mb_generated;

On a busy database, WAL generation can hit tens of MB per second during bulk loads, and none of it counts against --max-rate.
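
That query gives you a lifetime total, not a rate. To see how fast WAL is actually being generated, sample the position twice - a rough sketch with example connection details:

## Two samples 60 seconds apart, difference converted to MB/s
START=$(psql -h db-server -U postgres -Atc "SELECT pg_current_wal_lsn();")
sleep 60
END=$(psql -h db-server -U postgres -Atc "SELECT pg_current_wal_lsn();")
psql -h db-server -U postgres -Atc "SELECT round(pg_wal_lsn_diff('$END', '$START') / 1024 / 1024 / 60, 2) AS wal_mb_per_sec;"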

Q: My backup says it completed successfully but the directory is empty

A: Check your permissions and the exact error in the PostgreSQL logs. This usually happens when:

  1. The backup user doesn't have REPLICATION privileges
  2. The target directory doesn't exist or isn't writable
  3. SELinux is blocking file creation (check ausearch -m avc on RHEL/CentOS)

Don't trust the exit code alone - I've seen runs report success while the backup died silently on permission issues, so check the directory contents too.
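
Before digging through logs, a couple of quick checks usually find the culprit (paths and connection details are examples):

## Can the user running the backup actually write to the target?
sudo -u postgres test -w /backup && echo "target parent writable" || echo "NOT writable"

## Does the backup role really have REPLICATION?
psql -h db-server -U postgres -c "SELECT rolreplication FROM pg_roles WHERE rolname = 'backup_user';"

## Recent SELinux denials on RHEL/CentOS
ausearch -m avc -ts recent | tail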

Q: Why do incremental backups sometimes end up larger than full backups?

A: PostgreSQL 17's incremental backups are buggy as hell. They sometimes include unchanged files, and the metadata overhead can be significant for databases with many small files. Incremental backups also don't compress as well because they're storing block-level changes rather than full files. Just use pgBackRest if you need real incremental backups.

Q: How do I recover from a corrupted backup manifest?

A: You mostly don't. On PostgreSQL 13+ a corrupted backup_manifest means pg_verifybackup can no longer validate the backup (and PostgreSQL 17 can't use it as the base for an incremental). The data files might still restore fine, but you can't prove it, and pg_resetwal won't fix a manifest, so you're probably better off taking a new backup. This is why I always use --no-manifest on production backups - one less thing to break.

Q: pg_basebackup is maxing out my database server CPU, what gives?

A: You probably didn't rate limit it. pg_basebackup will read data as fast as your disks can provide it, which on NVMe drives can completely saturate your I/O subsystem. Always use --max-rate=50M or lower for production. Monitor with iostat -x 1 during backup to see disk utilization.

Q: Can I run multiple pg_basebackup processes simultaneously?

A: Technically yes, but each one consumes a max_wal_senders slot and competes for disk I/O. Running 2-3 parallel backups will usually just make everything slower. If you need parallel backup, use pgBackRest instead - it's designed for this.

Q: The backup worked fine for months, now it randomly fails

A: Check whether your database grew to the point where backups now run long enough to trip wal_sender_timeout (default 60 seconds). If the backup client stalls (slow target disk, compression hogging CPU) long enough that PostgreSQL doesn't hear from it within the timeout, it kills the connection. Increase wal_sender_timeout to 300 seconds or more for large databases.
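
Bumping it doesn't need a restart - a reload is enough:

## 5 minutes of slack for slow backup targets; a reload picks it up
psql -h db-server -U postgres -c "ALTER SYSTEM SET wal_sender_timeout = '300s';"
psql -h db-server -U postgres -c "SELECT pg_reload_conf();"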

Q: Why does restoring the backup take longer than creating it?

A: Because you're probably restoring to spinning disks while the original server has SSDs, or your restore target has different I/O characteristics. Backup is mostly sequential reads (fast), restore involves random writes during WAL replay (slow). Also, if you used compression, decompression during restore is single-threaded and CPU-bound.
