PostgreSQL CDC: Logical Replication Configuration That Actually Works

PostgreSQL logical replication actually works reliably once you figure out all the WAL settings that the official docs barely mention. I learned this the hard way when our WAL directory consumed 500GB during a weekend deployment because nobody told me about max_slot_wal_keep_size.

PostgreSQL streams changes through the WAL (Write-Ahead Log) using replication slots. The slot keeps track of where your connector is in the stream, so restarts don't lose data. Simple concept, but the configuration will bite you if you're not careful.
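
If you want to see the mechanics before wiring up a connector, you can create and inspect a throwaway slot straight from psql. A minimal sketch, assuming wal_level = logical is already set and the production_db database used in the connector config below; the slot name scratch_slot is just an illustration:

psql -d production_db -c "SELECT * FROM pg_create_logical_replication_slot('scratch_slot', 'pgoutput');"
psql -d production_db -c "SELECT slot_name, plugin, active, restart_lsn FROM pg_replication_slots;"
## Drop it when you're done - an idle slot pins WAL forever
psql -d production_db -c "SELECT pg_drop_replication_slot('scratch_slot');"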

Essential PostgreSQL Configuration

postgresql.conf settings that prevent common disasters:

## WAL configuration for CDC
wal_level = logical
max_wal_senders = 10
max_replication_slots = 10
wal_keep_size = 1GB
max_slot_wal_keep_size = 5GB

## Connection management
max_connections = 300
shared_preload_libraries = 'wal2json'

## Performance tuning for CDC
wal_buffers = 64MB
checkpoint_completion_target = 0.9
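
After a reload it's worth confirming the live values match what you put in postgresql.conf - wal_level in particular only changes after a full restart. A quick check:

psql -c "SELECT name, setting, unit FROM pg_settings
         WHERE name IN ('wal_level', 'max_wal_senders', 'max_replication_slots',
                        'wal_keep_size', 'max_slot_wal_keep_size');"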

Why these settings matter (learned from production failures):

  • max_slot_wal_keep_size = 5GB prevents WAL from eating your entire disk when CDC falls behind - without this, I've seen WAL consume 200GB+ during a 4-hour connector outage
  • max_replication_slots = 10 allows multiple CDC connectors without hitting the default limit of 5 - discovered this at 2am when the sixth connector failed to start
  • wal2json plugin performs 30% better than the default pgoutput for high-volume changes, but crashes on PostgreSQL 13.2 specifically - test your version

Debezium PostgreSQL Connector Configuration

Working production connector config:

{
  "name": "postgres-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "postgres.internal",
    "database.port": "5432",
    "database.user": "debezium_user",
    "database.password": "${file:/opt/kafka/secrets/db-password.txt:password}",
    "database.dbname": "production_db",
    "database.server.name": "postgres-server",
    "slot.name": "debezium_slot",
    "publication.name": "dbz_publication",
    "plugin.name": "wal2json",
    "table.include.list": "public.orders,public.payments,public.inventory",
    "schema.history.internal.kafka.topic": "schema-changes.postgres",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "database.connectionTimeoutInMs": 30000,
    "heartbeat.interval.ms": 60000,
    "max.queue.size": 16000,
    "max.batch.size": 4096,
    "poll.interval.ms": 1000
  }
}
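
To register this with Kafka Connect, save the JSON above to a file (postgres-cdc-connector.json here is just an example name) and POST it to the Connect REST API:

curl -X POST -H "Content-Type: application/json" \
     --data @postgres-cdc-connector.json \
     localhost:8083/connectors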

Database user setup:

-- Create CDC user with proper permissions
CREATE USER debezium_user WITH REPLICATION LOGIN PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE production_db TO debezium_user;
GRANT USAGE ON SCHEMA public TO debezium_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO debezium_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO debezium_user;

-- Create publication for tables you want to monitor
CREATE PUBLICATION dbz_publication FOR TABLE public.orders, public.payments, public.inventory;
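
Double-check which tables actually landed in the publication - tables created later are not included automatically unless you ALTER PUBLICATION ... ADD TABLE:

psql -d production_db -c "SELECT * FROM pg_publication_tables WHERE pubname = 'dbz_publication';"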

Critical Monitoring Queries

WAL retention monitoring (set up alerts when lag_size > 1GB):

SELECT slot_name, 
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as lag_size,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) as flush_lag
FROM pg_replication_slots 
WHERE slot_name = 'debezium_slot';

Connection usage monitoring:

SELECT count(*), state, usename 
FROM pg_stat_activity 
WHERE usename = 'debezium_user' 
GROUP BY state, usename;

Performance Optimization Patterns

For high-volume tables, use REPLICA IDENTITY FULL to capture complete row data:

ALTER TABLE high_volume_orders REPLICA IDENTITY FULL;
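
Confirm the change took effect by checking relreplident in pg_class ('f' means FULL, 'd' is the default primary-key identity):

psql -d production_db -c "SELECT relname, relreplident FROM pg_class WHERE relname = 'high_volume_orders';"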

For tables with large JSONB/TEXT columns, exclude unnecessary fields:

{
  "column.exclude.list": "public.user_profile.large_metadata,public.logs.full_stacktrace"
}

Common PostgreSQL CDC Pitfalls

WAL accumulation during maintenance: Always pause connectors before long-running operations or face disk space hell:

## Before maintenance (NEVER skip this)
curl -X PUT localhost:8083/connectors/postgres-cdc-connector/pause

## After maintenance  
curl -X PUT localhost:8083/connectors/postgres-cdc-connector/resume
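
The pause call is asynchronous, so verify the connector and its task actually reached PAUSED before starting maintenance (jq is optional but handy):

curl -s localhost:8083/connectors/postgres-cdc-connector/status | jq '.connector.state, .tasks[].state'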

I once forgot to pause connectors before a 6-hour database migration. WAL files grew to 180GB and filled the disk, taking down the entire database. Learn from my stupidity.

Schema evolution nightmare: Test every schema change in staging with CDC running - no exceptions. PostgreSQL handles adding columns fine, but renaming or dropping columns destroys connectors with cryptic error messages like "slot has been invalidated because it exceeded the maximum reserved size".

TOAST field disasters: Large JSONB or TEXT fields stored in TOAST tables crash connectors with OOM errors that won't show up until you hit production load. I spent 3 days debugging why our connector died every Tuesday at 2pm - turns out that's when the weekly report export triggered, and large JSON blobs killed the connector.

Connection exhaustion reality: CDC holds database connections permanently. Forget to account for this and your application will start throwing FATAL: remaining connection slots are reserved for non-replication superuser connections errors. Use pgbouncer for app connections, dedicate separate connection pools for CDC.
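
A rough headroom check against pg_stat_activity shows how close you are to the limit before CDC and application pools collide:

psql -c "SELECT current_setting('max_connections')::int AS max_connections,
                count(*) AS in_use,
                current_setting('max_connections')::int - count(*) AS headroom
         FROM pg_stat_activity;"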

For comprehensive PostgreSQL CDC guidance, see PostgreSQL logical replication documentation, Debezium PostgreSQL connector reference, PostgreSQL WAL configuration guide, wal2json plugin documentation, PostgreSQL replication slots monitoring, PostgreSQL performance tuning for replication, pgbouncer connection pooling setup, PostgreSQL monitoring queries collection, Postgres CDC best practices guide, and PostgreSQL high availability documentation.

MySQL CDC: Binlog Configuration and Position Management

MySQL CDC will break in mysterious ways until you enable GTID, and even then you'll lose sleep over binlog position tracking. I've had MySQL binlog CDC work perfectly for months, then lose 6 hours of data during a routine restart because binlog positions got corrupted.

MySQL CDC reads from binary logs (binlog) that record every row change. The tricky part is tracking your position in these files - use old-school file positions and you're gambling with data loss on every restart. GTID makes this slightly less awful, but MySQL binlog CDC is still fragile as hell.

Essential MySQL Configuration

my.cnf settings for production CDC:

[mysqld]
## Binlog configuration for CDC
server-id = 1
log-bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
expire_logs_days = 7
max_binlog_size = 1073741824
sync_binlog = 1

## GTID configuration (recommended)
gtid_mode = ON
enforce_gtid_consistency = ON
log_slave_updates = ON

## Performance settings
innodb_flush_log_at_trx_commit = 1
binlog_cache_size = 1M
max_binlog_cache_size = 1G
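
Verify the running values after a restart - discovering that binlog_format or gtid_mode never actually changed is a classic sign MySQL read a different config file than the one you edited:

mysql -e "SHOW VARIABLES WHERE Variable_name IN
          ('log_bin', 'binlog_format', 'binlog_row_image', 'gtid_mode', 'enforce_gtid_consistency');"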

Why these settings are critical (learned from data loss incidents):

  • binlog_row_image = FULL captures complete before/after row data - without this, you'll spend hours debugging why CDC is missing half the changes
  • expire_logs_days = 7 prevents binlog files from being deleted too quickly - set this too low and connector failures longer than your retention window will require full re-snapshots
  • gtid_mode = ON enables position recovery after failures - trust me, without GTID you're gambling with your data every restart
  • sync_binlog = 1 ensures durability but kills performance - I've seen 40% write throughput drops, but it's worth it to not lose binlog data during crashes

Debezium MySQL Connector Configuration

Production-tested connector config:

{
  "name": "mysql-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql.internal",
    "database.port": "3306",
    "database.user": "debezium_user",
    "database.password": "${file:/opt/kafka/secrets/mysql-password.txt:password}",
    "database.server.id": "184054",
    "database.server.name": "mysql-server",
    "database.include.list": "ecommerce,analytics",
    "table.include.list": "ecommerce.orders,ecommerce.products,analytics.events",
    "database.history.kafka.topic": "schema-changes.mysql",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "include.schema.changes": "true",
    "database.connectionTimeoutInMs": 30000,
    "connect.timeout.ms": 30000,
    "gtid.source.includes": ".*",
    "snapshot.mode": "initial",
    "snapshot.locking.mode": "minimal",
    "binlog.buffer.size": 8192,
    "max.queue.size": 8192,
    "max.batch.size": 2048,
    "poll.interval.ms": 1000,
    "heartbeat.interval.ms": 60000,
    "skipped.operations": "d"
  }
}

Database User Setup and Permissions

MySQL user creation:

-- Create CDC user with necessary privileges
CREATE USER 'debezium_user'@'%' IDENTIFIED BY 'secure_password';
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'debezium_user'@'%';

-- For specific databases only (more secure)
GRANT SELECT ON ecommerce.* TO 'debezium_user'@'%';
GRANT SELECT ON analytics.* TO 'debezium_user'@'%';

FLUSH PRIVILEGES;
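
Confirm the grants before pointing Debezium at the server - a missing REPLICATION SLAVE or REPLICATION CLIENT privilege is a common reason the connector fails at startup:

mysql -e "SHOW GRANTS FOR 'debezium_user'@'%';"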

Binlog Monitoring and Management

Monitor binlog lag and retention:

-- Check current binlog status
SHOW MASTER STATUS;

-- List available binlog files  
SHOW BINARY LOGS;

-- Check binlog space usage
SELECT 
    ROUND(SUM(file_size)/1024/1024/1024,2) AS 'Binlog Size (GB)',
    COUNT(*) as 'File Count'
FROM information_schema.binary_log_files;

Position tracking queries for debugging:

-- Check replication status
SHOW SLAVE STATUS\G

-- Verify GTID configuration
SHOW VARIABLES LIKE 'gtid%';

-- Monitor binlog events (for debugging)
SHOW BINLOG EVENTS IN 'mysql-bin.000001' LIMIT 10;

MySQL-Specific CDC Challenges

Binlog Position Recovery Hell: When connectors fail, they often lose track of binlog position and you're fucked. With GTID enabled, recovery is slightly less painful:

{
  "gtid.source.includes": ".*",
  "database.server.id": "184054"
}

But even with GTID, I've seen connectors restart from hours-old positions after network hiccups. Monitor this religiously.
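
One way to see exactly which position Debezium has committed is to read the Kafka Connect offsets topic directly. A sketch assuming the default topic name connect-offsets (yours may differ if offset.storage.topic was changed):

kafka-console-consumer.sh --bootstrap-server kafka:9092 \
  --topic connect-offsets --from-beginning \
  --property print.key=true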

Large Transaction Disasters: Bulk imports create massive binlog events that crash connectors with OOM errors. Configure bigger buffer sizes or your connector will die:

{
  "binlog.buffer.size": 32768,
  "max.queue.size": 16000
}

Schema Evolution Nightmare:

  • ALTER TABLE statements lock tables and cause CDC lag that can take hours to catch up
  • Use pt-online-schema-change for large tables - it's complex as hell but prevents the disaster (see the sketch after this list)
  • Test every schema change with CDC running in staging - I've seen "simple" column additions break connectors in production
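
A rough pt-online-schema-change invocation for the kind of change that would otherwise lock a big table - the column, chunk size, and load threshold here are placeholders to adapt, not recommendations:

pt-online-schema-change \
  --alter "ADD COLUMN priority INT NOT NULL DEFAULT 1" \
  D=ecommerce,t=orders \
  --chunk-size 1000 --max-load Threads_running=50 \
  --execute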

Connection Management Reality: MySQL's default max_connections = 151 is pathetic. CDC holds connections permanently, so plan accordingly:

SHOW PROCESSLIST;
SHOW VARIABLES LIKE 'max_connections';

Increase max_connections or applications will get Too many connections errors during peak load.

Performance Tuning for High-Volume MySQL CDC

For busy e-commerce databases, tune these settings:

[mysqld]
## Increased binlog cache for large transactions
binlog_cache_size = 4M
max_binlog_cache_size = 4G

## Faster binlog writes
binlog_group_commit_sync_delay = 0
binlog_group_commit_sync_no_delay_count = 0

## Connection tuning
max_connections = 500
thread_cache_size = 100

Connector tuning for high throughput:

{
  "max.queue.size": 32000,
  "max.batch.size": 4096,
  "poll.interval.ms": 500,
  "database.connectionTimeoutInMs": 60000
}

Disaster Recovery Patterns

When binlog files get deleted:

  1. Check if required binlog still exists: SHOW BINARY LOGS;
  2. If missing, reset to current position (loses data): "snapshot.mode": "schema_only"
  3. For critical data, restore from backup and replay

GTID-based recovery (preferred approach):

{
  "gtid.source.includes": ".*",
  "snapshot.mode": "when_needed"
}

Position-based recovery (fallback):

{
  "snapshot.mode": "initial",
  "binlog.filename": "mysql-bin.000047",  
  "binlog.position": "154832"
}

Monitoring and Alerting Setup

Critical MySQL CDC metrics to monitor:

  • Binlog file count and total size
  • Replication lag (seconds behind master)
  • Connection count for CDC user
  • Error rate in MySQL error log
  • Disk space usage for binlog directory

Prometheus monitoring queries:

- name: mysql_binlog_files_total
  help: Number of MySQL binlog files
  type: gauge
  query: "SELECT COUNT(*) FROM information_schema.binary_log_files"

- name: mysql_binlog_size_bytes  
  help: Total size of MySQL binlog files
  type: gauge
  query: "SELECT SUM(file_size) FROM information_schema.binary_log_files"

For complete MySQL CDC implementation details, see MySQL replication documentation, Debezium MySQL connector guide, MySQL binary log documentation, MySQL GTID concepts, Percona toolkit for schema changes, MySQL performance tuning guide, MySQL connection management, MySQL monitoring and diagnostics, MySQL CDC troubleshooting guide, and MySQL high availability setup.

MongoDB CDC: Change Streams and Oplog Configuration

MongoDB change streams are way less painful than dealing with MySQL binlog bullshit, but don't let that fool you - they'll still bite you in production. Resume tokens work great until your oplog fills up and tokens expire during that "quick 30-minute maintenance" that somehow takes 4 hours.

MongoDB change streams are the cleanest CDC implementation you'll find - they're built into the database and designed for this exact use case. You open a cursor, watch for changes, get a resume token to pick up where you left off after failures. The catch? Tokens expire if your oplog doesn't retain enough history, and you're back to full snapshots.
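
The whole loop fits in a few lines of mongosh, which is handy for sanity-checking a replica set before involving Debezium at all. A minimal sketch, assuming the connection string from the connector config below; savedToken is a hypothetical variable you would persist yourself:

mongosh "mongodb://mongo1:27017,mongo2:27017,mongo3:27017/ecommerce?replicaSet=rs0" --eval '
  const cursor = db.orders.watch([], { fullDocument: "updateLookup" });
  const event = cursor.tryNext();      // null if nothing has changed yet
  if (event) printjson(event._id);     // _id is the resume token you persist
  // restart later with: db.orders.watch([], { resumeAfter: savedToken })
'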

MongoDB Configuration for CDC

mongod.conf settings for production change streams:

## Enable replica set (required for change streams)
replication:
  replSetName: "rs0"
  # Oplog sizing (critical for change stream reliability)
  oplogSizeMB: 10240  # 10GB minimum for production

## Storage engine and journaling for durability
storage:
  journal:
    enabled: true
    commitIntervalMs: 100
  wiredTiger:
    engineConfig:
      configString: "eviction_target=80,eviction_trigger=95"

## Network and connection settings
net:
  maxIncomingConnections: 1000
  compression:
    compressors: "snappy,zstd"

Debezium MongoDB Connector Configuration

Production connector setup:

{
  "name": "mongodb-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
    "tasks.max": "1",
    "mongodb.connection.string": "mongodb://debezium_user:password@mongo1:27017,mongo2:27017,mongo3:27017/admin?replicaSet=rs0&authSource=admin",
    "mongodb.name": "mongodb-server",
    "database.include.list": "ecommerce,analytics,logs",
    "collection.include.list": "ecommerce.orders,ecommerce.products,analytics.user_events",
    "snapshot.mode": "initial",
    "capture.mode": "change_streams_update_full",
    "field.exclude.list": "ecommerce.orders.internal_notes,logs.*.stack_trace",
    "heartbeat.interval.ms": 60000,
    "max.queue.size": 16000,
    "max.batch.size": 2048,
    "poll.interval.ms": 1000,
    "mongodb.ssl.enabled": "true",
    "mongodb.ssl.invalid.hostname.allowed": "false"
  }
}

Database User Setup and Authentication

Create MongoDB CDC user:

// Connect to admin database
use admin

// Create user with appropriate roles
db.createUser({
  user: "debezium_user",
  pwd: "secure_password", 
  roles: [
    { role: "clusterMonitor", db: "admin" },
    { role: "changeStreamPreAndPostImages", db: "admin" },
    { role: "read", db: "ecommerce" },
    { role: "read", db: "analytics" },
    { role: "read", db: "logs" },
    { role: "read", db: "local" }  // Required for oplog access
  ]
})

// Enable pre/post images for collections (MongoDB 6.0+)
// collMod takes the collection name within the current database
use ecommerce
db.runCommand({
  collMod: "orders",
  changeStreamPreAndPostImages: { enabled: true }
})

Change Stream Monitoring and Operations

Monitor change stream performance:

// Check oplog status and size
db.runCommand("replSetGetStatus").optimes

// Monitor oplog usage
db.oplog.rs.stats()

// Check current change stream cursors
db.runCommand("serverStatus").metrics.cursor

// Monitor replication lag across replica set members
rs.status().members.forEach(member => {
  print(`${member.name}: ${member.optime ? member.optime.ts : 'N/A'}`)
})

MongoDB-Specific CDC Patterns

Resume Token Hell: MongoDB change streams use resume tokens for fault tolerance, but tokens expire if your connector is down longer than the oplog window. Monitor token expiration or face data loss:

// Check oplog window for resume token validity
db.runCommand("replSetGetStatus").optimes.lastCommittedOpTime

I've seen resume tokens become invalid during weekend maintenance windows, forcing full re-snapshots of 500GB collections. Size your oplog accordingly.

Sharded Cluster Nightmare: Sharded deployments are a pain in the ass - you need to configure each shard separately and mongos routing adds latency:

{
  "mongodb.connection.string": "mongodb://mongos1:27017,mongos2:27017/admin",
  "mongodb.members.auto.discover": "false",
  "mongodb.connect.timeout.ms": 30000,
  "mongodb.server.selection.timeout.ms": 30000
}

I've seen shard rebalancing operations invalidate change streams right in the middle of processing 50GB of backlog, forcing full connector restarts during peak traffic. MongoDB's "seamless" balancing isn't so seamless when you're trying to maintain real-time data sync.

Pre and Post Images: Enable full document capture for updates:

// Enable for specific collections
db.runCommand({
  collMod: "orders", 
  changeStreamPreAndPostImages: { enabled: true }
})

Performance Optimization for MongoDB CDC

Oplog Sizing Strategy: Size oplog to maintain at least 24 hours of changes:

// Calculate oplog size requirements
db.oplog.rs.find().limit(1).sort({$natural: -1})  // Latest
db.oplog.rs.find().limit(1).sort({$natural: 1})   // Oldest

// Resize oplog (requires MongoDB 3.6+)
db.adminCommand({replSetResizeOplog: 1, size: 20480})  // 20GB

Connection Pool Tuning:

## MongoDB connection settings
net:
  maxIncomingConnections: 2000
  serviceExecutor: adaptive
  
## WiredTiger cache sizing
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 32  # 50-60% of available RAM

Index Strategy for Change Streams: Ensure proper indexing for filtered change streams:

// Index on frequently filtered fields
db.orders.createIndex({ "status": 1, "createdAt": 1 })
db.user_events.createIndex({ "userId": 1, "eventType": 1 })

Handling MongoDB-Specific Challenges

Resume Token Expiration: When change streams fall behind, resume tokens become invalid:

{
  "snapshot.mode": "when_needed",
  "mongodb.change.stream.full.document": "updateLookup"
}

Large Document Issues: MongoDB's 16MB document limit can cause problems:

{
  "field.exclude.list": "logs.*.large_payload,events.*.raw_data",
  "max.queue.size": 32000
}

Transaction Support: MongoDB 4.0+ multi-document transactions work with change streams:

{
  "capture.mode": "change_streams_update_full",
  "mongodb.change.stream.full.document": "updateLookup"
}

Disaster Recovery for MongoDB CDC

Change Stream Failure Recovery:

  1. Resume token still valid: Connector resumes automatically
  2. Resume token expired: Falls back to timestamp-based resume
  3. Complete failure: Restart with snapshot.mode: "initial"

Oplog Recovery Strategies:

// Check oplog retention window
const oldestOpTime = db.oplog.rs.find().sort({$natural: 1}).limit(1).next().ts
const newestOpTime = db.oplog.rs.find().sort({$natural: -1}).limit(1).next().ts
print(`Oplog window: ${newestOpTime.getTime() - oldestOpTime.getTime()} seconds`)  // Timestamp.getTime() returns seconds

Monitoring and Alerting for MongoDB CDC

Key metrics to track:

  • Oplog utilization percentage
  • Change stream cursor count and memory usage
  • Resume token age and validity
  • Replication lag across replica set members
  • Connection count for CDC users

Prometheus monitoring setup:

## Prometheus scrape config for mongodb_exporter
scrape_configs:
  - job_name: 'mongodb'
    static_configs:
      - targets: ['mongo1:9216', 'mongo2:9216', 'mongo3:9216']
    metrics_path: /metrics
    params:
      collect[]: [oplog, replset_status, top]

Critical alerts:

  • Oplog usage > 80%
  • Replication lag > 10 seconds
  • Change stream cursor count > 1000
  • Resume token age > 1 hour

Advanced MongoDB CDC Patterns

Filtered Change Streams for performance:

{
  "mongodb.change.stream.pipeline": "[{\"$match\": {\"operationType\": {\"$in\": [\"insert\", \"update\"]}, \"fullDocument.status\": \"active\"}}]"
}

Multi-Collection Aggregation:

{
  "collection.include.list": "ecommerce.orders,ecommerce.order_items",
  "transforms": "flatten",
  "transforms.flatten.type": "io.debezium.connector.mongodb.transforms.ExtractNewDocumentState"
}

For comprehensive MongoDB CDC implementation, see MongoDB change streams documentation, Debezium MongoDB connector reference, MongoDB oplog sizing guide, MongoDB replica set configuration, MongoDB change streams best practices, MongoDB sharding considerations, MongoDB performance optimization, MongoDB monitoring setup, MongoDB CDC with event-driven architecture, and MongoDB production deployment guide.

Database Platform CDC Comparison: Configuration Complexity and Production Reality

| Platform | CDC Method | Configuration Complexity | Performance Impact | Failure Recovery | Production Readiness |
|------------|----------------------------|---------------------------------|--------------------------|-------------------------------------------|----------------------------------|
| PostgreSQL | Logical replication (WAL) | Medium (WAL tuning required) | Low (1-3% overhead) | Good (replication slots survive restarts) | Excellent (most reliable) |
| MySQL | Binary log parsing | High (binlog position tracking) | Medium (3-8% overhead) | Poor (position gets lost easily) | Good (requires careful ops) |
| MongoDB | Change streams/Oplog | Low (native change streams) | Low (2-5% overhead) | Excellent (resume tokens) | Excellent (cleanest API) |
| SQL Server | Built-in CDC or log reader | Medium (CDC must be enabled) | Medium (5-10% overhead) | Good (transaction log based) | Good (enterprise features) |
| Oracle | LogMiner or GoldenGate | Very High (complex licensing) | Variable (2-15% overhead) | Excellent (enterprise tooling) | Excellent (if you can afford it) |

SQL Server and Oracle CDC: Enterprise Database Implementation

SQL Server CDC Configuration

SQL Server CDC works incredibly well if you don't mind babysitting SQL Agent jobs and dealing with Microsoft's licensing maze. Oracle works even better if you can afford to spend more on licensing than most companies spend on their entire infrastructure stack.

SQL Server CDC uses the transaction log to track changes and stores them in system tables you'll need to keep an eye on. The SQL Agent capture and cleanup jobs will also need tuning, because Microsoft's defaults are garbage for production workloads.

Enable CDC on Database and Tables:

-- Enable CDC on database (requires sysadmin role)
USE [ProductionDB]
EXEC sys.sp_cdc_enable_db

-- Enable CDC on specific tables
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'Orders',
    @role_name = N'CDC_Reader',
    @supports_net_changes = 1

-- Verify CDC is enabled
SELECT name, is_cdc_enabled 
FROM sys.databases 
WHERE name = 'ProductionDB'

SELECT name, is_tracked_by_cdc 
FROM sys.tables 
WHERE is_tracked_by_cdc = 1
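
Once the capture job is running you can read the change table directly to see what CDC is recording - a sketch using sqlcmd, assuming the default capture instance name dbo_Orders created by the enable call above:

sqlcmd -S sqlserver.internal -d ProductionDB -Q "
  DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Orders');
  DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();
  SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');
"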

Debezium SQL Server Connector Configuration:

{
  "name": "sqlserver-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "tasks.max": "1",
    "database.hostname": "sqlserver.internal",
    "database.port": "1433",
    "database.user": "debezium_user",
    "database.password": "${file:/opt/kafka/secrets/sqlserver-password.txt:password}",
    "database.names": "ProductionDB",
    "database.server.name": "sqlserver-prod",
    "table.include.list": "dbo.Orders,dbo.Products,dbo.Customers",
    "database.history.kafka.topic": "schema-changes.sqlserver",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.encrypt": "true",
    "database.trustServerCertificate": "true",
    "snapshot.mode": "initial",
    "max.queue.size": 16000,
    "max.batch.size": 2048,
    "poll.interval.ms": 1000
  }
}

CDC User Setup:

-- Create login and user for CDC
CREATE LOGIN debezium_user WITH PASSWORD = 'SecurePassword123!';
USE [ProductionDB];
CREATE USER debezium_user FOR LOGIN debezium_user;

-- Grant necessary permissions
ALTER ROLE db_owner ADD MEMBER debezium_user;
-- Or more restrictive:
-- GRANT SELECT ON SCHEMA::dbo TO debezium_user;
-- EXEC sp_addrolemember 'CDC_Reader', 'debezium_user';

Critical SQL Server CDC Monitoring:

-- Monitor CDC cleanup job status
SELECT job_id, name, enabled, date_created, date_modified
FROM msdb.dbo.sysjobs 
WHERE name LIKE 'cdc%cleanup%'

-- Check CDC log scan job
SELECT job_id, name, enabled 
FROM msdb.dbo.sysjobs 
WHERE name LIKE 'cdc%capture%'

-- Monitor change table sizes
SELECT 
    t.name AS table_name,
    ct.name AS change_table_name,
    SUM(p.rows) AS row_count,
    SUM(a.used_pages) * 8 / 1024 AS size_mb
FROM sys.tables t
INNER JOIN cdc.change_tables ct ON t.object_id = ct.source_object_id
INNER JOIN sys.partitions p ON ct.object_id = p.object_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
GROUP BY t.name, ct.name
ORDER BY size_mb DESC

Oracle CDC with Debezium LogMiner

Oracle CDC works incredibly well if you don't mind spending more on licensing than most companies spend on their entire infrastructure stack. LogMiner requires Oracle Enterprise Edition at $47,500 per processor - yes, you read that right. I've seen companies discover they need GoldenGate licensing after deploying "free" LogMiner-based CDC, leading to surprise $200K licensing bills during compliance audits. Oracle's sales team is very good at finding these situations.

Oracle Database Configuration:

-- Enable supplemental logging (required for CDC)
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

-- Enable archive log mode
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;

-- Set retention policy for redo logs
ALTER SYSTEM SET log_archive_dest_1='LOCATION=/oracle/archive' SCOPE=BOTH;

-- Create CDC user with privileges
CREATE USER debezium_user IDENTIFIED BY SecurePassword123;
GRANT CONNECT TO debezium_user;
GRANT CREATE SESSION TO debezium_user;
GRANT SELECT ON V_$DATABASE TO debezium_user;
GRANT SELECT ON V_$LOGFILE TO debezium_user;
GRANT SELECT ON V_$LOG TO debezium_user;
GRANT SELECT ON V_$ARCHIVED_LOG TO debezium_user;
GRANT SELECT_CATALOG_ROLE TO debezium_user;

Debezium Oracle Connector Configuration:

{
  "name": "oracle-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "tasks.max": "1",
    "database.hostname": "oracle.internal",
    "database.port": "1521",
    "database.user": "debezium_user",
    "database.password": "${file:/opt/kafka/secrets/oracle-password.txt:password}",
    "database.dbname": "ORCL",
    "database.server.name": "oracle-server",
    "table.include.list": "INVENTORY.ORDERS,INVENTORY.PRODUCTS",
    "database.history.kafka.topic": "schema-changes.oracle",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "log.mining.strategy": "online_catalog",
    "log.mining.continuous.mine": "true",
    "log.mining.session.max.ms": 60000,
    "snapshot.mode": "initial",
    "max.queue.size": 16000,
    "max.batch.size": 2048
  }
}

Oracle CDC Monitoring Queries:

-- Check archive log generation rate
SELECT TO_CHAR(first_time,'YYYY-MM-DD HH24') as hour,
       COUNT(*) as log_count,
       SUM(blocks * block_size)/1024/1024 as mb_generated
FROM v$archived_log 
WHERE first_time > SYSDATE - 1
GROUP BY TO_CHAR(first_time,'YYYY-MM-DD HH24')
ORDER BY hour;

-- Monitor current redo log status
SELECT group#, thread#, status, bytes/1024/1024 as size_mb, members
FROM v$log
ORDER BY group#;

-- Check supplemental logging status  
SELECT supplemental_log_data_min, supplemental_log_data_pk,
       supplemental_log_data_all, supplemental_log_data_ui
FROM v$database;

Enterprise CDC Deployment Patterns

High Availability Setup for SQL Server:

-- Configure Always On Availability Groups for CDC
-- Primary replica handles CDC capture
-- Secondary replicas can be used for read-only CDC consumers

ALTER AVAILABILITY GROUP [AG-ProductionDB]
ADD DATABASE [ProductionDB];

-- Enable CDC on secondary replica (read-only)
ALTER DATABASE [ProductionDB] SET READ_ONLY;

Oracle RAC Configuration:

-- In Oracle RAC, CDC should connect to specific instances
-- Configure service names for consistent connection routing
BEGIN
  DBMS_SERVICE.CREATE_SERVICE(
    service_name => 'CDC_SERVICE',
    network_name => 'CDC_SERVICE',
    failover_type => DBMS_SERVICE.FAILOVER_TYPE_SELECT,
    failover_method => DBMS_SERVICE.FAILOVER_METHOD_BASIC
  );
END;
/

Performance Optimization for Enterprise Databases

SQL Server CDC Performance Tuning:

-- Tune CDC cleanup job frequency  
EXEC sys.sp_cdc_change_job 
    @job_type = N'cleanup',
    @retention = 4320,  -- 3 days in minutes
    @threshold = 5000;

-- Monitor CDC performance impact
SELECT 
    db_name(database_id) as database_name,
    object_name(object_id) as object_name,
    avg_fragmentation_in_percent,
    page_count
FROM sys.dm_db_index_physical_stats(DB_ID('ProductionDB'), NULL, NULL, NULL, 'LIMITED')
WHERE avg_fragmentation_in_percent > 30
ORDER BY avg_fragmentation_in_percent DESC;

Oracle CDC Performance Settings:

-- Increase redo log size for high-volume systems  
ALTER DATABASE ADD LOGFILE GROUP 4 SIZE 1024M;
ALTER DATABASE ADD LOGFILE GROUP 5 SIZE 1024M;
ALTER DATABASE ADD LOGFILE GROUP 6 SIZE 1024M;

-- Configure log buffer size
ALTER SYSTEM SET log_buffer = 268435456 SCOPE=SPFILE;  -- 256MB

-- Monitor LogMiner performance
SELECT name, value 
FROM v$sysstat 
WHERE name LIKE '%redo%' OR name LIKE '%log%'
ORDER BY name;

Enterprise CDC Operational Considerations

Backup and Recovery Impact:

  • SQL Server: CDC change tables must be included in backup strategy
  • Oracle: Archive log retention affects CDC resume capability
  • Both: Test CDC recovery after database restores

Licensing Shock Reality:

  • Oracle: LogMiner requires Enterprise Edition ($47,500 per processor) - I've seen $300K+ annual bills for modest Oracle CDC setups
  • SQL Server: CDC requires Standard Edition minimum ($1,859 per core) - much more reasonable than Oracle
  • GoldenGate: Additional $17,500 per processor for real-time features - Oracle's sales team will push this hard

Security and Compliance:

-- SQL Server: Audit CDC access
CREATE SERVER AUDIT [CDC_Audit]
TO FILE (FILEPATH = 'C:\Audit\', MAXSIZE = 1024 MB);

ALTER SERVER AUDIT [CDC_Audit] WITH (STATE = ON);

CREATE DATABASE AUDIT SPECIFICATION [CDC_Database_Audit] 
FOR SERVER AUDIT [CDC_Audit]
ADD (SELECT ON SCHEMA::cdc BY debezium_user);

Disaster Recovery Testing:

  • Document CDC-specific recovery procedures
  • Test resume capability after prolonged outages
  • Validate data consistency after failover scenarios
  • Maintain runbooks for emergency CDC restart procedures

Cost-Benefit Analysis for Enterprise CDC

SQL Server CDC Total Cost (3 years):

  • Licensing: $200K-500K (depends on core count)
  • Infrastructure: $150K-300K (HA setup)
  • Operations: $400K-600K (DBA time)
  • Total: $750K-1.4M

Oracle CDC Total Cost (3 years):

  • Database licensing: $500K-1.2M (Enterprise Edition)
  • GoldenGate licensing: $200K-500K
  • Infrastructure: $300K-600K (RAC setup)
  • Operations: $600K-900K (Oracle DBA expertise)
  • Total: $1.6M-3.2M

ROI Considerations:

  • Regulatory compliance value (avoiding fines)
  • Business intelligence acceleration
  • Operational efficiency gains
  • Risk mitigation (data consistency)

Enterprise databases offer robust CDC capabilities but require significant investment in licensing, expertise, and infrastructure. Evaluate total cost of ownership including licensing, operational complexity, and required skill sets before committing to enterprise database CDC solutions.

For enterprise CDC implementation resources, see SQL Server CDC documentation, Debezium SQL Server connector guide, SQL Server monitoring queries, Oracle GoldenGate documentation, Debezium Oracle connector, Oracle LogMiner guide, Oracle supplemental logging setup, SQL Server performance tuning, Oracle RAC with CDC, and enterprise CDC cost analysis guide.

Database Platform CDC FAQ: Real Questions from Production Deployments

Q: Which database platform should I choose for CDC?

A: Stick with what you already have and configure it properly.

Switching databases just for CDC is rarely worth the migration effort and risk. If you're starting fresh:

  • PostgreSQL: Best balance of reliability, features, and operational complexity
  • MongoDB: Cleanest CDC experience with change streams
  • MySQL: Only if you already have deep MySQL expertise
  • SQL Server/Oracle: If you need enterprise features and have the budget

Performance isn't everything: a well-configured MySQL setup often outperforms a poorly configured PostgreSQL setup. Focus on operational expertise over theoretical benchmarks.

Q: How do I size PostgreSQL WAL for CDC?

A: Calculate based on peak write volume, not average:

-- Sample the current WAL position during peak hours, save it, and diff an hour later
SELECT pg_current_wal_lsn() AS lsn_sample_start;

-- One hour later, substitute the LSN you saved above for '0/0':
SELECT pg_size_pretty(
  pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0'::pg_lsn)
) AS wal_generated_per_hour;

Sizing formula: max_slot_wal_keep_size = (peak_wal_generation_per_hour × 8 hours)

Example: If you generate 2GB/hour during peak, set max_slot_wal_keep_size = 16GB

Monitor with alerts:

-- Alert when WAL lag exceeds 5GB
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) 
FROM pg_replication_slots 
WHERE pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > 5368709120;

Q: Why does MySQL binlog CDC break after database restarts?

A: Because MySQL's binlog position tracking is garbage. Without GTID enabled, Debezium tracks binlog file names and positions that become completely useless after rotation. It's like trying to navigate with a map that changes every time you look at it.

Enable GTID properly:

-- Current MySQL session
SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = ON;
SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE;
SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE;
SET @@GLOBAL.GTID_MODE = ON;

-- Make permanent in my.cnf
[mysqld]
gtid_mode = ON
enforce_gtid_consistency = ON
log_bin = mysql-bin
log_slave_updates = ON

Verify GTID is working:

SHOW MASTER STATUS;  -- Should show GTID coordinates
SHOW VARIABLES LIKE 'gtid_mode';  -- Should be 'ON'

For existing CDC without GTID: You're fucked. You'll need to restart from a fresh snapshot because there's no clean upgrade path. Yes, this means downtime and a full re-snapshot of potentially terabytes of data.

Q: How long should I keep MongoDB oplog for CDC?

A: Minimum 24 hours of operations, but size for your longest expected outage plus buffer.

Calculate required oplog size:

// Check current oplog usage
db.oplog.rs.stats()

// Oplog growth rate: MB currently in the oplog divided by the hours it spans
const info = db.getReplicationInfo()   // logSizeMB, usedMB, timeDiff (seconds), timeDiffHours
const hourlyGrowthMB = info.usedMB / info.timeDiffHours

print(`Hourly oplog growth: ${hourlyGrowthMB.toFixed(1)} MB`)
print(`Recommended size for 48h retention: ${(hourlyGrowthMB * 48).toFixed(0)} MB`)

Resize oplog if needed:

// Resize to 20GB (requires MongoDB 3.6+)
db.adminCommand({replSetResizeOplog: 1, size: 20480})

Monitor oplog health:

// Check oplog window (oldest to newest entry)
const replInfo = db.getReplicationInfo()
print(`Oplog retention window: ${replInfo.timeDiffHours} hours (${replInfo.tFirst} -> ${replInfo.tLast})`)

Q: What's the best practice for CDC schema changes?

A: Test every schema change with CDC running in staging. No exceptions.

Safe changes (usually work):

  • Adding nullable columns
  • Adding new indexes
  • Increasing column sizes (VARCHAR(50) → VARCHAR(100))

Dangerous changes (often break CDC):

  • Renaming columns
  • Dropping columns
  • Changing data types
  • Adding NOT NULL columns without defaults

Schema change procedure:

  1. Test in staging with actual CDC load:

    -- PostgreSQL: Test with logical replication running
    ALTER TABLE orders ADD COLUMN priority INTEGER DEFAULT 1;
    
    -- Monitor CDC lag during and after change
    SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
    FROM pg_replication_slots;
    
  2. Plan for connector restarts: Most schema changes require restarting connectors

  3. Have rollback plan: Be prepared to drop and recreate connectors if needed

  4. Use database-specific safe change tools:

    • MySQL: pt-online-schema-change
    • PostgreSQL: Schema changes during low-traffic periods
    • MongoDB: Schema changes are usually safe (document-based)

Q: How do I handle CDC during database maintenance?

A: Pause connectors before maintenance to prevent WAL/binlog accumulation:

## Pause all CDC connectors
curl -X PUT localhost:8083/connectors/postgres-connector/pause
curl -X PUT localhost:8083/connectors/mysql-connector/pause

## Wait for lag to drop to near zero
## PostgreSQL:
psql -c "SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) FROM pg_replication_slots;"

## MySQL:
mysql -e "SHOW MASTER STATUS; SHOW SLAVE STATUS\G"

## Perform maintenance...

## Resume connectors
curl -X PUT localhost:8083/connectors/postgres-connector/resume
curl -X PUT localhost:8083/connectors/mysql-connector/resume

For long maintenance windows (>4 hours):

  • PostgreSQL: May exceed max_slot_wal_keep_size, requiring fresh snapshots
  • MySQL: Binlog files may get purged, requiring position reset
  • MongoDB: Resume tokens may expire, falling back to timestamp-based resume

Plan for snapshot time: Large tables may take hours to re-snapshot after long outages.

Q: Why is my CDC lag higher than expected?

A: Common causes and solutions:

Network latency between components:

## Test latency between CDC components
ping -c 10 kafka-broker.internal
traceroute postgres-server.internal

Undersized connectors:

{
  "max.queue.size": 32000,    // Increase from default 8192
  "max.batch.size": 4096,     // Increase from default 2048
  "poll.interval.ms": 500     // Decrease from default 1000
}

Database-specific bottlenecks:

  • PostgreSQL: WAL generation faster than network can transfer
  • MySQL: Binlog I/O contention with regular queries
  • MongoDB: Oplog read contention with replica set sync

Downstream processing bottlenecks:

## Check Kafka consumer lag
kafka-consumer-groups.sh --bootstrap-server kafka:9092 --describe --group connect-postgres-connector

## Monitor connector task status
curl -s localhost:8083/connectors/postgres-connector/status | jq '.tasks[].state'

Q: How do I recover from CDC failures?

A: Recovery strategy depends on how long CDC was down and how badly you're screwed:

Short outages (< 1 hour):

  • Connectors usually resume automatically
  • Monitor lag and let it catch up naturally
  • Check for any error messages in connector logs

Medium outages (1-8 hours):

  • PostgreSQL: WAL should still be available, resume normally
  • MySQL: Check if binlog files still exist, may need position reset
  • MongoDB: Resume tokens likely still valid

Long outages (> 8 hours):

  • PostgreSQL: May have exceeded max_slot_wal_keep_size, need fresh snapshot
  • MySQL: Binlog files probably purged, definitely need position reset
  • MongoDB: Resume tokens expired, will fall back to timestamp-based resume

Nuclear option (when everything's completely fucked):

  1. Stop all connectors (burn it all down):

    curl -X DELETE localhost:8083/connectors/postgres-connector
    
  2. Clean up database artifacts (nuke from orbit):

    -- PostgreSQL: Drop replication slot
    SELECT pg_drop_replication_slot('debezium_slot');
    
    -- SQL Server: Disable and re-enable CDC
    EXEC sys.sp_cdc_disable_table @source_schema = 'dbo', @source_name = 'orders', @capture_instance = 'all'
    EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'orders', @role_name = NULL
    
  3. Create fresh connectors with "snapshot.mode": "initial" and pray the re-snapshot doesn't take 18 hours

Data consistency check after recovery:

-- Compare row counts between source and target
SELECT COUNT(*) FROM source_table WHERE updated_at > '2025-09-03';
SELECT COUNT(*) FROM target_table WHERE updated_at > '2025-09-03';

Q: What's the most reliable CDC database platform?

A: PostgreSQL is the most reliable for CDC in production:

  • WAL-based replication is mature and battle-tested
  • Replication slots survive database restarts
  • Logical replication handles most schema changes gracefully
  • Good monitoring with standard SQL queries
  • Extensive documentation and community knowledge

MongoDB is second for reliability:

  • Change streams are designed for CDC from the ground up
  • Resume tokens provide excellent fault tolerance
  • Native integration with minimal configuration
  • Schema flexibility handles changes naturally

MySQL requires more operational overhead:

  • Binlog position tracking is fragile
  • GTID setup is complex but necessary
  • Schema changes can break replication
  • More failure modes than PostgreSQL

Oracle/SQL Server are enterprise-grade but complex:

  • Excellent features but high operational complexity
  • Expensive licensing and specialized expertise required
  • Robust but overkill for most use cases

Bottom line: Use PostgreSQL unless you have a gun to your head. The operational simplicity is worth way more than whatever theoretical performance advantages the other platforms claim. Trust me - you'll thank yourself when you're not debugging MySQL binlog position corruption at 3am.
