
MySQL Replication: AI-Optimized Technical Reference

Overview

MySQL replication copies data from source servers to replica servers for high availability and read scaling. Critical for preventing data loss during hardware failures.

Configuration

Essential Binary Log Settings

# Production-ready binlog configuration (my.cnf)
log-bin = mysql-bin
binlog_format = ROW              # CRITICAL: Prevents non-deterministic replication issues
sync_binlog = 1                  # Ensures durability, reduces write performance by ~50%
binlog_expire_logs_seconds = 2592000  # 30 days retention minimum
max_binlog_size = 1G            # Prevents oversized binlog files
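
A quick way to confirm these settings took effect on the running server (a minimal check using the variable names above):

-- Verify the binlog settings on the running server
SHOW VARIABLES WHERE Variable_name IN
  ('log_bin', 'binlog_format', 'sync_binlog',
   'binlog_expire_logs_seconds', 'max_binlog_size');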

Critical Failure Point: Statement-based replication breaks with non-deterministic functions such as UUID(), SYSDATE(), and RAND(), which can produce different results on replicas; ROW format avoids this by logging the changed rows themselves.

GTID Configuration (Recommended)

gtid_mode = ON
enforce_gtid_consistency = ON
log_replica_updates = ON
replica_preserve_commit_order = ON

Migration Warning: GTID cannot be enabled in a single step. gtid_mode must move one stage at a time: OFF → OFF_PERMISSIVE → ON_PERMISSIVE → ON, with enforce_gtid_consistency turned on first. Skipping the staged rollout leaves servers replicating in incompatible modes and breaks the topology.
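
A minimal sketch of the staged online rollout, assuming MySQL 8.x and that each step is run on every server in the topology (replicas first):

-- Staged GTID enablement (run on each server, one step at a time)
SET GLOBAL enforce_gtid_consistency = WARN;   -- watch the error log for violations first
SET GLOBAL enforce_gtid_consistency = ON;
SET GLOBAL gtid_mode = OFF_PERMISSIVE;
SET GLOBAL gtid_mode = ON_PERMISSIVE;
-- wait until this reports 0 on every server before the final step
SHOW STATUS LIKE 'Ongoing_anonymous_transaction_count';
SET GLOBAL gtid_mode = ON;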

Parallel Replication Optimization

replica_parallel_workers = 4       # 4-8 is the sweet spot for most workloads
replica_parallel_type = LOGICAL_CLOCK
replica_preserve_commit_order = ON

Performance Reality:

  • More than 8 workers = coordination overhead kills performance
  • Hot-spot tables force single-threaded processing regardless of worker count
  • 32 workers can increase lag from 5 seconds to 3 minutes
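
To check whether those workers are actually applying in parallel or serializing behind a hot spot, the per-worker applier table in performance_schema is worth polling; a minimal query, assuming MySQL 8.x:

-- Idle workers alongside one busy worker suggests a hot-spot table
SELECT worker_id, service_state, applying_transaction
FROM performance_schema.replication_applier_status_by_worker;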

Replication Types

Traditional Source-Replica

Characteristics:

  • One source handles writes, replicas handle reads
  • Single-threaded SQL thread by default (major bottleneck)
  • Reliable but can lag during large transactions

Failure Scenario: 10M row UPDATE blocks replication for hours while changes queue up.
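
A common mitigation, sketched here with illustrative table and column names, is to split huge changes into small batches so the SQL thread never queues hours of work behind one transaction:

-- Batch a huge change so each transaction (and binlog event group) stays small
-- 'orders' and 'status' are illustrative names; repeat until ROW_COUNT() returns 0
UPDATE orders
   SET status = 'archived'
 WHERE status = 'closed'
 LIMIT 10000;
SELECT ROW_COUNT();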

Semi-Synchronous Replication

Trade-offs:

  • Source waits for replica acknowledgment before commit
  • Adds network latency to every write operation
  • Falls back to async mode during network issues WITHOUT notification

Configuration:

-- Source
INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so';
SET GLOBAL rpl_semi_sync_source_enabled = 1;
SET GLOBAL rpl_semi_sync_source_timeout = 1000;  # 1 second timeout

-- Replica
INSTALL PLUGIN rpl_semi_sync_replica SONAME 'semisync_replica.so';
SET GLOBAL rpl_semi_sync_replica_enabled = 1;

Critical Monitoring: Rpl_semi_sync_source_status - shows when system falls back to async mode.
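
Because the fallback is silent, it has to be polled; a minimal check, assuming the 8.0.26+ plugin naming used above (the counter is the renamed equivalent of the older Rpl_semi_sync_master_no_tx):

-- ON means semi-sync is active; OFF means you are silently running async
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_source_status';
-- Commits that were not acknowledged by any replica
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_source_no_tx';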

Group Replication

Use Cases: Multi-master setups requiring strong consistency
Reality Check:

  • Works well in controlled environments with sub-millisecond latency
  • Fails spectacularly during network partitions
  • Uses Paxos consensus - minority partitions become read-only
  • Performance degrades significantly above 5 nodes
  • Facebook, GitHub, YouTube don't use this for critical systems

Latency Impact: 1ms commits become 50ms commits with 10ms inter-node latency.
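
During a partition, the fastest way to tell whether a node still has quorum is to check the membership it can see; a minimal query, assuming Group Replication is running:

-- A node that sees fewer than a majority of members as ONLINE has lost quorum and is read-only
SELECT member_host, member_state, member_role
FROM performance_schema.replication_group_members;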

Production Monitoring

Critical Metrics

SHOW REPLICA STATUS\G

Key Fields:

  • Seconds_Behind_Source: Lag estimate (unreliable; see the caveats below)
  • Replica_IO_Running: Should be "Yes" (downloads binlogs)
  • Replica_SQL_Running: Should be "Yes" (applies changes)
  • Last_Error: Actual error messages
  • Executed_Gtid_Set: Transaction completion status (GTID only)

Monitoring Lies: Seconds_Behind_Source misleads you when:

  • The SQL thread is stopped (the value goes NULL, and you're broken, not caught up)
  • The source is idle (0 doesn't reflect how stale the replica's data actually is)
  • Parallel replication is in use (worker queue delays aren't accounted for)

Better Monitoring Tools

  • Percona Monitoring and Management (PMM): Professional dashboards
  • pt-heartbeat: Accurate lag measurements
  • Orchestrator: Automated failover management
  • Custom GTID_SUBSET() checks: Compare executed transactions
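
A minimal sketch of such a GTID check; the GTID set literal is illustrative and would be captured from the source first:

-- On the source: capture its executed GTID set
SELECT @@GLOBAL.gtid_executed;

-- On the replica: returns 1 only if every transaction in that set has been applied here
SELECT GTID_SUBSET('3e11fa47-71ca-11e1-9e33-c80aa9429562:1-77',
                   @@GLOBAL.gtid_executed) AS source_set_applied;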

Failover Procedures

Automated Failover (Orchestrator)

Timeline: 10-30 seconds for promotion
Requirements: GTID enabled, proper network connectivity
Failure Points: Network partitions can trigger unnecessary failovers

Manual Failover Checklist

  1. Stop writes to failed source immediately
  2. Identify most current replica via Executed_Gtid_Set
  3. STOP REPLICA; RESET REPLICA ALL; on promoted replica
  4. Point remaining replicas to new source
  5. Update application connection strings
  6. Verify write functionality

Time Budget: 2-5 minutes if prepared, 2+ hours if not.
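
For step 4, a minimal sketch of repointing a surviving replica at the promoted server, assuming GTID auto-positioning (the hostname is illustrative):

-- Run on each remaining replica after promotion
STOP REPLICA;
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'new-source.example.com',   -- illustrative hostname
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = 'secure_password',
  SOURCE_AUTO_POSITION = 1;                 -- requires GTID on both ends
START REPLICA;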

Backup Strategy

Replica-Based Backups

# Percona XtraBackup from replica
xtrabackup --backup --target-dir=/backup/$(date +%Y%m%d) \
  --host=replica-server --user=backup --password=secret \
  --slave-info  # Captures replication position

Critical Gotcha: Backup age equals replica lag. 2-hour lagged replica = backup missing 2 hours of data.

Recovery Requirements:

  • Minimum 7-day binlog retention (binlog_expire_logs_seconds = 604800)
  • Consistent retention across all servers
  • Point-in-time recovery impossible if binlogs purged before replica catchup

Security Implementation

Replication User Setup

CREATE USER 'repl'@'replica-server' IDENTIFIED BY 'secure_password' REQUIRE SSL;
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'replica-server';

-- Replica configuration
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST='source-server',     -- illustrative hostname of the source
  SOURCE_USER='repl',
  SOURCE_PASSWORD='secure_password',
  SOURCE_SSL=1,
  SOURCE_SSL_VERIFY_SERVER_CERT=1;

MySQL 8.4 Compatibility Issue: mysql_native_password disabled by default. Old replicas require explicit authentication method specification.
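
One way to avoid the mismatch, as a hedged sketch: pin the authentication plugin explicitly when creating the replication user instead of relying on the server default.

-- caching_sha2_password is the 8.x default and survives the 8.4 change
CREATE USER 'repl'@'replica-server'
  IDENTIFIED WITH caching_sha2_password BY 'secure_password'
  REQUIRE SSL;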

Common Failure Scenarios

Disk Space Exhaustion

Frequency: 90% of replication failures
Symptoms: Replication stops, Last_Error shows disk space issues
Resolution: Clean old binlogs, monitor disk usage continuously
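
A minimal cleanup sketch, safe only after confirming every replica has already read past the cutoff (check SHOW REPLICA STATUS on each one):

-- Reclaim disk space; only run once every replica has read past this point
PURGE BINARY LOGS BEFORE NOW() - INTERVAL 3 DAY;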

Network Connectivity Issues

Symptoms: Replica_IO_Running = No
Diagnosis: Check firewall rules, DNS resolution, network latency
Prevention: Use private networks, avoid public internet for replication

Binary Log Corruption

Severity: High - requires restore from backup
Causes: Unclean shutdowns, storage failures
Prevention: sync_binlog = 1 (performance cost: ~50% write throughput)

Performance Characteristics

Write Performance Impact

  • Async replication: No impact on source
  • Semi-sync replication: 5-10% reduction
  • Group replication: 10-20% reduction
  • Parallel replication: Dependent on workload conflicts

Latency Expectations

  • Well-configured setup: 100-500ms lag
  • Default MySQL settings: 10-60 seconds lag
  • Misconfigured systems: Hours of lag

Resource Requirements

Hardware Specifications

  • CPU: Parallel replication benefits from 4-8 cores
  • Memory: Binlog cache sizing critical for write-heavy workloads
  • Storage: Fast disks for binlog writes, sufficient space for retention
  • Network: Low-latency connections for semi-sync and Group Replication

Operational Expertise

  • Setup complexity: Low (traditional) to High (Group Replication)
  • Troubleshooting difficulty: Medium to High during network issues
  • Time investment: 2-4 weeks to master, ongoing maintenance overhead

Decision Matrix

Requirement         | Recommended Solution        | Alternative        | Avoid
--------------------|-----------------------------|--------------------|-------------------
Read scaling        | Traditional async           | Semi-sync          | Group Replication
Zero data loss      | Semi-sync                   | Group Replication  | Async only
Multi-master writes | Application-level sharding  | Group Replication  | Multi-source
Automatic failover  | Orchestrator + GTID         | Manual procedures  | Position-based
Cross-datacenter    | Traditional with monitoring | Cloud managed      | Group Replication

Critical Warnings

What Documentation Doesn't Tell You

  • Group Replication performance degrades significantly in real-world network conditions
  • Semi-sync silently falls back to async during network issues
  • Parallel replication coordination overhead can worsen performance
  • MySQL defaults are optimized for 2005 hardware and traffic patterns

Breaking Points

  • 1000+ concurrent connections: Connection handling becomes bottleneck
  • >10ms network latency: Group Replication becomes unusable
  • Hot-spot tables: Parallel replication reverts to single-threaded
  • Binlog retention <7 days: Point-in-time recovery impossible

Production Reality Checks

  • Companies running MySQL at the largest scale use traditional replication, not the fancy features
  • Network issues cause more replication problems than MySQL bugs
  • Monitoring replication lag isn't enough - monitor lag measurement reliability
  • Automated failover tools require as much testing as manual procedures

Support and Community Quality

Enterprise vs Community

  • MySQL Enterprise: Professional support, enterprise features, backup tools
  • Percona: Strong community support, enhanced monitoring tools
  • MariaDB: Different replication implementations, compatibility concerns

Tool Ecosystem Maturity

  • Orchestrator: Production-ready, actively maintained
  • PMM: Comprehensive monitoring, good documentation
  • pt-toolkit: Proven operational tools, wide adoption
  • MySQL Shell: Official tooling, improving but limited compared to third-party options

This reference provides the technical foundation for implementing MySQL replication in production environments, with emphasis on operational realities and failure prevention rather than theoretical optimization.
