MySQL Replication: AI-Optimized Technical Reference
Overview
MySQL replication copies data from source servers to replica servers for high availability and read scaling. Critical for preventing data loss during hardware failures.
Configuration
Essential Binary Log Settings
```ini
# Production-ready binlog configuration
log-bin = mysql-bin
binlog_format = ROW                      # CRITICAL: prevents non-deterministic replication issues
sync_binlog = 1                          # ensures durability; cuts write performance by ~50%
binlog_expire_logs_seconds = 2592000     # 30 days retention minimum
max_binlog_size = 1G                     # prevents oversized binlog files
```
Critical Failure Point: Statement-based replication breaks with nondeterministic functions like UUID(), SYSDATE(), and RAND(), which produce different results on replicas.
GTID Configuration (Recommended)
```ini
gtid_mode = ON
enforce_gtid_consistency = ON
log_replica_updates = ON
replica_preserve_commit_order = ON
```
Migration Warning: GTID cannot be enabled in a single step on a live topology. gtid_mode must walk through the stages OFF → OFF_PERMISSIVE → ON_PERMISSIVE → ON, one step at a time on every server. Jumping straight to ON breaks replication.
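A minimal sketch of the staged rollout, run step by step on every server in the topology (the consistency audit and the anonymous-transaction wait are the steps people skip and regret):

```sql
-- Staged GTID rollout; complete each step on every server before moving on
SET GLOBAL enforce_gtid_consistency = WARN;  -- audit the error log for violations first
SET GLOBAL enforce_gtid_consistency = ON;
SET GLOBAL gtid_mode = OFF_PERMISSIVE;
SET GLOBAL gtid_mode = ON_PERMISSIVE;
-- Wait for this counter to reach 0 on every server before the final step:
SHOW STATUS LIKE 'Ongoing_anonymous_transaction_count';
SET GLOBAL gtid_mode = ON;
```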
Parallel Replication Optimization
```ini
replica_parallel_workers = 4             # 4-8 is the sweet spot for most workloads
replica_parallel_type = LOGICAL_CLOCK
replica_preserve_commit_order = ON
```
Performance Reality:
- More than 8 workers = coordination overhead kills performance
- Hot-spot tables force single-threaded processing regardless of worker count
- 32 workers can increase lag from 5 seconds to 3 minutes
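To see whether the workers are actually sharing the load, performance_schema exposes per-worker applier state; one perpetually busy worker while the rest sit idle is the hot-spot signature described above (a sketch, assuming MySQL 8.0+):

```sql
-- Per-worker applier status; compare how often each worker is busy
SELECT WORKER_ID,
       SERVICE_STATE,
       LAST_APPLIED_TRANSACTION,
       APPLYING_TRANSACTION   -- non-empty only while the worker is busy
FROM performance_schema.replication_applier_status_by_worker;
```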
Replication Types
Traditional Source-Replica
Characteristics:
- One source handles writes, replicas handle reads
- Single-threaded SQL thread by default before MySQL 8.0.27 (major bottleneck; newer versions default to 4 parallel workers)
- Reliable but can lag during large transactions
Failure Scenario: 10M row UPDATE blocks replication for hours while changes queue up.
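The standard workaround is to chunk big writes so each piece replicates as its own small transaction; a hypothetical sketch (table and column names are placeholders):

```sql
-- Repeat until ROW_COUNT() returns 0; each 10k-row chunk commits and
-- replicates independently instead of as one multi-hour transaction
UPDATE orders
SET status = 'archived'
WHERE status = 'stale'
LIMIT 10000;
```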
Semi-Synchronous Replication
Trade-offs:
- Source waits for replica acknowledgment before commit
- Adds network latency to every write operation
- Falls back to async mode during network issues WITHOUT notification
Configuration:
```sql
-- Source
INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so';
SET GLOBAL rpl_semi_sync_source_enabled = 1;
SET GLOBAL rpl_semi_sync_source_timeout = 1000;  -- 1 second timeout

-- Replica
INSTALL PLUGIN rpl_semi_sync_replica SONAME 'semisync_replica.so';
SET GLOBAL rpl_semi_sync_replica_enabled = 1;
```
Critical Monitoring: Rpl_semi_sync_source_status shows whether semi-sync is still active; watch it to catch the silent fallback to async mode.
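A minimal check (the variable name matches the 8.0.26+ rpl_semi_sync_source plugin installed above):

```sql
-- ON while semi-sync is active; OFF means the source has silently
-- fallen back to asynchronous replication
SHOW STATUS LIKE 'Rpl_semi_sync_source_status';
```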
Group Replication
Use Cases: Multi-master setups requiring strong consistency
Reality Check:
- Works well in controlled environments with sub-millisecond latency
- Fails spectacularly during network partitions
- Uses Paxos consensus - minority partitions become read-only
- Performance degrades significantly above 5 nodes
- Facebook, GitHub, YouTube don't use this for critical systems
Latency Impact: 1ms commits become 50ms commits with 10ms inter-node latency.
Production Monitoring
Critical Metrics
```sql
SHOW REPLICA STATUS\G
```
Key Fields:
- Seconds_Behind_Source (Seconds_Behind_Master before 8.0.22): lag measurement (lies when the SQL thread is stopped)
- Replica_IO_Running: should be "Yes" (downloads binlogs)
- Replica_SQL_Running: should be "Yes" (applies changes)
- Last_Error: actual error messages
- Executed_Gtid_Set: transaction completion status (GTID only)
Monitoring Lies: Seconds_Behind_Source misleads you when:
- the SQL thread is stopped (it reports NULL: you're broken, not caught up)
- the source is idle (0 doesn't reflect actual lag)
- parallel replication worker delays aren't accounted for
Better Monitoring Tools
- Percona Monitoring and Management (PMM): Professional dashboards
- pt-heartbeat: Accurate lag measurements
- Orchestrator: Automated failover management
- Custom GTID_SUBSET() checks: Compare executed transactions (example below)
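The GTID_SUBSET() check compares the source's executed set against a replica's; a sketch where the first argument is a placeholder you'd read from the source's @@GLOBAL.gtid_executed:

```sql
-- Returns 1 if every transaction executed on the source has also been
-- executed on this replica; 0 means the replica is behind or has diverged
SELECT GTID_SUBSET(
    '3e11fa47-71ca-11e1-9e33-c80aa9429562:1-12345',  -- placeholder: source gtid_executed
    @@GLOBAL.gtid_executed
) AS replica_caught_up;
```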
Failover Procedures
Automated Failover (Orchestrator)
Timeline: 10-30 seconds for promotion
Requirements: GTID enabled, proper network connectivity
Failure Points: Network partitions can trigger unnecessary failovers
Manual Failover Checklist
- Stop writes to the failed source immediately
- Identify the most current replica via Executed_Gtid_Set
- Run STOP REPLICA; RESET REPLICA ALL; on the promoted replica
- Point remaining replicas to the new source
- Update application connection strings
- Verify write functionality
Time Budget: 2-5 minutes if prepared, 2+ hours if not.
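Steps three and four of the checklist as a GTID-based sketch (host and credentials are placeholders):

```sql
-- On the promoted replica: stop applying and drop its replica metadata
STOP REPLICA;
RESET REPLICA ALL;

-- On each remaining replica: repoint at the promoted server
STOP REPLICA;
CHANGE REPLICATION SOURCE TO
    SOURCE_HOST = 'new-source.example.com',
    SOURCE_USER = 'repl',
    SOURCE_PASSWORD = 'secure_password',
    SOURCE_AUTO_POSITION = 1;
START REPLICA;
```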
Backup Strategy
Replica-Based Backups
```bash
# Percona XtraBackup from a replica
xtrabackup --backup --target-dir=/backup/$(date +%Y%m%d) \
    --host=replica-server --user=backup --password=secret \
    --slave-info   # captures the replication position
```
Critical Gotcha: Backup age equals replica lag. 2-hour lagged replica = backup missing 2 hours of data.
Recovery Requirements:
- Minimum 7-day binlog retention (binlog_expire_logs_seconds = 604800)
- Consistent retention across all servers
- Point-in-time recovery is impossible if binlogs are purged before replicas catch up
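A quick retention sanity check to run on every server (604800 seconds = 7 days):

```sql
SELECT @@GLOBAL.binlog_expire_logs_seconds / 86400 AS retention_days;
SHOW BINARY LOGS;  -- the oldest file listed bounds how far back PITR can go
```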
Security Implementation
Replication User Setup
```sql
CREATE USER 'repl'@'replica-server' IDENTIFIED BY 'secure_password' REQUIRE SSL;
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'replica-server';

-- Replica configuration
CHANGE REPLICATION SOURCE TO
    SOURCE_USER = 'repl',
    SOURCE_PASSWORD = 'secure_password',
    SOURCE_SSL = 1,
    SOURCE_SSL_VERIFY_SERVER_CERT = 1;
```
MySQL 8.4 Compatibility Issue: mysql_native_password is disabled by default. Replication accounts that still use it need the plugin explicitly re-enabled at server startup, or the accounts migrated to caching_sha2_password.
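If the replication user is migrated to caching_sha2_password, the connection must either be SSL-encrypted (as configured above) or allow RSA key exchange; a hedged sketch:

```sql
-- Only needed when a caching_sha2_password replication user connects
-- without SSL; harmless to omit when SOURCE_SSL = 1
CHANGE REPLICATION SOURCE TO GET_SOURCE_PUBLIC_KEY = 1;
```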
Common Failure Scenarios
Disk Space Exhaustion
Frequency: 90% of replication failures
Symptoms: Replication stops; Last_Error shows disk space issues
Resolution: Clean old binlogs, monitor disk usage continuously
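Cleaning up by hand means purging only what every replica has already downloaded; a sketch (the 3-day window is a placeholder, check each replica's Relay_Source_Log_File first):

```sql
-- Never delete binlog files with rm; let the server update its index
PURGE BINARY LOGS BEFORE NOW() - INTERVAL 3 DAY;
```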
Network Connectivity Issues
Symptoms: Replica_IO_Running = No
Diagnosis: Check firewall rules, DNS resolution, network latency
Prevention: Use private networks, avoid public internet for replication
Binary Log Corruption
Severity: High - requires restore from backup
Causes: Unclean shutdowns, storage failures
Prevention: sync_binlog = 1 (performance cost: ~50% write throughput)
Performance Characteristics
Write Performance Impact
- Async replication: No impact on source
- Semi-sync replication: 5-10% reduction
- Group replication: 10-20% reduction
- Parallel replication: Dependent on workload conflicts
Latency Expectations
- Well-configured setup: 100-500ms lag
- Default MySQL settings: 10-60 seconds lag
- Misconfigured systems: Hours of lag
Resource Requirements
Hardware Specifications
- CPU: Parallel replication benefits from 4-8 cores
- Memory: Binlog cache sizing critical for write-heavy workloads
- Storage: Fast disks for binlog writes, sufficient space for retention
- Network: Low-latency connections for semi-sync and Group Replication
Operational Expertise
- Setup complexity: Low (traditional) to High (Group Replication)
- Troubleshooting difficulty: Medium to High during network issues
- Time investment: 2-4 weeks to master, ongoing maintenance overhead
Decision Matrix
| Requirement | Recommended Solution | Alternative | Avoid |
|---|---|---|---|
| Read scaling | Traditional async | Semi-sync | Group Replication |
| Zero data loss | Semi-sync | Group Replication | Async only |
| Multi-master writes | Application-level sharding | Group Replication | Multi-source |
| Automatic failover | Orchestrator + GTID | Manual procedures | Position-based |
| Cross-datacenter | Traditional with monitoring | Cloud managed | Group Replication |
Critical Warnings
What Documentation Doesn't Tell You
- Group Replication performance degrades significantly in real-world network conditions
- Semi-sync silently falls back to async during network issues
- Parallel replication coordination overhead can worsen performance
- MySQL defaults are optimized for 2005 hardware and traffic patterns
Breaking Points
- 1000+ concurrent connections: Connection handling becomes bottleneck
- >10ms network latency: Group Replication becomes unusable
- Hot-spot tables: Parallel replication reverts to single-threaded
- Binlog retention <7 days: Point-in-time recovery impossible
Production Reality Checks
- Companies with highest MySQL scale use traditional replication, not fancy features
- Network issues cause more replication problems than MySQL bugs
- Monitoring replication lag isn't enough - monitor lag measurement reliability
- Automated failover tools require as much testing as manual procedures
Support and Community Quality
Enterprise vs Community
- MySQL Enterprise: Professional support, enterprise features, backup tools
- Percona: Strong community support, enhanced monitoring tools
- MariaDB: Different replication implementations, compatibility concerns
Tool Ecosystem Maturity
- Orchestrator: Production-ready, actively maintained
- PMM: Comprehensive monitoring, good documentation
- pt-toolkit: Proven operational tools, wide adoption
- MySQL Shell: Official tooling, improving but limited compared to third-party options
This reference provides the technical foundation for implementing MySQL replication in production environments, with emphasis on operational realities and failure prevention rather than theoretical optimization.