
![MySQL Logo](https://www.svgrepo.com/show/303251/mysql-logo.svg)


Why gh-ost Exists: Because pt-online-schema-change Will Ruin Your Weekend

GitHub built gh-ost after getting burned too many times by pt-online-schema-change taking down production during routine migrations. Released in 2016, it's their solution to the "why is our database locked up again?" problem.

The Real Problem with Triggers

Every other MySQL migration tool uses triggers. Triggers are fucking evil because they fire synchronously inside your application's transactions. First, installing and removing them takes metadata locks that can freeze your entire database - I've seen 30-second ALTER TABLE operations turn into 10-minute pile-ups because the triggers couldn't be installed. Then every INSERT, UPDATE, and DELETE on the table has to execute trigger code, which adds real overhead to every transaction - I've seen 40-80% performance hits on a table doing 10k writes/second. And once pt-osc starts? You can't actually pause it: the "throttle" option only pauses the copy phase while the triggers keep firing and eating CPU. Your only real choices are to let it finish or abort and throw the whole migration away.

Alright, Here's How gh-ost Fixes This Clusterfuck

OK, rant over. gh-ost reads from the MySQL binary log instead of using triggers, which fixes all this shit:

  • Actually Pausable: Run echo throttle | socat - /tmp/gh-ost.sock and it immediately stops all of gh-ost's writes - row copy and binlog apply both halt, and your database load drops back to normal. See interactive commands documentation.

  • No Lock Contention: No triggers to install/remove means no metadata locks. The migration runs completely outside your application transactions. Read about MySQL's locking mechanisms.

  • Real Throttling: You can pause, resume, change chunk sizes, and even abort mid-migration without leaving your database in a broken state (see the sketch after this list). Check the throttling documentation for all options.
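All of those run through the Unix socket gh-ost listens on while the migration is running. Here's a minimal control-session sketch, assuming you pointed --serve-socket-file at /tmp/gh-ost.sock like the examples on this page:

```bash
# Ask the running migration what it's doing right now
echo status | socat - /tmp/gh-ost.sock

# Pause: gh-ost stops copying rows and stops applying binlog events to the ghost table
echo throttle | socat - /tmp/gh-ost.sock

# Resume once load settles down
echo no-throttle | socat - /tmp/gh-ost.sock

# Tune it live without restarting
echo "chunk-size=500" | socat - /tmp/gh-ost.sock
echo "max-lag-millis=2000" | socat - /tmp/gh-ost.sock

# Abort entirely - the original table is untouched, you just lose the copy progress
echo panic | socat - /tmp/gh-ost.sock
```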

The Catch

gh-ost isn't magic. It requires Row-Based Replication and MySQL 5.7+. Foreign keys are a no-go - you have to drop them before migration and recreate after. But honestly, if you're running MySQL in production at any scale, you should already have RBR enabled and foreign keys removed for performance reasons.
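If you want to sanity-check those requirements before kicking anything off, a rough pre-flight looks like this - host and credentials are placeholders, and the replica gh-ost connects to is the one that matters:

```bash
# gh-ost reads row events from a replica's binlog, so that replica needs
# binlog_format=ROW and log_slave_updates enabled (plus MySQL 5.7+)
mysql -h replica.example.internal -u ghost -p -e \
  "SELECT @@version, @@binlog_format, @@log_slave_updates;"
```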

Who Actually Uses This

Version 1.1.7 was released December 20, 2024 with performance improvements and several bug fixes. GitHub uses it for their own production (obviously), along with companies like Shopify and others who got tired of pt-osc ruining their weekends. It has 12.9k stars on GitHub, which in database tool terms means "actually battle-tested." You can find the complete requirements and limitations in the official documentation.

What gh-ost Actually Does (And Where It'll Bite You)

How It Really Works

gh-ost creates a "ghost" table with your new schema, then copies data in chunks while tailing the binary log to catch live changes. The interactive commands let you actually control it while it's running, unlike pt-osc which is basically "start it and pray."

Here's how it's supposed to work (spoiler: reality is messier): it copies your shit in chunks, watches the binary log for new changes, then swaps tables when done. That's it. Well, technically it creates a _your_table_gho ghost table first, copies 1000 rows at a time by default, tails the binlog from a replica to catch live changes, applies those to the ghost table, and finally does a quick atomic table swap. But you get the idea.
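For reference, a typical invocation looks roughly like this. Hostnames, credentials, and the ALTER are placeholders; without --execute it's a dry run:

```bash
# Connects to a replica, discovers the master, creates _your_table_gho,
# copies rows in 1000-row chunks, and tails the replica's binlog for live changes.
gh-ost \
  --host=replica.example.internal \
  --user=ghost \
  --password=... \
  --database=your_db \
  --table=your_table \
  --alter="ADD COLUMN shipped_at DATETIME NULL" \
  --chunk-size=1000 \
  --max-lag-millis=1500 \
  --serve-socket-file=/tmp/gh-ost.sock \
  --verbose \
  --execute
```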

The Stuff That Actually Works

You Can Actually Pause It: echo throttle | socat - /tmp/gh-ost.sock stops all of gh-ost's writes immediately. This saved my ass when load spiked during a migration and I needed to back off instantly. Full command reference here.

Replica Testing: The --test-on-replica flag lets you run the entire migration on a replica first. It stops replication at the end so you can verify the data looks right before risking production.
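A sketch of that, with the replica host and table as placeholders - gh-ost runs the whole migration on the replica and stops replication at the end so you can verify the data before touching production:

```bash
gh-ost \
  --test-on-replica \
  --host=replica.example.internal \
  --user=ghost \
  --password=... \
  --database=your_db \
  --table=your_table \
  --alter="ADD COLUMN shipped_at DATETIME NULL" \
  --execute
```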

Lag Monitoring: It watches replication lag and throttles automatically if replicas fall behind. You can set --max-lag-millis=1500 and it'll pause when lag hits that threshold.
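The knobs for this are set at launch (and can be changed later over the socket). A sketch with illustrative numbers, not recommendations - the throttling flags are the last four, the rest is the same boilerplate as the earlier example:

```bash
# max-lag-millis + throttle-control-replicas throttle on replica lag;
# max-load throttles on master load; critical-load makes gh-ost abort entirely.
gh-ost \
  --host=replica1.example.internal \
  --user=ghost \
  --password=... \
  --database=your_db \
  --table=your_table \
  --alter="ADD COLUMN shipped_at DATETIME NULL" \
  --max-lag-millis=1500 \
  --throttle-control-replicas="replica1.example.internal,replica2.example.internal" \
  --max-load="Threads_running=25" \
  --critical-load="Threads_running=100" \
  --execute
```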

Where It All Goes Wrong in Production

Binary Log Position Can Get Fucked: On our AWS RDS setup running MySQL 8.0.28, if the binary log gets corrupted or the position becomes invalid, your migration dies with a cryptic error about binlog coordinates. Had this migration die on us when the logs got purged mid-run because some genius set expire_logs_days=1. Took forever to figure out what happened, ended up having to restart the whole thing. Not fun when you're trying to hit a maintenance window.
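Worth checking before any migration that might run long: how long your binlogs actually survive. A quick look on a self-managed 8.0 box (5.7 uses expire_logs_days instead):

```bash
# How long binlogs are kept, in seconds (0 means "never expire")
mysql -e "SELECT @@binlog_expire_logs_seconds;"

# Bump retention to 7 days before starting a long migration
mysql -e "SET GLOBAL binlog_expire_logs_seconds = 604800;"
```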

Foreign Keys = Manual Work: No foreign key support means you drop them all before migration, then recreate after. For a table with 15 FKs pointing to it, this becomes a whole project involving application code changes and careful ordering.
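information_schema will at least tell you the blast radius up front. A sketch with your_db / your_table as placeholders; the actual DROP/ADD statements and their ordering are the real project:

```bash
# Every foreign key that references your_table - each has to be dropped
# before gh-ost runs and recreated afterwards.
mysql -e "
  SELECT table_schema, table_name, constraint_name
  FROM information_schema.key_column_usage
  WHERE referenced_table_schema = 'your_db'
    AND referenced_table_name   = 'your_table';"
```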

Cleanup Is Manual: When a migration fails (and they do), you're left with ghost tables and changelog tables to clean up manually. The socket from --serve-socket-file disappears along with the process, so you can't even ask it what went wrong.


Real Performance Numbers

From Percona's benchmarks on a 3GB sysbench table:

  • Idle load: pt-osc finished nearly twice as fast as gh-ost
  • Light load: pt-osc still ~2x faster than gh-ost
  • Heavy load (40% capacity): gh-ost couldn't complete, pt-osc finished with 12% performance impact

The reality? gh-ost's single-threaded binlog processing hits a wall on write-heavy workloads. It's great for safety and control, but pt-osc can handle higher write volumes.

When It Breaks

"Error 1236: Could not find first log file name in binary log index file": Happens when your binary log files got rotated during migration. If you're fucked, you're fucked - there's no recovery from this. We had expire_logs_days=1 set too aggressively. Discovered this during peak traffic, naturally.

"Cannot determine table's PRIMARY KEY": Happens on tables with no PK or only composite keys. You need a single-column primary key or unique index. Yes, even in 2025 there are tables without proper PKs. We had to add a surrogate PK column to a 200GB audit log table - that was a fun conversation with the architect.

"Connection was killed": MySQL 8.0.28 has a nasty bug where long-running SELECT queries get killed during gh-ost operations. Set --mysql-timeout=300 or you'll randomly fail after hours of work. Nothing like watching 8 hours of progress vanish.

Replication lag spikes: The binary log tailing can fall behind on write-heavy workloads. Monitor SHOW SLAVE STATUS and be ready to increase --chunk-size if needed. AWS RDS has its own special way of calculating lag that doesn't always make sense.
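In practice the loop is: watch lag on the replica gh-ost reads from, and nudge the migration over the socket instead of restarting it. A sketch, assuming the /tmp/gh-ost.sock path used above:

```bash
# Lag on the replica gh-ost is tailing
mysql -h replica.example.internal -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master

# Adjust the running migration without killing it
echo "max-lag-millis=3000" | socat - /tmp/gh-ost.sock
echo "chunk-size=2000" | socat - /tmp/gh-ost.sock
echo status | socat - /tmp/gh-ost.sock
```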

gh-ost vs pt-online-schema-change: The Reality

| Factor | gh-ost | pt-online-schema-change | Reality Check |
|---|---|---|---|
| Triggers | None | Creates 3 triggers per table | pt-osc triggers fuck with performance |
| Actual Pause | Stops all writes instantly | Pauses copy, triggers keep firing | gh-ost you can actually stop |
| Foreign Keys | Drop them first | Handles them (mostly) | pt-osc FK handling is buggy |
| MySQL Versions | 5.7+ only | 5.5+ (ancient) | Most people should be on 5.7+ anyway |
| When It Breaks | Clean artifacts, no data loss | Can corrupt triggers, restore from backup | gh-ost fails safer |
| Setup Complexity | RBR + replica access | Works out of box | pt-osc is easier to start |
| Lock Contention | Almost none | Metadata locks during trigger creation | pt-osc locks are a nightmare |
| Load Impact | 15-20% on master | 40-60% on busy tables | Measured on real workloads |

Questions People Actually Ask About gh-ost

Q: Why does gh-ost randomly fail with "binlog position not found" errors?

A: Happens when MySQL tosses your binary logs while you're still reading from them. Check your `expire_logs_days` setting - if your migration takes longer than the log retention period, you're fucked. Set it to at least 7 days for big migrations.

There's no quick fix once the logs are gone - you restart the migration. Adding --exact-rowcount on the retry at least gives you accurate progress tracking.

Q: How do I clean up when gh-ost dies halfway through?

A: When gh-ost crashes, it leaves behind ghost tables you need to clean up manually:

```sql
DROP TABLE IF EXISTS _your_table_gho;  -- Ghost table
DROP TABLE IF EXISTS _your_table_ghc;  -- Changelog table
```

The original table is untouched until cutover succeeds, so you can just restart the migration. No official recovery tool exists - you're on your own.

Q: Why does everyone still use pt-osc if gh-ost is so much better?

A: Because pt-osc works with foreign keys and ancient MySQL versions. Most companies have legacy schemas with foreign key clusterfucks they can't touch. gh-ost requires dropping FKs first = application code changes.

Plus pt-osc is simpler - install Percona Toolkit and run it. gh-ost needs RBR setup, replica access, planning.

Q: Can I run gh-ost on RDS or cloud databases?

A: AWS RDS: Works but you need log_bin_trust_function_creators=1 and replica access. Some RDS configurations don't give you full binlog access. Found out the hard way that RDS's "automated backups" can interfere with gh-ost's binlog reading if you're not careful. (See the RDS sketch after this answer.)

Google Cloud SQL: Works with point-in-time recovery enabled. But their binlog retention is aggressive - saw a migration fail when GCP rotated logs after 3 days instead of the expected 7.

Azure MySQL: Requires specific configuration for binlog access. Their documentation is shit, expect to spend a day figuring out the exact permission settings.

The catch: Cloud providers love to rotate logs aggressively or lock down binlog access, breaking long migrations without warning.
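For RDS specifically, the two things that bite are binlog retention and the missing SUPER privilege. A hedged sketch - endpoints, credentials, and the ALTER are placeholders, and binlog_format=ROW has to already be set in your parameter group:

```bash
# Keep binlogs around long enough for the migration (value is in hours)
mysql -h your-rds-endpoint -u admin -p -e \
  "CALL mysql.rds_set_configuration('binlog retention hours', 168);"

# --assume-rbr skips the SET GLOBAL binlog_format step, which needs SUPER
# and therefore fails on RDS.
gh-ost \
  --host=your-rds-replica-endpoint \
  --user=ghost \
  --password=... \
  --database=your_db \
  --table=your_table \
  --alter="ADD COLUMN shipped_at DATETIME NULL" \
  --assume-rbr \
  --execute
```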

Q: What's the maximum table size gh-ost can handle?

A: There's no hard limit, but practical constraints kick in:

  • < 100GB: Usually fine, 2-4 hours
  • 100GB - 1TB: Doable but plan for 8-24 hours
  • > 1TB: You're in "pray nothing goes wrong" territory

Biggest reported success: 5TB table took 3 days. The risk is something breaking during a multi-day migration - binlog rotation, replica failures, network issues, etc.

Q: Does gh-ost work with MySQL 8.0's instant DDL?

A: No, gh-ost doesn't use MySQL 8.0's instant DDL features - it always builds a full copy of the table. For simple operations like adding a nullable column, MySQL's ALGORITHM=INSTANT DDL is a metadata-only change and beats any copy-based tool outright.
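For comparison, the instant path is plain DDL. This only covers the operations 8.0 can do instantly (adding a column is the classic case), and when you request INSTANT explicitly MySQL errors out instead of silently falling back to a table copy:

```bash
# Metadata-only change: no table rebuild, no ghost table, effectively immediate
mysql -e "ALTER TABLE your_db.your_table
            ADD COLUMN shipped_at DATETIME NULL,
            ALGORITHM=INSTANT;"
```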

Q: Why can't I just use online DDL instead of gh-ost?

A: Because of what it does to replication. The ALTER runs online on the master, but it replicates as a single statement that each replica has to replay from start to finish before applying anything else - so your replicas lag for the entire duration of the ALTER. gh-ost's changes flow through replication as ordinary row events, so replicas stay responsive.

Related Tools & Recommendations

tool
Recommended

pt-online-schema-change - The Tool That Prevents 3AM Disaster Calls

Change MySQL table schemas without locking everyone out for 6 hours.

pt-online-schema-change
/tool/pt-online-schema-change/overview
73%
howto
Popular choice

Migrate JavaScript to TypeScript Without Losing Your Mind

A battle-tested guide for teams migrating production JavaScript codebases to TypeScript

JavaScript
/howto/migrate-javascript-project-typescript/complete-migration-guide
57%
tool
Popular choice

jQuery - The Library That Won't Die

Explore jQuery's enduring legacy, its impact on web development, and the key changes in jQuery 4.0. Understand its relevance for new projects in 2025.

jQuery
/tool/jquery/overview
55%
pricing
Recommended

How These Database Platforms Will Fuck Your Budget

compatible with MongoDB Atlas

MongoDB Atlas
/pricing/mongodb-atlas-vs-planetscale-vs-supabase/total-cost-comparison
55%
tool
Recommended

PlanetScale - MySQL That Actually Scales Without The Pain

Database Platform That Handles The Nightmare So You Don't Have To

PlanetScale
/tool/planetscale/overview
55%
compare
Recommended

These 4 Databases All Claim They Don't Suck

I Spent 3 Months Breaking Production With Turso, Neon, PlanetScale, and Xata

Turso
/review/compare/turso/neon/planetscale/xata/performance-benchmarks-2025
55%
tool
Popular choice

OpenAI Browser Implementation Challenges

Every developer question about actually using this thing in production

OpenAI Browser
/tool/openai-browser/implementation-challenges
52%
review
Popular choice

Cursor Enterprise Security Assessment - What CTOs Actually Need to Know

Real Security Analysis: Code in the Cloud, Risk on Your Network

Cursor
/review/cursor-vs-vscode/enterprise-security-review
50%
tool
Popular choice

Istio - Service Mesh That'll Make You Question Your Life Choices

The most complex way to connect microservices, but it actually works (eventually)

Istio
/tool/istio/overview
47%
tool
Recommended

MySQL Replication - How to Keep Your Database Alive When Shit Goes Wrong

depends on MySQL Replication

MySQL Replication
/tool/mysql-replication/overview
45%
compare
Recommended

MongoDB vs PostgreSQL vs MySQL: Which One Won't Ruin Your Weekend

depends on mysql

mysql
/compare/mongodb/postgresql/mysql/performance-benchmarks-2025
45%
alternatives
Recommended

MySQL Alternatives That Don't Suck - A Migration Reality Check

Oracle's 2025 Licensing Squeeze and MySQL's Scaling Walls Are Forcing Your Hand

MySQL
/alternatives/mysql/migration-focused-alternatives
45%
integration
Recommended

GitOps Integration Hell: Docker + Kubernetes + ArgoCD + Prometheus

How to Wire Together the Modern DevOps Stack Without Losing Your Sanity

go
/integration/docker-kubernetes-argocd-prometheus/gitops-workflow-integration
45%
alternatives
Recommended

MongoDB Alternatives: Choose the Right Database for Your Specific Use Case

Stop paying MongoDB tax. Choose a database that actually works for your use case.

MongoDB
/alternatives/mongodb/use-case-driven-alternatives
45%
integration
Recommended

Kafka + MongoDB + Kubernetes + Prometheus Integration - When Event Streams Break

When your event-driven services die and you're staring at green dashboards while everything burns, you need real observability - not the vendor promises that go

Apache Kafka
/integration/kafka-mongodb-kubernetes-prometheus-event-driven/complete-observability-architecture
45%
tool
Recommended

GitHub Desktop - Git with Training Wheels That Actually Work

Point-and-click your way through Git without memorizing 47 different commands

GitHub Desktop
/tool/github-desktop/overview
45%
compare
Recommended

AI Coding Assistants 2025 Pricing Breakdown - What You'll Actually Pay

GitHub Copilot vs Cursor vs Claude Code vs Tabnine vs Amazon Q Developer: The Real Cost Analysis

GitHub Copilot
/compare/github-copilot/cursor/claude-code/tabnine/amazon-q-developer/ai-coding-assistants-2025-pricing-breakdown
45%
integration
Recommended

I've Been Juggling Copilot, Cursor, and Windsurf for 8 Months

Here's What Actually Works (And What Doesn't)

GitHub Copilot
/integration/github-copilot-cursor-windsurf/workflow-integration-patterns
45%
pricing
Popular choice

What Enterprise Platform Pricing Actually Looks Like When the Sales Gloves Come Off

Vercel, Netlify, and Cloudflare Pages: The Real Costs Behind the Marketing Bullshit

Vercel
/pricing/vercel-netlify-cloudflare-enterprise-comparison/enterprise-cost-analysis
45%
tool
Popular choice

MariaDB - What MySQL Should Have Been

Discover MariaDB, the powerful open-source alternative to MySQL. Learn why it was created, how to install it, and compare its benefits for your applications.

MariaDB
/tool/mariadb/overview
42%

Recommendations combine user behavior, content similarity, research intelligence, and SEO optimization