Standard MySQL ALTER TABLE
on a 50GB table? Hope you like explaining to your boss why the site was down for 6 hours. Here's what MySQL actually does when you run ALTER TABLE users ADD COLUMN email VARCHAR(255)
:
- MySQL creates a new table with your schema change
- Copies every single row from the old table to the new one
- Locks the entire table while this happens
- Swaps the tables when done
On my 200GB user table, this took 8 hours and locked out every single user. Learned that lesson the hard way. The MySQL docs explain the process but conveniently skip the 'your users will hate you for half a day' part.
How pt-online-schema-change Actually Works
pt-osc does the same thing but doesn't fuck your table with locks:
Step 1: Create the Shadow Table
Makes an empty copy of your table with the schema changes applied. Your original table keeps working normally.
Step 2: Copy Data in Chunks
Copies 1,000 rows at a time by default. Pauses between batches so your app doesn't die. Takes forever but won't murder your database. The chunking algorithm uses the primary key to efficiently iterate through data.
Step 3: Keep Everything in Sync
Creates triggers on your original table to capture any writes happening during the copy. These get replayed on the shadow table so nothing gets lost. This is the critical difference from gh-ost, which uses binary log parsing instead.
Step 4: The Atomic Swap
When copying is done, it does a RENAME TABLE
operation that swaps the old and new tables. This part takes milliseconds, not hours. The MySQL RENAME TABLE operation is atomic and metadata-only.
The Real Gotchas Nobody Tells You
Disk Space: You need 3x your table size free. The official docs say 2x but that's horseshit. Last time I ran out of space at 95% on a 180GB table when the logs filled up. Spent Saturday explaining to the CTO why we needed emergency disk space at 2AM. Run df -h
first or suffer.
Primary Key Required: If your table doesn't have a primary key, you're fucked. The tool needs it for the DELETE trigger. Found this out on a legacy table that took 2 days to add a PK to. This MySQL bug report explains why triggers require PKs.
Foreign Keys Are Hell: The `--alter-foreign-keys-method` options are all terrible in different ways. drop_swap
disables FK checks (scary), rebuild_constraints
can fail spectacularly, and auto
just picks one of these disasters. Percona's FK handling guide explains the nightmare.
Triggers Can Cause Replication Lag: Heavy write traffic means lots of trigger execution. I've seen 30-second replication lag spikes that made monitoring scream. MySQL trigger performance degrades with table size.
The Progress Bar Lies: When it says 99%, that could still be 2 hours. The last chunk often takes forever because of lock contention. This forum thread covers the common hang scenarios.
This tool works, but test it on a replica first or you'll be debugging production at 3AM like I was last Tuesday. Percona's testing guide covers replica testing if you want to do this right.
Anyway, here's when to actually use this thing vs the other options.