Why Standard ALTER TABLE Will Ruin Your Day

Standard MySQL ALTER TABLE on a 50GB table? Hope you like explaining to your boss why the site was down for 6 hours. Here's what MySQL actually does when you run ALTER TABLE users ADD COLUMN email VARCHAR(255):

  1. MySQL creates a new table with your schema change
  2. Copies every single row from the old table to the new one
  3. Locks the entire table while this happens
  4. Swaps the tables when done

On my 200GB user table, this took 8 hours and locked out every single user. Learned that lesson the hard way. The MySQL docs explain the process but conveniently skip the 'your users will hate you for half a day' part.
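
If you want to watch that copy happen, a rough way (from a second connection) is to poll the processlist while a plain ALTER runs; with the table-copy algorithm the State column usually reads something like "copy to tmp table":

-- From a second connection while the ALTER grinds away:
SHOW FULL PROCESSLIST;

-- Or narrow it down (information_schema.PROCESSLIST has been around since 5.1):
SELECT ID, TIME, STATE, LEFT(INFO, 80) AS query
FROM information_schema.PROCESSLIST
WHERE INFO LIKE 'ALTER TABLE%';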

How pt-online-schema-change Actually Works

pt-osc does the same thing but doesn't fuck your table with locks:

Step 1: Create the Shadow Table
Makes an empty copy of your table with the schema changes applied. Your original table keeps working normally.

Step 2: Copy Data in Chunks
Copies 1,000 rows at a time by default. Pauses between batches so your app doesn't die. Takes forever but won't murder your database. The chunking algorithm uses the primary key to efficiently iterate through data.
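
For the curious, each chunk copy is roughly this shape - the table, columns, and boundaries here are illustrative, the real statements are generated by the tool:

-- Illustrative sketch of one chunk copy (pt-osc generates the real statement):
INSERT LOW_PRIORITY IGNORE INTO `_users_new` (`id`, `name`, `created_at`)
SELECT `id`, `name`, `created_at`
FROM `users` FORCE INDEX (`PRIMARY`)
WHERE `id` >= 1001 AND `id` <= 2000
LOCK IN SHARE MODE;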

Step 3: Keep Everything in Sync
Creates triggers on your original table to capture any writes happening during the copy. These get replayed on the shadow table so nothing gets lost. This is the critical difference from gh-ost, which uses binary log parsing instead.
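
Simplified, the insert and delete triggers look something like this - actual trigger names and column lists are generated by pt-osc, so treat this as a sketch:

-- Sketch of the sync triggers (names and columns illustrative, generated by pt-osc):
CREATE TRIGGER pt_osc_mydb_users_ins AFTER INSERT ON users FOR EACH ROW
  REPLACE INTO `_users_new` (id, name, email) VALUES (NEW.id, NEW.name, NEW.email);

CREATE TRIGGER pt_osc_mydb_users_del AFTER DELETE ON users FOR EACH ROW
  DELETE IGNORE FROM `_users_new` WHERE `_users_new`.id <=> OLD.id;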

Step 4: The Atomic Swap
When copying is done, it does a RENAME TABLE operation that swaps the old and new tables. This part takes milliseconds, not hours. The MySQL RENAME TABLE operation is atomic and metadata-only.
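
Assuming the default table naming, the cut-over is a single statement along these lines:

-- One atomic, metadata-only rename swaps the tables (default _new/_old naming assumed):
RENAME TABLE `users` TO `_users_old`,
             `_users_new` TO `users`;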

The Real Gotchas Nobody Tells You

Disk Space: You need 3x your table size free. The official docs say 2x but that's horseshit. Last time I ran out of space at 95% on a 180GB table when the logs filled up. Spent Saturday explaining to the CTO why we needed emergency disk space at 2AM. Run df -h first or suffer.

Primary Key Required: If your table doesn't have a primary key, you're fucked. The tool needs it for the DELETE trigger. Found this out on a legacy table that took 2 days to add a PK to. This MySQL bug report explains why triggers require PKs.

Foreign Keys Are Hell: The `--alter-foreign-keys-method` options are all terrible in different ways. drop_swap disables FK checks (scary), rebuild_constraints can fail spectacularly, and auto just picks one of these disasters. Percona's FK handling guide explains the nightmare.

Triggers Can Cause Replication Lag: Heavy write traffic means lots of trigger execution. I've seen 30-second replication lag spikes that made monitoring scream. MySQL trigger performance degrades with table size.
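
A crude way to keep an eye on that lag while the copy runs - the replica host here is a placeholder, and pre-8.0.22 servers use SHOW SLAVE STATUS instead:

# Poll replica lag every 5 seconds (host name is a placeholder):
watch -n 5 'mysql -h replica1.internal -e "SHOW REPLICA STATUS\G" | grep -i seconds_behind'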

The Progress Bar Lies: When it says 99%, that could still be 2 hours. The last chunk often takes forever because of lock contention. This forum thread covers the common hang scenarios.

This tool works, but test it on a replica first or you'll be debugging production at 3AM like I was last Tuesday. Percona's testing guide covers replica testing if you want to do this right.

Anyway, here's when to actually use this thing vs the other options.

pt-online-schema-change vs The Other Options

Feature                          | pt-online-schema-change       | gh-ost                   | Standard ALTER TABLE
---------------------------------|-------------------------------|--------------------------|-------------------------
Will it lock your table?         | No                            | No                       | Yes (for hours)
Do you need triggers?            | Yes (3 of them)               | No                       | No
Foreign keys work?               | Kinda (it's painful)          | Nope                     | Yes
Minimum MySQL version            | 5.0.2+                        | 5.7+                     | Any
Production impact                | Medium (triggers slow things) | Low (reads from replica) | High (everything stops)
How fucked are you if it breaks? | Medium                        | Low                      | Very
Can you resume if it crashes?    | Yes (with --history flag)     | Yes                      | No
Progress reporting               | Lies to you                   | Also lies but prettier   | No reporting
Learning curve                   | Medium                        | Steep                    | None (just don't use it)
Rollback difficulty              | Hard                          | Easier                   | Hard
Real Talk                        | Works but slow                | Fancy but complicated    | Fast way to unemployment

Actually Using pt-online-schema-change (Without Destroying Production)

Installation

pt-osc comes with Percona Toolkit. Latest is 3.7.0 (Dec 23, 2024) with MySQL 8.4 support. Install from Percona's repos:

# Ubuntu/Debian
apt install percona-toolkit

# CentOS/RHEL
yum install percona-toolkit

# macOS
brew install percona-toolkit

Or grab the packages directly, or use Docker if you're into that. Works with MySQL 5.7-8.4 and MariaDB 10.x.

Before You Run Anything

Check Your Table Has a Primary Key:

SHOW INDEX FROM your_table WHERE Key_name = 'PRIMARY';

No primary key = no pt-osc. You'll need to add one first. MySQL's documentation explains adding primary keys, but be aware this itself requires a schema change.
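
If you do have to add one, a surrogate key is usually the least painful route - the column name here is just an example, and on a big table this is itself a long, locking operation:

-- Hypothetical surrogate primary key; pick a name that fits your schema:
ALTER TABLE your_table
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;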

Check Disk Space:

df -h

You need 3x your table size free. The official docs say "2x" - that's a lie, trust me, I've run out of space too many times. Also check that innodb_file_per_table is enabled; if you're on a shared tablespace, dropping the old table won't give the space back to the OS.
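
To know what "3x" actually means in gigabytes, a quick check against information_schema is a good enough estimate:

-- Rough table size (data + indexes); budget ~3x this in free space:
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 1) AS size_gb
FROM information_schema.TABLES
WHERE table_schema = 'yourdb' AND table_name = 'yourtable';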

Test Your ALTER Statement:

pt-online-schema-change --alter "ADD COLUMN email VARCHAR(255)" --dry-run D=test,t=users

If dry-run fails, fix it before wasting 6 hours on a broken command like I did once. The dry-run documentation explains what gets tested (spoiler: not the stuff that actually breaks in production).

Basic Command That Actually Works

# --max-lag pauses the copy when replicas fall behind; --max-load and --critical-load
# throttle or abort based on Threads_running; --chunk-size and --sleep control how
# hard each batch hits the server.
pt-online-schema-change \
  --alter "ADD COLUMN email VARCHAR(255)" \
  --execute \
  --max-lag=5s \
  --max-load=Threads_running=50 \
  --critical-load=Threads_running=100 \
  --chunk-size=1000 \
  --sleep=0.1 \
  D=yourdb,t=yourtable

Real-World Time Estimates

The progress percentage is complete bullshit. Here's reality:

  • 1GB table: 30 minutes to 2 hours (depending on write traffic)
  • 10GB table: 2-8 hours (plan for the high end)
  • 100GB table: 8-24 hours (seriously, run this overnight)
  • 500GB+ table: Days. Schedule vacation time.

The last 10% often takes as long as the first 90% due to lock contention.

When It Will Fail

Lock Wait Timeouts:

DBD::mysql::db do failed: Lock wait timeout exceeded

This means long-running queries are blocking the tool. Use `SHOW PROCESSLIST` to find blocking queries and kill them, or run during low traffic. Adjust `innodb_lock_wait_timeout` if needed.
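
If you're on MySQL 5.7+ (or have the sys schema installed), this is a quicker way to see who is blocking whom before you start killing things:

-- Who is blocking whom (sys schema ships with MySQL 5.7+):
SELECT waiting_pid, waiting_query, blocking_pid, blocking_query, wait_age
FROM sys.innodb_lock_waits;

-- Then, if it's safe:
-- KILL <blocking_pid>;   -- replace with the blocking_pid from the query above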

Out of Disk Space:
The tool doesn't check disk space because that would make sense. I've watched it fill drives at 95% and crash. MySQL 8.0.24 had a bug where it wouldn't fail gracefully on ENOSPC - just died. Watch disk usage with `df` or `iostat`. Percona's PMM covers monitoring if you want to do this right.

Replica Discovery Hangs:
If your replicas aren't reachable, pt-osc hangs forever trying to connect. Use `--recursion-method=none` to skip replica discovery. Essential for AWS RDS environments.
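
A sketch of what that looks like against an RDS endpoint - the hostname below is a placeholder, adjust the DSN for your instance:

# Skip replica discovery on RDS (endpoint is a placeholder):
pt-online-schema-change \
  --alter "ADD COLUMN email VARCHAR(255)" \
  --recursion-method=none \
  --ask-pass \
  --execute \
  h=yourdb.abc123.us-east-1.rds.amazonaws.com,u=admin,D=yourdb,t=yourtable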

Foreign Key Hell:
Foreign keys make everything harder. The `--alter-foreign-keys-method` options are:

  • auto: Lets the tool decide (usually wrong)
  • rebuild_constraints: Rebuilds FKs (can fail with complex schemas)
  • drop_swap: Disables FK checks temporarily (scary but works)

This Percona blog post explains why FK handling is fundamentally broken.
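
If you end up going with rebuild_constraints anyway, the invocation looks like this - same caveats as above, and test it on a replica first:

# Example run with explicit FK handling (test on a replica before production):
pt-online-schema-change \
  --alter "ADD COLUMN email VARCHAR(255)" \
  --alter-foreign-keys-method=rebuild_constraints \
  --execute \
  D=yourdb,t=yourtable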

Production Survival Guide

Before You Start:

  1. Run during low traffic hours (2-6 AM)
  2. Alert your team - this will cause replication lag
  3. Have the rollback plan ready (hint: there isn't really one)
  4. Monitor disk space, CPU, and replication lag

While It's Running:

  • Watch SHOW PROCESSLIST for blocking queries
  • Monitor replication lag on all slaves
  • Don't panic when progress stalls at 99%

If It Breaks:
The original table gets renamed to _yourtable_old (leading underscore, _old suffix by default). You can swap it back, but that requires downtime:

RENAME TABLE users TO users_broken, _users_old TO users;
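
If the run died partway instead, it's worth checking for a leftover shadow table and sync triggers before retrying - table names below assume the default pt-osc prefixes:

-- Leftover shadow/old tables and sync triggers (default naming assumed):
SHOW TABLES LIKE '\_users\_%';
SHOW TRIGGERS WHERE `Table` = 'users';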

The Ugly Truth

This tool works, but it's not magic. I've run it on 500GB tables and it always takes longer than you think. Budget 2x your estimate, then double that when your manager asks.

Test on a replica first. Dry-run lies - it'll pass dry-run and fail in production because MySQL hates you.

Here's what actually breaks at 2AM.

Questions DBAs Actually Ask (At 3AM)

Q: Why is this thing stuck at 99% for 6 hours?
A: The progress bar is lying to you. 99% could mean 6 more hours. The last chunk gets fucked by lock contention, especially with long queries blocking everything. Kill those queries or suffer. Normal but infuriating.

Q: It says "Lock wait timeout exceeded" - now what?
A: Some query is holding a lock and blocking pt-osc. Run SHOW PROCESSLIST and kill anything that's been running for more than a few seconds. Or increase innodb_lock_wait_timeout if you're feeling brave.

Q: How do I stop this thing if it's going sideways?
A: Send Ctrl+C or kill the process. It'll finish the current chunk and clean up. Your original table is safe - pt-osc doesn't touch it until the very end.

Q: Can I run this on a table without a primary key?
A: No. The DELETE trigger needs a primary key to identify rows. Add a primary key first, but that's another schema change that might need pt-osc. It's schema changes all the way down.

Q: Why did it just eat all my disk space?
A: Because you didn't check free space first. pt-osc creates a full copy of your table plus overhead. You need 3x your table size free, not the 2x the docs claim.

Q: Foreign keys are breaking everything - what do I do?
A: Foreign keys are a nightmare with pt-osc. Your options all suck:

  • --alter-foreign-keys-method=drop_swap (disables FK checks - scary but works)
  • --alter-foreign-keys-method=rebuild_constraints (rebuilds them - often fails)
  • Remove the foreign keys manually, run pt-osc, add them back

Q: Can I run multiple pt-osc operations at once?
A: Don't. You'll run out of disk space and create lock hell. I ran pt-osc on three tables on a Friday thinking I was clever. The first one hung at 99% and I spent my weekend watching SHOW PROCESSLIST like a database masochist. One at a time during low traffic.

Q: How long will this actually take?
A: Double whatever you estimated. For a 100GB table with moderate write traffic, plan for 12-24 hours. Heavy writes? Add another 50%. The tool is conservative about not breaking your database.

Q: The dry-run worked but the real run failed - WTF?
A: Happens all the time. Dry-run doesn't test what matters. Common failures dry-run misses:

  • Disk space (doesn't create the full table)
  • Lock timeouts (production has more shit going on)
  • Foreign key disasters (only show up with real data)

Q: How do I know if it actually worked?
A: Check row counts match:

SELECT COUNT(*) FROM old_table_name;
SELECT COUNT(*) FROM new_table_name;

Run a checksum if you're paranoid:

pt-table-checksum --databases=yourdb --tables=yourtable

Q: Does this work on AWS RDS?
A: Yeah, but replica discovery hangs because RDS networking is fucked. Amazon's stack doesn't play nice with pt-osc discovery. Use --recursion-method=none to skip replica checks. This August 2025 guide covers RDS gotchas.

Q: What about MySQL 8.4 compatibility?
A: Fully supported as of Percona Toolkit 3.7.0 (December 2024). Earlier versions had issues with MySQL 8.x authentication and metadata changes.
