Discovery Phase: Finding Out How Fucked You Really Are
Look, I get it.
Management wants this "simple database migration" done by Friday. You're thinking "how hard can it be?" Let me save you some pain: this discovery shit will save you months of debugging and prevent the kind of data loss that ends careers.
Before you even think about touching production, you need to understand what clusterfuck you're actually dealing with.
Database Inventory: The Shit Nobody Told You About
Start with the obvious, then prepare to be horrified. That "simple" application your PM described? It probably has more databases than a fucking casino.
Map Your MySQL Ecosystem (And Cry)
Run this query on each MySQL server to see how many databases are hiding in your infrastructure:
SELECT
    s.SCHEMA_NAME,
    SUM(t.DATA_LENGTH + t.INDEX_LENGTH) / 1024 / 1024 / 1024 AS SIZE_GB,
    COUNT(*) AS TABLE_COUNT
FROM information_schema.SCHEMATA s
JOIN information_schema.TABLES t ON s.SCHEMA_NAME = t.TABLE_SCHEMA
WHERE s.SCHEMA_NAME NOT IN ('information_schema', 'mysql', 'performance_schema', 'sys')
GROUP BY s.SCHEMA_NAME
ORDER BY SIZE_GB DESC;
What you'll discover (and why you'll need therapy):
- Databases named test_db_final_REALLY_FINAL_v2 that are somehow still in production
- Log tables eating 80% of your storage that haven't been rotated since 2018
- One database shared by three different applications because "it was easier at the time"
- Development databases mixed with production because the previous DBA was a masochist
Skyvia found that 60% of migrations uncover "forgotten" databases still being used by legacy systems.
Translation: that API from 2019 that "nobody uses anymore" is actually hit 10,000 times a day.
MySQL Workbench will crash if you try to map more than 50 tables, so good luck there. pt-query-digest takes forever and the output looks like it was designed by someone who hates readable reports.
Find MySQL Features That'll Make PostgreSQL Laugh At You
PostgreSQL doesn't give a shit about MySQL's weird quirks and will reject half your data.
Here's how to find the stuff that'll break:
-- Find MySQL-specific data types (skip the system schemas)
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM information_schema.COLUMNS
WHERE DATA_TYPE IN ('ENUM', 'SET', 'YEAR', 'TINYINT', 'MEDIUMINT')
  AND TABLE_SCHEMA NOT IN ('information_schema', 'mysql', 'performance_schema', 'sys');
-- Find MySQL-specific functions in stored procedures
SELECT ROUTINE_SCHEMA, ROUTINE_NAME, ROUTINE_DEFINITION
FROM information_schema.ROUTINES
WHERE ROUTINE_DEFINITION LIKE '%DATE_FORMAT%'
   OR ROUTINE_DEFINITION LIKE '%STR_TO_DATE%'
   OR ROUTINE_DEFINITION LIKE '%GROUP_CONCAT%';
-- Find tables with MySQL-specific options
SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE, AUTO_INCREMENT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA NOT IN ('information_schema', 'mysql', 'performance_schema', 'sys')
  AND (ENGINE = 'MyISAM' OR AUTO_INCREMENT IS NOT NULL);
The shit that'll break first:
- ENUM columns: PostgreSQL has ENUMs but they're rigid as hell. You can bolt on new values with ALTER TYPE ... ADD VALUE, but you can't drop or reorder them without rebuilding the type (PostgreSQL ENUM docs)
- AUTO_INCREMENT becomes sequences that you'll fuck up at least twice (sequence docs; see the sketch below)
- MySQL's zero dates (0000-00-00): PostgreSQL will laugh and reject your entire table (datetime docs)
- Case-insensitive comparisons: MySQL is casual about case, PostgreSQL is a pedantic asshole (collation docs)
The [PostgreSQL wiki](https://wiki.postgresql.org/wiki/Converting_from_other_Databases_to_PostgreSQL#MySQL) has conversion info, but it's incomplete and assumes you know what you're doing. Estuary's guide is better but still glosses over the painful edge cases.
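To make the AUTO_INCREMENT point concrete, here's a minimal sketch of the PostgreSQL side, assuming a hypothetical orders table. The classic screwup: you bulk-load the old rows with their original ids but forget to advance the sequence, and the very next INSERT dies on a duplicate key.

-- AUTO_INCREMENT becomes an identity column (GENERATED BY DEFAULT lets the
-- bulk load supply explicit ids; GENERATED ALWAYS would reject them)
CREATE TABLE orders (
    id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    customer_id BIGINT NOT NULL
);

-- After loading the MySQL data, push the sequence past the highest id
SELECT setval(pg_get_serial_sequence('orders', 'id'),
              (SELECT MAX(id) FROM orders));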
Schema Analysis: Where Dreams Go To Die
Data Type Compatibility Hell
Every developer thinks data types are simple until they start a migration. MySQL and PostgreSQL disagree on basically everything. Here's the shit you'll spend weeks debugging:
| MySQL Type | PostgreSQL Type | Migration Notes |
|---|---|---|
| TINYINT(1) | BOOLEAN | MySQL uses 0/1, PostgreSQL uses true/false |
| DATETIME | TIMESTAMP | Handle timezone conversions |
| ENUM('a','b','c') | VARCHAR CHECK (...) | Create constraints for each enum |
| MEDIUMTEXT | TEXT | Direct mapping, but verify character limits |
| DOUBLE | DOUBLE PRECISION | Precision handling may differ |
Find The Garbage Data That'll Kill Your Migration
PostgreSQL has standards, unlike MySQL which accepts any bullshit you feed it.
Fix this crap before migration or watch pgloader crash spectacularly:
-- Find invalid dates that will crash PostgreSQL
SELECT COUNT(*) FROM your_table WHERE date_column = '0000-00-00 00:00:00';
-- Find non-ASCII data (potential character set / encoding mismatches)
SELECT COUNT(*) FROM your_table WHERE column_name REGEXP '[^\\x00-\\x7F]';
-- Check for constraint violations PostgreSQL won't accept
SELECT * FROM child_table c
LEFT JOIN parent_table p ON c.parent_id = p.id
WHERE p.id IS NULL;
The Percona pgloader guide shows that 40% of migrations fail initially because of data quality issues that MySQL happily tolerated but PostgreSQL enforces strictly.
Use pgloader's data transformation capabilities and PostgreSQL's constraint validation tools to handle these issues systematically.
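For the zero-date problem specifically, one option is to scrub the data in MySQL before pgloader ever touches it. A minimal sketch, assuming a hypothetical orders.created_at column (pgloader's zero-dates-to-null cast rule can do the equivalent at load time if you'd rather not modify the source):

-- The column must be nullable before you can NULL out the garbage
ALTER TABLE orders MODIFY created_at DATETIME NULL;

-- Replace zero dates with NULL; PostgreSQL accepts NULL, never 0000-00-00
UPDATE orders
SET created_at = NULL
WHERE created_at = '0000-00-00 00:00:00';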
Performance Baseline: Know What You're Leaving Behind
Step 5: Document Current MySQL Performance
Before migration, establish performance baselines using MySQL Performance Schema and pt-query-digest so you can validate that PostgreSQL performs adequately with pg_stat_statements and pgbench:
Query Performance:
-- Enable MySQL slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
-- Run your application's typical workload for 24 hours
-- Then analyze the slow query log
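Once the slow log has a day of traffic in it, pt-query-digest turns it into a ranked report. The log path below is an assumption; ask MySQL where it actually writes:

# Find where the slow log lives on this server
mysql -e "SHOW VARIABLES LIKE 'slow_query_log_file';"

# Summarize the worst offenders; archive this as your pre-migration baseline
pt-query-digest /var/lib/mysql/hostname-slow.log > mysql_baseline_digest.txt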
Resource Usage:
-- Check current connection counts
SHOW STATUS LIKE 'Threads_connected';
-- Check the MyISAM key buffer size (it's a variable, not a status counter)
SHOW VARIABLES LIKE 'key_buffer_size';
-- Check buffer pool utilization
SHOW ENGINE INNODB STATUS\G
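On the PostgreSQL side, the equivalent baseline comes from pg_stat_statements. A hedged sketch: it assumes the extension is loaded via shared_preload_libraries, and the mean_exec_time column name is PostgreSQL 13+ (older versions call it mean_time):

-- One-time setup (requires shared_preload_libraries = 'pg_stat_statements')
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 queries by average execution time, for comparing against the
-- MySQL slow-log digest after cutover
SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;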
Application Dependencies:
- Document all applications connecting to each MySQL database (the PROCESSLIST query below is a decent starting point)
- Note connection pooling configurations
- Identify batch jobs and their schedules
- Map out read replica usage patterns
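For that first item, a quick sketch of who's connected right now. Run it repeatedly over a full day, because batch jobs and cron-driven reports only show up while they're running:

-- Group live connections by user, client host, and database
SELECT USER, SUBSTRING_INDEX(HOST, ':', 1) AS client_host, DB,
       COUNT(*) AS connections
FROM information_schema.PROCESSLIST
GROUP BY USER, client_host, DB
ORDER BY connections DESC;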
Step 6: Test Your Backup and Recovery Process
CRITICAL: Test your MySQL backup restoration BEFORE starting migration.
The GitLab incident happened partly because their backups hadn't been tested.
# Create a full backup (add --events so scheduled events aren't silently dropped)
mysqldump --all-databases --routines --triggers --events --single-transaction > full_backup.sql

# Test restore on a separate MySQL instance
mysql -u root -p < full_backup.sql

# Verify data integrity
mysql -u root -p -e "SELECT COUNT(*) FROM your_database.important_table;"
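Counting one table is a smoke test, not verification. A hedged bash sketch that prints a row count for every user table; run it against the source and the restore-test instance, then diff the output:

#!/bin/bash
# Emit "schema.table: rowcount" for every non-system table
for t in $(mysql -N -e "SELECT CONCAT(TABLE_SCHEMA, '.', TABLE_NAME)
                        FROM information_schema.TABLES
                        WHERE TABLE_TYPE = 'BASE TABLE'
                          AND TABLE_SCHEMA NOT IN ('mysql', 'sys',
                              'information_schema', 'performance_schema')"); do
    echo "$t: $(mysql -N -e "SELECT COUNT(*) FROM $t")"
done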
Migration Strategy Decision Matrix
Based on your discovery findings, choose your migration approach:
For databases < 100GB with simple schemas:
- Tool: pgloader (one-liner sketch after this list)
- Downtime: 2-8 hours
- Complexity: Medium
- Best for: One-time migrations, development environments

For databases > 100GB or complex applications:
- Tool: AWS DMS (see the AWS Database Migration Service Overview) or custom replication
- Downtime: 30 minutes to 2 hours
- Complexity: High
- Best for: Production systems requiring minimal downtime

For applications requiring zero downtime:
- Tool: Dual-write with gradual cutover
- Downtime: Theoretically zero (realistically 10-30 minutes)
- Complexity: Very high
- Best for: Mission-critical systems with 24/7 availability requirements
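For the pgloader route, the simplest invocation is one command with two connection strings (hosts and credentials here are hypothetical; real migrations usually graduate to a load-command file so you can add CAST rules like the zero-date handling above):

# Read everything from MySQL, create matching tables in PostgreSQL, load data
pgloader mysql://migrator:secret@mysql-host/legacy_db \
         postgresql://migrator:secret@pg-host/legacy_db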
The assessment phase typically reveals that your "simple" migration is actually complex, but it's better to know this upfront than discover it during the production cutover window.