pgLoader: Database Migration Tool - AI-Optimized Reference
Core Function
pgLoader migrates data from MySQL, SQLite, and MSSQL databases to PostgreSQL with automatic schema conversion and parallel bulk loading.
Critical Success Factors
Performance Architecture
- Uses PostgreSQL COPY protocol (not row-by-row INSERTs)
- Parallel processing: Start with 2-4 workers, increase only if the network and hardware can sustain it
- Bulk loading capability: Handles millions of rows efficiently
- Memory requirement: Minimum 1GB maintenance_work_mem for large databases
Error Handling Design
- Rejection files: Bad data saved to .reject.dat and .reject.log files instead of crashing
- Transactional: Migration either completes fully or rolls back completely
- No resume capability: Failed migrations restart from beginning
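Because rejected rows land in files rather than stopping the run, a quick scan for non-empty reject files is a cheap first check. The following is a minimal sketch, not part of pgLoader: the function name summarize_rejects is hypothetical, the .reject.dat suffix follows the naming used in this document (the actual file names depend on your pgLoader version and output directory), and counting one rejected row per line is a simplification.

```python
from pathlib import Path


def summarize_rejects(root, suffix=".reject.dat"):
    """Return {file name: rejected row count} for non-empty reject files.

    Assumes one rejected row per line, which is a simplification.
    The suffix is an assumption taken from this document's naming.
    """
    counts = {}
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        # Read as bytes: rejected rows often contain invalid encodings.
        with path.open("rb") as handle:
            rows = sum(1 for line in handle if line.strip())
        if rows:
            counts[path.name] = rows
    return counts
```

An empty result means no reject file under that directory contained data; anything else should be reviewed before the migration is called done.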
Installation Reality
Recommended Approach
docker pull ghcr.io/dimitri/pgloader:latest
Why Docker wins: Package manager versions are typically 2-3 versions behind, and compiling from source means fighting Common Lisp dependencies.
Package Manager Issues
- Ubuntu packages: Typically outdated
- Homebrew: Better but has macOS compatibility issues
- Source compilation: Common Lisp dependency hell
Configuration Requirements
Basic Migration
pgloader mysql://user:pass@source-host/dbname postgresql://user:pass@target-host/dbname
Success rate: ~60% for simple cases
Production Configuration
LOAD DATABASE
    FROM mysql://source-user:password@source-host/source-db
    INTO postgresql://target-user:password@target-host/target-db
WITH include drop, create tables, create indexes, reset sequences,
    workers = 8, concurrency = 2
SET PostgreSQL PARAMETERS
    maintenance_work_mem to '1GB',
    work_mem to '256MB'
SET MySQL PARAMETERS
    net_read_timeout = '31536000',
    net_write_timeout = '31536000',
    mysql_charset = 'utf8mb4';
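A configuration like the one above is normally saved to a load file and passed to pgLoader as its argument. The sketch below builds that command line for either a local install or the Docker image recommended earlier. It is an illustration under assumptions: the function name pgloader_command and the /data mount point are hypothetical, and the container must be able to reach both database hosts over the network.

```python
import os


def pgloader_command(load_file, use_docker=False,
                     image="ghcr.io/dimitri/pgloader:latest"):
    """Build the argument list for running pgLoader against a load file."""
    if not use_docker:
        return ["pgloader", load_file]
    full_path = os.path.abspath(load_file)
    directory, name = os.path.split(full_path)
    return [
        "docker", "run", "--rm",
        # /data is an arbitrary, hypothetical mount point inside the container.
        "-v", f"{directory}:/data:ro",
        image,
        "pgloader", f"/data/{name}",
    ]
```

The returned list can be handed to subprocess.run(cmd, check=True), so a non-zero exit from pgLoader raises an error instead of passing unnoticed.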
Critical Failure Modes
Memory Exhaustion
- Symptom: pgLoader consumes all available RAM
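- Solution: Set maintenance_work_mem appropriately (1GB minimum for large DBs); that setting governs PostgreSQL index builds, so if the pgLoader process itself runs out of memory, also lower the batch rows and prefetch rows options in the WITH clause
- Monitoring: Use pg_stat_activity to watch the PostgreSQL sessions pgLoader opens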
Network Timeouts
- Cause: Remote migrations timeout frequently
- Solution: Set MySQL timeout parameters to 31536000 seconds (1 year)
- Additional: Check TCP keepalive parameters
Character Encoding Corruption
- High-risk scenario: MySQL latin1 charset storing 'UTF-8' data
- Detection: Use hexdump -C yourfile.csv | head to identify encoding issues
- Prevention: Always set mysql_charset = 'utf8mb4'
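The latin1-column-holding-UTF-8 case has a recognizable signature: text such as "cafÃ©" that turns back into "café" when its characters are re-encoded as single bytes and decoded as UTF-8. The following is a rough sketch of that check, assuming MySQL's latin1 behaves like Windows-1252 for the characters involved; the function name repair_double_encoding is hypothetical, and the check is a heuristic, not proof of corruption.

```python
def repair_double_encoding(text):
    """Return the repaired string if text looks like UTF-8 bytes that were
    read as latin1/cp1252, otherwise None.

    Heuristic only: short strings can match by coincidence.
    """
    try:
        raw = text.encode("cp1252")
    except UnicodeEncodeError:
        return None  # contains characters cp1252 cannot represent
    try:
        repaired = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # the bytes are not valid UTF-8, so not double-encoded
    return repaired if repaired != text else None
```

Running this over a sample of text columns before migrating gives a rough count of affected rows, which is easier than discovering them in the application afterwards.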
Data Validation Failures
- MySQL invalid data: 0000-00-00 dates, empty strings in NOT NULL columns
- PostgreSQL rejection: Rightfully rejects invalid data
- Hidden danger: 'Successful' migrations with silently dropped data
- Validation required: Always check .reject.dat files
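The two patterns named above can be looked for in a sample of source rows before the migration runs. This is a minimal sketch under assumptions: rows are plain dictionaries already fetched from MySQL with dates as strings, the function name find_suspect_values is hypothetical, and only the two patterns from this list are covered.

```python
import re

# Matches MySQL zero dates, with or without a zero time part.
ZERO_DATE = re.compile(r"^0000-00-00( 00:00:00)?$")


def find_suspect_values(rows, not_null_columns=()):
    """Yield (row index, column, reason) for values likely to be rejected."""
    required = set(not_null_columns)
    for index, row in enumerate(rows):
        for column, value in row.items():
            if not isinstance(value, str):
                continue
            if ZERO_DATE.match(value):
                yield (index, column, "zero date")
            elif value == "" and column in required:
                yield (index, column, "empty string in NOT NULL column")
```

Anything this reports is a candidate for cleanup at the source or for an explicit casting rule, rather than something to discover later in a reject file.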
Performance Expectations vs Reality
Advertised vs Actual Performance
- Marketing claim: Fast bulk loading
- Reality factors: Network latency, source DB performance, data quality
- Time estimation rule: Plan for at least 2x the estimated time
- Large database example: A 100GB database can take 6-12 hours, not the 2 hours estimated
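The estimation rule is simple arithmetic, sketched here so it can be dropped into a planning script. The function name minimum_window_hours is hypothetical, and the default factor of 2 is the floor from the rule above, not a prediction.

```python
def minimum_window_hours(estimated_hours, factor=2.0):
    """Return the smallest maintenance window to plan for, in hours."""
    if estimated_hours < 0 or factor < 1:
        raise ValueError("estimate must be >= 0 and factor must be >= 1")
    return estimated_hours * factor
```

For the 100GB example, a 2-hour estimate gives a 4-hour floor, which is still short of the 6-12 hours quoted, so treat the result as a lower bound and raise the factor when source data quality is unknown.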
Resource Requirements
- RAM: 1GB+ maintenance_work_mem for large databases
- Network: Stable connection critical for remote migrations
- Storage: Additional space for reject files and temporary data
- Expertise: Technical knowledge required, not point-and-click
Migration Scope Limitations
What pgLoader Handles
- Data migration: Tables, indexes, foreign keys, sequences
- Schema conversion: Automatic type mapping
- Supported sources: MySQL, SQLite, MSSQL, CSV, compressed archives (Oracle is not a pgLoader source; ora2pg covers that case)
- Bulk operations: Efficient large-scale data transfer
What pgLoader Cannot Handle
- Business logic: Stored procedures, triggers, functions require manual conversion
- Custom data types: Database-specific extensions need manual handling
- Incremental updates: No built-in change data capture
- Resume capability: No checkpoint/restart functionality
Comparative Tool Analysis
Tool | Strength | Weakness | Best Use Case |
---|---|---|---|
pgLoader | Fast bulk loading, handles bad data gracefully | Config syntax, no stored procedures | MySQL/SQLite/MSSQL → PostgreSQL |
pg_dump/pg_restore | Bulletproof reliability | PostgreSQL-only, no transformation | PostgreSQL → PostgreSQL exact replication |
ora2pg | Oracle complexity handling | Perl-based, scattered documentation | Complex Oracle with stored procedures |
AWS DMS | Enterprise support | Vendor lock-in, cost | Enterprise with deep pockets |
Production Readiness Assessment
Risk Factors
- Data corruption risk: Silent data loss from unvalidated reject files
- Downtime risk: No resume capability means restart on failure
- Expertise requirement: Technical knowledge essential for troubleshooting
- Testing requirement: Extensive validation necessary before production use
Success Factors
- Continuous testing: Run migrations repeatedly during development
- Validation procedures: Row counts, checksums, application testing
- Rollback planning: Full backup and restore procedures
- Monitoring setup: PostgreSQL stats, reject file monitoring
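Row-count validation reduces to comparing two per-table tallies. The sketch below assumes the counts were already collected on each side (for example with SELECT COUNT(*) per table) into dictionaries keyed by table name; the function name compare_row_counts is hypothetical, and matching counts do not prove the contents match, which is why checksums and application testing are listed as well.

```python
def compare_row_counts(source_counts, target_counts):
    """Return {table: (source count, target count)} for every table whose
    counts differ. A table missing on one side is reported with None.
    """
    mismatches = {}
    for table in sorted(set(source_counts) | set(target_counts)):
        source = source_counts.get(table)
        target = target_counts.get(table)
        if source != target:
            mismatches[table] = (source, target)
    return mismatches
```

An empty dictionary means every table has the same count on both sides. A target count lower than the source count should match the number of rows in that table's reject file.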
Real-World Failure Scenarios
Case Study: Silent Data Corruption
- Scenario: 'Successful' migration with 18,000 rejected records
- Root cause: MySQL latin1 charset storing UTF-8 data
- Impact: 3 months chasing 'mysterious' application bugs
- Resolution: Database rebuild from backups
- Prevention: Mandatory reject file validation
Case Study: Performance Degradation
- Scenario: 200GB migration, estimated 4 hours, actual 11 hours
- Root cause: MySQL timeouts on large tables, 15 different character encodings
- Result: 47,000 rejected rows requiring manual cleanup
- Lesson: Plan for 2-3x estimated time, validate source data quality
Time and Resource Investment
Development Phase
- Learning curve: 1-2 days for basic competency
- Testing setup: 2-5 days for comprehensive validation
- Configuration tuning: 1-3 days depending on complexity
Migration Execution
- Small databases (<10GB): 2-6 hours including validation
- Medium databases (10-100GB): 6-24 hours
- Large databases (>100GB): Days to weeks, consider chunked approach
Post-Migration
- Validation time: 25-50% of migration time
- Issue resolution: Highly variable based on data quality
- Application testing: Plan for extensive QA cycle
Critical Success Checklist
Pre-Migration
- Source data quality assessment
- Character encoding verification
- Network timeout configuration
- PostgreSQL memory settings
- Test migration on copy database
During Migration
- Monitor reject file generation
- Track memory usage
- Monitor network stability
- Log migration progress
Post-Migration
- Validate row counts
- Check referential integrity
- Review all reject files
- Test application functionality
- Performance baseline comparison
Support Resources
Primary Documentation
- pgLoader ReadTheDocs: Official documentation with practical examples
- MySQL Casting Rules: Essential for debugging rejected data
- GitHub Issues: Real-world problem solutions
Community Knowledge
- Percona Migration Blog: Production-tested procedures
- Stack Overflow: Theoretical answers (limited practical value)
- Vendor forums: Database-specific migration experiences
Decision Criteria
Choose pgLoader When
- Migrating from MySQL, SQLite, or MSSQL to PostgreSQL
- Need automatic schema conversion
- Have technical expertise available
- Budget constraints favor open source
- Can tolerate learning curve and manual troubleshooting
Choose Commercial Alternative When
- Limited technical expertise
- Critical migration with tight deadlines
- Need vendor support and SLA
- Budget allows for commercial tools
- Risk tolerance is low
Choose Manual Migration When
- Complex business logic requirements
- Unique data transformation needs
- Small dataset size
- Full control over migration process required
Useful Links for Further Investigation
Resources That Actually Help (Unlike Most Documentation)
Link | Description |
---|---|
pgLoader ReadTheDocs | The only docs that matter. Start with the MySQL section because that's what you're probably migrating from. The type conversion tables actually saved my ass when debugging charset issues. Skip the theoretical bullshit and go straight to the examples. |
GitHub Issues | Where you'll find solutions to the weird problems the docs don't cover. Search here first when stuff breaks. Real people posting real problems with real solutions. Much better than Stack Overflow's theoretical answers from people who never migrated anything bigger than a toy database. |
Percona Migration Blog | One of the few tutorials written by someone who actually did a real migration, not a toy example. Covers the gotchas like charset encoding and timeout issues that matter in production. These guys know their shit. |
MySQL Casting Rules | Bookmark this page. You'll need it when your migration 'succeeds' but produces garbage data. Shows exactly what pgLoader does with MySQL's weird data types and invalid values. Essential for debugging rejected rows. |