pgLoader: Database Migration Tool - AI-Optimized Reference

Core Function

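pgLoader migrates data from MySQL, SQLite, and MSSQL databases (plus flat files such as CSV) to PostgreSQL with automatic schema conversion and parallel bulk loading. Oracle is not among its supported sources; ora2pg, listed in the comparison table later in this document, covers that case.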

Critical Success Factors

Performance Architecture

  • Uses PostgreSQL COPY protocol (not row-by-row INSERTs)
  • Parallel processing: Start with 2-4 workers and increase only if the network and hardware can sustain it
  • Bulk loading capability: Handles millions of rows efficiently
  • PostgreSQL memory setting: At least 1GB maintenance_work_mem for large databases (it speeds up the index builds that follow the load)

Error Handling Design

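  • Rejection files: Bad rows are written to per-table reject files (a .dat file with the rejected data and a .log file with the PostgreSQL error messages) instead of crashing the load
  • Not all-or-nothing: Data is loaded in batches, and rejected rows are skipped while the rest of the load continues, so a run that finishes can still be missing data
  • No resume capability: Failed migrations restart from the beginning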

Installation Reality

Recommended Approach

docker pull ghcr.io/dimitri/pgloader:latest

Why Docker wins: Package manager versions are often 2-3 releases behind, and compiling from source means working through a chain of Common Lisp dependencies.

Package Manager Issues

  • Ubuntu packages: Typically outdated
  • Homebrew: Better but has macOS compatibility issues
  • Source compilation: Common Lisp dependency hell

Configuration Requirements

Basic Migration

pgloader mysql://user:pass@source-host/dbname postgresql://user:pass@target-host/dbname

Success rate: ~60% for simple cases

Production Configuration

LOAD DATABASE
    FROM mysql://source-user:password@source-host/source-db
    INTO postgresql://target-user:password@target-host/target-db

WITH include drop, create tables, create indexes, reset sequences,
     workers = 8, concurrency = 2

SET PostgreSQL PARAMETERS
    maintenance_work_mem to '1GB',
    work_mem to '256MB'

SET MySQL PARAMETERS
    net_read_timeout  = '31536000',
    net_write_timeout = '31536000',
    mysql_charset = 'utf8mb4';
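A minimal sketch of generating a command file like the one above from a script, so that connection strings can come from the environment instead of being committed to disk. The helper name render_load_command is hypothetical, and the sketch carries over only the WITH options and the MySQL timeout settings from the example; everything else is left for you to add by hand.

```python
def render_load_command(source_uri: str, target_uri: str,
                        workers: int = 4, concurrency: int = 1) -> str:
    """Render a pgLoader command file as a string.

    Hypothetical helper: the clause text is copied from the production
    configuration shown above; only the URIs and worker counts vary.
    """
    if workers < 1 or concurrency < 1:
        raise ValueError("workers and concurrency must be positive")
    return (
        "LOAD DATABASE\n"
        f"    FROM {source_uri}\n"
        f"    INTO {target_uri}\n"
        "\n"
        "WITH include drop, create tables, create indexes, reset sequences,\n"
        f"     workers = {workers}, concurrency = {concurrency}\n"
        "\n"
        "SET MySQL PARAMETERS\n"
        "    net_read_timeout  = '31536000',\n"
        "    net_write_timeout = '31536000';\n"
    )
```

One way to use it: read the two URIs from environment variables of your choosing, write the returned text to a file such as migrate.load, and pass that file to pgloader as its only argument.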

Critical Failure Modes

Memory Exhaustion

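  • Symptom: pgLoader consumes all available RAM
  • Solution: Lower pgLoader's batch settings (batch rows, prefetch rows) and worker count to cap its own memory use; separately, give PostgreSQL enough maintenance_work_mem (1GB minimum for large DBs) for index builds
  • Monitoring: Watch the pgLoader process with OS tools; use pg_stat_activity for the PostgreSQL side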

Network Timeouts

  • Cause: Remote migrations frequently time out on long-running table reads
  • Solution: Set MySQL timeout parameters to 31536000 seconds (1 year)
  • Additional: Check TCP keepalive parameters

Character Encoding Corruption

  • High-risk scenario: MySQL latin1 columns that actually store UTF-8 bytes
  • Detection: Run hexdump -C yourfile.csv | head on an export of a suspect table; UTF-8 accented characters show up as multi-byte sequences such as c3 a9
  • Prevention: Always set mysql_charset = 'utf8mb4'
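The latin1-storing-UTF-8 scenario can also be checked on sampled column values rather than on a hex dump. The sketch below is a heuristic, not part of pgLoader: the function names are made up for illustration, and Python's latin-1 codec only approximates MySQL's latin1, which is closer to Windows-1252.

```python
def looks_like_utf8_in_latin1(text: str) -> bool:
    """Return True if text looks like UTF-8 bytes that were read as latin1.

    Heuristic: re-encode the string as latin1 to recover the raw bytes,
    then check whether those bytes are valid UTF-8. Pure ASCII is never
    flagged because it is identical in both encodings.
    """
    if text.isascii():
        return False
    try:
        text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either a character outside latin1 or bytes that are not valid
        # UTF-8: not the double-encoding pattern.
        return False
    return True


def repair_utf8_in_latin1(text: str) -> str:
    """Undo the mis-decoding for a value flagged by the check above."""
    return text.encode("latin-1").decode("utf-8")
```

Run the check over a sample of text columns before migrating; a high hit rate suggests the source charset declaration does not match the stored bytes.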

Data Validation Failures

  • MySQL invalid data: 0000-00-00 dates, empty strings in NOT NULL columns
  • PostgreSQL rejection: Rightfully rejects invalid data
  • Hidden danger: 'Successful' migrations with silently dropped data
  • Validation required: Always check the reject files (.dat and .log) after every run
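Checking for rejected data can be automated. The sketch below walks a directory tree and reports every non-empty reject data file with a line count. The function name and the default "*.dat" pattern are assumptions; point root_dir at wherever your pgLoader run writes its reject files and adjust the pattern to match the names you actually see.

```python
from pathlib import Path


def find_reject_files(root_dir: str, pattern: str = "*.dat") -> dict:
    """Map each non-empty reject data file under root_dir to its line count.

    The line count only approximates the number of rejected rows, since a
    row containing embedded newlines spans several lines.
    """
    results = {}
    for path in sorted(Path(root_dir).rglob(pattern)):
        if path.is_file() and path.stat().st_size > 0:
            with path.open("rb") as fh:
                results[str(path)] = sum(1 for _ in fh)
    return results
```

An empty result is what you want to see; anything else should block sign-off until the matching .log file has been read.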

Performance Expectations vs Reality

Advertised vs Actual Performance

  • Marketing claim: Fast bulk loading
  • Reality factors: Network latency, source DB performance, data quality
  • Time estimation rule: Plan for 2x estimated time minimum
  • Large database example: A 100GB database can take 6-12 hours where 2 hours was estimated
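The time estimation rule is simple arithmetic. In this sketch the throughput figure is assumed to come from a test run against a copy of the database; the function name and the example numbers are illustrative only.

```python
def padded_estimate_hours(size_gb: float, gb_per_hour: float,
                          safety_factor: float = 2.0) -> float:
    """Naive duration (size / throughput) multiplied by a safety factor.

    safety_factor defaults to 2.0, the minimum padding suggested above.
    """
    if size_gb < 0 or gb_per_hour <= 0 or safety_factor < 1:
        raise ValueError("size must be >= 0, throughput > 0, factor >= 1")
    return size_gb / gb_per_hour * safety_factor
```

For example, 100 GB at a measured 50 GB per hour gives a naive 2 hours and a planned window of 4 hours; raise the factor when the source data quality is unknown.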

Resource Requirements

  • RAM: 1GB+ maintenance_work_mem for large databases
  • Network: Stable connection critical for remote migrations
  • Storage: Additional space for reject files and temporary data
  • Expertise: Technical knowledge required, not point-and-click

Migration Scope Limitations

What pgLoader Handles

  • Data migration: Tables, indexes, foreign keys, sequences
  • Schema conversion: Automatic type mapping
  • Supported sources: MySQL, SQLite, MSSQL, CSV, compressed archives
  • Bulk operations: Efficient large-scale data transfer

What pgLoader Cannot Handle

  • Business logic: Stored procedures, triggers, functions require manual conversion
  • Custom data types: Database-specific extensions need manual handling
  • Incremental updates: No built-in change data capture
  • Resume capability: No checkpoint/restart functionality

Comparative Tool Analysis

Tool | Strength | Weakness | Best Use Case
pgLoader | Fast bulk loading, handles bad data gracefully | Config syntax, no stored procedures | MySQL/SQLite/MSSQL → PostgreSQL
pg_dump/pg_restore | Bulletproof reliability | PostgreSQL-only, no transformation | PostgreSQL → PostgreSQL exact replication
ora2pg | Oracle complexity handling | Perl-based, scattered documentation | Complex Oracle with stored procedures
AWS DMS | Enterprise support | Vendor lock-in, cost | Enterprise with deep pockets

Production Readiness Assessment

Risk Factors

  • Data corruption risk: Silent data loss from unvalidated reject files
  • Downtime risk: No resume capability means restart on failure
  • Expertise requirement: Technical knowledge essential for troubleshooting
  • Testing requirement: Extensive validation necessary before production use

Success Factors

  • Continuous testing: Run migrations repeatedly during development
  • Validation procedures: Row counts, checksums, application testing
  • Rollback planning: Full backup and restore procedures
  • Monitoring setup: PostgreSQL stats, reject file monitoring
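Row count validation boils down to comparing two per-table tallies. The sketch below assumes you have already collected the counts from each side (for instance with SELECT COUNT(*) per table) into dictionaries keyed by table name, with names normalized to the same case on both sides. The function name is hypothetical.

```python
def compare_row_counts(source: dict, target: dict) -> dict:
    """Return {table: (source_count, target_count)} for every mismatch.

    A table that is missing on one side is reported with None for that side.
    """
    mismatches = {}
    for table in sorted(set(source) | set(target)):
        src, dst = source.get(table), target.get(table)
        if src != dst:
            mismatches[table] = (src, dst)
    return mismatches
```

Matching counts do not prove the data is intact, which is why the list above pairs them with checksums and application testing.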

Real-World Failure Scenarios

Case Study: Silent Data Corruption

  • Scenario: 'Successful' migration with 18,000 rejected records
  • Root cause: MySQL latin1 charset storing UTF-8 data
  • Impact: 3 months chasing 'mysterious' application bugs
  • Resolution: Database rebuild from backups
  • Prevention: Mandatory reject file validation

Case Study: Performance Degradation

  • Scenario: 200GB migration, estimated 4 hours, actual 11 hours
  • Root cause: MySQL timeouts on large tables, 15 different character encodings
  • Result: 47,000 rejected rows requiring manual cleanup
  • Lesson: Plan for 2-3x estimated time, validate source data quality

Time and Resource Investment

Development Phase

  • Learning curve: 1-2 days for basic competency
  • Testing setup: 2-5 days for comprehensive validation
  • Configuration tuning: 1-3 days depending on complexity

Migration Execution

  • Small databases (<10GB): 2-6 hours including validation
  • Medium databases (10-100GB): 6-24 hours
  • Large databases (>100GB): Days to weeks, consider chunked approach
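One way to approach chunking, given that there is no resume capability, is to split the table list into groups and run one command file per group, so a failure only costs a rerun of that group. The sketch below builds the groups and a table filter for each. It relies on the INCLUDING ONLY TABLE NAMES MATCHING clause from pgLoader's MySQL command syntax; the helper names are hypothetical, and where the clause sits in your command file should be checked against the pgLoader documentation.

```python
def chunk_tables(tables, chunk_size: int) -> list:
    """Split an ordered collection of table names into fixed-size groups."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    ordered = list(tables)
    return [ordered[i:i + chunk_size]
            for i in range(0, len(ordered), chunk_size)]


def including_only_clause(tables) -> str:
    """Render a table filter clause for one chunk of exact table names."""
    names = list(tables)
    if not names:
        raise ValueError("a chunk needs at least one table")
    for name in names:
        if "'" in name:
            raise ValueError(f"unsupported quote in table name: {name!r}")
    quoted = ", ".join(f"'{name}'" for name in names)
    return f"INCLUDING ONLY TABLE NAMES MATCHING {quoted}"
```

Tables linked by foreign keys are probably best kept in the same chunk, since a constraint that points at a table not yet loaded may fail to create.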

Post-Migration

  • Validation time: 25-50% of migration time
  • Issue resolution: Highly variable based on data quality
  • Application testing: Plan for extensive QA cycle

Critical Success Checklist

Pre-Migration

  • Source data quality assessment
  • Character encoding verification
  • Network timeout configuration
  • PostgreSQL memory settings
  • Test migration on copy database
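For the source data quality assessment, it helps to count known-bad values before the migration rather than discover them in reject files afterwards. The sketch below only builds the MySQL query text for one such check, zero dates; the function name is hypothetical, and the query shape is one reasonable option rather than the only one.

```python
def zero_date_audit_sql(table: str, column: str) -> str:
    """Build a MySQL query counting 0000-00-00 values in one column.

    Casting to CHAR lets the same pattern cover DATE and DATETIME columns.
    """
    for ident in (table, column):
        if not ident or "`" in ident:
            raise ValueError(f"unsupported identifier: {ident!r}")
    return (
        f"SELECT COUNT(*) FROM `{table}` "
        f"WHERE CAST(`{column}` AS CHAR) LIKE '0000-00-00%'"
    )
```

Run the generated query against the source for each date column; any non-zero count needs a casting rule or a cleanup step before the real migration.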

During Migration

  • Monitor reject file generation
  • Track memory usage
  • Monitor network stability
  • Log migration progress

Post-Migration

  • Validate row counts
  • Check referential integrity
  • Review all reject files
  • Test application functionality
  • Performance baseline comparison

Support Resources

Primary Documentation

  • pgLoader ReadTheDocs: Official documentation with practical examples
  • MySQL Casting Rules: Essential for debugging rejected data
  • GitHub Issues: Real-world problem solutions

Community Knowledge

  • Percona Migration Blog: Production-tested procedures
  • Stack Overflow: Theoretical answers (limited practical value)
  • Vendor forums: Database-specific migration experiences

Decision Criteria

Choose pgLoader When

  • Migrating from MySQL/SQLite/MSSQL to PostgreSQL
  • Need automatic schema conversion
  • Have technical expertise available
  • Budget constraints favor open source
  • Can tolerate learning curve and manual troubleshooting

Choose Commercial Alternative When

  • Limited technical expertise
  • Critical migration with tight deadlines
  • Need vendor support and SLA
  • Budget allows for commercial tools
  • Risk tolerance is low

Choose Manual Migration When

  • Complex business logic requirements
  • Unique data transformation needs
  • Small dataset size
  • Full control over migration process required

Useful Links for Further Investigation

Resources That Actually Help (Unlike Most Documentation)

Link | Description
pgLoader ReadTheDocs | The only docs that matter. Start with the MySQL section because that's what you're probably migrating from. The type conversion tables actually saved my ass when debugging charset issues. Skip the theoretical bullshit and go straight to the examples.
GitHub Issues | Where you'll find solutions to the weird problems the docs don't cover. Search here first when stuff breaks. Real people posting real problems with real solutions. Much better than Stack Overflow's theoretical answers from people who never migrated anything bigger than a toy database.
Percona Migration Blog | One of the few tutorials written by someone who actually did a real migration, not a toy example. Covers the gotchas like charset encoding and timeout issues that matter in production. These guys know their shit.
MySQL Casting Rules | Bookmark this page. You'll need it when your migration 'succeeds' but produces garbage data. Shows exactly what pgLoader does with MySQL's weird data types and invalid values. Essential for debugging rejected rows.
