AWS Database Migration Service: Production Reality & Operational Intelligence
Executive Summary
AWS DMS is a managed migration and replication service that works reliably for simple homogeneous migrations but becomes far more complex and expensive for heterogeneous ones. Real production costs typically run 2-3x initial estimates because of overlooked data transfer fees and extended troubleshooting time.
Configuration That Actually Works in Production
Instance Sizing Requirements (Based on Real Experience)
- < 100GB databases: dms.r6i.large ($0.30/hour minimum)
- 100GB-1TB databases: dms.r6i.xlarge ($0.60/hour)
- > 1TB databases: dms.r6i.2xlarge ($1.20/hour)
- Multi-AZ deployment: Roughly doubles the instance cost but removes the replication instance as a single point of failure during critical migrations
Critical Task Settings
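MaxFullLoadSubTasks: 16 (number of tables loaded in parallel; raise it when there are many smaller tables)
TransactionConsistencyTimeout: Increase for long-running transactions
ParallelLoadThreads: Enable to load a single large table with multiple threads (supported targets only)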
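A minimal sketch of how these three settings could be assembled into a task-settings JSON document. It assumes the usual DMS layout, where MaxFullLoadSubTasks and TransactionConsistencyTimeout sit under FullLoadSettings and ParallelLoadThreads sits under TargetMetadata. The function name and the timeout and thread values are illustrative assumptions, not recommendations from the text above.

```python
import json


def build_task_settings(max_subtasks=16, consistency_timeout_s=1800,
                        parallel_load_threads=4):
    """Return a DMS task-settings fragment as a JSON string.

    consistency_timeout_s and parallel_load_threads are placeholder
    values (assumptions); tune them for the actual workload.
    """
    # DMS documents 1-49 as the accepted range for MaxFullLoadSubTasks.
    if not 1 <= max_subtasks <= 49:
        raise ValueError("MaxFullLoadSubTasks must be between 1 and 49")

    settings = {
        "FullLoadSettings": {
            # Number of tables loaded in parallel during full load.
            "MaxFullLoadSubTasks": max_subtasks,
            # Seconds to wait for open transactions before full load starts.
            "TransactionConsistencyTimeout": consistency_timeout_s,
        },
        "TargetMetadata": {
            # Threads used per table; only honored by some target engines.
            "ParallelLoadThreads": parallel_load_threads,
        },
    }
    return json.dumps(settings, indent=2)


if __name__ == "__main__":
    print(build_task_settings())
```

The resulting string is the kind of document passed as the task settings when creating or modifying a replication task. Only the keys being overridden need to appear.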
Network Architecture Requirements
- VPC connectivity setup: Budget 2-3 weeks for corporate firewall/VPN integration
- Bandwidth planning: Network latency matters more than bandwidth (50ms latency bottlenecks 1Gbps connections)
- Port requirements: Coordinate with security teams early - principle of least privilege will force redesigns
Real Cost Structure
Base Costs
- Instance costs: $0.018/hour (dms.t3.micro - inadequate for production) to $3.50/hour for the larger dms.r6i sizes
- Data transfer fees (the hidden cost multiplier):
  - Cross-region: $0.09/GB
  - Cross-AZ: $0.01/GB
  - Example: 1TB cross-region migration = $90 in transfer fees alone
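The arithmetic behind these base costs can be sketched as a small estimator. The per-GB rates and the Multi-AZ doubling come from this document. The function name, its parameters, and the 24-hour runtime in the example are assumptions made for illustration, and the result excludes troubleshooting time and storage.

```python
# Per-GB transfer rates quoted in this document.
CROSS_REGION_PER_GB = 0.09
CROSS_AZ_PER_GB = 0.01


def estimate_cost(size_gb, hours, hourly_rate,
                  cross_region=False, cross_az=False, multi_az=False):
    """Rough DMS cost estimate: instance hours plus data transfer.

    Hypothetical helper; it ignores storage, retries, and staff time.
    """
    # Multi-AZ is treated as doubling the instance cost, per the sizing notes.
    instance = hours * hourly_rate * (2 if multi_az else 1)

    per_gb = 0.0
    if cross_region:
        per_gb += CROSS_REGION_PER_GB
    if cross_az:
        per_gb += CROSS_AZ_PER_GB
    transfer = size_gb * per_gb

    return {
        "instance": round(instance, 2),
        "transfer": round(transfer, 2),
        "total": round(instance + transfer, 2),
    }


if __name__ == "__main__":
    # 1TB (taken as 1000GB) cross-region on a $0.60/hour instance for 24 hours.
    print(estimate_cost(1000, 24, 0.60, cross_region=True))
```

In this example the transfer line dominates the instance line, which is why a quote based only on instance hours tends to undershoot.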
Actual Cost Examples
- Quoted cost: $300 for 200GB migration
- Final AWS bill: $1,247
- Cost multiplier: 4.2x due to data transfer fees and troubleshooting time
Migration Pattern Success Rates
Migration Type | Success Rate | Time Multiplier | Major Pain Points |
---|---|---|---|
MySQL → MySQL | High | 1x | Aurora-specific settings conflicts |
Oracle → Oracle | High | 1.2x | Network connectivity issues |
Oracle → PostgreSQL | Medium | 3-6x | PL/SQL conversion, custom data types |
SQL Server → PostgreSQL | Low | 6-12x | T-SQL to PL/pgSQL rewrite required |
MongoDB complex schemas | Low | 4-8x | Document structure edge cases |
Critical Failure Modes
High-Severity Failures (Mission-Critical Impact)
- CDC replication stalls during high transaction periods: 5-10 minute lag spikes during peak loads
- Memory exhaustion on replication instances: Complete task failure requiring restart and 8+ hour recovery
- Network connectivity timeouts: Corporate VPN/firewall issues cause complete migration failure
Medium-Severity Issues (Extended Timeline Impact)
- Schema conversion failures: 30-50% manual conversion required for Oracle with custom packages
- Data validation edge cases: Subtle data corruption not caught by built-in validation
- AWS instance health issues: Random instance failures require Multi-AZ deployment
Resource Requirements & Time Investment
Skill Requirements
- Network engineering expertise: Essential for corporate connectivity issues
- Database-specific knowledge: Oracle PL/SQL, T-SQL conversion expertise required
- AWS operations experience: CloudWatch monitoring, troubleshooting essential
Time Investment Reality
- Planning phase: 2-3 weeks for network architecture alone
- Small migrations (< 100GB): 2-8 hours if nothing breaks, 2-3 days when issues occur
- Medium migrations (100GB-1TB): 6-24 hours planned, 1-2 weeks actual with troubleshooting
- Large migrations (> 1TB): Days to weeks, often 3+ months including application testing
Human Resource Costs
- Weekend troubleshooting: Expect 2-3 weekends of 2am debugging sessions
- Consultant requirements: Enterprise Oracle migrations typically require specialized consultants
- Opportunity cost: Development team blocked during migration windows
Performance Thresholds & Breaking Points
Memory Limits
- dms.r6i.large fails at: 500GB+ table full loads
- Symptom: Out of memory errors during bulk operations
- Solution: Scale to dms.r6i.xlarge minimum for large tables
CDC Replication Limits
- Normal lag: 5-10 seconds
- High load lag: 2-3 minutes during batch operations
- Breaking point: 5-10 minute delays during peak transaction periods (e.g., Black Friday sales)
Network Performance
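- Latency impact: 50ms of round-trip latency can bottleneck even high-bandwidth connections, because each connection can only keep a limited amount of data in flight
- Transfer speed: Often limited by latency rather than bandwidth capacity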
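One common explanation for this effect is the bandwidth-delay product: a single TCP stream cannot move more than one window of data per round trip. The sketch below works through that arithmetic. The 64 KiB window is an illustrative assumption (modern stacks usually scale windows higher), and the function names are hypothetical.

```python
def max_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound for one TCP stream: window size divided by round-trip time."""
    bits_per_round_trip = window_bytes * 8
    round_trips_per_second = 1000 / rtt_ms
    return bits_per_round_trip * round_trips_per_second / 1_000_000


def window_needed_bytes(link_mbps, rtt_ms):
    """Window size required to keep a link of link_mbps full at rtt_ms."""
    bytes_per_second = link_mbps * 1_000_000 / 8
    return bytes_per_second * (rtt_ms / 1000)


if __name__ == "__main__":
    # Assumed 64 KiB window at 50ms round-trip latency.
    print(f"{max_throughput_mbps(64 * 1024, 50):.2f} Mbps per stream")
    # Data that must be in flight to fill a 1Gbps link at 50ms.
    print(f"{window_needed_bytes(1000, 50) / 1_000_000:.2f} MB window needed")
```

Under these assumptions a single stream tops out near 10 Mbps on a 1Gbps link, which is why adding bandwidth alone does little and parallel load settings or lower latency help more.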
Schema Conversion Reality
"90% Automation" Marketing vs Reality
- Vanilla databases: 70-80% actual automation
- Oracle with custom packages: 30-50% automation
- SQL Server with CLR assemblies: Near-zero automation
- MongoDB complex documents: Requires extensive manual testing
Specific Conversion Failures
- Oracle PL/SQL packages: Manual rewrite required
- Custom data types: Not supported by automated conversion
- Triggers with complex logic: Manual recreation necessary
- Stored procedures: Database-specific functionality requires rewrites
Monitoring & Operational Intelligence
Critical CloudWatch Metrics
CDCLatencySource: Monitor for > 60 seconds
CDCLatencyTarget: Monitor for > 60 seconds
FreeableMemory: Alert when < 1GB remaining
FullLoadThroughputRowsTarget / FullLoadThroughputBandwidthTarget: Track for completion estimates
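As a sketch, the thresholds above could be turned into CloudWatch alarm definitions like this. The function only builds keyword-argument dictionaries in the shape accepted by the CloudWatch `put_metric_alarm` API and makes no AWS calls. The alarm names, the 60-second period, and the five evaluation periods are assumptions, and the dimension values should be checked against what CloudWatch actually shows for the task.

```python
def dms_alarm_definitions(instance_id, task_id,
                          latency_threshold_s=60, min_free_bytes=1024 ** 3):
    """Build alarm kwargs for CDC latency and freeable memory.

    Hypothetical helper: names, Period, and EvaluationPeriods are assumed.
    """
    task_dims = [
        {"Name": "ReplicationInstanceIdentifier", "Value": instance_id},
        {"Name": "ReplicationTaskIdentifier", "Value": task_id},
    ]
    alarms = []

    # Task-level latency metrics: alert when lag exceeds the threshold.
    for metric in ("CDCLatencySource", "CDCLatencyTarget"):
        alarms.append({
            "AlarmName": f"dms-{task_id}-{metric}",
            "Namespace": "AWS/DMS",
            "MetricName": metric,
            "Dimensions": task_dims,
            "Statistic": "Average",
            "Period": 60,
            "EvaluationPeriods": 5,
            "Threshold": latency_threshold_s,
            "ComparisonOperator": "GreaterThanThreshold",
        })

    # Instance-level memory metric (bytes): alert below 1GB by default.
    alarms.append({
        "AlarmName": f"dms-{instance_id}-FreeableMemory",
        "Namespace": "AWS/DMS",
        "MetricName": "FreeableMemory",
        "Dimensions": [
            {"Name": "ReplicationInstanceIdentifier", "Value": instance_id},
        ],
        "Statistic": "Average",
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": min_free_bytes,
        "ComparisonOperator": "LessThanThreshold",
    })
    return alarms


if __name__ == "__main__":
    for alarm in dms_alarm_definitions("my-instance", "my-task"):
        print(alarm["AlarmName"], alarm["ComparisonOperator"], alarm["Threshold"])
```

With boto3 installed and credentials configured, each dictionary could be passed as `cloudwatch.put_metric_alarm(**alarm)`, typically with an `AlarmActions` entry added for notifications.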
Failure Indicators
- Memory trending to zero: Imminent task failure
- CDC lag increasing steadily: Replication falling behind
- Connection timeout patterns: Network infrastructure issues
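A steadily increasing CDC lag can be detected from a short window of latency samples with a least-squares slope. This is a minimal sketch, not a DMS feature. The function names, the 60-second limit (borrowed from the monitoring thresholds in this document), and the minimum slope of one second of lag per sample are assumptions.

```python
def lag_slope(samples):
    """Least-squares slope of lag samples, in seconds of lag per sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def replication_falling_behind(samples, lag_limit_s=60, min_slope=1.0):
    """True when the latest lag is over the limit and the trend is upward."""
    return samples[-1] > lag_limit_s and lag_slope(samples) >= min_slope


if __name__ == "__main__":
    steady = [5, 6, 5, 7, 6]            # normal jitter around 5-10 seconds
    growing = [30, 60, 90, 120, 150]    # lag climbing every sample
    print(replication_falling_behind(steady))
    print(replication_falling_behind(growing))
```

Requiring both conditions avoids alerting on a single spike that recovers on its own, while still catching a task that keeps losing ground.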
Decision Criteria & Alternatives Comparison
When DMS Makes Sense
- Homogeneous migrations: Same database engine (MySQL → MySQL)
- Simple schemas: Basic CRUD operations, minimal stored procedures
- Small to medium databases: < 1TB with straightforward connectivity
- Time availability: Can accommodate 2-3x timeline extensions
When to Avoid DMS
- Complex heterogeneous migrations: Oracle → PostgreSQL with custom packages
- Tight migration windows: Cannot accommodate troubleshooting delays
- Complex network environments: Multiple VPN hops, restrictive firewalls
- Mission-critical systems: Cannot risk extended downtime from failures
Alternative Solutions
- Oracle GoldenGate: Oracle-to-Oracle migrations (expensive licensing)
- Native dump/restore: Acceptable downtime, better reliability
- Azure Database Migration Service: Better for SQL Server migrations when the target is Azure
- Manual ETL development: Complex transformations, full control
Breaking Point Warnings
Network Connectivity
- Corporate VPN complexity: Plan minimum 2-3 weeks for firewall coordination
- AWS support response: "Try turning off your firewall" - not actionable for enterprise environments
- Security group configuration: Too many open ports will require security team redesign
Data Volume Scaling
- UI becomes unusable: > 1000 spans makes debugging impossible
- Memory exhaustion: 500GB+ tables crash dms.r6i.large instances
- CDC lag accumulation: High transaction volumes cause exponential lag buildup
Cost Escalation Triggers
- Cross-region data transfer: $0.09/GB can multiply costs by 4x
- Extended troubleshooting time: Weekend debugging sessions add significant consultant costs
- Instance uptime during delays: Hourly costs accumulate during extended troubleshooting
Recovery & Contingency Procedures
Nuclear Option (When Everything Fails)
- Export task configuration (critical step before destruction)
- Delete entire replication instance
- Recreate from scratch (5 minutes if AWS cooperates, 2 hours during AWS service issues)
- This fixes 90% of weird state issues that restart cannot resolve
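The export step can be sketched as below. The function takes a DMS client object, such as one created with `boto3.client("dms")`, and relies on its `describe_replication_tasks` call. The function name, the output path, and the choice of fields to save are assumptions. Endpoint definitions are not included and would need their own backup.

```python
import json


def export_task_config(dms_client, task_id, path):
    """Save a replication task's settings and table mappings to a JSON file.

    dms_client is expected to behave like boto3.client("dms").
    Hypothetical helper; the saved field selection is an assumption.
    """
    response = dms_client.describe_replication_tasks(
        Filters=[{"Name": "replication-task-id", "Values": [task_id]}]
    )
    tasks = response["ReplicationTasks"]
    if not tasks:
        raise LookupError(f"no replication task named {task_id!r}")
    task = tasks[0]

    backup = {
        "ReplicationTaskIdentifier": task["ReplicationTaskIdentifier"],
        "MigrationType": task["MigrationType"],
        "SourceEndpointArn": task["SourceEndpointArn"],
        "TargetEndpointArn": task["TargetEndpointArn"],
        "ReplicationInstanceArn": task["ReplicationInstanceArn"],
        # Both fields are returned by the API as JSON-encoded strings.
        "TableMappings": json.loads(task["TableMappings"]),
        "ReplicationTaskSettings": json.loads(task["ReplicationTaskSettings"]),
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(backup, handle, indent=2)
    return backup
```

Running this before deleting anything gives a file from which the task can be recreated. The instance ARN in the backup will be stale once the replication instance is rebuilt, so it serves only as a record.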
Backup Strategies
- Schema backup: Export converted schemas before starting migration
- Data validation scripts: Prepare row count and checksum comparisons
- Rollback procedures: Document exact steps for production rollback
- Communication plan: Stakeholder notification for extended downtime scenarios
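A row count and checksum comparison can be sketched with any pair of Python DB-API connections. The example uses sqlite3 as a stand-in for the real source and target drivers, which is an assumption, as are the function names. Table and column names are interpolated directly into the SQL, so they must come from a trusted list. In heterogeneous migrations the values usually need normalizing (for example decimals and timestamps) before hashing, otherwise identical data can produce different checksums.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Return (row_count, sha256 hex digest) for a table, ordered by key."""
    cursor = conn.cursor()
    # Identifiers cannot be bound as parameters; use trusted names only.
    cursor.execute(f"SELECT * FROM {table} ORDER BY {key_column}")
    digest = hashlib.sha256()
    count = 0
    for row in cursor:
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def compare_tables(source_conn, target_conn, table, key_column):
    """Compare one table across two connections by count and checksum."""
    src_count, src_hash = table_fingerprint(source_conn, table, key_column)
    tgt_count, tgt_hash = table_fingerprint(target_conn, table, key_column)
    return {
        "table": table,
        "source_rows": src_count,
        "target_rows": tgt_count,
        "rows_match": src_count == tgt_count,
        "checksum_match": src_hash == tgt_hash,
    }


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    for conn in (source, target):
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total TEXT)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)",
                         [(1, "10.00"), (2, "25.50")])
    # Simulate a silent conversion problem on the target side.
    target.execute("UPDATE orders SET total = '25.5' WHERE id = 2")
    print(compare_tables(source, target, "orders", "id"))
```

The demo shows the case row counts alone would miss: both tables hold two rows, but the checksum differs. A full scan like this is expensive on large tables, so it is usually run per key range or only on critical business tables.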
Migration Planning Checklist
Pre-Migration (2-3 months before)
- Network architecture review and firewall coordination
- Schema conversion assessment (realistic automation expectations)
- Cost estimation with 3x buffer for data transfer fees
- Backup and rollback procedure documentation
- Team training on CloudWatch monitoring and troubleshooting
During Migration
- Multi-AZ replication instance deployment (non-negotiable for production)
- Real-time CDC lag monitoring (< 60 seconds acceptable)
- Memory utilization alerts (< 1GB remaining triggers scaling)
- Network connectivity monitoring (timeout pattern detection)
- Manual data validation on critical business tables
Post-Migration
- Application performance testing (Aurora query optimizer differences)
- Business logic validation (data type conversion impacts)
- Extended monitoring period (2-4 weeks for edge case detection)
- Cost analysis and documentation for future migrations
This operational intelligence represents real production experience with AWS DMS, including failure modes, cost multipliers, and decision criteria that official documentation does not adequately address.
Useful Links for Further Investigation
Resources That Actually Help (When DMS Breaks)
Link | Description |
---|---|
AWS DMS User Guide | Official documentation for AWS DMS, comprehensive but may not cover all edge cases like Oracle's CLOB handling issues. |
DMS Best Practices | Essential reading before starting any DMS migration to avoid common pitfalls and save weeks of troubleshooting later. |
Troubleshooting Guide | A critical resource for resolving issues during DMS migrations, especially useful during unexpected outages or errors. |
DMS Step-by-Step Guides | Practical walkthroughs that provide clear, actionable steps for various AWS DMS configurations and use cases. |
Enhanced Monitoring Dashboard | Crucial for effectively monitoring the progress and health of large-scale database migrations using AWS DMS. |
Virtual Target Mode Guide | Introduces a recent feature designed to accelerate migration planning by allowing schema conversion without provisioning a target database. |
DMS Pricing Calculator | Provides estimates for AWS DMS costs, though it's advisable to add a significant buffer for more realistic budget planning. |
Cost Optimization Guide Part 1 | Offers genuinely useful tips and strategies to avoid budget overruns and manage expenses effectively during DMS migrations. |
Cost Optimization Guide Part 2 | Further guidance on maintaining reasonable costs for AWS DMS, helping to prevent unexpected and high billing statements. |
AWS Free Tier | Provides 750 hours of dms.t3.micro, an instance class too small for most real migration work. |
AWS re:Post DMS Questions | A collection of real problems and solutions from people who have encountered issues with AWS DMS. |
AWS Database Migration Samples | Provides working code examples for AWS Database Migration Service, offering practical implementations rather than theoretical marketing content. |
DMS SQL Server Samples | Contains specific code examples and configurations tailored for SQL Server database migrations using AWS DMS. |
DMS Terraform Examples | Offers infrastructure as code examples using Terraform for setting up and managing AWS DMS configurations. |
AWS DMS Stack Overflow Questions | A community-driven platform featuring real developer problems and practical solutions related to AWS Database Migration Service. |
AWS CLI DMS Commands | Documentation for command line interface tools specifically designed for managing and automating AWS DMS migrations. |
CloudFormation DMS Templates | Provides infrastructure as code templates using AWS CloudFormation for deploying and managing AWS DMS resources. |
Terraform DMS Module | A community-maintained Terraform module for provisioning and managing AWS DMS resources efficiently. |
DMS CDK Constructs | Offers AWS CDK constructs for programmatically defining and deploying AWS DMS setups and configurations. |
AWS DMS Service Partners | A directory of AWS partners who possess specialized expertise and experience in performing AWS DMS migrations. |
AWS Migration Consulting Partners | Provides a list of professional migration services that, while potentially expensive, are highly knowledgeable about common pitfalls and complexities in AWS migrations. |
Database Migration Consulting | Offers access to third-party consultants available through the AWS Marketplace who have extensive experience with various database migration scenarios. |
DMS CloudWatch Metrics | Details the essential Amazon CloudWatch metrics that should be monitored closely during AWS DMS migrations for performance and health. |
DMS Key Troubleshooting Metrics | A blog post highlighting key troubleshooting metrics and performance enhancers for AWS DMS, potentially saving significant debugging time. |
Setting Up CloudWatch Alarms | Guide on automating monitoring for AWS DMS resources by setting up Amazon CloudWatch alarms using the AWS CLI, proactively preventing issues. |
AWS Database Specialty Certification | Information about the AWS Database Specialty Certification, which covers DMS topics and is useful for gaining a deeper understanding of its internal workings. |
DMS Immersion Day Workshop | A hands-on workshop providing practical training and real-world scenarios for working with AWS Database Migration Service. |