What DMS Actually Is (And What It Costs You)

DMS is AWS's database migration service. Sometimes it works great, sometimes you'll want to throw your laptop out the window. The marketing says it's migrated "1.5 million databases" but they don't mention how many of those took 3x longer than expected or made developers cry over data transfer bills.

[Image: AWS Database Migration Service Architecture]

DMS is basically a fancy ETL tool running on EC2 instances you pay for by the hour. Reads from your old database, transforms shit if needed, writes to your new database. Simple concept, but the devil's in the 47 different configuration parameters AWS didn't bother explaining properly.

How It Actually Works

Setup's pretty straightforward - endpoints for source and target databases, spin up a replication instance, create a migration task. The replication instance is just an EC2 box running DMS software that does the actual work.
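Those three steps script cleanly with boto3. Here's a minimal sketch, with placeholder identifiers (the names, schema, and instance class are illustrative, not from any real setup) and the DMS client injected as an argument so the wiring can be checked without AWS credentials:

```python
import json


def create_dms_pipeline(dms, source_conn, target_conn):
    """Create the three DMS pieces: two endpoints, a replication
    instance, and a migration task. `dms` is a boto3 DMS client,
    injected so this sketch is testable offline."""
    source = dms.create_endpoint(
        EndpointIdentifier="legacy-oracle-src",    # hypothetical name
        EndpointType="source",
        EngineName="oracle",
        **source_conn,
    )["Endpoint"]
    target = dms.create_endpoint(
        EndpointIdentifier="aurora-pg-tgt",        # hypothetical name
        EndpointType="target",
        EngineName="aurora-postgresql",
        **target_conn,
    )["Endpoint"]
    instance = dms.create_replication_instance(
        ReplicationInstanceIdentifier="prod-migration",
        ReplicationInstanceClass="dms.r6i.large",  # size up for big tables
        MultiAZ=True,                              # costs double, saves 3am pages
    )["ReplicationInstance"]
    task = dms.create_replication_task(
        ReplicationTaskIdentifier="oracle-to-aurora",
        SourceEndpointArn=source["EndpointArn"],
        TargetEndpointArn=target["EndpointArn"],
        ReplicationInstanceArn=instance["ReplicationInstanceArn"],
        MigrationType="full-load-and-cdc",         # bulk copy, then stream changes
        TableMappings=json.dumps({
            "rules": [{
                "rule-type": "selection", "rule-id": "1", "rule-name": "1",
                "object-locator": {"schema-name": "APP", "table-name": "%"},
                "rule-action": "include",
            }]
        }),
    )["ReplicationTask"]
    return task
```

In real use you'd pass `boto3.client("dms")`, put real connection details (`ServerName`, `Port`, `Username`, `Password`, `DatabaseName`) in the two dicts, and wait for the endpoints and instance to become available before starting the task; the create calls return immediately.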

Full Load: Copies all your existing data. Works fine for small databases under 100GB. Anything bigger and you're looking at hours or days depending on your network.

CDC (Change Data Capture): Keeps reading transaction logs from your source database and applying changes to the target. This is where it gets tricky - CDC lag can spike during high transaction periods, and you'll need to monitor CloudWatch metrics like a hawk.

Full Load + CDC: What you'll actually end up doing. Bulk copy first, then flip on CDC to keep things in sync while you test and eventually cut over.

What Works (And What Doesn't)

Homogeneous migrations work well: MySQL to MySQL, Oracle to Oracle. Schema conversion is mostly a non-issue when both ends speak the same dialect, and DMS creates basic target tables for you automatically.

Heterogeneous migrations are where things get expensive: Oracle to PostgreSQL sounds simple until you hit edge cases with stored procedures, custom data types, or triggers. The Schema Conversion Tool claims "90% automation" - that's complete bullshit unless your database is vanilla as unsalted crackers.

Network connectivity will make you want to quit tech and raise goats: Getting DMS to talk through corporate firewalls, VPNs, and security groups is half the battle. I've seen grown developers cry trying to get this shit working through a corporate proxy.

Version-Specific Gotchas

DMS 3.6.1 (May 2024) finally added PostgreSQL 17 support and IAM database auth for MySQL/PostgreSQL. DMS 3.5.4 introduced data masking which is actually useful for compliance, but also fucked up some existing transformations for Oracle → PostgreSQL migrations. Learned that one the hard way during a weekend cutover.

Data Resync capability got added recently - saves your ass when you need to re-sync specific tables without starting over. Virtual Target Mode in Schema Conversion lets you start migration planning without spinning up target databases first - actually saves money during the "will this work?" phase.

[Image: AWS DMS Architecture Diagram]

The current architecture supports about 20 database engines, but "supports" and "works reliably in production" are different things. MySQL, PostgreSQL, and Oracle tend to be the most stable. MongoDB support exists but has quirks with complex document structures.


What Actually Works vs What Doesn't (Real Experience)

| Aspect | AWS DMS | Azure Database Migration | Google Cloud DMS | Oracle GoldenGate |
|---|---|---|---|---|
| What Works | MySQL/PostgreSQL migrations, homogeneous Oracle | SQL Server to Azure SQL works well | Simple MySQL → Cloud SQL | Oracle to Oracle is bulletproof |
| What Breaks | Complex Oracle schemas, large MongoDB collections | Cross-platform migrations suck | Anything with triggers or stored procedures | Non-Oracle sources are painful |
| Real Costs | $50-500/month + data transfer fees that hurt | Similar instance costs, lower transfer | Cheapest for small migrations | License costs will bankrupt you |
| Setup Pain Level | Network connectivity is 50% of the work | Easier if you're already in Azure | Simpler than AWS, but fewer options | Enterprise consultants required |
| When It Actually Fails | CDC lag spikes, schema conversion edge cases | Complex stored procedures | Limited source database support | Works great until it doesn't |
| Support Reality | AWS support takes hours, community helps more | Microsoft support is hit-or-miss | GCP support responds in 2 hours; AWS... good luck | Oracle licensing costs more than your mortgage payment |

Real Production Migration Stories (What Actually Happens)

Here's what actually happens when you try to migrate databases with DMS in production. Spoiler: it's messier than the marketing bullshit suggests.

The Three Ways People Actually Use DMS

Full Load Only: Copy everything and accept downtime. We used this for our staging database migration - 200GB Oracle to PostgreSQL took about 8 hours. Would've been faster but network connectivity through our corporate firewall kept timing out.

CDC Only: Replicate changes from an existing synchronized database. This works if you've already got your data moved some other way. Used this after a manual dump/restore to keep things in sync during testing. CDC lag averaged 10-15 seconds, spiked to 2-3 minutes during batch operations.

Full Load + CDC: What you'll actually end up doing because you're not insane enough to accept downtime. We did this for our production Oracle → Aurora PostgreSQL migration. Full load took 36 hours for 2TB. CDC kept crapping out whenever Oracle did large batch updates - had to restart the fucking task 3 times during the cutover weekend.

[Image: AWS DMS Enhanced Monitoring Dashboard]

Migration Patterns That Actually Work

Legacy Oracle to Aurora PostgreSQL: Spent 3 months on this bullshit. Schema Conversion Tool got about 70% of our stored procedures right. The other 30% required manual rewrites because Oracle's PL/SQL is a special snowflake. Custom data types were a complete nightmare. Budget extra time for testing - our app had subtle bugs from datatype conversion differences that only showed up in production.

MySQL to Aurora MySQL: This one was straightforward because it's the same damn database engine. Homogeneous migration meant schema conversion was mostly automatic. Main gotchas were custom MySQL settings that Aurora doesn't support. Migration worked but app performance was different because Aurora's query optimizer thinks it knows better.

SQL Server to PostgreSQL: Don't do this unless someone's holding a gun to your head. DMS Schema Conversion choked on SQL Server-specific functions. T-SQL to PL/pgSQL conversion was basically rewriting the entire database layer. Took 6 months total including the inevitable app rewrites.

[Image: AWS DMS Source Table with Modulus Column]

Network Architecture Reality Check

VPC Connectivity: Plan for this clusterfuck to take 2-3 weeks. Getting DMS to talk to your on-premises database through corporate firewalls, VPNs, and security groups is half the battle. We opened too many ports initially and security made us redo the whole thing because "principle of least privilege" or some shit.

Bandwidth Planning: Data transfer speed gets limited by network latency, not bandwidth. Our 1Gbps connection sat there doing jack shit because of 50ms latency to the source database. Learned that lesson the expensive way. Consider using AWS DataSync for initial loads if you have mountains of data.
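The latency ceiling is simple arithmetic: one TCP stream can't push more than its window size divided by the round-trip time, no matter how fat the pipe is. A back-of-envelope sketch (the 64 KB window here is an assumption for illustration, not your stack's actual setting):

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for a single TCP stream: window / RTT, in megabits/s."""
    return (window_bytes * 8) / (rtt_ms / 1000) / 1_000_000


# 64 KB window over a 50 ms link: roughly 10.5 Mbit/s per stream,
# which is why a 1 Gbps pipe can sit nearly idle during a full load.
per_stream = max_tcp_throughput_mbps(64 * 1024, 50)

# Parallel streams (e.g. via ParallelLoadThreads) multiply that bound.
eight_streams = per_stream * 8
```

This is why adding parallelism, moving the replication instance closer to the source, or tuning TCP window sizes often helps more than buying bandwidth.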

Multi-AZ Pain Points: Multi-AZ replication instances cost double but saved our asses twice when instances randomly decided to take a nap. AWS support ticket resolution took 4 hours both times while we sat there refreshing the console like idiots. Single-AZ is cheaper but you'll regret it when shit breaks at 3am.

What Breaks in Production

Memory Issues: DMS tasks eat memory like it's going out of style during full loads. We hit memory limits on a dms.r6i.large with a 500GB table. Had to scale up to dms.r6i.xlarge and restart the whole fucking migration. Lost 8 hours.

CDC Lag Spikes: Change Data Capture lag goes to shit during high transaction periods. Our e-commerce database saw 5-10 minute delays during Black Friday sales. Monitoring CloudWatch metrics like CDCLatencySource and CDCLatencyTarget became a full-time job.

Schema Conversion Edge Cases: Custom Oracle packages, triggers with complex logic, and proprietary data types all told the automated conversion to go fuck itself. The "90% automation" claim is pure fantasy for anything beyond basic CRUD operations.

Configuration That Actually Matters

Task Settings: Default settings are conservative as fuck. We fixed performance by:

  • Cranking MaxFullLoadSubTasks to 16 for large tables (learned from Oracle support forums)
  • Bumping TransactionConsistencyTimeout for long-running transactions that kept timing out
  • Using ParallelLoadThreads for multiple smaller tables instead of doing them one by one like idiots
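For reference, those knobs live in the task settings JSON. A trimmed, illustrative fragment (the values are examples to tune against your own tables, not recommendations; note ParallelLoadThreads sits under TargetMetadata, not FullLoadSettings):

```json
{
  "TargetMetadata": {
    "ParallelLoadThreads": 8,
    "ParallelLoadBufferSize": 500
  },
  "FullLoadSettings": {
    "MaxFullLoadSubTasks": 16,
    "TransactionConsistencyTimeout": 1200,
    "CommitRate": 10000
  },
  "Logging": {
    "EnableLogging": true
  }
}
```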

Instance Sizing: Start bigger than you think or you'll hate life. We blew through our migration window twice using undersized instances because we're cheap. A dms.r6i.xlarge cost $1.50/hour but saved 20 hours of migration time and my sanity.

Error Handling: Enable detailed logging from day one or you'll regret it. When migrations break at 2am (and they will), CloudWatch logs are your only friend. Set up CloudWatch alarms for CDC lag and error rates before shit hits the fan.
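Alarm setup is scriptable too. A hedged boto3 sketch (the identifiers and the 300-second threshold are placeholders; the CloudWatch client is injected as an argument so the call can be verified offline):

```python
def create_cdc_lag_alarm(cloudwatch, task_id, instance_id, threshold_seconds=300):
    """Alarm when CDC target latency stays high for 5 straight minutes.
    `cloudwatch` is a boto3 CloudWatch client, injected for testability."""
    cloudwatch.put_metric_alarm(
        AlarmName=f"dms-{task_id}-cdc-lag",
        Namespace="AWS/DMS",
        MetricName="CDCLatencyTarget",
        Dimensions=[
            {"Name": "ReplicationTaskIdentifier", "Value": task_id},
            {"Name": "ReplicationInstanceIdentifier", "Value": instance_id},
        ],
        Statistic="Maximum",
        Period=60,                     # evaluate every minute
        EvaluationPeriods=5,           # five consecutive bad minutes
        Threshold=threshold_seconds,
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="breaching",  # a silent task is a broken task
    )
```

Pair it with a second alarm on FreeableMemory; the two cover most of the 2am failure modes described above.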

[Image: AWS DMS Performance Comparison: Single vs Multiple Tasks]

DMS works, but it'll take longer and cost more than you expect. Plan accordingly.


Questions You'll Actually Need Answered

Q: How much does this shit actually cost?

A: Way more than you think once those sneaky data transfer fees kick in.

The "free tier" covers maybe 2-3 hours of actual work with the tiniest instance. Instance costs range from $0.018/hour (dms.t3.micro, useless for anything real) to $3.50/hour at the top end of the memory-optimized classes. But the real ball-kicker is data transfer: $0.09/GB cross-region, $0.01/GB cross-AZ. A 1TB migration across regions costs $90 just in transfer fees before you even start. Budget 2-3x your initial estimate or prepare to explain the overage to your boss.
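You can sanity-check the bill before kickoff. A rough estimator using the rates quoted here (your region's actual prices will differ, so treat the numbers as placeholders):

```python
def estimate_migration_cost(hours, instance_rate_per_hr,
                            data_gb, transfer_rate_per_gb, multi_az=False):
    """Back-of-envelope DMS cost: instance hours plus data transfer.
    Multi-AZ roughly doubles the instance portion of the bill."""
    instance_cost = hours * instance_rate_per_hr * (2 if multi_az else 1)
    transfer_cost = data_gb * transfer_rate_per_gb
    return round(instance_cost + transfer_cost, 2)


# 1 TB cross-region on a $0.60/hr instance running for 36 hours:
# $21.60 of instance time, $92.16 of transfer; transfer dominates.
cost = estimate_migration_cost(36, 0.60, 1024, 0.09)
```

Then multiply by the 2-3x fudge factor above, because something will need a restart.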

Q: Will this work for my shitty legacy database?

A: Probably not without you crying into your coffee at 3am fixing things manually. The "90% automation" marketing is pure fantasy unless your database is more vanilla than a suburban ice cream shop. Oracle with custom packages, triggers, and PL/SQL stored procedures? Expect 30-50% manual conversion and lots of swearing. SQL Server with CLR assemblies and complex T-SQL? Start polishing your resume. MongoDB with complex document structures? Test everything twice and pray to whatever deity you believe in.

Q: How long will this actually take?

A: Add 50% to whatever the AWS Calculator estimates, then double that because you're optimistic. Small databases (< 100GB) usually finish in 2-8 hours if nothing breaks (spoiler: something always breaks). Medium databases (100GB-1TB) take 6-24 hours plus the inevitable "oh shit we forgot about this table" moments. Large databases (> 1TB) take days to weeks and will consume your soul. Network issues, schema conversion clusterfucks, and CDC lag will fuck up every timeline you've ever made. Our 2TB Oracle migration took 3 months including testing and the app rewrites nobody planned for.

Q: What breaks during migration?

A: Everything you didn't test, plus some stuff you did. Common failures:

  • Network connectivity timeouts (plan for VPN/firewall issues)
  • Memory limits on replication instances during full loads
  • CDC replication stops working during high transaction periods
  • Schema conversion fails on custom data types and stored procedures
  • Target database locks causing task failures
  • AWS instances randomly becoming unhealthy

Q: Can I trust the data validation?

A: DMS validation catches obvious problems but won't find subtle data corruption. Row counts match? Good. Data types converted correctly? Maybe. Business logic preserved? You need to verify that yourself. Run your own checksums, compare critical tables manually, and test application functionality thoroughly. "Zero data loss" doesn't mean "zero business impact."
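One portable pattern for those checksums: hash an ordered projection of each table on both sides and compare the fingerprints. A sketch with sqlite3 standing in for both databases (against real Oracle or PostgreSQL you'd run the equivalent query through their own drivers; table and column names are illustrative):

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Row count plus a digest of rows in key order.
    Cheap to run, and it catches silent value drift that
    row counts alone will never see."""
    rows = conn.execute(
        f"SELECT * FROM {table} ORDER BY {key_column}"  # trusted identifiers only
    ).fetchall()
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode())
    return len(rows), digest.hexdigest()


# Demo: two in-memory databases standing in for source and target.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE orders (id INTEGER, total REAL)")
    db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.99), (2, 25.0)])

match = (table_fingerprint(source, "orders", "id")
         == table_fingerprint(target, "orders", "id"))

# Corrupt one value on the target; counts still match, digests don't.
target.execute("UPDATE orders SET total = 24.99 WHERE id = 2")
drift = (table_fingerprint(source, "orders", "id")
         != table_fingerprint(target, "orders", "id"))
```

For heterogeneous migrations, normalize types in the SELECT (casts, rounding, timezone handling) before hashing, or every datatype conversion difference shows up as a false alarm.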

Q: What about production downtime?

A: "Minimal downtime" depends on your definition. CDC replication typically has 5-10 second lag, spiking to minutes during heavy loads. Plan for several hours of downtime for final cutover, application configuration changes, and testing. We had to restart our CDC task 3 times during production migration due to lag spikes and connection issues.

Q: Is the monitoring actually useful?

A: CloudWatch metrics help but don't tell you why things failed. Key metrics to watch:

  • CDCLatencySource and CDCLatencyTarget for replication lag
  • FreeableMemory on replication instances (they crash when this hits zero)
  • FullLoadThroughput to estimate completion times

Enable detailed logging from day one. When migrations break at 2am, CloudWatch logs are your only friend.
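Pulling those metrics is one API call; the useful part is deciding when lag is actually a problem. A sketch (the identifiers are placeholders; the fetch needs real AWS credentials, the evaluation helper doesn't):

```python
from datetime import datetime, timedelta, timezone


def fetch_cdc_latency(task_id, instance_id, minutes=30):
    """Pull recent CDCLatencyTarget datapoints; needs AWS credentials."""
    import boto3  # imported here so the pure helper below works offline
    cw = boto3.client("cloudwatch")
    now = datetime.now(timezone.utc)
    resp = cw.get_metric_statistics(
        Namespace="AWS/DMS",
        MetricName="CDCLatencyTarget",
        Dimensions=[
            {"Name": "ReplicationTaskIdentifier", "Value": task_id},
            {"Name": "ReplicationInstanceIdentifier", "Value": instance_id},
        ],
        StartTime=now - timedelta(minutes=minutes),
        EndTime=now,
        Period=60,
        Statistics=["Maximum"],
    )
    return [p["Maximum"] for p in resp["Datapoints"]]


def lag_verdict(latencies_seconds, warn=60, critical=300):
    """Classify a window of lag samples. Thresholds are illustrative;
    tune them to your cutover tolerance."""
    worst = max(latencies_seconds, default=0)
    if worst >= critical:
        return "critical"
    return "warn" if worst >= warn else "ok"
```

Run the same check against CDCLatencySource: if source lag is high but target lag is low, the bottleneck is reading the transaction logs, not applying them.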

[Image: AWS DMS CDC Target Latency Spikes]

Q: How do I size replication instances?

A: Start bigger than you think. We burned through migration windows twice using undersized instances. Memory is usually the bottleneck during full loads. Minimum recommendations based on experience:

  • < 100GB: dms.r6i.large ($0.30/hour)
  • 100GB-1TB: dms.r6i.xlarge ($0.60/hour)
  • > 1TB: dms.r6i.2xlarge ($1.20/hour)

Multi-AZ costs double but saved us when instances randomly failed.
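Those rules of thumb are easy to encode so nobody "saves money" at 2am. A sketch using the tiers above (the cutoffs come from this article's experience, not AWS guidance):

```python
def pick_replication_instance(data_gb: float) -> str:
    """Map data volume to the minimum instance class worth trying.
    Memory, not CPU, is usually the full-load bottleneck, hence
    the memory-optimized r6i family throughout."""
    if data_gb < 100:
        return "dms.r6i.large"
    if data_gb <= 1024:
        return "dms.r6i.xlarge"
    return "dms.r6i.2xlarge"
```

Treat the output as a floor: a single 500GB table can still blow past an xlarge, as described in the memory-issues section above.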

[Image: AWS DMS Task Performance Monitoring]

Q: What's the nuclear option when everything breaks?

A: Delete the replication instance and start over. Seriously. Sometimes DMS tasks get into weird fucked-up states where restarting doesn't help and AWS support just shrugs. We've done this twice during production migrations and it fixed problems that had us stumped for hours. Export your task configuration first because you'll need it. It's the DMS equivalent of "delete node_modules and try again." Takes 5 minutes if you're lucky, 2 hours if AWS is having one of those days where nothing works.

Q: Should I use DMS Serverless?

A: Maybe for small, infrequent migrations or when you don't want to deal with instance sizing. Serverless auto-scales but can be more expensive for large sustained workloads. It's good for CDC workloads with variable patterns, since you only pay for capacity used. Traditional instances give you more control over sizing and costs. If you're doing a one-time migration over 500GB, stick with traditional instances for predictable pricing.

Q: What about Virtual Target Mode?

A: Virtual Target Mode in Schema Conversion lets you start schema assessment without provisioning target databases first, saving costs during planning phases. Actually useful for early migration assessment when you're still figuring out if DMS is the right choice. Helps with cost estimates and schema complexity analysis before you commit to spinning up target infrastructure.

