The Truth About AWS Discovery Tools
Forget AWS Application Discovery Service - it's garbage. I spent 2 weeks waiting for it to map dependencies, only to discover it had missed half our services and flagged a Redis cache as a "critical database dependency."
Instead, SSH into every box and run these commands:
- df -h - see what storage you're actually using
- free -m - check memory allocation vs what AWS thinks you need
- netstat -tulpn - find what services are talking to what (or use ss on newer systems)
- systemctl list-units --type=service --state=running - see what's actually running
Takes maybe 2 hours, vs the bullshit discovery process consultants wanted to charge us for - what was it, like 50 grand? For a fucking Excel sheet.
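If you've got more than a handful of boxes, wrap it in a loop instead of typing this 40 times. A minimal sketch, assuming key-based SSH and a hosts.txt with one hostname per line (both assumptions - adjust for your setup):

```bash
#!/usr/bin/env bash
# Run the discovery commands on every host and dump one big log.
# ssh -n stops ssh from eating the rest of hosts.txt on stdin.
while read -r host; do
  echo "=== $host ==="
  ssh -n -o ConnectTimeout=5 "$host" '
    df -h
    free -m
    ss -tulpn 2>/dev/null || netstat -tulpn
    systemctl list-units --type=service --state=running
  '
done < hosts.txt > "discovery-$(date +%F).log"
```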
What You'll Actually Find During Assessment
Compute Resources
Half your EC2 instances are over-provisioned because "we needed them for Black Friday 3 years ago." Your custom AMIs probably have security patches from 2019. Security groups will make you cry - somebody allowed 0.0.0.0/0 on port 22 because "it was faster than figuring out the actual CIDR."
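Don't just cry - get the list. Assuming the AWS CLI is configured for your account, this finds every security group open to the entire internet on port 22:

```bash
# Security groups allowing SSH from anywhere on the internet
aws ec2 describe-security-groups \
  --filters Name=ip-permission.from-port,Values=22 \
            Name=ip-permission.cidr,Values=0.0.0.0/0 \
  --query 'SecurityGroups[].[GroupId,GroupName,VpcId]' \
  --output table
```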
Storage
S3 buckets with public read because someone needed to "quickly test something." EBS volumes that haven't been attached to instances in 18 months but still cost $200/month. EFS filesystems that someone created for a project that got cancelled but never cleaned up. Use AWS Config to find orphaned storage - saved me 3 hours of clicking through the console like an idiot.
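Config works, but if you'd rather script it, the CLI finds the same orphans. A rough pass (assumes your credentials can read EC2 and S3 metadata; buckets with no bucket policy just error out silently here, so pair this with a Block Public Access check):

```bash
# EBS volumes attached to nothing but still on the bill
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[].[VolumeId,Size,CreateTime]' \
  --output table

# Buckets made public via bucket policy
aws s3api list-buckets --query 'Buckets[].Name' --output text | tr '\t' '\n' |
while read -r bucket; do
  public=$(aws s3api get-bucket-policy-status --bucket "$bucket" \
    --query 'PolicyStatus.IsPublic' --output text 2>/dev/null)
  [ "$public" = "True" ] && echo "PUBLIC: $bucket"
done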
Databases
RDS instances running MySQL 5.7 because "if it ain't broke, don't fix it" - except it IS broke, you just don't know it yet. DynamoDB tables with provisioned capacity still sized for your 2018 traffic. ElastiCache clusters that nobody remembers the purpose of.
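Two commands to build the "it IS broke" list - check every engine version that comes back against its end-of-life date:

```bash
# Every RDS engine and version in the account
aws rds describe-db-instances \
  --query 'DBInstances[].[DBInstanceIdentifier,Engine,EngineVersion]' \
  --output table

# Same for the ElastiCache clusters nobody remembers
aws elasticache describe-cache-clusters \
  --query 'CacheClusters[].[CacheClusterId,Engine,EngineVersion]' \
  --output table
```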
Network Clusterfuck
VPCs that were supposed to be temporary but became production. Route53 records pointing to ALBs that don't exist. NAT gateways costing $500/month to route traffic for a cron job.
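Those dangling Route53 records are findable without clicking through the console. A sketch for one hosted zone (Z123EXAMPLE is a placeholder - use your own zone ID - and this only catches ALB alias records; CNAMEs to dead endpoints need their own pass):

```bash
# DNS names of load balancers that actually exist
aws elbv2 describe-load-balancers \
  --query 'LoadBalancers[].DNSName' --output text |
  tr '\t' '\n' | sort -u > live-albs.txt

# ALB alias targets in the zone, stripped of the dualstack.
# prefix and trailing dot that Route53 adds
aws route53 list-resource-record-sets --hosted-zone-id Z123EXAMPLE \
  --query 'ResourceRecordSets[?AliasTarget].AliasTarget.DNSName' \
  --output text | tr '\t' '\n' | grep 'elb\.amazonaws\.com' |
  sed 's/^dualstack\.//; s/\.$//' | sort -u > alias-targets.txt

# Lines here point at load balancers that no longer exist
comm -23 alias-targets.txt live-albs.txt
```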
Real Migration Objectives (Not Marketing Bullshit)
Cost Reality Check
The magical savings Google marketing loves to quote? Total bullshit, at least at first. Our first GCP bill was fucking huge - higher than AWS, because we lifted-and-shifted everything without thinking. It took months of rightsizing and wrestling with GCP's pricing models (sustained-use discounts, committed-use contracts, egress charges) before we saw any savings.
Performance Reality
GCP's network IS faster - until you hit cross-zone traffic, then latency sucks. BigQuery is amazing for analytics but terrible for transactional workloads (learned this during a late-night on-call when our reporting queries brought down production).
AI/ML Dreams vs Reality
Vertex AI sounds great until you realize your data is still in the wrong format and your team doesn't know Python. AutoML works for demos, breaks in production.
Building a Team That Won't Quit
Don't hire consultants. Here's who you actually need:
- One senior engineer who's done this before - worth 5 consultants
- Someone who knows your applications - database connections will break in ways you can't imagine
- A networking person - DNS will fuck you harder than you think
- Someone with GCP experience - IAM policies make AWS look simple
Plan for 3x longer than you think. Our "2-week migration" took 3 months because nobody mentioned that the mobile app hard-coded AWS IP addresses everywhere, the email service still pointed at Route53, and the payment processor webhook URLs were all wrong. Found that last one when customers started complaining about failed payments during dinner. Fun times.
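The cheap insurance against our mobile-app disaster: grep before you migrate, not after. A rough sweep with GNU grep (expect false positives - version strings look a lot like IPs):

```bash
# Hard-coded AWS endpoints anywhere in the repo
grep -rEn 'amazonaws\.com' --exclude-dir={.git,node_modules} .

# Raw IP addresses worth eyeballing before DNS cutover
grep -rEn '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' \
  --exclude-dir={.git,node_modules} . | grep -v '127\.0\.0\.1'
```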
Ready to start? Let's dive into the migration process that actually works.