Why Teams Actually Migrate (Spoiler: It's Not Strategy)
Here's the truth: Nobody migrates for fun.
Our OpenAI bill hit $47k/month and accounting started asking uncomfortable questions. Azure ML kept crashing our training jobs: 3 times in one week during a demo to investors. Google shut down another product we used (surprise!).
That's why you're here reading this instead of shipping features.
The Real AWS Migration Experience
I'm writing this after helping Acme Corp migrate from OpenAI to Bedrock. Took 8 months instead of the promised 3. Cost $180k in engineering time. But we did cut our inference costs by 60% and haven't had a single outage since.
What Actually Forces Migration
OpenAI Bill Shock: Started at $300/month for our prototype. Hit $47k when we scaled, and the finance team wasn't amused. GPT-4 API costs get expensive fast at scale; AWS Bedrock pricing is clearer and about 40% cheaper for Claude models.
Azure ML Reliability Problems: Our training pipelines failed every few weeks with no useful error messages, outages happened monthly, and support tickets took 5 days. We spent more time debugging Azure Machine Learning than training models.
Google Product Anxiety: They killed AI Platform Notebooks. Then deprecated ML Engine. Check the Google Graveyard: they've killed 200+ products. AWS, by contrast, has killed exactly zero major ML services in 10 years.
Vendor Lock-In Reality: Try switching from OpenAI's fine-tuned models to anything else.
Good luck. At least with AWS you can run the same models locally if needed.
AWS Services That Actually Matter (September 2025)
Forget the marketing bullshit. Here's what you'll actually use:
Bedrock (The Models):
- Claude 3.5 Sonnet: best for complex reasoning, ~$15/1M input tokens
- Nova Pro: Amazon's model, decent quality at ~$8/1M tokens
- Llama 3: open-source option, cheapest at ~$2.65/1M tokens
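To make that list concrete, here's a minimal sketch of what invoking Claude on Bedrock looks like with boto3's `bedrock-runtime` Converse API. The model ID and region are assumptions; check the Bedrock console for the exact model ID available in your account.

```python
# Hedged sketch of a Bedrock Converse request. The model ID below is an
# assumption; verify it in the Bedrock console for your region.
CLAUDE_MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

def build_converse_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build the kwargs for bedrock-runtime's converse() call."""
    return {
        "modelId": CLAUDE_MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.2},
    }

# With AWS credentials configured, the actual call would look like:
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   resp = client.converse(**build_converse_request("Summarize this ticket"))
#   text = resp["output"]["message"]["content"][0]["text"]
```

Building the request as a plain dict keeps the payload testable without credentials, which matters during the IAM phase when nobody can actually call anything yet.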
SageMaker (The Platform):
- Real-time inference endpoints (what you'll use most)
- Training jobs with spot instances (saves 70% on training costs)
- Pipelines if you hate yourself and love complexity
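The spot-instance discount mentioned above hinges on two fields in SageMaker's `create_training_job` request. Here's a hedged sketch of that request shape; the bucket path, instance type, role ARN, and image URI are placeholders, not recommendations.

```python
# Hedged sketch: boto3 sagemaker create_training_job() kwargs with managed
# spot training. All names and paths are placeholders.
def build_spot_training_job(job_name: str, role_arn: str, image_uri: str,
                            max_run_secs: int = 3600) -> dict:
    """Build kwargs for sagemaker.create_training_job() with spot enabled."""
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},  # placeholder
        "ResourceConfig": {
            "InstanceType": "ml.g5.xlarge",  # placeholder
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        # These two fields are what unlock the ~70% spot discount:
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_run_secs,
            # Must be >= MaxRuntimeInSeconds; caps how long you will wait
            # through spot interruptions before the job is killed.
            "MaxWaitTimeInSeconds": max_run_secs * 2,
        },
    }

# With credentials:
#   boto3.client("sagemaker").create_training_job(**build_spot_training_job(...))
```

The `MaxWaitTimeInSeconds` padding is the tradeoff: generous padding means fewer wasted runs when spot capacity gets reclaimed, but also slower failure when capacity never comes back.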
Basic AI Services:
- Rekognition for image tagging (actually works well)
- Textract for OCR (better than Google's Cloud Vision)
- Comprehend for sentiment (meh, just use a model)
Skip the rest unless you have specific needs.
Migration Timelines That Won't Get You Fired
Simple API Switch (OpenAI → Bedrock): 6-8 weeks
- 2 weeks IAM setup and credential hell
- 2 weeks rewriting API calls and error handling
- 2 weeks testing and prompt optimization
- 2 weeks for the inevitable fuckups you didn't plan for
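The "rewriting API calls" weeks mostly boil down to message-format translation. A hedged sketch of that mapping, assuming you're moving OpenAI-style chat messages to the Bedrock Converse shape (where system prompts live in a separate `system` field rather than in the messages list):

```python
# Hedged sketch: translate OpenAI chat-completions messages into
# Bedrock Converse-style kwargs. Covers the common roles only.
def openai_to_converse(messages: list[dict]) -> dict:
    """Map OpenAI-style chat messages to Converse-style call kwargs."""
    # System prompts move out of the messages list into `system`.
    system = [{"text": m["content"]} for m in messages if m["role"] == "system"]
    chat = [
        {"role": m["role"], "content": [{"text": m["content"]}]}
        for m in messages
        if m["role"] in ("user", "assistant")
    ]
    kwargs = {"messages": chat}
    if system:
        kwargs["system"] = system
    return kwargs
```

A thin adapter like this is also your rollback plan: keep both providers behind it until the Bedrock path has survived real traffic.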
ML Platform Migration (Azure ML → SageMaker): 4-6 months
- 4 weeks learning SageMaker (it's different from everything else)
- 8 weeks rebuilding training pipelines
- 6 weeks data migration and testing
- 6 weeks fixing everything that breaks in production
Complex Multi-Platform: 6-12 months. Just don't. Migrate one thing at a time or you'll lose your sanity.
The Shit Nobody Tells You About
IAM Permissions Are Hell: You'll spend 2 weeks minimum just figuring out who can access what.
The error messages are useless. You get:

AccessDenied: User is not authorized to perform bedrock:InvokeModel

Translation: you're missing one of 12 different permissions, and AWS won't tell you which one.
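For reference, a minimal identity policy along these lines is usually where you start when chasing that error. This is a sketch, not a complete or least-privilege policy; depending on your setup you may also need marketplace-subscription and model-access permissions on top of it.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:us-east-1::foundation-model/*"
    }
  ]
}
```

Note the empty account-ID slot in the ARN: foundation models are AWS-owned resources, which trips up anyone pattern-matching from their own service ARNs.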
Service Quotas Are Pathetic: Bedrock starts with 10 requests per minute. Production traffic? That's cute. Submit increase requests immediately; they take 3-5 business days.
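While the quota increase is pending, you'll be living with throttling errors. A minimal backoff sketch, assuming only that the raised exception's class name contains "ThrottlingException" (which is how botocore surfaces Bedrock throttles):

```python
import random
import time

# Hedged sketch: jittered exponential backoff around a throttled call.
# `call` is any zero-argument function, e.g. a lambda wrapping converse().
def with_backoff(call, max_retries: int = 5, base_delay: float = 0.5):
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:
            throttled = "ThrottlingException" in type(exc).__name__
            if not throttled or attempt == max_retries:
                raise
            # Backoff grows 0.5s, 1s, 2s, ... with a little jitter so a
            # fleet of clients doesn't retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
```

Backoff buys you headroom, not capacity: with a 10 req/min quota, no amount of retrying turns it into production throughput.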
Regional Availability Sucks: Want Claude 3.5 Sonnet?
Only available in us-east-1 and us-west-2. Hope your compliance team is flexible.
Billing Surprises: Data transfer costs add up fast.
That 100GB model you're downloading costs $9 each time. Nobody mentions this until the bill arrives.
What Actually Saves Money
Bedrock vs OpenAI:
- Claude 3.5 Sonnet: ~$15/1M tokens vs OpenAI's $30/1M for GPT-4
- Nova Pro: ~$8/1M tokens, quality is 85% of GPT-4
- You'll save 40-60% on inference costs if you optimize prompts
SageMaker Reality Check:
- Training costs: 70% cheaper with spot instances (when they don't get interrupted)
- Real-time inference: actually expensive, $50-200/month per endpoint
- Batch inference: much cheaper option if you can wait
Hidden Costs That'll Bite You:
- Data transfer: $0.09/GB out of AWS
- Storage: model artifacts add up over months
- CloudWatch logs: Can exceed compute costs if you're not careful
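A back-of-envelope calculator using the numbers above. The per-million-token and per-GB rates are this article's figures, not quotes from AWS's current price list; plug in live pricing before trusting the output.

```python
# Rates taken from the figures in this article (assumptions, not quotes
# from AWS's or OpenAI's price lists).
PRICE_PER_M_TOKENS = {"claude-3.5-sonnet": 15.0, "nova-pro": 8.0, "gpt-4": 30.0}
EGRESS_PER_GB = 0.09  # data transfer out of AWS

def monthly_cost(model: str, m_tokens: float, egress_gb: float = 0.0) -> float:
    """Estimated monthly spend: token cost plus data-transfer out."""
    return PRICE_PER_M_TOKENS[model] * m_tokens + EGRESS_PER_GB * egress_gb

# Example: 500M tokens/month
# monthly_cost("gpt-4", 500)             -> 15000.0
# monthly_cost("claude-3.5-sonnet", 500) -> 7500.0  (the ~50% saving)
```

It also confirms the download gotcha above: pulling a 100GB artifact out of AWS is `0.09 * 100`, i.e. $9 per pull, every pull.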
The Migration That Nearly Got Me Fired
Company decided to migrate everything from Azure to AWS in 90 days.
I said 6 months minimum. They went with 90 days.
Week 6: IAM still not working, dev team can't access anything
Week 10: Training jobs failing, no idea why
Week 12: Emergency meeting with CEO, demo completely broken
Ended up running both platforms for 5 months at double the cost.
Finally worked after month 7. CEO wasn't happy but we shipped working software.
Lesson: Never promise heroic timelines.
Always run parallel systems longer than you think.
When NOT to Migrate
Don't migrate if:
- You're spending less than $5k/month on current platform
- Your OpenAI integration is just basic API calls with no custom models
- Your team has never used AWS (learning curve is brutal)
- You don't have 3+ months of engineering bandwidth
Especially don't migrate if:
- Your current system actually works well
- You're under deadline pressure
- You think it'll be "easy" (narrator: it wasn't)
Migration is a 6-month commitment minimum. If you can't commit to that, don't start.
Ready to dive into the specifics? The next section breaks down exactly what each migration path looks like, including the real timelines and costs that'll actually happen (not the marketing bullshit).