Skip the theory bullshit. Here's why we had to migrate and what you need to know upfront.
The Wake-Up Call: Our $1,200 Monthly Bill
Our AI features started as a side project - a few GPT-3.5 calls for content generation. Fast forward 8 months and we're burning through $1,200/month on GPT-4 calls. The breaking point was when a single user's data analysis request cost us $47 in tokens.
Management's exact words: "Find something cheaper or we're cutting AI features entirely."
Claude's Pricing Actually Makes Sense
As of September 2025, Claude Sonnet 4 pricing is roughly half what we were paying for GPT-4 - the official pricing documentation lists $3 per million input tokens for Sonnet 4. But the real kicker was Claude's longer context window - we could send entire documents without the chunking nightmare that was eating our token budget.
Real numbers from our migration:
- Before: $1,200/month (mostly GPT-4 calls)
- After: $580-650/month (Claude Sonnet 4)
- Savings: 45-50% depending on usage patterns
But here's the catch nobody tells you: Claude's safety filters will reject prompts that OpenAI happily processes. Took us days to figure out why our legal document analysis kept failing.
What You Need to Know Before You Start
First: Figure Out How Screwed You Are
This command will show you exactly what you're dealing with:
# Find every OpenAI call in your codebase
rg -t py -t js -t ts "openai\.|OpenAI\(" --context=2
Look for these red flags that made our migration hell:
- Function calling (Claude's implementation is different and will break)
- Streaming responses (the format is completely different; see the sketch after this list)
- Fine-tuned models (you can't migrate these, sorry)
- DALL-E or Whisper calls (Claude doesn't have these, find alternatives first)
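Streaming is the difference that bit us first, so here's a side-by-side. A minimal sketch, assuming you've installed both SDKs (covered below) and have both API keys in your environment:

import anthropic
from openai import OpenAI

openai_client = OpenAI()
claude_client = anthropic.Anthropic()

# OpenAI: pass stream=True and read text from choices[0].delta.content
stream = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

# Claude: a context-manager stream with a text_stream helper
with claude_client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=100,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")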
Get Your Claude API Key (The Easy Part)
- Go to console.anthropic.com and sign up
- Create a new API key (they start with "sk-ant-")
- Add it to your environment:
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
Check the complete Claude API integration guide for detailed setup instructions.
Pro tip: Claude keys look different from OpenAI keys. If you see errors about invalid API key format, you probably mixed them up.
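You can catch a mixed-up key at startup instead of from a confusing API error. A minimal sketch based on the prefix difference above:

import os

key = os.environ.get("ANTHROPIC_API_KEY", "")
# Claude keys start with "sk-ant-"; plain "sk-" keys are probably OpenAI's
if not key.startswith("sk-ant-"):
    raise RuntimeError("ANTHROPIC_API_KEY is missing or looks like the wrong provider's key")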
Install the SDK Without Breaking Everything
For Python (this is what I used):
pip install anthropic
# Keep openai installed for now - you'll need both during migration
The official Claude Python SDK is well-documented and actively maintained.
For Node.js:
npm install @anthropic-ai/sdk
# Again, keep the openai package until migration is done
See the Claude Code best practices guide for advanced integration patterns.
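Before touching any real code paths, run a one-off smoke test to confirm the key and SDK are wired up. A minimal sketch using the same Sonnet 4 model name as the examples below:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=50,
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(response.content[0].text)

If this prints a greeting, the plumbing works, and every failure after this point is a real migration issue.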
Migration Timeline (Based on Reality, Not Wishful Thinking)
Phase 1: Test Environment (1-2 days)
Get basic calls working, discover all the ways Claude is different from OpenAI.
Phase 2: Non-Critical Features (1 week)
Migrate background jobs, internal tools. Things where failures won't wake you up at 3am.
Phase 3: User-Facing Features (2-3 weeks)
This is where you'll find all the edge cases. Budget extra time for prompt rewrites.
Phase 4: Critical Systems (As long as it takes)
Don't rush this. We spent 4 weeks on our core document processing because Claude kept rejecting legal language.
Set Up Feature Flags (You Will Need These)
import os
import logging

# This saved our ass multiple times
USE_CLAUDE = os.getenv("USE_CLAUDE_API", "false").lower() == "true"

def get_ai_response(prompt):
    # call_claude_api / call_openai_api are thin wrappers around the
    # provider calls shown later in this post
    try:
        if USE_CLAUDE:
            return call_claude_api(prompt)
        else:
            return call_openai_api(prompt)
    except Exception as e:
        logging.error(f"AI API failed: {e}")
        # Fall back to the other service
        if USE_CLAUDE:
            return call_openai_api(prompt)  # Emergency fallback
        else:
            return "AI service temporarily unavailable"
The API Differences That Will Break Your Code
What Your OpenAI Code Looks Like Now
import openai

# This is probably what you have (the pre-1.0 SDK style)
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=150,
    temperature=0.7
)
text = response.choices[0].message.content
What It Looks Like After Migration to Claude
import anthropic
import os

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Notice the system prompt is separate now
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # Current Sonnet 4 model
    system="You are a helpful assistant.",  # Moved out of messages
    messages=[
        {"role": "user", "content": "Hello!"}  # No system role here
    ],
    max_tokens=150
)

# Response format is different - this took me forever to figure out
text = response.content[0].text
The Gotchas That Will Trip You Up
1. System Prompts Work Differently
Claude doesn't accept system messages in the messages array. Move them to the separate system parameter or your calls will fail.
2. Response Format is Weird
Claude returns response.content[0].text instead of response.choices[0].message.content. content is a list because a single response can contain multiple blocks (text, tool use, and so on), so don't assume the first block is always text - the sketch after this list shows a safer way to pull the text out.
3. Model Names Are Completely Different
- OpenAI: gpt-4, gpt-3.5-turbo
- Claude: claude-sonnet-4-20250514, claude-opus-4-20250514, claude-opus-4-1-20250805
4. Temperature Defaults and Ranges
Both APIs default to 1.0, but the ranges differ: OpenAI accepts 0-2 while Claude caps out at 1.0. Set it explicitly on both sides (see the sketch below). Check the Claude vs OpenAI developer comparison for detailed parameter differences.
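Gotchas 2 and 4 are easy to handle defensively in one place. A sketch, assuming you only want the text blocks out of a response:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=500,
    temperature=0.7,  # set explicitly so both providers behave the same
    messages=[{"role": "user", "content": "Summarize this refund policy: ..."}],
)

# content is a list of blocks (text, tool_use, ...) - join the text blocks
# instead of assuming content[0] is text
text = "".join(block.text for block in response.content if block.type == "text")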
Testing Your Migration (Learn from My Mistakes)
Write a Simple Adapter First
Don't try to be clever. Make a wrapper that translates between the two APIs:
import os
import anthropic
import openai

class AIAdapter:
    def __init__(self):
        self.use_claude = os.getenv("USE_CLAUDE", "false") == "true"
        if self.use_claude:
            self.claude = anthropic.Anthropic()
        else:
            self.openai = openai.OpenAI()

    def generate(self, system_prompt, user_prompt):
        try:
            if self.use_claude:
                response = self.claude.messages.create(
                    model="claude-sonnet-4-20250514",
                    system=system_prompt,
                    messages=[{"role": "user", "content": user_prompt}],
                    max_tokens=1000
                )
                return response.content[0].text
            else:
                # Your existing OpenAI code
                response = self.openai.chat.completions.create(
                    model="gpt-4",
                    messages=[
                        {"role": "system", "content": system_prompt},
                        {"role": "user", "content": user_prompt}
                    ],
                    max_tokens=1000
                )
                return response.choices[0].message.content
        except Exception as e:
            print(f"AI call failed: {e}")  # You'll see this a lot during migration
            raise
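Usage is identical on both paths, which is the whole point - flip the environment variable and nothing else changes:

ai = AIAdapter()
print(ai.generate("You are a helpful assistant.", "Summarize our refund policy in two sentences."))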
Test With Your Actual Data
Forget unit tests. Use your real prompts because that's where you'll find the problems:
# Test with prompts that actually broke for us
ai = AIAdapter()

problem_prompts = [
    "Analyze this legal document for compliance issues",  # Claude rejected this
    "Generate SQL for user table with PII fields",  # Safety filters triggered
    "Debug this JavaScript code with eval()",  # Another rejection
]

for prompt in problem_prompts:
    try:
        result = ai.generate("You are a helpful assistant", prompt)
        print(f"✅ {prompt[:50]}... worked")
    except Exception as e:
        print(f"❌ {prompt[:50]}... failed: {e}")
        # This is where you'll spend most of your time
Monitor What Actually Matters
Skip the fancy metrics. Watch these during migration:
- Error rate (you'll have lots initially)
- Response time (Claude can be slower for complex prompts)
- Cost per day (track daily spend, not monthly averages; see the sketch after this list)
- User complaints (they'll notice quality differences)
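For the daily-spend number, every Claude response reports token usage, so a rough tracker is only a few lines. A sketch, assuming Sonnet 4's published $3/$15 per million input/output token rates:

import anthropic

INPUT_RATE = 3.00 / 1_000_000    # assumed Sonnet 4 $ per input token
OUTPUT_RATE = 15.00 / 1_000_000  # assumed Sonnet 4 $ per output token

client = anthropic.Anthropic()
daily_cost = 0.0

def tracked_call(prompt):
    global daily_cost
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}],
    )
    # usage comes back on every response - add it up and log/alert daily
    daily_cost += (response.usage.input_tokens * INPUT_RATE
                   + response.usage.output_tokens * OUTPUT_RATE)
    return response.content[0].text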
The goal isn't perfection - it's "working well enough to save money."