Why Our Migration Wasn't Optional (And Why Yours Might Not Be Either)

Skip the theory bullshit. Here's why we had to migrate and what you need to know upfront.

The Wake-Up Call: Our $1,200 Monthly Bill

Our AI features started as a side project - a few GPT-3.5 calls for content generation. Fast forward 8 months and we're burning through $1,200/month on GPT-4 calls. The breaking point was when a single user's data analysis request cost us $47 in tokens.

Management's exact words: "Find something cheaper or we're cutting AI features entirely."

Claude's Pricing Actually Makes Sense

As of September 2025, Claude 3.5 Sonnet pricing is roughly half what we were paying for GPT-4. The official rate limits documentation shows you can get up to $3 per million input tokens for Sonnet 4. But the real kicker was Claude's longer context window - we could send entire documents without the chunking nightmare that was eating our token budget.

Real numbers from our migration:

  • Before: $1,200/month (mostly GPT-4 calls)
  • After: $580-650/month (Claude 3.5 Sonnet)
  • Savings: 45-50% depending on usage patterns

But here's the catch nobody tells you: Claude's safety filters will reject prompts that OpenAI happily processes. Took us days to figure out why our legal document analysis kept failing.

What You Need to Know Before You Start

First: Figure Out How Screwed You Are

This command will show you exactly what you're dealing with:

## Find every OpenAI call in your codebase
rg -t py -t js -t ts \"openai\.|OpenAI\(\" --context=2

Look for these red flags that made our migration hell:

  • Function calling (Claude's implementation is different and will break)
  • Streaming responses (the format is completely different)
  • Fine-tuned models (you can't migrate these, sorry)
  • DALL-E or Whisper calls (Claude doesn't have these, find alternatives first)

Get Your Claude API Key (The Easy Part)

  1. Go to console.anthropic.com and sign up
  2. Create a new API key (they start with "sk-ant-")
  3. Add it to your environment:

Check the complete Claude API integration guide for detailed setup instructions.

export ANTHROPIC_API_KEY=\"sk-ant-your-key-here\"

Pro tip: Claude keys look different from OpenAI keys. If you see errors about invalid API key format, you probably mixed them up.

Install the SDK Without Breaking Everything

For Python (this is what I used):

pip install anthropic
## Keep openai installed for now, you'll need both during migration

The official Claude Python SDK is well-documented and actively maintained.

For Node.js:

npm install @anthropic-ai/sdk
## Again, keep openai package until migration is done

See the Claude Code best practices guide for advanced integration patterns.

Migration Timeline (Based on Reality, Not Wishful Thinking)

Phase 1: Test Environment (1-2 days)
Get basic calls working, discover all the ways Claude is different from OpenAI.

Phase 2: Non-Critical Features (1 week)
Migrate background jobs, internal tools. Things where failures won't wake you up at 3am.

Phase 3: User-Facing Features (2-3 weeks)
This is where you'll find all the edge cases. Budget extra time for prompt rewrites.

Phase 4: Critical Systems (As long as it takes)
Don't rush this. We spent 4 weeks on our core document processing because Claude kept rejecting legal language.

Set Up Feature Flags (You Will Need These)

import os
import logging

## This saved our ass multiple times
USE_CLAUDE = os.getenv(\"USE_CLAUDE_API\", \"false\").lower() == \"true\"

def get_ai_response(prompt):
    try:
        if USE_CLAUDE:
            return call_claude_api(prompt)
        else:
            return call_openai_api(prompt)
    except Exception as e:
        logging.error(f\"AI API failed: {e}\")
        # Fallback to the other service
        if USE_CLAUDE:
            return call_openai_api(prompt)  # Emergency fallback
        else:
            return \"AI service temporarily unavailable\"

The API Differences That Will Break Your Code

What Your OpenAI Code Looks Like Now

import openai

## This is probably what you have
response = openai.ChatCompletion.create(
    model=\"gpt-4\",
    messages=[
        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},
        {\"role\": \"user\", \"content\": \"Hello!\"}
    ],
    max_tokens=150,
    temperature=0.7
)

text = response.choices[0].message.content

What It Looks Like After Migration to Claude

import anthropic
import os

client = anthropic.Anthropic(api_key=os.environ[\"ANTHROPIC_API_KEY\"])

## Notice system prompt is separate now
response = client.messages.create(
    model=\"claude-sonnet-4-20250514\",  # Current Sonnet 4 model
    system=\"You are a helpful assistant.\",  # Moved out of messages
    messages=[
        {\"role\": \"user\", \"content\": \"Hello!\"}  # No system role here
    ],
    max_tokens=150
)

## Response format is different - this took me forever to figure out
text = response.content[0].text

The Gotchas That Will Trip You Up

1. System Prompts Work Differently
Claude doesn't include system messages in the messages array. Move them to the separate system parameter or your calls will fail.

2. Response Format is Weird
Claude returns response.content[0].text instead of response.choices[0].message.content. Why is content an array? Nobody knows.

3. Model Names Are Completely Different

  • OpenAI: gpt-4, gpt-3.5-turbo
  • Claude: claude-sonnet-4-20250514, claude-opus-4-20250514, claude-opus-4-1-20250805

4. Temperature Defaults
OpenAI defaults to 1.0, Claude defaults to... actually not sure, the docs are unclear. Set it explicitly. Check the Claude vs OpenAI developer comparison for detailed parameter differences.

Testing Your Migration (Learn from My Mistakes)

Write a Simple Adapter First

Don't try to be clever. Make a wrapper that translates between the two APIs:

class AIAdapter:
    def __init__(self):
        self.use_claude = os.getenv(\"USE_CLAUDE\", \"false\") == \"true\"
        if self.use_claude:
            self.claude = anthropic.Anthropic()
        else:
            self.openai = openai.OpenAI()
    
    def generate(self, system_prompt, user_prompt, model=\"default\"):
        try:
            if self.use_claude:
                response = self.claude.messages.create(
                    model=\"claude-sonnet-4-20250514\",
                    system=system_prompt,
                    messages=[{\"role\": \"user\", \"content\": user_prompt}],
                    max_tokens=1000
                )
                return response.content[0].text
            else:
                # Your existing OpenAI code
                response = self.openai.chat.completions.create(
                    model=\"gpt-4\",
                    messages=[
                        {\"role\": \"system\", \"content\": system_prompt},
                        {\"role\": \"user\", \"content\": user_prompt}
                    ],
                    max_tokens=1000
                )
                return response.choices[0].message.content
        except Exception as e:
            print(f\"AI call failed: {e}\")  # You'll see this a lot during migration
            raise

Test With Your Actual Data

Forget unit tests. Use your real prompts because that's where you'll find the problems:

## Test with prompts that actually broke for us
problem_prompts = [
    \"Analyze this legal document for compliance issues\",  # Claude rejected this
    \"Generate SQL for user table with PII fields\",        # Safety filters triggered
    \"Debug this JavaScript code with eval()\",             # Another rejection
]

for prompt in problem_prompts:
    try:
        result = ai.generate(\"You are a helpful assistant\", prompt)
        print(f\"✅ {prompt[:50]}... worked\")
    except Exception as e:
        print(f\"❌ {prompt[:50]}... failed: {e}\")
        # This is where you'll spend most of your time

Monitor What Actually Matters

Skip the fancy metrics. Watch these during migration:

  • Error rate (you'll have lots initially)
  • Response time (Claude can be slower for complex prompts)
  • Cost per day (track daily spend, not monthly averages)
  • User complaints (they'll notice quality differences)

The goal isn't perfection - it's "working well enough to save money."

What Nobody Tells You About The Costs (Real Numbers)

What We Actually Pay

OpenAI (Before)

Claude (After)

The Catch

Input tokens (per million)

$30 for GPT-4

$3 for Claude 3.5 Sonnet

Claude's safety filters mean more rejected requests

Output tokens (per million)

$60 for GPT-4

$15 for Claude 3.5 Sonnet

Claude outputs can be longer (costs more)

Our monthly bill

$1,200

$580-650

Varies based on how many prompts get rejected

Context window

128K tokens

200K tokens

Amazing for document processing

Function calling

Works fine

Different JSON format

Broke all our existing functions

Streaming

Easy to implement

Pain in the ass

Completely different API

The Implementation That Actually Works (After 3 Weeks of Debugging)

Here's the code that survived our production deployment. It's not perfect, but it works. Based on the official migration documentation and practical integration examples.

The Basic Adapter That Doesn't Suck

Start with this. It handles the main differences without trying to be too clever:

import os
import anthropic
import openai
import logging

## Set up logging because you'll need it
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class AIClient:
    def __init__(self):
        self.use_claude = os.getenv("USE_CLAUDE", "false").lower() == "true"
        
        if self.use_claude:
            self.claude = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
            logger.info("Using Claude API")
        else:
            self.openai = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])
            logger.info("Using OpenAI API")
    
    def generate(self, system_prompt, user_prompt, model="default"):
        """Generate a response. Keep it simple."""
        try:
            if self.use_claude:
                return self._call_claude(system_prompt, user_prompt)
            else:
                return self._call_openai(system_prompt, user_prompt)
        except Exception as e:
            logger.error(f"AI call failed: {e}")
            # Don't try to be smart with fallbacks initially
            raise
    
    def _call_claude(self, system_prompt, user_prompt):
        """Claude API call - the response format is different"""
        try:
            response = self.claude.messages.create(
                model="claude-sonnet-4-20250514",  # Current Sonnet 4 model
                system=system_prompt,  # Separate system prompt
                messages=[
                    {"role": "user", "content": user_prompt}  # NOT "human"!
                ],
                max_tokens=1000
            )
            
            # This took me forever to figure out
            return response.content[0].text
            
        except anthropic.RateLimitError as e:
            logger.error(f"Claude rate limit hit: {e}")
            raise
        except anthropic.APIError as e:
            logger.error(f"Claude API error: {e}")
            raise
        except Exception as e:
            logger.error(f"Claude unknown error: {e}")
            raise
    
    def _call_openai(self, system_prompt, user_prompt):
        """OpenAI API call - the old way"""
        try:
            response = self.openai.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt}
                ],
                max_tokens=1000
            )
            
            return response.choices[0].message.content
            
        except openai.RateLimitError as e:
            logger.error(f"OpenAI rate limit: {e}")
            raise
        except openai.APIError as e:
            logger.error(f"OpenAI API error: {e}")
            raise

## Test it before you use it in production
if __name__ == "__main__":
    ai = AIClient()
    
    try:
        result = ai.generate(
            "You are a helpful assistant",
            "Say hello"
        )
        print(f"Success: {result}")
    except Exception as e:
        print(f"Failed: {e}")
        # This will happen a lot during migration

Streaming (If You Need Real-Time Responses)

Streaming is where Claude gets annoying. The API is completely different. Check the streaming documentation for complete details and real-world streaming examples.

def stream_claude(self, system_prompt, user_prompt):
    """Streaming with Claude - prepare for frustration"""
    try:
        with self.claude.messages.stream(
            model="claude-sonnet-4-20250514",
            system=system_prompt,
            messages=[{"role": "user", "content": user_prompt}],
            max_tokens=1000
        ) as stream:
            for text in stream.text_stream:
                yield text  # This part actually works well
                
    except Exception as e:
        logger.error(f"Claude streaming failed: {e}")
        # Fallback to non-streaming
        response = self._call_claude(system_prompt, user_prompt)
        yield response

def stream_openai(self, system_prompt, user_prompt):
    """OpenAI streaming - the old way"""
    try:
        stream = self.openai.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            max_tokens=1000,
            stream=True
        )
        
        for chunk in stream:
            if chunk.choices[0].delta.content:
                yield chunk.choices[0].delta.content
                
    except Exception as e:
        logger.error(f"OpenAI streaming failed: {e}")
        raise

## Use it like this (if you must)
def get_streaming_response(system_prompt, user_prompt):
    ai = AIClient()
    
    if ai.use_claude:
        return ai.stream_claude(system_prompt, user_prompt)
    else:
        return ai.stream_openai(system_prompt, user_prompt)

Function Calling (This Will Break Everything)

Function calling is where Claude decided to be completely different. Budget extra time for this.

def call_with_tools_claude(self, system_prompt, user_prompt, tools):
    """Claude's tool calling - different JSON schema"""
    try:
        response = self.claude.messages.create(
            model="claude-sonnet-4-20250514",
            system=system_prompt,
            messages=[{"role": "user", "content": user_prompt}],
            tools=tools,  # Different format than OpenAI
            max_tokens=1000
        )
        
        # Check if Claude wants to use a tool
        for block in response.content:
            if block.type == "tool_use":
                # Claude found a function to call
                return {
                    "tool_name": block.name,
                    "tool_args": block.input,
                    "tool_id": block.id  # You'll need this for the response
                }
        
        # No tool use, just return text
        return {"text": response.content[0].text}
        
    except Exception as e:
        logger.error(f"Claude tool calling failed: {e}")
        raise

## Example tool definition (Claude format)
claude_tools = [
    {
        "name": "get_weather",
        "description": "Get weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name"
                }
            },
            "required": ["location"]
        }
    }
]

## Your OpenAI functions need to be converted to this format
## This is manual work, no magic converter exists

Testing Your Migration (The Hard Way)

Forget fancy test frameworks. Just test the prompts that actually broke:

## Test with your real prompts that will probably fail
def test_problem_prompts():
    ai = AIClient()
    
    # These are prompts that broke during our migration
    problem_cases = [
        ("Analyze this contract for legal issues", "claude_safety_filter"),
        ("Generate SQL with user input validation", "claude_safety_filter"),
        ("Debug this JavaScript with eval()", "claude_safety_filter"),
        ("Write Python code for data parsing", "should_work"),
        ("Summarize this technical document", "should_work"),
    ]
    
    for prompt, expected in problem_cases:
        try:
            result = ai.generate("You are a helpful assistant", prompt)
            if expected == "claude_safety_filter" and ai.use_claude:
                print(f"⚠️  {prompt[:30]}... worked (surprising!)")
            else:
                print(f"✅ {prompt[:30]}... worked")
        except Exception as e:
            if expected == "claude_safety_filter":
                print(f"❌ {prompt[:30]}... rejected (expected): {e}")
            else:
                print(f"❌ {prompt[:30]}... failed unexpectedly: {e}")

## Run this before you deploy anything
if __name__ == "__main__":
    test_problem_prompts()

Production Rollout (Don't Rush This)

Use feature flags and start small. This controller saved us from multiple disasters:

import os
import hashlib

class MigrationController:
    def __init__(self):
        # Start at 0%, increase gradually
        self.rollout_percentage = int(os.getenv("CLAUDE_ROLLOUT", "0"))
        
    def should_use_claude(self, user_id=None):
        """Consistent user assignment based on hash"""
        if not user_id:
            return False  # Don't use Claude for anonymous users initially
        
        # Hash user ID to get consistent assignment
        user_hash = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
        return (user_hash % 100) < self.rollout_percentage
    
    def get_client(self, user_id=None):
        use_claude = self.should_use_claude(user_id)
        
        # Log which API we're using (you'll need this data)
        logger.info(f"User {user_id}: {'Claude' if use_claude else 'OpenAI'}")
        
        return AIClient() if not use_claude else AIClient()  # Set USE_CLAUDE env var

## Your app integration
controller = MigrationController()

def process_user_request(user_id, prompt):
    client = controller.get_client(user_id)
    
    try:
        start_time = time.time()
        result = client.generate("You are helpful", prompt)
        duration = time.time() - start_time
        
        # Log success metrics (essential for monitoring)
        logger.info(f"Success - Duration: {duration:.2f}s, API: {'Claude' if client.use_claude else 'OpenAI'}")
        return result
        
    except Exception as e:
        # Log failures (you'll get lots during migration)
        logger.error(f"API failure - User: {user_id}, API: {'Claude' if client.use_claude else 'OpenAI'}, Error: {e}")
        
        # Emergency fallback (saved us multiple times)
        if client.use_claude:
            logger.info("Falling back to OpenAI")
            backup_client = AIClient()  # OpenAI
            return backup_client.generate("You are helpful", prompt)
        else:
            raise  # If OpenAI fails, we're screwed anyway

## Rollout schedule we actually used:
## Week 1: CLAUDE_ROLLOUT=5 (5% of users)  
## Week 2: CLAUDE_ROLLOUT=15 (15% of users)
## Week 3: CLAUDE_ROLLOUT=30 (30% of users)
## Week 4: CLAUDE_ROLLOUT=60 (60% of users)
## Week 5: CLAUDE_ROLLOUT=100 (everyone)

Start with 5% rollout. Monitor error rates. If everything breaks, set CLAUDE_ROLLOUT=0 and revert instantly. Don't be a hero - production stability matters more than migration speed.

The Problems You'll Actually Face (And How to Fix Them)

Q

Claude keeps rejecting prompts that OpenAI accepted. WTF?

A

Claude's safety filters are way more sensitive. Here's what triggers them and how to work around it:

What breaks:

  • Anything mentioning "hack", "exploit", "vulnerability"
  • Legal document analysis (Claude thinks you're asking for legal advice)
  • Code with eval() or exec() functions
  • Medical content (even basic health info)

How to fix it:

## Instead of: "Analyze this contract for potential issues"
## Try: "As a business analyst, identify key terms and clauses in this document"

## Instead of: "Find vulnerabilities in this code"  
## Try: "As a security researcher, review this code for defensive purposes"

We had to rewrite 60% of our prompts. Budget time for this.

Q

Why do I keep getting "400 Bad Request" errors with Claude?

A

The error messages are shit, but here's what's usually wrong:

1. Wrong role names (this broke us for 2 days):

## WRONG - this will fail
messages = [{"role": "human", "content": "Hello"}]

## RIGHT - Claude uses "user" not "human" 
messages = [{"role": "user", "content": "Hello"}]

2. System prompts in wrong place:

## WRONG - don't put system in messages array
messages = [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Hello"}
]

## RIGHT - separate system parameter
system = "You are helpful"
messages = [{"role": "user", "content": "Hello"}]

3. Missing required fields:
Always include max_tokens or Claude returns cryptic errors.

Q

Rate limits are different and will screw you over

A

Claude's rate limits work differently than OpenAI's. You'll hit them when you least expect it.

OpenAI: Nice tiered system based on your spending
Claude: More restrictive, especially for new accounts

import time
import random

def retry_api_call(func, max_retries=3):
    """Simple retry logic that actually works"""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as e:
            error_msg = str(e).lower()
            if "rate" in error_msg or "429" in error_msg:
                if attempt < max_retries - 1:
                    # Claude rate limits are more aggressive
                    wait_time = (2 ** attempt) + random.uniform(0, 2)
                    print(f"Rate limited, waiting {wait_time:.1f}s...")
                    time.sleep(wait_time)
                    continue
            print(f"API call failed: {e}")
            raise
    
    raise Exception("Max retries exceeded")
Q

What if I have fine-tuned OpenAI models?

A

You're fucked. Claude doesn't support fine-tuning for most users.

Your options:

  1. Beg Anthropic for fine-tuning access (good luck)
  2. Rewrite everything with few-shot prompting:
## Replace your fine-tuned model with examples in the prompt
system_prompt = """You are a customer service bot. Examples:

User: "I'm pissed about my order!"
Bot: "I understand you're frustrated. Let me help immediately. What's your order number?"

User: "Where's my package?"  
Bot: "I'll check tracking right now. Can you give me your order number?"

Follow this pattern: acknowledge, offer immediate help, ask for specifics."""
  1. Use a different service for fine-tuned models and keep Claude for everything else
Q

Do the cost savings actually pan out?

A

For us: yes, but with caveats.

Real savings from our migration:

  • Monthly bill dropped from $1,200 to ~$650
  • But we spent 3 weeks of engineering time ($15K+ in salary)
  • First month had higher costs due to retries and debugging

Hidden costs nobody mentions:

  • Claude outputs tend to be longer (more output tokens)
  • Safety filter rejections mean more retry attempts
  • You'll need fallback logic (maintaining both APIs temporarily)

Break-even point: About 3-4 months for us, depending on usage.

Q

Is Claude actually faster?

A

Depends what you mean by "faster."

Response time: About the same (1-3 seconds)
Time to correct answer: Claude is better, fewer retries needed
Streaming: Claude feels snappier but I haven't measured it

The bigger win: Claude's 200K context window eliminates chunking. We went from 4 API calls to process a large document down to 1.

Q

Why does Claude sometimes cost more per request?

A

Claude likes to write novels. Seriously.

## OpenAI response: "Yes, use PostgreSQL for your use case."
## Claude response: "Based on your requirements, PostgreSQL would be an excellent choice. Here's why: [500 words of explanation]"

You can limit this with shorter max_tokens settings, but then you might cut off useful responses.

Q

What should I actually monitor?

A

Forget the fancy metrics. Watch these:

1. Error rates (most important):

## Log every API call result
def track_api_call(provider, success, error_msg=None):
    if success:
        print(f"✅ {provider} success")
    else:
        print(f"❌ {provider} failed: {error_msg}")
    
    # Send to wherever you track metrics

2. Daily costs:
Check your billing dashboards daily. Costs can spike unexpectedly.

3. User complaints:
Users will notice if response quality drops. Monitor support tickets.

4. Response times:
Claude can be slower for complex prompts. Set up alerts if response time > 10 seconds.

Q

What's a realistic rollout timeline?

A

Based on our experience:

Week 1-2: Get basic integration working (expect lots of failures)
Week 3: Start with 5% of traffic on non-critical features
Week 4: Increase to 15% if error rates are acceptable
Week 5-6: 30% traffic, include user-facing features
Week 7-8: 60% if everything looks stable
Week 9+: Full migration (keep OpenAI as emergency backup)

## Simple rollout controller we used
def get_rollout_percentage():
    week = int(os.getenv("MIGRATION_WEEK", "0"))
    percentages = {
        0: 0, 1: 0, 2: 0,   # Weeks 1-2: Development only
        3: 5,               # Week 3: 5% 
        4: 15,              # Week 4: 15%
        5: 30, 6: 30,       # Weeks 5-6: 30%
        7: 60, 8: 60,       # Weeks 7-8: 60%
    }
    return percentages.get(week, 100)  # Default to 100% after week 8

Don't rush it. We tried to go faster and broke production twice.

Q

How to rollback when everything goes to hell

A

You need a kill switch:

## Emergency rollback - saved our ass multiple times
def emergency_rollback():
    """Immediately switch all traffic back to OpenAI"""
    os.environ["USE_CLAUDE"] = "false"
    os.environ["CLAUDE_ROLLOUT"] = "0"
    
    # Restart your app or reload config
    print("🚨 EMERGENCY ROLLBACK ACTIVATED")
    # restart_app() or whatever you need

## Check error rates and rollback if needed
error_count = 0
def check_rollback(api_call_failed):
    global error_count
    if api_call_failed:
        error_count += 1
    
    # If 10 errors in a row, rollback immediately
    if error_count >= 10:
        emergency_rollback()
Q

What happens when Claude goes down?

A

Claude has had some outages. You need automatic fallback:

def ai_call_with_fallback(prompt):
    """Try Claude first, fallback to OpenAI if it fails"""
    try:
        return call_claude(prompt)
    except Exception as e:
        if "connection" in str(e).lower() or "timeout" in str(e).lower():
            print("Claude seems down, using OpenAI")
            return call_openai(prompt)  # Emergency backup
        else:
            raise  # Other errors should bubble up

The key is having both APIs working simultaneously during migration. Don't be clever and remove OpenAI too early.

Production Deployment (The Part That Actually Matters)

This is where you make good on that promise to save money without destroying production. After 3 weeks of debugging and 2 production outages, here's the monitoring and deployment setup that actually kept our $600/month savings intact.

Here's the monitoring and production setup that kept us from getting fired when things went wrong. Based on Claude Code best practices and production monitoring patterns.

What You Actually Need to Monitor

Skip the fancy observability bullshit. Track these or you'll be debugging blind:

The 4 Things That Matter

1. Error rate by API:

import time
from collections import defaultdict

## Simple error tracking that actually works
error_counts = defaultdict(int)
total_requests = defaultdict(int)

def track_api_call(provider, success):
    total_requests[provider] += 1
    if not success:
        error_counts[provider] += 1
    
    # Log error rate every 100 requests
    if total_requests[provider] % 100 == 0:
        error_rate = error_counts[provider] / total_requests[provider] * 100
        print(f"{provider} error rate: {error_rate:.1f}% ({error_counts[provider]}/{total_requests[provider]})")
        
        # Alert if error rate > 10%
        if error_rate > 10:
            print(f"🚨 HIGH ERROR RATE: {provider} at {error_rate:.1f}%")

2. Daily costs (check this every morning):

## Dead simple cost tracking
daily_costs = {"openai": 0, "claude": 0}

def log_request_cost(provider, input_tokens, output_tokens):
    if provider == "claude":
        cost = (input_tokens * 0.000003) + (output_tokens * 0.000015)
    else:  # openai
        cost = (input_tokens * 0.00003) + (output_tokens * 0.00006)
    
    daily_costs[provider] += cost
    
    # Log every $10 spent
    if sum(daily_costs.values()) % 10 < 0.1:
        print(f"💰 Daily costs: OpenAI ${daily_costs['openai']:.2f}, Claude ${daily_costs['claude']:.2f}")

3. Response time (users notice when it's slow):
Log anything over 5 seconds because users will complain.

4. User complaints:
Monitor your support tickets. Users will tell you when Claude's responses suck compared to OpenAI.

Basic Health Check (Don't Overthink It)

Run this every few minutes to make sure both APIs work. For advanced monitoring, check out claude-code-otel for comprehensive observability or ccost for accurate usage tracking.

import time

def simple_health_check():
    """Test both APIs with a simple prompt"""
    test_prompt = "Say 'OK' if you can read this"
    
    # Test OpenAI
    try:
        openai_client = AIClient()  # use_claude=False
        openai_start = time.time()
        openai_response = openai_client.generate("You are helpful", test_prompt)
        openai_time = time.time() - openai_start
        openai_ok = "ok" in openai_response.lower() and len(openai_response) > 0
        print(f"OpenAI: {'✅' if openai_ok else '❌'} ({openai_time:.1f}s)")
    except Exception as e:
        print(f"OpenAI: ❌ Error: {e}")
        openai_ok = False
    
    # Test Claude  
    try:
        claude_client = AIClient()  # Will use USE_CLAUDE env var
        claude_start = time.time()
        claude_response = claude_client.generate("You are helpful", test_prompt)
        claude_time = time.time() - claude_start
        claude_ok = "ok" in claude_response.lower() and len(claude_response) > 0
        print(f"Claude: {'✅' if claude_ok else '❌'} ({claude_time:.1f}s)")
    except Exception as e:
        print(f"Claude: ❌ Error: {e}")
        claude_ok = False
    
    return openai_ok, claude_ok

## Run this every 5 minutes
if __name__ == "__main__":
    import schedule
    
    schedule.every(5).minutes.do(simple_health_check)
    
    while True:
        schedule.run_pending()
        time.sleep(60)

Performance Tips That Actually Help

Don't Batch Requests (It's Not Worth It)

Both OpenAI and Claude charge per token, not per request. Batching saves you nothing and adds complexity. Just make requests as needed.

Cache Responses (This Actually Saves Money)

If users ask similar questions, cache the responses. Claude also supports prompt caching to reduce costs on repeated context. Check the usage monitoring guide for real-time cost tracking.

import hashlib
import json
from datetime import datetime, timedelta

## Simple in-memory cache (use Redis for production)
response_cache = {}

def get_cache_key(system_prompt, user_prompt):
    """Create hash of prompts for cache key"""
    combined = f"{system_prompt}|{user_prompt}"
    return hashlib.md5(combined.encode()).hexdigest()

def get_cached_response(system_prompt, user_prompt):
    """Check if we've seen this prompt before"""
    cache_key = get_cache_key(system_prompt, user_prompt)
    
    if cache_key in response_cache:
        cached = response_cache[cache_key]
        # Cache for 1 hour
        if datetime.now() - cached['timestamp'] < timedelta(hours=1):
            print(f"Cache hit for {cache_key[:8]}...")
            return cached['response']
        else:
            del response_cache[cache_key]  # Expired
    
    return None

def cache_response(system_prompt, user_prompt, response):
    """Cache successful response"""
    cache_key = get_cache_key(system_prompt, user_prompt)
    response_cache[cache_key] = {
        'response': response,
        'timestamp': datetime.now()
    }
    print(f"Cached response {cache_key[:8]}...")

## Use it in your AI client
def generate_with_cache(system_prompt, user_prompt):
    # Try cache first
    cached = get_cached_response(system_prompt, user_prompt)
    if cached:
        return cached
    
    # Generate new response
    ai = AIClient()
    response = ai.generate(system_prompt, user_prompt)
    
    # Cache it
    cache_response(system_prompt, user_prompt, response)
    return response

Handling Failures (Keep It Simple)

When Claude breaks, fall back to OpenAI:

def ai_generate_with_fallback(system_prompt, user_prompt, max_retries=2):
    """Try Claude first, fallback to OpenAI if it fails"""
    
    # Try Claude first
    for attempt in range(max_retries):
        try:
            claude_client = AIClient()  # with USE_CLAUDE=true
            return claude_client.generate(system_prompt, user_prompt)
        except Exception as e:
            print(f"Claude attempt {attempt+1} failed: {e}")
            if attempt < max_retries - 1:
                time.sleep(1)  # Brief delay before retry
                continue
            break
    
    # Claude failed, try OpenAI
    print("Claude failed completely, falling back to OpenAI")
    try:
        openai_client = AIClient()  # with USE_CLAUDE=false
        return openai_client.generate(system_prompt, user_prompt)
    except Exception as e:
        print(f"OpenAI also failed: {e}")
        raise Exception("Both AI providers are down")

## Emergency kill switch
def emergency_disable_claude():
    """Instantly switch all traffic to OpenAI"""
    import os
    os.environ["USE_CLAUDE"] = "false"
    print("🚨 Emergency: All traffic switched to OpenAI")
    # You'll need to restart your app or reload config

The key is keeping both APIs working during migration. Don't get fancy with circuit breakers and complex failover logic - simple retry and fallback works fine.

Related Tools & Recommendations

pricing
Similar content

OpenAI vs Claude vs Gemini: Enterprise AI API Cost Analysis

Uncover the true enterprise costs of OpenAI API, Anthropic Claude, and Google Gemini. Learn procurement realities, hidden fees, and how to budget for AI APIs ef

OpenAI API
/pricing/openai-api-vs-anthropic-claude-vs-google-gemini/enterprise-procurement-guide
100%
pricing
Similar content

AI API Pricing Reality Check: Claude, OpenAI, Gemini Costs

No bullshit breakdown of Claude, OpenAI, and Gemini API costs from someone who's been burned by surprise bills

Claude
/pricing/claude-vs-openai-vs-gemini-api/api-pricing-comparison
90%
news
Similar content

Anthropic Claude Data Policy Changes: Opt-Out by Sept 28 Deadline

September 28 Deadline to Stop Claude From Reading Your Shit - August 28, 2025

NVIDIA AI Chips
/news/2025-08-28/anthropic-claude-data-policy-changes
86%
news
Similar content

Anthropic Claude Data Deadline: Share or Keep Private by Sept 28

Anthropic Just Gave Every User 20 Days to Choose: Share Your Data or Get Auto-Opted Out

Microsoft Copilot
/news/2025-09-08/anthropic-claude-data-deadline
86%
pricing
Similar content

Enterprise AI API Costs: Claude, OpenAI, Gemini TCO Analysis

Our AI bill went from around $500 to over $12K in one month. Here's everything I learned about enterprise AI pricing so your finance team doesn't murder you.

Claude
/pricing/enterprise-ai-apis-2025-claude-openai-gemini-tco-analysis/enterprise-tco-analysis
71%
integration
Recommended

Claude + LangChain + Pinecone RAG: What Actually Works in Production

The only RAG stack I haven't had to tear down and rebuild after 6 months

Claude
/integration/claude-langchain-pinecone-rag/production-rag-architecture
64%
tool
Recommended

LangChain - Python Library for Building AI Apps

integrates with LangChain

LangChain
/tool/langchain/overview
64%
tool
Recommended

LangChain Production Deployment - What Actually Breaks

integrates with LangChain

LangChain
/tool/langchain/production-deployment-guide
64%
tool
Recommended

Azure OpenAI Enterprise Deployment - Don't Let Security Theater Kill Your Project

So you built a chatbot over the weekend and now everyone wants it in prod? Time to learn why "just use the API key" doesn't fly when Janet from compliance gets

Microsoft Azure OpenAI Service
/tool/azure-openai-service/enterprise-deployment-guide
61%
tool
Recommended

Azure OpenAI Service - OpenAI Models Wrapped in Microsoft Bureaucracy

You need GPT-4 but your company requires SOC 2 compliance. Welcome to Azure OpenAI hell.

Azure OpenAI Service
/tool/azure-openai-service/overview
61%
tool
Recommended

Azure OpenAI Service - Production Troubleshooting Guide

When Azure OpenAI breaks in production (and it will), here's how to unfuck it.

Azure OpenAI Service
/tool/azure-openai-service/production-troubleshooting
61%
tool
Recommended

Google Vertex AI - Google's Answer to AWS SageMaker

Google's ML platform that combines their scattered AI services into one place. Expect higher bills than advertised but decent Gemini model access if you're alre

Google Vertex AI
/tool/google-vertex-ai/overview
59%
alternatives
Similar content

Claude API Alternatives: Affordable LLMs & Migration Guide

Real alternatives from developers who've actually made the switch in production

Claude API
/alternatives/claude-api/developer-focused-alternatives
59%
howto
Similar content

REST to GraphQL Migration Guide: Real-World Survival Tips

I've done this migration three times now and screwed it up twice. This guide comes from 18 months of production GraphQL migrations - including the failures nobo

/howto/migrate-rest-api-to-graphql/complete-migration-guide
58%
review
Similar content

Enterprise AI Platforms: Real-world Comparison & Alternatives

Real-world experience with AWS Bedrock, Azure OpenAI, Google Vertex AI, and Claude API after way too much time debugging this stuff

OpenAI API Enterprise
/review/openai-api-alternatives-enterprise-comparison/enterprise-evaluation
58%
tool
Similar content

OpenAI Platform API Guide: Setup, Authentication & Costs

Call GPT from your code, watch your bills explode

OpenAI Platform API
/tool/openai-platform-api/overview
58%
news
Similar content

OpenAI & Anthropic Reveal Critical AI Safety Testing Flaws

Two AI Companies Admit Their Safety Systems Suck

OpenAI ChatGPT/GPT Models
/news/2025-08-31/ai-safety-testing-concerns
56%
review
Similar content

OpenAI API Enterprise Review: Costs, Value & Implementation Truths

Skip the sales pitch. Here's what this thing really costs and when it'll break your budget.

OpenAI API Enterprise
/review/openai-api-enterprise/enterprise-evaluation-review
52%
tool
Recommended

Replicate - Skip the Docker Nightmares and CUDA Driver Battles

alternative to Replicate

Replicate
/tool/replicate/overview
51%
tool
Similar content

OpenAI Platform Team Management: Secure API Keys & Budget Control

How to manage your team's AI budget without going bankrupt or letting devs accidentally nuke production

OpenAI Platform
/tool/openai-platform/project-organization-management
51%

Recommendations combine user behavior, content similarity, research intelligence, and SEO optimization