Why Claude API Doesn't Suck (Unlike Some Others)

I've shipped this stuff to prod, here's what actually works

Look, I've been burned by AI APIs before. The Claude API is different - it actually does what the docs say it does. No mysterious "the model is currently overloaded" errors during your product launch, no weird hallucinations about API endpoints that don't exist.

The Three Models: What They're Actually Good For

Opus 4.1: The Expensive Genius That'll Bankrupt You

Opus 4.1 is fucking brilliant but will absolutely destroy your API budget. The 200K token context window handles entire codebases without choking. I learned this the hard way when it solved a distributed systems problem that had stumped our entire team for weeks, then I got the bill. Check the model comparison for detailed specifications and performance benchmarks before you accidentally spend your quarterly budget in a weekend.

Actually useful for:

  • Code architecture - It'll design your entire system and actually make sense
  • Complex debugging - Finds the bug you've been staring at all afternoon
  • Document analysis - Reads contracts better than most lawyers (and cheaper)
  • Research synthesis - This thing connects dots you missed across 50 papers

Reality check: Check current pricing - Opus 4.1 costs like $75 per million output tokens. A complex debugging session can hit $50+ easily. Worth every penny when you're two hours away from missing a deadline, financial suicide for routine tasks.

Sonnet 4: The Workhorse That Won't Get You Fired

Sonnet 4 is your bread and butter. Fast enough for real-time chat, smart enough for complex code review, and priced so you won't have to explain a massive API bill to your boss. This is what you use for 90% of everything unless you really need Opus-level reasoning or you're processing mountains of simple shit with Haiku.

What it's actually good at:

  • Customer support - Handles weird user questions without making stuff up
  • Code reviews - Catches bugs and suggests improvements without being pedantic
  • Content editing - Rewrites your docs to not sound like you wrote them hungover
  • API integrations - The tool calling actually works reliably, unlike some other APIs that just randomly fail

Real cost: Check current pricing - Sonnet 4 runs about $15 per million output tokens. A typical customer support conversation costs maybe $0.03-0.05. Less than your overpriced startup coffee, way more useful than your morning standup.

Haiku 3.5: Fast but Dumb as a Brick

Haiku 3.5 is lightning fast and costs basically nothing. Perfect for brain-dead simple tasks, absolutely useless the moment someone asks anything that requires two brain cells to rub together.

Use it for:

  • Simple chat responses - "How do I reset my password?" not "Explain quantum computing"
  • Content generation - Blog post outlines, not the actual posts
  • Data extraction - Pull info from structured data, don't ask for analysis
  • High-volume processing - When you need to process 100k documents and speed is more important than accuracy

Economics: Stupidly cheap at like $4 per million output tokens, but you absolutely get what you pay for. Learned this during a product demo when Haiku confidently told a potential customer that our SaaS was "available on Mars" because it misunderstood our global availability messaging.

What Actually Works (And What Doesn't)

Context Windows That Don't Lie

The 200K token context window isn't marketing bullshit - it actually works across all models. I've stuffed entire repos into it, and Claude remembers variables defined way back in the conversation.

Real use cases that work:

  • Codebase analysis - Drop your entire backend and ask it to find the bug
  • Long conversations - Customer support sessions that go 20+ messages without losing track
  • Document synthesis - I've fed it a shitload of PDFs and actually got a coherent summary back

Gotcha: The bigger the context, the slower the response. Budget a few seconds for huge requests. Found this out the hard way when I stuffed our entire API documentation (like 150K tokens) into a single request and it took 12 seconds to respond. Users thought the app crashed.
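
If you want to know how big a request is before you send it, the Python SDK exposes a token-counting call. A minimal sketch, assuming a recent SDK version that includes messages.count_tokens (docs_dump.md is a stand-in for whatever you're stuffing in):

import os
import anthropic

client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

## docs_dump.md is a placeholder for your big blob of context
with open("docs_dump.md") as f:
    big_context = f.read()

## Count tokens before sending so you know what you're paying for
## and roughly how long you'll be waiting
count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": f"Find the bug:\n\n{big_context}"}]
)
print(f"About to send {count.input_tokens} input tokens")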

Vision That Doesn't Hallucinate Unicorns

Claude's vision actually reads what's in images instead of making shit up:

  • Screenshots - Describes your UI bugs better than your QA team
  • Charts/graphs - Extracts data without inventing numbers
  • Handwritten notes - Reads your terrible handwriting better than you do
  • Technical diagrams - Actually understands system architecture drawings, which is honestly impressive

Reality check: Works great on clear images, struggles with low-res photos or weird angles. Test with your actual data before building your entire pipeline around it.
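
Sending an image is just another content block alongside your text prompt. A rough sketch (screenshot.png is a placeholder; the base64 block format follows the vision docs):

import base64
import anthropic

client = anthropic.Anthropic()

## screenshot.png is a placeholder for whatever you're analyzing
with open("screenshot.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1000,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": image_data}},
            {"type": "text", "text": "Describe the UI bug visible in this screenshot."}
        ]
    }]
)
print(message.content[0].text)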

Tool Calling That Mostly Works

Tool calling connects Claude to your APIs and databases. When it works, it's magical. When it doesn't, the error messages are useless.

What works well:

  • Database queries - Give it a schema and it writes decent SQL
  • API calls - Follows OpenAPI specs without weird hallucinations
  • File processing - Handles up to 500MB files
  • Python execution - The built-in sandbox actually runs code and shows you the results

Pain points: JSON schema validation errors are cryptic as hell. Plan to spend a day debugging function signatures.

Security Stuff That Won't Get You Fired

Your security team will love this - Anthropic actually has all the compliance checkboxes filled out:

They've Got The Certifications

  • SOC 2 Type II - Check
  • HIPAA compliance - Check (if you're dealing with health data)
  • GDPR compliance - Check (with EU data centers)
  • Zero data retention - They don't train on your conversations

Access Control That Makes Sense

  • API keys with actual permissions (not just one key to rule them all)
  • SSO integration if your company is into that SAML/OAuth stuff
  • Usage monitoring so you can see who's burning through your budget
  • Audit trails for when compliance asks what the hell happened

Enterprise Features (If You Pay Enough)

  • Custom rate limits - Because 4k RPM isn't enough for some people
  • Priority support - Actual humans who know what they're talking about
  • Invoice billing - Monthly invoices instead of credit card charges
  • Data residency - Keep your EU data in the EU

Companies Actually Using This in Production

Cursor - Code Editor That Doesn't Suck

Cursor built their AI code editor on Claude and it actually works. Their users write code faster without the AI suggesting completely broken functions. The Cursor team has demonstrated Claude's effectiveness in real-world coding scenarios.

What they learned:

  • Claude understands context across large codebases
  • Tool calling integrates well with git operations
  • Streaming responses feel natural for code generation
  • Cost is manageable even for heavy daily usage

Intercom - Customer Support Without the Rage

Intercom uses Claude to handle customer support without making customers want to throw their laptops out the window.

Production lessons:

  • Haiku handles 80% of simple questions fine
  • Sonnet for complex issues requiring actual thinking
  • Tool integration with their knowledge base actually works
  • Response quality is consistent across languages

StubHub - Data Analysis That Makes Sense

StubHub processes massive amounts of event data and Claude helps them make sense of it without hiring 20 more analysts.

Real impact:

  • Market analysis that finds actual trends, not random correlations
  • Fraud detection that catches edge cases humans miss
  • Automated reports that executives actually read
  • Cost per analysis dropped 70% vs hiring consultants

Getting Your Shit Together: From Idea to Prod

Start Here (Don't Skip Steps)

  1. Console - Test your prompts, don't go straight to code
  2. Workbench - Tune your prompts until they work reliably
  3. SDKs - Use the official Python/TypeScript libraries
  4. Production - Add monitoring, error handling, and cost controls

Follow the quick start guide for proper setup and check the best practices documentation. The API reference has all the technical details you'll need for implementation.

How to Not Go Bankrupt

  • Route intelligently - Haiku for simple stuff, Sonnet for most things, Opus when you're desperate
  • Prompt caching - Up to 90% savings if you reuse context (which you will) - see the sketch after this list
  • Batch processing - 50% off if you can wait a few hours
  • Monitor your spend - Set daily limits before your boss finds out
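
A rough sketch of prompt caching with the Python SDK: mark the big reusable prefix with cache_control and repeat requests with the same prefix get billed at the cheaper cached-read rate. support_playbook.md is a placeholder, and your SDK version's exact requirements are in the prompt caching docs:

import anthropic

client = anthropic.Anthropic()

## support_playbook.md stands in for the big context you resend on every request
big_system_prompt = open("support_playbook.md").read()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1000,
    # cache_control marks this block as cacheable; identical prefixes on
    # later requests read from the cache instead of being billed in full
    system=[{
        "type": "text",
        "text": big_system_prompt,
        "cache_control": {"type": "ephemeral"}
    }],
    messages=[{"role": "user", "content": "Customer asks: how do I reset my password?"}]
)
## usage reports cache_creation_input_tokens / cache_read_input_tokens so you can verify it's working
print(message.usage)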

Bottom line: Claude API is the first AI API I've used that doesn't randomly break in production. The models do what they say they'll do, the pricing is transparent, and when something goes wrong, their support actually helps instead of sending you to a forum.

Claude Model Reality Check - What They Actually Cost and Do

Model     | Smart Level        | Input Cost | Output Cost | Context     | Max Output | Reality Check
Opus 4.1  | Terrifyingly smart | $$$$       | $$$$$       | 200K tokens | 32K tokens | Use when desperate, budget never
Sonnet 4  | Smart enough       | $$         | $$$         | 200K tokens | 8K tokens  | Your daily driver for everything
Haiku 3.5 | Fast & cheap       | $          | $           | 200K tokens | 4K tokens  | Simple tasks only, breaks on complex stuff

Actually Implementing Claude API: The Stuff That Breaks

Your First Call (That'll Probably Fail)

Don't dive straight into code. Seriously. Start with the Console and test your prompts first. Trust me, your first API call will fail for some stupid reason. Read through the authentication guide and API basics before diving in. The troubleshooting guide covers the most common mistakes.

## This'll work (copy-paste ready)
## Set ANTHROPIC_API_BASE=https://api.anthropic.com (see https://docs.anthropic.com/en/api/messages for details)
curl -X POST "$ANTHROPIC_API_BASE/v1/messages" \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "content-type: application/json" \
  --data '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Test message"}
    ]
  }'

Common failures that'll waste your morning:

  • Forgot to set $ANTHROPIC_API_KEY - you'll get a cryptic 401 with no details
  • Used wrong model name - check the model docs for current model names
  • Forgot anthropic-version header - API throws {"type":"error","error":{"type":"invalid_request_error"}}
  • Set max_tokens too low - response gets cut off mid-sentence and you'll think the API is broken
  • Wrong content-type - forgot application/json and spent 2 hours debugging 400 errors

SDKs That Actually Work

Python SDK (The Solid One)

The Python SDK is bulletproof. Use it. Don't write your own HTTP client. Follow the Python quick start and check the SDK documentation for complete examples.

import os
import anthropic

## Never hardcode API keys. Ever.
client = anthropic.Anthropic(
    api_key=os.getenv("ANTHROPIC_API_KEY")
)

try:
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1000,
        messages=[{"role": "user", "content": "Test"}]
    )
    print(message.content[0].text)
except anthropic.RateLimitError:
    print("Hit rate limit, implement exponential backoff")
except anthropic.APIError as e:
    print(f"API error: {e}")

Gotchas learned the hard way:

  • message.content is a list, not a string - spent 2 hours debugging TypeError: 'list' object has no attribute 'strip' like an idiot
  • Rate limit errors will happen - especially during lunch rush when everyone's testing their shit. Handle them or your app dies with 429s
  • The temperature parameter barely does anything - Claude 4 is way more consistent than the old GPT chaos
  • Opus 4.1 throws cryptic validation errors if you set both temperature and top_p - pick one, asshole
  • Streaming responses randomly timeout mid-sentence if your nginx config is fucked - took me a weekend to figure out it was our proxy settings

TypeScript SDK (Has Weird Async Quirks)

The TypeScript SDK works but has some async weirdness. Check the TypeScript examples and Node.js integration guide for proper setup:

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

async function callClaude(prompt: string) {
  try {
    const message = await anthropic.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 1000,
      messages: [{ role: 'user', content: prompt }]
    });

    // Content is an array, just like Python SDK
    return message.content[0].text;
  } catch (error) {
    if (error instanceof Anthropic.RateLimitError) {
      // Handle rate limits properly
      throw new Error('Rate limited, try again later');
    }
    throw error;
  }
}

TypeScript pain points:

  • Async/await chains get messy with error handling
  • Type definitions are good but not perfect
  • Browser usage needs CORS proxy (obviously)

Production Features That'll Save Your Ass

Streaming (Use It If Users Care About Typing Indicators)

Streaming gives users that satisfying "AI is thinking" feeling:

import anthropic

client = anthropic.Anthropic()

## Streaming adds complexity but users love it
with client.messages.stream(
    model=\"claude-sonnet-4-20250514\",
    max_tokens=1000,
    messages=[{\"role\": \"user\", \"content\": \"Explain something complex\"}]
) as stream:
    for text in stream.text_stream:
        print(text, end=\"\", flush=True)
        # In a real app, you'd send this to your websocket/SSE

Streaming reality:

  • Adds complexity to your error handling (stream can die mid-response - see the sketch after this list)
  • Users love the real-time feedback, especially for long responses
  • Don't use it for batch processing - just adds overhead
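
A sketch of handling a stream that dies mid-response: accumulate what arrived so you can show users a partial answer instead of a blank screen. The exact exception depends on where the connection drops, so treat the except clause as a starting point:

import anthropic

client = anthropic.Anthropic()

chunks = []
try:
    with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1000,
        messages=[{"role": "user", "content": "Explain something complex"}]
    ) as stream:
        for text in stream.text_stream:
            chunks.append(text)
except anthropic.APIError as e:
    # Stream died partway through - keep whatever we got
    print(f"Stream died after {len(''.join(chunks))} characters: {e}")

partial_or_full = "".join(chunks)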

Error Handling (You'll Hit Rate Limits)

You WILL hit rate limits in production. Plan for it:

import anthropic
import time
import random
from anthropic import RateLimitError, APIError

def call_claude_with_retry(prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1000,
                messages=[{"role": "user", "content": prompt}]
            )
            return response.content[0].text

        except RateLimitError:
            if attempt == max_retries - 1:
                raise Exception("Still rate limited after retries")

            # Exponential backoff with jitter (learned this the hard way)
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited, waiting {wait_time:.1f}s...")
            time.sleep(wait_time)

        except APIError as e:
            # These are usually your fault (bad request, invalid model, etc)
            print(f"API error: {e}")
            raise

Error handling reality:

  • Rate limits hit harder during peak hours (learned this during our product launch)
  • The streaming API can fail silently - monitor for incomplete responses
  • Network timeouts happen - set reasonable timeouts on your HTTP client (sketch below)
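
The SDK client takes timeout and retry settings in its constructor, so you don't have to wrap every call yourself. A minimal sketch - defaults and exact behavior vary by SDK version, so check its docs:

import anthropic

client = anthropic.Anthropic(
    timeout=30.0,    # seconds before a single request is abandoned
    max_retries=3    # automatic retries on connection errors and retryable status codes
)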

Context Management (Or How to Not Go Bankrupt)

Your biggest cost driver will be context bloat. Monitor it:

def keep_context_sane(conversation_history, max_tokens=100000):
    \"\"\"Prevent context from eating your budget\"\"\"

    # Keep system message + recent stuff
    system_msg = conversation_history[0] if conversation_history[0][\"role\"] == \"system\" else None
    recent_messages = conversation_history[-8:]  # Last 8 exchanges usually enough

    # Rough token estimation (good enough for cost control)
    total_chars = sum(len(msg[\"content\"]) for msg in recent_messages)
    estimated_tokens = total_chars // 4  # ~4 chars per token

    if estimated_tokens > max_tokens:
        # Nuclear option: just keep the system message and last 2 exchanges
        return ([system_msg] if system_msg else []) + conversation_history[-2:]

    return ([system_msg] if system_msg else []) + recent_messages

Context management lessons:

  • Users will paste entire documents. Truncate aggressively.
  • The 200K context window is real, but responses get slower as you fill it
  • Monitor your token usage - costs spiral faster than you think (sketch below)
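
Every response carries a usage block with the actual token counts, which is the cheapest way to watch costs. A small sketch - the rates are placeholders, so pull current numbers from the pricing page:

def log_usage(response, rate_in=3.00, rate_out=15.00):
    # Rates are per million tokens; placeholders for Sonnet-class pricing
    usage = response.usage
    cost = (usage.input_tokens * rate_in + usage.output_tokens * rate_out) / 1_000_000
    print(f"in={usage.input_tokens} out={usage.output_tokens} ~${cost:.4f}")
    return cost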

Advanced Stuff (When You Need It)

Tool Calling (JSON Schema Hell)

Tool calling works but the error messages suck. Read the tool use guide and check the function calling examples in the cookbook for working patterns:

## Simple example that actually works
def get_user_data(user_id: str) -> str:
    \"\"\"Get user data from database\"\"\"
    return f\"User {user_id}: Premium subscriber since 2023\"

tools = [
    {
        \"name\": \"get_user_data\",
        \"description\": \"Get user information from database\",
        \"input_schema\": {
            \"type\": \"object\",
            \"properties\": {
                \"user_id\": {\"type\": \"string\"}
            },
            \"required\": [\"user_id\"]
        }
    }
]

response = client.messages.create(
    model=\"claude-sonnet-4-20250514\",
    max_tokens=1000,
    tools=tools,
    messages=[{\"role\": \"user\", \"content\": \"What can you tell me about user123?\"}]
)

## Handle tool calls (this part is annoying)
if response.content[0].type == \"tool_use\":
    tool_call = response.content[0]
    result = get_user_data(tool_call.input[\"user_id\"])
    # Now send another request with the result...

Tool calling pain points:

  • JSON schema validation errors are cryptic as hell
  • You need to handle the back-and-forth conversation manually (see the sketch after this list)
  • Complex schemas break in weird ways
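
The second leg of that round trip looks roughly like this: run the tool yourself, then send the result back as a tool_result block referencing the tool call's id. A sketch that builds on the example above:

## Continues the example above: execute the tool, then send the result back
tool_call = next(block for block in response.content if block.type == "tool_use")
result = get_user_data(tool_call.input["user_id"])

follow_up = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1000,
    tools=tools,
    messages=[
        {"role": "user", "content": "What can you tell me about user123?"},
        {"role": "assistant", "content": response.content},
        {"role": "user", "content": [{
            "type": "tool_result",
            "tool_use_id": tool_call.id,
            "content": result
        }]}
    ]
)
print(follow_up.content[0].text)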

File Processing (Upload Timeouts Suck)

The Files API handles big documents but watch the upload timeouts:

## Files can be huge but uploads can timeout
try:
    with open("large_document.pdf", "rb") as f:
        file_response = client.files.create(
            file=f,
            purpose="user_data"
        )

    # Now analyze it
    analysis = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What are the key points in this document?"},
                {"type": "file", "source": {"type": "file", "media_type": "application/pdf", "file_id": file_response.id}}
            ]
        }]
    )
except Exception as e:
    print(f"File upload failed: {e}")
    # Have a fallback for large files

File API gotchas:

  • 500MB limit sounds big but PDFs add up fast
  • Upload timeouts are real - implement retries (sketch below)
  • Some file formats work better than others (PDFs > Word docs)
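
A sketch of retrying uploads, mirroring the files.create call from the snippet above - adjust attempts and backoff for your file sizes, and note the Files API surface can differ by SDK version:

import time

def upload_with_retries(path, attempts=3):
    # Retry the upload a few times with simple backoff before giving up
    for attempt in range(attempts):
        try:
            with open(path, "rb") as f:
                return client.files.create(file=f, purpose="user_data")
        except Exception as e:
            if attempt == attempts - 1:
                raise
            wait = 2 ** attempt
            print(f"Upload failed ({e}), retrying in {wait}s...")
            time.sleep(wait)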

Production Monitoring (Set It Up Before You Go Live)

Cost Tracking (Your Boss Will Ask)

Monitor spending BEFORE it gets out of hand:

class SimpleCostTracker:
    def __init__(self, daily_budget=500):
        self.daily_budget = daily_budget
        self.daily_spend = 0
        # Model costs per million tokens (check current pricing docs!)
        self.rates = {
            "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
            "claude-3-5-haiku-20241022": {"input": 0.80, "output": 4.00},
            "claude-opus-4-1-20250805": {"input": 15.00, "output": 75.00}
        }

    def estimate_cost(self, model, input_tokens, output_tokens):
        rate = self.rates.get(model, self.rates["claude-sonnet-4-20250514"])
        cost = (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000
        return cost

    def check_budget(self, estimated_cost):
        if self.daily_spend + estimated_cost > self.daily_budget:
            raise Exception(f"Daily budget hit: ${self.daily_spend:.2f} + ${estimated_cost:.2f} > ${self.daily_budget}")
        return True

Monitoring That Actually Helps

Set up real monitoring immediately:

import logging
import time

## Log everything for debugging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def monitored_claude_call(prompt, model="claude-sonnet-4-20250514"):
    start_time = time.time()
    try:
        response = client.messages.create(
            model=model,
            max_tokens=1000,
            messages=[{"role": "user", "content": prompt}]
        )

        duration = time.time() - start_time
        logger.info(f"Claude call successful: {duration:.2f}s, model={model}")
        return response

    except Exception as e:
        duration = time.time() - start_time
        logger.error(f"Claude call failed: {duration:.2f}s, error={e}")
        raise

Bottom line: Claude API is solid but you need proper error handling, cost monitoring, and rate limit management. The docs are good but missing the production gotchas. Plan for rate limits, monitor your spend, and test everything with real user data before going live.

Claude API FAQ - Questions Engineers Actually Ask

Q: Why doesn't my API call work?

A: First things you probably screwed up:

  • Missing x-api-key header (you'll get a 401)
  • Wrong model name - they change regularly, check the docs
  • Forgot anthropic-version header (API complains about this)
  • Set max_tokens too low and responses get cut off
  • API key doesn't have permissions (if you're using organization-level keys)

Quick test:

## Set ANTHROPIC_API_BASE=https://api.anthropic.com - docs at https://docs.anthropic.com/en/api/messages
curl -X POST "$ANTHROPIC_API_BASE/v1/messages" \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "content-type: application/json" \
  --data '{"model": "claude-sonnet-4-20250514", "max_tokens": 100, "messages": [{"role": "user", "content": "test"}]}'

Q: Why is my bill higher than expected?

A: Your costs probably spiraled because:

  • Context bloat - Users paste entire documents and you don't truncate
  • Using Opus for everything - The output costs will murder your budget fast
  • No prompt caching - You're re-sending the same context repeatedly
  • Inefficient prompts - Long prompts = higher input costs

Check current pricing - rates change, but the rough tiers as of this writing:

  • Haiku 3.5: cheapest tier (around $0.80/$4 per million input/output tokens)
  • Sonnet 4: mid tier (around $3/$15 per million input/output tokens)
  • Opus 4.1: premium tier (around $15/$75 per million input/output tokens)

Cost control:

  • Use prompt caching for repeated context (saves money)
  • Batch processing for discounts on non-urgent stuff (sketch below)
  • Route simple tasks to Haiku, complex stuff to Sonnet, only use Opus when desperate
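
A rough sketch of the Message Batches API for that 50%-off, non-urgent work: requests go in with your own custom_id and results come back asynchronously. Treat the exact request shape as illustrative and check the batches docs:

import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"doc-{i}",  # your own ID for matching results later
            "params": {
                "model": "claude-3-5-haiku-20241022",
                "max_tokens": 200,
                "messages": [{"role": "user", "content": f"Summarize document {i}"}]
            }
        }
        for i in range(3)
    ]
)
## Poll batch status later; results arrive asynchronously, usually within hours
print(batch.id, batch.processing_status)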

Reality check: A typical chat message costs like $0.02-0.05 with Sonnet. If you're paying way more, something's fucked.

Q: Which model should I actually use?

A: Start with Sonnet 4 for everything. Seriously. Don't overthink it.

Once you're running in production:

  • Haiku 3.5 for brain-dead simple stuff ("reset my password", basic data extraction)
  • Sonnet 4 for everything else (seriously, like 90% of your use cases)
  • Opus 4.1 when Sonnet shits the bed and you're desperate (complex analysis, architectural decisions)

Real decision tree:

  1. Does it need to be fast and cheap? → Haiku (but test it thoroughly)
  2. Is it complex reasoning that Sonnet struggles with? → Opus (check your budget first)
  3. Everything else → Sonnet

Don't fall into the trap of optimizing model selection before you have real usage data. Get Sonnet 4 working first, then optimize costs later when you actually understand your traffic patterns.

Q: Why do I keep hitting rate limits?

A: Rate limits are real and you'll hit them:

  • Standard tier: 1K requests/minute
  • Higher tiers: Up to 4K requests/minute (if you spend enough)
  • Enterprise: Whatever you negotiate

You're probably hitting limits because:

  • Peak usage spikes (lunch time, Monday mornings)
  • No request queuing/backoff logic
  • Multiple instances hitting the same rate limit
  • You're using the API for batch processing (use the batch API instead, idiot)

How to handle it:

import time
import random

def exponential_backoff(attempt):
    wait_time = (2 ** attempt) + random.uniform(0, 1)
    time.sleep(wait_time)

Pro tip: Most apps never hit 1K RPM in normal usage. If you're hitting limits regularly, you're probably doing something inefficient.

Q: How do I not leak my API key?

A: Basic security (don't be an idiot):

  • Never hardcode API keys in your code
  • Use environment variables: os.getenv("ANTHROPIC_API_KEY")
  • Add .env to your .gitignore
  • Rotate keys regularly (monthly is good)

For production:

  • Use your cloud provider's secret management (AWS Secrets Manager, etc.)
  • Set up API key rotation
  • Monitor usage for weird spikes (could indicate compromise)

Enterprise stuff your security team wants:

  • SOC 2, HIPAA, GDPR compliance (Anthropic has it)
  • SSO integration (SAML/OAuth)
  • Audit logs of all API calls
  • Role-based access controls

## Don't do this
client = Anthropic(api_key="sk-ant-api03-...")

## Do this
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

Q: How do I debug tool calling when it breaks?

A: Tool calling error messages are terrible. Here's what usually goes wrong:

Common issues:

  • JSON schema doesn't match what Claude sends
  • Function description is unclear
  • Missing required parameters
  • Wrong parameter types

Debugging tips:

## Add verbose logging to see what Claude is actually sending
import json
import logging

logging.basicConfig(level=logging.DEBUG)

def my_function(user_id: str):
    logging.debug(f"Function called with: {user_id}")
    return f"User data for {user_id}"

## Check the tool call details
if response.content[0].type == "tool_use":
    tool_call = response.content[0]
    logging.debug(f"Tool: {tool_call.name}, Input: {tool_call.input}")

Pro tip: Start with simple functions and gradually add complexity. Claude is picky about schema definitions.

Q: What languages/frameworks work with Claude?

A: Use the official SDKs if possible - the Python and TypeScript SDKs covered above are the ones Anthropic maintains.

Everything else:

  • It's just HTTP REST calls, any language with an HTTP client works
  • Community SDKs exist for Go, Rust, Ruby, PHP
  • Quality varies, check GitHub activity before using

Frameworks that integrate well:

  • LangChain - native support, lots of examples
  • Streamlit - easy UI prototypes
  • FastAPI - production APIs

Skip the frameworks initially - get the raw API working first, then add complexity.

Q: How do I monitor costs before my boss freaks out?

A: Set up monitoring immediately:

  • Console dashboard shows real-time usage
  • Set up spending alerts (do this first)
  • Check costs daily until you understand usage patterns

Simple cost tracking:

class CostMonitor:
    def __init__(self):
        self.daily_budget = 100  # Set your actual limit
        self.current_spend = 0

    def track_call(self, model, input_tokens, output_tokens):
        # WARNING: Update these rates from the pricing docs!
        rates = {
            "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
            "claude-3-5-haiku-20241022": {"input": 0.80, "output": 4.00},
            "claude-opus-4-1-20250805": {"input": 15.00, "output": 75.00}
        }
        cost = (input_tokens * rates[model]["input"] +
                output_tokens * rates[model]["output"]) / 1_000_000
        self.current_spend += cost

        if self.current_spend > self.daily_budget:
            raise Exception(f"Daily budget exceeded: ${self.current_spend:.2f}")

Pro tip: Set conservative daily limits initially. You can always increase them, but overages are hard to explain.

Q: Can I use this commercially without getting sued?

A: Yes, it's designed for commercial use:

  • No weird licensing restrictions
  • You own the output Claude generates
  • Standard terms of service (read them, obviously)

Compliance stuff that matters:

  • SOC 2, HIPAA, GDPR certified (your legal team will ask)
  • Data processing agreements available
  • EU data residency if you need it

Enterprise features if you pay enough:

  • 99.9% uptime SLA (they actually hit this)
  • Dedicated support (real humans, not chatbots)
  • Custom contracts for big deployments

Q: What breaks in production?

A: Shit that'll wake you up during on-call:

  • Rate limits during peak hours - hit 4K RPM limit during Black Friday rush, API returned HTTP 429 with Retry-After: 60, spent my entire weekend implementing exponential backoff with jitter
  • Context windows filling up - some genius user pasted an entire 100MB log file, maxed out the 200K token limit, and crashed our chat service with context_length_exceeded errors until I added aggressive truncation at 150K tokens
  • Streaming responses dying mid-sentence - nginx proxy_read_timeout was set to 30s, Claude's streaming died halfway through responses, users thought the AI had a stroke, took me 3 tries to find the right nginx config
  • Tool calling schema errors - spent 6 hours debugging {"type": "invalid_request_error", "error": {"type": "invalid_request_error", "message": "function: null"}} that turned out to be a missing fucking comma in my JSON schema
  • File uploads timing out - 500MB PDF uploads kept failing with request_timeout after exactly 30 seconds, had to implement chunking and multipart uploads which was a complete nightmare
  • Memory leaks in long conversations - conversation history grew to 2GB+ RAM per user session, took down our server at 2am on Sunday with OOMKilled errors, now I aggressively prune context after 50 messages

Realistic limitations:

  • Context window: 200K tokens (real, but slower with big contexts)
  • No memory between conversations (obvious but people forget)
  • Rate limits based on your tier (1K-4K RPM max)
  • JSON responses only (no streaming binary data)

Infrastructure you'll actually need:

  • Redis for caching and session management
  • Queue system for rate limit handling
  • Monitoring for cost and error tracking
  • Load balancer if you hit scale

Languages: English works best, other languages are decent but not perfect. Programming languages work great across the board.
