Claude's API is pretty solid compared to most AI APIs. No bullshit OAuth flows, no complex configuration - just grab an API key and start making requests. Authentication is a plain x-api-key header that actually works as documented.
Right now you've got three main models to pick from: Sonnet 4 is the sweet spot - handles most tasks without killing your budget. Opus 4.1 costs a lot more but actually thinks through complex problems. Haiku 3.5 is fast and cheap but sometimes gives you answers that make you wonder if it's having a stroke.
Recent API Features (August-September 2025)
The web fetch tool is actually useful for scraping data. Files API lets you upload documents (10MB limit will bite you), and extended thinking makes Claude way smarter but can double or triple your token costs. Found that out the hard way.
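To see why extended thinking bites, here's a back-of-the-envelope cost estimator. The prices are Sonnet 4 list rates at time of writing ($3/M input, $15/M output - treat them as assumptions and check current pricing), and thinking tokens bill at the output rate:

```python
def estimate_cost(input_tokens, output_tokens, thinking_tokens=0,
                  input_price=3.00, output_price=15.00):
    """Rough cost in USD. Prices are per million tokens (assumed Sonnet 4
    list rates); thinking tokens are billed as output tokens."""
    billed_output = output_tokens + thinking_tokens
    return (input_tokens * input_price + billed_output * output_price) / 1_000_000

# Same request, with and without extended thinking:
plain = estimate_cost(2_000, 800)
with_thinking = estimate_cost(2_000, 800, thinking_tokens=8_000)
print(f"plain: ${plain:.4f}, with thinking: ${with_thinking:.4f}")
```

Run the numbers on your own workloads before turning thinking on - a few thousand invisible thinking tokens per request is where surprise bills come from.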
The API does tool use, vision, and PDF stuff. PDFs can explode into 100K+ tokens without warning.
Claude Sonnet 4 got a 1M token context window in beta but it costs 2x input, 1.5x output when you go over 200K tokens. Prompt caching saves 90% costs IF you structure prompts right - get it wrong and you're paying full price for nothing.
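The caching rule that matters: big, stable content goes before the cache breakpoint, anything that changes per request goes after it. Here's a sketch of the request shape (no network call - it just builds the kwargs dict; field names follow the Messages API's prompt-caching format as I understand it):

```python
# Placeholder for the large, stable context you'd actually cache
LONG_STABLE_CONTEXT = "...tens of thousands of tokens of docs..."

def build_cached_request(user_question, model="claude-sonnet-4-20250514"):
    """Build Messages API kwargs with a cache breakpoint after the
    stable system prompt. Everything up to the cache_control marker
    can be served from cache on subsequent requests."""
    return {
        "model": model,
        "max_tokens": 1000,
        "system": [
            {
                "type": "text",
                "text": LONG_STABLE_CONTEXT,
                # Marks the end of the cacheable prefix
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Variable, per-request content goes after the cached prefix
        "messages": [{"role": "user", "content": user_question}],
    }

params = build_cached_request("Summarize section 3")
```

If you put anything that varies - timestamps, request IDs, the user's question - before the breakpoint, every request is a cache miss and you pay full price.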
Authentication and Initial Setup
Get your API key from the Anthropic Console. Don't hardcode it anywhere because security will have your head. Store it in env vars and rotate it occasionally or when you inevitably commit it to git by accident.
# Don't hardcode your key anywhere - security will not be happy
export ANTHROPIC_API_KEY="sk-ant-api03-..."

# Test your connection
curl https://api.anthropic.com/v1/models \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01"
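On the Python side, fail fast if the env var is missing instead of letting the first API call blow up ten minutes into a batch job. A minimal startup check (the sk-ant- prefix check is a sanity heuristic, not an official validation rule):

```python
import os

def get_api_key():
    """Fail fast at startup instead of on the first request."""
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set - export it first")
    if not key.startswith("sk-ant-"):
        raise RuntimeError("ANTHROPIC_API_KEY doesn't look like an Anthropic key")
    return key
```

The SDK reads ANTHROPIC_API_KEY automatically, so you rarely pass the key explicitly - this check is just for a clear error message before you start burning requests.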
The Messages API is your main workhorse. File uploads work great until you hit the 10MB limit. Batch processing saves money if you can wait 24 hours for results. Usage monitoring tells you how much you've spent after it's too late.
Model Selection and Cost Reality Checks
Claude Opus 4.1 - Only use this when you actually need the smartest model. Extended thinking made one of my simple batch jobs cost like $800 because I didn't realize how many tokens it was generating behind the scenes. Good for complex analysis, terrible for your wallet.
Claude Sonnet 4 - This is what you want for most stuff. Handles code review, document analysis, complex questions. The 1M context window sounds cool but costs more when you use it - found this out analyzing a big codebase and getting hit with a $150 charge.
Claude Haiku 3.5 - Fast and cheap. Good for simple tasks like classification or basic questions. Don't expect it to understand complex logic or write good code.
Use specific model IDs like claude-sonnet-4-20250514 in production, not aliases. I learned this when an alias auto-updated overnight and broke our streaming - took forever to figure out why responses were getting cut off. Check your model version if you see weird behavior after updates.
Migration guides exist but who has time to read those when prod is on fire?
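One way to enforce pinning is a single roster that every call site goes through. The date suffixes below are assumptions - verify the current IDs in the Console before shipping:

```python
# Pinned model IDs (hypothetical roster - the date suffixes are
# assumptions; check the Console for the current ones).
PINNED_MODELS = {
    "cheap": "claude-3-5-haiku-20241022",
    "default": "claude-sonnet-4-20250514",
    "smart": "claude-opus-4-1-20250805",
}

def pick_model(tier="default"):
    """Always return a pinned ID so an alias update can't change
    model behavior out from under you."""
    return PINNED_MODELS[tier]
```

When a new snapshot ships, you change one dict entry on your schedule, test it, and deploy - instead of waking up to an alias that silently rolled forward.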
API Integration Patterns That Actually Work
Production is where your perfect demo falls apart. Rate limits hit during peak hours, costs spike when you're not watching, and Claude decides your perfectly reasonable question violates some content policy.
import asyncio
from anthropic import AsyncAnthropic  # the sync client can't be awaited

client = AsyncAnthropic()

class ContentFilterError(Exception):
    """Claude refused the request on content grounds."""

class APIError(Exception):
    """Everything else that isn't worth retrying."""

async def claude_request_that_wont_fuck_you(messages, model="claude-sonnet-4-20250514"):
    """This actually handles the shit that goes wrong"""
    for attempt in range(3):
        try:
            response = await client.messages.create(
                model=model,
                max_tokens=1000,
                messages=messages,
                timeout=60.0,  # 30 seconds is too short for real requests
            )
            return response.content[0].text
        except Exception as e:
            if "rate_limit" in str(e).lower():
                # Exponential backoff because Claude's rate limits are brutal
                await asyncio.sleep(2 ** attempt)
                continue
            elif "content_filter" in str(e).lower():
                # Claude blocked your request - good luck debugging why
                raise ContentFilterError("Claude didn't like something in your prompt")
            else:
                # Something else broke, probably not worth retrying
                raise APIError(f"Claude API error: {e}")
    # Give up after retries - usually means service issues or bad request
    raise APIError("Gave up after 3 attempts - check your request or try again later")
Batch processing saves 50% but adds 24-hour delays. Prompt caching works great when it works - structure it wrong and you're paying full price. Tool integration is amazing until your external API times out and Claude just... stops.
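Whether the 24-hour batch delay is worth it comes down to arithmetic. A quick helper (Sonnet 4 list prices assumed, per million tokens):

```python
def batch_savings(requests, avg_input_tokens, avg_output_tokens,
                  input_price=3.00, output_price=15.00, discount=0.50):
    """Dollars saved by the 50% batch discount across a whole job.
    Prices are per million tokens (assumed Sonnet 4 list rates)."""
    full_price = requests * (avg_input_tokens * input_price
                             + avg_output_tokens * output_price) / 1_000_000
    return full_price * discount

savings = batch_savings(10_000, 1_500, 500)
print(f"${savings:.2f} saved on a 10k-request job")
```

On a 10,000-request job at those token counts, the discount covers half of a $120 bill - real money if you run the job nightly, noise if you run it once.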
The OpenAI compatibility layer is helpful if you're migrating from OpenAI, but don't expect feature parity. The Usage API helps track costs after they've already killed your budget.
This overview gives you the foundation, but the real pain points emerge in production. The comparison table that follows will help you understand where Claude stands against alternatives, and then we'll dive into the production war stories that separate working demos from scalable systems.