Setup Reality Check
Setup is pretty straightforward if you've used OpenAI before. Here's what actually happens:
- Create account at perplexity.ai - the API key generation is buried in their settings, took me 10 minutes to find
- Generate API key - they have this "API groups" thing for billing which is confusing at first
- Install OpenAI SDK - yes, the same one you're already using
- Change your endpoint to https://api.perplexity.ai/chat/completions
That's literally it. If you have OpenAI code, just swap the URL and key.
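Here's that swap spelled out. If you'd rather not pull in the SDK at all, the endpoint is plain HTTPS, so a zero-dependency sketch with Node 18+'s built-in fetch works too (PPLX_API_KEY is my name for the env var, not an official one):

```javascript
// Same request shape as OpenAI, different host. Node 18+ ships fetch,
// so no dependencies needed. PPLX_API_KEY is an assumed env-var name.
const PPLX_URL = "https://api.perplexity.ai/chat/completions";

function buildRequest(apiKey, question) {
  return {
    url: PPLX_URL,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "sonar",
        messages: [{ role: "user", content: question }],
      }),
    },
  };
}

async function ask(question) {
  const { url, options } = buildRequest(process.env.PPLX_API_KEY, question);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Perplexity returned ${res.status}`);
  return res.json();
}
```

The OpenAI SDK route is even shorter: point `baseURL` at `https://api.perplexity.ai`, swap the key, and keep the rest of your code.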
Your First Call (That Actually Works)
Copy-paste this and it'll work:
curl -X POST 'https://api.perplexity.ai/chat/completions' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"model": "sonar",
"messages": [{"role": "user", "content": "What happened in AI news today?"}]
}'
You get back the answer plus all the sources it used. The response is bigger than OpenAI because it includes search results:
{
"choices": [{"message": {"content": "Based on recent reports..."}}],
"search_results": [
{"title": "AI Company Raises $100M", "url": "https://...", "date": "2025-08-29"}
],
"usage": {
"total_tokens": 245,
"cost": {
"input_tokens_cost": 0.000024,
"output_tokens_cost": 0.006585,
"request_cost": 0.006,
"total_cost": 0.012609
}
}
}
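Most of the time you only want three things out of that blob. A small helper like this does the job (field names follow the sample above; search_results and usage.cost can be missing depending on the model, hence the defaults):

```javascript
// Pull the usual fields out of a Perplexity response. Everything is
// defensively defaulted because search_results and usage.cost are not
// guaranteed to be present on every response.
function summarize(response) {
  return {
    answer: response.choices?.[0]?.message?.content ?? "",
    sources: (response.search_results ?? []).map((r) => r.url),
    totalCost: response.usage?.cost?.total_cost ?? null,
  };
}
```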
Gotcha #1: The search_results array is huge - I've seen 20+ sources with full metadata, URLs, abstracts, the works. Don't log this shit or you'll fill your disk in an hour:
// keep everything except the giant search_results array out of your logs
const { search_results, ...loggableResponse } = response;
console.log("Response:", loggableResponse);
Docker gotcha that cost me 3 hours: Running in Docker with <1GB memory? Those massive search arrays will OOM your container randomly. I spent a morning debugging why our staging pods kept dying with exit code 137 (the OOM-kill signature) every 10th request. Bumping the memory limit to 1.5GB made the problem disappear.
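If you can't just throw memory at it, trimming the array before it hits a log line or an in-memory cache helps too. A sketch - maxSources = 5 is an arbitrary cap, and the kept fields are the ones from the sample response above:

```javascript
// Cap how much of search_results you actually keep: first N entries only,
// and only the small fields (drop abstracts and other bulky metadata).
// Returns a new object; the original response is left untouched.
function trimSearchResults(response, maxSources = 5) {
  if (!Array.isArray(response.search_results)) return response;
  return {
    ...response,
    search_results: response.search_results
      .slice(0, maxSources)
      .map(({ title, url, date }) => ({ title, url, date })),
  };
}
```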
Recent update: They just added detailed cost breakdown in every API response. Now you get exact costs for input tokens, output tokens, request fees, and total cost - makes billing way more transparent.
The Pricing That'll Surprise You
Here's the actual pricing from their docs:
- Sonar: $1/M input + $1/M output tokens (basic model, good for most shit)
- Sonar Pro: $3/M input + $15/M output tokens (expensive but sometimes worth it)
- Sonar Deep Research: $2/M input + $8/M output + $2/M citation + $5 per 1K search queries (when you need everything and budgets don't matter)
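Worth sanity-checking before you pick a model. A back-of-envelope estimator from the rates above - sonar and sonar-pro only, and I'm assuming those strings are the API model ids; Deep Research's citation and per-search fees are deliberately ignored here:

```javascript
// Dollars per 1M tokens, from the pricing list above.
const RATES = {
  sonar: { input: 1, output: 1 },
  "sonar-pro": { input: 3, output: 15 },
};

// Estimated token cost in dollars for one call. Ignores per-request and
// per-search fees, so treat it as a lower bound.
function estimateCost(model, inputTokens, outputTokens) {
  const r = RATES[model];
  if (!r) throw new Error(`no rate card for ${model}`);
  return (inputTokens * r.input + outputTokens * r.output) / 1_000_000;
}
```

Same 1K-in/1K-out call: $0.002 on sonar, $0.018 on sonar-pro. The output side alone is the 15x that bites.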
The gotcha that murdered my budget: Sonar Pro costs 15x more for output tokens. I'm burning $180/month in staging with basic Sonar. Left it on Pro for one weekend and got a $600 bill. Nearly had a heart attack. Those output tokens add up faster than DynamoDB read units.
Rate limits that will bite you:
- Starter: 20 requests/minute (you'll hit this in seconds during development)
- Professional: 100+ requests/minute
- Enterprise: Custom limits if you pay enough
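Until you're on a bigger tier, the cheapest fix is client-side spacing. A crude throttle - 3 seconds between calls follows from 60s / 20 requests on the starter tier:

```javascript
// Space calls at least 60s/20 = 3s apart instead of blasting the API
// and eating 429s. Each caller reserves its slot before sleeping, so
// concurrent callers queue up correctly.
const MIN_INTERVAL_MS = 60_000 / 20; // 3000ms between requests

function makeThrottle(intervalMs = MIN_INTERVAL_MS) {
  let last = 0;
  return async function throttled(fn) {
    const wait = Math.max(0, last + intervalMs - Date.now());
    last = Date.now() + wait; // reserve our slot
    if (wait > 0) await new Promise((r) => setTimeout(r, wait));
    return fn();
  };
}
```

Usage: wrap every API call, e.g. `const throttled = makeThrottle(); await throttled(() => ask("..."))`.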
Pro tip: Test with basic Sonar first. Sonar Pro costs 15x more for output tokens and sometimes takes longer - not worth the "better reasoning" when you just need current info.
Production tip: Set up billing alerts. That weekend Sonar Pro bill only surfaced when the invoice arrived - an alert would have flagged it within hours.
Don't Make My Mistakes
Drop-in replacement works perfectly - until you hit the rate limits. The OpenAI compatibility is legit, just change the base URL.
Multi-provider is the way to go: I route research questions to Perplexity and creative stuff to OpenAI. Works great.
Streaming actually works unlike some other APIs. The search results come through early, then the answer streams. Good for UX.
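With "stream": true the wire format is OpenAI-style server-sent events: "data: {...}" lines ending with "data: [DONE]". A simplified parser for one chunk - real streams can split a JSON payload across chunk boundaries, which this sketch ignores:

```javascript
// Extract the content deltas from one SSE chunk. Non-data lines are
// skipped; "[DONE]" terminates the scan. Assumes each data line holds
// a complete JSON event (a simplification - buffer across chunks in prod).
function parseSSEChunk(chunk) {
  const deltas = [];
  for (const line of chunk.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice(5).trim();
    if (payload === "[DONE]") break;
    const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (delta) deltas.push(delta);
  }
  return deltas.join("");
}
```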
Shit that will break your app (learned the hard way):
- Search timeouts: Fails 5% of the time with "search exceeded 30 second limit" - you get whatever garbage they scraped before timing out
- Rate limits from hell: 20 req/min is a joke. Hit it testing their hello world example. Get HTTP 429 with "rate limit exceeded, try again in 57 seconds"
- Response size bombs: Some responses are 80KB+ JSON because of search metadata - crashed our mobile client's JSON parser twice
- Memory leaks: Those search_results arrays pile up fast in Node.js - had staging eat 2GB RAM in an hour
- Surprise bills: Left Pro on a test endpoint over the weekend, came back to a $600 AWS-style surprise

What actually works in production:
- Cache responses for at least 10 minutes (search results change slowly anyway)
- Strip the search_results array before logging (delete response.search_results)
- Always implement retry logic with exponential backoff (start with 1s, max 30s)
- Monitor both token costs AND response times (search timeouts kill UX)
- Set request timeouts to 35+ seconds (search takes 30s max)
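The retry and timeout advice above, sketched in one function. The backoff schedule (1s start, 30s cap), the 35s abort, and the retry-on-429/5xx policy match the list; maxAttempts = 4 is my own choice:

```javascript
// Exponential backoff: 1s, 2s, 4s, ... capped at 30s.
const backoffDelay = (attempt) => Math.min(1000 * 2 ** attempt, 30_000);

// Fetch with a 35s per-attempt timeout. Retries on 429, 5xx, network
// errors, and timeouts; any other HTTP status fails immediately.
async function callWithRetry(url, options, maxAttempts = 4) {
  for (let attempt = 0; ; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), 35_000);
    try {
      const res = await fetch(url, { ...options, signal: controller.signal });
      if (res.ok) return res.json();
      if (res.status !== 429 && res.status < 500) {
        throw new Error(`non-retryable status ${res.status}`);
      }
      // 429 or 5xx: fall through to the backoff sleep below
    } catch (err) {
      if (String(err.message).startsWith("non-retryable")) throw err;
      // network error or 35s abort: also retried
    } finally {
      clearTimeout(timer);
    }
    if (attempt + 1 >= maxAttempts) throw new Error(`gave up after ${maxAttempts} attempts`);
    await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
  }
}
```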
Actually useful docs: Official quickstart (copy-paste examples work), pricing breakdown (updated pricing as of Aug 2025), and rate limits docs (you'll hit 20 req/min in seconds).