Everyone publishes pretty pricing tables, but here's what you need to know about running these things in production: you'll burn through way more tokens than expected, hit rate limits during client demos, and discover costs nobody mentioned in their marketing bullshit.
The Real Models You'll Actually Use
Claude 3.5 Sonnet ($3/$15 per million tokens) is what most developers end up with after getting burned by GPT-4. Yeah, it's expensive, but when GPT-4 gives you complete garbage for the third time on a complex reasoning task, you'll pay extra for Claude's consistency. I switched our document analysis from GPT-4 to Claude and accuracy jumped from the mid-70s to the mid-90s. Huge difference.
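If you want to try that switch yourself, here's a minimal sketch of the Claude side using Anthropic's Python SDK. The model snapshot, prompt, and `contract.txt` input are illustrative, not what we actually ran:

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

contract_text = open("contract.txt").read()  # placeholder input document

# Dated model string is illustrative; Anthropic rotates snapshots,
# so check their current model list before hardcoding one.
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Extract the key obligations from this contract:\n\n" + contract_text,
    }],
)
print(response.content[0].text)
```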
GPT-4o ($5/$15 per million tokens) is OpenAI's attempt to compete on price while maintaining quality. Works fine for most tasks, but still has that annoying tendency to confidently make shit up. Got burned by this in v4.0.8 when it started inventing API endpoints that don't exist. Good for content generation where you can fact-check everything.
GPT-4o Mini ($0.15/$0.60 per million tokens) is legitimately useful for simple classification and basic tasks. I use it for initial content filtering before sending complex stuff to better models. Saves about 60% on costs compared to sending everything straight to full GPT-4o.
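Here's roughly what that tiered setup looks like with OpenAI's Python SDK. The prompt, labels, and function names are mine, a sketch rather than our production code:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def is_relevant(text: str) -> bool:
    """First pass: GPT-4o Mini decides whether the text deserves
    the expensive model. Prompt and labels are illustrative."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Answer RELEVANT or IRRELEVANT only. "
                       "Does this need deep analysis?\n\n" + text,
        }],
    )
    return "IRRELEVANT" not in resp.choices[0].message.content.upper()

def analyze(text: str) -> str:
    # Only survivors of the cheap filter hit the expensive model.
    if not is_relevant(text):
        return "skipped"
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Analyze in detail:\n\n" + text}],
    )
    return resp.choices[0].message.content
```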
Gemini 2.0 Flash ($0.075/$0.30 per million tokens) has extremely aggressive pricing, but the model quality feels inconsistent. Works great for some tasks, completely shits the bed on others. It randomly started answering in Spanish for a week in August, which turned out to be some weird bug in their v2.0.1 release. Worth testing for high-volume simple tasks where accuracy isn't critical.
The Hidden Costs That Will Fuck Your Budget
Context window overages are budget killers. Claude's long-context pricing jumps to $6/$22.50 (from the base $3/$15) for prompts over 200K tokens, and you won't realize you're hitting it until the bill arrives. We ate an $800 surprise charge because our document summarization was quietly ingesting massive PDFs that blew past the limit.
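The cheap insurance is counting tokens before you send anything. Anthropic's Python SDK exposes a token-counting endpoint (`messages.count_tokens`); a minimal pre-flight guard, assuming the 200K boundary described above, might look like this:

```python
import anthropic

client = anthropic.Anthropic()

SURCHARGE_THRESHOLD = 200_000  # tokens; above this the long-context rate kicks in

def safe_to_send(messages: list[dict]) -> bool:
    """Pre-flight check so oversized PDFs never sneak past the 200K boundary."""
    count = client.messages.count_tokens(
        model="claude-3-5-sonnet-20241022",
        messages=messages,
    )
    return count.input_tokens < SURCHARGE_THRESHOLD
```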
Rate limits will cost you money in ways you don't expect. When you hit OpenAI's rate limits during peak hours, your app just sits there doing nothing while your users get pissed. Claude's limits are more generous, but you pay more per token.
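The standard mitigation is retrying with exponential backoff instead of failing outright. A minimal sketch using the OpenAI Python SDK, which raises `RateLimitError` when you get throttled (retry counts and delays here are arbitrary, tune them for your traffic):

```python
import random
import time

import openai
from openai import OpenAI

client = OpenAI()

def complete_with_backoff(messages: list[dict], max_retries: int = 5):
    """Retry on rate limits with exponential backoff plus jitter,
    instead of letting the request die in front of a client."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model="gpt-4o", messages=messages)
        except openai.RateLimitError:
            if attempt == max_retries - 1:
                raise
            # 1s, 2s, 4s, 8s... plus jitter so parallel workers
            # don't all retry in lockstep
            time.sleep(2 ** attempt + random.random())
```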
Token counting is fucked across all providers. What they call "1,000 tokens" varies wildly: a 500-word document might be 600 tokens in Claude, 750 in GPT-4, and 800 in Gemini, because each provider trains its own tokenizer with its own vocabulary, so the same text splits into different counts. Always overestimate by 20% or you'll get burned.
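For OpenAI models you can at least count instead of guessing, using their tiktoken library; the 20% buffer below is that same fudge factor applied in code, and the price parameter is whatever rate you're budgeting against:

```python
# pip install tiktoken
import tiktoken

def estimate_cost(text: str, price_per_million: float, buffer: float = 1.2) -> float:
    """Count tokens with the model's tokenizer, then pad by 20% because
    other providers' tokenizers will count the same text differently."""
    enc = tiktoken.encoding_for_model("gpt-4o")  # maps gpt-4o to its encoding
    tokens = len(enc.encode(text))
    return tokens * buffer * price_per_million / 1_000_000
```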
The biggest lesson: start cheap for prototyping, but budget for the premium models in production. Free tiers will get you maybe two days of actual testing before you hit limits. I learned this when our demo died mid-call because we hit OpenAI's free-tier limit; the error was something like "RateLimitError: exceeded rate_limit_per_minute for gpt-4", right in front of the fucking client.
And per-token costs are just the start. The real budget killers stay buried in the pricing fine print nobody reads until the invoice shows up.