Google's REST API for talking to their AI models. Works fine, nothing revolutionary. Used it on a side project recently and ran into all the usual bullshit.
As of September 2025, there are three main flavors: Gemini 2.5 Flash, 2.5 Pro, and 2.0 Flash. Don't get confused by the version numbers - the 2.5 models are actually newer than 2.0. Google's marketing team clearly had a stroke.
The models you actually care about
Flash is fast and cheap. Pro is slow, expensive, but actually thinks. Flash-Lite is even cheaper but dumber. That's literally it.
Flash costs $0.30/million input tokens. Pro starts at $1.25 but jumps to $2.50 once your prompt crosses 200K tokens (found this out the expensive way when our bill went from $20 to $200 overnight). These prices change without warning, so don't hardcode anything.
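Here's a back-of-envelope estimator using the prices quoted above. The 200K-token cutoff for Pro's price jump and the rates themselves are whatever was true when I wrote this, and output tokens are ignored entirely - check the current pricing page before trusting any number this produces.

```python
# Input-token cost estimator. Prices and the Pro tier cutoff are
# snapshots from this post, not something to hardcode in production.
PRICES_PER_MILLION = {
    "gemini-2.5-flash": 0.30,
    "gemini-2.5-pro-small": 1.25,   # prompts up to the cutoff
    "gemini-2.5-pro-large": 2.50,   # prompts past the cutoff
}
PRO_TIER_CUTOFF = 200_000  # tokens

def input_cost(model: str, prompt_tokens: int) -> float:
    """Dollar cost for the input side of one request (output ignored)."""
    if model == "gemini-2.5-pro":
        tier = "small" if prompt_tokens <= PRO_TIER_CUTOFF else "large"
        model = f"gemini-2.5-pro-{tier}"
    return prompt_tokens / 1_000_000 * PRICES_PER_MILLION[model]

print(input_cost("gemini-2.5-flash", 10_000))   # 0.003
print(input_cost("gemini-2.5-pro", 300_000))    # 0.75
```

Run this against your actual prompt sizes before you commit to Pro - the tier jump is exactly how a $20 bill becomes $200.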
The free tier is a trap. Looks generous until you hit the rate limits. Then you're fucked.
Production reality: That 1M token context window sounds amazing until you realize rate limits will kill you first. We never got close to using the full context because everything times out or gets throttled.
Actually using the damn thing
Get an API key from Google AI Studio. Takes 30 seconds if you're lucky, 20 minutes if their OAuth is broken again.
Python SDK works. JavaScript SDK works. Don't use the raw REST API unless you enjoy pain. The SDKs handle retry logic, which you absolutely need because Google's infrastructure hiccups constantly.
Real gotcha: The SDK retries everything aggressively - your logs will be full of retry spam, but it usually works. Budget extra time for debugging because you'll spend hours figuring out which errors are real vs retry noise.
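If you roll your own retry layer (or just want attributable logs instead of SDK spam), something like this makes every failure visible with its attempt number, so you can tell transient hiccups from real errors. The backoff numbers are arbitrary and it catches everything for brevity - in real code, only retry the error types your SDK marks as transient:

```python
import logging
import time

log = logging.getLogger("gemini-calls")

def call_with_retries(fn, *, attempts: int = 4, base_delay: float = 0.5):
    """Retry fn() with exponential backoff, logging each failure so retry
    noise is at least attributable. Catching bare Exception is for the
    sketch only - narrow it to your SDK's transient error types."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts:
                log.error("giving up after %d attempts: %r", attempt, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            log.warning("attempt %d failed (%r), retrying in %.1fs",
                        attempt, exc, delay)
            time.sleep(delay)
```

Now "which errors are real" is a grep for `giving up` instead of an afternoon of log archaeology.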
Model selection reality check
Use Flash for 90% of everything. It's fast enough and good enough. Pro is for when Flash gives you garbage output on complex reasoning tasks.
War story: Spent a week trying to make Flash work for multi-step logic before giving up and switching to Pro. Flash is great for summaries, simple code generation, basic Q&A. Pro actually thinks things through but costs 4x more and takes forever.
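The routing rule above boils down to a few lines. The task labels are my own buckets, not anything the API defines, and the model names are current as of this writing:

```python
def pick_model(task: str, flash_output_was_garbage: bool = False) -> str:
    """Crude router for the 90/10 split: Flash by default, Pro for
    multi-step reasoning or when Flash has already failed. Task labels
    are arbitrary buckets I made up, not an API concept."""
    cheap_tasks = {"summary", "simple-code", "qa"}
    if task in cheap_tasks and not flash_output_was_garbage:
        return "gemini-2.5-flash"
    # Multi-step logic, or Flash already produced garbage: pay for Pro.
    return "gemini-2.5-pro"
```

The `flash_output_was_garbage` escape hatch matters: try Flash first, escalate to Pro only on failure, and you skip the week I wasted.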
The "thinking tokens" thing is annoying - the 2.5 models reason internally and charge you for those tokens. Flash at least lets you cap the thinking budget; on Pro you can't turn it off at all. It's like paying extra to watch someone do homework.
Live API works in demos, breaks in production
Live API lets you do real-time voice chat. Demos perfectly in the office. Production is a nightmare of dropped WebSocket connections and mysterious timeouts.
Horror story: Spent 3 days debugging why Live API kept disconnecting users mid-conversation. Turns out our load balancer had a 60-second WebSocket timeout nobody knew about. The API docs mention none of this infrastructure bullshit you'll actually encounter.