Azure Cosmos DB - Getting Started Guide

Which API Should You Actually Use? (Spoiler: NoSQL)

Microsoft gives you five APIs, but they're not all created equal.

I've deployed every single one in production and lived through the pain. Here's what nobody tells you:

The APIs Ranked by How Much They'll Ruin Your Week

NoSQL API (Core SQL):

Just fucking use this one. It's Microsoft's favorite child and gets new features first. If you know SQL, the learning curve isn't terrible. Gets stored procedures, triggers, and somehow burns through RUs more efficiently than the others. When that weird connection pooling bug hit last month (I think it was 18.2.1?), NoSQL API got patched in like 2 days while MongoDB API users waited weeks.

MongoDB API: Good for migrations when you can't afford to rewrite everything.

Existing MongoDB drivers work, which is nice. But here's the fun part: it's not actually MongoDB. GridFS doesn't work, some aggregation pipeline operations behave differently, and good luck debugging why your compound indexes aren't being used properly. I spent three days figuring out why our session store was consuming 10x more RUs than expected.

Table API: The boring one that actually works reliably.

Key-value operations only, but they're fast and cheap. Perfect for user sessions, feature flags, or anything that doesn't need complex queries. I've never had a production incident with Table API because there's not much to break.

Cassandra API: Time-series data and nothing else.

CQL works until you try using secondary indexes, then you discover half the features are missing or behave weirdly. One team I worked with spent two weeks debugging why their WHERE clauses weren't working. Turns out Cosmos DB's Cassandra API doesn't support filtering on non-primary key columns the same way.

Gremlin API: Graph databases for when you hate yourself and everyone around you.

Query syntax looks like someone threw Cypher and SQL into a blender. Performance? Good fucking luck. I've seen a simple "find friends of friends" query somehow eat 2,000 RUs while an identical traversal cost 50 RUs. Still have no idea why. Only use this if you absolutely need graph operations and enjoy explaining to your manager why the database budget tripled.

The Reality Behind "Multi-API Magic"

Cosmos DB stores everything in their proprietary ARS format and translates to whatever API you're using. Clever engineering, but here's what bites you:

NoSQL gets the best performance: other APIs have translation overhead
Some features are NoSQL-only: stored procedures, triggers, patch operations
MongoDB compatibility isn't 100%: aggregation pipelines can consume 2x the RUs
Mixing APIs is asking for trouble: don't even think about it

When Each API Makes Sense (Real Talk)

Use NoSQL API:

You're starting fresh
You want new features first (vector search, full-text search)
You need stored procedures or server-side logic
You want the most efficient RU consumption

Use MongoDB API:

You're migrating existing MongoDB code
Your team refuses to learn new syntax
You have complex aggregation pipelines that work
You're stuck with existing MongoDB tooling

Use Table API:

Simple key-value lookups only
You're migrating from Azure Table Storage
You want predictable, cheap operations
Complex queries aren't needed

Use Cassandra API:

Time-series or IoT data at massive scale
You're already using Cassandra and it works
You need wide-column data modeling
You understand CQL limitations in Cosmos DB

Use Gremlin API:

You absolutely need graph traversals
Building recommendation engines
Fraud detection with relationship analysis
You enjoy debugging nightmare query performance

The Brutal Truth About RU Consumption

Every API uses Request Units, but the costs vary wildly:

Point reads: 1 RU per 1KB (only thing that's consistent)
Writes: 5-8 RUs per 1KB depending on API overhead
Queries: SQL is cheapest, MongoDB costs 20-30% more, Gremlin will bankrupt you
Cross-partition queries: All APIs get destroyed equally

I watched one team's MongoDB aggregation pipeline absolutely destroy their budget: 800 RUs for a query that would cost maybe 20 RUs in NoSQL.

Turns out they had these pointless $unwind operations that Cosmos DB just couldn't figure out how to optimize. Took them three weeks to unfuck it, but their monthly bill dropped from like $4,000 to $1,200.

Real Decision Framework

Use NoSQL API unless you have a damn good reason not to. It gets new features first, best tooling, and burns the fewest RUs.

Only use other APIs if:

You're migrating existing code and can't afford a full rewrite
Your team will quit if they have to learn new syntax
You need specific features (graph traversals, Cassandra wide columns)

Don't get clever with multiple APIs. Pick one, stick with it, and resist the urge to use them all just because Microsoft lets you.

Honest API Comparison

What Actually Matters	NoSQL (Core SQL)	MongoDB	Cassandra	Gremlin (Graph)	Table
When to Use	New projects, complex queries	MongoDB migrations	Time-series/IoT data	Recommendation engines	Simple lookups
Data Model	JSON documents	BSON documents	Wide columns	Vertices and edges	Key-value pairs
Query Language	SQL-like (familiar)	MongoDB queries	CQL (limited)	Gremlin (brain melting)	OData (basic)
Complex Queries	Yes, actually good	Yes, but costs more RUs	Limited, don't expect miracles	Yes, if you enjoy suffering	No, don't even try
RU Efficiency	Best you'll get	20-30% higher than NoSQL	Moderate	Depends on graph complexity	Efficient for simple ops
Real Production Pain	Partition key design hell	MongoDB compatibility gotchas	Secondary index limitations	Query optimization nightmare	None, it's boring
Learning Curve	2-3 weeks for basics	Easy if you know MongoDB	1-2 months for CQL	3-6 months to not hate it	1 day
When It Breaks	Shitty partition keys	Weird MongoDB quirks	Index fuckups	Graph traversal explosions	Almost never
Documentation Quality	Best available	Pretty good	Adequate	Confusing as hell	Simple enough
Community Support	Large and active	MongoDB community helps	Smaller but helpful	Niche but passionate	Limited
Hidden Gotchas	RU consumption surprises	Not 100% MongoDB compatible	No complex JOINs	Query performance unpredictable	None really

Setting Up Cosmos DB: What Actually Goes Wrong

You've picked your API (hopefully NoSQL) and now you need to set it up. Here's the setup guide with all the ways you'll fuck this up and how to fix it.

The decisions you make in the next 30 minutes determine whether you spend $500/month or $5,000/month on the same workload. No pressure.

Creating Your Account (First Way You'll Mess Up)

The Azure Portal Trap
Don't click through the portal defaults like an idiot. Teams blow their entire budgets because they trust Microsoft's "helpful" default settings. Use the quickstart but ignore literally every default value they suggest.

Critical decisions that'll destroy your budget:

API Selection: Pick wrong, rebuild everything later
Location: Choose based on where users are, not where you work
Capacity Mode: Provisioned vs Serverless - wrong choice means overpaying or throttling
Backup Policy: Continuous backup costs extra but saves your job

Provisioned vs Serverless Reality:

Provisioned: Pay even when nobody uses your app, cheaper with consistent traffic
Serverless: Costs 2x per operation, no minimum charge - fine for dev, death for production

A team I worked with picked Serverless for production because "it scales automatically." Their first traffic spike cost $3,000 in a single day. Serverless charges per RU consumed, not provisioned capacity.
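Here's a back-of-the-envelope sketch of that trade-off. It assumes the rough 2x per-RU serverless premium quoted above, and `price_per_million_ru` is a made-up placeholder, not a real Azure rate. The function names are invented for illustration, so treat the output as a comparison, not a quote.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def provisioned_cost(provisioned_ru_per_sec, price_per_million_ru):
    """You pay for every provisioned RU/s, every second, used or not."""
    total_ru = provisioned_ru_per_sec * SECONDS_PER_MONTH
    return total_ru / 1_000_000 * price_per_million_ru


def serverless_cost(consumed_ru, price_per_million_ru, premium=2.0):
    """You pay only for RUs consumed, at a per-RU premium (2x per this guide)."""
    return consumed_ru / 1_000_000 * price_per_million_ru * premium


def cheaper_mode(avg_ru_per_sec, provisioned_ru_per_sec, price_per_million_ru=0.01):
    """Compare both modes for a month of traffic averaging avg_ru_per_sec."""
    consumed = avg_ru_per_sec * SECONDS_PER_MONTH
    prov = provisioned_cost(provisioned_ru_per_sec, price_per_million_ru)
    srv = serverless_cost(consumed, price_per_million_ru)
    return "serverless" if srv < prov else "provisioned"


# Spiky dev workload: averages 20 RU/s against 400 RU/s provisioned
print(cheaper_mode(20, 400))
# Steady production load: averages 300 RU/s against 400 RU/s provisioned
print(cheaper_mode(300, 400))
```

With a 2x premium, the break-even sits at about 50% average utilization of whatever you would have provisioned. Below that, serverless wins. Above it, you're the team with the $3,000 day.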

Partition Keys: Where Dreams Go to Die

This is where most people destroy their Cosmos DB setup. Get the partition key wrong and you're rebuilding everything from scratch.

Partition key rules that'll save your job:

Pick something with thousands of unique values (not 5)
Make sure queries don't scan every partition
You can NEVER change it after creating the container
Avoid hot spots like timestamps or status fields

Real examples of how teams destroyed their apps:

E-commerce Epic Fail:

// What they did (complete disaster):
"partitionKey": "/orderStatus"
// Result: 90% of orders in "pending" partition, constant 429 errors

// What actually works:
"partitionKey": "/customerId"
// Result: Even distribution, fast queries

IoT Catastrophe:

// The obvious stupid choice:
"partitionKey": "/timestamp"
// Result: All current data destroys one partition

// Less obvious but equally stupid:
"partitionKey": "/deviceType"
// Result: Tesla Model 3 devices overwhelm everything

// What works:
"partitionKey": "/deviceId"
// Result: Even distribution

Multi-tenant Disaster:

// Lazy choice that killed performance:
"partitionKey": "/tenantPlan"
// Result: All enterprise customers in one partition

// The fix:
"partitionKey": "/tenantId"
// Result: Perfect isolation

I watched one team spend 6 weeks rebuilding their entire fucking data model because some genius decided to partition on orderStatus. Guess what? Roughly 90% of orders are "pending" so that one partition was getting absolutely hammered. Queries took 30+ seconds and users were ready to throw their laptops out the window.
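You can catch this before it ships. The sketch below takes a sample of documents and reports how much of the data lands on the single most common value of a candidate key. The `hottest_share` helper and the sample orders are hypothetical, and document count is only a proxy for load, but a key where one value owns most of the data is a hot partition waiting to happen.

```python
from collections import Counter


def hottest_share(docs, key):
    """Return (value, share) for the most common value of `key` across docs."""
    counts = Counter(doc[key] for doc in docs)
    value, count = counts.most_common(1)[0]
    return value, count / len(docs)


# Hypothetical sample: 1,000 orders, 90% pending, spread over 250 customers
orders = [
    {
        "customerId": f"cust-{i % 250}",
        "orderStatus": "pending" if i % 10 else "shipped",
    }
    for i in range(1000)
]

for candidate in ("orderStatus", "customerId"):
    value, share = hottest_share(orders, candidate)
    print(f"{candidate}: hottest value {value!r} holds {share:.1%} of documents")
```

Run it against an export of real data, not a toy sample like this one. If the hottest value holds more than a few percent of the documents, keep looking for a better key.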

Consistency Levels: Choose Your Own Adventure

Cosmos DB has five consistency levels. Pick wrong and you'll either blow your budget or ship data bugs.

Session Consistency (Default): Just use this

Users see their own writes immediately
Might not see other users' writes right away
Works for 90% of applications
Costs 1x RUs (baseline)
Gotcha: Multi-device users see inconsistencies

Strong Consistency: When money's involved

Everyone sees identical data always
Banking, payments, inventory where wrong data = lawsuit
Reads cost 2x RUs (ouch)
Single write region only (kills global performance)

Eventual Consistency: When you don't care

Data becomes consistent eventually, maybe in seconds
Perfect for analytics, logging, "close enough" scenarios
Cheapest at 1x RUs
Users see stale data for random amounts of time

Bounded Staleness: The compromise nobody wants

Data might be X seconds old or Y versions behind
Collaborative editing, leaderboards
Reads cost 2x RUs without strong guarantees
Most teams should just use Session instead

Consistent Prefix: The weird one

See writes in order but might miss recent ones
Social feeds, activity streams
Costs 1x RUs
Rarely needed in practice

95% of teams should use Session consistency. Don't overthink it unless you're handling money or have bizarre requirements.

The Index Trap That'll Kill Your Performance

Cosmos DB indexes everything by default. Every property in your JSON gets indexed, which sounds great until you realize you're paying to index huge text and blob fields you never search.

How to not destroy your RU budget:

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/userId/?" // Only index what you query
    },
    {
      "path": "/createdDate/?" // Timestamp for sorting
    }
  ],
  "excludedPaths": [
    {
      "path": "/largeDescription/*" // Don't index big text
    },
    {
      "path": "/binaryData/*" // Never index blobs
    },
    {
      "path": "/*" // Exclude everything else
    }
  ]
}

Indexing strategies:

Default: Index everything (expensive writes, fast queries)
Smart: Only index queried fields (cheap writes, fast targeted queries)
Dumb: No indexes (cheap writes, everything else is slow)

A team I worked with indexed their document store including 5MB PDF contents. Every write consumed 100+ RUs because Cosmos DB was indexing binary data. Took them three months and $20k before they figured out why their bill was insane.
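If you manage more than a couple of containers, generating the policy beats hand-editing JSON and fat-fingering a brace. This is a minimal sketch of the "Smart" strategy: `build_indexing_policy` is a made-up helper that only builds the policy document, and actually applying it to a container is left to whatever SDK or deployment tooling you use.

```python
import json


def build_indexing_policy(queried_fields):
    """Include only the fields you query; exclude everything else."""
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # One object per path: each included field gets its own entry
        "includedPaths": [{"path": f"/{field}/?"} for field in queried_fields],
        # Catch-all exclusion covers large text, blobs, and anything unlisted
        "excludedPaths": [{"path": "/*"}],
    }


policy = build_indexing_policy(["userId", "createdDate"])
print(json.dumps(policy, indent=2))
```

Because the catch-all `/*` is excluded, you don't need separate entries for each big field. The flip side: any new field you start querying has to be added to the list, or those queries get slow and expensive.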

RU Capacity: How to Not Go Broke

Microsoft's capacity calculator will lie to you. It gives optimistic estimates that never match production reality. Whatever number it spits out, multiply by 2 for a realistic starting point.

Request Units are how Cosmos DB charges you. Estimate wrong and you'll either get throttled or pay 10x more than expected.

RU math that matters:

Read 1KB: 1 RU
Write 1KB: 5-8 RUs
Delete: 5-8 RUs
Query scanning 100 docs: 50-200 RUs (wildly depends on complexity)
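Those rules of thumb turn into a quick sanity check. The sketch below assumes cost scales linearly with document size, which is only roughly true, and it ignores queries entirely. `estimate_ru_per_sec` is a hypothetical helper, not anything from the SDK.

```python
def estimate_ru_per_sec(reads_per_sec, writes_per_sec, doc_kb=1.0):
    """Return a (low, high) RU/s estimate from the rule-of-thumb costs above."""
    read_ru = reads_per_sec * doc_kb * 1       # point read: 1 RU per KB
    write_low = writes_per_sec * doc_kb * 5    # write: 5 RUs per KB (best case)
    write_high = writes_per_sec * doc_kb * 8   # write: 8 RUs per KB (worst case)
    return read_ru + write_low, read_ru + write_high


# Example workload: 200 point reads/s and 50 writes/s on 2KB documents
low, high = estimate_ru_per_sec(reads_per_sec=200, writes_per_sec=50, doc_kb=2)
print(f"Budget roughly {low:.0f}-{high:.0f} RU/s before queries")
```

Treat the result as a floor. Queries, especially cross-partition ones, go on top, and those are what actually blow up the number.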

Provisioned capacity reality:

Too low: 429 throttling errors during traffic spikes
Too high: Pay for capacity you never use
Autoscale: Scales between 10% and 100% of the max you set, billed at 1.5x the standard provisioned rate

Start with manual provisioning at 25% of what you think you need. Monitor for a week. Increase based on actual usage, not Microsoft's fairy tale projections.

Global Distribution: Great Feature, Expensive Reality

Global distribution sounds awesome until you see the bill. Here's what matters for multi-region deployments:

Start simple:

One write region where most users are
Add read regions for global users
Don't enable multi-region writes unless you absolutely need it

Multi-region cost breakdown:

Single region: $1,000/month
Multi-region reads: $1,500/month (+50%)
Multi-region writes: $3,000/month (+200%)

Code that works:

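using Microsoft.Azure.Cosmos;

// Client options for the .NET SDK v3
var options = new CosmosClientOptions()
{
    ApplicationName = "MyApp",
    ApplicationRegion = Regions.WestUS2, // Preferred region: the one closest to this app instance
    ConnectionMode = ConnectionMode.Direct // Always Direct
};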

The Emulator: Your New Best Friend

Use the Cosmos DB emulator for development or spend $200/month testing things.

# Docker setup that works (Linux emulator image)
docker run -p 8081:8081 -p 10250-10255:10250-10255 -m 3g --name cosmos-emulator \
  mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator

Emulator gotchas:

Self-signed SSL certificates (ignore in dev)
Performance doesn't match production
New features lag behind
Breaks randomly on Windows updates

Monitoring That Matters

Your RU consumption graph becomes your obsession. Normalized RU utilization shows which partition keys eat your budget. One partition consistently above 80% = bottleneck.

Set up alerts or get bill surprises:

{
  "alertName": "RU Consumption > 80%",
  "threshold": 80,
  "action": "Wake someone up"
}

Essential metrics:

RU consumption: > 80% = add capacity now
429 errors: Throttling = users getting timeouts
P99 latency: > 100ms = something's broken
Monthly cost: Track weekly, not monthly
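The thresholds above are simple enough to encode directly. This is a sketch of the decision logic only: `check_metrics` is a hypothetical function, and pulling the actual numbers out of Azure Monitor is up to you.

```python
def check_metrics(normalized_ru_pct, throttled_requests, p99_latency_ms):
    """Return a list of warnings based on the thresholds listed above."""
    warnings = []
    if normalized_ru_pct > 80:
        warnings.append("RU consumption above 80%: add capacity now")
    if throttled_requests > 0:
        warnings.append("429s detected: users are getting throttled")
    if p99_latency_ms > 100:
        warnings.append("P99 latency above 100ms: something's broken")
    return warnings


# Example reading from a bad afternoon
for warning in check_metrics(normalized_ru_pct=92, throttled_requests=14, p99_latency_ms=250):
    print(warning)
```

An empty list means nobody gets woken up. Anything else should page a human before it turns into a billing surprise.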

Production checklist:

Partition key tested with realistic data
RU consumption measured under load
429 error handling with retry patterns
Backup policy configured
Cost alerts set up
Team knows how to debug at 3 AM

That's it. Everything else is optimization for later. Get these basics right and you won't hate Cosmos DB.

FAQ: The Questions You Actually Ask (When Things Break)

Why is my Cosmos DB bill $8,000 this month when I expected $500?

Welcome to the fucking club. Here's probably what happened:

You enabled multi-region writes without reading the fine print (+200% cost)
Your queries scan entire collections instead of using partition keys
You're indexing binary data (images, PDFs) that you never search
Autoscale is stuck at maximum because of shitty partition key design
Cross-partition queries during traffic spikes

Fix it now:

Check Azure Cost Management to see what's eating your budget
Look at RU consumption metrics - normalized RU > 80% = problem
Turn off multi-region writes unless you actually need them
Review indexing policy and exclude large fields

Real production costs:

Small app (10K users): $300-1,200/month (not the $200-800 Microsoft claims)
Medium app (100K users): $1,500-5,000/month
Large app (1M+ users): $5,000-25,000/month

Can I use multiple APIs on the same data? I heard it's possible.

No. Don't even think about it.

Technically possible doesn't mean good idea. Each API expects different data shapes and has different performance. You'll get:

Data corruption from schema mismatches
Performance issues from API translation overhead
Debugging nightmares when something breaks
Angry team members who have to maintain your mess

One API per container. Period.

My queries are taking 30+ seconds. What the hell is wrong?

90% of the time it's one of these:

Terrible partition key design - you're scanning every partition for every query
Missing indexes - you excluded too much from indexing policy
Cross-partition queries - WHERE clause doesn't include partition key
Hot partitions - all data in one partition getting throttled

Debug steps that work:

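-- Check partition distribution (NoSQL API)
-- Replace c.partitionKey with your container's actual partition key property
-- Sort the results client-side: ORDER BY may be rejected alongside GROUP BY
SELECT c.partitionKey, COUNT(1) AS docCount
FROM c
GROUP BY c.partitionKey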

If one partition has like 10x more docs than others, you're pretty much screwed. Time to rebuild with a better partition key or find a new job.

Quick fixes:

Add partition key to WHERE clause
Check if you're doing SELECT * and returning huge documents
Look at query metrics - RU consumption > 100 for simple queries = problem
Increase RUs temporarily to see if it's just throttling

I'm getting 429 errors constantly. How do I fix this?

429 = "Too Many Requests" = you're being throttled

Immediate fixes:

Increase provisioned RUs (costs money but stops the bleeding)
Enable autoscale if you haven't
Implement retry logic in your app (SDKs do this automatically)
Check for hot partitions in Azure Monitor

Long-term fixes:

Redesign partition key if one partition is getting hammered
Optimize queries to consume fewer RUs
Spread traffic across multiple partition keys
Use bulk operations for multiple document operations

Reality check: If you're getting 429s during normal operation, your partition key probably sucks ass.
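The SDKs retry 429s for you, but if you're calling through your own wrapper or the built-in retries run out, the pattern looks roughly like this. It's a sketch under assumptions: `ThrottledError` is a hypothetical stand-in for whatever exception your client raises on a 429, and `with_retries` is not an SDK function.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an SDK's 429 error."""

    def __init__(self, retry_after_ms=None):
        super().__init__("429 Too Many Requests")
        self.retry_after_ms = retry_after_ms


def with_retries(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on ThrottledError with backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts:
                raise  # out of attempts: surface the throttle to the caller
            if err.retry_after_ms is not None:
                delay = err.retry_after_ms / 1000  # honor the server's hint
            else:
                # exponential backoff: base, 2x base, 4x base, ... plus jitter
                delay = base_delay * 2 ** (attempt - 1)
                delay += random.uniform(0, base_delay)
            sleep(delay)


# Usage: a fake operation that is throttled twice, then succeeds
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ThrottledError(retry_after_ms=10)
    return "ok"


result = with_retries(flaky_write, sleep=lambda seconds: None)
print(result, "after", calls["count"], "attempts")
```

Retries only paper over short bursts. If this code is firing constantly, go back and fix the partition key.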

Should I use Provisioned or Serverless? I'm confused as hell.

Provisioned Throughput:

Pay for RUs whether you use them or not
Cheaper if you have consistent traffic
Required for multi-region deployments
Can handle traffic spikes if you provision enough

Serverless:

Pay only for RUs you consume
2x more expensive per RU than Provisioned
Single region only
Gets expensive fast under sustained load

Use Serverless for dev/test environments. Use Provisioned for production unless your app gets fewer than 1,000 requests/day.

How much do operations actually cost in RUs?

What You're Doing	Document Size	RU Cost	Reality Check
Read by ID	1KB	1 RU	Only thing that's consistent
Write new doc	1KB	5-6 RUs	Higher with lots of indexes
Update existing	1KB	6-8 RUs	Depends on what changed
Simple query	10 results	5-15 RUs	Add partition key or pay more
Cross-partition scan	100 results	100-500 RUs	Expensive as hell
Complex aggregation	1000 docs	200-1000 RUs	Can bankrupt you during spikes

How to not waste RUs:

Always include partition key in WHERE clauses
Use bulk operations for multiple writes
Don't index fields you never query
Use point reads (by ID) whenever possible

My app randomly throws errors. What's happening?

Common Cosmos DB errors:

429 - Too Many Requests: You're being throttled

Fix: Increase RUs or fix partition key design

404 - Not Found: Document or container doesn't exist

Fix: Check database/container names and document IDs

400 - Bad Request: Malformed query or document

Fix: Check JSON structure and query syntax

503 - Service Unavailable: Cosmos DB is having issues

Fix: Implement retry logic and wait it out

RequestRateTooLarge: Same as 429, different name

Fix: Same as 429 - more RUs or better partition keys
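The list above boils down to one question: will retrying help? A tiny triage sketch, covering only the status codes mentioned here. The `triage` function and its return strings are made up for illustration.

```python
RETRYABLE = {429, 503}    # throttling and transient service issues
CLIENT_BUGS = {400, 404}  # retrying won't help; fix the request


def triage(status_code):
    """Map an HTTP status code to the action suggested in the list above."""
    if status_code in RETRYABLE:
        return "retry with backoff"
    if status_code in CLIENT_BUGS:
        return "fix the request"
    return "investigate"


for code in (429, 404, 503, 400):
    print(code, "->", triage(code))
```

Retrying a 400 or 404 just burns RUs and hides the bug, so fail fast on those and save the backoff for 429 and 503.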

I need to migrate from MongoDB/SQL. How screwed am I?

From MongoDB: Not terrible

Use Azure Database Migration Service for the data
Most MongoDB code works with minimal changes
Gotcha: Some MongoDB features aren't 100% compatible

From SQL Server: Pretty painful

You'll need to denormalize your relational data
Export to JSON and import, or use Azure Data Factory
Reality check: Plan for weeks of refactoring, not days

Migration checklist:

Test new partition key with realistic data volume
Measure RU consumption with actual query patterns
Plan for downtime during cutover (migration tools lie about zero downtime)
Have rollback plans ready

What consistency level should I actually use?

Session Consistency: Use this for 95% of applications

Users see their own writes immediately
Don't see other users' writes immediately (usually fine)
Best performance/consistency balance

Strong Consistency: Only for financial transactions

Everyone sees the same data always
Reads cost 2x RUs and you're limited to single-region writes
Required for payments, banking, inventory

Everything else: You probably don't need them

Eventual: For analytics and logging where "close enough" works
Bounded Staleness: Rarely needed in practice
Consistent Prefix: Even more rarely needed

Can I integrate with other Azure services?

Yes, and it's actually pretty good:

Azure Functions: Change feed triggers work great for real-time processing

Azure Search: Cosmos DB indexer gives you full-text search

Power BI: Direct Query works but can be slow with large datasets

Synapse Analytics: Synapse Link for analytics without killing production performance

Stream Analytics: Direct output to Cosmos DB for real-time data ingestion

Example that works:

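using System.Collections.Generic;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;

[FunctionName("ProcessUserChanges")]
public static void ProcessUserChanges(
    [CosmosDBTrigger(
        databaseName: "MyDB",
        collectionName: "Users",
        ConnectionStringSetting = "CosmosDBConnection",
        LeaseCollectionName = "leases",
        CreateLeaseCollectionIfNotExists = true)]
    IReadOnlyList<Document> docs)
{
    // Runs whenever documents are created or updated (the change feed skips deletes)
    // Great for cache invalidation, notifications, etc.
}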
