CockroachDB - PostgreSQL That Scales Horizontally

What CockroachDB Actually Is and When You'd Use It

Look, CockroachDB is basically PostgreSQL that got stretched across multiple machines and regions. As of September 2025, the latest available version is v25.3. But here's what nobody mentions upfront: they killed the open source version on November 18, 2024. Now it's all proprietary with a "free tier" that disappears when your company hits $10M revenue and requires sending them telemetry about everything you're doing. So factor that vendor lock-in into your decisions.

How the Distributed Thing Actually Works

CockroachDB splits into five layers that handle different parts of the database:

SQL Layer: Takes your PostgreSQL queries and breaks them down into smaller pieces that can be distributed. About 80% of your existing app will probably work without changes, but expect to hit some edge cases with the PostgreSQL features they don't support yet.

Transaction Layer: Uses timestamps and distributed consensus to make sure your transactions work across multiple machines in different regions. Slower than single-node PostgreSQL (2-10ms writes vs sub-millisecond), but it actually works reliably across continents.

Distribution Layer: Automatically splits your data into ranges and tries to keep related data together. Sometimes it gets this wrong and you'll spend a weekend figuring out why your queries are slow because related data ended up on different continents.

Replication Layer: Uses Raft consensus to keep 3+ copies of everything. When nodes die (and they will), the remaining nodes vote on who's in charge. Pretty solid, though you'll get paged when nodes go down.

Storage Layer: Pebble storage engine underneath handles the actual disk I/O. It's optimized for writes, which is good because distributed systems do a lot of writing to maintain consistency.

What Actually Makes This Worth the Complexity

Strong Consistency

CockroachDB gives you serializable isolation - concurrent transactions behave as if they ran one at a time, and each one either happens or it doesn't, with no weird edge cases where data appears and disappears. The ACID guarantees documentation explains how this works across regions. This is why you'd consider this over just running PostgreSQL read replicas.

Horizontal Scaling That Actually Works

Adding nodes to the cluster actually makes things faster and more reliable. Production clusters can scale to hundreds of nodes across continents. The auto-rebalancing system redistributes data when you add nodes, though "automatically" still means you'll spend time tuning and monitoring. Netflix's case study shows how they scaled to massive workloads.

Multi-Region Without the Headache

The multi-region features are where CockroachDB beats rolling your own solution:

Regional Tables: Pin tables to specific regions for compliance or latency
Global Tables: Put read-heavy reference data everywhere for fast local reads
Regional by Row: Automatically place rows based on content (like user location)

This saves you from building your own sharding logic and dealing with cross-region consistency problems. The multi-region deployment patterns guide shows configurations that work in production.
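
The three locality options above each map onto a single SQL statement. Here's a minimal sketch that just assembles those statements as strings so you can review them or feed them to a migration tool. The database name, table names, and region names are all made up for illustration, and region names have to match the regions your nodes were actually started with. Nothing here talks to a cluster.

```python
def multi_region_ddl(database, primary_region, other_regions,
                     regional_tables=None, global_tables=(), row_level_tables=()):
    """Build the SQL statements that configure table localities.

    Every database, table, and region name passed in is illustrative.
    """
    regional_tables = regional_tables or {}
    stmts = [f'ALTER DATABASE {database} SET PRIMARY REGION "{primary_region}"']
    stmts += [f'ALTER DATABASE {database} ADD REGION "{r}"' for r in other_regions]
    # Regional tables: the whole table is pinned to one region
    stmts += [f'ALTER TABLE {t} SET LOCALITY REGIONAL BY TABLE IN "{r}"'
              for t, r in regional_tables.items()]
    # Global tables: read-heavy reference data, fast local reads everywhere
    stmts += [f"ALTER TABLE {t} SET LOCALITY GLOBAL" for t in global_tables]
    # Regional by row: each row lives in the region recorded for that row
    stmts += [f"ALTER TABLE {t} SET LOCALITY REGIONAL BY ROW" for t in row_level_tables]
    return stmts


# Hypothetical setup: two regions, one table of each locality type
for stmt in multi_region_ddl(
    "app", "us-east1", ["europe-west1"],
    regional_tables={"orders_eu": "europe-west1"},
    global_tables=["currencies"],
    row_level_tables=["users"],
):
    print(stmt + ";")
```

Run the database-level statements first; the table-level ones fail if the database has no primary region yet.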

Self-Healing (Mostly)

When things break, CockroachDB usually fixes itself:

Node failures: Other nodes take over automatically
Load rebalancing: Moves data around to optimize performance
Maintenance tasks: Handles compaction and cleanup without downtime
Rolling upgrades: Can upgrade without taking the cluster down

That said, "self-healing" doesn't mean zero-ops. You'll still get alerts when nodes die and need to understand what's happening.

PostgreSQL Compatibility (Pretty Good, Not Perfect)

CockroachDB's PostgreSQL wire protocol compatibility means your existing PostgreSQL tools and drivers work. Most common features are supported:

Standard SQL (the important stuff from ANSI SQL 2016)
Common data types (JSON, arrays, UUIDs)
Indexes (B-tree, partial, expression-based)
Basic stored procedures and functions
Foreign keys and constraints
Views and materialized views

The compatibility is good enough that you can often point your app at CockroachDB and it'll work. But you'll find PostgreSQL features that aren't supported, especially the more exotic ones.

Performance Reality Check

CockroachDB is built for OLTP workloads, not analytics. Here's what you actually get:

Local reads: Sub-millisecond if data is nearby
Cross-region transactions: 50-100ms if you're lucky, can be higher
Write latency: 2-10ms because of consensus overhead
Throughput: Scales linearly as you add nodes (if you design your schema right)

It works best with normalized schemas where related data stays together. If you need heavy analytics, just use something else or stream the data out to a data warehouse.
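
The latency numbers above fall out of how consensus works: a write commits once a majority of replicas have it, so the floor on write latency is the round trip to the nearest followers that complete a quorum. Here's a rough sketch of that arithmetic. The round-trip times are invented for illustration, and the function ignores disk sync, SQL processing, and multi-range transactions, so real numbers will be higher.

```python
def quorum_commit_floor_ms(follower_rtts_ms):
    """Lower bound on commit latency for one Raft group.

    follower_rtts_ms: round-trip times from the leader to each follower.
    The leader counts toward the majority, so it needs acks from
    floor(n / 2) followers, where n is the total replica count.
    """
    n = len(follower_rtts_ms) + 1
    acks_needed = n // 2
    if acks_needed == 0:
        return 0.0  # single replica: no network round trip needed
    # The commit waits for the slowest of the fastest `acks_needed` followers
    return float(sorted(follower_rtts_ms)[acks_needed - 1])


# Illustrative numbers, not measurements
same_region = quorum_commit_floor_ms([1.0, 1.5])     # all 3 replicas close together
one_nearby = quorum_commit_floor_ms([2.0, 80.0])     # one follower nearby, one far
all_remote = quorum_commit_floor_ms([70.0, 140.0])   # every follower in another region

print(same_region, one_nearby, all_remote)
```

The middle case is why replica placement matters so much: one nearby follower is enough to keep writes fast, even if the third replica sits on another continent.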

Licensing and Pricing (The Expensive Part)

Here's the licensing situation as of November 2024:

CockroachDB Enterprise Free: "Free" for companies under $10M revenue, but they get telemetry on everything you do and you need annual renewal. They'll literally know more about your database usage than you do.

CockroachDB Enterprise: The paid version for actual companies. Pricing is "contact sales" which translates to "we'll charge what we think you can afford" based on how desperate you look during negotiations.

CockroachDB Cloud: Fully managed on AWS, Google Cloud, and Azure. Convenient but prepare for sticker shock - you're paying for their ops team plus markup on cloud resources.

The free tier disappears the moment you hit $10M revenue, so plan your business growth carefully. Once you're locked into their ecosystem, they control the pricing and there's no open source escape hatch anymore.

CockroachDB vs. The Alternatives You're Actually Considering

Category/Attribute	CockroachDB	PostgreSQL	MongoDB	Cassandra	TiDB
Architecture	Distributed SQL	Single-node SQL	Document NoSQL	Wide-column NoSQL	Distributed SQL
Consistency Model	Strong (works but slow)	Strong (single-node)	Eventual (data corruption roulette)	Eventual	Strong
ACID Transactions	Full distributed ACID	Full (single-node)	Limited multi-document	Nope	Full distributed ACID
SQL Compatibility	PostgreSQL wire protocol	Native PostgreSQL	MongoDB Query Language	CQL (not SQL)	MySQL compatible
Horizontal Scaling	Automatic (mostly)	Manual read replicas	Automatic	Automatic	Automatic
Multi-Region Support	Native geo-distribution	Manual sharding hell	Replica sets	Multi-datacenter	Manual configuration
CAP Theorem	CP (Consistency + Partition tolerance)	CA (single-node)	AP (Availability + Partition)	AP	CP
Data Model	Relational tables	Relational tables	Documents (JSON/BSON)	Column families	Relational tables
Query Language	Standard SQL	Standard SQL	MQL + aggregation pipeline	CQL	Standard SQL
Schema Flexibility	Fixed schema	Fixed schema	Schema-less (schema-chaos)	Schema-less	Fixed schema
Read Latency	Sub-ms (local), 50-200ms (cross-region)	Sub-ms	Sub-ms	Sub-ms	Sub-ms
Write Latency	2-10ms (consensus tax)	Sub-ms	Sub-ms	Sub-ms	5-15ms
Max Cluster Size	1000+ nodes (if you have the budget)	1 primary + replicas	100+ shards	1000+ nodes	100+ nodes
Concurrent Connections	10,000+ per node	100-400 (pgbouncer required)	65,000+	High	4,000+ per node
What It's Good For	Global OLTP	Everything PostgreSQL	Document storage	Time-series, logs	OLTP with analytics
Setup Complexity	Medium (distributed is hard)	Low (one database)	Medium	High (good luck)	High
Day-to-Day Maintenance	Low (mostly self-healing)	Medium	Medium	High (constant tuning)	Medium
Monitoring	Built-in web UI + metrics	External tools required	MongoDB Compass + tools	External tools required	Built-in dashboard
Backup/Recovery	Built-in distributed backup	pg_dump/pg_restore	mongodump/mongorestore	Nodetool + prayers	Built-in tools
Upgrades	Rolling, zero-downtime	Planned downtime (usually)	Rolling upgrades	Rolling upgrades	Rolling upgrades
When Things Break	Usually self-heals	You fix it	Usually self-heals	You're on your own	Mixed bag
Global Apps Needing ACID	Perfect fit	Don't try this	Nope	Nope	Maybe
Financial/Banking	Good choice	Single region only	Hell no	Hell no	Could work
E-commerce	Overkill unless global	Perfect for most cases	Fine for catalogs	Not for transactions	Overkill
Time-series/IoT	Wrong tool	Wrong tool	Okay	Built for this	Wrong tool
Content/CMS	Overkill	Perfect	Built for this	Wrong tool	Overkill
Analytics	Don't	With extensions, yes	Limited	No	Purpose-built
Getting Started	Free tier (with limits)	Free forever	Free community version	Free	Free
Production	$$$$ (contact sales)	$$ (RDS pricing)	$$ (Atlas pricing)	$$ (infrastructure only)	$$
Enterprise Support	$$$$$	$$ (many vendors)	$$$	$$	$$
Hidden Costs	Vendor lock-in risk	Operational complexity	Query complexity	Operational nightmare	Learning curve
When You're Stuck	They own you	Postgres everywhere	Many alternatives	Good luck migrating	TiDB specific

Actually Deploying CockroachDB (The Hard Parts Nobody Tells You)

So you've decided to take the plunge into distributed SQL hell. Here's what actually happens when you try to deploy CockroachDB in the real world, not the marketing demos.

Development vs. Production Reality

Local Development (Actually Works)

Development setup is surprisingly smooth, unlike most distributed systems. The Docker approach works and you can have a three-node cluster running locally:

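# This actually works and is useful for testing (run once first: docker network create roachnet)
docker run -d --name=roach1 --hostname=roach1 --net=roachnet -p 26257:26257 \
  cockroachdb/cockroach:v25.2.0 start --insecure --join=roach1,roach2,roach3
# Repeat for roach2 and roach3 (drop the -p mapping), then initialize the cluster once:
docker exec -it roach1 ./cockroach init --insecure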

The CockroachDB Cloud free tier is decent for development - 10 GiB storage and 250M compute units monthly. Good for testing whether your app will implode when you move to distributed.

Pro tip: Use --logtostderr=WARNING or you'll drown in logs. Found this out after spending 2 hours wondering why my disk was full.

Production Deployment (Where Dreams Go to Die)

Production is where you learn that distributed systems are hard:

Node Placement: You need at least 3 nodes across different failure domains. Cross-region setups need careful latency planning - 100ms+ between regions will hurt performance. Don't put nodes too far apart unless you enjoy 30-second transaction timeouts. The cluster topology guide explains placement strategies.

Hardware Sizing: Don't cheap out. Recommended specs are 16+ GB RAM and SSD storage. CockroachDB needs more resources than PostgreSQL due to consensus overhead. Expect 2-3x the hardware costs; we learned this when our "adequate" cluster died under production load. Check the capacity planning guide and performance benchmarks for realistic sizing.

Network: Network partitions will test your system design. Cross-region deployments need dedicated connections if you want consistent performance. We tried saving money with regular internet connections and spent 3 months debugging split-brain scenarios and connection refused errors. The network requirements documentation covers bandwidth and latency requirements.

Common Error You'll See: restart transaction: TransactionRetryWithProtoRefreshError happens more often than you'd expect. Your app needs retry logic or users will see random failures. This isn't PostgreSQL where transactions rarely conflict.
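
Here's roughly what that retry logic looks like. CockroachDB reports these errors with SQLSTATE 40001, so the sketch retries only on that code and re-raises everything else. The `txn_fn` callable is hypothetical: it stands in for whatever function opens your transaction, does the work, and commits. The attempt count and delays are arbitrary starting points, not tuned values.

```python
import random
import time

RETRYABLE_SQLSTATE = "40001"  # serialization failure: the client should retry


def run_with_retry(txn_fn, max_attempts=5, base_delay_s=0.05, sleep=time.sleep):
    """Call txn_fn() until it succeeds or the attempts run out.

    txn_fn is a hypothetical callable that runs one whole transaction.
    It must be safe to run more than once.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except Exception as exc:
            # psycopg2 exposes the code as .pgcode, psycopg 3 as .sqlstate
            code = getattr(exc, "pgcode", None) or getattr(exc, "sqlstate", None)
            if code != RETRYABLE_SQLSTATE or attempt == max_attempts:
                raise
            # Exponential backoff with jitter so retries don't collide again
            delay = base_delay_s * (2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.5))
```

The important design point is that the whole transaction gets re-run, not just the statement that failed, so keep side effects like sending emails out of `txn_fn`.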

Schema Design (Get This Wrong and Suffer)

Don't Design Like PostgreSQL

CockroachDB punishes bad schema design more than single-node databases:

Primary Keys: Auto-incrementing IDs create hotspots where all writes hit the same node. Use compound keys with a distributed first column (user_id, tenant_id). This can take months to learn through experience. Hash-sharded indexes help but check your version - some earlier versions had issues with them. The primary key design guide covers anti-patterns to avoid.

Table Locality: Use regional tables for region-specific data and global tables for reference data. Get this wrong and your cross-region queries will be painfully slow. The multi-region table patterns documentation shows common configurations.

Indexes: Covering indexes are crucial to avoid extra lookups across nodes. More important than in PostgreSQL because network calls are expensive. Read about index best practices before going to production.
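
To see why the auto-incrementing primary keys mentioned above create hotspots, remember that each range holds a contiguous span of the key space. This toy simulation uses made-up split points (real clusters split by size and load) and random UUIDs as a stand-in for any well-distributed leading key column. It isn't how CockroachDB routes writes internally, just the same idea at small scale.

```python
import bisect
import uuid
from collections import Counter


def range_for_key(key, split_points):
    """Index of the range that owns `key`, given sorted split points."""
    return bisect.bisect_right(split_points, key)


def write_distribution(keys, split_points):
    """Count how many writes land in each range."""
    return Counter(range_for_key(k, split_points) for k in keys)


# Sequential IDs: old data was split at 250/500/750, new rows get IDs 1000 and up
sequential = write_distribution(range(1000, 2000), [250, 500, 750])

# Random UUIDs as integers: split points spaced evenly over the 128-bit key space
uuid_splits = [(2 ** 128) * i // 4 for i in (1, 2, 3)]
random_keys = write_distribution((uuid.uuid4().int for _ in range(1000)), uuid_splits)

print("sequential:", dict(sequential))    # every write lands in the last range
print("random:    ", dict(random_keys))   # writes spread across all four ranges
```

Every sequential insert sorts after all existing keys, so the last range (and the node holding its leaseholder) takes all the write traffic while the others sit idle.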

Multi-Tenancy (One Thing CockroachDB Does Well)

Multi-tenancy is where CockroachDB shines compared to PostgreSQL:

Row-level Tenancy: Put tenant_id in your primary key and use row-level security. Works better than trying to shard PostgreSQL manually. Just don't forget the tenant_id in your queries or you'll scan the entire cluster.

Regional by Row: REGIONAL BY ROW automatically puts tenant data in their home region. Brilliant for global SaaS apps, though setting it up correctly takes some trial and error.

Performance Optimization (Required Reading)

Query Optimization (Or How to Not Hate Your Database)

Avoid Cross-Region Joins: These will destroy your performance. Denormalize data if needed to keep related stuff together. Foreign keys across regions are performance killers.

Batch Operations: Single-row operations are slow. Batch your inserts/updates or watch your app crawl. CockroachDB handles large batches well, unlike some distributed databases.

Isolation Levels: Serializable isolation is the default and the slowest, since conflicting transactions get retried. Use read committed if you don't need the strongest guarantees. Your app will be faster and you'll sleep better.
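
For the batching advice above, here's a minimal sketch of turning a pile of rows into multi-row INSERT statements. It assumes a psycopg-style driver that takes `%s` placeholders, and the table and column names are hypothetical and must be trusted identifiers, not user input. The batch size of 500 is an arbitrary starting point to tune against your own workload.

```python
def batched_inserts(table, columns, rows, batch_size=500):
    """Yield (sql, params) pairs, each inserting up to batch_size rows.

    Uses %s placeholders as psycopg-style drivers expect. Table and
    column names are interpolated directly, so never pass user input.
    """
    col_list = ", ".join(columns)
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        values = ", ".join([row_placeholder] * len(chunk))
        sql = f"INSERT INTO {table} ({col_list}) VALUES {values}"
        # Flatten the chunk so parameters line up with the placeholders
        params = [value for row in chunk for value in row]
        yield sql, params


# Hypothetical usage: 5 rows in batches of 2 gives 3 statements
rows = [(i, f"user{i}") for i in range(5)]
for sql, params in batched_inserts("users", ["id", "name"], rows, batch_size=2):
    print(sql, params)
```

Each yielded pair goes to one `cursor.execute(sql, params)` call, which costs one round trip per batch instead of one per row.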

Monitoring (Set This Up First)

The built-in web UI is actually good and the Prometheus integration works:

Key Metrics: Watch SQL query latency, replica lag, and resource utilization. More metrics than you'll ever need, but focus on the ones that indicate your app is dying.

Query Performance: EXPLAIN ANALYZE works differently than PostgreSQL but shows you where distributed queries are spending time. Essential for debugging slow queries.

Migration Reality Check

From PostgreSQL (Your Most Likely Path)

Compatibility: About 80% of PostgreSQL works. Check the compatibility matrix for the features you actually use. Budget time for rewrites.

Schema Changes: You'll need to redesign your primary keys and think about data locality. This isn't optional if you want good performance.

App Changes: Add retry logic for transaction conflicts. Distributed systems have more contention than single-node databases. Your app will need to handle this gracefully or users will complain.

Migration gotcha: pg_dump output won't work directly. CockroachDB's IMPORT statement is picky about CSV formats and will fail silently on edge cases. I've seen imports missing rows because of embedded newlines in text fields.

From NoSQL (Only If You're Desperate)

Data Modeling: You'll need to transform your documents into normalized tables. CockroachDB has JSON support but you lose the flexibility that made you choose NoSQL originally.

Query Translation: NoSQL query patterns don't translate well to SQL. Plan for significant rewrites if you used complex aggregation pipelines or document-specific features.

Consistency: The one good reason to migrate - you get real ACID transactions instead of eventual consistency disasters. But it's a major project, not a weekend migration.

When You Actually Need This Thing

The Reality Check

Backups: Cross-region backups actually work well, which is more than you can say for most distributed databases.

Security: Encryption, auth, and audit logging don't suck. They actually spent time getting this right instead of shipping it broken.

Operations: The self-healing stuff works most of the time. When it doesn't, you'll need someone who understands distributed systems to figure out what's broken.

Look, CockroachDB makes sense if you actually need global distribution with strong consistency. Most apps don't - PostgreSQL with read replicas is simpler and cheaper. But if you're building a global financial platform or multi-region gaming backend where eventually consistent data would fuck up your business, CockroachDB beats building your own distributed system from scratch. Just know what you're signing up for in terms of complexity and costs.

Real Questions About CockroachDB

Should I actually use CockroachDB instead of PostgreSQL?

Probably not. If you're asking this question, PostgreSQL is likely fine for your use case. CockroachDB makes sense when you need global distribution with strong consistency guarantees: think multi-region financial apps or global gaming platforms. For most CRUD apps, even large ones, PostgreSQL with read replicas is simpler and cheaper.

How much slower is CockroachDB compared to regular PostgreSQL?

Expect 2-10ms write latency vs sub-millisecond for PostgreSQL due to consensus overhead. Cross-region transactions can take 50-200ms depending on your setup. It's not dramatically slower for most workloads, but it's definitely not faster. The trade-off is worth it only if you need the distributed features.

Will my PostgreSQL app work with CockroachDB?

About 80% of it will work without changes. The wire protocol compatibility is good, but you'll hit PostgreSQL features that CockroachDB doesn't support. Plan to spend time finding and fixing these incompatibilities. Also, you'll need to add retry logic because distributed systems have more transaction conflicts than single-node databases.

What happens when nodes fail?

Usually, nothing bad. CockroachDB keeps 3+ replicas of everything, and data stays available as long as a majority of its replicas are alive, so with the default three replicas losing one node is fine. The remaining nodes vote on a new leader and keep going. You'll get alerts about the dead nodes, and you should replace them eventually, but the database stays up. It's actually pretty solid in this regard. But when it goes wrong: I've seen a 3-node cluster lose 2 nodes during a data center power failure. Got unavailable: majority of replicas are down errors until the nodes came back online. The surviving node couldn't serve reads or writes. Always run at least 5 nodes in production across multiple zones, and raise the replication factor to 5 if you need to survive two simultaneous failures.
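
The 3-node story above is just majority math. A quick sketch, assuming every range has the same number of replicas and failures hit distinct replicas:

```python
def tolerated_failures(replicas):
    """How many replicas can fail while a majority still survives."""
    return (replicas - 1) // 2


def is_available(replicas, failed):
    """True if the surviving replicas form a strict majority."""
    return (replicas - failed) > replicas // 2


for n in (3, 5, 7):
    print(f"{n} replicas: survives {tolerated_failures(n)} failure(s)")

# The outage described above: 3 replicas, 2 lost, 1 survivor is not a majority
print(is_available(3, 2))
```

Note that this counts replicas, not nodes. Adding nodes without raising the replication factor spreads the data out but doesn't change how many simultaneous failures a single range can take.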

How much does this actually cost?

That depends on how much they think you can pay. The "contact sales" pricing model means they'll negotiate based on your size and desperation. Expect it to be more expensive than managed PostgreSQL. The free tier works for smaller companies (under $10M revenue) but comes with telemetry requirements.

Wait, didn't CockroachDB used to be open source?

Yeah, they announced the end of the open source version in August 2024, and the change took effect on November 18, 2024. Now it's all proprietary under the "CockroachDB Software License." The free tier is for companies under $10M revenue and requires telemetry. Once you grow beyond that, you're paying their enterprise prices. Classic vendor lock-in move.

How do I design schemas that don't suck in CockroachDB?

Avoid auto-incrementing primary keys: they create hotspots where all writes hit the same node. Use compound keys with a distributed first column (like user_id). Keep related data together using table locality settings. And for the love of all that is holy, avoid cross-region JOINs if you care about performance. Real example: I've seen a users table with auto-incrementing ID cause most writes to hit one node. Switching to (tenant_id, user_id) as primary key distributed load better and reduced write latency significantly.

Should I migrate from MongoDB to CockroachDB?

Only if you need real ACID transactions and are tired of MongoDB's eventual consistency problems. CockroachDB has decent JSON support, but you'll lose MongoDB's aggregation pipeline and need to rewrite complex queries. It's a significant migration, not a drop-in replacement.

What's the monitoring situation like?

The built-in web UI is actually pretty good for cluster health and query performance. It exports Prometheus metrics, so you can plug it into your existing Grafana setup. You'll get alerts when nodes die or performance tanks. The query execution stats are detailed enough to figure out what's eating your performance budget.

How complicated is multi-region setup?

It's surprisingly not terrible. You can pin tables to regions, replicate reference data globally, or let CockroachDB automatically place rows based on content. The hard part is designing your schema so related data stays in the same region, because cross-region JOINs will kill your performance.

What about backups and disaster recovery?

Built-in distributed backups work well and can be scheduled automatically. You can do point-in-time recovery and restore individual databases. For real-time replication, changefeeds stream data to external systems. It's better than rolling your own distributed backup system.

How painful is migration from PostgreSQL?

Migration pain depends on how exotic your PostgreSQL features are. Basic stuff works fine, but you'll hit unsupported features and need to rewrite those parts. Plan for schema changes to avoid hotspots. Budget extra time for testing, because distributed systems have different failure modes than single-node databases.

Is this good for analytics?

No. CockroachDB is built for OLTP, not analytics. For heavy analytical workloads, use it as the transactional system and stream data to a dedicated analytics platform. Don't try to make CockroachDB your data warehouse; you'll hate life.

Why does my node keep crashing with out-of-memory errors?

CockroachDB is memory-hungry and doesn't handle OOM conditions gracefully. Common causes:

Too many client connections: Default connection limits can exhaust memory. Use connection pooling with pgbouncer.
Large transactions: Importing 10GB in a single transaction will kill your node. Batch your operations.
Cache misconfiguration: The default cache settings assume you have tons of RAM. Tune --cache and --max-sql-memory flags.

Real example: Got signal: killed in logs with no other error. Turned out the Linux OOM killer was murdering our nodes because we didn't tune memory settings for our 8GB instances.
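
When tuning --cache and --max-sql-memory, it helps to sanity-check the numbers before restarting nodes. The sketch below assumes a rule of thumb that (2 * max-sql-memory) + cache should stay at or under 80% of system RAM. Treat that threshold as an assumption and check the production checklist for your version before relying on it. The default fractions in the function are illustrative, not recommendations.

```python
def memory_budget(total_ram_gib, cache_fraction=0.25, sql_fraction=0.25):
    """Check a --cache / --max-sql-memory pair against a rule of thumb.

    Assumed rule: (2 * max-sql-memory) + cache <= 80% of system RAM.
    Both fractions are of total RAM.
    """
    cache_gib = total_ram_gib * cache_fraction
    sql_gib = total_ram_gib * sql_fraction
    used_gib = 2 * sql_gib + cache_gib
    limit_gib = 0.8 * total_ram_gib
    return {
        "flags": f"--cache={cache_fraction} --max-sql-memory={sql_fraction}",
        "used_gib": used_gib,
        "limit_gib": limit_gib,
        "fits": used_gib <= limit_gib,
    }


# The 8GB instances from the example above
print(memory_budget(8))               # fits under the assumed limit
print(memory_budget(8, 0.35, 0.35))   # does not fit
```

On small instances there isn't much headroom either way, which is a decent argument for the 16+ GB sizing in the first place.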
