MongoDB Performance Tuning: AI-Optimized Technical Reference
CRITICAL CONFIGURATION SETTINGS
Database Profiler Configuration
- Default State: OFF (fatal for production debugging)
- Production Setting:
db.setProfilingLevel(1, { slowms: 100 })
- Level 0: Disabled (dangerous default)
- Level 1: Log slow queries only (recommended)
- Level 2: Log everything (records every operation - severe overhead and rapid profile/log growth, avoid in production)
WiredTiger Cache Configuration
- Default: 50% RAM minus 1GB (too conservative for dedicated servers)
- Production Recommended: 70-80% of available RAM
- Configuration Command:
db.adminCommand({setParameter: 1, "wiredTigerEngineRuntimeConfig": "cache_size=20GB"})
- Critical Warning: Do not modify checkpoint intervals unless experienced - data corruption risk during power outages
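To make the sizing guidance concrete, here is a minimal sketch that compares MongoDB's default cache formula with the 70-80% recommendation above. The helper name and the 75% default fraction are illustrative assumptions, not MongoDB settings.

```javascript
// Sketch: compare the WiredTiger default cache size with the document's
// 70-80% guidance for a dedicated server. `recommendedCacheGB` is a
// hypothetical helper; 0.75 is an assumed midpoint of the 70-80% range.
function recommendedCacheGB(totalRamGB, fraction = 0.75) {
  // MongoDB default: 50% of (RAM - 1GB), with a 256MB floor
  const mongoDefault = Math.max(0.25, (totalRamGB - 1) * 0.5);
  const recommended = totalRamGB * fraction;
  return { mongoDefault, recommended };
}

const { mongoDefault, recommended } = recommendedCacheGB(32);
console.log(mongoDefault);  // 15.5 (GB)
console.log(recommended);   // 24 (GB)
```

On a dedicated 32GB server the default leaves roughly 8.5GB of cache headroom unused, which is the gap the recommendation targets.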
MongoDB Version-Specific Issues
- MongoDB 7.0: Contains performance regression reducing concurrent transactions from 128 to 8
- Emergency Fix for 7.0:
db.adminCommand({ setParameter: 1, storageEngineConcurrentWriteTransactions: 128, storageEngineConcurrentReadTransactions: 128 })
- MongoDB 8.0: Fixes 7.0 issues but requires extensive staging testing
- Recommendation: Skip 7.0 entirely, upgrade 6.0 → 8.0 directly
PERFORMANCE ANALYSIS TOOLS
Query Analysis Commands
// Check profiler status
db.getProfilingStatus()
// Find slowest queries
db.system.profile.find().sort({ millis: -1 }).limit(5).pretty()
// Query execution analysis
db.collection.find(query).explain("executionStats")
// Current connections
db.serverStatus().connections
Critical Metrics to Monitor
- millis: Query execution time
- planSummary: Index usage (IXSCAN = index scan, good; COLLSCAN = full collection scan, bad)
- docsExamined vs docsReturned: Efficiency ratio (high ratio indicates missing indexes)
- totalDocsExamined: High values indicate performance problems
- executionTimeMillis: Query duration
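The examined-vs-returned ratio above can be sketched as a small check over explain output. The function name and the 100:1 threshold are illustrative assumptions; the field names match `executionStats` from `explain("executionStats")`.

```javascript
// Sketch: flag inefficient queries from explain("executionStats") output.
// `scanEfficiency` is a hypothetical helper; the 100:1 threshold is an
// assumed heuristic, not a MongoDB default.
function scanEfficiency(executionStats) {
  const { totalDocsExamined, nReturned } = executionStats;
  const ratio = nReturned === 0 ? Infinity : totalDocsExamined / nReturned;
  return { ratio, suspicious: ratio > 100 };
}

// Example shape from db.collection.find(query).explain("executionStats").executionStats
const stats = { totalDocsExamined: 50000, nReturned: 20 };
console.log(scanEfficiency(stats)); // { ratio: 2500, suspicious: true }
```

A ratio of 2500:1 means MongoDB examined 2,500 documents for every one it returned - a strong signal of a missing or mis-ordered index.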
INDEX OPTIMIZATION MATRIX
Index Type | Use Cases | Storage Overhead | Write Performance Impact | Critical Warnings |
---|---|---|---|---|
Single Field | Simple lookups (user_id, email) | ~10% of collection | Minimal | Creating too many reduces write performance
Compound | Multi-field queries | 15-25% of collection | Moderate | Field order critical - wrong order = useless |
Text | Full-text search | 30-50% of collection | Severe write degradation | Avoid unless no alternative exists |
Geospatial (2dsphere) | Location queries | 15-25% of collection | Moderate | Works for 2D only, avoid 3D altitude indexing |
TTL | Auto-expiring data | Minimal | Background cleanup overhead | Incorrect TTL values delete live data |
Sparse | Optional fields with nulls | Significantly lower | Improves writes | Only beneficial for sparse data |
Partial | Filtered datasets | Much lower | Better write performance | Complex filters may be ignored by optimizer |
Hashed | Sharding shard keys | Standard | Standard | Prevents range queries and sorting |
Index Quantity Guidelines
- 0-5 indexes: Usually acceptable
- 6-15 indexes: Monitor write performance degradation (10-15% per index)
- 15+ indexes: Likely over-indexed, review necessity
- Rule: One compound index better than multiple single-field indexes
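The compound-over-single rule can be sketched in mongosh via index prefix matching. Collection and field names here are illustrative, not from the original.

```javascript
// Sketch (mongosh): one compound index can serve several query shapes
// through prefix matching, replacing multiple single-field indexes.
db.orders.createIndex({ customer_id: 1, status: 1, date: -1 })

// Served by the index (each query uses a prefix of the key pattern):
db.orders.find({ customer_id: 42 })
db.orders.find({ customer_id: 42, status: "completed" })
db.orders.find({ customer_id: 42, status: "completed" }).sort({ date: -1 })

// NOT served (skips the leading field - falls back to COLLSCAN or another index):
db.orders.find({ status: "completed" })
```

Note the field order: equality fields first, then the sort field - reversing it makes the index useless for these queries.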
CONNECTION POOL CONFIGURATION
Application-Side Limits
- Node.js: 15 connections per instance maximum
- Python: 10 connections per process maximum
- Java: 50-100 connections per server acceptable
Connection Pool Settings
const client = new MongoClient(uri, {
maxPoolSize: 15,
maxIdleTimeMS: 300000, // 5 minutes
serverSelectionTimeoutMS: 10000,
socketTimeoutMS: 45000
});
Server-Side Configuration
// Check current connections
db.serverStatus().connections
// Configuration limit (mongod.conf, YAML)
net:
  maxIncomingConnections: 2000
AGGREGATION PIPELINE OPTIMIZATION
Critical Performance Rules
- $match First Rule: Always place `$match` stages at the beginning of the pipeline
- Memory Limit: Blocking stages are capped at 100MB of RAM each; exceeding it kills the aggregation
- Emergency Setting: Use `allowDiskUse: true` for large operations (expect slow performance)
Optimized Pipeline Structure
// CORRECT - Filter first
db.orders.aggregate([
{ $match: { status: "completed", date: { $gte: lastMonth } } }, // Reduce dataset first
{ $lookup: { from: "customers", ... } }, // Join smaller dataset
{ $group: { _id: "$customer_id", total: { $sum: "$amount" } } }, // Process fewer documents
{ $sort: { total: -1 } } // Sort final results
])
// INCORRECT - Process everything then filter
db.orders.aggregate([
{ $lookup: { from: "customers", ... } }, // Join all data
{ $group: { ... } }, // Process everything
{ $match: { status: "completed" } }, // Filter after damage done
{ $sort: { total: -1 } }
])
PRODUCTION INFRASTRUCTURE REQUIREMENTS
Hardware Specifications
- Storage: NVMe SSDs required (SATA SSDs minimum, spinning disks unsuitable)
- CPU: More cores = better concurrency (4 cores minimum, 8+ cores recommended)
- Memory: Working set must fit in cache (non-negotiable for performance)
Atlas Tier Performance Reality
- M10-M30: Lower tiers (M10/M20 run on shared, burstable CPU), unpredictable performance due to noisy neighbors
- M40-M60: Dedicated instances, predictable performance baseline
- M80+: High performance tier, significant cost increase
Atlas Pricing Context (2025)
- M10: ~$60/month, 2GB RAM - development only
- M30: ~$285/month, 8GB RAM - small production minimum
- M50: ~$580/month, 16GB RAM - typical business deployment
- M80: >$1000/month, 32GB RAM - serious applications
- Auto-scaling Warning: Set upper limits to prevent unexpected bills (cases of $15,000-$30,000 surprise costs)
CRITICAL FAILURE SCENARIOS
Collection Scan Disasters
- Symptom: `planSummary: "COLLSCAN"` in explain output
- Impact: Query scans the entire collection; performance degrades linearly with data growth
- Root Cause: Missing indexes or incorrect field order in compound indexes
- Detection:
db.system.profile.find({"planSummary": /COLLSCAN/}).count()
Connection Pool Exhaustion
- Symptom: MongoNetworkTimeoutError, connection refused errors
- Impact: New users cannot connect, application failure
- Root Cause: Application creates new connections per request instead of pooling
- Prevention: Monitor `db.serverStatus().connections.current` during deployments
Memory Pressure Scenarios
- Cache Pressure: Working set exceeds available cache memory
- Impact: Performance drops to disk I/O speeds
- Detection: Cache hit ratio below 95%
- Resolution: Increase RAM or reduce working set size
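The cache-hit-ratio check above can be sketched from `serverStatus()` counters. The field names follow `db.serverStatus().wiredTiger.cache`; the helper name is illustrative and the 95% target is the document's threshold.

```javascript
// Sketch: estimate the WiredTiger cache hit ratio from serverStatus()
// counters. A hit ratio below ~0.95 indicates cache pressure.
function cacheHitRatio(cache) {
  const readIn = cache["pages read into cache"];        // cache misses (disk reads)
  const requested = cache["pages requested from the cache"];
  if (!requested) return 1;                             // no traffic yet
  return 1 - readIn / requested;
}

// Example shape from db.serverStatus().wiredTiger.cache
const sample = {
  "pages read into cache": 2500,
  "pages requested from the cache": 10000,
};
console.log(cacheHitRatio(sample)); // 0.75 - well below the 95% target, cache pressure
```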
Replication Lag Issues
- Threshold: Lag above 10 seconds indicates serious problems
- Impact: Read replicas serve stale data, backup integrity concerns
- Causes: Underpowered secondary hardware, network latency, massive write spikes
- Monitoring: Compare `optimeDate` across `rs.status().members` (primary minus each secondary)
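The optimeDate comparison can be sketched as follows. The member objects are reduced to the fields used here; the helper name is illustrative.

```javascript
// Sketch: compute per-secondary replication lag from rs.status() output.
// `replicationLagSeconds` is a hypothetical helper; member objects are
// reduced to the name/stateStr/optimeDate fields actually used.
function replicationLagSeconds(members) {
  const primary = members.find(m => m.stateStr === "PRIMARY");
  return members
    .filter(m => m.stateStr === "SECONDARY")
    .map(m => ({
      name: m.name,
      // Date subtraction yields milliseconds
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

const members = [
  { name: "db1:27017", stateStr: "PRIMARY",   optimeDate: new Date("2025-01-01T00:00:30Z") },
  { name: "db2:27017", stateStr: "SECONDARY", optimeDate: new Date("2025-01-01T00:00:18Z") },
];
console.log(replicationLagSeconds(members)); // [ { name: 'db2:27017', lagSeconds: 12 } ]
```

A 12-second lag is past the 10-second threshold above and warrants investigation.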
PRODUCTION MONITORING CHECKLIST
Essential Metrics
- Query time p95: Keep under 100ms for user-facing queries
- Index hit ratio: Maintain above 95%
- Connection count: Track for leak detection
- Replication lag: Keep under 2 seconds
- WiredTiger cache hit ratio: Target above 95%
Performance Degradation Triggers
- Collection size milestones:
- Under 1M documents: Fast writes
- 1M-10M documents: Noticeable slowdown begins
- 10M-100M documents: Index maintenance becomes significant cost
- 100M+ documents: Consider sharding or archiving
Atlas-Specific Monitoring
- Performance Advisor: Automatically identifies slow queries and suggests indexes
- Auto-scaling limits: Set maximum cluster size to prevent cost explosions
- Read preference configuration: Verify secondary lag acceptable for read workloads
COMMON PRODUCTION DISASTERS AND SOLUTIONS
Text Index Creation Disaster
- Scenario: Text index created on production collection
- Impact: 18-hour build time, $15,000 compute bill, feature never shipped
- Prevention: Create text indexes during maintenance windows only
Collection Scan from Development Query
- Scenario: `db.users.find({})` executed on millions of documents
- Impact: Hours-long execution, API downtime
- Root Cause: Query worked in development with 10 documents
- Prevention: Mandatory explain() analysis before production deployment
Aggregation Pipeline Memory Explosion
- Scenario: Multi-stage aggregation with $lookup on every document
- Impact: MongoDB attempted to load excessive data into memory, primary crash
- Recovery: Long failover time, data inconsistency risk
- Prevention: Pipeline stage ordering validation, memory usage testing
Connection Pool Massacre
- Scenario: Application created new connections per HTTP request
- Impact: Connection limit exceeded, authentication failures for new users
- Solution: Single MongoClient instance with proper pooling configuration
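The fix above boils down to memoizing one client per process. This sketch isolates that pattern: `createClient` stands in for `() => new MongoClient(uri, options)` so the reuse logic is visible without a running database.

```javascript
// Sketch: reuse one pooled client per process instead of one per request.
// `clientSingleton` is a hypothetical helper; in a real app the factory
// would be `() => new MongoClient(uri, { maxPoolSize: 15, ... })`.
function clientSingleton(createClient) {
  let client = null;
  return function getClient() {
    if (client === null) client = createClient(); // created once, on first use
    return client;                                // every later call shares the pool
  };
}

let created = 0;
const getClient = clientSingleton(() => ({ id: ++created }));
getClient(); // first request creates the client
getClient(); // subsequent requests reuse it
console.log(created); // 1
```

One client per process means one connection pool; per-request clients multiply pools until `maxIncomingConnections` is exhausted.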
OPTIMIZATION DECISION FRAMEWORK
Read vs Write Optimization Trade-offs
- Read optimization: Requires multiple indexes (each index adds 10-15% write overhead)
- Write optimization: Minimize indexes (reduces query flexibility)
- Compromise strategy: Design compound indexes serving multiple query patterns
Index Creation Decision Matrix
- Create index if: Query runs frequently AND current performance unacceptable
- Avoid index if: Query runs rarely OR write performance more critical
- Review regularly: Drop unused indexes (check with `db.collection.aggregate([{$indexStats: {}}])`)
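The $indexStats review can be sketched as a filter over the stage's output. The helper name and the zero-ops threshold are illustrative; document shapes are reduced to the fields used, and in a real cluster `accesses.ops` should be checked on every replica before dropping anything.

```javascript
// Sketch: flag drop candidates from $indexStats output. `unusedIndexes`
// is a hypothetical helper; stats reset on restart, so low counts only
// mean "unused since the last restart".
function unusedIndexes(indexStats, minOps = 1) {
  return indexStats
    .filter(s => s.name !== "_id_" && s.accesses.ops < minOps) // never drop _id_
    .map(s => s.name);
}

// Example shape from db.collection.aggregate([{ $indexStats: {} }])
const stats = [
  { name: "_id_",           accesses: { ops: 90210 } },
  { name: "email_1",        accesses: { ops: 4412 } },
  { name: "unused_field_1", accesses: { ops: 0 } },
];
console.log(unusedIndexes(stats)); // [ 'unused_field_1' ]
```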
Sharding Considerations
- Shard before: 100M documents (write performance degradation threshold)
- Shard key selection: More critical than optimization efforts
- Alternative: Archiving old data with TTL indexes
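The TTL alternative can be sketched in mongosh. The collection name, field name, and 90-day window are illustrative assumptions - and per the warning in the index matrix above, a wrong `expireAfterSeconds` deletes live data, so verify against a staging copy first.

```javascript
// Sketch (mongosh): TTL index that expires documents 90 days after their
// `createdAt` timestamp. The field must hold a BSON date.
db.events.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 60 * 60 * 24 * 90 } // 90 days
)
```

Expiration runs in a background task (roughly once per minute), so deletion is eventual, not immediate.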
EMERGENCY TROUBLESHOOTING COMMANDS
Immediate Performance Analysis
// Check for collection scans
db.system.profile.find({"planSummary": /COLLSCAN/})
// Current resource usage
db.serverStatus().wiredTiger.cache
// Connection status
db.serverStatus().connections
// Index usage statistics
db.collection.aggregate([{$indexStats: {}}])
// Replication status (compare optimeDate across members)
rs.status().members.map(m => ({ name: m.name, optimeDate: m.optimeDate }))
Emergency Performance Fixes
// Enable profiler immediately
db.setProfilingLevel(1, { slowms: 50 })
// Check for unused indexes
db.collection.aggregate([{$indexStats: {}}])
// Drop unused indexes (example)
db.collection.dropIndex("unused_field_1")
// MongoDB 7.0 concurrency fix
db.adminCommand({
setParameter: 1,
storageEngineConcurrentWriteTransactions: 128,
storageEngineConcurrentReadTransactions: 128
})
RESOURCE REQUIREMENTS AND COSTS
Time Investment for Optimization
- Basic profiler setup: 30 minutes
- Index optimization project: 1-2 weeks (depending on application complexity)
- Infrastructure tuning: 3-5 days
- Emergency production fix: 2-8 hours (depending on issue complexity)
Expertise Requirements
- Basic optimization: Understanding of index concepts, explain() analysis
- Advanced tuning: WiredTiger configuration, aggregation pipeline optimization
- Production debugging: Profiler analysis, connection pool management, replica set troubleshooting
Hidden Costs
- Atlas auto-scaling: Can generate surprise bills of $15,000-$30,000
- Text index creation: Hours of compute time, significant cost on cloud platforms
- Wrong MongoDB version: 7.0 performance regression requires hotfixes
- Emergency support: Paid support plans for production issues
Break-Even Analysis
- Small applications (<1M documents): Basic profiling and indexing sufficient
- Medium applications (1M-50M documents): Comprehensive index strategy required
- Large applications (50M+ documents): Professional optimization and monitoring essential
This technical reference provides actionable intelligence for MongoDB performance optimization while preserving critical operational context and failure scenarios that affect real-world implementation decisions.
Useful Links for Further Investigation
MongoDB Performance Resources That Don't Suck
Link | Description |
---|---|
MongoDB Query Optimization Guide | The only MongoDB documentation that's not complete garbage. Actually explains how query optimization works. |
Database Profiler Documentation | How to set up the profiler without destroying your disk space. Follow this exactly. |
Atlas Performance Advisor | One of the few Atlas features that actually works. Finds your shitty queries automatically. |
MongoDB Indexing Strategies | Comprehensive index documentation. Read this before creating your 50th single-field index. |
WiredTiger Configuration | How to configure the storage engine without corrupting your data. |
MongoDB Compass | Free, official, crashes less than the alternatives. Visual explain plans are actually helpful. |
Studio 3T Professional | Costs $200/year but worth every penny. Better profiling, query autocomplete, and doesn't freeze when you have large collections. |
MongoDB for VS Code | Works well for quick queries if you live in your editor. Also available on [VS Code Marketplace](https://marketplace.visualstudio.com/items?itemName=mongodb.mongodb-vscode). |
MongoDB 8.0 Performance Deep Dive | Honest analysis of MongoDB 8.0 performance improvements and gotchas. Must read before upgrading. |
MongoDB Memory Usage Analysis | Good explanation of how MongoDB actually uses RAM. |
Index Design Patterns | MongoDB design patterns that don't suck. Schema optimization guide. |
Stack Overflow - MongoDB Performance | Better answers than official forums. Search before asking obvious questions. |
MongoDB Community Forums | Official support. Response quality varies from excellent to "have you tried turning it off and on again." |
MongoDB Stack Overflow | Better than Reddit for technical questions. Search first. |
MongoDB Slack | Good for quick questions if you can tolerate Slack. |
Atlas Monitoring | Built into Atlas, comprehensive metrics, actually works.
Percona MongoDB Exporter | Open source Prometheus exporter. Percona knows MongoDB better than Oracle. |
DataDog MongoDB Integration | Expensive but excellent dashboards and alerting. Worth it for large deployments. |
New Relic MongoDB Monitoring | Similar to DataDog, slightly cheaper, good query analysis. |
YCSB MongoDB Benchmark | Industry standard benchmark. Use this to test hardware changes. |
MongoDB Official Benchmarks | Official benchmark scripts. Good for comparing MongoDB versions. |
MongoDB University Performance Course | Free course that's actually worth your time. Covers indexing, profiling, and optimization. |
MongoDB Performance Best Practices | Official best practices guide. Not marketing bullshit for once. |
MongoDB Engineering Blog | Technical posts from MongoDB engineers. Skip the marketing fluff. |
Percona MongoDB Blog | Percona engineers who actually run MongoDB in production. High-quality technical articles. |
Studio 3T Blog | Good tutorials and troubleshooting guides from people who use MongoDB daily. |
MongoDB JIRA Issues | Search known bugs before opening support tickets. |
SERVER-94735 | The MongoDB 7.0 concurrency bug that destroyed everyone's performance. |
MongoDB Support Portal | Paid support if you have Atlas or Enterprise. Actually helpful for production issues. |