Why mongoexport is So Damn Slow (And What Actually Causes It)

mongoexport performance sucks for several specific technical reasons that MongoDB doesn't make obvious. Based on real Stack Overflow threads and production experience, here's what actually kills performance and why your exports crawl at 500 docs per second on collections that should export way faster.

The Real Performance Killers

Single-Threaded Architecture: mongoexport is completely single-threaded. Even on a 16-core server, it'll max out one CPU core while the other 15 sit idle. This Stack Overflow thread shows someone waiting 12 hours to export 5.5% of a 130 million document collection. The MongoDB tools architecture never implemented parallel processing.

Terrible Memory Management: mongoexport is a memory-guzzling nightmare. With WiredTiger compression, it decompresses every fucking document into memory, does its thing, then throws it all away instead of streaming. I've watched it balloon to 14GB of RAM trying to export a collection that's 2GB compressed on disk. It's like watching someone fill up a swimming pool to wash their hands. Understanding WiredTiger storage explains why this is so inefficient.

Collection Scan Performance: Even with no query filters, mongoexport doesn't do efficient sequential reads. It performs scattered reads through WiredTiger's B-tree structure, which kills disk I/O performance. Someone with NVMe SSDs capable of 1GB/sec throughput was only getting 50MB/sec with mongoexport. The collection scanning behavior is fundamentally inefficient.

No Resume Capability: When mongoexport crashes (and it will), you start over from zero. No checkpointing, no resume functionality. Crash at 90% through your 48-hour export? You get to stare at this:

mongoexport --collection=massive_collection --out=data.json
2025-09-01T23:47:12.123+0000    connected to: mongodb://localhost/
2025-09-01T23:47:12.145+0000    exported 45372891 records
Killed

Then start over from 0 and contemplate your life choices.

Memory Usage Reality Check


Collection compression makes this worse. WiredTiger compresses blocks with snappy by default (zlib or zstd if you opted for better ratios), so every document has to be decompressed during export. This happens in the same thread that's doing everything else, creating a CPU bottleneck even when your disk and network are underutilized. Every one of these compression algorithms requires CPU-intensive decompression.

Actual Numbers: A production export of a 15 million document collection (250GB on disk, compressed) required 8GB of RAM and took 18 hours. That's roughly 230 documents per second on hardware that should handle 10x that throughput.

The underlying getMore commands show the problem clearly:

command: getMore { getMore: 14338659261, collection: "places" }
docsExamined:5369 numYields:1337 nreturned:5369 reslen:16773797
protocol:op_query 22796ms

22.8 seconds to return 5,369 documents. That's 235 docs per second, and this was the optimized case.
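If you want to pull that rate straight out of the log line, a few lines of Python do it (the log string is just the hypothetical getMore output from above):

```python
import re

# The getMore log line from above, copied verbatim
log = ('command: getMore { getMore: 14338659261, collection: "places" } '
       'docsExamined:5369 numYields:1337 nreturned:5369 reslen:16773797 '
       'protocol:op_query 22796ms')

nreturned = int(re.search(r"nreturned:(\d+)", log).group(1))
millis = int(re.search(r"(\d+)ms", log).group(1))

docs_per_sec = nreturned / (millis / 1000)
print(int(docs_per_sec), "docs/sec")  # 235 docs/sec
```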

Why Skip and Limit Don't Save You

The traditional workaround of using `--skip` and `--limit` to chunk exports doesn't work like you'd expect. MongoDB has to examine every document up to your skip value, so skip=10000000 means scanning 10 million documents just to start. This is a fundamental pagination limitation in MongoDB.

Skip Performance Reality:

  • Skip 0: starts immediately
  • Skip 1M: takes 5 minutes to start
  • Skip 10M: takes 45 minutes to start
  • Skip 50M: might never start

This makes parallel exports with skip/limit basically useless for large collections. Each process sits there scanning millions of documents it's going to ignore.
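You can put rough numbers on that startup tax. This sketch assumes a scan rate of about 3,300 docs/sec, which is what the "skip 1M takes 5 minutes" figure above works out to; SCAN_RATE is a placeholder you'd measure on your own hardware:

```python
# Rough model of the skip startup tax: MongoDB examines every skipped
# document before returning anything. SCAN_RATE is an assumed figure
# (~3,300 docs/sec, backed out of "skip 1M takes ~5 minutes").
SCAN_RATE = 3300  # docs examined per second (assumption; measure yours)

def skip_startup_seconds(skip: int, scan_rate: int = SCAN_RATE) -> float:
    """Seconds spent scanning skipped docs before the first result arrives."""
    return skip / scan_rate

for skip in (0, 1_000_000, 10_000_000, 50_000_000):
    minutes = skip_startup_seconds(skip) / 60
    print(f"skip={skip:>10,}: ~{minutes:.0f} min before the first document")
```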

Parallel Processing: The Only Way to Make It Not Suck

Since mongoexport is single-threaded garbage, the only real solution is running multiple processes in parallel. This isn't some theoretical optimization - it's been tested and works. Stack Overflow testing shows 6x speed improvements with 8 parallel processes. The technique is similar to MongoDB parallel bulk operations.

Query-Based Parallel Processing (Actually Works)

Instead of skip/limit, divide your collection by query ranges. This requires a field you can split on - ideally something with decent distribution. Understanding ObjectID structure helps with range splitting.

ObjectID-Based Splitting (Best Option):

## Calculate ObjectID ranges for time periods
## The first 8 hex chars of an ObjectID are a Unix timestamp:
## 2025-01-01 -> 67748580, 2025-04-01 -> 67eb2c80, 2025-07-01 -> 68632500

mongoexport --query='{"_id":{"$gte":{"$oid":"677485800000000000000000"},"$lt":{"$oid":"67eb2c800000000000000000"}}}' \
  --collection=orders --db=prod --out=orders_q1.json &

mongoexport --query='{"_id":{"$gte":{"$oid":"67eb2c800000000000000000"},"$lt":{"$oid":"686325000000000000000000"}}}' \
  --collection=orders --db=prod --out=orders_q2.json &

## Run 4-8 of these in parallel
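Boundary strings for any date can be generated with a few lines of stdlib Python; no pymongo needed, because an ObjectID's first 4 bytes are just a big-endian Unix timestamp:

```python
from datetime import datetime, timezone

def oid_boundary(dt: datetime) -> str:
    """Smallest ObjectID hex string for a given UTC datetime.

    An ObjectID's first 4 bytes are a big-endian Unix timestamp; padding
    the remaining 16 hex chars with zeros gives the minimal ObjectID for
    that second (same result as bson's ObjectId.from_datetime()).
    """
    ts = int(dt.replace(tzinfo=timezone.utc).timestamp())
    return f"{ts:08x}" + "0" * 16

# e.g. the minimal ObjectIDs for two 2025 dates
print(oid_boundary(datetime(2025, 1, 1)))  # 677485800000000000000000
print(oid_boundary(datetime(2025, 4, 1)))  # 67eb2c800000000000000000
```

Drop the output straight into the `$oid` values of a `--query` range.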

Date-Based Splitting (If You Have Date Fields):

mongoexport --query='{\"created_at\":{\"$gte\":{\"$date\":\"2025-01-01\"},\"$lt\":{\"$date\":\"2025-03-01\"}}}' \
  --collection=events --db=analytics --out=events_q1.json &

mongoexport --query='{\"created_at\":{\"$gte\":{\"$date\":\"2025-03-01\"},\"$lt\":{\"$date\":\"2025-06-01\"}}}' \
  --collection=events --db=analytics --out=events_q2.json &

Hash-Based Splitting (For Even Distribution):

## Split by modulo on a numeric field
mongoexport --query='{\"user_id\":{\"$mod\":[4,0]}}' --collection=users --db=app --out=users_0.json &
mongoexport --query='{\"user_id\":{\"$mod\":[4,1]}}' --collection=users --db=app --out=users_1.json &
mongoexport --query='{\"user_id\":{\"$mod\":[4,2]}}' --collection=users --db=app --out=users_2.json &
mongoexport --query='{\"user_id\":{\"$mod\":[4,3]}}' --collection=users --db=app --out=users_3.json &

The `$mod` operator gives near-even distribution across processes. One caveat: `$mod` can't use index bounds, so each worker still examines every document (or every index key). The win here comes purely from parallelism, not from reduced scanning.
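To convince yourself the modulo split neither drops nor duplicates documents, you can simulate the partitioning locally; `user_ids` here is just a stand-in for your real values:

```python
# Local simulation of {user_id: {$mod: [4, r]}} sharding: every document
# must land in exactly one partition, and sizes should be near-even.
N_PROCS = 4
user_ids = list(range(1, 10_001))  # stand-in for real user_id values

partitions = {r: [uid for uid in user_ids if uid % N_PROCS == r]
              for r in range(N_PROCS)}

sizes = [len(p) for p in partitions.values()]
print(sizes)                        # [2500, 2500, 2500, 2500]
print(sum(sizes) == len(user_ids))  # True: no doc dropped or duplicated
```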

Performance Testing Results

Real-world testing on an 8-core server with a 200K document collection:

  • 1 process: 32.7 seconds
  • 2 processes: 16.5 seconds (2x speedup)
  • 4 processes: 8.4 seconds (4x speedup)
  • 8 processes: 5.1 seconds (6.4x speedup)

Beyond 8 processes, you hit diminishing returns as disk I/O becomes the bottleneck. The sweet spot is usually cores × 0.75 processes. Understanding CPU vs I/O bottlenecks helps optimize parallel configuration. Monitor with iostat to identify bottlenecks.
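Running the timings above through a quick speedup calculation makes the falloff visible: per-process efficiency stays near 100% until disk contention kicks in at 8 processes.

```python
# Speedup and per-process efficiency from the timings above (8-core box)
baseline = 32.7
timings = {1: 32.7, 2: 16.5, 4: 8.4, 8: 5.1}  # processes -> seconds

for procs, secs in timings.items():
    speedup = baseline / secs
    efficiency = speedup / procs
    print(f"{procs} procs: {speedup:.1f}x speedup, {efficiency:.0%} efficiency")
```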

Python-Based Parallel Solution


For more control, use Python with multiprocessing to run mongoexport workers against custom query splits. (Don't reach for PyMongo's old parallel_scan helper: the underlying parallelCollectionScan command only ever worked with the MMAPv1 storage engine and was removed in MongoDB 4.2.)

from multiprocessing import Pool
import subprocess

from bson import ObjectId               # pip install pymongo
from bson.json_util import dumps        # json.dumps chokes on ObjectId

def export_chunk(query_params):
    collection, db, query, output_file = query_params

    cmd = [
        'mongoexport',
        '--collection', collection,
        '--db', db,
        '--query', dumps(query),        # extended JSON the tools accept
        '--out', output_file
    ]

    subprocess.run(cmd, check=True)
    return f"Exported {output_file}"

## Define your chunks
chunks = [
    ('orders', 'prod', {'_id': {'$gte': ObjectId('658500000000000000000000'), '$lt': ObjectId('659000000000000000000000')}}, 'orders_1.json'),
    ('orders', 'prod', {'_id': {'$gte': ObjectId('659000000000000000000000'), '$lt': ObjectId('65a000000000000000000000')}}, 'orders_2.json'),
    # Add more chunks...
]

## Run in parallel
with Pool(processes=8) as pool:
    results = pool.map(export_chunk, chunks)
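Hand-writing those boundary tuples gets old fast. Here's a stdlib-only sketch (no bson dependency) that slices a date window into N even ObjectID ranges; it assumes inserts are spread roughly evenly over time, so bursty insert patterns will make the chunks uneven:

```python
from datetime import datetime, timezone

def oid_range_chunks(start: datetime, end: datetime, n: int):
    """Split [start, end) into n even time slices, returned as pairs of
    ObjectID hex boundaries for _id range queries.

    An ObjectID's first 4 bytes are a big-endian Unix timestamp, so a
    timestamp padded with 16 zero hex chars is a valid range boundary.
    """
    t0 = int(start.replace(tzinfo=timezone.utc).timestamp())
    t1 = int(end.replace(tzinfo=timezone.utc).timestamp())
    step = (t1 - t0) / n
    bounds = [int(t0 + i * step) for i in range(n)] + [t1]
    return [(f"{a:08x}" + "0" * 16, f"{b:08x}" + "0" * 16)
            for a, b in zip(bounds, bounds[1:])]

chunks = oid_range_chunks(datetime(2025, 1, 1), datetime(2025, 7, 1), 4)
for lo, hi in chunks:
    print(lo, "->", hi)
```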

Memory Optimization Per Process

Each mongoexport process still has the same memory problems, but now you're spreading the load. Monitor memory usage:

## Watch memory usage while parallel export runs
watch -n 1 'ps aux | grep mongoexport | grep -v grep'

If processes start getting OOMKilled, reduce parallelism or add swap. Each process can use 2-4GB of RAM depending on document size and complexity.
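Rather than guessing, you can derive a parallelism cap from your RAM budget. A sketch, where the 3GB-per-process and 4GB-headroom figures are assumptions drawn from the ranges above; measure your own workload first:

```python
# Derive a parallelism cap from the RAM budget instead of guessing.
# per_process_gb and headroom_gb are assumed figures; adjust after
# watching real mongoexport processes with ps/htop.
def max_safe_processes(total_ram_gb: float, per_process_gb: float = 3.0,
                       headroom_gb: float = 4.0) -> int:
    """Processes that fit in RAM, leaving headroom for the OS and mongod."""
    usable = total_ram_gb - headroom_gb
    return max(1, int(usable // per_process_gb))

print(max_safe_processes(32))  # 9 (then also cap at ~0.75x core count)
print(max_safe_processes(16))  # 4
```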

Combining Output Files

After parallel export, combine the files:

## JSON: mongoexport writes one document per line (JSON Lines),
## so plain concatenation is already valid JSONL
cat chunk_*.json > combined.jsonl

## Need a single JSON array instead? Wrap and comma-separate:
echo '[' > combined.json
cat chunk_*.json | sed 's/$/,/' | sed '$ s/,$//' >> combined.json
echo ']' >> combined.json

## CSV files (preserve header)
head -1 chunk_0.csv > combined.csv
tail -n +2 -q chunk_*.csv >> combined.csv
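Since mongoexport's default JSON output is one document per line, a streaming Python combiner never holds more than a line in memory and lets you validate the result parses. A sketch, with the chunk files faked for the demo:

```python
import json
import tempfile
from pathlib import Path

def combine_jsonl(chunk_paths, out_path):
    """Stream newline-delimited JSON chunks into one JSON array file.

    mongoexport writes one document per line by default, so this never
    holds more than a single line in memory.
    """
    with open(out_path, "w") as out:
        out.write("[")
        first = True
        for path in chunk_paths:
            with open(path) as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    if not first:
                        out.write(",")
                    out.write(line)
                    first = False
        out.write("]")

# Demo with faked chunk files standing in for mongoexport output
tmp = Path(tempfile.mkdtemp())
(tmp / "chunk_0.json").write_text('{"_id": 1}\n{"_id": 2}\n')
(tmp / "chunk_1.json").write_text('{"_id": 3}\n')
combine_jsonl(sorted(tmp.glob("chunk_*.json")), tmp / "combined.json")

docs = json.loads((tmp / "combined.json").read_text())
print(len(docs))  # 3
```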

This parallel approach is the only proven way to make mongoexport perform acceptably on large collections. It's not elegant, but it works when you need to export millions of documents without waiting days.

Performance Optimization Questions (Real Problems, Real Solutions)

Q

How many parallel mongoexport processes should I run?

A

Start with your CPU core count minus 2, then test.

On an 8-core box, try 6 processes first. More isn't always better: I've seen setups where 16 processes actually ran slower than 8 because they were all fighting over disk I/O and MongoDB started rejecting connections with:

Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed:
SocketException: server returned error on SASL authentication step:
AuthenticationFailed: Authentication failed. Too many authentication attempts.

Q

Why does mongoexport still eat massive amounts of RAM even with parallel processing?

A

Each process has the same memory management problems as single-threaded mongoexport. If you're running 8 processes and each uses 3GB of RAM, you need 24GB total. Monitor with htop and kill processes if you start hitting swap. Better to run fewer processes than crash the server.

Q

Can I use indexes to speed up range queries for parallel export?

A

Absolutely, and you should. Create indexes on the fields you're splitting by:

db.orders.createIndex({"_id": 1})        // usually exists already
db.events.createIndex({"created_at": 1}) // for date-based splits
db.users.createIndex({"user_id": 1})     // for hash-based splits

Without proper indexes, each parallel query becomes a full collection scan, defeating the purpose.

Q

What happens if one of my parallel export processes crashes?

A

You lose that chunk and have to restart just that process.

This is why parallel export is better than single-threaded: you lose 1/8th instead of everything. Keep track of which processes finished:

## Add process tracking
mongoexport --query='{"_id":{"$gte":...}}' --out=chunk_1.json && touch chunk_1.done &
mongoexport --query='{"_id":{"$gte":...}}' --out=chunk_2.json && touch chunk_2.done &

## Check what finished
ls *.done

Q

How do I calculate ObjectID ranges for time-based splitting?

A

ObjectIDs embed timestamps. Use this Python to generate ranges:

from bson import ObjectId
from datetime import datetime

# Create ObjectIDs bounding a date range
start_date = datetime(2025, 1, 1)
end_date = datetime(2025, 6, 1)
start_oid = ObjectId.from_datetime(start_date)
end_oid = ObjectId.from_datetime(end_date)

print(f"Query: {{'_id': {{'$gte': ObjectId('{start_oid}'), '$lt': ObjectId('{end_oid}')}}}}")

Q

Will parallel exports overwhelm my MongoDB server?

A

Possibly. Each mongoexport opens its own connection and runs its own query. On a production server with limited connection pools, 8 parallel exports might cause connection failures for your application. Use --readPreference=secondary to hit replicas instead of primary.

Q

Can I resume failed parallel exports?

A

Not directly, but you can check file sizes and restart missing chunks:

## Check if files are too small (likely incomplete)
find . -name "chunk_*.json" -size -1M -exec rm {} \;

## Then re-run mongoexport for only the missing chunks

The nuclear option: delete everything and start over. At least with parallel processing, restarts only take hours instead of days.
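The same check works in Python if you'd rather track an explicit list of expected chunks. The file names and the 1-byte threshold here are illustrative; the shell one-liner's 1MB cutoff is another reasonable default:

```python
import tempfile
from pathlib import Path

def find_incomplete_chunks(directory, expected, min_bytes=1):
    """Return expected chunk files that are missing or below min_bytes.

    min_bytes is a stand-in threshold; set it to whatever a healthy
    chunk looks like for your data.
    """
    bad = []
    for name in expected:
        path = Path(directory) / name
        if not path.exists() or path.stat().st_size < min_bytes:
            bad.append(name)
    return bad

# Demo: one finished chunk, one truncated by a crash, one never started
tmp = tempfile.mkdtemp()
Path(tmp, "chunk_1.json").write_text('{"_id": 1}\n')
Path(tmp, "chunk_2.json").write_text("")

print(find_incomplete_chunks(tmp, ["chunk_1.json", "chunk_2.json", "chunk_3.json"]))
# ['chunk_2.json', 'chunk_3.json']
```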

Performance Comparison: mongoexport Optimization Techniques

| Technique | Speed Improvement | Memory Usage | Complexity | Crash Recovery | Best For |
|---|---|---|---|---|---|
| Single Process Default | 1x baseline | 2-8GB per export | Simple | ❌ Start over | Collections under 1M docs |
| Parallel by ObjectID Range | 4-6x faster | 2-8GB × processes | Medium | ✅ Per-chunk recovery | Most collections |
| Parallel by Date Range | 3-5x faster | 2-8GB × processes | Medium | ✅ Per-chunk recovery | Time-series data |
| Parallel by Hash/Modulo | 5-7x faster | 2-8GB × processes | Easy | ✅ Per-chunk recovery | Evenly distributed fields |
| Skip/Limit Chunking | ❌ Often slower | 2-8GB per process | Easy | ❌ Skip overhead | Never recommended |
| Python PyMongo Parallel | 6-8x faster | 1-3GB × processes | Hard | ✅ Custom recovery | Complex requirements |
| mongodump + Processing | 10-15x faster | 500MB-2GB | Medium | ✅ Resume capable | When JSON structure isn't critical |
