Why mongoexport is So Damn Slow (And What Actually Causes It)

mongoexport performance sucks for several specific technical reasons that MongoDB doesn't make obvious. Based on real Stack Overflow threads and production experience, here's what actually kills performance and why your exports crawl at 500 docs per second on collections that should export way faster.

The Real Performance Killers

Single-Threaded Architecture: mongoexport is completely single-threaded. Even on a 16-core server, it'll max out one CPU core while the other 15 sit idle. This Stack Overflow thread shows someone waiting 12 hours to export 5.5% of a 130 million document collection. The MongoDB tools architecture never implemented parallel processing.

Terrible Memory Management: mongoexport is a memory-guzzling nightmare. With WiredTiger compression, it decompresses every fucking document into memory, does its thing, then throws it all away instead of streaming. I've watched it balloon to 14GB of RAM trying to export a collection that's 2GB compressed on disk. It's like watching someone fill up a swimming pool to wash their hands. Understanding WiredTiger storage explains why this is so inefficient.

Collection Scan Performance: Even with no query filters, mongoexport doesn't do efficient sequential reads. It performs scattered reads through WiredTiger's B-tree structure, which kills disk I/O performance. Someone with NVMe SSDs capable of 1GB/sec throughput was only getting 50MB/sec with mongoexport. The collection scanning behavior is fundamentally inefficient.

No Resume Capability: When mongoexport crashes (and it will), you start over from zero. No checkpointing, no resume functionality. Crash at 90% through your 48-hour export? You get to stare at this:

mongoexport --collection=massive_collection --out=data.json
2025-09-01T23:47:12.123+0000    connected to: mongodb://localhost/
2025-09-01T23:47:12.145+0000    exported 45372891 records
Killed

Then start over from 0 and contemplate your life choices.

Memory Usage Reality Check


Collection compression makes this worse. WiredTiger compresses blocks with snappy by default (zlib or zstd if you opted for better ratios), so every document has to be decompressed during export. This happens in the same thread that's doing everything else, creating a CPU bottleneck even when your disk and network are underutilized. Every one of these compression algorithms requires CPU-intensive decompression.

Actual Numbers: A production export of a 15 million document collection (250GB on disk, compressed) required 8GB of RAM and took 18 hours. That's roughly 230 documents per second on hardware that should handle 10x that throughput.

The underlying getMore commands show the problem clearly:

command: getMore { getMore: 14338659261, collection: "places" }
docsExamined:5369 numYields:1337 nreturned:5369 reslen:16773797
protocol:op_query 22796ms

22.8 seconds to return 5,369 documents. That's 235 docs per second, and this was the optimized case.
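If you want to pull that rate straight out of the log line, a few lines of Python do it (the log string is just the hypothetical getMore output from above):

```python
import re

# The getMore log line from above, copied verbatim
log = ('command: getMore { getMore: 14338659261, collection: "places" } '
       'docsExamined:5369 numYields:1337 nreturned:5369 reslen:16773797 '
       'protocol:op_query 22796ms')

nreturned = int(re.search(r"nreturned:(\d+)", log).group(1))
millis = int(re.search(r"(\d+)ms", log).group(1))

docs_per_sec = nreturned / (millis / 1000)
print(int(docs_per_sec), "docs/sec")  # 235 docs/sec
```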

Why Skip and Limit Don't Save You

The traditional workaround of using `--skip` and `--limit` to chunk exports doesn't work like you'd expect. MongoDB has to examine every document up to your skip value, so skip=10000000 means scanning 10 million documents just to start. This is a fundamental pagination limitation in MongoDB.

Skip Performance Reality:

  • Skip 0: starts immediately
  • Skip 1M: takes 5 minutes to start
  • Skip 10M: takes 45 minutes to start
  • Skip 50M: might never start

This makes parallel exports with skip/limit basically useless for large collections. Each process sits there scanning millions of documents it's going to ignore.
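You can put rough numbers on that startup tax. This sketch assumes a scan rate of about 3,300 docs/sec, which is what the "skip 1M takes 5 minutes" figure above works out to; SCAN_RATE is a placeholder you'd measure on your own hardware:

```python
# Rough model of the skip startup tax: MongoDB examines every skipped
# document before returning anything. SCAN_RATE is an assumed figure
# (~3,300 docs/sec, backed out of "skip 1M takes ~5 minutes").
SCAN_RATE = 3300  # docs examined per second (assumption; measure yours)

def skip_startup_seconds(skip: int, scan_rate: int = SCAN_RATE) -> float:
    """Seconds spent scanning skipped docs before the first result arrives."""
    return skip / scan_rate

for skip in (0, 1_000_000, 10_000_000, 50_000_000):
    minutes = skip_startup_seconds(skip) / 60
    print(f"skip={skip:>10,}: ~{minutes:.0f} min before the first document")
```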

Parallel Processing: The Only Way to Make It Not Suck

Since mongoexport is single-threaded garbage, the only real solution is running multiple processes in parallel. This isn't some theoretical optimization - it's been tested and works. Stack Overflow testing shows 6x speed improvements with 8 parallel processes. The technique is similar to MongoDB parallel bulk operations.

Query-Based Parallel Processing (Actually Works)

Instead of skip/limit, divide your collection by query ranges. This requires a field you can split on - ideally something with decent distribution. Understanding ObjectID structure helps with range splitting.

ObjectID-Based Splitting (Best Option):

## Calculate ObjectID ranges for time periods
## The first 8 hex chars of an ObjectID are a Unix timestamp:
## 2025-01-01 -> 67748580, 2025-04-01 -> 67eb2c80, 2025-07-01 -> 68632500

mongoexport --query='{"_id":{"$gte":{"$oid":"677485800000000000000000"},"$lt":{"$oid":"67eb2c800000000000000000"}}}' \
  --collection=orders --db=prod --out=orders_q1.json &

mongoexport --query='{"_id":{"$gte":{"$oid":"67eb2c800000000000000000"},"$lt":{"$oid":"686325000000000000000000"}}}' \
  --collection=orders --db=prod --out=orders_q2.json &

## Run 4-8 of these in parallel
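Boundary strings for any date can be generated with a few lines of stdlib Python; no pymongo needed, because an ObjectID's first 4 bytes are just a big-endian Unix timestamp:

```python
from datetime import datetime, timezone

def oid_boundary(dt: datetime) -> str:
    """Smallest ObjectID hex string for a given UTC datetime.

    An ObjectID's first 4 bytes are a big-endian Unix timestamp; padding
    the remaining 16 hex chars with zeros gives the minimal ObjectID for
    that second (same result as bson's ObjectId.from_datetime()).
    """
    ts = int(dt.replace(tzinfo=timezone.utc).timestamp())
    return f"{ts:08x}" + "0" * 16

# e.g. the minimal ObjectIDs for two 2025 dates
print(oid_boundary(datetime(2025, 1, 1)))  # 677485800000000000000000
print(oid_boundary(datetime(2025, 4, 1)))  # 67eb2c800000000000000000
```

Drop the output straight into the `$oid` values of a `--query` range.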

Date-Based Splitting (If You Have Date Fields):

mongoexport --query='{\"created_at\":{\"$gte\":{\"$date\":\"2025-01-01\"},\"$lt\":{\"$date\":\"2025-03-01\"}}}' \
  --collection=events --db=analytics --out=events_q1.json &

mongoexport --query='{\"created_at\":{\"$gte\":{\"$date\":\"2025-03-01\"},\"$lt\":{\"$date\":\"2025-06-01\"}}}' \
  --collection=events --db=analytics --out=events_q2.json &

Hash-Based Splitting (For Even Distribution):

## Split by modulo on a numeric field
mongoexport --query='{\"user_id\":{\"$mod\":[4,0]}}' --collection=users --db=app --out=users_0.json &
mongoexport --query='{\"user_id\":{\"$mod\":[4,1]}}' --collection=users --db=app --out=users_1.json &
mongoexport --query='{\"user_id\":{\"$mod\":[4,2]}}' --collection=users --db=app --out=users_2.json &
mongoexport --query='{\"user_id\":{\"$mod\":[4,3]}}' --collection=users --db=app --out=users_3.json &

The `$mod` operator gives near-even distribution across processes. One caveat: `$mod` can't use index bounds, so each worker still examines every document (or every index key). The win here comes purely from parallelism, not from reduced scanning.
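To convince yourself the modulo split neither drops nor duplicates documents, you can simulate the partitioning locally; `user_ids` here is just a stand-in for your real values:

```python
# Local simulation of {user_id: {$mod: [4, r]}} sharding: every document
# must land in exactly one partition, and sizes should be near-even.
N_PROCS = 4
user_ids = list(range(1, 10_001))  # stand-in for real user_id values

partitions = {r: [uid for uid in user_ids if uid % N_PROCS == r]
              for r in range(N_PROCS)}

sizes = [len(p) for p in partitions.values()]
print(sizes)                        # [2500, 2500, 2500, 2500]
print(sum(sizes) == len(user_ids))  # True: no doc dropped or duplicated
```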

Performance Testing Results

Real-world testing on an 8-core server with a 200K document collection:

  • 1 process: 32.7 seconds
  • 2 processes: 16.5 seconds (2x speedup)
  • 4 processes: 8.4 seconds (4x speedup)
  • 8 processes: 5.1 seconds (6.4x speedup)

Beyond 8 processes, you hit diminishing returns as disk I/O becomes the bottleneck. The sweet spot is usually cores × 0.75 processes. Understanding CPU vs I/O bottlenecks helps optimize parallel configuration. Monitor with iostat to identify bottlenecks.
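Running the timings above through a quick speedup calculation makes the falloff visible: per-process efficiency stays near 100% until disk contention kicks in at 8 processes.

```python
# Speedup and per-process efficiency from the timings above (8-core box)
baseline = 32.7
timings = {1: 32.7, 2: 16.5, 4: 8.4, 8: 5.1}  # processes -> seconds

for procs, secs in timings.items():
    speedup = baseline / secs
    efficiency = speedup / procs
    print(f"{procs} procs: {speedup:.1f}x speedup, {efficiency:.0%} efficiency")
```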

Python-Based Parallel Solution


For more control, use Python with multiprocessing to run mongoexport workers against custom query splits. (Don't reach for PyMongo's old parallel_scan helper: the underlying parallelCollectionScan command only ever worked with the MMAPv1 storage engine and was removed in MongoDB 4.2.)

from multiprocessing import Pool
import subprocess

from bson import ObjectId               # pip install pymongo
from bson.json_util import dumps        # json.dumps chokes on ObjectId

def export_chunk(query_params):
    collection, db, query, output_file = query_params

    cmd = [
        'mongoexport',
        '--collection', collection,
        '--db', db,
        '--query', dumps(query),        # extended JSON the tools accept
        '--out', output_file
    ]

    subprocess.run(cmd, check=True)
    return f"Exported {output_file}"

## Define your chunks
chunks = [
    ('orders', 'prod', {'_id': {'$gte': ObjectId('658500000000000000000000'), '$lt': ObjectId('659000000000000000000000')}}, 'orders_1.json'),
    ('orders', 'prod', {'_id': {'$gte': ObjectId('659000000000000000000000'), '$lt': ObjectId('65a000000000000000000000')}}, 'orders_2.json'),
    # Add more chunks...
]

## Run in parallel
with Pool(processes=8) as pool:
    results = pool.map(export_chunk, chunks)
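Hand-writing those boundary tuples gets old fast. Here's a stdlib-only sketch (no bson dependency) that slices a date window into N even ObjectID ranges; it assumes inserts are spread roughly evenly over time, so bursty insert patterns will make the chunks uneven:

```python
from datetime import datetime, timezone

def oid_range_chunks(start: datetime, end: datetime, n: int):
    """Split [start, end) into n even time slices, returned as pairs of
    ObjectID hex boundaries for _id range queries.

    An ObjectID's first 4 bytes are a big-endian Unix timestamp, so a
    timestamp padded with 16 zero hex chars is a valid range boundary.
    """
    t0 = int(start.replace(tzinfo=timezone.utc).timestamp())
    t1 = int(end.replace(tzinfo=timezone.utc).timestamp())
    step = (t1 - t0) / n
    bounds = [int(t0 + i * step) for i in range(n)] + [t1]
    return [(f"{a:08x}" + "0" * 16, f"{b:08x}" + "0" * 16)
            for a, b in zip(bounds, bounds[1:])]

chunks = oid_range_chunks(datetime(2025, 1, 1), datetime(2025, 7, 1), 4)
for lo, hi in chunks:
    print(lo, "->", hi)
```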

Memory Optimization Per Process

Each mongoexport process still has the same memory problems, but now you're spreading the load. Monitor memory usage:

## Watch memory usage while parallel export runs
watch -n 1 'ps aux | grep mongoexport | grep -v grep'

If processes start getting OOMKilled, reduce parallelism or add swap. Each process can use 2-4GB of RAM depending on document size and complexity.
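Rather than guessing, you can derive a parallelism cap from your RAM budget. A sketch, where the 3GB-per-process and 4GB-headroom figures are assumptions drawn from the ranges above; measure your own workload first:

```python
# Derive a parallelism cap from the RAM budget instead of guessing.
# per_process_gb and headroom_gb are assumed figures; adjust after
# watching real mongoexport processes with ps/htop.
def max_safe_processes(total_ram_gb: float, per_process_gb: float = 3.0,
                       headroom_gb: float = 4.0) -> int:
    """Processes that fit in RAM, leaving headroom for the OS and mongod."""
    usable = total_ram_gb - headroom_gb
    return max(1, int(usable // per_process_gb))

print(max_safe_processes(32))  # 9 (then also cap at ~0.75x core count)
print(max_safe_processes(16))  # 4
```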

Combining Output Files

After parallel export, combine the files:

## JSON: mongoexport writes one document per line (JSON Lines),
## so plain concatenation is already valid JSONL
cat chunk_*.json > combined.jsonl

## Need a single JSON array instead? Wrap and comma-separate:
echo '[' > combined.json
cat chunk_*.json | sed 's/$/,/' | sed '$ s/,$//' >> combined.json
echo ']' >> combined.json

## CSV files (preserve header)
head -1 chunk_0.csv > combined.csv
tail -n +2 -q chunk_*.csv >> combined.csv
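Since mongoexport's default JSON output is one document per line, a streaming Python combiner never holds more than a line in memory and lets you validate the result parses. A sketch, with the chunk files faked for the demo:

```python
import json
import tempfile
from pathlib import Path

def combine_jsonl(chunk_paths, out_path):
    """Stream newline-delimited JSON chunks into one JSON array file.

    mongoexport writes one document per line by default, so this never
    holds more than a single line in memory.
    """
    with open(out_path, "w") as out:
        out.write("[")
        first = True
        for path in chunk_paths:
            with open(path) as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    if not first:
                        out.write(",")
                    out.write(line)
                    first = False
        out.write("]")

# Demo with faked chunk files standing in for mongoexport output
tmp = Path(tempfile.mkdtemp())
(tmp / "chunk_0.json").write_text('{"_id": 1}\n{"_id": 2}\n')
(tmp / "chunk_1.json").write_text('{"_id": 3}\n')
combine_jsonl(sorted(tmp.glob("chunk_*.json")), tmp / "combined.json")

docs = json.loads((tmp / "combined.json").read_text())
print(len(docs))  # 3
```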

This parallel approach is the only proven way to make mongoexport perform acceptably on large collections. It's not elegant, but it works when you need to export millions of documents without waiting days.

Performance Optimization Questions (Real Problems, Real Solutions)

Q

How many parallel mongoexport processes should I run?

A

Start with your CPU core count minus 2, then test.

On an 8-core box, try 6 processes first. More isn't always better: I've seen setups where 16 processes actually ran slower than 8 because they were all fighting over disk I/O and MongoDB started rejecting connections with:

Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed:
SocketException: server returned error on SASL authentication step:
AuthenticationFailed: Authentication failed. Too many authentication attempts.

Q

Why does mongoexport still eat massive amounts of RAM even with parallel processing?

A

Each process has the same memory management problems as single-threaded mongoexport. If you're running 8 processes and each uses 3GB of RAM, you need 24GB total. Monitor with htop and kill processes if you start hitting swap. Better to run fewer processes than crash the server.

Q

Can I use indexes to speed up range queries for parallel export?

A

Absolutely, and you should. Create indexes on the fields you're splitting by:

db.orders.createIndex({"_id": 1})        // usually exists already
db.events.createIndex({"created_at": 1}) // for date-based splits
db.users.createIndex({"user_id": 1})     // for hash-based splits

Without proper indexes, each parallel query becomes a full collection scan, defeating the purpose.

Q

What happens if one of my parallel export processes crashes?

A

You lose that chunk and have to restart just that process.

This is why parallel export is better than single-threaded: you lose 1/8th instead of everything. Keep track of which processes finished:

## Add process tracking
mongoexport --query='{"_id":{"$gte":...}}' --out=chunk_1.json && touch chunk_1.done &
mongoexport --query='{"_id":{"$gte":...}}' --out=chunk_2.json && touch chunk_2.done &

## Check what finished
ls *.done

Q

How do I calculate ObjectID ranges for time-based splitting?

A

ObjectIDs embed timestamps. Use this Python to generate ranges:

from bson import ObjectId
from datetime import datetime

# Create ObjectIDs bounding a date range
start_date = datetime(2025, 1, 1)
end_date = datetime(2025, 6, 1)
start_oid = ObjectId.from_datetime(start_date)
end_oid = ObjectId.from_datetime(end_date)

print(f"Query: {{'_id': {{'$gte': ObjectId('{start_oid}'), '$lt': ObjectId('{end_oid}')}}}}")

Q

Will parallel exports overwhelm my MongoDB server?

A

Possibly. Each mongoexport opens its own connection and runs its own query. On a production server with limited connection pools, 8 parallel exports might cause connection failures for your application. Use --readPreference=secondary to hit replicas instead of primary.

Q

Can I resume failed parallel exports?

A

Not directly, but you can check file sizes and restart missing chunks:

## Check if files are too small (likely incomplete)
find . -name "chunk_*.json" -size -1M -exec rm {} \;

## Then re-run mongoexport for only the missing chunks

The nuclear option: delete everything and start over. At least with parallel processing, restarts only take hours instead of days.
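The same check works in Python if you'd rather track an explicit list of expected chunks. The file names and the 1-byte threshold here are illustrative; the shell one-liner's 1MB cutoff is another reasonable default:

```python
import tempfile
from pathlib import Path

def find_incomplete_chunks(directory, expected, min_bytes=1):
    """Return expected chunk files that are missing or below min_bytes.

    min_bytes is a stand-in threshold; set it to whatever a healthy
    chunk looks like for your data.
    """
    bad = []
    for name in expected:
        path = Path(directory) / name
        if not path.exists() or path.stat().st_size < min_bytes:
            bad.append(name)
    return bad

# Demo: one finished chunk, one truncated by a crash, one never started
tmp = tempfile.mkdtemp()
Path(tmp, "chunk_1.json").write_text('{"_id": 1}\n')
Path(tmp, "chunk_2.json").write_text("")

print(find_incomplete_chunks(tmp, ["chunk_1.json", "chunk_2.json", "chunk_3.json"]))
# ['chunk_2.json', 'chunk_3.json']
```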

Performance Comparison: mongoexport Optimization Techniques

| Technique | Speed Improvement | Memory Usage | Complexity | Crash Recovery | Best For |
|---|---|---|---|---|---|
| Single Process Default | 1x baseline | 2-8GB per export | Simple | ❌ Start over | Collections under 1M docs |
| Parallel by ObjectID Range | 4-6x faster | 2-8GB × processes | Medium | ✅ Per-chunk recovery | Most collections |
| Parallel by Date Range | 3-5x faster | 2-8GB × processes | Medium | ✅ Per-chunk recovery | Time-series data |
| Parallel by Hash/Modulo | 5-7x faster | 2-8GB × processes | Easy | ✅ Per-chunk recovery | Evenly distributed fields |
| Skip/Limit Chunking | ❌ Often slower | 2-8GB per process | Easy | ❌ Skip overhead | Never recommended |
| Python PyMongo Parallel | 6-8x faster | 1-3GB × processes | Hard | ✅ Custom recovery | Complex requirements |
| mongodump + Processing | 10-15x faster | 500MB-2GB | Medium | ✅ Resume capable | When JSON structure isn't critical |
