The Production Horror Stories (And How to Fix Them)

Memory Leaks: The Silent Killers

Your app starts at 200MB RAM. Six hours later it's at 1.8GB and climbing. The default V8 heap limit is roughly 2GB on many 64-bit setups (newer Node versions size it from available memory, so yours may differ) - hit it and your app dies with FATAL ERROR: Reached heap limit. Use heap profiling tools and Chrome DevTools to track down the leaks before they kill your production server.
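
If you want something you can open in Chrome DevTools' Memory tab, a snapshot-on-signal hook is a cheap starting point. A minimal sketch (v8.writeHeapSnapshot pauses the process while it writes, so be careful on very large heaps):

// Sketch: dump a heap snapshot on SIGUSR2 for Chrome DevTools (Node 11.13+)
const v8 = require('v8');

process.on('SIGUSR2', () => {
    const file = v8.writeHeapSnapshot(); // returns the generated .heapsnapshot filename
    console.log(`Heap snapshot written to ${file}`);
});

// Trigger with: kill -USR2 <pid>

Take one snapshot early and another after the heap has grown, then compare them in DevTools to see what's accumulating.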

Most Common Culprits:


Global variables that never get cleared:

// WRONG - creates a memory leak
const userCache = new Map();
app.get('/users/:id', async (req, res) => {
    const userData = await fetchUser(req.params.id); // fetchUser: whatever loads the user
    userCache.set(req.params.id, userData); // Never cleaned up, grows forever
    res.json(userData);
});

// RIGHT - use TTL cache with cleanup
const NodeCache = require('node-cache');
const userCache = new NodeCache({ stdTTL: 600 }); // 10-minute expiry
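
With the TTL cache in place, the handler checks the cache first and lets stale entries expire on their own. A sketch (fetchUser again stands in for whatever loads the user):

app.get('/users/:id', async (req, res) => {
    const cached = userCache.get(req.params.id);
    if (cached) return res.json(cached);

    const userData = await fetchUser(req.params.id); // hypothetical loader
    userCache.set(req.params.id, userData); // evicted automatically after stdTTL seconds
    res.json(userData);
});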

Event listeners that pile up:

// WRONG - adds listener on every request
app.get('/data', (req, res) => {
    req.on('close', handleClose); // Memory leak
});

// RIGHT - remove listeners
app.get('/data', (req, res) => {
    req.on('close', handleClose);
    res.on('finish', () => {
        req.removeListener('close', handleClose);
    });
});

Debugging Memory Leaks - Tools That Actually Work:

**Clinic.js Doctor** - Free and catches most leaks:

npm install -g clinic
clinic doctor -- node app.js
## Let it run for 10+ minutes under load
## Kill with Ctrl+C and check the generated report

**0x Profiler** - Shows exactly where CPU time goes:

npm install -g 0x
0x -- node app.js
## Generate load, then kill the process with Ctrl+C
## Generates a flame graph showing CPU hotspots

Production Memory Monitoring:

// Real production memory monitoring
const memoryUsage = () => {
    const usage = process.memoryUsage();
    console.log({
        rss: Math.round(usage.rss / 1024 / 1024) + 'MB',
        heapUsed: Math.round(usage.heapUsed / 1024 / 1024) + 'MB',
        heapTotal: Math.round(usage.heapTotal / 1024 / 1024) + 'MB',
        external: Math.round(usage.external / 1024 / 1024) + 'MB'
    });
    
    // Kill process if heap usage > 1.5GB (before hitting 2GB limit)
    if (usage.heapUsed > 1.5 * 1024 * 1024 * 1024) {
        console.error('Memory usage too high, restarting...');
        process.exit(1);
    }
};

setInterval(memoryUsage, 30000); // Check every 30 seconds

Event Loop Blocking - When Everything Stops

The event loop is single-threaded. Block it and your entire API becomes unresponsive. I've seen 2-second API responses turn into 30-second timeouts because someone processed a CSV file synchronously.

Event Loop Lag Detection:

const { performance } = require('perf_hooks');

let previousNow = performance.now();
setInterval(() => {
    const now = performance.now();
    const lag = now - previousNow - 1000; // Expected 1000ms interval
    
    if (lag > 100) {
        console.warn(`Event loop lag: ${lag.toFixed(2)}ms`);
        
        // Note: this trace points at the monitor itself, not the blocking code -
        // use a profiler (Clinic.js, 0x) to find what actually blocked the loop
        console.trace('Event loop lag detected');
    }
    previousNow = now;
}, 1000);
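
Node also ships a built-in histogram for exactly this: perf_hooks.monitorEventLoopDelay (Node 11.10+). A minimal sketch, with less code to maintain than the hand-rolled interval above:

const { monitorEventLoopDelay } = require('perf_hooks');

const loopDelay = monitorEventLoopDelay({ resolution: 20 }); // sample every 20ms
loopDelay.enable();

setInterval(() => {
    // Histogram values are in nanoseconds
    console.log({
        mean: (loopDelay.mean / 1e6).toFixed(2) + 'ms',
        p99: (loopDelay.percentile(99) / 1e6).toFixed(2) + 'ms',
        max: (loopDelay.max / 1e6).toFixed(2) + 'ms'
    });
    loopDelay.reset();
}, 10000);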

Common Event Loop Blockers:

Synchronous file operations - Never use these in production:

// WRONG - blocks the event loop completely
const fs = require('fs');
const data = fs.readFileSync('./large-file.json'); // BLOCKS EVERYTHING

// RIGHT - async file operations (await needs an async function or an ES module)
const fs = require('fs').promises;
const data = await fs.readFile('./large-file.json'); // Non-blocking

JSON.parse() on large payloads:

// WRONG - blocks on large JSON
app.post('/upload', (req, res) => {
    const data = JSON.parse(req.body); // Can block for seconds
});

// RIGHT - stream processing or worker threads
const { Worker } = require('worker_threads');

app.post('/upload', (req, res) => {
    const worker = new Worker(`
        const { parentPort } = require('worker_threads');
        parentPort.on('message', (data) => {
            try {
                const parsed = JSON.parse(data);
                parentPort.postMessage({ success: true, data: parsed });
            } catch (error) {
                parentPort.postMessage({ success: false, error: error.message });
            }
        });
    `, { eval: true });
    
    worker.postMessage(req.body);
    worker.on('message', (result) => {
        res.json(result);
        worker.terminate();
    });
});


Database Connection Hell

Database connections are where most production Node.js apps die. Connection pools run out, queries hang forever, and suddenly your API returns 500 errors.

Connection Pool Debugging:

// Most apps get this wrong
const mysql = require('mysql2');

const pool = mysql.createPool({
    host: 'localhost',
    user: 'app',
    password: 'secret',
    database: 'production',
    connectionLimit: 10, // Too low for production load
    acquireTimeout: 60000, // mysql2 ignores these three options -
    timeout: 60000,        // they belong to the old mysql driver
    reconnect: true
});

// RIGHT - production-ready pool with monitoring
const pool = mysql.createPool({
    host: process.env.DB_HOST,
    user: process.env.DB_USER,
    password: process.env.DB_PASSWORD,
    database: process.env.DB_NAME,
    waitForConnections: true,
    connectionLimit: 50, // Higher limit for production
    queueLimit: 0,
    connectTimeout: 10000, // Fail fast on connection issues
    enableKeepAlive: true, // Keep idle connections healthy
    multipleStatements: false // Security
});

// Monitor pool health
setInterval(() => {
    // These pool fields are private mysql2 internals and may change between versions
    console.log('DB Pool Stats:', {
        connectionLimit: pool.config.connectionLimit,
        openConnections: pool._allConnections.length, // open = in-use + free
        freeConnections: pool._freeConnections.length,
        queuedRequests: pool._connectionQueue.length
    });
    
    // Alert if pool is running low
    const freeConnections = pool._freeConnections.length;
    const totalConnections = pool.config.connectionLimit;
    
    if (freeConnections / totalConnections < 0.2) {
        console.error('Database connection pool running low!');
    }
}, 30000);

Query Timeout Hell:

// Production query with proper timeout handling
const executeQuery = (query, params) => {
    return new Promise((resolve, reject) => {
        const timeout = setTimeout(() => {
            reject(new Error('Query timeout'));
        }, 15000); // 15 second timeout
        
        pool.execute(query, params, (error, results) => {
            clearTimeout(timeout);
            
            if (error) {
                console.error('Query failed:', {
                    query: query.substring(0, 100) + '...',
                    error: error.message,
                    code: error.code,
                    errno: error.errno
                });
                reject(error);
            } else {
                resolve(results);
            }
        });
    });
};

// Usage with error handling
app.get('/users/:id', async (req, res) => {
    try {
        const results = await executeQuery(
            'SELECT * FROM users WHERE id = ?',
            [req.params.id]
        );
        
        if (results.length === 0) {
            return res.status(404).json({ error: 'User not found' });
        }
        
        res.json(results[0]);
    } catch (error) {
        console.error('Database error:', error);
        
        if (error.message === 'Query timeout') {
            res.status(504).json({ error: 'Database timeout' });
        } else {
            res.status(500).json({ error: 'Database error' });
        }
    }
});

Process Crashes and Recovery

Your Node.js process will crash. The question is whether you'll recover gracefully or leave users staring at error pages.

Graceful Shutdown Handling:

// Production-ready graceful shutdown
const gracefulShutdown = (signal) => {
    console.log(`Received ${signal}, starting graceful shutdown...`);
    
    // Stop accepting new connections
    server.close((err) => {
        if (err) {
            console.error('Error during server close:', err);
            process.exit(1);
        }
        
        console.log('HTTP server closed');
        
        // Close database connections
        if (pool) {
            pool.end(() => {
                console.log('Database pool closed');
                process.exit(0);
            });
        } else {
            process.exit(0);
        }
    });
    
    // Force exit after 30 seconds
    setTimeout(() => {
        console.error('Forced exit after timeout');
        process.exit(1);
    }, 30000);
};

// Handle shutdown signals
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));

// Handle uncaught exceptions
process.on('uncaughtException', (error) => {
    console.error('Uncaught Exception:', error);
    
    // Log to external service (Sentry, LogRocket, etc.)
    if (typeof logError === 'function') {
        logError(error);
    }
    
    // Graceful shutdown after logging
    setTimeout(() => {
        process.exit(1);
    }, 1000);
});

// Handle unhandled promise rejections
process.on('unhandledRejection', (reason, promise) => {
    console.error('Unhandled Promise Rejection at:', promise, 'reason:', reason);
    
    // Log the error but don't exit immediately
    if (typeof logError === 'function') {
        logError(reason);
    }
});

The key to production Node.js troubleshooting is preparation. Set up monitoring before things break, because when your app crashes at 3AM, you need data immediately - not time to install debugging tools.
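
A cheap way to have that data on hand is a health endpoint your load balancer (and your 3AM self) can hit. A minimal sketch:

app.get('/healthz', (req, res) => {
    const mem = process.memoryUsage();
    res.json({
        status: 'ok',
        uptimeSeconds: Math.round(process.uptime()),
        heapUsedMB: Math.round(mem.heapUsed / 1024 / 1024),
        rssMB: Math.round(mem.rss / 1024 / 1024),
        pid: process.pid
    });
});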

Production Troubleshooting FAQ: The Questions Asked at 3AM

Q: My Node.js app crashes with "heap out of memory" - what do I do RIGHT NOW?

A:

Your app hit the V8 heap limit (~2GB). Immediate fix: Restart with more memory:

## Emergency restart with 4GB heap
node --max-old-space-size=4096 app.js

## Or 8GB if you have the RAM
node --max-old-space-size=8192 app.js

But this is a band-aid. You have a memory leak. Use Clinic.js (doctor or heapprofiler) to find what's eating memory, then fix the leak instead of just raising the ceiling.
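
If you suspect the heap specifically, Clinic.js also ships a heap profiler. A sketch - run it in staging under realistic load:

npm install -g clinic
clinic heapprofiler -- node app.js
## Generate load, then Ctrl+C to write the report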

Q: My API responses went from 200ms to 10+ seconds overnight - WTF happened?

A:

The event loop is blocked. Something is doing synchronous work that's killing performance.

Quick diagnosis:

## Install Clinic.js
npm install -g clinic

## Run with doctor
clinic doctor -- node app.js
## Load test your app
## Kill with Ctrl+C and check the generated report

Common causes:

  • Someone added fs.readFileSync() in a request handler
  • JSON.parse() on large payloads (>10MB)
  • Unoptimized loops processing arrays
  • RegExp on user input (ReDoS attacks)
  • Synchronous crypto operations

Q: Database connections are timing out but the DB server is fine - what gives?

A:

Your connection pool is exhausted. Node.js apps create tons of concurrent connections and most pools default to 10 connections max.

Check pool health:

// Log pool stats every 30 seconds
setInterval(() => {
    console.log('Pool:', {
        open: pool._allConnections.length, // private mysql2 internals
        free: pool._freeConnections.length,
        queued: pool._connectionQueue.length
    });
}, 30000);

Fix: Increase connection limit to 50-100 for production. And always set query timeouts - I've seen queries hang for hours.

Q: My app works fine locally but crashes in Docker - what's different?

A:

Memory limits. Your local machine has 16GB RAM, your container has 512MB. Docker kills processes that exceed memory limits.

Debug memory in Docker:

## Add memory monitoring to your container
FROM node:20-alpine
## ... your setup
CMD ["node", "--max-old-space-size=400", "app.js"]

Check Docker memory limits:

docker stats your-container-name
## Shows actual memory usage vs limits

Also check if you're using --max-old-space-size - set it to 80% of container memory.
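
To make the two limits line up, cap both the container and the V8 heap. A sketch (image and container names are placeholders):

## Run with an explicit container memory limit
docker run --memory=512m --name your-container-name your-image

## Inside the image, keep the heap cap at ~80% of that limit
## CMD ["node", "--max-old-space-size=400", "app.js"]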

Q: How do I debug which dependency is causing the memory leak?

A:

Use clinic doctor to generate a diagnostic report, then look for suspicious patterns:

Steps that actually work:

  1. Install clinic: npm install -g clinic
  2. Run: clinic doctor -- node app.js
  3. Generate real load (not toy requests)
  4. Let it run for 10+ minutes
  5. Kill with Ctrl+C
  6. Open the HTML report

Look for functions consuming disproportionate memory. Usually it's:

  • Caching libraries (Redis client, memory caches)
  • Database connection libraries
  • WebSocket libraries
  • File upload handlers

Q: My Node.js process keeps getting killed in production with no error logs - why?

A:

The OS is killing your process due to OOM (Out of Memory). This happens before Node.js can log anything.

Check system logs:

## Linux
dmesg | grep -i "killed process"
journalctl -u your-service-name | grep -i oom

## Shows which process got killed and why

Prevention:

// Monitor memory and restart before OOM kill
const memoryMonitor = () => {
    const usage = process.memoryUsage();
    const heapUsedMB = Math.round(usage.heapUsed / 1024 / 1024);
    
    // Restart at 80% of container memory limit
    if (heapUsedMB > 1600) { // 80% of 2GB
        console.error(`Memory usage too high: ${heapUsedMB}MB`);
        process.exit(1); // PM2 or Docker will restart
    }
};

setInterval(memoryMonitor, 30000);

Q: How do I debug performance issues without taking the app offline?

A:

Use 0x profiler in production. It has minimal performance impact and shows real bottlenecks:

## Install globally
npm install -g 0x

## Profile production app (low overhead)
0x --output-dir /tmp/profile -- node app.js

## Generates a flamegraph after you kill the process with Ctrl+C
## Shows exactly which functions are slow

For memory issues, 0x won't help much - use a heap profiler like Clinic.js instead:

clinic heapprofiler -- node app.js

Q: My app randomly stops responding for 30+ seconds then recovers - what causes this?

A:

Garbage Collection pauses. When your heap gets large (>1GB), V8's garbage collector can pause the entire application for seconds.

Check GC stats:

node --trace-gc --trace-gc-verbose app.js
## Shows GC pause times - anything >100ms is problematic

Solutions:

  • Reduce memory usage (fix memory leaks)
  • Use streaming for large data processing (see the sketch after this list)
  • Implement request timeouts (clients give up waiting)
  • Consider cluster mode to distribute load
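
A minimal sketch of the streaming suggestion above - processing a large newline-delimited file one record at a time instead of loading it all into the heap (handleRecord stands in for your per-record work):

const fs = require('fs');
const readline = require('readline');

const processLargeFile = async (path) => {
    const rl = readline.createInterface({
        input: fs.createReadStream(path),
        crlfDelay: Infinity
    });

    let count = 0;
    for await (const line of rl) {
        handleRecord(line); // hypothetical per-record handler
        count++;
    }
    return count;
};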

Q: How do I know if my Node.js app is CPU-bound or I/O-bound?

A:

Simple test:

const { performance } = require('perf_hooks');

// Check event loop utilization
setInterval(() => {
    const usage = process.cpuUsage();
    const elu = performance.eventLoopUtilization();
    
    console.log({
        cpu: {
            user: usage.user / 1000, // Convert microseconds to ms
            system: usage.system / 1000
        },
        eventLoop: {
            utilization: (elu.utilization * 100).toFixed(2) + '%'
        }
    });
}, 5000);

CPU-bound signs:

  • Event loop utilization > 80%
  • High user CPU time
  • Slow response times under load

I/O-bound signs:

  • Low CPU usage but slow responses
  • Database/API timeouts
  • High system CPU time

Q: My logs show "ECONNRESET" and "EPIPE" errors - what are these?

A:

ECONNRESET: The client disconnected before the server finished responding. Usually not your fault - users close browsers, mobile connections drop, load balancers time out.

EPIPE: You tried to write to a closed connection. Usually follows ECONNRESET.

Handle gracefully:

app.get('/slow-endpoint', (req, res) => {
    // Check if connection is still alive
    req.on('close', () => {
        console.log('Client disconnected, stopping work...');
        // Stop any ongoing processing
    });
    
    // Your slow processing here
    doSlowWork()
        .then(result => {
            if (!res.headersSent) {
                res.json(result);
            }
        })
        .catch(error => {
            if (!res.headersSent) {
                res.status(500).json({ error: error.message });
            }
        });
});

Don't log these as errors - they're normal in production. Log them as warnings or ignore entirely.
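
One way to keep them out of your error logs is to filter by error code at the server level. A sketch (assumes `server` is your http.Server instance; adjust to your logger):

const EXPECTED_SOCKET_ERRORS = new Set(['ECONNRESET', 'EPIPE']);

server.on('clientError', (err, socket) => {
    if (EXPECTED_SOCKET_ERRORS.has(err.code)) {
        console.warn('Client dropped the connection:', err.code); // normal in production
    } else {
        console.error('Unexpected client error:', err);
    }
    socket.destroy(); // overriding 'clientError' means we must close the socket ourselves
});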

Monitoring and Alerting: Catch Problems Before Users Do

Production Monitoring That Actually Works

Most Node.js monitoring is garbage - generic dashboards that tell you your app is slow after users already hate you. You need specific metrics that catch problems before they become disasters.


Critical Metrics to Monitor

Memory Metrics:

// Custom monitoring that actually helps
const monitorHealth = () => {
    const mem = process.memoryUsage();
    const cpu = process.cpuUsage();
    
    // Track memory growth rate
    const heapUsedMB = Math.round(mem.heapUsed / 1024 / 1024);
    const rssUsedMB = Math.round(mem.rss / 1024 / 1024);
    
    // Alert thresholds
    const alerts = [];
    
    if (heapUsedMB > 1600) { // 80% of 2GB heap limit
        alerts.push({ type: 'MEMORY_HIGH', heap: heapUsedMB });
    }
    
    if (rssUsedMB > 1800) { // Near container limits
        alerts.push({ type: 'RSS_HIGH', rss: rssUsedMB });
    }
    
    // Track memory growth rate over time
    const now = Date.now();
    const timeDelta = now - (global.lastMemCheck || now);
    const memDelta = heapUsedMB - (global.lastHeapUsed || heapUsedMB);
    
    if (timeDelta > 0) {
        const growthRate = memDelta / (timeDelta / 1000); // MB per second
        
        if (growthRate > 5) { // Growing >5MB/second
            alerts.push({ 
                type: 'MEMORY_LEAK', 
                growthRate: growthRate.toFixed(2) 
            });
        }
    }
    
    global.lastMemCheck = now;
    global.lastHeapUsed = heapUsedMB;
    
    return {
        memory: {
            heap: heapUsedMB,
            rss: rssUsedMB,
            external: Math.round(mem.external / 1024 / 1024)
        },
        cpu: {
            user: Math.round(cpu.user / 1000), // Convert to ms
            system: Math.round(cpu.system / 1000)
        },
        alerts
    };
};

// Check health every 30 seconds
setInterval(() => {
    const health = monitorHealth();
    
    if (health.alerts.length > 0) {
        console.error('HEALTH ALERTS:', health.alerts);
        
        // Send to your alerting system (Slack, PagerDuty, etc.)
        health.alerts.forEach(alert => sendAlert(alert));
    }
    
    // Log metrics for dashboards
    console.log('Health:', JSON.stringify(health, null, 2));
}, 30000);
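
`sendAlert` above is whatever forwards alerts to your paging or chat system. A minimal sketch posting to a Slack-style incoming webhook (SLACK_WEBHOOK_URL is an assumption; global fetch needs Node 18+):

const sendAlert = async (alert) => {
    try {
        await fetch(process.env.SLACK_WEBHOOK_URL, {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ text: `ALERT ${alert.type}: ${JSON.stringify(alert)}` })
        });
    } catch (err) {
        console.error('Failed to send alert:', err.message); // never let alerting crash the app
    }
};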

Database Connection Monitoring

Database issues kill more Node.js apps than bad code. Monitor pool health aggressively:

// Monitor database pool health
const monitorDbPool = (pool) => {
    const stats = {
        total: pool.config.connectionLimit,
        active: pool._allConnections ? pool._allConnections.length : 0, // all open connections (private internals)
        free: pool._freeConnections ? pool._freeConnections.length : 0,
        queued: pool._connectionQueue ? pool._connectionQueue.length : 0
    };
    
    // Calculate utilization percentage
    stats.utilization = Math.round((stats.active / stats.total) * 100);
    
    // Alert if pool is stressed
    const alerts = [];
    
    if (stats.utilization > 80) {
        alerts.push({
            type: 'DB_POOL_HIGH',
            utilization: stats.utilization,
            queued: stats.queued
        });
    }
    
    if (stats.queued > 10) {
        alerts.push({
            type: 'DB_QUEUE_BACKLOG',
            queued: stats.queued
        });
    }
    
    return { stats, alerts };
};

// Monitor every minute
setInterval(() => {
    const dbHealth = monitorDbPool(pool);
    
    if (dbHealth.alerts.length > 0) {
        console.error('DB ALERTS:', dbHealth.alerts);
        dbHealth.alerts.forEach(alert => sendAlert(alert));
    }
    
    console.log('DB Pool:', dbHealth.stats);
}, 60000);

Event Loop Monitoring

The event loop is your app's heartbeat. When it stops, everything dies:

// Event loop lag monitoring
const monitorEventLoop = () => {
    let previousHrtime = process.hrtime.bigint();
    
    setInterval(() => {
        const currentHrtime = process.hrtime.bigint();
        const delta = Number(currentHrtime - previousHrtime);
        const lag = Math.max(0, delta - 1000000000); // Expected 1 second
        const lagMs = lag / 1000000; // Convert to milliseconds
        
        previousHrtime = currentHrtime;
        
        if (lagMs > 100) { // Event loop lag > 100ms
            console.warn(`Event loop lag: ${lagMs.toFixed(2)}ms`);
            
            // Critical lag alerts
            if (lagMs > 1000) {
                sendAlert({
                    type: 'EVENT_LOOP_BLOCKED',
                    lagMs: lagMs.toFixed(2)
                });
            }
        }
        
        // Track event loop utilization
        const elu = performance.eventLoopUtilization();
        const utilization = (elu.active / (elu.active + elu.idle)) * 100;
        
        if (utilization > 90) {
            sendAlert({
                type: 'EVENT_LOOP_SATURATED',
                utilization: utilization.toFixed(1)
            });
        }
        
    }, 1000);
};

monitorEventLoop();

Performance Debugging in Production

Debugging performance issues in production without killing your app requires the right tools and techniques.

Using 0x Profiler Safely

0x is the only profiler I trust in production. It has low overhead and shows real bottlenecks:

## Install globally
npm install -g 0x

## Profile with minimal impact (flame graphs)
0x --open --output-dir ./profiles -- node app.js

## For memory allocation patterns, reach for a heap profiler instead
clinic heapprofiler -- node app.js

## Profile for a fixed duration, then stop
timeout -s INT 300 0x -- node app.js  # Profile for 5 minutes, then SIGINT so 0x writes the flame graph

The flame graphs it generates show exactly which functions consume CPU time. Look for:

  • Wide bars (functions that take lots of CPU time)
  • Deep stacks (excessive function call depth)
  • Red sections (hot paths that need optimization)

Request-Level Debugging

Track slow requests to identify bottlenecks:

// Request performance tracking
app.use((req, res, next) => {
    const startTime = process.hrtime.bigint();
    const startMemory = process.memoryUsage().heapUsed;
    
    // Track when request finishes
    res.on('finish', () => {
        const endTime = process.hrtime.bigint();
        const endMemory = process.memoryUsage().heapUsed;
        
        const duration = Number(endTime - startTime) / 1000000; // Convert to ms
        const memoryDelta = endMemory - startMemory;
        
        const logData = {
            method: req.method,
            url: req.url,
            statusCode: res.statusCode,
            duration: Math.round(duration),
            memory: Math.round(memoryDelta / 1024), // KB
            userAgent: req.get('User-Agent'),
            ip: req.ip
        };
        
        // Log slow requests
        if (duration > 1000) { // >1 second
            console.warn('SLOW REQUEST:', logData);
        }
        
        // Log memory-heavy requests
        if (memoryDelta > 50 * 1024 * 1024) { // >50MB
            console.warn('MEMORY-HEAVY REQUEST:', logData);
        }
        
        // Normal request logging
        console.log('REQUEST:', logData);
    });
    
    next();
});

Error Handling and Recovery Patterns

Production Node.js apps crash. The key is recovering gracefully and learning from failures.

Graceful Degradation

When external services fail, degrade gracefully instead of crashing:

// Circuit breaker pattern for external services
class CircuitBreaker {
    constructor(service, threshold = 5, resetTime = 60000) {
        this.service = service;
        this.failureThreshold = threshold;
        this.resetTime = resetTime;
        this.failureCount = 0;
        this.lastFailureTime = null;
        this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
    }
    
    async call(request) {
        if (this.state === 'OPEN') {
            const timeSinceLastFailure = Date.now() - this.lastFailureTime;
            
            if (timeSinceLastFailure >= this.resetTime) {
                this.state = 'HALF_OPEN';
                this.failureCount = 0;
            } else {
                throw new Error('Circuit breaker is OPEN');
            }
        }
        
        try {
            const result = await this.service(request);
            
            // Success - reset failure count
            if (this.state === 'HALF_OPEN') {
                this.state = 'CLOSED';
            }
            this.failureCount = 0;
            
            return result;
        } catch (error) {
            this.failureCount++;
            this.lastFailureTime = Date.now();
            
            if (this.failureCount >= this.failureThreshold) {
                this.state = 'OPEN';
                console.error(`Circuit breaker opened after ${this.failureCount} failures`);
            }
            
            throw error;
        }
    }
}

// Usage with external API
const apiCircuitBreaker = new CircuitBreaker(
    async (request) => {
        const response = await fetch(request.url, {
            // fetch has no `timeout` option - AbortSignal.timeout() is the supported way (Node 17.3+)
            signal: AbortSignal.timeout(5000)
        });
        
        if (!response.ok) {
            throw new Error(`API error: ${response.status}`);
        }
        
        return response.json();
    },
    3, // Open after 3 failures
    30000 // Reset after 30 seconds
);

// Use with fallback
app.get('/api/user-data/:id', async (req, res) => {
    try {
        const userData = await apiCircuitBreaker.call({
            url: `${process.env.API_URL}/users/${req.params.id}`
        });
        
        res.json(userData);
    } catch (error) {
        console.warn('External API failed, using cached data:', error.message);
        
        // Fallback to cached data
        const cachedData = await getCachedUserData(req.params.id);
        
        if (cachedData) {
            res.json({
                ...cachedData,
                _cached: true,
                _warning: 'Using cached data due to service unavailability'
            });
        } else {
            res.status(503).json({
                error: 'User data temporarily unavailable',
                retryAfter: 30
            });
        }
    }
});

Centralized Error Reporting

Never debug production issues without centralized error reporting:

// Production error reporting
const reportError = (error, context = {}) => {
    const errorData = {
        message: error.message,
        stack: error.stack,
        timestamp: new Date().toISOString(),
        nodeVersion: process.version,
        pid: process.pid,
        memory: process.memoryUsage(),
        uptime: process.uptime(),
        ...context
    };
    
    // Log locally
    console.error('ERROR REPORTED:', JSON.stringify(errorData, null, 2));
    
    // Send to external service (Sentry, Bugsnag, etc.)
    try {
        // Replace with your error reporting service
        sendToErrorService(errorData);
    } catch (reportingError) {
        console.error('Failed to report error:', reportingError);
    }
};

// Global error handlers
process.on('uncaughtException', (error) => {
    reportError(error, {
        type: 'uncaughtException',
        fatal: true
    });
    
    // Graceful shutdown
    process.exit(1);
});

process.on('unhandledRejection', (reason, promise) => {
    reportError(reason, {
        type: 'unhandledRejection',
        promise: promise.toString(),
        fatal: false
    });
});

// Request error handling
app.use((error, req, res, next) => {
    reportError(error, {
        type: 'requestError',
        method: req.method,
        url: req.url,
        userAgent: req.get('User-Agent'),
        ip: req.ip,
        body: req.body
    });
    
    // Don't expose internal errors to users
    if (process.env.NODE_ENV === 'production') {
        res.status(500).json({
            error: 'Internal server error',
            requestId: req.id // Include request ID for support (req.id assumes request-id middleware)
        });
    } else {
        res.status(500).json({
            error: error.message,
            stack: error.stack
        });
    }
});

Production Node.js troubleshooting is 80% preparation and 20% panic. Set up monitoring, error reporting, and health checks before you need them. When your app crashes at 3AM, you'll thank yourself for the preparation.

Debugging Tools Comparison: What Works in Production vs What Doesn't

| Tool | Good For | Sucks At | Production Safe? | Learning Curve |
|------|----------|----------|------------------|----------------|
| Clinic.js | Memory leaks, event loop analysis | Real-time debugging | ✅ Yes | Easy |
| 0x Profiler | CPU bottlenecks, flame graphs | Memory analysis | ✅ Yes (low overhead) | Moderate |
| Node.js --inspect | Step debugging, heap snapshots | Production use | ❌ No (blocks app) | Hard |
| PM2 Monit | Process monitoring, auto-restart | Deep debugging | ✅ Yes | Easy |
| Chrome DevTools | Local development | Production debugging | ❌ No | Moderate |
| New Relic | APM monitoring, alerts | Detailed profiling | ✅ Yes ($$) | Easy |
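
If you lean on PM2's auto-restart from the table above, set its memory cap explicitly so workers recycle before the OS OOM-kills them. A sketch of an ecosystem.config.js (adjust the limit to your container):

// ecosystem.config.js
module.exports = {
    apps: [{
        name: 'app',
        script: 'app.js',
        instances: 'max',         // cluster mode across all CPU cores
        exec_mode: 'cluster',
        max_memory_restart: '1G', // restart a worker before it hits the OOM killer
        env_production: { NODE_ENV: 'production' }
    }]
};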
