Performance Scaling Reality Check: What Actually Works

| Strategy | Best For | Performance Reality | Memory Trade-off | Pain Level | When I Use It |
|----------|----------|---------------------|------------------|------------|---------------|
| Clustering | I/O-heavy APIs | Usually 2-3x better, depends on your bottlenecks | Each process needs its own memory | Medium - shared state is a nightmare | REST APIs when single core maxes out |
| Worker Threads | CPU-heavy stuff | Stops main thread from freezing | Memory sharing gets weird fast | High - debugging sucks | Image processing, when users complain about freezing |
| Child Processes | External tools | Completely isolated, can't break your app | Each process is expensive | Low if you keep it simple | PDF generation, calling Python scripts |
| PM2 Clustering | Production apps | Like clustering but PM2 handles the restart hell | PM2 overhead plus your app memory | Low - PM2 does the heavy lifting | When I need zero-downtime deploys |
| V8 Tuning | GC problems | Sometimes amazing, sometimes makes things worse | Depends on the flags you use | Medium - lots of trial and error | When profiling shows GC pauses |

Making V8 Stop Being a Dick About Performance


V8 Engine Optimization: The Foundation Layer

Most Node.js performance advice stops at "use clustering" and "avoid blocking the event loop." Complete bullshit. The real performance gains come from understanding how V8's garbage collector works and why it hates your code.

Node.js 22 Actually Matters Now (But You're Probably Still on 18)

Node.js 22 has some decent performance improvements, but most teams are still on Node 18 because upgrading breaks weird shit nobody wants to debug:

  • Maglev compiler enabled by default: I saw 15-20% faster startup times for CLI apps, though your mileage may vary. Basically a mid-tier compiler that sits between the interpreter and the full optimizer
  • Stream High Water Mark bumped from 16KiB to 64KiB: Helps with file streaming but breaks some legacy code that relied on the old buffer size. Classic Node.js breaking changes for marginal gains
  • AbortSignal performance improvements: fetch() operations got faster, which is nice if you're migrating from axios. Less overhead for request cancellation
  • Built-in WebSocket client: Finally. No more installing the ws package for simple WebSocket clients. Only took them 14 years to add this basic feature (quick example below)
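
A minimal example of that built-in client, for reference. The URL is a placeholder - point it at whatever server you actually talk to:

// Node 22+: no `ws` package needed for a simple client
const ws = new WebSocket('wss://echo.example.com');

ws.addEventListener('open', () => ws.send('hello'));
ws.addEventListener('message', (event) => {
  console.log('received:', event.data);
  ws.close();
});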

V8 Memory Management Reality Check

V8's garbage collector uses a generational approach with two main spaces:

  • New Space (Young Generation): Short-lived objects, defaults to a measly 8-32MB
  • Old Space: Long-lived objects, ~1.4GB default limit before your app dies with "JavaScript heap out of memory"

Here's what actually happens: Your Node.js app creates objects like crazy (every JSON.parse, every array operation, every damn string concatenation). V8's garbage collector drives me insane - it holds onto objects way too long, then dumps everything at once and freezes your app for 100-500ms while users rage-quit.

I learned this the hard way when our API started randomly freezing for half a second during high traffic. The logs showed no errors, CPU usage looked normal, but response times would spike from 50ms to 800ms at random. I spent a weekend reading V8 internals docs before figuring it out: V8 was doing full GC sweeps every 30 seconds, and during Black Friday traffic that meant 2,000 users got timeout errors every time it happened.

The Fix That Actually Worked: Give V8 More Young Generation Space

After some random blog post mentioned tuning the young generation size, I tried it on our staging server. Results were surprisingly good:

# Default: Your app freezes during GC
node app.js

# Better: Give V8 more young generation space
node --max-semi-space-size=64 app.js

Result: GC pauses dropped from 500ms to ~150ms in our case. Used maybe 10% more memory. Totally worth it to stop user complaints about random freezing.
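
If you want proof that GC is the culprit before throwing flags at the problem, Node's built-in perf_hooks can report every GC pause. A minimal sketch - the 50ms threshold is arbitrary, pick whatever annoys your users:

const { PerformanceObserver } = require('perf_hooks');

const gcObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // entry.duration is the GC pause in milliseconds
    if (entry.duration > 50) {
      console.warn(`GC pause: ${entry.duration.toFixed(1)}ms`);
    }
  }
});

gcObserver.observe({ entryTypes: ['gc'] });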

Clustering vs Worker Threads: Stop Overthinking This Shit

Use Clustering when you want to handle more requests at the same time. Your API endpoints are fast but you max out at like 2,000 req/sec because you're only using one CPU core.

Use Worker Threads when some asshole uploads a 50MB file and locks up your entire API for 30 seconds. Or someone clicks "export all data" and your event loop dies processing 100,000 rows.

Some Numbers I Actually Tested:

Matrix multiplication test (because why not):

  • Single-threaded: 359ms and everything else waits like a chump
  • Clustering (4 cores): 197ms and other requests still work
  • Worker Threads: 219ms but your API stays responsive

The rule: Clustering = more capacity, Worker Threads = prevent blocking. That's it.
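
For reference, the "prevent blocking" version looks roughly like this. A minimal two-file sketch - the file names and the fibonacci placeholder are mine, not the benchmark code above:

// main.js - offload CPU-heavy work so the event loop keeps serving requests
const { Worker } = require('worker_threads');

function runHeavyTask(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./heavy-task.js', { workerData: n });
    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}
// Call `await runHeavyTask(42)` from a route handler; the event loop stays free while it runs.

// heavy-task.js - runs on its own thread
const { parentPort, workerData } = require('worker_threads');

function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

parentPort.postMessage(fib(workerData));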

V8 Tuning Flags That Don't Make Things Worse

Most V8 tuning guides are cargo cult bullshit from 2018. Here are the flags that actually helped in production without breaking anything:

Memory-Constrained Environments:

node --optimize-for-size --max-old-space-size=512 app.js
  • Reduces memory usage by 20-30%
  • May decrease performance by 10-15%
  • Perfect for containers with strict memory limits

High-Throughput Services:

node --max-semi-space-size=64 --max-old-space-size=4096 app.js
  • Increases Young Generation size (faster GC for high-allocation apps)
  • Sets explicit Old Space limit (prevents memory runaway)
  • Improves GC performance for applications processing large amounts of data

CPU-Intensive Applications:

node --no-optimize-for-size --max-semi-space-size=128 app.js
  • Prioritizes performance over memory usage
  • Larger Young Generation reduces promotion to Old Space
  • Better for applications doing heavy computation

Don't Touch These Flags:

  • --gc-interval=100: Overrides V8's smart GC scheduling and usually makes things worse
  • --expose-gc: Only for debugging, creates security risks in production
  • --max-executable-size: Usually makes performance worse by limiting JIT compiler optimizations

V8 engineers are smarter than you. Don't override their defaults unless you have a specific problem.

Stream Performance Optimization

Node.js 22 increased the High Water Mark from 16KiB to 64KiB, which makes file streaming faster. You can tune it further:

For High-Throughput File Operations:

const stream = fs.createReadStream('largefile.txt', {
  highWaterMark: 256 * 1024  // 256KB chunks instead of 64KB
});

For Memory-Constrained Environments:

const stream = fs.createReadStream('file.txt', {
  highWaterMark: 8 * 1024  // 8KB chunks to reduce memory usage
});

Performance Impact: Larger chunks reduce system calls but use more memory. For file uploads and downloads, 256KB-1MB chunks typically provide optimal throughput.

The Event Loop Lag Problem


Event loop lag is the hidden killer of Node.js performance. Even 10ms of lag makes your app feel slow, and users start complaining that your site is "broken."

Measuring Event Loop Lag:

const { performance } = require('perf_hooks');

function measureEventLoopLag() {
  const start = performance.now();
  setImmediate(() => {
    const lag = performance.now() - start;
    console.log(`Event loop lag: ${lag.toFixed(2)}ms`);
  });
}

setInterval(measureEventLoopLag, 5000);

Acceptable Lag Thresholds:

  • < 10ms: Excellent performance
  • 10-50ms: Acceptable for most applications
  • 50-100ms: Noticeable slowness, investigate
  • > 100ms: Unacceptable, users will complain

Common Causes and Fixes:

  • JSON.parse() on large objects: Use streaming parsers or Worker Threads
  • Synchronous crypto operations: Switch to async versions
  • Heavy regex operations: Pre-compile regex, consider native modules
  • Large array operations: Process in chunks using setImmediate() to yield (sketch below)
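
The chunking trick from that last bullet, as a minimal sketch. The chunk size of 1000 is a guess - tune it against your own payloads:

function processInChunks(items, handleItem, chunkSize = 1000) {
  return new Promise((resolve) => {
    let index = 0;
    function next() {
      const end = Math.min(index + chunkSize, items.length);
      for (; index < end; index++) {
        handleItem(items[index]);
      }
      if (index < items.length) {
        setImmediate(next);  // yield so pending I/O, timers, and other requests get a turn
      } else {
        resolve();
      }
    }
    next();
  });
}

// Usage: await processInChunks(hundredThousandRows, formatCsvRow);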

HTTP/2 and Connection Optimization


Node.js built-in HTTP/2 support can improve performance for API-heavy applications, assuming you set it up right and your clients actually support it:

const http2 = require('http2');
const fs = require('fs');

const server = http2.createSecureServer({
  key: fs.readFileSync('server-key.pem'),
  cert: fs.readFileSync('server-cert.pem')
});

server.on('stream', (stream, headers) => {
  // Handle requests with automatic multiplexing
  stream.respond({ ':status': 200 });
  stream.end('Hello HTTP/2!');
});

HTTP/2 Benefits:

  • Multiplexing: Multiple requests over single connection (no more 6-connection limit bullshit)
  • Header compression: Reduces bandwidth usage
  • Server push: Send resources before requested (barely anyone uses this)
  • Binary protocol: More efficient than HTTP/1.1 text

Real-world performance: About 20-30% improvement in page load times for applications making multiple API calls. Single API calls might actually be slower due to overhead.

Database Connection Optimization

Database connections are usually your actual bottleneck, not Node.js itself. Shitty connection pooling will kill your performance faster than any V8 tuning:

PostgreSQL Optimization:

const { Pool } = require('pg');

const pool = new Pool({
  max: 20,                    // Maximum connections
  idleTimeoutMillis: 30000,   // Close idle connections
  connectionTimeoutMillis: 2000,  // Fail fast on connection issues
  maxUses: 7500,             // Rotate connections to prevent leaks
});

MongoDB Optimization:

const { MongoClient } = require('mongodb');

const client = new MongoClient(uri, {
  maxPoolSize: 10,           // Maximum connections
  serverSelectionTimeoutMS: 5000,  // Fail fast
  socketTimeoutMS: 45000,    // Socket timeout
  maxIdleTimeMS: 30000,      // Close idle connections
});

Connection Pool Sizing Formula:

Pool Size = (CPU Cores × 2) + Disk Count
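
That formula in code, as a rough starting point. The disk count of 1 is an assumption for a typical single-volume box - adjust for your storage layout:

const os = require('os');

const diskCount = 1;  // adjust for your actual storage layout
const poolSize = (os.cpus().length * 2) + diskCount;
console.log(`Suggested starting pool size: ${poolSize}`);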

For most web applications: 8-20 connections per application instance works well without overwhelming your database. Start with 10 and adjust based on your actual traffic patterns.

This foundational performance work enables your clustering and scaling strategies to actually work. Without proper V8 tuning and connection optimization, adding more processes just multiplies the inefficiencies.

Performance Questions That Ruin Your Weekend

Q

Why is my clustered Node.js app slower than the single-process version?

A

Because clustering isn't magic, and you probably fucked up the shared state. Here's what's happening: clustered processes can't share memory, so your in-memory sessions/cache? Each process starts with nothing. Your database connection pool sized for 10 connections? Now you need 80 connections across 8 processes, and your database is telling them all to fuck off.

What actually works: external session storage (Redis), size your connection pools correctly (pool_size / cluster_count), and if you need sticky sessions, configure your load balancer properly.
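
The connection pool part of that answer, as a minimal sketch with pg. The environment variable names are made up - wire in whatever config you actually use:

const { Pool } = require('pg');

// Total connections your database can tolerate across ALL app processes
const DB_CONNECTION_BUDGET = parseInt(process.env.DB_CONNECTION_BUDGET || '80', 10);
// Number of clustered processes you run (e.g. PM2 instances)
const CLUSTER_SIZE = parseInt(process.env.CLUSTER_SIZE || '8', 10);

const pool = new Pool({
  max: Math.max(1, Math.floor(DB_CONNECTION_BUDGET / CLUSTER_SIZE))
});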

Q

How do I know if I should use clustering or worker threads?

A

Run this test: Add a CPU-intensive task to one of your endpoints (like calculating fibonacci(40)) and make concurrent requests. If other API endpoints become unresponsive, use worker threads to isolate CPU work. If your endpoints stay responsive but throughput is low, use clustering to scale across cores.

The rule: Worker threads prevent blocking, clustering increases capacity.
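
A quick way to run that test, assuming an existing Express app - the route name and fibonacci(40) are just placeholders:

// Temporary diagnostic endpoint: hit it with a few concurrent requests
// and watch whether your other routes stay responsive
app.get('/debug/burn-cpu', (req, res) => {
  function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
  res.json({ result: fib(40) });  // blocks the event loop for several seconds
});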

Q

PM2 clustering vs Node.js cluster module - which one won't make me want to quit?

A

PM2 does all the annoying shit for you: restarts crashed processes, zero-downtime deployments, monitoring that actually works.

The built-in cluster module just spawns processes and leaves you to figure out what happens when they die (spoiler: your app goes down).

With native clustering, you get to write your own restart logic, health checks, and monitoring. Fun way to spend your weekend debugging why process 3 died silently at 2am.

Reality check: Use PM2 unless you have very specific needs or enjoy pain.

Q

Why does my app use 100% CPU but handle few requests?

A

Event loop is blocked by synchronous operations. Most common culprits:

  • JSON.parse() on huge payloads (>1MB)
  • Complex regex patterns causing catastrophic backtracking
  • Synchronous file operations (fs.readFileSync)
  • Heavy computation on main thread

Use node --prof app.js to generate a CPU profile and find the hot functions.
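
Then turn the raw isolate log into something readable - the log file name varies per run, so pass the one you want:

node --prof-process isolate-0x*-v8.log > profile.txt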

Q

My app's memory keeps creeping up over days. Is this a leak or just V8 being weird?

A

Probably V8 being lazy. V8 would rather allocate new memory than clean up old shit from Old Space, so memory can climb for days before garbage collection kicks in.

How to tell if you're actually fucked:

  1. Restart the app - if memory starts low again, it's probably just V8 being V8
  2. Watch process.memoryUsage() - track RSS vs HeapUsed over time (see the sketch after this list)
  3. RSS growing but HeapUsed flat? Native memory leak (buffers, database connections not closing)
  4. Both climbing steadily? You have a JavaScript object leak and need to find what's holding references
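
For step 2, a minimal sketch that logs the numbers you care about once a minute:

setInterval(() => {
  const { rss, heapUsed, heapTotal, external } = process.memoryUsage();
  const mb = (bytes) => (bytes / 1024 / 1024).toFixed(0);
  // RSS climbing while heapUsed stays flat usually points at native memory, not JS objects
  console.log(`rss=${mb(rss)}MB heapUsed=${mb(heapUsed)}MB heapTotal=${mb(heapTotal)}MB external=${mb(external)}MB`);
}, 60000);
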
Q

Which V8 flags won't make my app worse?

A

Most V8 flags are cargo cult bullshit, but some production teams have found a few that actually help:

  • --max-semi-space-size=64: Helped our GC pauses, but uses more memory. Test it.
  • --optimize-for-size: Saves memory but made some operations slower. Good for containers with tight memory limits.
  • --max-old-space-size=4096: Prevents the dreaded "JavaScript heap out of memory" crash

Don't Touch These: --gc-interval, --always-opt, --no-opt - they override V8's optimizations and usually make things worse. V8 engineers are smarter than you.

Q

Lambda vs traditional servers - completely different performance rules?

A

Yep, Lambda optimization is basically the opposite of server optimization.

Lambda weirdness:

  • Keep imports/connections outside your handler - Lambda reuses containers sometimes (sketch after this list)
  • Use --max-old-space-size=512 or whatever matches your Lambda memory setting
  • Don't cluster - Lambda does the scaling, you just pay for it
  • Pre-warm database connections in global scope because cold starts suck
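
The first point, as a minimal handler sketch - pg and DATABASE_URL are stand-ins for whatever client and config you actually use:

const { Pool } = require('pg');

// Created once per container and reused across warm invocations
const pool = new Pool({ max: 1, connectionString: process.env.DATABASE_URL });

exports.handler = async (event) => {
  const { rows } = await pool.query('SELECT 1 AS ok');
  return { statusCode: 200, body: JSON.stringify(rows[0]) };
};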

Traditional servers:

  • Cluster everything to use all your CPU cores
  • Bigger heap sizes (--max-old-space-size=4096+) since you're not paying per MB
  • Connection pools with keep-alive settings that make sense for long-running processes
  • Optimize for staying alive for days/weeks, not milliseconds
Q

Why is my Node.js app slower after enabling HTTP/2?

A

HTTP/2 has overhead for single requests but improves performance for multiple concurrent requests. If your app serves individual API calls, HTTP/1.1 might be faster. HTTP/2 shines when clients make many parallel requests (like loading a dashboard with multiple API calls).

Also check TLS certificate performance - HTTP/2 requires HTTPS, and certificate negotiation can add latency.

Q

When should I use microservices vs monolith for Node.js performance?

A

Monolith advantages: Shared connection pools, easier debugging, no network latency between services, simpler deployment.

Microservices advantages: Independent scaling, language diversity, fault isolation.

Performance rule: If your bottleneck is CPU-bound work, microservices let you scale specific services. If your bottleneck is I/O (database, external APIs), monoliths often perform better due to shared connection pools and reduced network overhead.

Q

How do I profile Node.js performance in production safely?

A

Safe profiling methods:

  1. Sampling profiler: node --prof app.js (minimal overhead)
  2. APM tools: New Relic, DataDog (designed for production)
  3. Memory snapshots: Generate on-demand with v8.writeHeapSnapshot() from the built-in v8 module (sketch at the end of this answer)
  4. Event loop lag monitoring: Custom metrics in your application

Dangerous methods:

  • --inspect (opens debugging port, security risk)
  • Continuous heap snapshots (high memory usage)
  • --trace-gc (performance impact)

Always profile in a staging environment that mirrors production load first.
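
For on-demand snapshots, a minimal sketch that writes one when you send the process SIGUSR2 - the signal choice is arbitrary:

const v8 = require('v8');

process.on('SIGUSR2', () => {
  // Writes a .heapsnapshot file to the working directory and returns the generated filename
  const file = v8.writeHeapSnapshot();
  console.log(`Heap snapshot written to ${file}`);
});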

Scaling Node.js Without Losing Your Mind (Or Your Weekend)


The Scaling Shitshow: How Your App Growth Actually Goes

Your Node.js scaling journey isn't a neat progression through stages. It's more like stumbling from one fire to the next, fixing bottlenecks as they appear.

The "Everything Works Fine" Phase
Your app handles dozens of users beautifully. Single process, single server, life is good. You're mostly worried about not blocking the event loop and making sure your database queries don't suck.

The "Oh Fuck, We're Popular" Phase
Traffic hits 500 concurrent users and your single CPU core is screaming at 100%. Your API response times jump from 50ms to 5 seconds. Users are leaving angry comments about the site being "broken." Time to learn clustering with PM2 or spend the weekend manually restarting your server every hour. This is where most teams discover the C10K problem and learn why single-threaded event loops have limits.

The "Some Requests Kill Everything" Phase
Your API is fast except when someone uploads a 50MB image and your entire API dies for 30 seconds with Error: JavaScript heap out of memory. Or that one user clicks the "export all data" button and locks up your event loop processing a 100,000-row CSV. I learned this the hard way when a single PDF processing request brought down our entire checkout flow for 4 minutes. Time to isolate the problem children with Worker Threads so they stop murdering your main process. This phase teaches you about memory management, streaming large data, and why background job processing exists.

The "We Need More Servers" Phase
One server isn't enough anymore. We brought down production for 2 hours because the connection pool ran out and every request started hanging with Error: TimeoutError: Request timed out after 5000ms. Welcome to load balancers, database connection limits, shared state nightmares, and all the fun of distributed systems. This is when you learn about CAP theorem, session management, and why stateless applications matter.

Clustering Tricks That Actually Help


Pattern 1: Separating Fast and Slow Requests

Don't cluster everything the same way. I learned this after our image upload endpoints kept starving our API endpoints of resources.

// ecosystem.config.js for PM2
module.exports = {
  apps: [
    {
      name: 'api-light',
      script: 'server.js',
      instances: 4,           // More processes for light I/O
      env: {
        PORT: 3000,
        WORKER_TYPE: 'api'
      }
    },
    {
      name: 'cpu-heavy',
      script: 'server.js', 
      instances: 2,           // Fewer processes for CPU work
      env: {
        PORT: 3001,
        WORKER_TYPE: 'compute'
      }
    }
  ]
};

Your load balancer routes /api/* to port 3000 (fast stuff), /compute/* to port 3001 (slow stuff).

What actually happened: Our API response times got much more consistent. Before, a heavy report request would slow down everything else for 30 seconds. After splitting them up, fast requests stayed fast. Your mileage will vary based on your actual request patterns.

Pattern 2: Dynamic Worker Thread Pools

Create specialized worker pools for different CPU-intensive tasks (this code is complex as hell but it works):

const { Worker, isMainThread, parentPort } = require('worker_threads');
const os = require('os');

class WorkerPool {
  constructor(workerScript, poolSize = os.cpus().length) {
    this.workers = [];
    this.queue = [];
    
    for (let i = 0; i < poolSize; i++) {
      this.createWorker(workerScript);
    }
  }
  
  createWorker(script) {
    const worker = new Worker(script);
    worker.isReady = true;
    
    worker.on('message', (result) => {
      worker.isReady = true;
      worker.resolve(result);
      this.processQueue();
    });
    
    worker.on('error', (error) => {
      worker.isReady = true;
      // Only reject if a task was actually in flight on this worker
      if (worker.reject) worker.reject(error);
      this.processQueue();
    });
    
    this.workers.push(worker);
  }
  
  execute(data) {
    return new Promise((resolve, reject) => {
      const availableWorker = this.workers.find(w => w.isReady);
      
      if (availableWorker) {
        availableWorker.isReady = false;
        availableWorker.resolve = resolve;
        availableWorker.reject = reject;
        availableWorker.postMessage(data);
      } else {
        this.queue.push({ data, resolve, reject });
      }
    });
  }
  
  processQueue() {
    if (this.queue.length === 0) return;
    
    const availableWorker = this.workers.find(w => w.isReady);
    if (availableWorker) {
      const { data, resolve, reject } = this.queue.shift();
      availableWorker.isReady = false;
      availableWorker.resolve = resolve;
      availableWorker.reject = reject;
      availableWorker.postMessage(data);
    }
  }
}

// Usage with different worker pools for different tasks
const imagePool = new WorkerPool('./workers/image-processor.js', 2);
const cryptoPool = new WorkerPool('./workers/crypto-worker.js', 4);

// Assumes an existing Express app and a multer-style upload middleware providing req.file
app.post('/resize-image', async (req, res) => {
  const result = await imagePool.execute({ 
    image: req.file.buffer, 
    width: 800, 
    height: 600 
  });
  res.json(result);
});

Why this works: Different CPU tasks have different optimal concurrency levels. Image processing needs fewer, larger workers (memory intensive). Cryptographic operations can use more, smaller workers. Trial and error will teach you what works for your specific use case.

Memory Optimization for High-Scale Applications


Object Pool Pattern for High-Allocation Apps

For applications creating millions of objects (real-time messaging, data processing), object pooling prevents garbage collection pressure:

class ObjectPool {
  constructor(createFn, resetFn, initialSize = 100) {
    this.createFn = createFn;
    this.resetFn = resetFn;
    this.pool = [];
    
    // Pre-populate pool - helps avoid allocation spikes during traffic bursts
    for (let i = 0; i < initialSize; i++) {
      this.pool.push(createFn());
    }
  }
  
  acquire() {
    return this.pool.length > 0 ? this.pool.pop() : this.createFn();
  }
  
  release(obj) {
    this.resetFn(obj);
    this.pool.push(obj);
  }
}

// Example: Message processing
const messagePool = new ObjectPool(
  () => ({ id: null, user: null, content: null, timestamp: null }),
  (obj) => { obj.id = null; obj.user = null; obj.content = null; obj.timestamp = null; }
);

// Instead of: const message = { id, user, content, timestamp };
const message = messagePool.acquire();
message.id = id;
message.user = user;
message.content = content;
message.timestamp = timestamp;

// Process message...

messagePool.release(message); // Return to pool instead of GC

Performance Impact: Reduces garbage collection frequency by 70-80% for high-allocation applications. We went from GC pauses every 2 seconds to every 15 seconds on our real-time messaging app.

HTTP Connection and Keep-Alive Optimization

Node.js HTTP agent defaults are terrible for high-throughput applications:

Default HTTP Agent Problems:

  • maxSockets was capped at 5 per host in ancient Node versions (it's Infinity now, which brings its own problems)
  • Connections close after each request - keep-alive isn't on by default until Node 19 (seriously, wtf?)
  • No connection reuse across different parts of your application
  • Poor performance for microservice architectures making frequent HTTP calls

Optimized HTTP Agent Configuration:

const http = require('http');
const https = require('https');

// Global agent optimization
const httpAgent = new http.Agent({
  keepAlive: true,
  keepAliveMsecs: 30000,      // Initial delay for TCP keep-alive probes on idle sockets
  maxSockets: 50,             // 50 concurrent connections per host
  maxTotalSockets: Infinity,  // No cap on total connections across hosts
  timeout: 60000              // Socket inactivity timeout
  // Note: freeSocketTimeout is an agentkeepalive option, not part of the core Agent
});

const httpsAgent = new https.Agent({
  keepAlive: true,
  keepAliveMsecs: 30000,
  maxSockets: 50,
  maxTotalSockets: Infinity,
  timeout: 60000,
  rejectUnauthorized: true
});

// Apply globally
http.globalAgent = httpAgent;
https.globalAgent = httpsAgent;

// Or use with specific requests
const options = {
  hostname: 'api.example.com',
  port: 443,
  path: '/data',
  method: 'GET',
  agent: httpsAgent  // Use optimized agent
};

Performance Impact: 50-200% improvement in external API call performance due to connection reuse. We went from 500ms average API calls to 180ms just by enabling keep-alive.

Caching Strategies That Actually Scale


Multi-Layer Caching Architecture:

This pattern uses the classic L1/L2 cache approach - fast local cache backed by shared Redis cache:

const NodeCache = require('node-cache');
const Redis = require('redis');

class CacheManager {
  constructor() {
    // L1: In-memory cache (fastest, limited size)
    this.memoryCache = new NodeCache({ 
      stdTTL: 300,      // 5 minute default TTL
      maxKeys: 10000    // TODO: make this configurable, 10k might not be enough
    });
    
    // L2: Redis cache (shared across processes) - node-redis v4 style client
    this.redisClient = Redis.createClient({
      url: 'redis://localhost:6379',
      socket: {
        // Back off between reconnect attempts, capped at 3 seconds
        reconnectStrategy: (retries) => Math.min(retries * 100, 3000)
      }
    });
    this.redisClient.on('error', (err) => console.error('Redis error:', err));
    this.redisClient.connect().catch(console.error);
  }
  
  async get(key) {
    // Try L1 cache first (microsecond access)
    const memResult = this.memoryCache.get(key);
    if (memResult !== undefined) {
      return memResult;
    }
    
    // Try L2 cache (millisecond access)
    const redisResult = await this.redisClient.get(key);
    if (redisResult !== null) {
      const parsed = JSON.parse(redisResult);
      // Populate L1 cache for next request
      this.memoryCache.set(key, parsed);
      return parsed;
    }
    
    return null;
  }
  
  async set(key, value, ttl = 300) {
    // Set in both caches
    this.memoryCache.set(key, value, ttl);
    await this.redisClient.set(key, JSON.stringify(value), { EX: ttl });
  }
  
  async invalidate(key) {
    this.memoryCache.del(key);
    await this.redisClient.del(key);
  }
}

const cache = new CacheManager();

// Usage in API routes
app.get('/api/expensive-data/:id', async (req, res) => {
  const cacheKey = `expensive-data:${req.params.id}`;
  
  let data = await cache.get(cacheKey);
  if (!data) {
    // Expensive database/API call
    data = await fetchExpensiveData(req.params.id);
    await cache.set(cacheKey, data, 600); // Cache for 10 minutes
  }
  
  res.json(data);
});

Cache Hit Rate Optimization:

  • L1 (Memory): 80-95% hit rate for hot data
  • L2 (Redis): 15-19% hit rate for warm data
  • Database/API: 1-5% for cold data

Performance Impact: 10x faster response times for cached data, 90% reduction in database load. Our API response time dropped from 450ms to 45ms for cached endpoints.

Load Balancing and Health Checks

Intelligent Health Checks Beyond Simple Pings:

This health check is overkill for most apps, but it caught that weird database timeout issue that only happened under load:

const express = require('express');
const app = express();

let healthStatus = {
  status: 'healthy',
  timestamp: Date.now(),
  checks: {}
};

async function performHealthChecks() {
  // checkDatabase/checkRedis/checkExternalAPI are app-specific; each should resolve to { healthy: boolean, ... }
  const checks = {
    database: checkDatabase(),
    redis: checkRedis(),
    externalAPI: checkExternalAPI(),
    memory: checkMemoryUsage(),
    eventLoop: checkEventLoopLag()
  };
  
  const entries = Object.entries(checks);
  const results = await Promise.allSettled(
    entries.map(async ([name, check]) => [name, await check])
  );
  
  healthStatus.checks = {};
  let overallHealthy = true;
  
  // allSettled returns { status, value | reason } objects, so recover the check name by index
  results.forEach((result, i) => {
    const name = entries[i][0];
    if (result.status === 'fulfilled') {
      const [, checkResult] = result.value;
      healthStatus.checks[name] = checkResult;
      if (!checkResult.healthy) overallHealthy = false;
    } else {
      healthStatus.checks[name] = { healthy: false, error: result.reason.message };
      overallHealthy = false;
    }
  });
  
  healthStatus.status = overallHealthy ? 'healthy' : 'unhealthy';
  healthStatus.timestamp = Date.now();
}

async function checkEventLoopLag() {
  return new Promise((resolve) => {
    const start = process.hrtime();
    setImmediate(() => {
      const delta = process.hrtime(start);
      const lagMs = (delta[0] * 1000) + (delta[1] * 1e-6);
      resolve({
        healthy: lagMs < 100, // Unhealthy if event loop lag > 100ms
        lag: `${lagMs.toFixed(2)}ms`
      });
    });
  });
}

async function checkMemoryUsage() {
  const usage = process.memoryUsage();
  const usedMB = usage.heapUsed / 1024 / 1024;
  const totalMB = usage.heapTotal / 1024 / 1024;
  const percentage = (usedMB / totalMB) * 100;
  
  return {
    healthy: percentage < 90, // Unhealthy if > 90% heap used
    usage: `${usedMB.toFixed(0)}MB / ${totalMB.toFixed(0)}MB (${percentage.toFixed(1)}%)`
  };
}

// Update health status every 30 seconds
setInterval(performHealthChecks, 30000);
performHealthChecks(); // Initial check

app.get('/health', (req, res) => {
  const statusCode = healthStatus.status === 'healthy' ? 200 : 503;
  res.status(statusCode).json(healthStatus);
});

app.get('/health/ready', (req, res) => {
  // Readiness check - can this process handle requests?
  const ready = healthStatus.status === 'healthy' && 
                healthStatus.checks.database?.healthy;
  res.status(ready ? 200 : 503).json({ ready });
});

app.get('/health/live', (req, res) => {
  // Liveness check - is this process alive?
  res.status(200).json({ alive: true, uptime: process.uptime() });
});

This health check system enables intelligent load balancing. Load balancers can:

  • Route traffic away from processes with high event loop lag
  • Restart processes with memory issues
  • Gradually bring processes back online after database reconnection

These patterns separate high-performing Node.js applications from those that collapse under real-world load. The key is implementing them before you need them, because debugging performance issues at 3am under production traffic is nobody's idea of fun. Trust me, I've been there.
