Most VMs use stacks because that's what computer science textbooks teach. BEAM said "fuck that" and went with registers instead. This actually matters when you're running 2 million WebSocket connections and your traditional thread-pool server starts swapping to death at 3am. The register architecture eliminates all the stack manipulation bullshit that other VMs waste cycles on.
Registers vs Stack: Why It's Not Just Theory
Stack machines spend their time pushing and popping shit around. Every operation is push A, push B, add, pop result. BEAM skips this dance and treats registers like variables in your code:
X-Registers
Function arguments and temp data. When you call foo(1, 2, 3), those values land in {x,0}, {x,1}, {x,2}. Return value comes back in {x,0}. Simple. The BEAM instruction set documentation shows exactly how these registers work.
Y-Registers
Stack slots that stick around across function calls - any value that has to survive a call gets saved here. Tail calls don't need them at all, which is why properly tail-recursive functions run in constant stack space instead of blowing the stack faster than you can say "fibonacci". The Erlang efficiency guide explains how BEAM handles these patterns.
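Here's a quick sketch of where y registers actually earn their keep (the function names are mine, not from anything real). Len has to survive the call to io:format/2, so it gets parked in a y register; the tail call into loop/1 reuses the frame and keeps no stack at all:
%% Len must outlive the io:format/2 call, so the compiler saves it in a y register (a stack slot).
%% The call to loop/1 is a tail call - the frame is freed before the jump, no y registers kept.
report_then_loop(List) ->
    Len = length(List),
    io:format("processing ~p items~n", [Len]),
    loop(Len).

loop(0) -> done;
loop(N) -> loop(N - 1).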
The real difference? Instead of push 5, push 3, add, pop result, BEAM does {gc_bif,'+',{f,0},3,[{x,1},{x,0}],{x,0}}. One instruction, done. This matters when you're running 2 million processes and every CPU cycle counts. The BEAM Book has detailed analysis of why this register approach is more efficient than stack machines.
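Don't take my word for it - erlc -S dumps the generated BEAM assembly so you can look at the registers yourself. A minimal sketch (the module name regs is mine, and the exact instructions shift between OTP releases, so don't expect a byte-for-byte match with the gc_bif above):
%% Save as regs.erl, then run: erlc -S regs.erl
%% The generated regs.S shows A, B, C arriving in {x,0}, {x,1}, {x,2},
%% and the additions compiled to gc_bif '+' instructions writing straight into x registers.
-module(regs).
-export([add3/3]).

add3(A, B, C) -> A + B + C.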
The Scheduler That Actually Works
BEAM counts reductions - roughly one per function call, with extra charged for heavier built-in operations. Hit your quota? Your process gets kicked off the CPU, no questions asked. This used to be a fixed 2000 in older OTP versions, but the budget and the way work gets counted have shifted across releases. I've seen this bite people who upgraded from OTP 23 to 25 and suddenly their tight loops started behaving differently. Spent a week debugging why our image processing pipeline slowed down 30% after what seemed like a routine OTP upgrade - turns out the reduction counting changes affected our CPU-bound image transforms.
This isn't time-slicing where some dickhead process can hog the CPU between timer interrupts. This is operation-counting. Your infinite loop gets the same reduction budget as a well-behaved HTTP handler - a few thousand reductions per slice, depending on your OTP version. The scheduler doesn't give a shit what your code thinks it's doing - learned this debugging an infinite loop that was mysteriously not blocking other processes, which confused the hell out of me until I understood reduction counting.
%% This will get preempted every few thousand reductions
%% preventing it from fucking over other processes
busy_loop(0) -> done;
busy_loop(N) -> busy_loop(N-1).
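You can watch the counter climb from the shell. A rough sketch - it assumes busy_loop/1 above is compiled into a module I'm calling demo, which is a made-up name:
%% Spawn the busy loop, then sample its reduction count twice with process_info/2.
Pid = spawn(fun() -> demo:busy_loop(500000000) end),
timer:sleep(100),
{reductions, R1} = erlang:process_info(Pid, reductions),
timer:sleep(100),
{reductions, R2} = erlang:process_info(Pid, reductions),
io:format("~p reductions in roughly 100ms~n", [R2 - R1]).
The shell stays perfectly responsive while that loop burns through millions of reductions - that's preemption doing its job.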
One scheduler thread per CPU core. Each handles millions of BEAM processes. Memory footprint of a freshly spawned BEAM process? Around 2.6KB. An OS thread? 8-16KB of kernel stack before you even count its userspace stack and the rest of the kernel overhead bullshit. You do the math. WhatsApp handled 450 million users with 32 engineers - this efficiency is why, though let's be honest, they also had some brilliant engineers and probably more luck than they'll admit.
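That 2.6KB number isn't folklore, you can check it on your own node. A quick sketch:
%% An idle process parked in a receive; process_info/2 reports its total memory in bytes.
Pid = spawn(fun() -> receive stop -> ok end end),
{memory, Bytes} = erlang:process_info(Pid, memory),
io:format("idle process: ~p bytes~n", [Bytes]),
Pid ! stop.
The exact figure varies with word size and OTP release, but it stays in the low kilobytes.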
Per-Process GC: No More Stop-The-World Bullshit
Traditional JVM collectors stop everything for garbage collection. Your web server freezes. Your real-time chat app hiccups. Your monitoring dashboard shows lovely spikes every few seconds and users start complaining. BEAM said "that's fucking stupid" and gave each process its own garbage collector.
Process needs memory cleanup? Only that process pauses. The GC counts as reductions, so it gets scheduled like any other work. I've seen BEAM systems run for months without a single global pause. Try that with the JVM.
This works because BEAM processes can't share mutable state. Everything is immutable. No shared references between processes means the GC only needs to look at one process's heap. Simple, predictable, and it actually works when you're getting paged at 2am. Java's G1 collector still stops the world occasionally even with all its fancy algorithms.
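You can poke at this from the shell too. A hedged sketch: spawn a process, let it make garbage, then collect just that one process while everything else keeps running:
%% The spawned process builds a big list, drops it, and parks in a receive.
%% garbage_collect/1 pauses only that one process - the shell never notices.
Pid = spawn(fun() ->
    _ = lists:seq(1, 100000),
    receive stop -> ok end
end),
timer:sleep(50),
Before = erlang:process_info(Pid, total_heap_size),
erlang:garbage_collect(Pid),
After = erlang:process_info(Pid, total_heap_size),
io:format("heap before: ~p, after: ~p (in words)~n", [Before, After]),
Pid ! stop.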
Message Passing: The Good and The Ugly
Messages get copied between process heaps. No shared memory, no locks, no race conditions. Sounds great until you try to send a 100MB map between processes and watch your memory usage double. (Big binaries are the one exception - anything over 64 bytes lives on a shared, reference-counted binary heap, so only a small reference gets copied.)
%% This copies the entire message to the recipient's heap
%% Hope it's not a huge data structure
Pid = spawn(fun() ->
    receive
        {data, Message} -> io:format("~s~n", [Message])
    end
end),
Pid ! {data, "Hello World"}.
The receive mechanism pattern matches at the VM level. It scans your mailbox until it finds a match, leaving everything else for later. This is where BEAM can bite you hard - a process with 50,000 unmatched messages in its mailbox will scan all 50,000 messages every time you try to receive. I've debugged systems where one process with a massive mailbox brought down entire nodes during peak traffic.
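Here's that failure mode condensed into a shell-sized sketch (the message shapes are made up):
%% A receiver that only ever matches {important, From}; everything else piles up.
Pid = spawn(fun Loop() ->
    receive
        {important, From} -> From ! ack, Loop()
    end
end),
[Pid ! {noise, N} || N <- lists:seq(1, 50000)],
erlang:process_info(Pid, message_queue_len).
%% => {message_queue_len, 50000} - and every selective receive now rescans all of it
Watching message_queue_len is one of the cheapest health checks you can put on a production node.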
But when it works, it really works. Process crashes? Everything else keeps running. The crashed process's memory gets cleaned up, mailbox gets flushed, and supervisors restart it in milliseconds. I've seen production systems lose individual processes and keep serving traffic like nothing happened. Discord does this at massive scale - millions of concurrent voice connections that keep working even when individual processes die. Though to be fair, when BEAM itself crashes, you're still fucked.
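The machinery under those supervisors is just links and exit signals. A bare-bones sketch, nowhere near a real supervisor (no restart limits, no backoff):
%% Trap exits so a crashing linked worker becomes a message instead of taking us down.
process_flag(trap_exit, true),
Worker = spawn_link(fun() -> exit(boom) end),
receive
    {'EXIT', Worker, Reason} ->
        io:format("worker ~p died: ~p, restarting~n", [Worker, Reason])
        %% a real supervisor would respawn it here, with restart intensity limits
end.
In production you reach for the OTP supervisor behaviour with a proper child spec instead of hand-rolling this.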
Why 1980s Telecom Tech Still Matters
BEAM was designed for telecom switches that couldn't go down. Ever. Not for web apps, not for microservices, not for blockchain bullshit. For systems where downtime meant people couldn't call 911.
Teams that build on BEAM learn to design around failure. The platform assumes things will break, and when they do, everything else keeps working. WhatsApp famously built their entire platform this way - designed for components to fail gracefully while the system stayed up.
The register-based design eliminates stack overhead. Reduction-based scheduling prevents process starvation. Per-process GC eliminates global pauses. Message passing isolates failures. These aren't separate features - they're a complete system designed around one principle: keep the lights on.
Is BEAM perfect? Fuck no. The tooling can be frustrating, the learning curve is steep, and some things (like floating-point math) are painfully slow. But when you need a system that stays up during Black Friday traffic, handles concurrency without deadlocks, and degrades gracefully when everything goes to shit, BEAM delivers. I learned this the hard way after years of fighting thread pools and connection limits on traditional platforms. Been running BEAM in production since OTP 17 - watched it evolve from quirky telecom tech to something you can actually build modern systems on without losing your mind.
BEAM's supervisor tree architecture applies the same principle in practice - when one process fails, only that branch of the tree needs to restart, keeping everything else running.