How Supabase Realtime Actually Works

Supabase Realtime is an Elixir cluster that syncs data over WebSockets. Built on Phoenix Framework, it can supposedly handle millions of connections across regions. Works great in demos, flaky as hell in production.

Supabase Realtime Architecture

Phoenix Channels handle the pub/sub stuff using Elixir processes. Messages supposedly take the shortest path between regions - when it works. Sometimes your Singapore users get routed through Virginia for no fucking reason.

Core Components

Phoenix Channels run the messaging using Phoenix.PubSub. Works fine until it doesn't.

Global state sync supposedly keeps presence data consistent across regions. In reality, ghost users pile up like digital zombies and your "online users" count becomes meaningless.

Database integration streams changes from PostgreSQL's WAL through replication slots. When your database gets hammered, WAL replication lags like hell and your "real-time" updates turn into "whenever-the-fuck-they-feel-like-it" updates.

Broadcast from Database (2025 Update)

The latest Broadcast from Database feature sends messages when database changes happen. Creates a partitioned realtime.messages table that publishes changes over WebSockets. Built-in authorization through RLS policies actually works.

Messages get purged after 3 days by dropping partitions - at least that part doesn't require manual cleanup.
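
Consuming those database-driven broadcasts from the client looks roughly like this. A minimal sketch assuming supabase-js v2, an existing supabase client, and a trigger that publishes to a 'room-1' topic with the operation name as the event - all of those names are placeholders for whatever your setup actually uses.

// Broadcast from Database, client side: join the topic your trigger publishes to.
// private: true means the RLS policies on realtime.messages decide who can listen.
const dbBroadcast = supabase
  .channel('room-1', { config: { private: true } })
  .on('broadcast', { event: 'INSERT' }, ({ payload }) => {
    console.log('Row inserted:', payload)
  })
  .on('broadcast', { event: 'UPDATE' }, ({ payload }) => {
    console.log('Row updated:', payload)
  })
  .subscribe()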

Fault-tolerant my ass. Here's what actually breaks in production:

When Realtime Breaks (And It Will)

The "fault-tolerant" system still fails in predictable ways. I've debugged all of these at 3am:

Connection Pool Death Spiral: During traffic spikes, connection pools get exhausted and new WebSocket connections start timing out. Your users get the dreaded "connection failed" error with zero useful context. The solution? Restart everything and pray - or at least automate the client-side part of the restart (see the sketch after this list).

WAL Replication Lag Hell: WAL replication lags like hell when your DB is getting hammered. Your "real-time" changes turn into "eventual-time" changes and users see stale data while thinking everything is live.

Message Ordering Chaos: Broadcast messages don't arrive in order during network congestion. Your collaborative cursor app turns into a seizure-inducing mess as cursors jump randomly around the screen. There's no built-in message sequencing - you'll need to add timestamps and handle out-of-order delivery yourself.

The Phantom Presence Problem: Users who force-quit their browser or lose WiFi stay "online" forever. Ghost users pile up like digital zombies until your user list is meaningless. The CRDT "eventual consistency" sometimes means "never consistent."

Regional Routing Madness: Messages "usually" take the shortest path between regions, except when AWS has connectivity issues and your Singapore users get routed through Virginia for no fucking reason. Latency spikes from 50ms to 500ms and there's nothing you can debug.

The Database Connection Black Hole: When your database goes down, Realtime tries to reconnect from the "nearest available region." In practice, this means 30-60 seconds of complete silence while your users wonder if their internet broke.
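
None of these failure modes are fully fixable from the client, but you can at least fail loudly and re-subscribe automatically instead of leaving users staring at a dead screen. A rough sketch assuming supabase-js v2; createMessagesChannel() and showDegradedBanner() are hypothetical helpers you'd write yourself:

// Defensive wrapper: tear down the dead channel and build a fresh one instead of praying.
// createMessagesChannel() and showDegradedBanner() are hypothetical app helpers.
function subscribeWithRecovery(supabase, attempt = 0) {
  const channel = createMessagesChannel(supabase) // builds the channel and its .on() handlers
  let recovering = false

  channel.subscribe((status) => {
    if (status === 'SUBSCRIBED') {
      showDegradedBanner(false)
      attempt = 0
      return
    }
    if (recovering) return
    if (status === 'CHANNEL_ERROR' || status === 'TIMED_OUT' || status === 'CLOSED') {
      recovering = true
      showDegradedBanner(true)        // tell users something broke instead of silent failure
      supabase.removeChannel(channel) // drop the dead channel entirely
      const delay = Math.min(1000 * 2 ** attempt, 30000) // capped exponential backoff
      setTimeout(() => subscribeWithRecovery(supabase, attempt + 1), delay)
    }
  })
}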

Supabase Realtime Features Comparison

Postgres Changes

  • Purpose: Listen to database changes
  • Data persistence: Database-driven
  • Authorization: Row Level Security
  • Latency: ~100-500ms
  • Production reality: WAL lags when DB is hammered
  • Use cases: Live data sync, notifications
  • Scalability: Limited by DB connections
  • Message size: Unlimited (DB record)
  • Delivery guarantee: At-least-once
  • Regional support: Global with DB proximity
  • Setup complexity: Publication required
  • Pricing impact: WAL streaming costs

Broadcast

  • Purpose: Send ephemeral messages
  • Data persistence: Ephemeral (not stored)
  • Authorization: RLS + custom policies
  • Latency: <50ms
  • Production reality: 500ms+ during network hiccups
  • Use cases: Chat, gaming, cursors
  • Scalability: High (millions of messages)
  • Message size: 256KB max per message
  • Delivery guarantee: Best-effort (aka "good luck")
  • Regional support: Global cluster
  • Setup complexity: Channel subscription
  • Pricing impact: $2.50/million messages (cursor moves add up fast)

Presence

  • Purpose: Track user state
  • Data persistence: In-memory only
  • Authorization: RLS + custom policies
  • Latency: <50ms
  • Production reality: Ghost users pile up forever
  • Use cases: Online users, activity
  • Scalability: High (thousands of users)
  • Message size: 64KB max per presence state
  • Delivery guarantee: Best-effort (aka "good luck")
  • Regional support: Global cluster
  • Setup complexity: Channel subscription
  • Pricing impact: Connection-based

Implementation Guide and Best Practices

Database Changes Implementation

The most reliable (but slowest) of Realtime's three features. Postgres Changes needs a publication to stream database events to connected clients. Uses PostgreSQL's logical replication to capture INSERT, UPDATE, and DELETE operations.

-- Enable realtime for specific table
ALTER PUBLICATION supabase_realtime ADD TABLE messages;
// The docs version - works in dev, breaks in prod
const channel = supabase.channel('messages-changes')

// The version that actually works in production. Heartbeats, timeouts and
// reconnect backoff are client-level options, not channel config - set them
// when you create the client.
import { createClient } from '@supabase/supabase-js'

const supabase = createClient(SUPABASE_URL, SUPABASE_ANON_KEY, {
  realtime: {
    timeout: 10000,              // fail fast instead of leaving users hanging
    heartbeatIntervalMs: 15000,  // keep flaky proxies from silently dropping you
    reconnectAfterMs: (tries) => Math.min(1000 * 2 ** tries, 10000) // capped backoff
  }
})

const channel = supabase
  .channel('messages-changes')
  .on('postgres_changes',
    { event: 'INSERT', schema: 'public', table: 'messages' },
    (payload) => console.log('New message:', payload)
  )
  .subscribe((status, err) => {
    if (status === 'CHANNEL_ERROR') {
      console.error('Realtime connection died:', err)
      // Add your resurrection logic here
    }
    if (status === 'TIMED_OUT' || status === 'CLOSED') {
      console.warn('Disconnected:', status)
      // Mobile browsers, shitty WiFi, corporate firewalls - pick one
    }
  })

Your replication slot becomes a chokepoint when you're doing heavy writes. Batch your operations or watch your replication lag climb like a fucking rocket.
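
One concrete way to take pressure off the slot is to batch writes instead of inserting rows one at a time - one array insert is a single request and a single transaction. A sketch assuming the messages table from the example above; the one-second flush interval is arbitrary:

// Buffer writes and flush them as one array insert instead of one request per row.
const writeBuffer = []

function queueMessage(msg) {
  writeBuffer.push(msg)
}

// Flush once a second: one request, one transaction, same rows.
setInterval(async () => {
  if (writeBuffer.length === 0) return
  const batch = writeBuffer.splice(0, writeBuffer.length)
  const { error } = await supabase.from('messages').insert(batch)
  if (error) console.error('Batch insert failed:', error) // the batch is dropped here - add retries if you care
}, 1000)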

Version-Specific Gotchas I've Learned the Hard Way:

  • Connection timeout bullshit: Newer versions of realtime-js changed the default timeout from 10s to 30s. Your users will sit there for 30 fucking seconds before getting a connection error. Override with shorter timeouts or they'll think your app is broken.

  • Authorization enforcement changes: RLS policies started getting enforced differently for Broadcast channels in recent versions. Code that worked fine suddenly fails silently. No migration guide, just stack traces and confusion.

  • WAL replication randomly stops: On Postgres 15+, logical replication occasionally just stops working during high-throughput periods. Your database changes stop flowing and you lose several minutes of updates. Restart the replication slot, pray, and backfill whatever you missed when the stream comes back (sketch below).
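
Since changes that happen while replication is down are simply never delivered, treat every successful (re)subscribe as a chance to catch up from the table itself. A sketch reusing the messages table from above; created_at and handleMessage() are assumed stand-ins for your own column and handler:

// On every successful (re)subscribe, backfill rows that arrived while the stream was dead.
let lastSeen = new Date().toISOString() // persist this somewhere if you need history across reloads

const catchUpChannel = supabase
  .channel('messages-changes')
  .on('postgres_changes',
    { event: 'INSERT', schema: 'public', table: 'messages' },
    (payload) => {
      lastSeen = payload.new.created_at ?? lastSeen
      handleMessage(payload.new) // hypothetical app handler
    }
  )
  .subscribe(async (status) => {
    if (status !== 'SUBSCRIBED') return
    // Catch-up query: anything inserted while we weren't listening.
    const { data, error } = await supabase
      .from('messages')
      .select('*')
      .gt('created_at', lastSeen)
      .order('created_at', { ascending: true })
    if (!error && data) data.forEach(handleMessage) // dedupe by id if double delivery matters
  })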

Broadcast for Real-time Messaging

The fast but dangerous option. Broadcast sends instant messages between connected clients without touching the database. Messages are ephemeral and only reach currently connected users - great for live interactions until your network hiccups.

// Send broadcast message - looks simple, right?
channel.send({
  type: 'broadcast',
  event: 'cursor_move',
  payload: { x: 100, y: 200, user_id: 'user-123', timestamp: Date.now() }
})

// Receive broadcast messages - add timestamp handling or regret it later.
// The callback gets the whole message envelope; your data is under .payload.
const lastCursorUpdate = {}
channel.on('broadcast', { event: 'cursor_move' }, ({ payload }) => {
  // Messages arrive out of order during network congestion
  // Without timestamps, cursors jump around like broken mice
  if (payload.timestamp < (lastCursorUpdate[payload.user_id] || 0)) {
    return // Ignore outdated message
  }
  updateCursor(payload.user_id, payload.x, payload.y)
  lastCursorUpdate[payload.user_id] = payload.timestamp
})

The Broadcast Billing Trap: That innocent cursor movement? Each mouse move is a message. Users dragging their cursor generates 100+ messages per second. Our whiteboard generated something insane like 40-50 million messages in a week - think it was $120-something just for cursor movements. Rate limit everything or watch your bill explode.
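
The cheapest message is the one you never send, so throttle high-frequency events before they hit channel.send(). A minimal sketch - the 50ms interval (roughly 20 messages per second per user) is an arbitrary number, tune it to whatever your UX tolerates:

// Throttle cursor broadcasts so a mouse drag doesn't turn into 100+ billable messages per second.
const CURSOR_INTERVAL_MS = 50 // ~20 messages/sec max per user - tune to taste
let lastSent = 0

function sendCursor(x, y) {
  const now = Date.now()
  if (now - lastSent < CURSOR_INTERVAL_MS) return // drop intermediate positions
  lastSent = now
  channel.send({
    type: 'broadcast',
    event: 'cursor_move',
    payload: { x, y, user_id: 'user-123', timestamp: now }
  })
}

document.addEventListener('mousemove', (e) => sendCursor(e.clientX, e.clientY))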

Authorization: The August 2024 authorization update introduced Row Level Security policies for Broadcast channels, ensuring secure message routing based on user permissions.
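
In practice that means forwarding the user's JWT to Realtime and marking the channel private so those RLS policies are actually consulted. A sketch assuming supabase-js v2 and a signed-in user:

// Forward the user's JWT to Realtime so RLS policies on realtime.messages can see who's asking.
const { data: { session } } = await supabase.auth.getSession()
if (session) await supabase.realtime.setAuth(session.access_token)

// private: true makes the channel enforce those policies instead of letting anyone join.
const privateRoom = supabase
  .channel('room-1', { config: { private: true } })
  .subscribe((status, err) => {
    if (status === 'CHANNEL_ERROR') {
      console.error('Join rejected - check your realtime.messages RLS policies:', err)
    }
  })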

Presence for User State Tracking

The feature that works great in demos and poorly in production. Presence maintains distributed state of connected users using CRDTs. Provides "eventual consistency" across nodes without database writes - when it feels like working.

// Register presence handlers before subscribing or you'll miss the first sync
channel.on('presence', { event: 'sync' }, () => {
  const presences = channel.presenceState()
  console.log('Online users:', Object.keys(presences).length)
})

// Track user presence - track() only does something once the channel is subscribed
channel.subscribe(async (status) => {
  if (status === 'SUBSCRIBED') {
    await channel.track({
      user_id: 'user-123',
      name: 'John Doe',
      cursor_position: { x: 150, y: 300 },
      last_active: Date.now()
    })
  }
})
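
To keep ghost users out of that count (the phantom presence problem from earlier), one workaround is a client-side heartbeat plus a staleness filter when you read presence state. A sketch - the 10-second heartbeat and 30-second cutoff are arbitrary choices:

// Heartbeat: re-track every 10 seconds so last_active stays fresh.
// track() replaces your whole presence payload, so include everything you still want visible.
setInterval(() => {
  channel.track({
    user_id: 'user-123',
    name: 'John Doe',
    last_active: Date.now()
  })
}, 10000)

// When reading presence, ignore anyone who hasn't heartbeated recently.
const STALE_AFTER_MS = 30000

function onlineUsers() {
  const now = Date.now()
  return Object.values(channel.presenceState())
    .flat() // each presence key maps to an array of tracked states
    .filter((p) => now - p.last_active < STALE_AFTER_MS)
}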

Client Libraries and Integration

Official and community client libraries cover JavaScript (supabase-js), Flutter, Swift, Kotlin, Python, and C#. APIs are supposedly consistent across platforms. In practice, each has its own special way of breaking.

Success with Supabase Realtime isn't avoiding the quirks - it's understanding them and coding defensively around the inevitable failures.

Questions You'll Actually Ask While Debugging at 3AM

Q: What's the difference between Broadcast and Postgres Changes?

A: Broadcast sends ephemeral messages directly between connected clients without database storage, ideal for live interactions like cursor movements or chat messages. Postgres Changes streams actual database events (INSERT, UPDATE, DELETE) to subscribers, ensuring data persistence and consistency for critical updates.

Q: Why do my WebSocket connections keep dying?

A: Because WebSockets are fragile as hell. Mobile browsers kill background connections, corporate firewalls randomly drop WebSocket traffic, and load balancers don't understand heartbeats. Your connection shows "connected" but messages disappear into the void. Add aggressive reconnection logic or your users will think your app is broken.

Q: Does Supabase Realtime guarantee message delivery?

A: Fuck no. Realtime uses "best-effort delivery" which means "we'll try but no promises." Your chat message might vanish into the digital ether during a network hiccup. For anything important, store it in the database first, then broadcast a notification. Don't trust ephemeral messaging for critical data.

Q: Why is my Realtime bill 10x higher than expected?

A: Because every heartbeat, cursor movement, presence update, and failed reconnection attempt counts as a message. Your collaborative app with 20 users generated something like 40-60 million messages in a week - think the bill was around $150 just for cursor movements. Rate limit everything or your bill will explode. $2.50 per million sounds cheap until you realize a single user can generate thousands of messages per minute.

Q: Can I use Realtime across multiple regions?

A: Yes, Supabase Realtime operates as a global cluster. Messages automatically route through the shortest path between regions. A user in Singapore can communicate with users in the US with minimal added latency.

Q: What are the connection limits for Realtime?

A: The default configuration supports up to 16,384 WebSocket connections per node, with 100 acceptor processes handling incoming connections. Enterprise plans can customize these limits based on specific requirements.

Q: How do I secure Realtime channels?

A: Use Row Level Security (RLS) policies to control access to channels. Realtime respects PostgreSQL's authentication and authorization rules, ensuring users only receive data they're permitted to access.

Q: What happens if my database connection is lost?

A: Realtime automatically attempts to reconnect to the database from the nearest available region. Each region has multiple nodes for redundancy. Clients will experience a temporary interruption but should reconnect automatically once the connection is restored.

Q: Can I filter which database changes I receive?

A: Yes, you can filter by specific tables, schemas, and event types when subscribing to Postgres Changes. Use table publications and RLS policies to control data access at the database level.

Q: Is there a message size limit for Broadcast?

A: Broadcast messages are limited to 256KB per message. For larger payloads, consider storing data in the database and sending smaller notification messages through Broadcast to trigger data fetching.
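
In practice that store-then-notify pattern looks something like this - a sketch assuming a documents table and a channel both sides are already subscribed to; renderDocument() is a hypothetical UI helper:

// Store the large payload in the database, then broadcast only a tiny pointer to it.
async function shareDocument(body) {
  const { data, error } = await supabase
    .from('documents')
    .insert({ title: 'Q3 report', body })
    .select('id')
    .single()
  if (error) return console.error('Insert failed:', error)

  channel.send({
    type: 'broadcast',
    event: 'document_created',
    payload: { id: data.id } // a few bytes instead of a few hundred KB
  })
}

// Receivers fetch the full row only when they actually need it.
channel.on('broadcast', { event: 'document_created' }, async ({ payload }) => {
  const { data: doc } = await supabase
    .from('documents')
    .select('*')
    .eq('id', payload.id)
    .single()
  renderDocument(doc) // hypothetical UI helper
})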

Q: How does Presence handle network partitions?

A: Presence uses CRDTs (Conflict-Free Replicated Data Types) to maintain eventual consistency across network partitions. When connections are restored, presence state automatically synchronizes without conflicts.

Q: Why do my presence users never go offline?

A: Because presence state is built on "eventual consistency" which sometimes means "never consistent." Users who force-quit their browser or lose WiFi connection stay "online" forever. Ghost users accumulate like digital zombies until your user list becomes meaningless. You'll need to implement your own heartbeat system to purge stale presence data.

Q: My messages arrive out of order - is this normal?

A: Unfortunately, yes. During network congestion or server restarts, broadcast messages arrive whenever they feel like it. Your chat app shows "Hello" after "How are you?" and users get confused. Add sequence numbers to every message and sort them client-side, or accept the chaos.

Q: Can I use Realtime with my existing PostgreSQL database?

A: Maybe. Your database needs PostgreSQL 10+ with logical replication enabled, which means wal_level = logical and available replication slots. Most managed database providers (AWS RDS, Google Cloud SQL) don't enable this by default, and unlocking it means parameter-group surgery or a pricier plan. You might need to upgrade your plan or migrate to Supabase's PostgreSQL to get real-time features working.
