Supabase's dashboard lies. Not maliciously, but it shows averages while your users experience the worst-case scenarios that get buried in the stats.
Why the Built-in Dashboard Fails You
Supabase's monitoring showed "197 active connections" during our Product Hunt launch meltdown. Looked manageable, right? Wrong. Those 197 connections were the ones PostgreSQL could see, but pgbouncer was queueing another 400+ requests from our Next.js API routes.
Users tweeted screenshots of timeout errors while my dashboard glowed green. Took 3 hours to realize the dashboard's count didn't include anything still queued in pgbouncer, and that serverless functions were grabbing connections and never releasing them. Our signup endpoint was leaking one connection per failed attempt.
Dashboard averages are criminal. You see 180ms response times while 15% of users wait 12+ seconds for pages to load. Twitter fills with "is this site down?" while your graphs show healthy performance.
The Metrics That Actually Matter
Cache hit rate dropped to 87% during our busiest hour. Page loads jumped from 200ms to 6 seconds. Users abandoned shopping carts because checkout took forever to respond.
Connection count hit 184/200 and stayed there. API started throwing ECONNREFUSED within minutes. Our mobile app crashed trying to sync user data because every request timed out. Dashboard still said "healthy" while support emails poured in asking if we were down.
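If you'd rather not trust the dashboard number at all, ask PostgreSQL directly. A minimal check, assuming your role is allowed to read pg_stat_activity:

-- How close are you to the ceiling right now?
SELECT
  (SELECT count(*) FROM pg_stat_activity) AS current_connections,
  current_setting('max_connections')::int AS max_connections;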
pg_stat_statements Saves Your Life
pg_stat_statements comes enabled by default on Supabase, thank god. Without it, you're debugging performance blind.
Last Tuesday our API started timing out randomly. Spent hours checking server logs, Redis cache, CDN settings. Finally ran the pg_stat_statements query and found one UPDATE statement taking 45 seconds. Someone forgot to add an index on user_id in our notifications table. 50,000 rows, full table scan on every notification update.
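The fix was one line. A sketch, assuming the notifications.user_id column from that incident (the index name is whatever you like):

-- Build the missing index without blocking writes; CONCURRENTLY can't run inside a transaction block
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_notifications_user_id
  ON notifications (user_id);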
-- Find the queries that are killing your app
SELECT
  query,
  calls,
  total_exec_time + total_plan_time AS total_time,
  mean_exec_time + mean_plan_time AS avg_time,
  max_exec_time + max_plan_time AS max_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
This query reveals which queries are burning the most cumulative time, not just the "slow" ones. That innocent-looking query that runs 1,000 times per minute with a 100ms average is burning 100 seconds of database time every minute; the big report that takes 10 seconds but runs once per hour barely registers.
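One gotcha: those numbers are cumulative since the stats were last reset, so last month's disaster can still dominate the list. If your role has the privilege (it's locked down by default), reset and measure a clean window:

-- Wipe the stats, then re-run the query above after some real traffic
SELECT pg_stat_statements_reset();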
CLI Commands That Don't Lie
The Supabase CLI inspection commands tell you the truth when the dashboard won't:
## Cache hit rates - mine was 73% during the incident
supabase inspect db cache-hit
## Found 8 unused indexes eating 200MB and slowing writes
supabase inspect db unused-indexes
## Shows which tables need indexes (hint: most of them)
supabase inspect db seq-scans
## Current locks preventing queries from completing
supabase inspect db blocking
Run these during quiet hours, not during a production fire. I found out the hard way: unused-indexes revealed 8 indexes I'd created "just in case" that were killing INSERT performance. Dropped them and write speed doubled overnight.
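Dropping them is the easy part. A sketch with a made-up index name; CONCURRENTLY keeps the table writable while the index disappears, and like index creation it can't run inside a transaction block:

-- Confirm it's really unused first (idx_scan = 0 since the last stats reset)
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE indexrelname = 'idx_orders_just_in_case';

-- Then drop it without blocking writes
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_just_in_case;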
Connection Pool Nightmares
Our app handled 50 concurrent users fine. Hit 200 users and everything died with ECONNREFUSED errors. Each Next.js API route grabbed its own connection and held it until the serverless function timed out.
-- Check if you're about to run out of connections
SELECT
  state,
  count(*) AS connections
FROM pg_stat_activity
WHERE state IS NOT NULL
GROUP BY state;

-- Find connections hogging resources
SELECT
  datname,
  usename,
  state,
  query_start,
  now() - query_start AS duration,
  query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY query_start;
Free tier gives you 200 connections. Sounds like plenty until you launch. Product Hunt traffic hit us with 400 concurrent requests, each serverless function grabbed a connection, and we maxed out in 3 minutes.
?pgbouncer=true in your connection string stops this disaster. One parameter change saved our launch day.
Supavisor is their newer pooler but pgbouncer has worked fine for two years. If it ain't broke, don't fix it.
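If you suspect leaked connections rather than raw traffic, look for sessions stuck in "idle in transaction" - the classic signature of a serverless function that grabbed a connection, hit an error, and never cleaned up. A rough check:

-- Sessions holding a transaction open but doing nothing: prime leak suspects
SELECT pid, usename, application_name,
       now() - state_change AS idle_for,
       left(query, 60) AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY idle_for DESC;

Anything sitting there for minutes is a connection the pool can't reuse.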
When Your Database Hits the Disk
Cache hit rate at 89% sounds pretty good until you realize it means 11% of your reads are coming from disk instead of memory. Disk reads are 100x slower than RAM hits.
-- Check if you're reading from disk too much
SELECT
  'index hit rate' AS metric,
  sum(idx_blks_hit) / nullif(sum(idx_blks_hit + idx_blks_read), 0) * 100 AS percentage
FROM pg_statio_user_indexes
UNION ALL
SELECT
  'table hit rate' AS metric,
  sum(heap_blks_hit) / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) * 100 AS percentage
FROM pg_statio_user_tables;
Our cache hit rate dropped to 83% during Black Friday traffic. Page loads went from 250ms to 8 seconds. Users abandoned their carts because product pages took forever to load. Had to upgrade from the 1GB instance to 4GB just to fit our product catalog in memory.
Below 90% means your database is thrashing between memory and disk on every query. Users notice immediately.
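When the overall rate dips, the next question is which table is doing the thrashing. A rough per-table breakdown from the same statio views:

-- Per-table cache hit rate, worst offenders first
SELECT relname,
       heap_blks_hit + heap_blks_read AS total_reads,
       round(100.0 * heap_blks_hit / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS hit_rate
FROM pg_statio_user_tables
WHERE heap_blks_hit + heap_blks_read > 0
ORDER BY hit_rate ASC
LIMIT 10;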
Setting Up Alerts That Don't Suck
Supabase's built-in alerts are garbage. First notification I got was "Database offline" after users had been complaining for 20 minutes.
What to Monitor Before Things Break
- Connection count >160 - found this out when we hit 198/200 and crashed
- Cache hit rate <90% for 5+ minutes - performance tanks fast after this
- Any query running >30 seconds - usually means a missing index
- Disk usage >85% - gives you time to upgrade before hitting the wall (SQL checks for these thresholds are sketched below)
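None of this needs fancy tooling - a scheduled job you already have (cron, pg_cron, a GitHub Action, whatever) can run checks like the ones below and ping your alerting webhook when a threshold trips. The cache hit rate query from the previous section covers that one; here are sketches for the rest:

-- Connection pressure
SELECT count(*) AS connections,
       count(*) FILTER (WHERE state = 'active') AS active
FROM pg_stat_activity;

-- Queries running longer than 30 seconds (usually a missing index)
SELECT pid, now() - query_start AS runtime, left(query, 80) AS query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '30 seconds'
ORDER BY runtime DESC;

-- Database size, to track against your plan's disk limit
SELECT pg_size_pretty(pg_database_size(current_database())) AS db_size;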
App-Level Monitoring That Predicts Disasters
- API response time >3 seconds - that's the point where users start abandoning
- Error rate >0.5% - normal apps see <0.1%, anything higher needs investigation
- Connection pool >85% - gives you a few minutes before the crash
// Example monitoring setup with structured logging
export const logPerformanceMetrics = async (req: Request, startTime: number) => {
  const duration = Date.now() - startTime;

  console.log(JSON.stringify({
    event: 'api_request',
    path: req.url,
    method: req.method,
    duration_ms: duration,
    timestamp: new Date().toISOString(),
    // Add custom metrics that matter to your app
    user_id: req.headers.get('user-id'),
    query_count: req.headers.get('x-query-count')
  }));

  // Alert on slow requests
  if (duration > 5000) {
    console.error(JSON.stringify({
      event: 'slow_request_alert',
      path: req.url,
      duration_ms: duration,
      threshold_exceeded: '5000ms'
    }));
  }
};
What Tools Actually Help When Everything's Broken
Supabase Dashboard: Pretty graphs, useless for debugging. Shows you problems after users already left.
PostgreSQL Views: pg_stat_statements saved me countless hours. Learn it or suffer.
Supabase CLI: Run the inspect commands weekly during quiet hours. Found more issues with 5 minutes of CLI than hours of dashboard staring.
Sentry: Sentry works for errors but their $26/month quickly becomes $280 when your traffic spikes. Ask me how I know.
Grafana: Grafana is for people who love configuring dashboards more than fixing actual problems. Spent 2 weeks making pretty charts while users complained about slow pages.
Monitor connection exhaustion, slow queries, and cache misses. The rest is just pretty graphs that don't help when you're debugging at 3am on a Saturday.
PostgreSQL's monitoring docs contain everything you need and cure insomnia.