Emergency Fixes for Production Issues

Q

My deploy just failed with "remaining connection slots are reserved"

A

Your connection pool is fucked. Neon limits connections and something is hogging them all. Kill idle connections first (the pg_terminate_backend query is in the Q&A section below), then cap your pool size so it can't happen again.

Q

App is throwing timeout errors randomly

A

Cold starts. Your database went to sleep and takes 300-800ms to wake up. If your app timeout is 5 seconds, this shouldn't matter. If it's 1 second, you're screwed.

Emergency fix:

  1. Go to Neon console → Database settings
  2. Set "Auto-suspend delay" to "Never"
  3. Your bill just went up 10x but your app works

Real fix:

  • Increase your app's timeout to 10+ seconds
  • Use connection warmup
  • Set up proper health checks that keep the DB warm (sketch below)
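
A health check that doubles as a warmer - a minimal sketch assuming Express and node-postgres (none of this is Neon-specific API):

import express from 'express';
import { Pool } from 'pg';

const app = express();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Any trivial query resets the auto-suspend timer
app.get('/healthz', async (req, res) => {
    const start = Date.now();
    try {
        await pool.query('SELECT 1');
        res.json({ ok: true, dbLatencyMs: Date.now() - start });
    } catch (err) {
        res.status(503).json({ ok: false, error: err.message });
    }
});

app.listen(3000);

Point your uptime monitor at /healthz every few minutes and most cold starts disappear.
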
Q

Database migrations are failing with "permission denied"

A

Neon doesn't give you superuser access. Some migration tools assume they have it.

Works:

-- These work fine
CREATE TABLE, ALTER TABLE, CREATE INDEX, DROP INDEX
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

Breaks:

-- These will throw permission errors
ALTER SYSTEM SET log_min_duration_statement = 1000; -- needs superuser
LOAD 'pg_stat_statements'; -- needs superuser

Fix: Check your migration for superuser commands. Most PostgreSQL features work, but system-level stuff doesn't.

Q

My bill suddenly jumped from $5 to $150

A

Autoscaling kicked your database to 8 CUs during a traffic spike and kept it there. Autoscaling is aggressive and doesn't scale down fast enough.

Immediate damage control:

  1. Go to Neon console → Settings → Autoscaling
  2. Set max scale to 2 CUs (or whatever you can afford)
  3. Set scale-down sensitivity to "High"

Check your usage:

  • Go to Usage tab in console
  • Look for CU spikes - anything over 4 CUs gets expensive fast
  • Storage costs: each branch costs separately

I got hit with a $73 bill when a scraper hit my API and autoscaling went nuts for 3 hours.

Q

"FATAL: too many connections for role" error

A

You hit the role-level connection limit, not the database limit. This happens with pooled connections in transaction mode.

Quick fix:

## Switch to the pooled connection string - the endpoint hostname gets a -pooler suffix
postgresql://user:pass@ep-xxx-pooler.region.aws.neon.tech/db?sslmode=require

Better fix:

  • Cap connections at the app level, not just the database level (see the Pool sketch below)
  • Keep instances × pool size under your role's connection limit
  • Monitor connection usage in Neon console
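
The app-level cap, sketched with node-postgres - the numbers are starting points, not gospel:

import { Pool } from 'pg';

const pool = new Pool({
    connectionString: process.env.DATABASE_URL,
    max: 3,                         // hard cap per app instance
    idleTimeoutMillis: 10000,       // release idle clients quickly
    connectionTimeoutMillis: 10000  // fail fast instead of queueing forever
});

Keep instances × max under your role's limit and this error stops showing up.
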
Q

Queries suddenly became 10x slower

A

Either you lost an index during a migration or you're hitting connection limits. Use `pg_stat_statements` to find the culprit.

Debug:

-- Check column stats on the slow table (selectivity hints before adding indexes)
SELECT schemaname, tablename, attname, null_frac, avg_width, n_distinct, correlation
FROM pg_stats WHERE tablename = 'your_slow_table';

-- Find slow queries
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;

If it's connection limits: You're getting queued behind other connections. Scale up your compute or fix your connection pooling.

Deep Dive: Debugging Neon in Production Environments

Three months ago, my staging environment went dark at 2 PM on a Tuesday. Users were hitting timeout errors, deployments were failing, and I was frantically refreshing the Neon status page. Here's what I learned about actually debugging Neon when things break.

Connection Pool Exhaustion - The Silent Killer

The most common production issue isn't database performance - it's connection exhaustion. Neon caps direct Postgres connections by compute size (max_connections is 112 on the Free tier's default 0.25 CU compute; the pooled endpoint takes up to 10,000 clients), but here's the real problem: your application is probably using way more connections than you think.

I discovered this when my Next.js app deployed 20 Vercel functions, each using Prisma's default connection pool of 5 connections. Do the math: 20 × 5 = 100 connections. That maxed out my free tier instantly.

Real debugging steps:

-- Check current connection usage
SELECT 
    count(*) as total_connections,
    count(*) FILTER (WHERE state = 'active') as active,
    count(*) FILTER (WHERE state = 'idle') as idle,
    count(*) FILTER (WHERE state = 'idle in transaction') as idle_in_transaction
FROM pg_stat_activity;

-- See who's hogging connections
SELECT 
    pid, 
    usename, 
    application_name, 
    state, 
    state_change,
    query 
FROM pg_stat_activity 
WHERE state != 'idle' 
ORDER BY state_change;

The fix that actually works:

## Add this to your database URL
?connection_limit=3&pool_timeout=20

## Or in your ORM config (Prisma example)
generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL") // connection_limit and pool_timeout ride along in the URL
}

Set connection_limit=3 for most applications. I've run production apps with 2 connections and never hit bottlenecks. More connections ≠ better performance. For Vercel deployments specifically, check the Neon-Vercel integration guide for connection pool optimization.

Autoscaling Surprise Bills

Here's a $73 lesson I learned the hard way. A web scraper hit my API endpoint 10,000 times in 20 minutes. Neon's autoscaling kicked in, bumped my compute from 0.25 CU to 8 CU, and kept it there for 3 hours while I was in meetings.

The billing math: 8 CU × $0.26/hour × 3 hours = $6.24. But it happened 12 times that month due to various traffic spikes. Total damage: $73. Check the current pricing rates for accurate compute hour costs.

[Image: Neon compute metrics]

Emergency damage control:

  1. Set strict autoscaling limits in Neon console
  2. Enable email alerts for compute usage spikes
  3. Configure your application's rate limiting properly (sketch below)
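
Item 3, sketched with express-rate-limit (an assumed middleware choice - tune the numbers to your traffic):

import express from 'express';
import rateLimit from 'express-rate-limit';

const app = express();

// A scraper gets cut off long before autoscaling spins up extra CUs
app.use('/api/', rateLimit({
    windowMs: 60 * 1000, // 1-minute window
    max: 100             // per-IP request cap
}));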

Monitoring that matters:

-- Check database size and connection headroom
SELECT 
    pg_size_pretty(pg_database_size(current_database())) as db_size,
    (SELECT setting FROM pg_settings WHERE name = 'max_connections') as max_conn,
    count(*) as current_conn
FROM pg_stat_activity;

The Neon dashboard shows real-time CU usage, but you won't notice autoscaling events unless you're actively watching. Set up consumption alerts or you'll get surprised by your bill.

Cold Start Debugging Hell

Cold starts are Neon's Achilles heel for real-time applications. When your compute suspends after 5 minutes of inactivity, the first query takes 300-800ms. For most web apps, this is fine. For WebSocket connections or real-time chat, it's game over.

I spent 2 weeks debugging "slow queries" before realizing they weren't slow - the database was just waking up. Here's how to identify cold start issues:

Check connection timing:

// Add timing to your connection attempts
const start = Date.now();
try {
    await db.query('SELECT 1');
    console.log(`Query took ${Date.now() - start}ms`);
} catch (error) {
    console.log(`Failed after ${Date.now() - start}ms:`, error.message);
}

Typical timing patterns:

  • Active database: 2-15ms for simple queries
  • Cold start: 300-800ms for the first query, then back to normal
  • Network issues: Consistent 1000+ ms or timeouts

Real solutions:

  1. Disable auto-suspend for production if you can afford the cost
  2. Database warming: Set up a cron job to ping your DB every 4 minutes (sketch after this list)
  3. Increase application timeouts to 10+ seconds for initial connections
  4. Connection keepalive: Use persistent connections where possible
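
Item 2 as code - a sketch for a long-lived Node process (on serverless, use your platform's scheduled functions instead; pingDatabase is my name, not a Neon API):

import { Client } from 'pg';

async function pingDatabase() {
    const client = new Client({ connectionString: process.env.DATABASE_URL });
    try {
        await client.connect();
        await client.query('SELECT 1'); // enough to reset the suspend timer
    } finally {
        await client.end();
    }
}

setInterval(pingDatabase, 4 * 60 * 1000); // every 4 minutes, inside the 5-minute suspend window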

The \"Prepared Statement Does Not Exist\" Nightmare

This error happens with concurrent connections using prepared statements, especially with Drizzle ORM and RLS policies. The exact error: NeonDbError: prepared statement "s257" does not exist.

Why it happens:
Neon's connection pooler (PgBouncer) runs in transaction mode, so back-to-back queries from one client can land on different server connections. A prepared statement created on one server connection doesn't exist on the next, and concurrent requests make the mix-up worse.

The workaround:

import { sql } from 'drizzle-orm';
import { Client } from 'pg';

// Instead of relying on prepared statements
const result = await db.execute(
    sql`SELECT * FROM users WHERE id = ${userId}`
);

// Use unpooled connections for complex queries
const client = new Client({
    connectionString: process.env.DIRECT_DATABASE_URL // Non-pooled connection
});
await client.connect();
const direct = await client.query('SELECT * FROM users WHERE id = $1', [userId]);
await client.end();

Better fix: Upgrade to Neon's latest serverless driver (v0.9.0+) which handles this issue better, or switch to session pooling instead of transaction pooling if your use case allows it.

Debugging Slow Query Performance

"My queries are slow on Neon but fast locally" is a common complaint. Here's systematic debugging:

Step 1: Rule out connection issues

-- Check if you're hitting connection limits
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';

-- Look for blocking queries (pg_blocking_pids handles the lock matching)
SELECT 
    blocked.pid AS blocked_pid,
    blocked.usename AS blocked_user,
    blocking.pid AS blocking_pid,
    blocking.usename AS blocking_user,
    blocked.query AS blocked_statement,
    blocking.query AS blocking_statement
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking 
    ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));

Step 2: Check query stats

-- ALTER SYSTEM SET log_min_duration_statement needs superuser,
-- which Neon doesn't give you, so lean on pg_stat_statements instead
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements 
ORDER BY mean_exec_time DESC 
LIMIT 20;

Step 3: Check your indexes

-- Find tables without primary keys (performance killer)
SELECT schemaname, tablename 
FROM pg_tables 
WHERE schemaname = 'public'
AND tablename NOT IN (
    SELECT tablename 
    FROM pg_indexes 
    WHERE indexname LIKE '%_pkey'
);

-- Check for unused indexes (wasting space)
SELECT schemaname, tablename, indexname, pg_size_pretty(pg_relation_size(indexname::regclass))
FROM pg_stat_user_indexes 
WHERE idx_scan = 0;

Most "slow query" issues on Neon are actually connection pool exhaustion masquerading as performance problems. Fix your connection management first, then worry about query optimization.

When Neon Support Actually Helps

I've opened 4 tickets with Neon support. Here's what they're good at and what they're not:

They'll actually fix:

  • Infrastructure outages (rare but happens)
  • Billing issues and quota adjustments
  • Configuration help for enterprise features
  • Connection pooler tuning for high-traffic apps

They can't help with:

  • Your application's connection management
  • Query optimization (that's on you)
  • Third-party integration issues
  • "Why is my app slow?" without specific debugging info

How to get useful help:
Include your project ID, exact error messages, and steps to reproduce. "My app is slow" gets a copy-paste response. "Project ep-xxx shows 100% CPU usage at 14:30 UTC with this specific query" gets real engineering attention.

The Discord community at discord.gg/92vNTzKDGp is actually more helpful for debugging application-level issues. Neon engineers hang out there and respond faster than formal support tickets. For additional community resources, check the Neon GitHub discussions.

Debugging Tools and Solutions Comparison

Issue                 | Neon Console                    | Database Queries                    | Application Logs             | Third-Party Tools
----------------------|---------------------------------|-------------------------------------|------------------------------|--------------------------------
Connection Exhaustion | ✅ Shows active connections      | pg_stat_activity shows exact usage  | ⚠️ Connection timeout errors  | ❌ Limited visibility
Cold Start Detection  | ✅ Compute status indicator      | ❌ No direct visibility              | ✅ Request timing spikes      | ✅ APM tools show latency
Autoscaling Costs     | ✅ Real-time CU usage & billing  | ❌ Not visible in DB                 | ❌ Application unaware        | ⚠️ Some APM tools track costs
Slow Queries          | ⚠️ Shows CPU/memory usage        | pg_stat_statements + logs           | ⚠️ Query timeout errors       | ✅ Query performance monitoring
Storage Usage         | ✅ Branch-by-branch breakdown    | pg_size_pretty() functions          | ❌ Not visible                | ❌ Usually not tracked

Advanced Production Troubleshooting Q&A

Q

My Neon console shows "Database unavailable" but everything worked yesterday

A

Check if your compute hit the concurrent endpoint limit. Neon limits you to 20 active computes (branches) by default. I hit this when my CI/CD pipeline created a new branch for every PR and never cleaned them up. Had 23 database branches running tests simultaneously.

Q

Error: "remaining connection slots are reserved for roles with the SUPERUSER attribute"

A

You maxed out Postgres max_connections. This is different from Neon's pooler limits. The cap scales with compute size: 112 at 0.25 CU (the Free tier default), 450 at 1 CU, roughly 900 at 2 CU.

Quick fix:

-- Find and kill problematic connections  
SELECT pg_terminate_backend(pid) 
FROM pg_stat_activity 
WHERE state = 'idle' 
AND state_change < now() - interval '5 minutes'
AND pid <> pg_backend_pid();

Real fix: Use connection pooling properly or upgrade your compute size.

Q

Why does my app work locally but fail on Vercel/Netlify?

A

Serverless functions create way more concurrent connections than you think. Each function invocation can grab 5-10 connections, and platforms like Vercel can run 50+ functions simultaneously.

Your local dev server uses 1 connection. Production uses 50. Do the math.

Fix: Set connection_limit=2 in your database URL or use a connection pool library.
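
The standard serverless pattern, sketched with node-postgres (the max value is illustrative): keep the pool at module scope so warm invocations reuse it instead of opening fresh connections.

import { Pool } from 'pg';

let pool; // survives across warm invocations of the same function instance

export function getPool() {
    if (!pool) {
        pool = new Pool({
            connectionString: process.env.DATABASE_URL,
            max: 2 // per-instance cap: 50 warm instances × 2 = 100 connections worst case
        });
    }
    return pool;
}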

Q

"query_wait_timeout SSL connection has been closed unexpectedly"

A

Your queries are sitting in PgBouncer's queue too long. Default timeout is 120 seconds.

This usually means you have:

  1. Really slow queries blocking the queue
  2. Too many connections hitting a small pool
  3. A query that's stuck waiting for locks

Debug it:

-- Check for long-running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query 
FROM pg_stat_activity 
WHERE (now() - pg_stat_activity.query_start) > interval '30 seconds';

-- Check for lock waits  
SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock';

Q

My database randomly disconnects with "terminating connection due to administrator command"

A

Your compute went to sleep while you had an active connection. Neon suspends computes after 5 minutes by default.

Don't try to keep connections open across suspend/resume cycles. Your application should reconnect automatically.

Fix: Use a database library that handles reconnection, or disable auto-suspend if you can afford it.
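
A minimal retry sketch, assuming one retry after a resume is enough (57P01 is Postgres's admin_shutdown code - the error class behind "terminating connection due to administrator command"):

async function queryWithRetry(pool, text, params) {
    try {
        return await pool.query(text, params);
    } catch (err) {
        if (err.code === '57P01' || /terminating connection/.test(err.message)) {
            return await pool.query(text, params); // the pool hands out a fresh connection
        }
        throw err;
    }
}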

Q

Branch creation fails with "cannot create branch: rate limit exceeded"

A

You're hitting Neon's API rate limits. The Management API has usage limits to prevent abuse.

I've seen this in CI pipelines that create/delete branches aggressively for every commit.

Fix: Add delays between branch operations or batch your operations.
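
A hypothetical CI helper - staleBranches and deleteBranch are stand-ins for your own Management API wrapper, not Neon SDK calls:

const staleBranches = ['preview/pr-101', 'preview/pr-102']; // your own cleanup list
async function deleteBranch(name) {
    // e.g. DELETE /projects/{project_id}/branches/{branch_id} on the Management API
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

for (const branch of staleBranches) {
    await deleteBranch(branch);
    await sleep(2000); // back off between operations to stay under the rate limit
}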

Q

Error: "unsupported startup parameter" with my ORM

A

Your ORM is trying to set PostgreSQL parameters that PgBouncer doesn't support. Common culprits:

  • application_name with special characters
  • Custom timezone settings
  • Extensions loaded at connection time

Fix: Either remove the parameter or use a non-pooled connection string for that specific use case.

Q

Why is my bill 10x higher than expected?

A

Three common causes:

  1. Autoscaling went crazy - Check your CU usage graphs
  2. Forgot to delete branches - Each branch costs storage separately
  3. Point-in-time recovery - Adds $0.20/GB-month for write-heavy workloads

Check this:

## List all your branches with their sizes
neon branches list --project-id your-project-id

## Check point-in-time recovery usage
## (Only available in Neon console, not CLI)

Most expensive mistakes I've seen: Leaving 15 branches with 2GB each running for a month = $10.50 extra just for branch storage.

Q

"DNS lookup failed" or "getaddrinfo ENOTFOUND" errors

A

Your network can't resolve Neon's hostnames. This is usually:

  1. Corporate firewall blocking DNS
  2. Your ISP's DNS servers having issues
  3. Regional DNS propagation problems

Debug steps:

## Test DNS resolution
nslookup ep-your-endpoint.region.aws.neon.tech

## Try with public DNS
nslookup ep-your-endpoint.region.aws.neon.tech 8.8.8.8

## Check if it's a specific region issue
ping ep-your-endpoint.region.aws.neon.tech

Fix: Switch to a public DNS provider (8.8.8.8) or use a VPN.

Q

My migrations run forever and then timeout

A

Neon doesn't give you superuser access. Some migration tools expect it and hang waiting for permissions that will never come.

These will hang:

  • Loading extensions that need superuser
  • Setting system-wide configuration
  • Creating/modifying system catalogs

Check your migration logs for:

  • CREATE EXTENSION without IF NOT EXISTS
  • ALTER SYSTEM SET commands
  • Custom procedural languages

Fix: Review your migrations for superuser-only operations and remove or modify them.

Q

Connection works in development but fails in production with SSL errors

A

Your production environment is enforcing SSL but your local setup isn't. Neon requires SSL connections.

Fix: Add ?sslmode=require to your connection string. For some clients you need ?ssl=true instead.

## Local (works without SSL)
postgresql://user:pass@localhost:5432/db

## Production (needs SSL) 
postgresql://user:pass@ep-xxx.neon.tech/db?sslmode=require
