Fly.io Database Management: AI-Optimized Reference
Critical Decision Matrix
Database Option | Monthly Cost | Setup Time | 3AM Pages | Best For |
---|---|---|---|---|
MPG (Managed PostgreSQL) | $38-962+ | 5 minutes | Rare (auto-failover) | Production apps with budget |
Self-hosted PostgreSQL | $15-40 | 6+ hours | You're the DBA | Cost-conscious with expertise |
SQLite + Litestream | $5-15 | 5 minutes | 60s restore from S3 | Single-writer apps |
External Services | $25-75 | Account signup | Provider handles | Most recommended |
Configuration That Works in Production
MPG (Managed PostgreSQL)
Cost Reality: $38 minimum (Starter), $72 (Basic), $282-962 (Production)
- Only 7 regions available - permanent choice
- Built-in PgBouncer connection pooling
- Auto-failover in 30 seconds
- Automatic backups (actually work)
Implementation:
fly mpg create --name my-app-db --region iad --plan starter
fly secrets set DATABASE_URL="postgresql://username:password@my-app-db.internal:5432/my_app_production"
Critical Limitation: Region choice is permanent. Wrong region = 200ms+ queries forever.
Self-hosted PostgreSQL
Hidden Costs:
- 6+ hours initial setup
- Weekend debugging sessions
- Manual backup management
- You handle all failures
Production Config:
# postgresql.conf settings that don't crash
shared_buffers = '256MB'        # 25% of RAM max
effective_cache_size = '1GB'    # Tell PG available OS RAM (planner hint, allocates nothing)
work_mem = '4MB'                # Per sort/hash operation, so one query can use several
max_connections = 100           # More = more problems
log_statement = 'all'           # Essential for debugging (verbose under load)
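The sizing comments above reduce to simple arithmetic. Here is a minimal sketch, assuming the 25%-of-RAM ceiling quoted in the config. The helper name `suggestPgMemory` and the idea of treating `work_mem` times `max_connections` as a rough budget are illustrative assumptions, not part of PostgreSQL's tooling.

```javascript
// Hypothetical helper: derive memory settings from machine RAM (in MB).
// Assumes the 25%-of-RAM ceiling for shared_buffers quoted in the config above.
function suggestPgMemory(ramMb, maxConnections = 100, workMemMb = 4) {
  const sharedBuffersMb = Math.floor(ramMb * 0.25);
  // If every connection ran one sort/hash at the same time, together they
  // could claim this much memory on top of shared_buffers.
  const workMemBudgetMb = workMemMb * maxConnections;
  return {
    sharedBuffersMb,
    workMemBudgetMb,
    // True when shared_buffers plus the work_mem budget still fits in RAM.
    fitsInRam: sharedBuffersMb + workMemBudgetMb <= ramMb,
  };
}

console.log(suggestPgMemory(1024));
// { sharedBuffersMb: 256, workMemBudgetMb: 400, fitsInRam: true }
```

On a 512MB machine the same settings no longer fit (128MB + 400MB), which is one way "more connections = more problems" shows up in practice.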
Backup Script (Test Before You Need It):
BACKUP_FILE="backup-$(date +%Y%m%d-%H%M%S).sql.gz"
pg_dump "$DATABASE_URL" | gzip > "$BACKUP_FILE"
aws s3 cp "$BACKUP_FILE" s3://my-backups/postgres/
SQLite + Litestream
Performance Reality: 0.1ms queries (no network hop)
- Only one writer, unlimited readers
- Litestream streams to S3 every second
- 60-second restore time from S3
- Expensify benchmarked 4M queries/second on a single SQLite server
Mandatory PRAGMA Settings:
PRAGMA journal_mode = WAL; -- Required for concurrency
PRAGMA synchronous = NORMAL; -- FULL too slow, OFF loses data
PRAGMA cache_size = 10000; -- More cache = faster
PRAGMA foreign_keys = ON; -- Data integrity
PRAGMA temp_store = MEMORY; -- RAM faster than disk
Litestream Config:
dbs:
  - path: /data/app.db
    replicas:
      - url: s3://my-backup-bucket/db
        retention: 72h
        sync-interval: 1s # Every second backup
Critical Failure Modes
Connection Pool Hell
Symptom: "too many connections" errors during traffic spikes
Solution: Always use connection pooling
const { Pool } = require('pg'); // node-postgres
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20, // Don't exceed database limits
  idleTimeoutMillis: 30000, // Kill idle connections
  connectionTimeoutMillis: 2000 // Fail fast
});
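The `max` value only protects the database if the pools of all app machines together stay under the server's connection limit. This is a rough sketch of that budget; the helper name `poolMaxPerMachine` and the reserved buffer of 10 connections (for migrations, consoles, and monitoring) are assumptions chosen for illustration.

```javascript
// Hypothetical helper: split the server's connection limit across app machines.
// reservedConnections is an assumed buffer for migrations, psql sessions, etc.
function poolMaxPerMachine(maxConnections, machineCount, reservedConnections = 10) {
  if (machineCount < 1) throw new RangeError('machineCount must be at least 1');
  const usable = maxConnections - reservedConnections;
  return Math.max(1, Math.floor(usable / machineCount));
}

console.log(poolMaxPerMachine(100, 4)); // 22
console.log(poolMaxPerMachine(100, 6)); // 15
```

With `max_connections = 100`, a pool of 20 per machine is safe on four machines but oversubscribes the database on six, which is when the "too many connections" errors start.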
Volume Disasters
Reality: Volumes glued to one region/machine
- When volumes disappear, the app fails with: ENOENT: no such file or directory
- No cross-region volume migration
- Max 500GB per volume
Check Volume Health:
fly volumes list --app my-app
fly ssh console --app my-app
df -h # Verify volume mounted
Backup Failures
Critical Test: Restore backup BEFORE production disaster
# Test PostgreSQL backup restore (plain-SQL dumps restore with psql; pg_restore only reads -Fc/-Fd/-Ft archives)
gunzip -c backup.sql.gz | psql -h localhost -U username -d test_db
# Test Litestream restore (under 60 seconds if S3 is responsive)
litestream restore -config litestream.yml -o /tmp/restored.db /data/app.db
sqlite3 /tmp/restored.db ".tables"
Performance Thresholds
Latency Expectations
- Same region: 1-5ms (acceptable)
- Cross-region: 50-200ms (users notice)
- SQLite: 0.1ms (stupid fast)
- External DB: 25-100ms (network tax)
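These per-query numbers multiply when a request issues queries one after another. The sketch below uses the upper end of each range from the list; the figure of 20 sequential queries per request is an assumed workload, and `sequentialQueryTimeMs` is a hypothetical helper.

```javascript
// Upper-end latencies (ms) taken from the list above.
const LATENCY_MS = { sameRegion: 5, crossRegion: 200, sqlite: 0.1, external: 100 };

// Time spent waiting on the database when queries run one after another.
function sequentialQueryTimeMs(queryCount, placement) {
  const perQuery = LATENCY_MS[placement];
  if (perQuery === undefined) throw new Error(`unknown placement: ${placement}`);
  return queryCount * perQuery;
}

// Assumed workload: 20 sequential queries per request.
for (const placement of Object.keys(LATENCY_MS)) {
  console.log(placement, sequentialQueryTimeMs(20, placement), 'ms');
}
```

At 20 queries, same-region stays around 100 ms while cross-region reaches 4 seconds, which is why the wrong region hurts far more than the single-query number suggests.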
Breaking Points
- UI fails at 1000 spans (distributed tracing becomes unusable)
- Connection limits hit during traffic spikes (causes cascading failures)
- Volume corruption (complete data loss without backups)
Resource Requirements
Time Investment
- MPG setup: 5 minutes
- Self-hosted setup: 6+ hours initial, ongoing maintenance
- Migration complexity: 2x estimated time, always have rollback plan
- Disaster recovery testing: Monthly or learn during outage
Expertise Requirements
- SQLite: Basic SQL knowledge sufficient
- Self-hosted PostgreSQL: DBA-level skills mandatory
- MPG: Application developer level
- External services: Minimal database knowledge needed
Security Implementation
Secrets Management (Never Hardcode)
fly secrets set DATABASE_URL="postgresql://user:pass@host:5432/dbname"
fly secrets set REDIS_URL="redis://user:pass@host:6379"
// Application should crash at startup if the secret is missing
const dbUrl = process.env.DATABASE_URL;
if (!dbUrl) {
  throw new Error('DATABASE_URL missing, check secrets');
}
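When an app needs several secrets, the same fail-fast check can cover all of them in one pass. This is one possible sketch: `requireEnv` is a hypothetical helper, and passing the environment as a parameter is a design choice made here so the check can be exercised without real secrets.

```javascript
// Hypothetical helper: validate all required secrets at startup and report
// every missing name at once, instead of failing on the first one.
function requireEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing secrets: ${missing.join(', ')} (check fly secrets list)`);
  }
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}

// Usage with a fake environment so the sketch runs anywhere.
const config = requireEnv(['DATABASE_URL'], {
  DATABASE_URL: 'postgresql://user:pass@host:5432/dbname',
});
console.log(Object.keys(config)); // [ 'DATABASE_URL' ]
```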
Network Security
- Use private networking (never expose DB to internet)
- SSL/TLS required for all connections
- Encrypt data at rest for compliance
Global Distribution Strategy
Read Replica Setup
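# --fork-from creates an independent copy, not a replica; clone a machine of the existing cluster into the new region
# (<primary-machine-id> is a placeholder, find it with: fly machine list --app my-app-db)
fly machine clone <primary-machine-id> --region fra --app my-app-db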
Routing Logic
const getDatabaseUrl = (operation) => {
  const region = process.env.FLY_REGION;
  if (operation === 'read') {
    switch (region) {
      case 'fra': return process.env.DATABASE_URL_EU;
      case 'nrt': return process.env.DATABASE_URL_ASIA;
      default: return process.env.DATABASE_URL_US;
    }
  }
  // All writes to primary (avoid split-brain)
  return process.env.DATABASE_URL_US;
};
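One gap in the routing above is a region whose replica URL was never set: the read would get `undefined` instead of a connection string. A possible variant, sketched under the assumption that falling back to the primary is acceptable, takes the environment as a parameter. `pickDatabaseUrl` and `READ_URL_BY_REGION` are hypothetical names; the variable names come from the snippet above.

```javascript
// Region-to-variable map mirroring the switch above.
const READ_URL_BY_REGION = { fra: 'DATABASE_URL_EU', nrt: 'DATABASE_URL_ASIA' };

// Hypothetical variant of getDatabaseUrl that takes env as a parameter and
// falls back to the primary when a regional replica URL is not configured.
function pickDatabaseUrl(operation, env = process.env) {
  const primary = env.DATABASE_URL_US;
  if (operation !== 'read') return primary; // all writes go to the primary
  const replicaVar = READ_URL_BY_REGION[env.FLY_REGION];
  return (replicaVar && env[replicaVar]) || primary;
}

const fakeEnv = { FLY_REGION: 'fra', DATABASE_URL_US: 'us-url', DATABASE_URL_EU: 'eu-url' };
console.log(pickDatabaseUrl('read', fakeEnv));  // eu-url
console.log(pickDatabaseUrl('write', fakeEnv)); // us-url
```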
Monitoring Critical Metrics
Essential Alerts
- Connection count (before hitting limits)
- Slow queries (>100ms indicates problems)
- Replication lag (data consistency issues)
- Disk space (running out kills databases)
- Backup success (untested backups are lies)
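The alert list above can be expressed as a small rule function. This is only a sketch: the 100 ms slow-query threshold comes from the list, while the 80% connection and disk thresholds, the 30-second lag limit, and the shape of the metrics object are assumed values to be tuned per system.

```javascript
// Hypothetical alert rules. Only the 100 ms slow-query threshold comes from
// the list above; the other thresholds are assumed starting points.
function evaluateAlerts(m) {
  const alerts = [];
  if (m.connections / m.maxConnections >= 0.8) alerts.push('connections near limit');
  if (m.slowestQueryMs > 100) alerts.push('slow query');
  if (m.replicationLagSeconds > 30) alerts.push('replication lag');
  if (m.diskUsedPercent >= 80) alerts.push('disk space low');
  if (!m.lastBackupSucceeded) alerts.push('backup failed');
  return alerts;
}

console.log(evaluateAlerts({
  connections: 85, maxConnections: 100, slowestQueryMs: 40,
  replicationLagSeconds: 2, diskUsedPercent: 91, lastBackupSucceeded: true,
}));
// [ 'connections near limit', 'disk space low' ]
```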
Query Performance
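-- Inspect the plan and timing of a suspected slow query
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';
-- Check connection usage (drop the WHERE clause to count every connection against max_connections)
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
-- Verify replication health (run on the replica; a growing gap between the two LSNs means replay is falling behind)
SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();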
Migration Strategies
PostgreSQL Migration
pg_dump "$OLD_DATABASE_URL" > migration.sql
psql "$NEW_DATABASE_URL" < migration.sql
Common Migration Failures
- Character encoding mismatches
- Foreign key constraint violations during restore
- Sequence resets break auto-increment IDs
- Plan for 2x estimated time + rollback strategy
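For the sequence problem in particular, a common repair is to move each sequence past the highest restored id. The sketch below only builds the SQL text. `sequenceResetSql` is a hypothetical helper that assumes an unqualified table name and a serial or identity column, and its output should be reviewed before running it against a real database.

```javascript
// Quote an identifier for PostgreSQL by doubling embedded double quotes.
const quoteIdent = (name) => `"${name.replace(/"/g, '""')}"`;
// Quote a string literal by doubling embedded single quotes.
const quoteLiteral = (text) => `'${text.replace(/'/g, "''")}'`;

// Hypothetical helper: build a statement that moves a serial/identity column's
// sequence past the highest restored id. Assumes an unqualified table name
// (a schema-qualified name would be quoted as one identifier, which is wrong).
function sequenceResetSql(table, column = 'id') {
  const t = quoteIdent(table);
  const c = quoteIdent(column);
  return (
    `SELECT setval(pg_get_serial_sequence(${quoteLiteral(t)}, ${quoteLiteral(column)}), ` +
    `COALESCE(MAX(${c}), 1), MAX(${c}) IS NOT NULL) FROM ${t};`
  );
}

console.log(sequenceResetSql('users'));
```

The third `setval` argument keeps an empty table's sequence starting at 1, while a populated table continues from `MAX(id) + 1`.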
Cost Optimization Decisions
When to Choose Each Option
Pay for MPG if:
- Monthly revenue > $2000
- Previous self-hosted disasters
- Sleep valued over cost optimization
- Users in MPG's 7 regions
Self-host if:
- Budget-conscious + database competent
- Custom PostgreSQL extensions needed
- Enjoy 3am debugging sessions
Use SQLite if:
- Single-writer application pattern
- Sub-millisecond query requirements
- Simple deployment preferences
External services if:
- Focus on application over infrastructure
- Global distribution required
- Team lacks database expertise
Vendor Lock-in Assessment
- MPG: Moderate (standard PostgreSQL underneath)
- Self-hosted: Zero (Docker containers portable)
- SQLite: Zero (single file format)
- External services: High (proprietary APIs)
Disaster Recovery Procedures
Automatic Failover Times
- MPG: 30 seconds auto-failover
- Self-hosted: Manual recovery (hours without clustering)
- SQLite + Litestream: 60 seconds S3 restore
- External: Provider-dependent (usually automatic)
Recovery Testing Requirements
- Test restore procedures monthly
- Verify backup integrity before disasters
- Document exact recovery steps
- Practice under time pressure
Support and Documentation Quality
Official Resources
- Fly.io docs: Good for setup, lacking troubleshooting
- PostgreSQL docs: Comprehensive but obtuse
- SQLite docs: Excellent technical reference
- Community forums: Essential for real-world problems
Community Wisdom Sources
- Fly.io Discord for real-time issues
- Database Administrators Stack Exchange
- PostgreSQL IRC for deep technical issues
- SQLite forum for performance optimization
This reference prioritizes operational intelligence over marketing claims, focusing on real-world failure modes, actual costs, and production-tested configurations that prevent 3AM emergencies.
Useful Links for Further Investigation
Essential Database Resources for Fly.io
Link | Description |
---|---|
Managed PostgreSQL (MPG) | Fly.io's fully managed PostgreSQL service pricing and features |
Fly Postgres Documentation | Self-hosted PostgreSQL setup and management guides |
Fly Volumes Overview | Persistent storage for databases, pricing at $0.15/GB/month |
Database Storage Guides | Comprehensive guide to all database options on Fly.io |
Multi-region Database Blueprint | Implementing read replicas and global database strategies |
PostgreSQL Official Documentation | The docs are comprehensive but written like a manual from 1995. Great reference, terrible for actually learning anything. |
PgBouncer Connection Pooler | You'll need this unless you enjoy connection limit errors. Docs are actually readable. |
Litestream | The thing that makes SQLite production-ready. Simple, works, saves your ass. |
pg_dump Documentation | Backup tool that actually works. Learn the flags, test your restores. |
SQLite Performance Tuning | Actually useful tuning guide. SQLite team knows their shit. |
Supabase | PostgreSQL that doesn't suck to set up. Real-time APIs actually work. |
Neon | PostgreSQL with git-like branching. Clever as hell, great for testing. |
PlanetScale | MySQL that doesn't make you hate schema migrations. Branching is magic. |
Redis Cloud | Redis without the clustering nightmare. Worth the money. |
MongoDB Atlas | Managed MongoDB that won't randomly corrupt. Still MongoDB though. |
Fly.io Metrics Dashboard | Built-in monitoring that's better than nothing |
PostgreSQL Slow Query Log | Find the queries that make your app suck |
pgAdmin | PostgreSQL admin with UI from 2005. Works but painful to use. |
Grafana Cloud | Pretty dashboards that might actually show useful data. |
New Relic Database Monitoring | Costs a fortune but actually finds problems before they fuck you. |
AWS S3 Documentation | Where you store backups when you want them to actually exist. |
Google Cloud Storage | S3 alternative that won't vendor-lock you as hard. |
Postgres Backup Best Practices | Backup strategies that might save your job. Test your restores. |
Point-in-time Recovery Guide | How to recover from that exact moment you fucked up. |
Disaster Recovery Planning | Plan for when everything goes to shit. Because it will. |
PostgreSQL Security Documentation | How to not get hacked (spoiler: it's complicated) |
Fly.io Private Networking | Keep your database traffic off the public internet. Obviously. |
SSL/TLS Configuration for PostgreSQL | Encrypt your connections or get fired when you leak data. |
Database Encryption at Rest | Encrypt your data at rest because paranoia pays off. |
GDPR and Database Compliance | Legal bullshit you need to know before storing EU data. |
Fly.io Community Forum | Where to ask when Fly.io breaks your database again. |
PostgreSQL Community | IRC from the 90s and mailing lists older than your career. |
SQLite Forum | Where SQLite developers answer your dumb questions patiently. |
Database Administrators Stack Exchange | Where DBAs explain why your schema design is garbage. |
Fly.io Discord | Real-time chat when the forum is too slow for your crisis. |
Database Migration Tools | Git for your database schema. Actually useful. |
Prisma | ORM that doesn't make you hate life. Type-safe queries. |
Hasura | Instant GraphQL API that actually works with PostgreSQL. |
PostgREST | Magically turns PostgreSQL into a REST API. No code required. |
pgloader | Migrates data without losing your sanity. Mostly. |
pgbench | Hammers your database to see when it breaks. Built-in. |
SQLite Speed Comparison | Benchmarks that prove SQLite doesn't suck. |
Database Load Testing with Artillery | Find out your database's breaking point before users do. |
PostgreSQL Explain Analyzer | Decodes PostgreSQL's cryptic query plans into English. |
Fly.io Status Page | Where you check when everything's fucked and it's not your fault. |