SQLite Performance Optimization: AI-Optimized Technical Reference
Configuration Settings That Actually Work
Critical Performance Settings
```sql
-- Essential performance configuration
PRAGMA journal_mode = WAL;         -- Enable Write-Ahead Logging
PRAGMA synchronous = NORMAL;       -- Reduce disk sync overhead
PRAGMA cache_size = -64000;        -- 64MB cache (negative = KB)
PRAGMA mmap_size = 268435456;      -- 256MB memory mapping
PRAGMA temp_store = MEMORY;        -- Keep temp tables in RAM
PRAGMA wal_autocheckpoint = 1000;  -- Default checkpoint interval
```
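Note that `journal_mode = WAL` persists in the database file, but `synchronous`, `cache_size`, `mmap_size`, and `temp_store` reset per connection, so apply them every time you connect. A minimal sketch using Python's built-in `sqlite3`; the helper name is illustrative:

```python
import sqlite3

def open_tuned(path: str = "app.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode = WAL")     # persistent: stored in the db file
    conn.execute("PRAGMA synchronous = NORMAL")   # per-connection
    conn.execute("PRAGMA cache_size = -64000")    # per-connection, negative = KB
    conn.execute("PRAGMA mmap_size = 268435456")  # per-connection
    conn.execute("PRAGMA temp_store = MEMORY")    # per-connection
    return conn
```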
Performance Impact by Configuration
Setting | Write Performance | Read Performance | Memory Usage | Data Safety | Trade-off |
---|---|---|---|---|---|
Default Settings | 30-60 writes/sec | Decent | 2MB | Safe | Trades performance for safety |
WAL + Normal Sync | 3000+ writes/sec | Decent | Medium | 99.9% safe | Seconds of data-loss risk on power failure |
WAL + Off Sync | 10000+ writes/sec | Decent | Medium | High risk | Trades all safety for speed |
Large Cache (64MB+) | Variable | Up to 10x faster | High | Safe | Trades RAM for speed |
Memory Mapping | Variable | Up to 5x faster | High | Safe | Trades RAM for read speed |
Critical Failure Modes and Solutions
Transaction Batching - Most Common Performance Killer
Problem: Individual INSERTs cause disk sync per operation
- Symptoms: 30-50 inserts/second maximum, import scripts taking hours
- Root Cause: Each INSERT is its own transaction requiring disk confirmation
- Solution: Batch operations in transactions
```python
import sqlite3

conn = sqlite3.connect("app.db")

# WRONG: one transaction (and one disk sync) per row -- 200K syncs
for name, email in data:
    conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))
    conn.commit()

# CORRECT: one transaction for the whole batch -- a single disk sync
with conn:  # wraps the block in BEGIN ... COMMIT
    conn.executemany("INSERT INTO users (name, email) VALUES (?, ?)", data)
```
- Performance Impact: 8 hours → 8 minutes (a ~60x improvement)
- Batch Size Limits: 5K-10K records per batch; larger batches hold the write lock long enough to starve readers (see the chunking sketch below)
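A chunked-batch sketch that respects the limits above, assuming the `users` table from the example (the helper name and batch size are illustrative):

```python
import sqlite3

def insert_in_batches(conn: sqlite3.Connection, rows, batch_size=5000):
    # One COMMIT (one disk sync) per batch; the write lock is released
    # between batches so readers aren't starved by a single huge transaction
    for i in range(0, len(rows), batch_size):
        with conn:  # BEGIN ... COMMIT around each batch
            conn.executemany(
                "INSERT INTO users (name, email) VALUES (?, ?)",
                rows[i:i + batch_size],
            )
```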
WAL Mode Silent Failures
Problem: WAL mode silently disabled on incompatible filesystems
- Docker for Mac: WAL mode fails silently, falls back to DELETE journal
- Network Filesystems: NFS doesn't support shared memory required for WAL
- Detection: Run `PRAGMA journal_mode;` to verify the actual mode
- Symptoms: Expected performance gains don't materialize
- Workaround: Use containers with native filesystem or PostgreSQL for network storage
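One way to fail fast instead of silently: `PRAGMA journal_mode = WAL` returns the mode actually in effect, so a startup check can catch the fallback. A sketch; the error message is illustrative:

```python
import sqlite3

conn = sqlite3.connect("app.db")
mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
if mode.lower() != "wal":
    # The filesystem refused WAL (NFS, osxfs, ...) and SQLite fell back silently
    raise RuntimeError(f"WAL unavailable: journal_mode is {mode!r}")
```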
Backup Script Failures with WAL Mode
Problem: WAL mode spreads the database across three files (.db, .db-wal, .db-shm), but backup scripts often copy only the .db file
- Data Loss Risk: Recently committed data sits in the WAL file and never makes it into the backup
- Solution: Checkpoint before backup: `PRAGMA wal_checkpoint(TRUNCATE);`
- Alternative: Copy all three files atomically
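A sketch of both approaches in Python; the stdlib `Connection.backup()` API copies a consistent snapshot regardless of WAL state, which sidesteps the three-file problem entirely (file names are illustrative):

```python
import sqlite3

src = sqlite3.connect("app.db")

# Option 1: fold WAL contents into the main file so a plain file copy is complete
src.execute("PRAGMA wal_checkpoint(TRUNCATE)")

# Option 2 (safer): the online backup API snapshots the whole database
dst = sqlite3.connect("backup.db")
src.backup(dst)
dst.close()
```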
Index Design Failures
Critical Limitation: SQLite uses only ONE index per table per query
```sql
-- INEFFECTIVE: Separate indexes don't combine
CREATE INDEX idx_users_name ON users(name);
CREATE INDEX idx_users_status ON users(status);

-- Query uses only ONE index, scans for the rest
SELECT * FROM users WHERE name = 'Alice' AND status = 'active';

-- CORRECT: Composite index
CREATE INDEX idx_users_name_status ON users(name, status);
```
- Column Order: Put the most selective equality-tested column first
- Leftmost Prefix Rule: A composite index is only usable through a leftmost prefix of its columns; a query filtering only on `status` can't use `(name, status)`
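`EXPLAIN QUERY PLAN` confirms whether the composite index is actually picked up: look for `SEARCH ... USING INDEX` rather than `SCAN`. A sketch; the detail text is the last column of each plan row:

```python
import sqlite3

conn = sqlite3.connect("app.db")
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM users WHERE name = ? AND status = ?",
    ("Alice", "active"),
).fetchall()
for row in plan:
    print(row[-1])  # e.g. "SEARCH users USING INDEX idx_users_name_status (name=? AND status=?)"
```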
Memory Configuration Disasters
OOMKill Risk: Memory mapping + large cache can exceed container limits
- Kubernetes: mmap_size = 2GB + cache + app memory > container limit = OOMKill
- Safe Formula: mmap_size + cache_size < 50% of available memory
- Platform Issues: macOS virtual memory behavior is unpredictable with large mappings; Windows is not recommended for production
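A sizing sketch for the 50% rule above; the split between cache and mmap is an assumption, tune it for your workload:

```python
import sqlite3

container_limit_mb = 512
budget_mb = container_limit_mb // 2       # keep SQLite under 50% of the limit
cache_mb = budget_mb // 4                 # assumption: 1/4 cache, 3/4 mmap
mmap_mb = budget_mb - cache_mb

conn = sqlite3.connect("app.db")
conn.execute(f"PRAGMA cache_size = -{cache_mb * 1024}")      # negative = KB
conn.execute(f"PRAGMA mmap_size = {mmap_mb * 1024 * 1024}")  # bytes
```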
Resource Requirements and Scaling Limits
Memory Requirements by Use Case
- Development: 32MB cache, 256MB mmap
- Production Web App: 64-128MB cache per connection
- Analytics Workload: Up to 25% of system RAM
- Container (512MB): 8-16MB cache, minimal mmap
- Container (2GB+): 64MB cache, 256MB mmap
When to Abandon SQLite
Hard Limits:
- Concurrent Writers: More than ~100 writes/second from multiple connections
- Database Size: Multi-terabyte databases (technically possible, operationally painful)
- Geographic Distribution: No built-in replication
- Complex Analytics: No custom types or parallel query; JSON (json1) and full-text search (FTS5) exist but trail server-side equivalents
Migration Threshold: Expensify processes millions of requests/day on SQLite - don't migrate prematurely
Connection Pool Anti-Pattern
Problem: More connections hurt SQLite performance
- Why: Each connection has separate cache (50 connections × 10MB = 500MB duplicated cache)
- Better: 5 connections × 100MB cache each
- Thread Safety: Use one connection per thread to avoid corruption (see the sketch below)
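A thread-local connection sketch following the "few connections, big caches" rule; the helper name and cache size are illustrative:

```python
import sqlite3
import threading

_local = threading.local()

def get_conn(path: str = "app.db") -> sqlite3.Connection:
    # One connection per thread: no cross-thread sharing, no duplicated pool
    if not hasattr(_local, "conn"):
        _local.conn = sqlite3.connect(path)
        _local.conn.execute("PRAGMA cache_size = -102400")  # ~100MB for this connection
    return _local.conn
```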
Debugging and Monitoring
Essential Diagnostic Commands
```sql
-- Performance analysis
EXPLAIN QUERY PLAN SELECT ...;     -- Find table scans and missing indexes
.timer on                          -- sqlite3 shell only: measure query execution time
.stats on                          -- sqlite3 shell only: monitor per-query statistics

-- Health checks
PRAGMA journal_mode;               -- Verify WAL mode enabled
PRAGMA cache_size;                 -- Check memory allocation
PRAGMA wal_checkpoint(TRUNCATE);   -- Force checkpoint and cleanup
```
Critical Warning Signs
- "SCAN TABLE" in query plan: Missing index, checking every row
- "USING TEMP B-TREE": Building temporary indexes in memory
- WAL file >1GB: Checkpoints failing, disk space risk
- "Database is locked": Transaction never committed or connection leak
- Cache hit ratio <90%: Insufficient cache for workload
Production Monitoring Checklist
- Slow Query Threshold: >100ms indicates problems
- File Size Monitoring: Database + WAL file growth
- Error Rate: "Database is locked" errors indicate serious issues
- Memory Usage: Cache + mmap vs available memory
- Checkpoint Frequency: WAL should checkpoint regularly
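A sketch of a health probe covering the checklist items SQLite itself can report; paths and thresholds are assumptions:

```python
import os
import sqlite3

def sqlite_health(db_path: str = "app.db") -> dict:
    conn = sqlite3.connect(db_path)
    wal_path = db_path + "-wal"
    wal_mb = os.path.getsize(wal_path) / 2**20 if os.path.exists(wal_path) else 0.0
    health = {
        "journal_mode": conn.execute("PRAGMA journal_mode").fetchone()[0],
        "cache_pages": conn.execute("PRAGMA cache_size").fetchone()[0],
        "db_mb": os.path.getsize(db_path) / 2**20,
        "wal_mb": wal_mb,
        "wal_too_big": wal_mb > 1024,  # >1GB suggests checkpoints are failing
    }
    conn.close()
    return health
```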
Emergency Performance Recovery
Immediate Actions for Production Issues
- Checkpoint WAL file: `PRAGMA wal_checkpoint(TRUNCATE);`
- Update statistics: `ANALYZE;`
- Increase cache: `PRAGMA cache_size = -128000;` (128MB)
- Enable memory temp storage: `PRAGMA temp_store = memory;`
Maintenance Window Actions
- Defragment database: `VACUUM;`
- Rebuild indexes: `REINDEX;`
- Check file fragmentation: `filefrag -v database.db`
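The same maintenance pass as a script for scheduled windows; note that `VACUUM` needs free disk roughly equal to the database size and takes an exclusive lock (the helper name is illustrative):

```python
import sqlite3

def maintenance(path: str = "app.db") -> None:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")  # fold WAL back into the db
    conn.execute("VACUUM")    # rewrite the file to defragment; exclusive lock
    conn.execute("REINDEX")   # rebuild all indexes
    conn.execute("ANALYZE")   # refresh query-planner statistics
    conn.close()
```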
Lock Debugging Strategy
```python
# Log transaction lifecycle to find hanging transactions
import logging
import threading
import time

def debug_transaction(conn):
    # assumes the connection is in autocommit mode (isolation_level=None),
    # so the explicit BEGIN/COMMIT below are the only transaction boundaries
    thread_id = threading.get_ident()
    transaction_start = time.time()
    try:
        conn.execute("BEGIN")
        logging.info(f"Transaction started on thread {thread_id}")
        # ... your operations ...
        conn.execute("COMMIT")
        duration = time.time() - transaction_start
        logging.info(f"Transaction completed in {duration:.2f}s")
    except Exception as e:
        logging.error(f"Transaction failed on thread {thread_id} after "
                      f"{time.time() - transaction_start:.2f}s: {e}")
        conn.execute("ROLLBACK")
        raise
```
Performance Testing Framework
Load Testing with Real Data Patterns
```sql
-- Generate realistic test data (assumes test_table(id, status, created_at));
-- abs() keeps random() from producing negative ids and offsets
INSERT INTO test_table
SELECT
    abs(random()) % 1000000 AS id,
    CASE WHEN abs(random()) % 10 = 0 THEN 'premium' ELSE 'standard' END AS status,
    datetime('now', '-' || (abs(random()) % 365) || ' days') AS created_at
FROM (
    WITH RECURSIVE series(x) AS (
        SELECT 0 UNION ALL SELECT x + 1 FROM series LIMIT 1000000
    )
    SELECT x FROM series
);
```
Automated Performance Regression Detection
```python
import time

def benchmark_critical_queries(conn):
    critical_queries = [
        ("User lookup", "SELECT * FROM users WHERE email = ?", ['test@example.com']),
        ("Status filter", "SELECT * FROM users WHERE status = ?", ['active']),
        ("Date range", "SELECT * FROM orders WHERE created_at > ?", ['2024-01-01']),
    ]
    for name, query, params in critical_queries:
        times = []
        for _ in range(100):
            start = time.perf_counter()
            conn.execute(query, params).fetchall()
            times.append(time.perf_counter() - start)
        avg_time = sum(times) / len(times)
        p95_time = sorted(times)[94]  # 95th percentile of 100 samples
        if avg_time > 0.1 or p95_time > 0.1:  # 100ms threshold
            print(f"PERFORMANCE REGRESSION: {name} avg {avg_time:.3f}s, p95 {p95_time:.3f}s")
```
Technical Specifications and Thresholds
Safe Operating Limits
- Transaction Batch Size: 5K-10K operations
- Cache Size: 25-50% of available RAM
- WAL File Size: <1GB (checkpoint when larger)
- Memory Mapping: <50% of container memory limit
- Connection Pool: 3-5 connections maximum
- Busy Timeout: `PRAGMA busy_timeout = 30000;` (30 seconds) so lock contention waits instead of failing immediately
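Both the Python driver and the pragma set the same underlying lock-wait limit; a sketch of either route:

```python
import sqlite3

conn = sqlite3.connect("app.db", timeout=30)  # driver-level: wait up to 30s for locks
conn.execute("PRAGMA busy_timeout = 30000")   # pragma-level: same limit, in milliseconds
```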
Platform-Specific Considerations
- Linux: Optimal performance, supports all features
- macOS: Unpredictable memory mapping behavior
- Windows: Avoid for production workloads
- Docker: WAL mode fails on osxfs, use Linux containers
- Network Storage: File locking is unreliable; use PostgreSQL instead
This reference provides the operational intelligence needed to successfully implement and maintain SQLite in production environments while avoiding common failure modes that cause performance degradation and data integrity issues.
Useful Links for Further Investigation
SQLite Performance Resources That Actually Help
Link | Description |
---|---|
SQLite PRAGMA Statements | The configuration reference you'll actually use for managing and optimizing SQLite database behavior. |
Write-Ahead Logging (WAL) | Official WAL mode docs when you need to understand what broke, detailing its benefits and operational aspects. |
EXPLAIN QUERY PLAN | How to debug slow queries by analyzing the execution plan of SQL statements in SQLite. |
phiresky's SQLite Performance Guide | A real-world optimization guide for SQLite performance tuning that provides practical advice and actually works. |
Simon Willison's SQLite TILs | Practical tips and tricks from someone who knows what they're talking about, covering various SQLite use cases. |
Expensify's 4M QPS on SQLite | A detailed case study on how Expensify managed to scale SQLite to millions of requests per day on a single server. |
SQLite vs Filesystem Performance | An analysis explaining why SQLite can often be faster and more efficient than direct file system access for data storage. |
Node.js better-sqlite3 | A fast SQLite wrapper for Node.js applications, built around a deliberately synchronous API. |
Python sqlite3 docs | The official Python SQLite documentation, covering the `sqlite3` module for interacting with SQLite databases. |