Lambda + DynamoDB works fine for most apps that don't need servers. No infrastructure to babysit, scales itself, performs well enough for 90% of what you're building. Just don't try to replace your data warehouse with this - you'll cry.
The architecture is dead simple: DynamoDB stores your data, DynamoDB Streams capture changes, Lambda processes those changes. Works great until you hit the edge cases that AWS docs pretend don't exist.
Stream Processing Reality Check
DynamoDB Streams capture every data change (INSERT, UPDATE, DELETE) and Lambda processes them in "real-time." Sounds amazing in theory. In practice, you'll want to throw your laptop out the window dealing with:
Hot Partitions Kill Everything - Each DynamoDB partition maps to a stream shard, and each shard is worked by a single concurrent Lambda invocation by default, so one busy partition becomes a bottleneck for your entire processing pipeline. Design your partition keys carefully or suffer later. We learned this the hard way when a single user's activity brought down our entire notification system for 3 hours.
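If one logical key takes most of the writes, spreading it across suffixed partition keys keeps a single partition (and its stream shard) from soaking up everything. A minimal sketch, assuming a hypothetical user_activity table with pk/sk attributes:

```python
import random
import boto3

# Hypothetical table; the point is the suffixed partition key, not the schema.
table = boto3.resource("dynamodb").Table("user_activity")

SHARD_COUNT = 10  # number of write shards per logical key

def put_activity(user_id: str, activity: dict) -> None:
    # Spread one hot user's writes across N partition keys so no single
    # DynamoDB partition (and no single stream shard) absorbs all of them.
    shard = random.randint(0, SHARD_COUNT - 1)
    table.put_item(Item={
        "pk": f"{user_id}#{shard}",   # write-sharded partition key
        "sk": activity["timestamp"],  # sort key unchanged
        **activity,
    })
```

The trade-off: reads for one user now have to fan out across all N suffixes, so only shard the keys that actually run hot.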
Stream Processing Randomly Shits the Bed - Streams work perfectly for weeks then suddenly your iterator age spikes to 2 hours and AWS support shrugs their shoulders. The docs won't tell you that DynamoDB's "adaptive capacity" takes forever to kick in during traffic spikes. Cost us a weekend trying to figure out why our analytics pipeline died during a product launch. Turns out the November 2024 DynamoDB service update changed how adaptive capacity works - no announcement, just silent breakage. Thanks AWS.
24-Hour Retention Saves Your Ass - At least when things break, you get a full day to fix them before losing data. This has literally saved production deployments when Lambda processing got backed up due to downstream service outages.
What You'll Actually Build With This
Here's what actually works in production:
Audit Logs - Stream records include the old and new item images (as long as the stream view type is NEW_AND_OLD_IMAGES), perfect for tracking who changed what. Works great until you hit DynamoDB's 400KB item limit and wonder why your audit logs are truncated. Took us 3 days to figure out why our compliance reports were missing data.
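A minimal handler sketch for this pattern, assuming NEW_AND_OLD_IMAGES is enabled and a hypothetical audit_log table to write into:

```python
import json
import boto3

audit_table = boto3.resource("dynamodb").Table("audit_log")  # hypothetical destination table

def handler(event, context):
    for record in event["Records"]:
        change = record["dynamodb"]
        # Old/new images only show up if the stream view type is
        # NEW_AND_OLD_IMAGES; they arrive in DynamoDB JSON format.
        audit_table.put_item(Item={
            "pk": json.dumps(change["Keys"]),
            "sk": change["SequenceNumber"],
            "event_name": record["eventName"],  # INSERT / MODIFY / REMOVE
            "old_image": json.dumps(change.get("OldImage", {})),
            "new_image": json.dumps(change.get("NewImage", {})),
        })
```

Note that the audit item itself is still subject to the 400KB item limit, which is exactly where the truncation surprise above comes from.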
Cache Invalidation - Update Redis or ElastiCache when DynamoDB data changes. Simple pattern that works reliably, except when Lambda cold starts cause 5-second cache inconsistencies. Your users will definitely notice stale data during those moments.
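The invalidation version is mostly the same loop; here is a sketch assuming a Redis-compatible ElastiCache endpoint and an illustrative cache key scheme:

```python
import os
import redis

# Assumes a Redis-compatible ElastiCache endpoint in an env var.
cache = redis.Redis(host=os.environ["REDIS_HOST"], port=6379)

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] in ("MODIFY", "REMOVE"):
            # Assumes a string partition key attribute named "pk".
            pk = record["dynamodb"]["Keys"]["pk"]["S"]
            cache.delete(f"item:{pk}")  # drop the stale entry; next read repopulates it
```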
Search Index Updates - Push changes to Elasticsearch automatically. This pattern works but expect occasional index inconsistencies when Lambda processing falls behind during traffic spikes. Search results lagging behind database changes is fun to explain to product managers.
Real-Time Counters - Increment view counts, likes, etc. Works well for read-heavy apps but will destroy your budget if you're tracking high-frequency events like page views.
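For counters, use an atomic ADD so concurrent Lambda invocations don't clobber each other. A sketch against a hypothetical page_stats table:

```python
import boto3

stats = boto3.resource("dynamodb").Table("page_stats")  # hypothetical counter table

def increment_view_count(page_id: str, delta: int = 1) -> None:
    # ADD is applied atomically server-side, so parallel invocations
    # incrementing the same counter don't lose updates.
    stats.update_item(
        Key={"pk": page_id},
        UpdateExpression="ADD view_count :d",
        ExpressionAttributeValues={":d": delta},
    )
```

At high event rates, summing increments in memory per invocation and writing one ADD per batch is what keeps this from destroying the budget mentioned above.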
Performance Reality Check
DynamoDB averages 2-5ms latency (the single-digit milliseconds AWS actually promises; the sub-millisecond numbers in the marketing belong to DAX), and Lambda scales well until it doesn't.
Cold Starts Are Way More Common Than AWS Says - AWS claims <1% cold starts but in reality it's more like 5-10% during traffic spikes. The August 2025 billing changes mean you now pay for those cold starts too - budget an extra 20-30% for Lambda costs. Our monthly bill jumped $400 overnight when this kicked in. Neat surprise.
Concurrency Limits Will Bite You - The default 1,000 concurrent Lambda limit sounds high until you're processing 50,000 stream records per second. Request increases early because AWS takes 2-3 business days to approve them (or 5 minutes if you're lucky, 2 weeks if not). Learned this during Black Friday when our event processing hit the wall and died.
Stream Processing Bottlenecks - Each shard is processed sequentially, so hot partitions become chokepoints. Parallelization factors up to 10 help, but they also multiply your cold start problems. More parallel processing = more cold starts = more pain and money.
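Raising the parallelization factor is one EventSourceMapping setting, sketched here with boto3 (the mapping UUID is a placeholder):

```python
import boto3

lambda_client = boto3.client("lambda")

# UUID of the existing DynamoDB stream -> Lambda event source mapping (placeholder).
lambda_client.update_event_source_mapping(
    UUID="esm-uuid-placeholder",
    ParallelizationFactor=10,  # up to 10 concurrent batches per shard, at the cost of more cold starts
)
```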
Cost Optimization (AKA How Not to Go Broke)
For our 100k user app, this setup costs about $200/month. Your mileage will vary wildly based on read/write patterns.
Batch Size Matters for Your Wallet - Processing 10,000 records per Lambda invocation (the DynamoDB Streams maximum) vs 100 records is the difference between $50/month and $500/month in Lambda costs. Use bigger batches unless latency is critical.
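Batch size and the batching window are also EventSourceMapping settings; a sketch of the "bigger batches, cheaper bill" configuration:

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.update_event_source_mapping(
    UUID="esm-uuid-placeholder",       # same placeholder mapping as above
    BatchSize=10000,                   # DynamoDB Streams max; fewer invocations per record processed
    MaximumBatchingWindowInSeconds=5,  # wait up to 5s to fill a batch, trading latency for cost
)
```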
Memory Allocation Sweet Spot - 512MB-1GB works for most stream processing. Less memory = slower CPU = longer execution = higher costs. Use Lambda Power Tuning to find your optimal balance or just trial-and-error like the rest of us.
DynamoDB On-Demand vs Provisioned - On-demand costs 5x more per operation but scales automatically. Provisioned capacity saves money if you can predict usage (spoiler: you can't).
Monitoring (Because Production Always Breaks)
Iterator age is your most important metric - when it spikes above 30 seconds, you're in trouble.
Iterator Age = Your Stress Level - This measures how far behind Lambda processing is. Values above 1 minute mean you're losing the "real-time" part of real-time processing. Set CloudWatch alarms or prepare for angry users.
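A sketch of that alarm with boto3 - the function name, SNS topic, and thresholds are placeholders to adapt:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when the stream consumer falls more than ~60s behind for 5 minutes straight.
cloudwatch.put_metric_alarm(
    AlarmName="stream-processor-iterator-age",
    Namespace="AWS/Lambda",
    MetricName="IteratorAge",  # reported in milliseconds
    Dimensions=[{"Name": "FunctionName", "Value": "stream-processor"}],  # placeholder function name
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=5,
    Threshold=60_000,  # 60 seconds, in ms
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:oncall"],  # placeholder SNS topic
    TreatMissingData="notBreaching",
)
```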
Poison Pills Will Ruin Your Day - One bad record can block an entire shard for 24 hours. Set MaximumRetryAttempts to something sane like 3, enable BisectBatchOnFunctionError, and route failures to dead letter queues before they kill your entire pipeline. Lost an entire weekend debugging this once.
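All three knobs live on the EventSourceMapping; a sketch assuming an SQS queue already exists to catch the failures:

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.update_event_source_mapping(
    UUID="esm-uuid-placeholder",
    MaximumRetryAttempts=3,           # stop retrying a bad batch after 3 attempts
    BisectBatchOnFunctionError=True,  # split failing batches to isolate the poison record
    DestinationConfig={               # ship whatever still fails to a dead-letter queue
        "OnFailure": {"Destination": "arn:aws:sqs:us-east-1:123456789012:stream-dlq"}  # placeholder queue ARN
    },
)
```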
X-Ray Actually Helps - Unlike most AWS services, X-Ray debugging actually works for stream processing. Enable it to see exactly where your 5-second latency is coming from (hint: it's usually your downstream API calls, not Lambda).
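Turning on active tracing is a one-line function config change (function name is a placeholder):

```python
import boto3

boto3.client("lambda").update_function_configuration(
    FunctionName="stream-processor",   # placeholder
    TracingConfig={"Mode": "Active"},  # emit X-Ray traces for every invocation
)
```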
This architecture works well for most apps, but the real fun starts when you try to implement it. The theory sounds great - event-driven, auto-scaling, no servers to babysit. But production is where all the edge cases live, and AWS docs skip the important details like how to configure EventSourceMapping settings that won't ruin your weekend, or why your perfectly tuned function suddenly starts timing out for no goddamn reason.
Next up: the configuration and code patterns that work when you're processing millions of events and getting paged at 2am because something's broken again.