My iterator age is spiking and everything is broken. What now?

Iterator age measures how far behind Lambda processing is. When it spikes above 30 seconds, you're fucked. Here's the debug process that works:

1. Check CloudWatch for Lambda errors - one bad record can kill everything for 24 hours
2. Set `MaximumRetryAttempts` to 3 and enable `BisectBatchOnFunctionError`
3. If no errors, your function is too slow - increase memory or reduce external API calls
4. Check `ConcurrentExecutions` - you might be hitting Lambda limits

Hot partitions are throttling my DynamoDB table even though I have capacity. WTF?

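Hot partitions kill performance because all traffic hits one partition key. Each partition tops out at roughly 3,000 read units and 1,000 write units per second no matter how much capacity the table has, which is why you get throttled with capacity to spare. The 1:1:1 mapping (one partition feeds one stream shard, which feeds one Lambda invocation at a time) means your Lambda processing becomes single-threaded on that partition.

Fix: Design better partition keys. Use `userId#timestamp` instead of just `userId`. If you're already in production, implement [write sharding](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-sharding.html) - append random suffixes on write and modify your readers to query every suffixed key.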
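A minimal sketch of the write-sharding idea, assuming a fixed shard count and a `userId#suffix` key layout. `SHARD_COUNT`, `shardedKey`, and `allShardKeys` are hypothetical names for illustration, not part of any AWS API.

```javascript
// Hypothetical helpers; SHARD_COUNT and the key layout are assumptions.
const SHARD_COUNT = 10;

// Write path: spread one logical key across shardCount physical partition keys.
function shardedKey(userId, shardCount = SHARD_COUNT, random = Math.random) {
  const suffix = Math.floor(random() * shardCount);
  return `${userId}#${suffix}`;
}

// Read path: every physical key that may hold items for this user.
function allShardKeys(userId, shardCount = SHARD_COUNT) {
  return Array.from({ length: shardCount }, (_, i) => `${userId}#${i}`);
}

console.log(shardedKey('user-42'));      // e.g. user-42#7 (random)
console.log(allShardKeys('user-42', 3)); // [ 'user-42#0', 'user-42#1', 'user-42#2' ]
```

The trade-off is on the read side: fetching everything for one user now takes one query per suffix, so keep the shard count as small as the write load allows.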

Why is my Lambda iterator age increasing and how do I fix it?

Iterator age measures how far behind Lambda processing is relative to new stream records. Increasing iterator age usually means: poison pill records blocking processing, not enough Lambda concurrency, or your function is slow as shit. First check CloudWatch for Lambda errors - one fucked up record can kill everything for 24 hours straight. Set `MaximumRetryAttempts` to something sane (3-10) and turn on `BisectBatchOnFunctionError` so Lambda isolates the poison pill. No errors? Your function's too slow - throw more memory at it or stop calling so many external APIs. Check `ConcurrentExecutions` to see if you're hitting Lambda's concurrency wall.
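To see why `BisectBatchOnFunctionError` helps, here is a small simulation of the splitting idea. It only mimics the effect and is not how Lambda implements it internally; `isolatePoisonPills` and `failIfNegative` are made-up names.

```javascript
// Simulation only: mimics the effect of BisectBatchOnFunctionError, not Lambda's internals.
// processBatch is a hypothetical function that throws if the batch contains a bad record.
function isolatePoisonPills(batch, processBatch) {
  try {
    processBatch(batch);
    return []; // whole batch succeeded
  } catch (err) {
    if (batch.length === 1) return batch; // cannot split further: this is the poison pill
    const mid = Math.ceil(batch.length / 2);
    return [
      ...isolatePoisonPills(batch.slice(0, mid), processBatch),
      ...isolatePoisonPills(batch.slice(mid), processBatch),
    ];
  }
}

const failIfNegative = (records) => {
  if (records.some((r) => r < 0)) throw new Error('bad record');
};
console.log(isolatePoisonPills([1, 2, -3, 4, 5, 6, 7, 8], failIfNegative)); // [ -3 ]
```

Each split halves the blast radius, so the healthy records around the bad one get processed instead of waiting out the retention window.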

The new Lambda billing changes fucked my costs. How bad is it?

Since August 1, 2025, AWS bills the initialization (INIT) phase for on-demand functions on managed runtimes, which used to be free. For stream processing, expect 20-30% higher Lambda costs because cold starts happen more often than AWS admits.

Quick cost check: run this CloudWatch Logs Insights query to see your actual init time: `filter @type = "REPORT" | stats sum(@initDuration) by bin(5m)`

Solutions: Use SnapStart for Java/Python if init time > 1 second, or Provisioned Concurrency if you have sustained traffic. Both cost more but might be cheaper than paying for constant cold starts.
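Once you have the summed `@initDuration`, the cost is simple arithmetic. A rough sketch, assuming the x86 on-demand price per GB-second in us-east-1 (check current pricing for your region); `initCostUsd` is a hypothetical helper.

```javascript
// Assumption: x86 on-demand price per GB-second in us-east-1; verify for your region.
const PRICE_PER_GB_SECOND = 0.0000166667;

// totalInitMs: sum of @initDuration over the period, in milliseconds.
// memoryMb: the function's configured memory.
function initCostUsd(totalInitMs, memoryMb, pricePerGbSecond = PRICE_PER_GB_SECOND) {
  const gbSeconds = (totalInitMs / 1000) * (memoryMb / 1024);
  return gbSeconds * pricePerGbSecond;
}

// 50,000 cold starts at 400 ms each on a 1024 MB function.
console.log(initCostUsd(50000 * 400, 1024).toFixed(2)); // 0.33
```

Compare that number against the price of Provisioned Concurrency before paying to make cold starts go away.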

My Lambda keeps getting ECONNREFUSED errors when processing streams. Help?

Getting `ECONNREFUSED 127.0.0.1:5432` or similar bullshit errors? A refused connection to 127.0.0.1 means your client is pointed at localhost, so start with the configuration: a missing host setting or environment variable often falls back to localhost. If the address is right, check:

1. VPC configuration if your Lambda is in a VPC (common gotcha)
2. The actual network connectivity with `telnet` from a similar environment
3. Lambda execution role has DynamoDB stream permissions
4. If calling other services, add those permissions too

Permission problems normally show up as access-denied errors rather than ECONNREFUSED, so rule out configuration and networking before you go digging through IAM.

Stream processing randomly stops working and AWS support is useless. Now what?

Welcome to serverless hell. DynamoDB's "adaptive capacity" can take hours to kick in during traffic spikes, and AWS support will just tell you to "monitor your metrics." Quick fixes that actually work:

- Restart the event source mapping (disable/enable)
- Increase Lambda memory to handle the backlog faster
- Reduce batch size temporarily to process smaller chunks
- Check if you hit account limits (concurrency, API throttling)

Can I use Lambda to process DynamoDB streams from multiple regions?

Each DynamoDB table has its own stream per region, and Lambda functions can only process streams in the same region. For Global Tables, each region generates its own stream containing all writes to that region (including replicated writes from other regions). To process global changes centrally, use Lambda in each region to forward events to a central queue (SQS) or event bus (EventBridge) in your primary region. Be aware of Global Table conflicts - the "last writer wins" conflict resolution can cause data inconsistencies that your Lambda logic must handle gracefully.
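The forwarding step might look roughly like this. The sketch only builds the entries you would pass as `Entries` to an EventBridge PutEvents call (which accepts at most 10 entries per request); the bus ARN, `Source` value, and `toPutEventsBatches` name are placeholders.

```javascript
// Hypothetical names: the bus ARN and Source value are placeholders.
const CENTRAL_BUS = 'arn:aws:events:us-east-1:123456789012:event-bus/central';

// Convert stream records into PutEvents entries, chunked to the 10-entry-per-call limit.
function toPutEventsBatches(event, eventBusName = CENTRAL_BUS) {
  const entries = event.Records.map((record) => ({
    EventBusName: eventBusName,
    Source: 'app.dynamodb-stream',
    DetailType: record.eventName, // INSERT | MODIFY | REMOVE
    Detail: JSON.stringify({
      region: record.awsRegion, // lets the central consumer see where the write landed
      keys: record.dynamodb.Keys,
      sequenceNumber: record.dynamodb.SequenceNumber,
    }),
  }));
  const batches = [];
  for (let i = 0; i < entries.length; i += 10) {
    batches.push(entries.slice(i, i + 10));
  }
  return batches;
}
```

Carrying the source region in the payload gives the central consumer something to work with when it has to reason about last-writer-wins conflicts.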

How do I debug Lambda functions that process DynamoDB streams?

Enable AWS X-Ray tracing for end-to-end visibility across DynamoDB, streams, and Lambda. X-Ray shows exactly where time is spent and which operations fail. Use structured logging with correlation IDs that match your DynamoDB items to trace individual records through your processing pipeline. For local development, use the [AWS SAM CLI](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-cli-command-reference.html) with sample DynamoDB stream events. The DynamoDB streams event format is complex - use `JSON.stringify(event, null, 2)` to understand the record structure during development.
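A minimal sketch of structured logging with correlation IDs, using fields that every stream record carries. The output field names and the `logLine` helper are just a suggestion.

```javascript
// Build one JSON log line per stream record. Output field names are a suggestion.
function logLine(record, message, extra = {}) {
  return JSON.stringify({
    message,
    eventId: record.eventID,     // unique per stream record
    eventName: record.eventName, // INSERT | MODIFY | REMOVE
    keys: record.dynamodb.Keys,  // correlates the log line with the DynamoDB item
    sequenceNumber: record.dynamodb.SequenceNumber,
    ...extra,
  });
}

const sample = {
  eventID: 'abc123',
  eventName: 'MODIFY',
  dynamodb: { Keys: { userId: { S: 'user-42' } }, SequenceNumber: '111' },
};
console.log(logLine(sample, 'processing started'));
```

With one JSON object per line, CloudWatch Logs Insights can filter on `eventId` or the item keys directly instead of grepping free text.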

What's the maximum throughput I can achieve with Lambda DynamoDB stream processing?

Theoretical maximum throughput depends on the number of DynamoDB shards and your Lambda function performance. Each shard processes sequentially with one Lambda instance, but you can set parallelization factors up to 10 to process multiple batches from the same shard concurrently. In practice, most applications achieve 1,000-10,000 records per second per shard. If you need higher throughput, consider using Kinesis Data Streams instead of DynamoDB streams - Kinesis provides more control over shard management and can handle higher throughput scenarios with more complex stream processing patterns.
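For a back-of-the-envelope ceiling, multiply it out. This sketch assumes every batch fills completely and that average batch duration stays constant, which is optimistic; `maxRecordsPerSecond` is a hypothetical helper.

```javascript
// Rough upper bound; all inputs are your own measurements or settings.
function maxRecordsPerSecond({ shards, parallelizationFactor, batchSize, avgBatchDurationMs }) {
  const concurrentBatches = shards * parallelizationFactor; // one invocation per batch
  const batchesPerSecond = concurrentBatches * (1000 / avgBatchDurationMs);
  return batchesPerSecond * batchSize;
}

console.log(maxRecordsPerSecond({
  shards: 4, parallelizationFactor: 2, batchSize: 1000, avgBatchDurationMs: 500,
})); // 16000
```

If the number you need is far above what this gives you, no amount of tuning the function will get you there; you need more shards or a different stream.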

How do I handle Lambda timeout errors when processing large DynamoDB stream batches?

Lambda gives you 15 minutes max, but if your stream processing takes that long, something's seriously wrong. Stream functions should finish in seconds, not minutes. Timeouts usually mean you're doing too much work per batch - reduce the batch size or throw more memory at it for better CPU. Implement partial batch processing: add `ReportBatchItemFailures` to the event source mapping's `FunctionResponseTypes`, then return `batchItemFailures` listing the sequence numbers of records that couldn't be processed. Lambda checkpoints at the lowest failed sequence number and retries from there rather than replaying the entire batch. For time-intensive operations, consider using SQS to decouple the stream processing from the heavy computation work.
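One way to avoid hard timeouts is to stop early and report where you stopped. A sketch, assuming `ReportBatchItemFailures` is enabled on the mapping; `makeHandler`, `SAFETY_MARGIN_MS`, and the `processRecord` callback are hypothetical.

```javascript
// Assumptions: processRecord is your own async function, and the event source mapping
// has FunctionResponseTypes set to ["ReportBatchItemFailures"].
const SAFETY_MARGIN_MS = 5000; // hypothetical buffer; tune to your slowest record

function makeHandler(processRecord) {
  return async (event, context) => {
    for (const record of event.Records) {
      const outOfTime = context.getRemainingTimeInMillis() < SAFETY_MARGIN_MS;
      try {
        if (outOfTime) throw new Error('deadline approaching');
        await processRecord(record);
      } catch (err) {
        // Report the first unprocessed record; Lambda retries from it onward.
        return { batchItemFailures: [{ itemIdentifier: record.dynamodb.SequenceNumber }] };
      }
    }
    return { batchItemFailures: [] };
  };
}

exports.handler = makeHandler(async (record) => { /* your logic here */ });
```

Stopping at the first failure keeps per-item ordering intact, because everything after the reported sequence number is redelivered anyway.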

Should I use DynamoDB streams or Kinesis Data Streams for Lambda processing?

DynamoDB streams are simpler and cheaper for straightforward change data capture scenarios. They're automatically managed and included with your DynamoDB table at no additional cost (though Lambda processing costs still apply). Use Kinesis Data Streams when you need: more than 24-hour retention, multiple consumer applications processing the same data stream, precise control over shard management, or integration with Kinesis Analytics (now Amazon Managed Service for Apache Flink). Kinesis costs more (roughly $0.015 per shard hour plus $0.014 per million PUT payload units in us-east-1; check current pricing) but provides more flexibility for complex stream processing architectures.

How do I prevent data loss when Lambda functions fail to process DynamoDB streams?

Stream records are automatically retained for 24 hours, providing built-in resilience against temporary failures. Configure an on-failure destination (an SQS queue or SNS topic acting as a dead letter queue) or prepare to lose data when things go sideways. The destination receives metadata about the failed batch (shard ID and sequence numbers), not the records themselves - this prevents blocking, but you'll need to fetch the records from the stream and recover them manually before the 24 hours run out. Make your processing logic idempotent or prepare for duplicate chaos when retries kick in. Use DynamoDB conditional writes or timestamps so your function can safely retry operations without screwing things up twice. For critical data, consider dual-writing to both DynamoDB and a backup store, or implementing cross-region replication to ensure data durability.
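A conditional write is the simplest idempotency guard. This sketch builds parameters shaped for the low-level DynamoDB PutItem API; the `ProcessedEvents` table (keyed on `eventId`) and both helper names are assumptions for illustration.

```javascript
// Hypothetical table and attribute names. Assumes eventId is the table's partition key.
function idempotentPutParams(record, tableName = 'ProcessedEvents') {
  return {
    TableName: tableName,
    Item: {
      eventId: { S: record.eventID },
      processedAt: { N: String(Date.now()) },
    },
    // Fails with ConditionalCheckFailedException if this event was already recorded.
    ConditionExpression: 'attribute_not_exists(eventId)',
  };
}

// A duplicate delivery is not an error: treat it as already done.
// AWS SDK for JavaScript v3 exposes the exception type as err.name.
function isDuplicate(err) {
  return Boolean(err) && err.name === 'ConditionalCheckFailedException';
}
```

In the handler you would attempt the write first, skip the record when `isDuplicate(err)` is true, and only rethrow anything else so it gets retried.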

What Lambda runtime should I choose for DynamoDB stream processing?

Python and Node.js provide the fastest cold start times (typically 100-500ms) and are ideal for simple stream processing logic. Java and .NET have slower cold starts (1-3 seconds) but better sustained performance for CPU-intensive processing. As of 2025, Lambda SnapStart dramatically improves cold start performance for Java, .NET, and Python, making these runtimes more viable for latency-sensitive stream processing. Choose based on your team's expertise and performance requirements - the language runtime typically has less impact on total processing time than your business logic and external service calls.

How do I implement blue-green deployments for Lambda functions processing DynamoDB streams?

Stream processing functions are stateful due to their connection to specific stream shards, making blue-green deployments complex. The safest approach is to temporarily disable the event source mapping, deploy the new function version, then re-enable the mapping. For zero-downtime deployments, use Lambda aliases with gradual traffic shifting. Configure the event source mapping to use an alias, then shift traffic from the old version to the new version over several minutes. This allows monitoring for errors while maintaining continuous stream processing. AWS SAM and Serverless Framework provide built-in support for these deployment patterns.


AWS Lambda DynamoDB: Production-Ready Integration Guide

Architecture Overview

Core Components:

DynamoDB stores data; enable streams on the table to capture changes
DynamoDB Streams capture all data changes (INSERT, MODIFY, and REMOVE events)
Lambda processes stream events in near real-time
1:1:1 mapping between DynamoDB partitions, stream shards, and concurrent Lambda invocations

Performance Specifications:

DynamoDB latency: 2-5ms average (single-digit milliseconds as advertised; microsecond reads need DAX)
Stream retention: 24 hours maximum
Lambda cold starts: 5-10% in production (not <1% as claimed)
Default Lambda concurrency limit: 1,000 concurrent executions

Critical Failure Modes

Hot Partition Bottlenecks

Problem: Single busy partition becomes chokepoint for entire pipeline
Impact: Complete system failure for 3+ hours during traffic spikes
Root Cause: 1:1:1 mapping means hot partitions process sequentially
Solution: Design partition keys with high cardinality (e.g., userId#timestamp vs userId)

Stream Processing Failures

Problem: Iterator age spikes to 2+ hours without warning
Impact: Loss of "real-time" processing capability
Root Cause: DynamoDB adaptive capacity delays during traffic spikes
Solution: Monitor iterator age; alert when >30 seconds

Poison Pill Records

Problem: Single malformed record blocks entire shard for 24 hours
Impact: Complete processing halt until manual intervention
Root Cause: Default infinite retry behavior
Solution: Set MaximumRetryAttempts: 3 + enable BisectBatchOnFunctionError

Production Configuration

EventSourceMapping Settings

{
  "BatchSize": 1000,                     // Try 1000-5000; the default of 100 is too small
  "MaximumBatchingWindowInSeconds": 5,   // Reduces invocations by 90%
  "ParallelizationFactor": 2,            // 2-4 increases throughput but multiplies cold starts
  "MaximumRetryAttempts": 3,             // Prevents infinite retry hell
  "BisectBatchOnFunctionError": true,    // Isolates poison pills
  "MaximumRecordAgeInSeconds": 300,      // Discard records older than 5 minutes
  "FunctionResponseTypes": ["ReportBatchItemFailures"] // Needed for batchItemFailures to take effect
}

Error Handling Pattern

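// processRecord and sendToDLQ are your own helpers.
// Requires FunctionResponseTypes: ["ReportBatchItemFailures"] on the event source mapping.
exports.handler = async (event) => {
    const failures = [];

    for (const record of event.Records) {
        try {
            await processRecord(record);
        } catch (error) {
            console.error(`Record ${record.dynamodb.SequenceNumber} failed:`, error);

            // Retry transient failures (timeouts, 5xx); send everything else to the DLQ
            if (error.name === 'TimeoutError' || error.statusCode >= 500) {
                // Lambda expects the key itemIdentifier, set to the record's sequence number
                failures.push({ itemIdentifier: record.dynamodb.SequenceNumber });
            } else {
                await sendToDLQ(record, error);
            }
        }
    }

    return { batchItemFailures: failures };
};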

Memory Optimization

512MB-1GB: Optimal for most stream processing
Memory = CPU = Performance = Cost: Higher memory often reduces total cost
ARM64: Better price/performance, but watch for npm packages with native binaries that lack ARM builds

Cost Analysis

Pricing Breakdown (100k user app example)

Monthly cost: ~$200
Lambda billing change (August 2025): +20-30% due to initialization phase billing
Batch size impact: 10k records vs 100 records = $50/month vs $500/month
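The invocation side of the batch-size effect is easy to estimate. This sketch counts invocations and request charges only, assumes every batch fills completely, and uses the $0.20 per million requests list price as an assumption; duration charges, which usually dominate, are not modeled. `monthlyRequestCost` is a hypothetical helper.

```javascript
// Assumption: $0.20 per million requests (list price); duration charges are excluded.
const PRICE_PER_MILLION_REQUESTS = 0.20;

function monthlyRequestCost(recordsPerMonth, batchSize) {
  const invocations = Math.ceil(recordsPerMonth / batchSize);
  return { invocations, usd: (invocations / 1e6) * PRICE_PER_MILLION_REQUESTS };
}

console.log(monthlyRequestCost(1e9, 100).invocations);   // 10000000
console.log(monthlyRequestCost(1e9, 10000).invocations); // 100000
```

A 100x larger batch means 100x fewer invocations, and every invocation you skip also skips its fixed per-invocation overhead.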

Cost Optimization Strategies

Batch Size: Use 1,000-5,000 records minimum
Batching Window: 5 seconds reduces invocations dramatically
Memory Allocation: Use Lambda Power Tuning tool for optimization
DynamoDB: On-demand costs 5x more but scales automatically

Monitoring Requirements

Critical Metrics

Iterator Age >30 seconds: Immediate alert required
Error Rate >1%: System degradation
Duration Increase: Check for cold starts or downstream latency
ConcurrentExecutions: Monitor for limit breaches
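The iterator age alert could be defined along these lines. The sketch returns parameters shaped for CloudWatch PutMetricAlarm; the alarm name and the `iteratorAgeAlarmParams` helper are placeholders, and you would still need to add alarm actions such as an SNS topic.

```javascript
// Parameters shaped for CloudWatch PutMetricAlarm. The alarm name is a placeholder.
function iteratorAgeAlarmParams(functionName, thresholdSeconds = 30) {
  return {
    AlarmName: `${functionName}-iterator-age`,
    Namespace: 'AWS/Lambda',
    MetricName: 'IteratorAge',
    Dimensions: [{ Name: 'FunctionName', Value: functionName }],
    Statistic: 'Maximum',
    Period: 60,
    EvaluationPeriods: 1,
    Threshold: thresholdSeconds * 1000, // IteratorAge is reported in milliseconds
    ComparisonOperator: 'GreaterThanThreshold',
  };
}

console.log(iteratorAgeAlarmParams('stream-processor').Threshold); // 30000
```

Note the unit conversion: the metric is in milliseconds, so a threshold of 30 would fire constantly.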

Debugging Tools

X-Ray: Actually works for stream processing (unlike most AWS services)
CloudWatch Query: filter @type = "REPORT" | stats sum(@initDuration) by bin(5m)
Structured Logging: Use correlation IDs matching DynamoDB items

Data Consistency Guarantees

What Works

Item-level ordering: Changes to same item always arrive in order
24-hour retention: Built-in resilience against temporary failures
Audit logs: Stream records include old and new values
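For audit logging, the old and new images can be diffed per record. A minimal sketch, assuming the stream view type is NEW_AND_OLD_IMAGES; `changedAttributes` is a hypothetical helper, and the string comparison is order-sensitive for nested map attributes.

```javascript
// Requires stream view type NEW_AND_OLD_IMAGES; images use DynamoDB AttributeValue format.
function changedAttributes(record) {
  const oldImage = record.dynamodb.OldImage || {}; // absent on INSERT
  const newImage = record.dynamodb.NewImage || {}; // absent on REMOVE
  const names = new Set([...Object.keys(oldImage), ...Object.keys(newImage)]);
  return [...names]
    .filter((name) => JSON.stringify(oldImage[name]) !== JSON.stringify(newImage[name]))
    .sort();
}

const modifyRecord = {
  eventName: 'MODIFY',
  dynamodb: {
    OldImage: { userId: { S: 'user-42' }, email: { S: 'old@example.com' } },
    NewImage: { userId: { S: 'user-42' }, email: { S: 'new@example.com' } },
  },
};
console.log(changedAttributes(modifyRecord)); // [ 'email' ]
```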

What Doesn't Work

Cross-item coordination: Streams don't guarantee ordering across items
Global Tables: Last writer wins conflict resolution
Real-time guarantees: Iterator age spikes break real-time processing

Common Integration Patterns

Reliable Patterns

Cache Invalidation: Update Redis/ElastiCache on data changes
Search Indexing: Push to Elasticsearch automatically
Audit Logging: Track all data changes with old/new values
Event Publishing: Trigger EventBridge for microservices

Anti-Patterns to Avoid

Complex Queries: NoSQL query limitations
High-frequency Events: Page view tracking will destroy budget
Data Warehouse Replacement: Not suitable for analytical workloads
ACID Transactions: Limited to single-item operations

Comparison Matrix

| Aspect | Lambda + DynamoDB | Lambda + RDS Proxy | Step Functions + DynamoDB | EventBridge + Lambda |
| --- | --- | --- | --- | --- |
| Operational Overhead | ✅ No servers | ❌ RDS maintenance | ❌ Complex debugging | ✅ No servers |
| Performance | ⚠️ 2-5ms | ❌ 20-100ms | ❌ 500ms+ | ⚠️ Variable delays |
| Scaling | ⚠️ Hot partition limits | ❌ Connection pools | ✅ Unlimited | ⚠️ Surprise limits |
| Cost | Variable | High from start | Expensive transitions | Event costs accumulate |
| Data Consistency | Item-level only | Full ACID | Eventually consistent | Hope and pray |
| Real-time | When working | Polling only | Not real-time | "Near" real-time |
| Complex Queries | NoSQL limitations | SQL works | Lambda spaghetti | No queries |
| Error Handling | When configured | Connection failures | Works well | Event replay complex |
| Cold Start Impact | 5-10% reality | Connection delays | Every invocation | Every event |

Troubleshooting Guide

Iterator Age Spiking

Check CloudWatch for Lambda errors
Set MaximumRetryAttempts to 3
Enable BisectBatchOnFunctionError
Increase Lambda memory if no errors
Check ConcurrentExecutions limits

Hot Partition Throttling

Immediate: Implement write sharding with random suffixes
Long-term: Redesign partition keys for better distribution
Pattern: Use userId#timestamp instead of userId

Lambda ECONNREFUSED Errors

Check the host and port the client connects to (a missing setting often falls back to localhost)
Verify VPC configuration if applicable
Test network connectivity
Check IAM permissions and add missing service permissions

Stream Processing Stops

Restart EventSourceMapping (disable/enable)
Increase Lambda memory temporarily
Reduce batch size for faster processing
Check account limits

Cold Start Cost Impact

Query: filter @type = "REPORT" | stats sum(@initDuration) by bin(5m)
Solutions: SnapStart for Java/Python, Provisioned Concurrency for sustained traffic
Budget: Expect 20-30% cost increase from initialization billing

Resource Requirements

Development Time

Simple implementation: 1-2 days
Production-ready with error handling: 1-2 weeks
Complex multi-region setup: 1-2 months

Expertise Required

Basic: Understanding of NoSQL access patterns
Intermediate: Lambda configuration and monitoring
Advanced: DynamoDB partition design and stream optimization

Infrastructure Dependencies

Required: DynamoDB table with streams enabled
Recommended: CloudWatch alarms, X-Ray tracing, Dead Letter Queues
Optional: Lambda Provisioned Concurrency for latency-sensitive apps

Breaking Points and Limits

Hard Limits

Stream retention: 24 hours maximum
Lambda timeout: 15 minutes maximum
Item size: 400KB maximum (affects audit logs)
Batch size: 10,000 records maximum

Practical Limits

Concurrency: 1,000 default (request increases take days)
Throughput: 1,000-10,000 records/second per shard
Hot partitions: Single partition becomes bottleneck
Cold starts: 5-10% of invocations in production

Security Considerations

IAM Requirements

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeStream",
        "dynamodb:GetRecords",
        "dynamodb:GetShardIterator",
        "dynamodb:ListStreams"
      ],
      "Resource": "arn:aws:dynamodb:region:account:table/table-name/stream/*"
    }
  ]
}

Best Practices

Never log sensitive data in Lambda functions
Use parameter store for configuration secrets
Implement least privilege access
Enable VPC endpoints for private connectivity

Migration Strategies

From Polling to Streams

Implement stream processing in parallel
Compare outputs for consistency
Gradually reduce polling frequency
Switch traffic and remove polling

Blue-Green Deployments

Disable EventSourceMapping
Deploy new function version
Re-enable mapping with new version
Monitor for errors during transition

Multi-Region Setup

Each region has separate streams
Use Lambda in each region for local processing
Forward to central queue/EventBridge for global coordination
Handle Global Tables conflict resolution

