AWS API Gateway - Production Security Hardening

WAF Integration (Your First Line of Defense)

AWS WAF blocks attacks before they hit your Lambda functions and cost you money. REST APIs only - HTTP APIs are on their own because AWS loves feature parity. We've seen production systems get hammered by 50K+ requests/minute of SQL injection attempts (mostly ' OR '1'='1 and UNION SELECT garbage) until WAF was properly configured. That attack cost $4K in Lambda invocations before we realized what was happening.

Critical WAF rules that saved our ass:

SQL Injection Protection - Catches UNION SELECT and similar garbage in query parameters
XSS Filter - Blocks <script> tags and javascript: attempts
Rate Limiting - 1000 requests per 5-minute window per IP (adjust for your traffic)
Known Bad IPs - AWS managed rule set blocks Tor exit nodes and known botnets
Size Restrictions - Reject requests over 1MB body size to prevent DoS attacks
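
As a rough sketch, the rate limiting rule from the list above could be written as a WAFv2-style rule definition. The rule name, priority, and metric name here are placeholders we made up for illustration; the field names follow the WAFv2 rule JSON shape as we understand it, so check them against the current API reference before using this anywhere real.

```python
import json


def rate_limit_rule(limit_per_window: int, priority: int = 1) -> dict:
    """Build a rate-based rule dict that blocks IPs exceeding the limit.

    The name and metric name are hypothetical placeholders.
    """
    if limit_per_window <= 0:
        raise ValueError("limit_per_window must be positive")
    return {
        "Name": "rate-limit-per-ip",  # placeholder name
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit_per_window,   # requests per evaluation window
                "AggregateKeyType": "IP",    # count requests per source IP
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "RateLimitPerIp",  # placeholder metric name
        },
    }


if __name__ == "__main__":
    # 1000 requests per window per IP, as in the list above
    print(json.dumps(rate_limit_rule(1000), indent=2))
```

Start with the rule in count mode if you're unsure about your real traffic, then switch to block once the numbers look sane.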

WAF costs $5.00 per web ACL per month + $1.00 per rule per month + $0.60 per million requests. Sounds expensive until a single DDoS attack costs you $10K in API Gateway charges in one afternoon. Set up CloudWatch alarms for blocked request spikes - we learned this the hard way when 90% of our traffic was attack attempts and we didn't notice until the AWS bill arrived. "AllowedRequests" dropping below 50% of total requests is a good alarm threshold.
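
The 50% threshold is simple arithmetic over the two WAF counters. A minimal sketch, assuming you've already pulled the allowed and blocked counts for a time window from CloudWatch yourself; the function names are ours, not an AWS API.

```python
def allowed_ratio(allowed: int, blocked: int) -> float:
    """Fraction of requests in the window that WAF allowed through."""
    total = allowed + blocked
    if total == 0:
        return 1.0  # no traffic at all is not an attack signal
    return allowed / total


def should_alarm(allowed: int, blocked: int, threshold: float = 0.5) -> bool:
    """True when allowed traffic drops below the threshold share of total."""
    return allowed_ratio(allowed, blocked) < threshold


if __name__ == "__main__":
    # 90% of traffic blocked, like the incident described above
    print(should_alarm(allowed=1_000, blocked=9_000))
```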

Resource Policies (VPC and IP Whitelisting)

Resource policies are your kill switch. Lock down admin APIs to specific VPCs or IP ranges. That "internal only" API shouldn't be accessible from random coffee shop WiFi.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "arn:aws:execute-api:*:*:*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": ["203.0.113.0/24", "198.51.100.0/24"]
        }
      }
    }
  ]
}

Warning: This policy blocks everything except those IP ranges - test it thoroughly on a dev stage before deploying or you'll lock yourself out. I've seen engineers deploy this on prod, then panic when they can't access the API from home.
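
Before deploying a policy like this, it helps to check which client IPs the allowlist actually covers. Here's a minimal sketch using Python's standard ipaddress module. The helper name is_ip_allowed is ours, and it only mimics the IpAddress condition locally; it does not evaluate the full policy the way AWS does.

```python
import ipaddress

# Same CIDR ranges as the aws:SourceIp condition in the policy above
ALLOWED_CIDRS = ["203.0.113.0/24", "198.51.100.0/24"]


def is_ip_allowed(ip: str, cidrs=ALLOWED_CIDRS) -> bool:
    """Return True if the IP falls inside any of the allowlisted ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


if __name__ == "__main__":
    for candidate in ["203.0.113.42", "192.0.2.10"]:
        print(candidate, is_ip_allowed(candidate))
```

Run it against your home and office IPs before you touch prod, so you find out about the lockout from a script instead of from a 403.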

VPC endpoint restrictions work great for internal microservices. Private APIs never touch the internet - traffic stays within your VPC. Perfect for backend services that shouldn't be publicly accessible.

TLS Configuration That Doesn't Suck

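HTTP APIs only support TLS 1.2, and REST API custom domains let you enforce a TLS 1.2 minimum through the domain's security policy - make sure it isn't left on the legacy TLS 1.0 option. Custom domains need a certificate in ACM. Request it from ACM so renewal is automatic instead of importing your own certs - that's amateur hour.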

Certificate pinning is possible but painful with CloudFront distribution changes. We've had clients implement it for high-security APIs, but expect mobile apps to break when certificates rotate. HSTS headers are easier and catch most downgrade attacks.

API Gateway TLS Termination

Edge-optimized endpoints terminate TLS at CloudFront edge locations, then re-encrypt to API Gateway. Regional endpoints terminate TLS once at the API Gateway service. Both are secure, but edge-optimized adds complexity and potential attack surface. Pick regional unless you actually need global performance.

Security Questions That Keep You Up at Night

How do I know if my API is under attack?

Spikes in the CloudWatch 4XXError and Count metrics are your first warning. Set alarms for >1000 4XX errors in 5 minutes - that's usually not legitimate traffic unless your mobile app is completely broken. WAF blocked requests show up in the separate `BlockedRequests` and `AllowedRequests` metrics.

We've seen legit traffic patterns where 20% are 4XX (mobile apps retry aggressively and send malformed requests), but 50%+ is definitely an attack. Look for spikes in specific error codes: lots of 403s usually mean auth bypass attempts, 400s often indicate SQL injection or parameter tampering.
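
Those thresholds can be turned into a rough triage function. This is a sketch under the numbers quoted above (more than 1000 4XX errors per 5-minute window, 50%+ error share); the function name and the labels it returns are ours, and you should tune the cutoffs to your own traffic.

```python
def assess_window(count: int, errors_4xx: int,
                  abs_threshold: int = 1000,
                  attack_ratio: float = 0.5) -> str:
    """Classify one 5-minute window from the Count and 4XXError totals."""
    ratio = errors_4xx / count if count else 0.0
    if ratio >= attack_ratio:
        return "likely-attack"   # half or more of all requests are 4XX
    if errors_4xx > abs_threshold:
        return "investigate"     # high absolute volume, plausible ratio
    return "normal"


if __name__ == "__main__":
    print(assess_window(count=10_000, errors_4xx=6_000))
    print(assess_window(count=10_000, errors_4xx=2_000))
```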

Should I enable request logging for security?

Yes, but it'll cost you. Access logging costs $0.50/GB and generates massive CloudWatch Logs bills - we hit $800/month just on access logs for a medium-traffic API. We enable full logging for sensitive APIs (auth, payments, admin) and sample roughly 10% of requests for the others, keyed on the request ID. Full request/response logging is great for forensics but will bankrupt you at scale - every 1GB of logs per day is another $15/month at $0.50/GB. Enable it before you need it - trying to investigate a security incident with no logs is like debugging with print() statements after deleting all your code.
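
One way to sample deterministically is to hash the request ID and keep a fixed share of the hash space, for example inside a Lambda function that decides whether to emit a detailed log line. This is a sketch of that idea plus the ingestion-cost arithmetic; the function names are ours, and the $0.50/GB default is the figure quoted above, not a value fetched from AWS.

```python
import hashlib


def should_log_full(request_id: str, sample_percent: int = 10) -> bool:
    """Deterministically pick about sample_percent% of request IDs."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return bucket < sample_percent


def monthly_ingest_cost(gb_per_day: float, price_per_gb: float = 0.50,
                        days: int = 30) -> float:
    """Estimated monthly log ingestion cost in dollars."""
    return gb_per_day * days * price_per_gb


if __name__ == "__main__":
    print(should_log_full("c6af9ac6-7b61-11e6-9a41-93e8deadbeef"))
    print(monthly_ingest_cost(1.0))
```

Hashing beats looking at the first character of the ID because you can pick any percentage, and the same request always gets the same decision across services.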

What's the deal with API keys vs IAM vs Cognito?

API keys are not security - they're for billing and usage tracking. Anyone can see them in client code. IAM authorization is for AWS services and internal APIs. Cognito is for user authentication - it handles OAuth, SAML, social logins. Lambda authorizers are for custom auth logic. Pick based on your use case, not what sounds fancy.

How do I prevent DDoS attacks bankrupting me?

Usage plans with throttling limits save your AWS bill. Set burst limits to reasonable numbers - 1000 requests/second might be too high if your Lambda can't handle it. AWS Shield is automatic for API Gateway, but Shield Advanced costs $3K/month. WAF rate limiting is cheaper and works for most attacks. Set billing alarms - we learned this when a DDoS attack cost $2K before we noticed.

Can I use API Gateway with my corporate firewall/proxy?

VPC endpoints work through AWS PrivateLink - no internet routing required. Corporate proxies often break WebSocket APIs because they don't understand the upgrade headers. Regional endpoints play better with corporate networks than edge-optimized ones. Test your proxy configuration thoroughly - we've seen environments where only GET requests work through the corporate proxy.

What about compliance (SOC 2, HIPAA, PCI DSS)?

API Gateway is covered under AWS compliance programs. For HIPAA, you need a BAA with AWS and must enable encryption in transit and at rest. CloudTrail logging is required for most compliance frameworks - it shows who accessed what APIs when. PCI DSS requires WAF protection and network segmentation. Store sensitive data in your Lambda functions, not API Gateway - it's easier to audit and secure.

How do I rotate API credentials without downtime?

Lambda authorizers can implement graceful credential rotation - accept both old and new tokens for a transition period. API keys can be rotated through usage plans, but you need to coordinate client updates. AWS Secrets Manager integration helps with automatic rotation. For Cognito, use refresh tokens and short-lived access tokens. Plan for rotation failures - always have a rollback strategy.
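
The "accept both old and new tokens" idea might look roughly like this inside a Lambda authorizer. This is a sketch that assumes simple shared-secret tokens; the function name, parameters, and how you load the current and previous secrets are all hypothetical.

```python
import hmac
import time
from typing import Optional


def is_token_accepted(presented: str, current: str,
                      previous: Optional[str] = None,
                      previous_valid_until: float = 0.0,
                      now: Optional[float] = None) -> bool:
    """Accept the current token, or the previous one until its cutoff time."""
    now = time.time() if now is None else now
    presented_b = presented.encode("utf-8")
    # Constant-time comparison avoids leaking how much of the token matched
    if hmac.compare_digest(presented_b, current.encode("utf-8")):
        return True
    if previous is not None and now < previous_valid_until:
        return hmac.compare_digest(presented_b, previous.encode("utf-8"))
    return False


if __name__ == "__main__":
    # Old token still works during the transition window, then stops
    print(is_token_accepted("old", "new", "old", previous_valid_until=200, now=100))
    print(is_token_accepted("old", "new", "old", previous_valid_until=200, now=300))
```

Rolling back is then a matter of swapping current and previous and extending the cutoff, which is why it's worth keeping the old secret around until the window closes.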

What's the biggest security mistake people make?

Trusting client-side validation.

Your mobile app validates the request format? Great, attackers use curl. Rate limiting on the client side? Cute, but useless. Always validate and authorize on the server side. That fancy JWT token validation in your React app means nothing when someone hits your API directly with curl -X POST https://api.yourdomain.com/admin/delete-everything.

We've seen APIs get pwned because they only validated requests in the frontend - the backend would happily accept {"userId": "admin", "role": "superuser"} from anyone with curl and a basic understanding of JSON.
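
On the server side, the fix is to take identity from what the authorizer verified and ignore whatever the request body claims. A minimal Lambda-style sketch: the handler name and the "role" key in the authorizer context are hypothetical, and your authorizer may expose its claims under a different structure.

```python
import json


def resolve_role(event: dict) -> str:
    """Read the role from the authorizer context, never from the body."""
    request_context = event.get("requestContext") or {}
    authorizer = request_context.get("authorizer") or {}
    return authorizer.get("role", "anonymous")  # "role" key is an assumption


def handle_admin_delete(event: dict) -> dict:
    """Reject the call unless the verified role is admin."""
    if resolve_role(event) != "admin":
        return {"statusCode": 403, "body": json.dumps({"error": "forbidden"})}
    return {"statusCode": 200, "body": json.dumps({"status": "deleted"})}


if __name__ == "__main__":
    # A curl-style request that only claims to be a superuser in its body
    spoofed = {"body": json.dumps({"userId": "admin", "role": "superuser"})}
    print(handle_admin_delete(spoofed)["statusCode"])
```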

Performance Under Attack (When Security Meets Speed)

Cold Start Mitigation During Traffic Spikes

Security incidents create the worst performance scenarios. DDoS attacks cause Lambda cold starts across your entire function fleet. Provisioned concurrency at roughly $0.015 per GB-hour (about $0.0000042 per GB-second) keeps functions warm, but costs add up fast during sustained attacks.

Cold start reality check: Java functions can take 15+ seconds on first request (yes, really - we've measured 18 seconds for Spring Boot). Python and Node.js typically under 1 second but can spike to 3+ seconds with heavy dependencies. Go and Rust are fastest at 100-300ms unless you're importing half of GitHub. During an attack, everything cold starts simultaneously - that's when you discover your 5-second Lambda timeout is too aggressive and everything starts failing.

We've seen production systems handle 50K legitimate requests/minute fine, then collapse under 10K attack requests because every Lambda function went cold. Connection pooling to RDS becomes critical - opening database connections during cold start adds 2-3 seconds of latency.

Caching Strategy for Security-Sensitive APIs

API Gateway caching is tricky with authentication. Cache authenticated responses and you leak user data. Cache unauthenticated responses and attackers can poison your cache. The sweet spot is caching reference data that's the same for all users.

Cache invalidation becomes a security nightmare when you need to revoke access immediately. That cached user profile with "admin": true sticks around for up to 1 hour (the maximum TTL; the default is 300 seconds) even after you revoke permissions. Manual cache flushing through the console works, but in our experience it took 5-15 minutes to take full effect. For sensitive operations like admin APIs or payment processing, skip the cache entirely or use very short TTLs (60 seconds max). We learned this when a terminated employee's admin access was cached for 45 minutes after we disabled their account.

Throttling Configuration That Actually Works

Default throttling is 10,000 requests per second across your entire AWS account in a region. One bad API getting attacked can break everything else. Set method-level throttling on sensitive endpoints - authentication APIs shouldn't be accepting 1000 requests/second. Keep in mind that method-level limits apply to all callers combined, not per IP.

{
  "burstLimit": 100,
  "rateLimit": 50
}

Per-client throttling requires usage plans with API keys. Works great for legitimate clients, useless against attackers who don't use keys. WAF rate limiting is better for attack mitigation, usage plans for business logic.
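
To get a feel for what burstLimit and rateLimit mean together, here's a small token bucket simulation. It's a simplified model for intuition, not API Gateway's actual implementation: the bucket holds up to burst_limit tokens and refills at rate_limit tokens per second. The class name and the explicit now parameter are ours.

```python
class TokenBucket:
    """Simplified token bucket: capacity = burst limit, refill = rate limit."""

    def __init__(self, rate_limit: float, burst_limit: int, now: float = 0.0):
        self.rate = rate_limit
        self.capacity = burst_limit
        self.tokens = float(burst_limit)  # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        """Refill for elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the gateway would answer 429 here


if __name__ == "__main__":
    bucket = TokenBucket(rate_limit=50, burst_limit=100)
    # 150 requests arriving at the same instant: only the burst gets through
    print(sum(bucket.allow(0.0) for _ in range(150)))
```

With the settings above, a sudden spike is capped at 100 requests, and sustained traffic settles at 50 per second once the bucket is drained.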

VPC Integration Performance Penalties

VPC Links add 100-300ms of latency and require Network Load Balancers ($16/month minimum). VPC cold starts are brutal - first request after idle can take 10+ seconds while ENIs warm up.

The VPC tax is real: Every security boundary adds latency. Internet → CloudFront → API Gateway → VPC Link → NLB → your service. That's 4-5 network hops minimum. Regional endpoints skip CloudFront but you lose the edge network's latency benefit for far-away clients. Pick your performance vs security tradeoffs consciously.

Edge vs Regional for Security Workloads

Edge-optimized endpoints route requests through CloudFront edge locations before they reach API Gateway. Great for performance, terrible for security logs. CloudFront logs show edge locations, not original client IPs. WAF logs are delayed 5-15 minutes from edge locations.

Regional endpoints give you real client IPs immediately and simpler security monitoring. All traffic hits one region, making it easier to analyze attack patterns. Edge-optimized makes sense for public APIs with global users, regional is better for internal APIs or when you need real-time security monitoring.

Cost difference matters at scale: CloudFront adds $0.085/GB for data transfer. At 1TB/month, that's $85 extra just for the edge hop. If your security team needs real-time visibility, regional endpoints are worth the performance trade-off.

Security Configuration Comparison - REST vs HTTP APIs

Security Feature	REST API	HTTP API	Production Reality
AWS WAF Integration	✅ Full integration, web ACLs work	❌ No WAF support	REST wins: WAF blocks attacks before they cost you money
Resource Policies	✅ IP/VPC restrictions work	❌ Not supported	REST wins: IP whitelisting and VPC restrictions need a resource policy
Private Endpoints	✅ VPC-only APIs via VPC endpoints	❌ Internet-only	REST wins: Internal APIs should never touch the internet
Request Validation	✅ Schema validation at gateway	❌ Validate in Lambda code	REST wins: Blocking bad requests early saves compute costs
Built-in Throttling	✅ Per-method and per-client limits	✅ Basic throttling only	REST wins: Granular controls matter during attacks
Authentication Options	IAM, Cognito, API Keys, Lambda authorizers	IAM, Cognito, JWT, Lambda authorizers	Tie: Both support enterprise auth patterns
Monitoring & Logging	CloudWatch + X-Ray tracing	CloudWatch only	REST wins: X-Ray helps debug security incidents
Cost During Attacks	Higher per-request cost	Lower per-request cost	HTTP wins: Attacks are expensive, every penny matters
TLS Termination	CloudFront (edge) or regional	Regional only	REST wins: Edge termination distributes attack load
Custom Headers	Full header manipulation	Basic parameter mapping	REST wins: Security headers need flexibility
