NGINX Ingress Controller: Production Implementation Guide
Configuration Options
Version Selection
Community Version (kubernetes/ingress-nginx)
- Cost: Free (Apache 2.0 license)
- Backend: NGINX OSS
- Configuration: Kubernetes Ingress resources + annotations
- Performance: Roughly 45k req/sec reported in production; actual throughput depends on hardware, TLS settings, and routing complexity
- Memory: 200MB base + 5MB per 100 ingresses
- Maintenance: Community-supported, large user base
Commercial Version (F5 NGINX Ingress Controller)
- Cost: Commercial licensing + support
- Backend: NGINX OSS or NGINX Plus
- Configuration: Custom Resource Definitions (CRDs)
- Performance: 60k+ req/sec with NGINX Plus
- Memory: 150MB base (NGINX Plus)
- Features: HTTP/3, OpenTelemetry tracing, NGINX One Console integration
Decision Criteria
- Start with community version unless specific enterprise features required
- Upgrade to F5 version for: JWT authentication, advanced rate limiting, commercial support
- Production threshold: F5 version handles larger configurations better (1000+ ingresses)
Resource Requirements
Hardware Specifications
- Memory: 200MB base + 5MB per 100 ingresses (community)
- CPU: Scales with request rate and configuration complexity
- Network: Sub-millisecond latency for static content
- Storage: Minimal - primarily configuration and logs
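The memory figures above reduce to simple arithmetic. The following is a minimal Python sketch, assuming the 200MB base and 5MB per 100 ingresses quoted for the community controller. The function names, the 1.5x headroom factor, and the 64MB rounding step are illustrative assumptions, not values from an official sizing guide.

```python
import math

BASE_MB = 200              # base footprint quoted above (community controller)
MB_PER_100_INGRESSES = 5   # incremental cost quoted above


def estimate_memory_mb(ingress_count: int) -> float:
    """Rough controller memory estimate in MB for a number of Ingress resources."""
    if ingress_count < 0:
        raise ValueError("ingress_count must be non-negative")
    return BASE_MB + MB_PER_100_INGRESSES * (ingress_count / 100)


def suggested_memory_limit_mb(ingress_count: int, headroom: float = 1.5) -> int:
    """Estimate times a headroom factor, rounded up to the next 64 MB step.

    The headroom factor and the 64 MB step are assumptions for illustration.
    """
    raw = estimate_memory_mb(ingress_count) * headroom
    return math.ceil(raw / 64) * 64
```

For 1000 ingresses the estimate is 250MB, which this sketch rounds up to a 384MB limit. Treat the result as a starting point and adjust it against observed usage.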
Scaling Thresholds
- Configuration reload time: >10 seconds indicates performance degradation
- Maximum ingresses: 1000+ simple ingresses supported
- Complex routing rules: Significantly increase reload times
- Rate limits: Enforced independently by each controller pod, so the effective limit scales with replica count
Time Investment
- Initial setup: 1-2 hours with Helm charts
- SSL automation: Additional 2-4 hours for cert-manager integration
- Production hardening: 1-2 days for monitoring, HA setup
- Learning curve: Medium complexity, requires NGINX knowledge
Critical Warnings
SSL Certificate Management
FAILURE MODE: Certificate expiration at production runtime
- Root cause: cert-manager renewal failures (DNS propagation, rate limiting, ACME challenges)
- Impact: Complete service outage for HTTPS traffic
- Prevention: Configure backup ACME issuers, monitor renewal 30 days before expiration
- Rate limits: Let's Encrypt allows 50 certificates per registered domain per week
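The 30-day monitoring window can be expressed as a small check. This is a sketch only: the function names are hypothetical, and it assumes the expiry date arrives in the `notAfter` text format that Python's `ssl.SSLSocket.getpeercert()` returns.

```python
import ssl
from datetime import datetime, timezone

ALERT_WINDOW_DAYS = 30  # monitoring threshold quoted above


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days left before a certificate expires.

    not_after uses the 'notAfter' text format from getpeercert(),
    for example 'Jun  1 12:00:00 2030 GMT'. now must be timezone-aware.
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    expires = datetime.fromtimestamp(expires_epoch, tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def needs_alert(not_after: str, now: datetime,
                window_days: int = ALERT_WINDOW_DAYS) -> bool:
    """True when the certificate is inside the alert window or already expired."""
    return days_until_expiry(not_after, now) <= window_days
```

In a live check, `not_after` would come from a TLS connection to the ingress endpoint. Passing `now` explicitly keeps the logic easy to test without network access.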
FAILURE MODE: Wildcard certificate DNS-01 challenges
- Requirement: Cloud DNS API credentials (Route53, CloudFlare, Google DNS)
- Risk: Credential management becomes additional failure point
- Solution: Use multiple DNS providers for redundancy
Rate Limiting Misconceptions
CRITICAL ERROR: Rate limits apply per controller pod, not globally
- Real behavior: Each controller pod enforces limits independently; NGINX workers inside one pod share a single limit zone in shared memory
- Common mistake: Expecting global rate limiting across all pods
- Actual traffic: Up to N times the configured rate with N replicas
- Solution: Calculate limits as (desired_rate / replicas)
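The division described above can be sketched in Python. The function and parameter names are hypothetical. `limiters_per_pod` defaults to 1, which assumes each pod enforces one shared limit; raise it only if limits turn out to be enforced independently inside a pod.

```python
import math


def limit_per_limiter(desired_rate: float, replicas: int,
                      limiters_per_pod: int = 1) -> int:
    """Split a desired cluster-wide rate across independent limiters."""
    if replicas < 1 or limiters_per_pod < 1:
        raise ValueError("replicas and limiters_per_pod must be >= 1")
    # Round down so the summed limit never exceeds the target; keep at least 1.
    return max(1, math.floor(desired_rate / limiters_per_pod / replicas))


def effective_rate(configured_limit: int, replicas: int,
                   limiters_per_pod: int = 1) -> int:
    """Worst-case aggregate rate if every limiter admits its full quota."""
    return configured_limit * replicas * limiters_per_pod
```

For example, a target of 120 requests per second across 3 replicas gives a configured limit of 40 per pod. Going the other way, `effective_rate` shows how far an undivided limit can overshoot the intended global rate.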
Production Deployment Gotchas
FAILURE MODE: Using hostnames in upstream configurations
- Impact: DNS lookup delays cause response time degradation
- Solution: Always use IP addresses for upstream backends
FAILURE MODE: Missing resource limits on ingress pods
- Risk: OOMKiller terminates pods during traffic spikes
- Solution: Configure appropriate CPU/memory requests and limits
FAILURE MODE: Single ingress controller pod
- Downtime: 30-60 seconds minimum, up to 5 minutes under load
- Solution: Multiple replicas with anti-affinity rules across nodes
Configuration Complexity Issues
BREAKING POINT: Complex regex patterns and custom annotations
- Symptom: Reload times increase from 2 seconds to 30+ seconds
- Impact: Service disruption during configuration changes
- Solution: Simplify routing rules, use F5 version for dynamic updates
Performance Specifications
Throughput Benchmarks
Controller Type | Requests/Second | Memory Usage | HTTP/3 Support |
---|---|---|---|
Community (OSS) | 45,000 | 200MB base | ❌ No |
F5 (OSS) | 45,000 | 200MB base | ❌ No |
F5 (Plus) | 60,000+ | 150MB base | ✅ Yes |
Latency Characteristics
- Static content: Sub-millisecond latency
- Dynamic requests: Efficient connection handling
- SSL handshakes: Production-tested TLS implementation
- Configuration overhead: Minimal impact from Kubernetes integration
Security Implementation
TLS Termination
- Automatic certificates: cert-manager + Let's Encrypt integration
- SNI support: Multiple domains per IP address
- Certificate rotation: Let's Encrypt certificates expire after 90 days; cert-manager renews them automatically
Critical Vulnerabilities
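SECURITY ALERT: CVE-2025-1974 and related vulnerabilities (March 2025)
- Affected: Community ingress-nginx controller (admission controller component)
- Action required: Update to a patched version (v1.11.5, v1.12.1, or later) immediately
- Mitigation: Restrict network access to the admission webhook so that only the Kubernetes API server can reach it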
Advanced Security Features (F5 Only)
- WAF integration: NGINX App Protect for OWASP Top 10 protection
- Geographic restrictions: GeoIP module for compliance requirements
- Authentication delegation: auth_request module for centralized auth
- JWT-based policies: Claims-based access control
Monitoring and Debugging
Essential Metrics
- Request rates: Per-second throughput monitoring
- Response codes: Error rate tracking (4xx, 5xx)
- Upstream health: Backend service availability
- SSL statistics: Handshake success rates
- Active connections: Current load monitoring
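As an illustration of the response-code metric, the sketch below computes 4xx and 5xx shares from a batch of HTTP status codes. The function name and input shape are assumptions; in practice these numbers usually come from Prometheus counters rather than raw lists.

```python
from collections import Counter
from typing import Dict, Iterable


def error_rates(status_codes: Iterable[int]) -> Dict[str, float]:
    """Share of 4xx and 5xx responses among the given HTTP status codes."""
    classes = Counter(code // 100 for code in status_codes)
    total = sum(classes.values())
    if total == 0:
        # No traffic observed: report zero rates instead of dividing by zero.
        return {"total": 0, "4xx": 0.0, "5xx": 0.0}
    return {
        "total": total,
        "4xx": classes[4] / total,
        "5xx": classes[5] / total,
    }
```

Tracking the two classes separately matters for alerting: a spike in 5xx usually points at the backends or the controller, while 4xx spikes more often reflect client behavior or routing mistakes.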
Debugging Tools
- Debug logging: `error-log-level: debug` in configmap
- Access logs: Per-ingress logging with annotations
- Request tracing: OpenTelemetry support (F5 version)
- Configuration validation: `nginx -t` automatic testing
Production Monitoring Stack
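- Metrics exporter: Built-in Prometheus endpoint of the community controller (controller.metrics.enabled in the Helm chart), or nginx-prometheus-exporter for plain NGINX instances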
- Visualization: Grafana dashboards available
- Log aggregation: kubectl logs from ingress pods
- Alerting: Certificate expiration, pod health, response times
High Availability Patterns
Deployment Architecture
- Controller type: DaemonSet on dedicated nodes or Deployment with replicas
- Load balancer: Cloud LB for health checking and traffic distribution
- Service exposure: LoadBalancer or NodePort services on ports 80/443
Failure Resilience
- Pod anti-affinity: Spread pods across different worker nodes
- Zone distribution: Anti-affinity rules across availability zones
- Health checks: Kubernetes readiness probes for automatic failover
- Backup systems: Multiple certificate issuers for redundancy
Migration Strategy
From Cloud Load Balancers
- Parallel deployment: Run both systems simultaneously
- Traffic testing: Validate functionality before DNS changes
- Gradual migration: Update DNS entries incrementally
- Feature mapping: Translate cloud LB features to NGINX annotations
- Certificate management: Implement cert-manager before cutover
Cost Analysis
- Cloud LB cost: $20-50/month per load balancer
- NGINX Ingress: Runs on existing cluster nodes (compute cost only)
- Break-even point: 2-3 load balancers justify ingress controller adoption
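The break-even estimate can be checked with a short calculation. This sketch assumes one cloud load balancer stays in front of the ingress controller and that the controller has some monthly compute cost. Both `remaining_lbs` and `ingress_compute_cost` are hypothetical parameters, and the dollar values used with it are placeholders.

```python
from typing import Optional


def monthly_savings(num_load_balancers: int, lb_cost: float,
                    ingress_compute_cost: float, remaining_lbs: int = 1) -> float:
    """Monthly savings from consolidating cloud LBs behind one ingress controller."""
    replaced = max(0, num_load_balancers - remaining_lbs)
    return replaced * lb_cost - ingress_compute_cost


def break_even_count(lb_cost: float, ingress_compute_cost: float,
                     remaining_lbs: int = 1, max_count: int = 100) -> Optional[int]:
    """Smallest number of existing LBs at which consolidation stops losing money."""
    for n in range(remaining_lbs, max_count + 1):
        if monthly_savings(n, lb_cost, ingress_compute_cost, remaining_lbs) >= 0:
            return n
    return None  # no break-even within max_count
```

With an assumed $40/month of controller compute, load balancers at $50 each break even at 2 and load balancers at $35 each break even at 3, which is in line with the 2-3 range above.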
Common Failure Scenarios
Certificate Renewal Failures
- Frequency: High risk during Let's Encrypt outages or DNS issues
- Detection: Monitor certificate expiration 30 days in advance
- Recovery: Manual certificate issuance or backup ACME provider
- Prevention: Multiple certificate issuers configured
Configuration Reload Failures
- Trigger: Invalid NGINX configuration from complex Ingress rules
- Impact: New configurations rejected, existing traffic continues
- Detection: Controller logs show `nginx -t` validation errors
- Recovery: Fix Ingress resource syntax, simplify routing rules
Pod Scheduling Failures
- Cause: Resource constraints or node affinity conflicts
- Impact: Reduced ingress capacity or single points of failure
- Detection: Pod status monitoring and replica count alerts
- Recovery: Adjust resource requests or node labeling
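The detection step can be sketched as a check over a snapshot of controller pods. `PodStatus` is a hypothetical record for illustration, not a Kubernetes API type; real data would come from the cluster API or a monitoring system.

```python
from collections import Counter
from typing import List, NamedTuple


class PodStatus(NamedTuple):
    """Hypothetical snapshot of one ingress controller pod."""
    name: str
    node: str    # empty string if the pod is not scheduled yet
    ready: bool


def scheduling_alerts(desired_replicas: int, pods: List[PodStatus]) -> List[str]:
    """Return alerts for reduced capacity or co-located replicas."""
    alerts = []
    ready = [p for p in pods if p.ready]
    if len(ready) < desired_replicas:
        alerts.append(
            f"reduced capacity: {len(ready)}/{desired_replicas} replicas ready"
        )
    nodes = Counter(p.node for p in ready)
    # All ready replicas on one node means a single node failure takes ingress down.
    if desired_replicas > 1 and len(nodes) == 1:
        only_node = next(iter(nodes))
        alerts.append(f"single point of failure: all ready pods on node {only_node}")
    return alerts
```

A healthy deployment returns an empty list. Either alert suggests revisiting resource requests, node labels, or anti-affinity rules.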
Implementation Checklist
Basic Setup
- Install via Helm with production values
- Configure resource limits and requests
- Set up anti-affinity rules for HA
- Expose via LoadBalancer or NodePort service
SSL Configuration
- Install cert-manager
- Configure Let's Encrypt cluster issuer
- Set up backup ACME provider
- Test certificate provisioning and renewal
Monitoring Setup
- Enable Prometheus metrics (built-in controller endpoint or nginx-prometheus-exporter)
- Configure Grafana dashboards
- Set up certificate expiration alerts
- Implement log aggregation
Production Hardening
- Enable debug logging temporarily for testing
- Configure appropriate rate limiting
- Test failover scenarios
- Document emergency procedures
Key Resources
- Primary documentation: kubernetes/ingress-nginx GitHub repository
- Installation guide: Helm chart with production values.yaml
- Annotations reference: Complete configuration options
- Troubleshooting: Step-by-step debugging procedures
- Community support: Kubernetes Slack #ingress-nginx channel
Useful Links for Further Investigation
Essential Resources for NGINX Ingress Controller
Link | Description |
---|---|
kubernetes/ingress-nginx GitHub | The community repo that actually works. Issues section has real problems with solutions that don't suck. Installation docs are decent once you ignore the minikube examples. |
NGINX Ingress Controller Helm Installation | Use this to install it. The values.yaml has everything you need to not fuck up your deployment. Production-ready defaults that mostly work out of the box. |
Configuration Annotations Reference | The only documentation that matters. Every annotation you'll ever need with examples that actually work. |
cert-manager Integration Tutorial | How to not manually manage SSL certs like an animal. Follow this exactly or spend your weekends renewing certificates. |
Troubleshooting Guide | Actually helpful when shit breaks and the logs tell you nothing useful. Real debugging steps that work. |
Stack Overflow - nginx-ingress | Better answers than official docs when everything's on fire. Real problems with solutions from people who've debugged this at 3am. |
Kubernetes Slack #ingress-nginx | Get help from people who actually use this stuff. Expect some attitude if you ask obviously Googleable questions. |