What's the difference between the two NGINX Ingress Controllers?

There are two completely different projects: [kubernetes/ingress-nginx](https://github.com/kubernetes/ingress-nginx) (community) and [F5 NGINX Ingress Controller](https://docs.nginx.com/nginx-ingress-controller/) (commercial). The community version is free, uses standard Kubernetes Ingress resources, and runs NGINX OSS. F5's version supports both NGINX OSS and Plus, uses custom resources for advanced configuration, and offers commercial support. Both solve the same basic problem but with different feature sets and complexity levels.

Which version should I use for production?

Start with the community [kubernetes/ingress-nginx](https://github.com/kubernetes/ingress-nginx) unless you specifically need F5's advanced features like JWT authentication, advanced rate limiting, or commercial support. The community version handles most production workloads well and has a larger user base. Move to F5's version if you need enterprise features that justify the additional complexity and cost; because the two are separate projects, this is a migration rather than an in-place upgrade.

How does performance compare to cloud load balancers?

NGINX itself can handle [tens of thousands of requests per second](https://www.nginx.com/blog/testing-the-performance-of-nginx-and-nginx-plus-web-servers/) depending on configuration and hardware, and the ingress controller builds on that. Cloud load balancers like AWS ALB or GCP Load Balancer scale to similar throughput but give you less control over routing and tuning. The real advantage is cost - cloud LBs cost $20-50/month each, while a single ingress controller running on your existing cluster nodes can front many services.

Can I run multiple ingress controllers in the same cluster?

Yes, using [IngressClass resources](https://kubernetes.io/docs/concepts/services-networking/ingress/#ingress-class) to specify which controller handles each Ingress. This is common for separating internal vs external traffic, or running different controllers for different applications. Each controller needs its own service and configuration to avoid conflicts.
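
As a rough sketch of how the class binding fits together, the snippet below builds two IngressClass objects and one Ingress as plain dictionaries and prints them as JSON, which `kubectl apply -f -` accepts. The class names, hostname, and service name are made up for illustration, and the second controller value (`k8s.io/internal-ingress-nginx`) is an assumption: it has to match whatever controller class the second controller instance was actually started with.

```python
import json


def ingress_class(name: str, controller: str) -> dict:
    """Build an IngressClass manifest that binds a class name to one controller."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "IngressClass",
        "metadata": {"name": name},
        "spec": {"controller": controller},
    }


def ingress(name: str, class_name: str, host: str, service: str, port: int) -> dict:
    """Build a minimal Ingress handled by the controller that owns class_name."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "ingressClassName": class_name,
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]},
            }],
        },
    }


# Hypothetical names: one class for external traffic, one for internal traffic.
manifests = [
    ingress_class("nginx-external", "k8s.io/ingress-nginx"),
    ingress_class("nginx-internal", "k8s.io/internal-ingress-nginx"),
    ingress("admin", "nginx-internal", "admin.example.internal", "admin-svc", 8080),
]

if __name__ == "__main__":
    # Wrapping the objects in a v1 List lets kubectl apply them in one call.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Only the controller whose class matches `spec.ingressClassName` should pick up the `admin` Ingress; the other one ignores it.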

How do I handle SSL certificates automatically?

Install [cert-manager](https://cert-manager.io/) alongside NGINX Ingress Controller. cert-manager automates certificate provisioning from Let's Encrypt and other ACME providers, as well as non-ACME issuers such as Venafi. Add the `cert-manager.io/cluster-issuer: "letsencrypt-prod"` annotation and a `tls` section naming the target Secret to your Ingress resources, and cert-manager handles the rest. Let's Encrypt certificates are valid for 90 days, and cert-manager renews them automatically, by default about 30 days before they expire.
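
To make the moving parts concrete, here is a minimal sketch of an Ingress that asks cert-manager for a certificate. It assumes a ClusterIssuer named `letsencrypt-prod` already exists and that the controller's class is called `nginx`; the hostname, service name, and Secret naming scheme are placeholders.

```python
import json


def tls_ingress(name: str, host: str, service: str, port: int,
                cluster_issuer: str = "letsencrypt-prod") -> dict:
    """Build an Ingress that cert-manager will issue a certificate for."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            # cert-manager watches for this annotation on Ingress resources.
            "annotations": {"cert-manager.io/cluster-issuer": cluster_issuer},
        },
        "spec": {
            "ingressClassName": "nginx",  # assumed class name
            # The issued certificate is stored in this Secret (hypothetical naming scheme).
            "tls": [{"hosts": [host], "secretName": f"{name}-tls"}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]},
            }],
        },
    }


manifest = tls_ingress("shop", "shop.example.com", "shop-svc", 80)

if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
```

The host listed under `tls` has to match the host in `rules`, otherwise the certificate will not cover the name clients actually request.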

What happens when the ingress controller pod dies?

With a single replica, everything breaks until Kubernetes gets its shit together and reschedules the pod. That usually takes 30-60 seconds if you're lucky, but I've seen it take 5 minutes when nodes were overloaded. This is why you run multiple ingress replicas behind a cloud load balancer - when it works. I once spent 2 hours debugging why the cloud LB wasn't failing over correctly; it turned out the default health check path was wrong, so it was marking all pods as unhealthy.

How do I debug traffic routing issues?

Enable debug logging by adding `error-log-level: debug` to the controller's ConfigMap. With the community Helm chart installed as release `ingress-nginx`, that is `kubectl edit configmap ingress-nginx-controller -n ingress-nginx` (older manifests named the ConfigMap `nginx-configuration`). Check the controller logs with `kubectl logs -n ingress-nginx deployment/ingress-nginx-controller`. The logs show how NGINX processes requests and selects upstreams. Remember to disable debug logging afterward, as it impacts performance.
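
If you toggle debug logging often, scripting it beats hand-editing. The sketch below only builds the `kubectl patch` command line; the ConfigMap name and namespace are assumptions based on a default Helm install, so adjust them to your deployment. Passing `None` produces a JSON merge patch with `null`, which removes the key and returns the controller to its default log level.

```python
import json
from typing import List, Optional


def error_log_level_patch(level: Optional[str],
                          configmap: str = "ingress-nginx-controller",
                          namespace: str = "ingress-nginx") -> List[str]:
    """Return the kubectl argv that sets (or, with None, removes) error-log-level."""
    # In a JSON merge patch, a null value deletes the key from the ConfigMap.
    patch = {"data": {"error-log-level": level}}
    return [
        "kubectl", "patch", "configmap", configmap,
        "-n", namespace,
        "--type", "merge",
        "-p", json.dumps(patch),
    ]


enable_cmd = error_log_level_patch("debug")
disable_cmd = error_log_level_patch(None)

if __name__ == "__main__":
    print(" ".join(enable_cmd))
    print(" ".join(disable_cmd))
```

To actually run a command, hand the list to `subprocess.run(enable_cmd, check=True)` on a machine where `kubectl` points at the right cluster.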

Can NGINX Ingress Controller handle WebSocket connections?

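Yes, WebSocket connections work automatically through HTTP/1.1 upgrade headers. No special configuration is required for basic WebSocket support, but idle connections are closed after the default 60-second proxy timeouts, so raise `nginx.ingress.kubernetes.io/proxy-read-timeout` and `nginx.ingress.kubernetes.io/proxy-send-timeout` for long-lived connections. For sticky sessions with WebSockets, use the `nginx.ingress.kubernetes.io/affinity: cookie` annotation so that reconnects land on the same backend pod.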

How do I configure rate limiting properly?

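Community version: use the `nginx.ingress.kubernetes.io/limit-rps: "100"` annotation on Ingress resources; it limits requests per second per client IP, with a burst of five times the limit by default. F5 version: create Policy resources with advanced rate limiting including burst, JWT-based tiering, and custom variables. Remember that NGINX workers inside one pod share the limit counters, but replicas do not: each controller replica enforces the limit independently, so the effective cluster-wide limit is the configured rate multiplied by your replica count.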
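
The arithmetic is easy to get backwards, so here is a small sketch of it. It assumes the community controller's behavior of enforcing the limit per client IP inside each controller replica (NGINX keeps the counters in a memory zone shared by that replica's workers), a default burst multiplier of 5, and traffic spread evenly across replicas. The function names are made up for this example.

```python
def effective_limits(limit_rps: int, replicas: int, burst_multiplier: int = 5) -> dict:
    """What a single client can actually get through for a given annotation value."""
    return {
        "per_replica_rps": limit_rps,
        "per_replica_burst": limit_rps * burst_multiplier,
        # Replicas do not share counters, so the cluster-wide ceiling scales with them.
        "cluster_wide_rps": limit_rps * replicas,
    }


def annotation_value(target_cluster_rps: int, replicas: int) -> int:
    """Annotation value that keeps the cluster-wide rate at or below the target."""
    return max(1, target_cluster_rps // replicas)


limits = effective_limits(limit_rps=100, replicas=3)
suggested = annotation_value(target_cluster_rps=100, replicas=3)

if __name__ == "__main__":
    print(limits)
    print(f'nginx.ingress.kubernetes.io/limit-rps: "{suggested}"')
```

Keep in mind that the result changes whenever the replica count does, which matters if the controller is autoscaled.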

What's the maximum number of ingresses NGINX can handle?

The limit depends on NGINX configuration complexity, not absolute ingress count. Clusters with 1000+ simple ingresses work fine, but complex routing rules increase reload times. Each ingress adds server blocks and upstream configuration to NGINX. F5's version handles larger configurations better through optimized config generation and dynamic updates.

How do I monitor NGINX Ingress Controller performance?

The community controller exposes built-in Prometheus metrics (port 10254) once metrics are enabled, for example with the Helm value `controller.metrics.enabled=true`; [nginx-prometheus-exporter](https://github.com/nginx/nginx-prometheus-exporter) covers standalone NGINX and NGINX Plus instances. Key metrics include request rate, response codes, upstream response times, and active connections. Grafana dashboards are available for visualization. The F5 version also includes built-in Prometheus metrics and supports OpenTelemetry tracing.
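
For a quick sanity check without a full Prometheus stack, you can scrape the metrics endpoint and compute an error ratio by hand. The sketch below assumes the community controller's `nginx_ingress_controller_requests` counter with its `status` label; the sample text is invented, and the parser is deliberately simplified (it does not handle escaped quotes or braces inside label values).

```python
import re

# Invented sample in Prometheus text exposition format.
SAMPLE = """\
# TYPE nginx_ingress_controller_requests counter
nginx_ingress_controller_requests{ingress="web",status="200"} 980
nginx_ingress_controller_requests{ingress="web",status="404"} 10
nginx_ingress_controller_requests{ingress="web",status="502"} 10
"""

LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def error_ratio(text: str, metric: str = "nginx_ingress_controller_requests") -> float:
    """Share of requests that ended in a 5xx status, across all label sets."""
    total = 0.0
    errors = 0.0
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match or match.group("name") != metric:
            continue  # skip comments and unrelated metrics
        labels = dict(LABEL.findall(match.group("labels")))
        value = float(match.group("value"))
        total += value
        if labels.get("status", "").startswith("5"):
            errors += value
    return errors / total if total else 0.0


ratio = error_ratio(SAMPLE)

if __name__ == "__main__":
    print(f"5xx ratio: {ratio:.2%}")
```

Counters are cumulative since the controller started, so this gives a lifetime ratio; for alerting you would normally compute a rate over a time window in Prometheus instead.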

Can I use NGINX Ingress Controller with service mesh?

Yes, but it's often redundant. Service meshes like Istio provide their own ingress gateways with similar functionality. You can run NGINX Ingress Controller as a north-south gateway while the service mesh handles east-west traffic, but this adds complexity. Choose one approach for consistency unless you have specific requirements for both.

How do I handle large file uploads through the ingress?

Raise the `nginx.ingress.kubernetes.io/proxy-body-size` annotation (default `1m`); requests with larger bodies are rejected with `413 Request Entity Too Large`. For very large files, consider direct pod access or object storage integration to avoid proxying large payloads through the ingress layer. The ingress controller buffers request bodies, which can consume memory and impact performance for concurrent large uploads.

What are common production gotchas to avoid?

Don't use hostnames in upstream configurations - always use IP addresses to avoid DNS lookup delays that'll fuck up your response times. Configure appropriate resource limits and requests for ingress pods (learned this one the hard way when OOMKiller took down prod). Plan for certificate renewal failures with backup ACME issuers. Monitor NGINX reload times in large clusters - if reloads start taking more than 10 seconds, you're in for a world of pain. Use anti-affinity rules to spread ingress pods across nodes. Test configuration changes in staging before production deployment - seriously, don't be the person who breaks Friday deployments.

How do I migrate from cloud load balancers to NGINX Ingress Controller?

Start with parallel deployment - run both systems simultaneously while testing. Update DNS entries gradually to shift traffic to the ingress controller. Most cloud LB features translate to NGINX annotations or configurations. The biggest difference is certificate management - cloud LBs often handle this automatically while NGINX requires cert-manager or manual management.

NGINX Ingress Controller: Production Implementation Guide

Configuration Options

Version Selection

Community Version (kubernetes/ingress-nginx)

Cost: Free (Apache 2.0 license)
Backend: NGINX OSS
Configuration: Kubernetes Ingress resources + annotations
Performance: 45k req/sec in production
Memory: 200MB base + 5MB per 100 ingresses
Maintenance: Community-supported, large user base

Commercial Version (F5 NGINX Ingress Controller)

Cost: Commercial licensing + support
Backend: NGINX OSS or NGINX Plus
Configuration: Custom Resource Definitions (CRDs)
Performance: 60k+ req/sec with NGINX Plus
Memory: 150MB base (NGINX Plus)
Features: HTTP/3, OpenTelemetry tracing, NGINX One Console integration

Decision Criteria

Start with community version unless specific enterprise features required
Upgrade to F5 version for: JWT authentication, advanced rate limiting, commercial support
Production threshold: F5 version handles larger configurations better (1000+ ingresses)

Resource Requirements

Hardware Specifications

Memory: 200MB base + 5MB per 100 ingresses (community)
CPU: Scales with request rate and configuration complexity
Network: Sub-millisecond latency for static content
Storage: Minimal - primarily configuration and logs
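
The memory figure above is a linear rule of thumb, which can be turned into a quick sizing helper. The base and per-ingress numbers come from this guide; the 1.5x headroom factor for the memory limit is a made-up starting point, not a measured value.

```python
def estimate_memory_mb(ingress_count: int, base_mb: float = 200, mb_per_100: float = 5) -> float:
    """Rule-of-thumb working set for the community controller."""
    return base_mb + mb_per_100 * ingress_count / 100


def suggested_limit_mb(ingress_count: int, headroom: float = 1.5) -> float:
    """Memory limit with some headroom for reloads and traffic spikes (hypothetical factor)."""
    return estimate_memory_mb(ingress_count) * headroom


estimate = estimate_memory_mb(1000)
limit = suggested_limit_mb(1000)

if __name__ == "__main__":
    print(f"estimated: {estimate:.0f} MB, suggested limit: {limit:.0f} MB")
```

Treat the result as a first guess and compare it against observed usage before settling on requests and limits.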

Scaling Thresholds

Configuration reload time: >10 seconds indicates performance degradation
Maximum ingresses: 1000+ simple ingresses supported
Complex routing rules: Significantly increase reload times
Replicas: Rate limits apply per controller replica (workers within a replica share counters), multiply by replica count

Time Investment

Initial setup: 1-2 hours with Helm charts
SSL automation: Additional 2-4 hours for cert-manager integration
Production hardening: 1-2 days for monitoring, HA setup
Learning curve: Medium complexity, requires NGINX knowledge

Critical Warnings

SSL Certificate Management

FAILURE MODE: Certificate expiration at production runtime

Root cause: cert-manager renewal failures (DNS propagation, rate limiting, ACME challenges)
Impact: Complete service outage for HTTPS traffic
Prevention: Configure backup ACME issuers, monitor renewal 30 days before expiration
Rate limits: Let's Encrypt allows 50 certificates per registered domain per week
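
A simple expiry check is one way to catch stalled renewals before they become outages. The sketch below works on the `notAfter` string in the format Python's `ssl.SSLSocket.getpeercert()` returns; the dates are invented, and the 30-day threshold mirrors the recommendation above. Because cert-manager by default also starts renewing a 90-day certificate about 30 days before expiry, a slightly lower threshold may produce fewer false alarms.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp."""
    expiry_ts = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    return (expiry_ts - now.timestamp()) / 86400


def needs_attention(not_after: str, now: datetime, threshold_days: float = 30) -> bool:
    """True when the certificate is closer to expiry than the threshold."""
    return days_until_expiry(not_after, now) < threshold_days


# Invented example values.
now = datetime(2025, 3, 1, tzinfo=timezone.utc)
healthy = "Mar 31 00:00:00 2025 GMT"
stale = "Mar 15 00:00:00 2025 GMT"

if __name__ == "__main__":
    for not_after in (healthy, stale):
        days = days_until_expiry(not_after, now)
        print(f"{not_after}: {days:.0f} days left, alert={needs_attention(not_after, now)}")
```

In a real check, `now` would be `datetime.now(timezone.utc)` and the `notAfter` value would come from a TLS connection to each public hostname.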

FAILURE MODE: Wildcard certificate DNS-01 challenges

Requirement: Cloud DNS API credentials (Route53, CloudFlare, Google DNS)
Risk: Credential management becomes additional failure point
Solution: Use multiple DNS providers for redundancy

Rate Limiting Misconceptions

CRITICAL ERROR: Rate limits apply per controller replica, not globally

Real behavior: Each replica applies limits independently (workers within a replica share counters)
Common mistake: Expecting global rate limiting across all pods
Actual traffic: Up to N times the configured rate with N replicas
Solution: Calculate limits as (desired_rate / replicas)

Production Deployment Gotchas

FAILURE MODE: Using hostnames in upstream configurations

Impact: DNS lookup delays cause response time degradation
Solution: Always use IP addresses for upstream backends

FAILURE MODE: Missing resource limits on ingress pods

Risk: OOMKiller terminates pods during traffic spikes
Solution: Configure appropriate CPU/memory requests and limits

FAILURE MODE: Single ingress controller pod

Downtime: 30-60 seconds minimum, up to 5 minutes under load
Solution: Multiple replicas with anti-affinity rules across nodes
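
The last two failure modes are usually addressed together in the Helm values. The sketch below builds such a values file as a dictionary and prints it as JSON, which Helm accepts with `-f` because JSON is valid YAML. It assumes the community chart's `controller.*` keys and its standard pod labels; the CPU and memory numbers are placeholders to be replaced with figures from your own monitoring.

```python
import json


def ha_values(replicas: int = 3) -> dict:
    """Helm values for replicated controller pods with limits and anti-affinity."""
    selector = {"matchLabels": {
        "app.kubernetes.io/name": "ingress-nginx",
        "app.kubernetes.io/component": "controller",
    }}
    return {"controller": {
        "replicaCount": replicas,
        # Placeholder sizes: set these from observed usage.
        "resources": {
            "requests": {"cpu": "200m", "memory": "256Mi"},
            "limits": {"memory": "512Mi"},
        },
        "affinity": {"podAntiAffinity": {
            # Hard rule: never place two controller pods on the same node.
            "requiredDuringSchedulingIgnoredDuringExecution": [{
                "labelSelector": selector,
                "topologyKey": "kubernetes.io/hostname",
            }],
        }},
    }}


values = ha_values(3)

if __name__ == "__main__":
    print(json.dumps(values, indent=2))
```

With a hard anti-affinity rule, any replica beyond the number of schedulable nodes stays Pending; switching to `preferredDuringSchedulingIgnoredDuringExecution` trades that strictness for flexibility.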

Configuration Complexity Issues

BREAKING POINT: Complex regex patterns and custom annotations

Symptom: Reload times increase from 2 seconds to 30+ seconds
Impact: Service disruption during configuration changes
Solution: Simplify routing rules, use F5 version for dynamic updates

Performance Specifications

Throughput Benchmarks

Controller Type	Requests/Second	Memory Usage	HTTP/3 Support
Community (OSS)	45,000	200MB base	❌ No
F5 (OSS)	45,000	200MB base	❌ No
F5 (Plus)	60,000+	150MB base	✅ Yes

Latency Characteristics

Static content: Sub-millisecond latency
Dynamic requests: Efficient connection handling
SSL handshakes: Production-tested TLS implementation
Configuration overhead: Minimal impact from Kubernetes integration

Security Implementation

TLS Termination

Automatic certificates: cert-manager + Let's Encrypt integration
SNI support: Multiple domains per IP address
Certificate rotation: 90-day expiration with automatic renewal

Critical Vulnerabilities

SECURITY ALERT: CVE-2025-1974 and related vulnerabilities (March 2025)

Affected: Community ingress-nginx controller
Action required: Update to a patched release (v1.11.5, v1.12.1, or later) immediately
Mitigation: Restrict network access to the controller's admission webhook

Advanced Security Features (F5 Only)

WAF integration: NGINX App Protect for OWASP Top 10 protection
Geographic restrictions: GeoIP module for compliance requirements
Authentication delegation: auth_request module for centralized auth
JWT-based policies: Claims-based access control

Monitoring and Debugging

Essential Metrics

Request rates: Per-second throughput monitoring
Response codes: Error rate tracking (4xx, 5xx)
Upstream health: Backend service availability
SSL statistics: Handshake success rates
Active connections: Current load monitoring

Debugging Tools

Debug logging: error-log-level: debug in configmap
Access logs: Per-ingress logging with annotations
Request tracing: OpenTelemetry support (F5 version)
Configuration validation: nginx -t automatic testing

Production Monitoring Stack

Metrics exporter: Built-in controller metrics endpoint (nginx-prometheus-exporter for standalone NGINX)
Visualization: Grafana dashboards available
Log aggregation: kubectl logs from ingress pods
Alerting: Certificate expiration, pod health, response times

High Availability Patterns

Deployment Architecture

Controller type: DaemonSet on dedicated nodes or Deployment with replicas
Load balancer: Cloud LB for health checking and traffic distribution
Service exposure: LoadBalancer or NodePort services on ports 80/443

Failure Resilience

Node affinity: Spread pods across different worker nodes
Zone distribution: Anti-affinity rules across availability zones
Health checks: Kubernetes readiness probes for automatic failover
Backup systems: Multiple certificate issuers for redundancy

Migration Strategy

From Cloud Load Balancers

Parallel deployment: Run both systems simultaneously
Traffic testing: Validate functionality before DNS changes
Gradual migration: Update DNS entries incrementally
Feature mapping: Translate cloud LB features to NGINX annotations
Certificate management: Implement cert-manager before cutover

Cost Analysis

Cloud LB cost: $20-50/month per load balancer
NGINX Ingress: Runs on existing cluster nodes (compute cost only)
Break-even point: 2-3 load balancers justify ingress controller adoption
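
The break-even claim can be checked with simple arithmetic. This sketch assumes each service would otherwise get its own cloud LB, that the ingress controller still sits behind one cloud LB, and that the controller adds some monthly compute cost. The 40 USD compute figure is purely hypothetical; the LB prices are the range quoted above.

```python
def break_even_lb_count(lb_cost: float, ingress_compute_cost: float) -> int:
    """Smallest number of per-service LBs for which consolidating is cheaper.

    Consolidated cost = one LB + ingress compute; separate cost = n LBs.
    """
    if lb_cost <= 0:
        raise ValueError("lb_cost must be positive")
    # Smallest integer n with n * lb_cost > lb_cost + ingress_compute_cost.
    return int((lb_cost + ingress_compute_cost) // lb_cost) + 1


cheap_lb = break_even_lb_count(lb_cost=30, ingress_compute_cost=40)
pricey_lb = break_even_lb_count(lb_cost=50, ingress_compute_cost=40)

if __name__ == "__main__":
    print(f"break-even at {cheap_lb} LBs (30 USD each), {pricey_lb} LBs (50 USD each)")
```

Under these assumptions the result lands in the 2-3 range stated above; a larger compute footprint for the controller pushes the break-even point higher.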

Common Failure Scenarios

Certificate Renewal Failures

Frequency: High risk during Let's Encrypt outages or DNS issues
Detection: Monitor certificate expiration 30 days in advance
Recovery: Manual certificate issuance or backup ACME provider
Prevention: Multiple certificate issuers configured

Configuration Reload Failures

Trigger: Invalid NGINX configuration from complex Ingress rules
Impact: New configurations rejected, existing traffic continues
Detection: Controller logs show nginx -t validation errors
Recovery: Fix Ingress resource syntax, simplify routing rules

Pod Scheduling Failures

Cause: Resource constraints or node affinity conflicts
Impact: Reduced ingress capacity or single points of failure
Detection: Pod status monitoring and replica count alerts
Recovery: Adjust resource requests or node labeling

Implementation Checklist

Basic Setup

Install via Helm with production values
Configure resource limits and requests
Set up anti-affinity rules for HA
Expose via LoadBalancer or NodePort service

SSL Configuration

Install cert-manager
Configure Let's Encrypt cluster issuer
Set up backup ACME provider
Test certificate provisioning and renewal

Monitoring Setup

Enable the controller's Prometheus metrics endpoint
Configure Grafana dashboards
Set up certificate expiration alerts
Implement log aggregation

Production Hardening

Enable debug logging temporarily for testing
Configure appropriate rate limiting
Test failover scenarios
Document emergency procedures

Key Resources

Primary documentation: kubernetes/ingress-nginx GitHub repository
Installation guide: Helm chart with production values.yaml
Annotations reference: Complete configuration options
Troubleshooting: Step-by-step debugging procedures
Community support: Kubernetes Slack #ingress-nginx channel

Useful Links for Further Investigation

Essential Resources for NGINX Ingress Controller

Link	Description
kubernetes/ingress-nginx GitHub	The community repo that actually works. Issues section has real problems with solutions that don't suck. Installation docs are decent once you ignore the minikube examples.
NGINX Ingress Controller Helm Installation	Use this to install it. The values.yaml has everything you need to not fuck up your deployment. Production-ready defaults that mostly work out of the box.
Configuration Annotations Reference	The only documentation that matters. Every annotation you'll ever need with examples that actually work.
cert-manager Integration Tutorial	How to not manually manage SSL certs like an animal. Follow this exactly or spend your weekends renewing certificates.
Troubleshooting Guide	Actually helpful when shit breaks and the logs tell you nothing useful. Real debugging steps that work.
Stack Overflow - nginx-ingress	Better answers than official docs when everything's on fire. Real problems with solutions from people who've debugged this at 3am.
Kubernetes Slack #ingress-nginx	Get help from people who actually use this stuff. Expect some attitude if you ask obviously Googleable questions.