cert-manager: Kubernetes Certificate Management - AI Reference

Core Value Proposition

Problem Solved: Eliminates manual TLS certificate management, a common cause of production outages when certificates expire unnoticed
Critical Failure Scenario: SSL certificate expiration during high-traffic periods (Black Friday example: 4-hour e-commerce outage, direct revenue impact)
Automation Benefit: Prevents 3 AM paging incidents from expired certificates

Technical Specifications

Core Components

  • Certificate Resources: Kubernetes custom resources that declare which DNS names need a certificate, which issuer signs it, and which Secret stores the result
  • Issuer/ClusterIssuer Resources: Certificate Authority configuration (ClusterIssuer = cluster-wide, Issuer = namespace-scoped)
  • CertificateRequest Resources: Requests for a signed X.509 certificate, each wrapping a PKCS#10 CSR (auto-generated from Certificate resources, rarely managed by hand)
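
As a rough illustration of how these resources fit together, the sketch below shows a single Certificate that points at an issuer. The names (app-example-com, letsencrypt-staging) and the domain are hypothetical placeholders, not values from this reference. cert-manager would be expected to generate the matching CertificateRequest on its own.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-example-com              # hypothetical name
  namespace: default
spec:
  secretName: app-example-com-tls    # Secret that receives tls.crt and tls.key
  dnsNames:
  - app.example.com                  # placeholder domain
  issuerRef:
    name: letsencrypt-staging        # assumes a ClusterIssuer with this name exists
    kind: ClusterIssuer
    group: cert-manager.io
```

Because issuerRef.kind is ClusterIssuer, the same issuer can be referenced from any namespace. With kind: Issuer it would have to live in the Certificate's own namespace.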

Resource Requirements (Production)

resources:
  limits:
    cpu: 200m      # 100m is often insufficient under load
    memory: 256Mi  # 128Mi can lead to OOM kills during mass renewals
  requests:
    cpu: 50m
    memory: 64Mi

Challenge Methods

Method | Use Case | Failure Mode | Requirements
HTTP-01 | Public services | Ingress controller conflicts, firewall blocks port 80 | Public domain, ingress controller
DNS-01 | Wildcard certs, internal services | DNS propagation delays (GoDaddy: 30+ minutes) | DNS API access

Critical Configuration

Production-Ready Installation

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true  # CRITICAL: installs the CRDs; on charts older than v1.15 use installCRDs=true

Multi-Issuer Setup

  • Let's Encrypt Production: Public certificates, free, 90-day lifecycle
  • Let's Encrypt Staging: Testing environment, avoids rate limits
  • Internal CA: Enterprise PKI integration
  • HashiCorp Vault: Internal certificate management
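
A minimal sketch of the first two entries, assuming an nginx ingress class and a placeholder contact address: two ClusterIssuers that differ only in name, ACME server URL, and account key Secret. The issuer names and the email are hypothetical.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging          # hypothetical name
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: ops@example.com           # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
    - http01:
        ingress:
          ingressClassName: nginx    # assumes an nginx ingress controller
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production       # hypothetical name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-production-account-key
    solvers:
    - http01:
        ingress:
          ingressClassName: nginx
```

Staging certificates are not trusted by browsers, so a typical flow is to validate the whole setup against staging and then switch the issuerRef on each Certificate to the production issuer.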

DNS-01 Configuration (AWS Route53 Example)

spec:
  acme:
    solvers:
    - dns01:
        route53:
          region: us-east-1
          # Use IRSA instead of hardcoded keys for security
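
The fragment above omits the surrounding resource. A fuller sketch, assuming ambient AWS credentials (for example IRSA) and a hypothetical example.com hosted zone, could look like the following, paired with a wildcard Certificate that uses it. All names and the email are placeholders.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01            # hypothetical name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com           # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-dns01-account-key
    solvers:
    - selector:
        dnsZones:
        - example.com                # this solver only handles names under this zone
      dns01:
        route53:
          region: us-east-1          # no static keys: credentials come from the pod's IAM role
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-example-com
  namespace: default
spec:
  secretName: wildcard-example-com-tls
  dnsNames:
  - "*.example.com"                  # wildcard names require DNS-01
  issuerRef:
    name: letsencrypt-dns01
    kind: ClusterIssuer
```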

Critical Failure Modes

Rate Limiting (Let's Encrypt)

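  • Limit: 50 certificates per registered domain per week
  • Consequence: Further issuance for that domain is rejected until the limit window frees up, which can block new certificates for days
  • Prevention: Use the staging environment for testing, and stagger certificate requests instead of issuing in bulk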

Resource Exhaustion

  • Scenario: Mass certificate renewal causes cert-manager pod crashes
  • Root Cause: Resource limits set too low (for example 100m CPU, 128Mi memory)
  • Solution: Increase limits to 200m CPU, 256Mi memory minimum
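
One way to apply these limits is through a Helm values file. The sketch below assumes the jetstack/cert-manager chart layout, where top-level resources applies to the controller and the webhook and cainjector have their own sections. The webhook and cainjector numbers are illustrative placeholders, not measured recommendations.

```yaml
# values.yaml (fragment)
resources:                 # controller
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 256Mi
webhook:
  resources:               # placeholder figures, tune from observed usage
    requests:
      cpu: 20m
      memory: 32Mi
    limits:
      memory: 128Mi
cainjector:
  resources:               # placeholder figures, tune from observed usage
    requests:
      cpu: 20m
      memory: 64Mi
    limits:
      memory: 256Mi
```

The file would typically be passed with -f values.yaml on helm install or helm upgrade.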

DNS Propagation Delays

  • Common Providers: GoDaddy (30+ minutes), Namecheap (slow), Cloudflare (fast)
  • Impact: DNS-01 challenge timeouts, renewal failures
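  • Mitigation: Have cert-manager verify propagation against known recursive nameservers (--dns01-recursive-nameservers, --dns01-recursive-nameservers-only); the admission webhook timeout does not affect DNS-01 checks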
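
A common way to make DNS-01 self-checks more predictable is to tell the controller which resolvers to use. The sketch below passes the two controller flags through the chart's extraArgs list. The resolver addresses are examples only, and this does not speed up the provider's own propagation. It only changes where cert-manager looks before asking the ACME server to validate.

```yaml
# values.yaml (fragment)
extraArgs:
- --dns01-recursive-nameservers-only                   # skip authoritative lookups, use only the resolvers below
- --dns01-recursive-nameservers=8.8.8.8:53,1.1.1.1:53  # example public resolvers
```

This is mostly useful in clusters where the default resolver is split-horizon or caches negative answers for a long time.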

Webhook Timeout Issues

webhook:
  timeoutSeconds: 30  # How long the API server waits for the cert-manager admission webhook; Kubernetes allows at most 30s

Monitoring and Alerting

Essential Prometheus Metrics

  • certmanager_certificate_expiration_timestamp_seconds: Certificate expiration tracking
  • certmanager_certificate_renewal_timestamp_seconds: Renewal success monitoring
  • certmanager_http_acme_client_request_count: ACME API request monitoring

Critical Alert (7-day expiration warning)

- alert: CertManagerCertificateExpirySoon
  expr: certmanager_certificate_expiration_timestamp_seconds - time() < 86400 * 7
  for: 1h
  annotations:
    summary: "Certificate expires in less than 7 days"
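
Expiry alerts only fire late in a certificate's life, so a complementary rule on readiness can catch failed issuance earlier. The sketch below is a standalone Prometheus rules file. It assumes the controller exposes certmanager_certificate_ready_status with name, namespace, and condition labels. The alert name, the 15m window, and the severity label are arbitrary choices.

```yaml
groups:
- name: cert-manager
  rules:
  - alert: CertManagerCertificateNotReady      # hypothetical alert name
    # Fires when any Certificate reports Ready=False
    expr: max by (name, namespace) (certmanager_certificate_ready_status{condition="False"}) == 1
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} has not been Ready for 15 minutes"
```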

Technology Comparison Matrix

Solution | Setup Complexity | Failure Support | CA Support | Automation | Multi-cluster | Vendor Lock-in
cert-manager | Medium | Good docs/community | Universal | Full | Yes | None
Traefik Built-in | Low | Community forums | Let's Encrypt focus | Limited | Per-cluster | Traefik
Manual ACME Scripts | High | Self-support | Let's Encrypt | Cron-based | Custom tooling | None
Cloud Provider | Low | Paid support | Provider-specific | Partial | Per-cloud | High
Vault PKI | High | Enterprise support | Internal only | Manual approval | Complex | HashiCorp

Common Troubleshooting

HTTP-01 Challenge Failures

Debug Commands:

kubectl describe certificate <name> -n <namespace>
kubectl describe certificaterequest <name> -n <namespace>
kubectl describe order <name> -n <namespace>
kubectl describe challenge <name> -n <namespace>

Root Causes:

  • Load balancer not publicly accessible
  • Incorrect ingress class on the solver or Ingress (kubernetes.io/ingress.class annotation or ingressClassName)
  • Firewall blocking port 80
  • Multiple ingress controllers conflict
  • Cloudflare proxy mode interference
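
Several of these causes come down to the Ingress and the solver disagreeing about which controller serves the challenge. A minimal sketch of an Ingress that requests a certificate through the cert-manager.io/cluster-issuer annotation is shown below. The host, Service name, issuer name, and nginx class are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  namespace: default
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-staging  # assumes this ClusterIssuer exists
spec:
  ingressClassName: nginx          # should match the class used by the issuer's http01 solver
  tls:
  - hosts:
    - app.example.com
    secretName: app-example-com-tls  # cert-manager writes the certificate here
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80
```

If the challenge stays pending, comparing ingressClassName here with the class configured on the solver is a quick first check.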

DNS-01 Challenge Failures

Root Causes:

  • Invalid DNS API credentials
  • Hosted zone mismatch
  • DNS propagation delays (provider-specific)
  • Missing IAM permissions for DNS record creation
  • DNS provider API rate limiting

Security Considerations

Production Hardening

  • If only DNS-01 is used, define only dns01 solvers on each issuer; HTTP-01 solver pods are created only when an issuer has an http01 solver
  • Use IAM Roles for Service Accounts (IRSA) instead of hardcoded AWS keys
  • Enforce certificate request policies with approver-policy for compliance
  • Consider CSI driver for ephemeral certificates in high-security environments
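
For the IRSA item, a minimal sketch of the Helm values involved is shown below. It assumes an EKS cluster with an OIDC provider already configured. The account ID and role name are made up, and the role itself would need Route53 permissions to list zones and change record sets.

```yaml
# values.yaml (fragment)
serviceAccount:
  annotations:
    # Hypothetical role ARN; replace with the role created for cert-manager
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/cert-manager-route53
```

With this in place, a route53 solver can omit accessKeyID and secretAccessKeySecretRef and rely on the pod's role instead.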

Resource Planning

Scale Considerations

  • Normal operation: 50MB RAM, 10m CPU per pod
  • Mass renewal periods: Resource requirements spike significantly
  • High availability: Run multiple replicas (replicaCount: 2)
  • Multi-cluster: A ClusterIssuer is scoped to a single cluster; replicate the same ClusterIssuer definition to every cluster for organization-wide policies
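
The high-availability item can be sketched as Helm values. This assumes the jetstack/cert-manager chart's replicaCount and podDisruptionBudget keys, which may differ between chart versions. Note that the controller uses leader election, so extra controller replicas act as standbys rather than sharing load.

```yaml
# values.yaml (fragment)
replicaCount: 2            # controller: one active leader, one standby
podDisruptionBudget:
  enabled: true            # keep at least one controller pod during node drains
webhook:
  replicaCount: 2          # webhook replicas all serve admission requests
cainjector:
  replicaCount: 2
```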

Cost Factors

  • Let's Encrypt certificates: Free
  • DNS API costs: Route53 bills per hosted zone and per DNS query; DNS-01 validation adds little to either
  • Compute resources: Minimal in normal operation
  • Commercial CA certificates: Variable pricing
  • Operational time: Significantly reduced vs manual management

Migration Strategy

From Manual to Automated

  1. Install cert-manager alongside existing certificates
  2. Configure ClusterIssuer for existing CA
  3. Create Certificate resources for new services
  4. Migrate existing services incrementally
  5. Decommission manual certificate scripts

Critical: Never migrate all certificates simultaneously - gradual migration prevents production impact
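
Step 2 can be sketched as a CA-backed ClusterIssuer, assuming the existing CA's certificate and private key have been loaded into a TLS Secret in the cert-manager namespace. The issuer and Secret names are hypothetical.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca                    # hypothetical name
spec:
  ca:
    # Secret holding the CA's tls.crt and tls.key; for a ClusterIssuer it is
    # looked up in the cluster resource namespace (cert-manager by default)
    secretName: internal-ca-key-pair
```

New services can then reference internal-ca in their Certificate resources while existing certificates stay untouched until their turn to migrate.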

Known Limitations

Let's Encrypt Constraints

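  • Publicly registered domain names only; internal services can still get certificates via DNS-01 as long as their names live under a public DNS zone
  • Rate limits require planning issuance volume week by week
  • ACME validation depends on Let's Encrypt reaching your HTTP endpoint or DNS zone from the internet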

DNS Provider Reliability

  • DNS propagation inconsistency across providers
  • API reliability varies significantly
  • Some providers have extended propagation delays

Kubernetes Dependencies

  • Requires functional ingress controller for HTTP-01
  • Webhook validation adds cluster dependency
  • Private keys are stored in Kubernetes Secrets, and therefore in etcd (unless using the CSI driver)
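
For the CSI driver case, a minimal sketch of a Pod that mounts an ephemeral certificate is shown below. It assumes the cert-manager csi-driver is installed separately and that a ClusterIssuer named internal-ca exists. The Pod name, namespace, and image are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: sandbox
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "3600"]
    volumeMounts:
    - name: tls
      mountPath: /tls          # key and certificate appear here, never in a Secret
      readOnly: true
  volumes:
  - name: tls
    csi:
      driver: csi.cert-manager.io
      readOnly: true
      volumeAttributes:
        csi.cert-manager.io/issuer-name: internal-ca     # assumed issuer
        csi.cert-manager.io/issuer-kind: ClusterIssuer
        csi.cert-manager.io/dns-names: ${POD_NAME}.${POD_NAMESPACE}.svc.cluster.local
```

The private key is generated on the node and lives only as long as the Pod, which is what keeps it out of etcd.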

Useful Links for Further Investigation

Actually Useful cert-manager Links

Link | Description
cert-manager Installation Guide | The only installation guide you need. Helm method works best. Skip the kubectl apply bullshit.
GitHub Repository | Check this for actual release notes and known issues. The README has useful examples.
cert-manager Troubleshooting | When things break (and they will), start here. Actually helpful unlike most Kubernetes troubleshooting docs.
Getting Started Tutorial | Basic nginx + Let's Encrypt setup. Works as advertised, which is rare for Kubernetes tutorials.
ACME HTTP-01 Challenges | For public-facing services. Simple but requires ingress controller cooperation.
ACME DNS-01 Challenges | For wildcard certs and internal services. More complex but more flexible.
Route53 DNS-01 Setup | Most common DNS-01 provider. Use IRSA for authentication, not hardcoded keys.
Cloudflare DNS-01 Setup | Popular alternative to Route53. API tokens work better than global API keys.
HashiCorp Vault Integration | For internal PKI. Complex setup but worth it for enterprise environments.
ACME Troubleshooting | Let's Encrypt-specific debugging. Actually tells you how to fix common problems.
Prometheus Metrics | Monitor certificate expiration and renewal failures. Essential for production.
cert-manager Slack | Active community. Real people answer real questions, usually quickly.
Common Issues on GitHub | Search existing issues before creating new ones. Maintainers are helpful but busy.
istio-csr for Service Mesh | Replaces Istio's built-in certificate management. Only use if you need unified cert policies.
approver-policy for Compliance | Policy-based approval or denial of certificate requests. Adds a gate in front of issuance, but compliance teams love it.
trust-manager | Distributes CA trust bundles across namespaces in a cluster. Useful when many workloads must trust the same internal CA.
CSI Driver | Ephemeral certificates that never touch etcd. For paranoid security environments.
Supported DNS Providers List | Full list of DNS-01 challenge providers. Most major providers supported.
Let's Encrypt Rate Limits | 50 certificates per registered domain per week. Plan accordingly for large deployments.
Let's Encrypt Staging Environment | Use this for testing to avoid hitting rate limits in production.
ACME Challenge Types | HTTP-01 vs DNS-01 explained by Let's Encrypt themselves.
CNCF Project Page | Official project status and governance. cert-manager graduated in November 2024.
Release Notes | What changed in each version. Usually minor fixes, occasionally breaking changes.
cert-manager Security Advisories | Security updates and CVE notifications. Essential reading for production users.
Traefik Built-in ACME | Works fine if you only use Traefik. Simpler than cert-manager for basic setups.
AWS Certificate Manager | Good if you're all-in on AWS. Tight ALB/CloudFront integration but locks you to AWS.
Certbot | Manual ACME client. Use for non-Kubernetes environments or when you like writing cron jobs.
