Your dev Jenkins setup running on your MacBook won't survive production. Here's what you actually need to deploy Jenkins properly without getting fired when it inevitably breaks.
Hardware Resources That Matter
Controller Requirements: The official docs list 256MB of RAM as the minimum. That number is for kicking the tires, not for production. For production, start with 16GB RAM and 8 CPU cores minimum. The Jenkins controller is a memory hog, and you'll be restarting it monthly if you skimp on resources.
Real-world sizing from teams who've been burned:
- Small team (1-10 developers): 16GB RAM, 8 cores, 500GB SSD
- Medium team (11-50 developers): 32GB RAM, 16 cores, 1TB SSD
- Large team (more than 50 developers): 64GB RAM, 24+ cores, 2TB+ SSD
Disk usage grows without bound because Jenkins keeps build logs, artifacts, and workspace checkouts indefinitely unless you configure retention policies.
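Before picking a retention policy, it helps to see how much old build data is actually sitting on disk. The sketch below is a read-only report, not a cleanup tool: it assumes the plain $JENKINS_HOME/jobs/<job>/builds/<number> layout (jobs nested in folders live deeper and are not covered), and the function name and the 30-day figure in the usage note are made up for illustration. Jenkins' own "Discard old builds" job setting is the usual way to enforce retention once you know what you want to keep.

```python
import time
from pathlib import Path


def find_old_builds(jenkins_home, max_age_days, now=None):
    """Return build directories under jobs/*/builds/ older than max_age_days.

    Read-only: nothing is deleted. Assumes the plain
    $JENKINS_HOME/jobs/<job>/builds/<number> layout (no nested folders).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    old = []
    for build_dir in Path(jenkins_home).glob("jobs/*/builds/*"):
        # Only numbered, real directories are builds; skip symlinks and metadata files.
        if build_dir.is_dir() and not build_dir.is_symlink() and build_dir.name.isdigit():
            if build_dir.stat().st_mtime < cutoff:
                old.append(build_dir)
    return sorted(old)
```

Calling find_old_builds("/var/lib/jenkins", 30) (the path is an assumption, use your own $JENKINS_HOME) lists every build directory not modified in the last 30 days.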
Network Architecture
Load Balancer Setup: Put Jenkins behind a proper load balancer with SSL termination. Use nginx or Apache as reverse proxies. Don't expose Jenkins directly to the internet - that's how you end up on r/sysadmin for all the wrong reasons.
Configure your load balancer for:
- SSL termination with proper certificates
- WebSocket support for modern UI features
- Session stickiness (Jenkins isn't stateless)
- Health checks on the /login endpoint
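The health check in the list above boils down to "does GET /login return 200?". In practice you configure that probe in the load balancer itself; the sketch below only shows the same logic in a few lines so you can run it by hand or from a cron job. The function name, the timeout, and the injectable opener parameter are assumptions made for illustration, and the hostname in the usage note is a placeholder.

```python
import urllib.error
import urllib.request


def is_healthy(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if GET <base_url>/login answers with HTTP 200."""
    url = base_url.rstrip("/") + "/login"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, TLS failure, or an HTTP error status.
        return False
```

For example, is_healthy("https://jenkins.example.com") returns False both when the controller is down and when the proxy answers with an error page, which is the behavior you want from a probe.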
Agent Connectivity: In production, agents usually sit behind firewalls and NAT. Inbound agents open the connection to the controller themselves, so they tend to work better than SSH-launched agents in enterprise environments where network admins change firewall rules without warning.
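Because inbound agents dial out to the controller, the first thing to verify from the agent's network is that the controller's port is reachable at all. Here is a minimal sketch of that check. The function name is made up, and the port you test is an assumption about your setup: 50000 is the port the official Docker image exposes for inbound TCP agents, while agents using WebSocket mode go through the normal HTTP(S) port instead.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unroutable, or DNS failure.
        return False
```

Run it from the agent host, for example can_reach("jenkins.example.com", 50000) (placeholder hostname). A False here points at the firewall, not at Jenkins.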
High Availability Architecture
Active-Passive Setup: Jenkins isn't designed for active-active clustering. Use shared storage with active-passive failover instead. Mount $JENKINS_HOME on shared storage (NFS, EFS, or similar) and run a secondary controller ready to take over.
Backup Strategy: Automated daily backups of the entire $JENKINS_HOME directory. Include:
- Job configurations (XML files)
- Plugin data and settings
- Build histories and artifacts
- Secret encryption keys
- User and permission data
Store backups off-site and test recovery monthly. I've seen teams lose months of build history because they assumed their cloud provider handled backups.
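A rough sketch of the backup-plus-verification idea, assuming a plain tar.gz of the whole directory is acceptable for your size of installation. The function names, the archive naming scheme, and the "jenkins_home" prefix inside the archive are all made up for illustration. Two caveats: archiving a live $JENKINS_HOME can catch files mid-write, and the destination directory must live outside $JENKINS_HOME or the archive will try to include itself. The verify step is only a cheap sanity check, not a substitute for a real restore test.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def backup_jenkins_home(jenkins_home, dest_dir, now=None):
    """Write a timestamped .tar.gz of the whole jenkins_home and return its path."""
    now = now or datetime.now(timezone.utc)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"jenkins-home-{now:%Y%m%dT%H%M%SZ}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps member paths relative so the archive restores anywhere.
        tar.add(str(jenkins_home), arcname="jenkins_home")
    return archive


def verify_backup(archive, required=("jenkins_home/config.xml",)):
    """Cheap sanity check: confirm the required members are in the archive."""
    with tarfile.open(archive, "r:gz") as tar:
        names = set(tar.getnames())
    return all(name in names for name in required)
```

Passing required=("jenkins_home/config.xml", "jenkins_home/secrets/master.key") makes the check fail loudly if the encryption key from the list above did not make it into the archive.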
Container Deployment
Docker in Production: Use the official LTS images with proper volume mounts. Don't run Jenkins as root. The official image already runs as the jenkins user (UID 1000), so switch to root only to install packages and switch back afterwards.
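# Official LTS image; it runs as the unprivileged jenkins user by default
FROM jenkins/jenkins:lts-jdk17
# Become root only long enough to install packages
USER root
RUN apt-get update && apt-get install -y --no-install-recommends docker.io \
    && rm -rf /var/lib/apt/lists/*
# Drop back to the jenkins user before the controller starts
USER jenkins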
Kubernetes Deployment: The Jenkins Helm chart handles most configuration. Use persistent volumes for $JENKINS_HOME and configure pod security contexts properly.
Resource Limits: Set memory limits high enough (16GB+) or the controller will get OOMKilled during large builds. CPU limits should be generous - Jenkins needs burst capacity for parallel builds.
Database and Storage
Job Configuration: Jenkins stores everything as XML files in $JENKINS_HOME. This scales poorly but it's what we've got. Use fast SSD storage. Jenkins has no built-in XML optimization step, so rely on regular backups to recover from corrupted configuration files.
Artifact Storage: Don't store build artifacts in Jenkins long-term. Configure artifact cleanup policies and use external storage (S3, Nexus, Artifactory) for important artifacts.
Log Management: Build logs accumulate quickly. Set up log rotation and consider external log aggregation with ELK stack or similar.
Monitoring and Alerting
Essential Metrics: Monitor these or you'll be debugging outages at 2am:
- Memory usage (Jenkins leaks memory)
- Disk space (builds consume storage)
- Build queue length (indicates resource constraints)
- Agent connection status
- Plugin update failures
Use the Prometheus plugin for metrics collection and Grafana dashboards for visualization. Set up alerts for disk space (80%+) and memory usage (90%+).
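In a Prometheus setup these thresholds would live in alerting rules, but the logic is simple enough to show directly. The sketch below hard-codes the 80% and 90% figures from the text; the function and constant names are made up for illustration, and where the memory percentage comes from (JVM heap, container, or host) is left to you.

```python
import shutil

DISK_ALERT_PCT = 80.0    # from the text: alert on disk at 80%+
MEMORY_ALERT_PCT = 90.0  # from the text: alert on memory at 90%+


def disk_used_percent(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def evaluate_alerts(disk_pct, memory_pct):
    """Return the names of the thresholds that are currently breached."""
    alerts = []
    if disk_pct >= DISK_ALERT_PCT:
        alerts.append("disk")
    if memory_pct >= MEMORY_ALERT_PCT:
        alerts.append("memory")
    return alerts
```

Point disk_used_percent at the filesystem that holds $JENKINS_HOME, feed the result into evaluate_alerts, and page someone whenever the returned list is not empty.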
The Monitoring plugin provides basic health checks, but external monitoring catches issues Jenkins can't report on itself.
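One cheap external check is the build queue, which Jenkins exposes as JSON at <jenkins>/queue/api/json. The sketch below only parses a payload you have already fetched, so authentication and HTTP handling are left out. The function names are made up, and the "items", "stuck", and "task" fields reflect how that endpoint's output is assumed to look; check them against your own controller's response.

```python
import json


def queue_length(payload):
    """Count queued items in the JSON returned by <jenkins>/queue/api/json."""
    data = json.loads(payload)
    return len(data.get("items", []))


def stuck_items(payload):
    """Return the task names of queue items flagged as stuck."""
    data = json.loads(payload)
    return [
        item.get("task", {}).get("name", "?")
        for item in data.get("items", [])
        if item.get("stuck")
    ]
```

A queue that keeps growing, or any non-empty stuck_items result, usually means agents are missing or undersized rather than that builds are slow.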