Phase 1: Assessment and Baseline - Know Where Your Money Goes

You need visibility into pod-level costs because AWS billing is about as helpful as a chocolate teapot - it shows you EC2 instances but not which app is eating your budget. Most teams skip this step and start randomly fucking with resource limits, then wonder why their costs went up instead of down.

Step 1: Install Cost Monitoring (Choose Your Poison)

KubeCost Dashboard Overview: Once installed, the KubeCost dashboard provides detailed breakdowns of costs by namespace, deployment, and individual pods. The interface shows real-time spend, resource utilization percentages, and identifies over-provisioned workloads through intuitive charts and cost allocation views.

Option A: KubeCost (Recommended for Speed)

## Quick install for immediate visibility
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer -n kubecost --create-namespace \
  --set prometheus.server.resources.requests.memory=4Gi \
  --set prometheus.server.resources.limits.memory=8Gi

Option B: OpenCost (Free Forever)

## CNCF project, no licensing limits
kubectl apply -f https://raw.githubusercontent.com/opencost/opencost/develop/kubernetes/opencost.yaml

Option C: Manual Setup with Existing Prometheus
If you already have Prometheus, configure KubeCost to use it:

prometheus:
  server:
    enabled: false
  prometheusEndpoint: "http://prometheus-server.monitoring.svc.cluster.local"
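
If you go this route, keep the override in a values file and feed it to Helm. A minimal sketch, assuming the snippet above is saved as custom-values.yaml; check the chart's defaults first, since the exact key names can shift between cost-analyzer versions:

## Dump the chart defaults to confirm the Prometheus-related keys for your version
helm show values kubecost/cost-analyzer > default-values.yaml

## Apply the override file on install/upgrade
helm upgrade --install kubecost kubecost/cost-analyzer \
  -n kubecost --create-namespace \
  -f custom-values.yaml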

Step 2: Gather Resource Usage Data (The Reality Check)

Run these commands to see your actual vs requested resources. The gap will shock you.

## Check current resource usage
kubectl top pods --all-namespaces --sort-by memory
kubectl top nodes

## Find pods without resource limits (danger zone)
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.spec.containers[*].resources}{"\n"}{end}' | grep -v limits

## Identify biggest memory consumers
kubectl top pods --all-namespaces --sort-by=memory | head -20

## Find long-running pods that might be idle
kubectl get pods --all-namespaces --field-selector=status.phase=Running -o custom-columns="NAMESPACE:.metadata.namespace,NAME:.metadata.name,AGE:.metadata.creationTimestamp" | sort -k3

What you're looking for (a quick script to put numbers on the gap follows this list):

  • Pods using 10% of their memory limit (over-provisioned)
  • Pods with no resource limits set (resource bombs waiting to happen)
  • Development/staging namespaces consuming production-level resources
  • Long-running jobs that should have finished hours ago
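
To put rough numbers on that gap before opening a dashboard, you can compare requested memory against live usage per namespace. A sketch, assuming jq is installed and metrics-server is answering kubectl top; the script name is just a suggestion:

#!/bin/bash
## compare-requests-vs-usage.sh <namespace> - rough request-vs-usage gap check
NS=${1:-default}

echo "== Memory requests per pod =="
kubectl get pods -n "$NS" -o json \
  | jq -r '.items[] | "\(.metadata.name)\t\([.spec.containers[].resources.requests.memory // "none"] | join(","))"'

echo "== Actual usage per pod =="
kubectl top pods -n "$NS" --no-headers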

Step 3: Analyze Your Current Spend Patterns

Access your cost monitoring tool and document the patterns below.
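
For KubeCost, the UI isn't exposed by default; port-forward the analyzer service (service name from the Helm install in Step 1) and open it locally:

## Expose the KubeCost UI on localhost
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090
## then open http://localhost:9090 in a browser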

Cost by Namespace (The Shock Factor)

  • Which namespaces cost the most per month?
  • Are dev/test environments costing more than production?
  • Any rogue namespaces you forgot about?

Cost by Workload Type

  • Deployments vs StatefulSets vs Jobs
  • Which applications have the worst cost-to-traffic ratio?
  • Identify batch jobs that never finished

Idle Resource Detection

  • CPU utilization under 20% consistently
  • Memory utilization under 50% consistently
  • Network traffic near zero

Step 4: Document Your Baseline Numbers

Create a spreadsheet with current monthly costs:

| Namespace | Monthly Cost | Pod Count | Avg CPU % | Avg Memory % | Notes |
|---|---|---|---|---|---|
| production | $12,000 | 45 | 65% | 78% | Acceptable |
| staging | $8,500 | 52 | 12% | 23% | WASTE |
| dev-team-1 | $3,200 | 28 | 8% | 15% | WASTE |
| ml-training | $15,000 | 3 | 95% | 90% | Check if still needed |
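
If you'd rather pull these numbers than copy them out of the UI, KubeCost's allocation API can aggregate cost by namespace. A sketch, assuming the port-forward from above is still running and jq is installed; adjust the window to match your billing period:

## 30-day cost per namespace, highest first
curl -sG "http://localhost:9090/model/allocation" \
  --data-urlencode "window=30d" \
  --data-urlencode "aggregate=namespace" \
  | jq -r '.data[0] | to_entries[] | "\(.key)\t\(.value.totalCost)"' \
  | sort -t$'\t' -k2 -nr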

Red flags to investigate:

  • Any environment with <30% resource utilization
  • Staging/dev costing >50% of production
  • Individual pods costing >$500/month
  • Persistent volumes growing >10GB/month

Step 5: Identify Quick Wins (The Low-Hanging Fruit)

Before diving into complex optimizations, grab the obvious savings:

Unused Resources Audit

## Find PVCs that aren't bound (Pending or Lost claims)
kubectl get pvc --all-namespaces -o custom-columns="NAMESPACE:.metadata.namespace,NAME:.metadata.name,STATUS:.status.phase,VOLUME:.spec.volumeName" | grep -v Bound

## Find LoadBalancer services whose backing pods are gone (dead load balancers you still pay for)
kubectl get svc --all-namespaces -o custom-columns="NAMESPACE:.metadata.namespace,NAME:.metadata.name,TYPE:.spec.type" | grep LoadBalancer
kubectl get endpoints --all-namespaces | awk 'NR==1 || $3 == "<none>"'

## Find deployments scaled to zero (but still consuming storage)
kubectl get deployments --all-namespaces -o custom-columns="NAMESPACE:.metadata.namespace,NAME:.metadata.name,REPLICAS:.spec.replicas" | awk '$3 == 0'

Development Environment Cleanup

  • Shut down development clusters outside business hours (save 60% on dev costs)
  • Use smaller node types for non-production workloads
  • Implement auto-shutdown for feature branch environments

Storage Cleanup

## Find large persistent volumes
kubectl get pv -o custom-columns="NAME:.metadata.name,SIZE:.spec.capacity.storage,CLASS:.spec.storageClassName" --sort-by=.spec.capacity.storage

## Find PVs sitting in Available state (provisioned but not claimed by anything)
kubectl describe pv | grep -A5 -B5 "Status:.*Available"
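
The describe-and-grep approach works, but a custom-columns view is easier to scan. A sketch that lists every PV that is no longer Bound (Released and Available volumes are your cleanup candidates):

## PVs that exist but aren't bound to any claim
kubectl get pv -o custom-columns="NAME:.metadata.name,STATUS:.status.phase,CLAIM:.spec.claimRef.name,SIZE:.spec.capacity.storage,CLASS:.spec.storageClassName" \
  | awk 'NR==1 || $2 != "Bound"'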

Expected Results After Baseline Assessment

If you're typical, you'll discover:

  • 40-60% of CPU requests are unused
  • 30-50% of memory requests are unused
  • Dev/staging environments consume 40-70% of total spend
  • 10-20% of storage is attached to deleted resources
  • 5-15% of load balancers serve zero traffic

Immediate 30-day savings potential:

  • Rightsizing over-provisioned resources: 15-25% cost reduction
  • Shutting down unused environments: 10-20% cost reduction
  • Storage cleanup: 5-10% cost reduction
  • Total quick wins: 30-55% cost reduction

This baseline gives you the data needed for Phase 2: actually implementing the optimizations. Most teams see 30-40% savings just from Phase 1 cleanup, before touching any advanced strategies.

The next phase covers implementing the technical changes: resource right-sizing, spot instances, and autoscaling configurations.

Cost Optimization Strategy Comparison - What Actually Works

| Strategy | Cost Savings | Implementation Time | Risk Level | Best For | Gotchas |
|---|---|---|---|---|---|
| Resource Right-sizing | 20-40% | 1-2 weeks | ⚠️ Medium | Over-provisioned workloads | Can cause OOM kills if too aggressive |
| Spot Instances | 50-90% | 2-4 weeks | ⚠️ Medium-High | Stateless, fault-tolerant apps | Interruptions can break poorly designed apps |
| Reserved Instances | 30-72% | 1 day | ✅ Low | Predictable, steady workloads | 1-3 year commitment, less flexibility |
| Cluster Autoscaling | 15-30% | 1 week | ⚠️ Medium | Variable traffic patterns | Can be too slow during traffic spikes |
| Namespace Shutdown | 40-80% | 1 day | ✅ Low | Dev/test environments | Requires scheduling/automation |
| Storage Optimization | 10-25% | 1-2 weeks | ✅ Low | Large persistent volumes | Data migration can be complex |
| Node Pool Optimization | 25-45% | 2-3 weeks | ⚠️ Medium | Mixed workload types | Requires understanding workload requirements |
| Preemptible/Spot Pods | 60-80% | 3-5 weeks | 🔴 High | Batch jobs, ML training | Needs robust retry mechanisms |
| Network Optimization | 5-15% | 1-3 weeks | ✅ Low | Multi-AZ deployments | Complex to measure impact |
| Karpenter (AWS) | 30-50% | 2-4 weeks | ⚠️ Medium | EKS clusters with mixed workloads | Requires EKS 1.23+; Karpenter v1.0+ is now stable |

Phase 2: Implementation - Resource Right-Sizing and Spot Instance Integration

Now that you know where the waste is, it's time to fix it. This phase focuses on the two strategies with the biggest impact: right-sizing your resources and integrating spot instances safely.

Step 1: Resource Right-Sizing (The 40% Solution)

Right-sizing means stop guessing what your apps need and actually look at what they're using. Most developers ask for 4GB of RAM when their app uses 200MB because they're scared of OOM kills. I've seen this reduce costs by 40%, but I've also seen it crash production when someone got too aggressive with the memory limits.

Gather Usage Data (2-4 weeks minimum)

Don't right-size based on one day's data unless you enjoy explaining outages. I learned this when I right-sized based on Sunday traffic and then Monday morning murdered everything:

## Get historical usage data with Prometheus queries
## CPU usage over time
rate(container_cpu_usage_seconds_total[5m])

## Memory usage patterns  
container_memory_working_set_bytes / (1024*1024*1024)

## If you don't have Prometheus, use kubectl top for current data
kubectl top pods --all-namespaces --containers --sort-by=memory
kubectl top pods --all-namespaces --containers --sort-by=cpu

The Right-Sizing Formula (percentile queries sketched below)
  • Memory: Set requests to 80th percentile of usage + 20% buffer
  • CPU: Set requests to 95th percentile of usage (no buffer needed)
  • Memory limits: 150-200% of requests (prevents OOM kills)
  • CPU limits: Usually avoid them (they cause throttling)
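
A sketch of pulling those percentiles straight from the Prometheus HTTP API over a 14-day window. The Prometheus URL matches the endpoint used earlier in this guide, and the namespace/pod selectors are placeholders - swap in your own workload:

## 80th percentile memory (bytes) over 14 days for one workload
curl -sG "http://prometheus-server.monitoring.svc.cluster.local/api/v1/query" \
  --data-urlencode 'query=quantile_over_time(0.8, container_memory_working_set_bytes{namespace="production", pod=~"web-app-.*", container!=""}[14d])'

## 95th percentile CPU (cores) over 14 days, using a 5m-rate subquery
curl -sG "http://prometheus-server.monitoring.svc.cluster.local/api/v1/query" \
  --data-urlencode 'query=quantile_over_time(0.95, rate(container_cpu_usage_seconds_total{namespace="production", pod=~"web-app-.*", container!=""}[5m])[14d:5m])'
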
Example Right-Sizing Process

Original wasteful configuration:

resources:
  requests:
    memory: "2Gi"    # App actually uses 400Mi
    cpu: "1000m"     # App actually uses 100m
  limits:
    memory: "4Gi"
    cpu: "2000m"

Right-sized configuration:

resources:
  requests:
    memory: "500Mi"  # was 400Mi but kept OOMing, added buffer
    cpu: "150m"      # app spikes to ~120m during deployments
  limits:
    memory: "750Mi"  # TODO: check if this is still needed after prometheus fix
    # No CPU limit - causes weird throttling during traffic spikes

Mass Right-Sizing Script

For deployments with dozens of applications, automate the process:

#!/bin/bash
## Script: rightsize-deployments.sh
## Updates resource requests based on actual usage data

NAMESPACE=${1:-default}
USAGE_DATA_FILE="usage-analysis.csv"  # Pre-generated from monitoring: deployment,memory_p80_Mi,cpu_p95_m

while IFS=, read -r deployment memory_p80 cpu_p95; do
    # Memory request = p80 usage + 20% buffer; CPU request = p95 usage (no extra buffer)
    MEMORY_REQ=$(echo "$memory_p80 * 1.2" | bc | cut -d. -f1)
    CPU_REQ=$cpu_p95
    MEMORY_LIMIT=$(echo "$MEMORY_REQ * 1.5" | bc | cut -d. -f1)

    echo "Updating $deployment: Memory ${MEMORY_REQ}Mi, CPU ${CPU_REQ}m"

    kubectl patch deployment "$deployment" -n "$NAMESPACE" -p '{
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": "'"$deployment"'",
                        "resources": {
                            "requests": {
                                "memory": "'"$MEMORY_REQ"'Mi",
                                "cpu": "'"$CPU_REQ"'m"
                            },
                            "limits": {
                                "memory": "'"$MEMORY_LIMIT"'Mi"
                            }
                        }
                    }]
                }
            }
        }
    }'
done < "$USAGE_DATA_FILE"

Right-Sizing Validation

After applying changes, monitor for issues:

## Watch for OOMKilled pods
kubectl get events --all-namespaces --field-selector reason=OOMKilling

## Monitor pod restart counts
kubectl get pods --all-namespaces -o custom-columns="NAMESPACE:.metadata.namespace,NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount" | sort -k3 -n

## Check current CPU consumers (kubectl top won't show throttling itself - see the sketch below)
kubectl top pods --all-namespaces --sort-by=cpu
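
If cAdvisor metrics land in Prometheus, the throttled-period ratio is the actual throttling signal. A sketch with the same caveat as above about the Prometheus URL:

## Fraction of CPU periods throttled per pod over the last hour (sustained values above ~25% deserve a look)
curl -sG "http://prometheus-server.monitoring.svc.cluster.local/api/v1/query" \
  --data-urlencode 'query=sum by (namespace, pod) (rate(container_cpu_cfs_throttled_periods_total[1h])) / sum by (namespace, pod) (rate(container_cpu_cfs_periods_total[1h]))'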

Step 2: Spot Instance Integration (The 70% Solution)

Spot instances provide 50-90% cost savings but require architectural changes. Here's how to implement them safely.

Pre-Spot Architecture Assessment

Before using spot instances, verify your applications can handle node failures:

## Check if apps are stateless (good for spot)
kubectl get all --all-namespaces -o wide | grep -E "(StatefulSet|PersistentVolumeClaim)"

## Verify replica counts (need 2+ for resilience)
kubectl get deployments --all-namespaces -o custom-columns="NAMESPACE:.metadata.namespace,NAME:.metadata.name,REPLICAS:.spec.replicas" | grep " 1$"

## Check for PodDisruptionBudgets (required for spot)
kubectl get pdb --all-namespaces
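
If that last command comes back empty for workloads you plan to move to spot, stamp out a PDB per deployment before going further. A sketch using kubectl's built-in generator; the name, selector, and namespace match the web-app example used later in this phase:

## Minimal PDB so voluntary disruptions never drop below 2 ready pods
kubectl create poddisruptionbudget web-app-pdb \
  --selector=app=web-app \
  --min-available=2 \
  -n production
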
Step 2a: Create Mixed Node Groups (AWS EKS Example)

Create separate node groups for different workload tiers:

## spot-nodes.yaml - 70% of capacity on spot instances
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: production-cluster
  region: us-west-2

nodeGroups:
  # Spot instance group for stateless workloads
  - name: spot-workers
    instancesDistribution:
      maxPrice: 0.50  # Absolute cap in $/hour on spot bids, not a percentage of on-demand
      instanceTypes: ["m5.large", "m5.xlarge", "m5a.large", "m4.large"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotInstancePools: 4  # Diversify across instance types
    desiredCapacity: 6
    minSize: 0
    maxSize: 20
    taints:
      - key: "spot-instance"
        value: "true"
        effect: "NoSchedule"
    labels:
      node-type: "spot"
    tags:
      cost-optimization: "spot-instances"

  # On-demand group for critical workloads
  - name: ondemand-workers
    instanceType: "m5.large"
    desiredCapacity: 2
    minSize: 1
    maxSize: 5
    labels:
      node-type: "ondemand"
    tags:
      cost-optimization: "reserved-instances"
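
Applying the config is a single eksctl call. A sketch assuming the manifest above is saved as spot-nodes.yaml and the cluster already exists:

## Create the node groups defined in the config file
eksctl create nodegroup --config-file=spot-nodes.yaml

## Confirm both groups registered with the expected labels
kubectl get nodes -L node-type
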
Step 2b: Configure Workload Tolerations

Update deployments to use spot instances where appropriate:

## deployment-spot-ready.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3  # Minimum 3 for spot resilience
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      # Allow scheduling on spot instances
      tolerations:
      - key: "spot-instance"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"

      # Pin to spot nodes (swap for a preferred nodeAffinity if you want on-demand fallback)
      nodeSelector:
        node-type: "spot"

      # Spread across AZs to handle spot interruptions
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: web-app

      # Give the app time to drain connections when the node is reclaimed
      terminationGracePeriodSeconds: 30

      containers:
      - name: web-app
        image: nginx:latest
        resources:
          requests:
            memory: "128Mi"  # Right-sized from Phase 1
            cpu: "100m"
          limits:
            memory: "256Mi"
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  selector:
    matchLabels:
      app: web-app
  maxUnavailable: 1  # With 3 replicas, keeps at least 2 pods running during voluntary disruptions

Step 2c: Install Spot Instance Interruption Handler

Critical for graceful handling of spot interruptions:

## Install AWS Node Termination Handler
kubectl apply -f https://github.com/aws/aws-node-termination-handler/releases/download/v1.21.0/all-resources.yaml

## Or using Helm
helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSpotInterruptionDraining=true \
  --set enableRebalanceMonitoring=true
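
Worth confirming the handler actually landed on every node before trusting it with interruptions. The DaemonSet name and label below assume the default chart naming from the Helm release above:

## DESIRED should equal your node count
kubectl get daemonset aws-node-termination-handler -n kube-system

## Tail its logs while draining a node to watch the interruption flow
kubectl logs -n kube-system -l app.kubernetes.io/name=aws-node-termination-handler --tail=50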

Step 3: Implement Cluster Autoscaling

Configure autoscaling to handle variable workloads efficiently:

Horizontal Pod Autoscaler (HPA)
## hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Scale up at 70% CPU
  - type: Resource  
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80  # Scale up at 80% memory
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30  # React quickly to load
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300  # Scale down slowly to avoid flapping
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
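
After applying, watch what the HPA actually does before trusting it with production traffic; a quick check sequence:

kubectl apply -f hpa.yaml

## TARGETS should show live CPU/memory percentages once metrics flow
kubectl get hpa web-app-hpa --watch

## Shows which metric triggered each scale event
kubectl describe hpa web-app-hpa
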
Cluster Autoscaler Configuration
## cluster-autoscaler.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      # Pin to on-demand nodes (critical system component)
      nodeSelector:
        node-type: "ondemand"
      tolerations:
      - key: "spot-instance"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.3
        name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/production-cluster
        - --balance-similar-node-groups
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --skip-nodes-with-system-pods=false
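
Once the autoscaler is running, its status ConfigMap and logs tell you whether scale-down is actually happening. Resource names below are the upstream defaults:

## Health, node group sizes, and scale-down candidates
kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml

## Watch scale-up/scale-down decisions as they happen
kubectl -n kube-system logs -f deployment/cluster-autoscaler | grep -iE "scale.?(up|down)"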

Step 4: Advanced Node Optimization with Karpenter (AWS Only)

For AWS EKS, Karpenter provides more intelligent node provisioning than Cluster Autoscaler:

## Install Karpenter
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version 0.32.0 \
  --namespace karpenter --create-namespace \
  --set settings.aws.clusterName=production-cluster \
  --set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi

## karpenter-nodepool.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  # Template for node configuration
  template:
    metadata:
      labels:
        node-type: "karpenter"
    spec:
      # Mix of spot and on-demand with preference for spot
      requirements:
      - key: "karpenter.sh/capacity-type"
        operator: In
        values: ["spot", "on-demand"]
      - key: "node.kubernetes.io/instance-type"
        operator: In
        values: ["m5.large", "m5.xlarge", "m5a.large", "m5.2xlarge"]
      - key: "kubernetes.io/arch"
        operator: In
        values: ["amd64"]

      # Prefer spot instances for cost optimization
      nodeClassRef:
        name: default
      taints:
      - key: "karpenter"
        value: "true"
        effect: "NoSchedule"
        
  # Automatic node termination to reduce costs
  disruption:
    consolidationPolicy: WhenUnderutilized
    consolidateAfter: 30s
    expireAfter: 2160h # 90 days
    
  # Limits to prevent runaway scaling
  limits:
    cpu: 1000
    memory: 1000Gi
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass  
metadata:
  name: default
spec:
  # Use latest EKS optimized AMI
  amiFamily: AL2
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: production-cluster
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: production-cluster
  instanceStorePolicy: RAID0  # Use local SSD for better performance
  userData: |
    #!/bin/bash
    /etc/eks/bootstrap.sh production-cluster
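
After applying the NodePool and EC2NodeClass, a few checks confirm Karpenter is launching (and consolidating) nodes. Names below assume the manifests and Helm install above:

## NodePool and EC2NodeClass should be registered
kubectl get nodepools,ec2nodeclasses

## NodeClaims are the nodes Karpenter has provisioned
kubectl get nodeclaims -o wide

## Watch launch and consolidation decisions
kubectl logs -n karpenter deployment/karpenter | grep -iE "launched|consolidat|disrupt" | tail -30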

What to Expect When You Do This Shit

Week 1-2: Resource Right-sizing
  • Expected savings: 20-30%
  • Risk: Medium (potential OOM kills)
  • Validation: Monitor pod restarts and memory usage

Week 3-4: Basic Spot Integration
  • Expected additional savings: 15-25%
  • Risk: Medium (application interruptions)
  • Validation: Test interruption handling in staging

Week 5-6: Advanced Autoscaling
  • Expected additional savings: 10-15%
  • Risk: Low (mostly efficiency gains)
  • Validation: Monitor scaling behavior during traffic spikes

Total Expected Results:
  • Combined cost reduction: 45-70%
  • Implementation time: 4-6 weeks
  • Engineering effort: 40-80 hours
  • Ongoing maintenance: 2-4 hours/month

Critical Success Metrics:
  • No increase in application downtime
  • Pod restart rate <2% increase
  • Scaling response time <2 minutes
  • Cost monitoring showing projected savings

The next phase covers automation and monitoring to ensure these optimizations continue working long-term without manual intervention.

Frequently Asked Questions

Q

My right-sizing changes either save nothing or crash production - what the fuck am I doing wrong?

A

Here's how to tell if you're helping or just breaking things:

  1. Before making changes: Record baseline metrics for 1 week

    # Record current costs and performance
    kubectl top pods --all-namespaces > baseline-usage.txt
    # Document response times and error rates from APM tool
    
  2. After right-sizing: Wait 48 hours, then compare:

    # Check for increased restart rates (bad sign)
    kubectl get pods --all-namespaces -o custom-columns="NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount" | sort -k2 -n | tail -20
    
    # Monitor for OOMKilled events
    kubectl get events --all-namespaces --field-selector reason=OOMKilling
    
    # Compare resource usage patterns
    kubectl top pods --all-namespaces --sort-by=memory
    
  3. Cost validation: Use your monitoring tool (KubeCost/OpenCost) to compare monthly projections

Red flags that mean you went too aggressive:

  • Pod restart count increased >20%
  • Response times increased >10%
  • Any OOMKilled events for production pods
  • CPU throttling showing up in monitoring

Q

My spot instances are getting murdered left and right and taking my app down with them - what the hell am I doing wrong?

A

Your app isn't spot-ready. Here's what you fucked up:

  1. Check replica count - You need minimum 3 replicas for spot resilience:

    kubectl get deployments --all-namespaces -o custom-columns="NAME:.metadata.name,REPLICAS:.spec.replicas" | grep -E " [12]$"
    
  2. Verify Pod Disruption Budgets exist:

    kubectl get pdb --all-namespaces
    # Should return PDBs for all spot-instance workloads
    
  3. Test your app's shutdown behavior:

    # Simulate spot interruption
    kubectl drain node-name --ignore-daemonsets --delete-emptydir-data
    # Watch how your app handles graceful shutdown
    
  4. Check if you're using inappropriate workloads on spot:

    • Databases or StatefulSets (move to on-demand)
    • Single-replica deployments (increase replicas)
    • Apps that can't handle 30-second shutdown windows

Pro tip: Start with batch jobs and dev environments on spot before moving production workloads.

Q

I implemented autoscaling but it's either too slow or scaling too aggressively

A

The autoscaling tuning that actually works:

For HPA being too slow:

behavior:
  scaleUp:
    stabilizationWindowSeconds: 30  # React faster (default 300s)
    policies:
    - type: Percent
      value: 100  # Double pods quickly
      periodSeconds: 15

For HPA being too aggressive:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 600  # Wait longer before scaling down
    policies:
    - type: Percent
      value: 10   # Scale down slowly
      periodSeconds: 60

For Cluster Autoscaler being slow:

## Reduce scale-down delays
--scale-down-delay-after-add=5m
--scale-down-unneeded-time=5m
--scale-down-delay-after-failure=1m

The real issue: Default settings assume you prefer stability over cost. For cost optimization, you want faster scaling with shorter delays.

Q

KubeCost shows I'm spending $30k but AWS bill is only $25k - which is right?

A

KubeCost includes estimated costs AWS doesn't show in the same place:

  1. Network transfer costs - Check your Data Transfer line items in AWS billing
  2. EBS storage and snapshots - Look at EC2-EBS and snapshot costs
  3. Load balancer costs - ELB charges are separate line items
  4. Reserved instance amortization - KubeCost spreads RI costs differently

Fix this with bill reconciliation:

## Enable AWS Cost and Usage Report integration
kubecostProductConfigs:
  athenaProjectID: my-athena-database
  athenaBucketName: aws-cur-bucket
  athenaRegion: us-east-1
  awsSpotDataRegion: us-east-1
  awsSpotDataBucket: spot-data-feed-bucket

Wait 24-48 hours after enabling bill reconciliation. The numbers should align within 2-3%.

Q

How do I convince management that spending 2 months on cost optimization is worth it?

A

Build a business case with real numbers:

  1. Calculate current waste (from Phase 1 assessment):

    Current monthly spend: ~$47k (give or take)
    Estimated waste (maybe 40%): ~$19k/month
    Annual waste: somewhere around $230k
    
  2. Project savings (conservative estimate):

    Phase 1 (right-sizing): maybe $8-12k/month if we're lucky
    Phase 2 (spot instances): probably another $12-15k/month  
    Total savings: $20-27k/month = $240-320k/year (if nothing breaks)
    
  3. Calculate ROI:

    Engineering cost (160 hours @ $150/hour): $24,000
    Annual savings: $282,000
    ROI: ~1,075% (pays for itself in the first month)
    
  4. Present risk mitigation:

    • "Without optimization, costs will grow 30% annually as we scale"
    • "Competitors using spot instances have 70% lower compute costs"
    • "Current over-provisioning masks performance issues we should fix"

Q

My dev team says resource limits will hurt performance - how do I handle this?

A

Performance vs cost conversation that works:

  1. Show actual usage data:

    # Prove most apps use <20% of allocated resources
    kubectl top pods --all-namespaces --containers | awk '{print $1, $3, $4}'
    
  2. Offer A/B testing:

    • Right-size 25% of non-critical workloads first
    • Measure performance impact over 2 weeks
    • Let data drive the conversation
  3. Explain the performance benefits:

    • Better resource utilization = more predictable performance
    • Proper limits prevent noisy neighbor problems
    • Right-sizing forces fixing actual performance issues
  4. Give them a performance budget:
    "We'll optimize for cost, but maintain <5% performance degradation. If we exceed that, we'll adjust."

Q

I followed the guide but only saved 15% instead of 50% - what went wrong?

A

Common reasons optimizations underperform:

  1. Your workloads were already somewhat optimized - Some teams only have 20% waste, not 60%

  2. You didn't implement spot instances properly:

    # Check what percentage is actually on spot
    kubectl get nodes -l node-type=spot --no-headers | wc -l
    kubectl get nodes --no-headers | wc -l
    
  3. Reserved Instances are masking savings - If 80% of your capacity is covered by RIs, spot savings are limited

  4. Monitoring timeframe is too short - Cost optimization benefits compound over months, not days

  5. You optimized the wrong workloads - Focus on the highest-cost namespaces first

Debug your optimization:

  • Run the Phase 1 assessment again - where is remaining waste?
  • Check if right-sizing actually took effect (compare current vs baseline)
  • Verify spot instances are being used for appropriate workloads

Q

Our compliance team says spot instances violate our SLA requirements

A

How to use spot instances while meeting SLAs:

  1. Architecture approach:

    • Critical path: 100% on-demand instances
    • Background jobs: 100% spot instances
    • Web tier: 70% spot, 30% on-demand
  2. SLA-safe spot implementation:

    # Ensure critical replicas on on-demand
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: critical-api
          topologyKey: "node-type"
    nodeSelector:
      node-type: "ondemand"  # Applies to every replica here; run a second spot-tolerant Deployment for burst capacity
    
  3. Document the math:

    • Spot interruption rate: <5% per instance per month
    • With 3+ replicas across AZs: >99.9% availability
    • On-demand failover: <30 second failover time
  4. Start with non-SLA workloads:

    • Development environments (no SLA)
    • Batch processing (time-flexible)
    • Background workers (retry-able)

Q

How do I automate this so I don't have to babysit cost optimization forever?

A

Set up automation that actually works:

  1. Automated right-sizing with Vertical Pod Autoscaler:

    # Clone the autoscaler repository and install VPA
    git clone https://github.com/kubernetes/autoscaler.git
    cd autoscaler/vertical-pod-autoscaler/
    ./hack/vpa-up.sh
    
  2. Cost monitoring alerts:

    # Prometheus alert for cost spikes
    - alert: KubernetesCostSpike
      expr: increase(kubecost_cluster_cost_total[24h]) > increase(kubecost_cluster_cost_total[24h] offset 7d) * 1.3
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: "Kubernetes costs increased >30% vs last week"
    
  3. Automated dev environment shutdown:

    # Cron job to shutdown dev namespaces at 6 PM
    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: shutdown-dev-environments
    spec:
      schedule: "0 18 * * 1-5"  # 6 PM Monday-Friday
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: shutdown
                image: bitnami/kubectl
                command: ["/bin/sh"]
                args:
                - -c
                - for ns in dev-team-1 dev-team-2 staging; do kubectl scale deployment --all --replicas=0 -n "$ns"; done
    
  4. Monthly cost review automation:

    • Set up monthly reports from your cost monitoring tool
    • Automated Slack alerts for >10% cost increases
    • Automated recommendations for new right-sizing opportunities

The key: Automate the monitoring and alerting, but keep human judgment in the optimization decisions.

Phase 3: Automation and Long-term Monitoring

You spent weeks optimizing everything and saved 50% on your bill. Six months later you're back where you started because developers gonna develop and costs gonna creep. This phase stops that bullshit from happening again.

Step 1: Implement Continuous Right-Sizing with VPA

VPA is great in theory but in practice it'll restart your pods at the worst possible times. Start with recommendation mode unless you enjoy explaining outages. Manual right-sizing works for getting started, but VPA keeps things optimal as your traffic patterns change.

Install VPA (Production-Ready Configuration)

## Install VPA with proper resource settings (requires Kubernetes 1.25+)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/

## Check compatibility - VPA v1.2.0+ works with Kubernetes 1.28+
./hack/vpa-up.sh

## Or use the production-ready manifests:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/vertical-pod-autoscaler/deploy/vpa-v1-crd-gen.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/vertical-pod-autoscaler/deploy/vpa-rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/vertical-pod-autoscaler/deploy/recommender-deployment.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/vertical-pod-autoscaler/deploy/updater-deployment.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/vertical-pod-autoscaler/deploy/admission-controller-deployment.yaml
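
Quick sanity check that all three VPA components came up (they install into kube-system by default) before creating any VPA objects:

## Recommender, updater, and admission controller should all be Running
kubectl get pods -n kube-system | grep -E "vpa-(recommender|updater|admission-controller)"

## The CRDs must exist or the VPA objects below won't apply
kubectl get crd | grep verticalpodautoscaler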

Configure VPA for Each Application

## vpa-web-app.yaml - Start with recommendation-only mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"  # Start with recommendations only
  resourcePolicy:
    containerPolicies:
    - containerName: web-app
      minAllowed:
        cpu: 50m
        memory: 50Mi
      maxAllowed:
        cpu: 2000m
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits

VPA Monitoring and Graduated Rollout

## Check VPA recommendations vs current settings
kubectl describe vpa web-app-vpa

## Graduate from recommendations to automatic updates
## After 2 weeks of stable recommendations:
kubectl patch vpa web-app-vpa -p '{"spec":{"updatePolicy":{"updateMode":"Auto"}}}'

## Monitor VPA-initiated pod restarts
kubectl get events --field-selector reason=EvictedByVPA --all-namespaces

Step 2: Set Up Cost Monitoring and Alerting

Automated cost monitoring catches problems before they hit your monthly bill.

Prometheus-Based Cost Alerts

groups:
- name: kubernetes-cost-optimization
  rules:
  # Alert on 30% cost increase week-over-week
  - alert: KubernetesCostSpike
    expr: increase(kubecost_cluster_cost_total[7d]) > increase(kubecost_cluster_cost_total[7d] offset 7d) * 1.3
    for: 2h
    labels:
      severity: warning
      team: platform
    annotations:
      summary: "Kubernetes cluster costs increased >30% vs last week"
      description: "Weekly cost: ${{ $value | humanize }}, up from ${{ query \"increase(kubecost_cluster_cost_total[7d] offset 7d)\" | first | value | humanize }} last week"
      
  # Alert on individual pod cost spikes  
  - alert: ExpensivePodDetected
    expr: kubecost_pod_cpu_cost_total + kubecost_pod_memory_cost_total > 200
    for: 1h
    labels:
      severity: info
      team: platform
    annotations:
      summary: "Pod {{ $labels.pod }} in {{ $labels.namespace }} costs >$200/month"
      description: "Consider right-sizing or moving to spot instances"
      
  # Alert on wasted resources
  - alert: UnderutilizedResources  
    expr: avg_over_time(kubecost_cluster_cpu_utilization[24h]) < 0.3
    for: 4h
    labels:
      severity: info
    annotations:
      summary: "Cluster CPU utilization <30% for 24+ hours"
      description: "Consider scaling down or using smaller instance types"
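
Before loading these into Prometheus, syntax-check the rule file with promtool (it ships with Prometheus; the file name here is just whatever you saved the group above as):

## Validate the alert rules before deploying them
promtool check rules kubernetes-cost-optimization-rules.yaml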

Slack Integration for Cost Alerts

## alertmanager.yml
global:
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'cost-optimization'

receivers:
- name: 'cost-optimization'
  slack_configs:
  - channel: '#platform-costs'
    username: 'cost-bot'
    title: 'Kubernetes Cost Alert'
    text: |
      *Alert:* {{ .GroupLabels.alertname }}
      *Severity:* {{ .CommonLabels.severity }}
      *Description:* {{ .CommonAnnotations.description }}
      *Runbook:* <https://wiki.company.com/k8s-cost-optimization|Cost Optimization Guide>
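
Same idea for the Alertmanager config: amtool can validate the file and show where a cost alert would route:

## Validate the config
amtool check-config alertmanager.yml

## Confirm a KubernetesCostSpike alert routes to the cost-optimization receiver
amtool config routes test --config.file=alertmanager.yml alertname=KubernetesCostSpike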

Step 3: Automated Development Environment Management

Dev environments are the biggest source of cost creep. Automate their lifecycle.

Time-Based Environment Shutdown

## dev-environment-scheduler.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: shutdown-dev-environments
  namespace: platform-automation
spec:
  schedule: "0 19 * * 1-5"  # 7 PM Monday-Friday
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: environment-manager
          containers:
          - name: shutdown
            image: bitnami/kubectl:latest
            command: ["/bin/sh"]
            args:
            - -c
            - |
              # Shut down development namespaces (glob the names via the API, not the shell)
              for ns in $(kubectl get namespaces -o name | grep -E 'namespace/(dev-feature-|staging-|qa-)' | cut -d'/' -f2); do
                echo "Shutting down namespace: $ns"
                kubectl scale deployment --all --replicas=0 -n "$ns"
                kubectl annotate namespace "$ns" shutdown-time=$(date -Iseconds) --overwrite
              done
          restartPolicy: OnFailure
---
## Startup job for business hours
apiVersion: batch/v1
kind: CronJob  
metadata:
  name: startup-dev-environments
  namespace: platform-automation
spec:
  schedule: "0 8 * * 1-5"  # 8 AM Monday-Friday
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: environment-manager
          containers:
          - name: startup
            image: bitnami/kubectl:latest
            command: ["/bin/sh"]
            args:
            - -c
            - |
              # Start up development namespaces
              for ns in $(kubectl get namespaces -o name | grep -E 'namespace/(dev-feature-|staging-|qa-)' | cut -d'/' -f2); do
                echo "Starting up namespace: $ns"
                # Restore from backup or default replica counts
                kubectl scale deployment --all --replicas=1 -n "$ns" || true
                kubectl annotate namespace "$ns" startup-time=$(date -Iseconds) --overwrite
              done
          restartPolicy: OnFailure
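
Both CronJobs assume the environment-manager ServiceAccount can scale deployments and annotate namespaces cluster-wide. A minimal RBAC sketch; the role and binding names are placeholders:

## ServiceAccount plus just-enough permissions for the scale/annotate calls above
kubectl create serviceaccount environment-manager -n platform-automation
kubectl create clusterrole environment-manager-scaler \
  --verb=get,list,patch,update \
  --resource=deployments,deployments/scale,namespaces
kubectl create clusterrolebinding environment-manager-scaler \
  --clusterrole=environment-manager-scaler \
  --serviceaccount=platform-automation:environment-manager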

Idle Environment Detection and Cleanup

#!/bin/bash
## idle-environment-cleanup.sh - Run weekly to find unused environments

IDLE_THRESHOLD_DAYS=7
CURRENT_DATE=$(date +%s)

echo "Scanning for idle development environments..."

for namespace in $(kubectl get namespaces -o name | grep -E "(dev-|feature-|staging-)" | cut -d'/' -f2); do
    # Check last deployment activity
    LAST_ACTIVITY=$(kubectl get events -n "$namespace" --sort-by='.lastTimestamp' -o jsonpath='{.items[-1:].lastTimestamp}' 2>/dev/null)
    
    if [[ -n "$LAST_ACTIVITY" ]]; then
        LAST_ACTIVITY_TS=$(date -d "$LAST_ACTIVITY" +%s)
        DAYS_IDLE=$(( ($CURRENT_DATE - $LAST_ACTIVITY_TS) / 86400 ))
        
        if [[ $DAYS_IDLE -gt $IDLE_THRESHOLD_DAYS ]]; then
            echo "⚠️  Namespace $namespace has been idle for $DAYS_IDLE days"
            
            # Get cost information
            POD_COUNT=$(kubectl get pods -n "$namespace" --no-headers | wc -l)
            
            echo "   Pods: $POD_COUNT"
            echo "   Last activity: $LAST_ACTIVITY"
            echo "   Suggested action: Delete or hibernate namespace"
            
            # Optional: Auto-delete very old environments
            if [[ $DAYS_IDLE -gt 30 ]]; then
                echo "   🗑️  Auto-deleting namespace older than 30 days"
                kubectl delete namespace "$namespace" --wait=false
            fi
        fi
    fi
done

Step 4: Implement Resource Policy Enforcement

Prevent future waste with admission controllers and resource quotas.

OPA Gatekeeper Policies for Cost Control

## resource-limits-policy.yaml
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredresources
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredResources
      validation:
        properties:
          maxCpu:
            type: string
          maxMemory:
            type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredresources
        
        violation[{"msg": msg}] {
          container := input.review.object.spec.template.spec.containers[_]
          not container.resources.requests.cpu
          msg := "CPU request is required"
        }
        
        violation[{"msg": msg}] {
          container := input.review.object.spec.template.spec.containers[_]
          not container.resources.requests.memory
          msg := "Memory request is required"
        }
        
        violation[{"msg": msg}] {
          container := input.review.object.spec.template.spec.containers[_]
          cpu_req := container.resources.requests.cpu
          cpu_req_numeric := units.parse(cpu_req)
          max_cpu_numeric := units.parse(input.parameters.maxCpu)
          cpu_req_numeric > max_cpu_numeric
          msg := sprintf("CPU request %v exceeds maximum %v", [cpu_req, input.parameters.maxCpu])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredResources
metadata:
  name: must-have-resources
spec:
  match:
    kinds:
    - apiGroups: ["apps"]
      kinds: ["Deployment", "StatefulSet"]
    namespaces: ["dev-*", "staging-*", "production"]
  parameters:
    maxCpu: "2000m"
    maxMemory: "4Gi"

Namespace Resource Quotas with Cost Limits

## namespace-quotas.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-team-quota
  namespace: dev-team-1
spec:
  hard:
    # Compute limits to control costs
    requests.cpu: "20"      # ~$400/month max
    requests.memory: "40Gi" # ~$200/month max
    limits.cpu: "40"
    limits.memory: "80Gi"
    
    # Storage limits  
    requests.storage: "100Gi"
    persistentvolumeclaims: "10"
    
    # Network resources
    services.loadbalancers: "2"  # $40/month max
    
    # Object count limits
    pods: "50"
    replicationcontrollers: "20"
    secrets: "20"
    configmaps: "20"
---
## Cost-based quota (requires KubeCost integration)
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-quotas
  namespace: platform-automation
data:
  quotas.yaml: |
    dev-team-1: $500
    dev-team-2: $500
    staging: $1000
    qa: $300
    production: $10000  # No limit on prod
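
Once the quota is applied, kubectl shows used-vs-hard per namespace, which is also an easy number to drop into the weekly report:

## Current consumption against the quota
kubectl describe resourcequota dev-team-quota -n dev-team-1

## Quick view across every namespace that has a quota
kubectl get resourcequota --all-namespaces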

Step 5: Advanced Cost Optimization Automation

Deploy tools that continuously optimize without human intervention.

Automated Spot Instance Management with Karpenter

## advanced-karpenter-nodepool.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: cost-optimized
spec:
  template:
    metadata:
      labels:
        node-type: "cost-optimized"
    spec:
      # Aggressive cost optimization settings
      requirements:
      - key: "karpenter.sh/capacity-type"
        operator: In
        values: ["spot"]  # Spot only for maximum savings
      - key: "node.kubernetes.io/instance-type"
        operator: In
        # Cheaper, less popular instance types
        values: ["m5.large", "m5a.large", "m4.large", "t3.large", "t3a.large"]
      - key: "kubernetes.io/arch"  
        operator: In
        values: ["amd64"]
        
      nodeClassRef:
        name: cost-optimized
      taints:
      - key: "spot-only"
        value: "true"
        effect: "NoSchedule"
        
  # Aggressive cost optimization policies  
  disruption:
    consolidationPolicy: WhenUnderutilized
    consolidateAfter: 15s  # Very aggressive consolidation
    expireAfter: 1440h     # 60 days (shorter lifecycle)
    
  limits:
    cpu: 500   # Smaller maximum to encourage right-sizing
    memory: 1000Gi

Automated Resource Optimization Reports

#!/bin/bash
## weekly-cost-report.sh - Automated weekly optimization report

REPORT_FILE="/tmp/k8s-cost-report-$(date +%Y%m%d).md"

cat > "$REPORT_FILE" << EOF
## Kubernetes Cost Optimization Report
Generated: $(date)

### Summary
EOF

## Get cost data from the KubeCost allocation API (sum totalCost across all allocations in the window)
TOTAL_COST=$(curl -s "http://kubecost-cost-analyzer.kubecost:9090/model/allocation?window=7d" | jq '[.data[0][].totalCost] | add')

cat >> "$REPORT_FILE" << EOF
- Weekly cluster cost: \$$(echo "$TOTAL_COST" | cut -d. -f1)
- Change from last week: $(curl -s "http://kubecost-cost-analyzer.kubecost:9090/model/allocation?window=7d,7d" | jq -r '([.data[0][].totalCost] | add) - ([.data[1][].totalCost] | add) | if . > 0 then "+$\(.)" else "-$\(-1 * .)" end')

### Top Cost Contributors
EOF

## Get top expensive pods
kubectl get pods --all-namespaces -o custom-columns="NAMESPACE:.metadata.namespace,NAME:.metadata.name,CPU:.spec.containers[0].resources.requests.cpu,MEMORY:.spec.containers[0].resources.requests.memory" | sort -k3 -n | tail -10 >> "$REPORT_FILE"

cat >> "$REPORT_FILE" << 'EOF'

### Optimization Opportunities  
EOF

## Find pods without resource limits
UNLIMITED_PODS=$(kubectl get pods --all-namespaces -o json | jq -r '.items[] | select(.spec.containers[0].resources.limits == null) | .metadata.namespace + "/" + .metadata.name' | wc -l)

cat >> "$REPORT_FILE" << EOF
- Pods without resource limits: $UNLIMITED_PODS
- Recommended action: Add resource limits to prevent resource hogging

EOF

## Send report to Slack
curl -X POST -H 'Content-type: application/json' \
  --data "{\"text\":\"Weekly K8s Cost Report Available\", \"attachments\":[{\"text\":\"\`\`\`$(cat $REPORT_FILE)\`\`\`\"}]}" \
  "$SLACK_WEBHOOK_URL"

echo "Cost report generated: $REPORT_FILE"

What Happens After the Initial Chaos

Month 1-3: Stabilization

  • Automated systems catch 90% of cost drift
  • VPA recommendations become more accurate with historical data
  • Development teams adapt to new resource constraints

Month 4-6: Optimization

  • Continuous right-sizing maintains optimal resource allocation
  • Spot instance interruption handling becomes seamless
  • Cost monitoring prevents surprise bills

Month 7-12: Maturity

  • Platform team spends <2 hours/week on cost management
  • New applications automatically follow cost-optimized patterns
  • Cost per unit of business value continues decreasing

Sustainable Cost Reduction:

  • Initial optimization: 50-70% cost reduction
  • Ongoing automation: Maintains savings + 5-10% annual improvement
  • Risk reduction: No surprise bills, predictable cost growth

Success Metrics to Track:

  • Monthly cost trend (should be flat or declining relative to usage)
  • Resource utilization (should stay 60-80% for CPU, 70-85% for memory)
  • Application performance (should remain stable or improve)
  • Engineering time spent on cost management (should decrease to <5% of platform team time)

This automation framework ensures your cost optimizations continue working long-term without constant manual intervention, while catching new sources of waste before they impact your budget.

Control Your Kubernetes Costs with KubeCost | Track, Forecast, and Optimize K8s by Akamai Developer

## The Only KubeCost Video That Isn't Complete Garbage

Found this after watching like 20 other KubeCost tutorials that were either outdated or basically marketing bullshit. This guy actually shows you real production clusters, not some clean demo environment.

Video: Control Your Kubernetes Costs with KubeCost
Duration: 45 minutes (skip to 8:30 if you already know what KubeCost is)

What's actually useful in this video:
- He installs KubeCost on a real cluster that's already fucked up (not a pristine demo)
- Shows the UI when it's displaying $47k/month in costs (my kind of mess)
- Around 22:00 he finds a StatefulSet that's been burning $400/day for 3 months because nobody noticed
- The Prometheus integration actually fails first try (finally, some reality)

The part that saved me 4 hours: At 31:45 he shows how to fix the "no cost data" issue when your cluster-autoscaler keeps recreating nodes. Turns out you need to set --persistent-volume-claim-gc or historical data gets fucked.

Skip this if: You want perfect step-by-step instructions. This is more "here's how I debugged cost monitoring on a real cluster" than "here's the ideal setup process."
