Kubernetes Cluster Autoscaler: AI-Optimized Technical Reference
Overview
Kubernetes Cluster Autoscaler automatically adjusts cluster node count based on workload demands. Critical limitation: scales on resource requests, not actual usage - misconfigured requests cause financial waste and scaling failures.
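What that means mechanically: the autoscaler reacts to Pending pods whose *requests* don't fit on any existing node; what the process actually consumes never enters the decision. A minimal sketch, with an illustrative pod name and a deliberately inflated request:

```yaml
# Hypothetical pod: the 8Gi memory *request* is what the autoscaler evaluates.
# If no node has 8Gi of unreserved allocatable memory, the pod goes Pending
# and triggers a scale-up -- even if the process only ever touches 500Mi.
apiVersion: v1
kind: Pod
metadata:
  name: over-requested-demo        # illustrative name
spec:
  containers:
    - name: app
      image: nginx:1.27            # placeholder image
      resources:
        requests:
          cpu: "2"
          memory: 8Gi
```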
Production Configuration
Version Requirements
- Production version: 1.32.x (late 2025)
- Avoid: Bleeding-edge versions (they cause 3am debugging sessions)
- Key improvements: DRA support, parallelized cluster snapshots, least-waste expander default
Resource Requirements (Autoscaler Pod)
- Small clusters (<100 nodes): 300MB memory minimum
- Large clusters (1000+ nodes): 1GB+ memory minimum
- Architecture limitation: Single replica only, not horizontally scalable
- Failure mode: If autoscaler pod crashes during traffic spike, cluster stops scaling
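A sketch of how those sizing and availability constraints typically show up in the autoscaler's own Deployment; the numbers, image tag, and priority class are assumptions to adapt per cluster, not a canonical manifest:

```yaml
# Fragment of a cluster-autoscaler Deployment spec (single replica by design).
spec:
  replicas: 1                                      # no active-active HA support
  template:
    spec:
      priorityClassName: system-cluster-critical   # keep it schedulable under node pressure
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.32.0   # assumed tag
          resources:
            requests:
              cpu: 100m
              memory: 600Mi        # ~300Mi floor for small clusters, 1Gi+ for 1000+ nodes
            limits:
              memory: 1Gi
```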
Critical Scaling Timelines
Provider / Operation | Marketing Claims | Production Reality | Failure Modes |
---|---|---|---|
AWS | 2-5 minutes | 12+ minutes during peak | API rate limits (5 req/sec), service quotas |
GCP | 2-4 minutes | Usually accurate | Silent quota failures |
Azure | 5-15 minutes | Completely unpredictable | VM Scale Set delays |
Scale-down | "Immediate" | 30+ minutes | Paranoid safety checks |
Breaking Points and Failure Modes
Resource Request Misconfiguration
- Pod requests 4 CPU, uses 200m: Triggers massive scale-up
- Pod requests 1GB, uses 4GB: OOMKilled on overcommitted nodes
- Impact severity: Financial waste + application failures
- Detection: Monitor actual vs requested resource utilization
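One way to make that detection bullet concrete is a Prometheus recording rule comparing working-set memory against requests. The sketch assumes cAdvisor and kube-state-metrics are already being scraped; the rule name is made up:

```yaml
# Prometheus rules file sketch: ratio of actual memory usage to requested memory.
# Ratios well below 1.0 mean inflated requests driving pointless scale-ups;
# ratios above 1.0 mean overcommitted nodes and OOMKill risk.
groups:
  - name: request-vs-usage
    rules:
      - record: namespace_pod:memory_request_utilization:ratio
        expr: |
          sum by (namespace, pod) (container_memory_working_set_bytes{container!=""})
          /
          sum by (namespace, pod) (kube_pod_container_resource_requests{resource="memory"})
```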
API Rate Limiting
- AWS limit: 5 requests/second to Auto Scaling Groups
- Failure scenario: During Black Friday traffic spikes, scaling stops silently
- No warning indicators: Just stops working without alerts
- Mitigation: Implement external monitoring of scaling operations
Spot Instance Interruptions
- Warning time: 2 minutes (insufficient for graceful draining)
- Common failure: Pods stuck pending while autoscaler attempts to replace non-existent nodes
- Required tooling: AWS Node Termination Handler or equivalent
- Business impact: Service degradation during cost optimization attempts
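A termination handler only helps if pods on spot nodes can actually drain inside the roughly 120-second notice. A pod-template sketch with assumed timings:

```yaml
# Deployment pod template fragment for workloads on spot capacity.
# preStop delay + application shutdown must fit inside terminationGracePeriodSeconds,
# and the whole budget must fit inside the ~120s interruption warning.
spec:
  terminationGracePeriodSeconds: 90            # headroom under the 120s notice
  containers:
    - name: app
      image: ghcr.io/example/app:1.0           # placeholder image
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 15"]  # let load balancers stop routing first
```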
Node Group Configuration Hell
- Mixed instance policies: Autoscaler uses the first instance type in the policy for its scheduling simulation
- Example failure: A policy listing `c5.large, c5.xlarge, c5.4xlarge` is simulated as if every new node were a c5.large (see the sketch after this list)
- Result: A pod requesting 16GB of memory stays pending while the autoscaler keeps launching nodes with only 8GB each, none of which can fit it
- Operational rule: Instance type diversity often causes more problems than benefits
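Given the first-type simulation quirk, the usual hedge is to only mix instance types with identical CPU and memory, so the simulation matches whatever actually launches. An eksctl-style sketch; cluster name, region, and the exact field layout are assumptions against eksctl's self-managed node group schema:

```yaml
# eksctl ClusterConfig fragment: a spot node group where every listed type
# is 16 vCPU / 32GiB, so the autoscaler's first-type simulation stays honest.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster            # placeholder
  region: us-east-1                # placeholder
nodeGroups:
  - name: spot-c5-4xlarge
    minSize: 0
    maxSize: 20
    instancesDistribution:
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      instanceTypes: ["c5.4xlarge", "c5a.4xlarge", "c5d.4xlarge"]
```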
Implementation Requirements
Pre-requisites
- Node groups: Must pre-configure every possible instance type combination
- Cannot auto-provision: No dynamic instance type selection
- Cloud provider constructs:
- AWS: Auto Scaling Groups or EKS managed node groups
- GCP: Instance Groups or GKE node pools
- Azure: VM Scale Sets or AKS node pools
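Once the node groups exist, the autoscaler still has to discover them. With the community Helm chart that is usually tag-based auto-discovery; a values.yaml sketch, assuming the kubernetes/autoscaler chart layout, with placeholder cluster name and region:

```yaml
# values.yaml fragment for the cluster-autoscaler Helm chart.
# Auto-discovery matches ASGs tagged with:
#   k8s.io/cluster-autoscaler/enabled
#   k8s.io/cluster-autoscaler/<cluster-name>
cloudProvider: aws
awsRegion: us-east-1               # placeholder
autoDiscovery:
  clusterName: example-cluster     # must match the ASG tag suffix
extraArgs:
  balance-similar-node-groups: true
  expander: least-waste
```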
Critical Configuration Settings
# Essential flags that prevent 3am incidents
--scale-down-delay-after-add=10m # Default, increase for stability
--scale-down-unneeded-time=10m # How long before considering scale-down
--skip-nodes-with-local-storage=true # Prevents data loss
--skip-nodes-with-system-pods=false # Allow scale-down of nodes running kube-system pods
Node Protection Mechanisms
- Annotation: `cluster-autoscaler.kubernetes.io/scale-down-disabled=true` makes nodes immortal (see the sketch after this list)
- DaemonSets: Prevent node termination without proper tolerations
- Local storage: Blocks scale-down permanently
- PodDisruptionBudgets: Can prevent all scale-down operations
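A sketch of the two protection knobs most teams reach for: the node annotation from the list above (normally applied with kubectl annotate rather than a manifest) and a maximally strict PodDisruptionBudget. Names are placeholders:

```yaml
# Node-level: the autoscaler will never remove a node carrying this annotation.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-1-23.ec2.internal                   # placeholder node name
  annotations:
    cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
---
# Pod-level: a PDB allowing zero disruptions blocks any scale-down
# that would need to evict a matching pod.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-pdb                                # placeholder
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: payments
```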
Comparison Matrix: Scaling Solutions
Capability | Cluster Autoscaler | Karpenter | HPA | VPA |
---|---|---|---|---|
Node provisioning speed | 2-12 minutes | 30-60 seconds | N/A | N/A |
Pre-configuration required | Yes (node groups) | No (auto-provisions) | N/A | N/A |
Production readiness | High (5+ years) | High (AWS), Medium (others) | High | Medium |
Single point of failure | Yes | No (multiple replicas) | No | No |
Spot instance optimization | Manual configuration | Automatic | N/A | N/A |
Cost optimization | Basic | Advanced bin-packing | N/A | Right-sizing |
Operational Intelligence
When Cluster Autoscaler is Worth the Pain
- Multi-cloud deployments: Same behavior across AWS/GCP/Azure
- Regulatory compliance: Need predictable, auditable scaling behavior
- Existing infrastructure: Already have node group configurations
- Conservative scaling: Prefer stability over speed
When to Choose Alternatives
- AWS-only deployments: Karpenter provides 10x faster provisioning
- Cost optimization priority: Karpenter's bin-packing saves 20-40% on compute
- Dynamic workloads: Need automatic instance type selection
- High-frequency scaling: Sub-minute response requirements
Common Misconceptions
- "It scales based on actual usage": FALSE - scales on resource requests only
- "Works out of the box": FALSE - requires extensive node group pre-configuration
- "Saves money automatically": FALSE - saves money only with correct resource requests
- "Handles spot instances intelligently": FALSE - basic support, no intelligent failover
Critical Monitoring Requirements
# Prometheus alert conditions for production
cluster_autoscaler_cluster_safe_to_autoscale == 0                 # Scaling is broken
increase(cluster_autoscaler_failed_scale_ups_total[30m]) > 0      # Scale-up failures
cluster_autoscaler_nodes_count deviating >20% from baseline       # Unexpected scaling
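The first two conditions translate directly into a PrometheusRule, sketched below on the assumption that the Prometheus Operator CRDs are installed and the autoscaler's /metrics endpoint is scraped; alert names, windows, and severities are placeholders:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-autoscaler-alerts          # placeholder
  namespace: monitoring
spec:
  groups:
    - name: cluster-autoscaler
      rules:
        - alert: ClusterAutoscalerUnsafe
          expr: cluster_autoscaler_cluster_safe_to_autoscale == 0
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "Cluster Autoscaler reports the cluster is not safe to autoscale"
        - alert: ClusterAutoscalerScaleUpFailures
          expr: increase(cluster_autoscaler_failed_scale_ups_total[30m]) > 0
          labels:
            severity: warning
          annotations:
            summary: "Cluster Autoscaler scale-up attempts are failing"
```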
Resource Investment Required
- Initial setup: 1-2 weeks for proper node group configuration
- Ongoing maintenance: 2-4 hours/month troubleshooting scaling issues
- Expertise required: Deep understanding of Kubernetes scheduling and cloud provider APIs
- Hidden costs: Over-provisioning due to conservative defaults, spot instance management complexity
Decision Criteria
Use Cluster Autoscaler when:
- Multi-cloud strategy is essential
- Existing node group infrastructure
- Stability trumps speed
- Team has Kubernetes scheduling expertise
Choose Karpenter when:
- AWS-only deployment
- Cost optimization is priority
- Need sub-minute scaling
- Dynamic workload requirements
Avoid both when:
- Predictable workloads (static provisioning cheaper)
- Extremely cost-sensitive (manual scaling with monitoring)
- Compliance requires manual approval for infrastructure changes
Useful Links for Further Investigation
Resources That Don't Suck
Link | Description |
---|---|
Cluster Autoscaler GitHub | The source code. Read the issues to see what's actually broken. |
FAQ | This answers 90% of your questions. Read it before asking on Stack Overflow. |
AWS Setup Guide | Decent guide, ignore their "best practices" - half of them break in production. |
GKE Docs | Google's version works better but has different gotchas. |
Azure AKS | Good luck, Azure networking is special. |
DigitalOcean DOKS | Simple setup, limited features. |
Helm Chart | Use this instead of raw YAML unless you enjoy pain. |
Command Line Flags | The docs won't tell you which ones actually matter. |
Prometheus Metrics | Set up alerts for when scaling stops working. |
Troubleshooting Guide | You'll need this at 3am. |
Common Issues | GitHub issues marked critical - these are the real problems. |
Spot Instance Hell | Why your scaling fails when AWS yanks your cheap nodes. |