The January 2025 CPU Quota Crisis: What Changed and Why It Matters

On January 14, 2025, Fly.io fully enabled CPU quota enforcement changes that have fundamentally altered how applications perform on their platform. What used to be 30-second deploys now take 8+ minutes for Django, Rails, and Node.js applications, catching developers completely off-guard with zero email notification about the breaking change.

Understanding the New CPU Throttling System

The core change is brutal in its simplicity: shared vCPUs are now limited to 1/16th of a CPU core (6.25% baseline), while performance vCPUs get 100% of a dedicated core. This means a shared-cpu-1x machine that previously had access to burst CPU power can now only sustain 62.5 milliseconds of CPU time per second.

Fly.io CPU Throttling Graph - What Throttling Looks Like

The throttling system works on 80ms cycles, where each cycle grants you 5ms of CPU time on a shared-cpu-1x. Any time you don't use gets banked as "burst balance" - but with a low accrual rate that makes startup-heavy applications suffer dramatically.
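The numbers above make for a grim back-of-the-envelope calculation. Here's a rough sketch (my own math, not Fly.io's implementation) of how long a CPU-bound task takes once you factor in the 5ms-per-80ms grant and the 5-second initial burst:

```python
# Sketch of the quota math from the numbers above. Assumptions:
# 5ms of CPU granted per 80ms cycle, 5s of initial burst balance.
CYCLE_S = 0.080          # scheduler cycle length
GRANT_S = 0.005          # CPU time granted per cycle (6.25% baseline)
INITIAL_BURST_S = 5.0    # burst balance a fresh machine starts with

def wall_time(cpu_needed_s: float, burst_s: float = INITIAL_BURST_S) -> float:
    """Approximate wall-clock seconds to consume `cpu_needed_s` of CPU."""
    # Burst balance is spent at full speed: 1s of CPU per 1s of wall time.
    burst_used = min(cpu_needed_s, burst_s)
    remaining = cpu_needed_s - burst_used
    # Everything after that trickles in at the 6.25% baseline rate.
    throttled = remaining / (GRANT_S / CYCLE_S)
    return burst_used + throttled

# A startup sequence needing 60s of CPU: 5s at full speed, then 55s of
# CPU at 6.25% = 880s throttled, so roughly 885s (~15 minutes) total.
print(round(wall_time(60.0)))  # -> 885
```

That's how a 30-second deploy becomes a 15-minute one: the startup burns through the burst in seconds, and everything after runs at one-sixteenth speed.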

The Real-World Impact

Deploy Time Explosions: Applications that previously deployed in 30 seconds now take 8-20 minutes to boot because startup processes get throttled immediately. Django apps loading models, Rails apps precompiling assets, and Node.js apps warming up V8 all hit this wall hard.

Production Outages: Multiple users reported immediate production outages when the enforcement went live. Rolling deployments failed because new instances couldn't pass health checks within reasonable timeframes.

Scaling Confusion: A shared-cpu-8x machine still gets the same 6.25% baseline as shared-cpu-1x, making the scaling meaningless for CPU-bound workloads. Users found themselves forced to upgrade to performance instances just to deploy successfully.

Why Fly.io Made This Change

The Predictable Processor Performance initiative aimed to address "noisy neighbor" problems where some applications consumed excessive CPU, affecting others on the same hardware. The change aligns with industry standards - AWS Lambda has similar throttling, Google Cloud Run limits concurrent requests, and most cloud providers limit shared resources.

However, Fly.io completely botched this rollout. Despite their bullshit claims that "a tiny fraction of organizations" would be affected, the change broke apps everywhere. My production Rails app shit the bed in the middle of the day with zero warning - I found out when users started complaining about 10-second page loads, not from any email from Fly.io.

Fly.io Global Regions - Where Your App Gets Throttled

Shipping a change this massive with zero fucking email notification violates every change management practice that real platform providers follow.

The Burst Balance System

Fly.io provides a "burst balance" mechanism to soften the throttling impact. Unused CPU time accumulates and can be spent in bursts, but the math is unforgiving:

  • Idle Accumulation: A completely idle shared-cpu-1x accrues only 3.75 minutes of burst balance per hour, similar to AWS EC2 burst credits but with much lower accrual rates
  • Startup Penalty: New machines start with just 5 seconds of burst balance, insufficient for most application startup sequences and unlike Google Cloud Run's generous startup allowances
  • Reset on Restart: Restarting a machine after startup mysteriously works better - took me 3 hours of debugging to figure this shit out, but restart the machine and suddenly it's fast again
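The accrual math in the first bullet is just the baseline fraction times wall time. A quick sketch (assuming accrual is simply the unused portion of your baseline, which matches the numbers above):

```python
# Burst accrual sketch using the rates quoted above (6.25% baseline).
BASELINE = 0.0625  # fraction of a core a shared-cpu-1x may use

def burst_accrued_s(wall_s: float, used_fraction: float = 0.0) -> float:
    """CPU-seconds banked while running below baseline."""
    # You bank the gap between your baseline and what you actually used.
    return max(BASELINE - used_fraction, 0.0) * wall_s

# Fully idle for one hour: 0.0625 * 3600 = 225s = 3.75 minutes of burst.
print(burst_accrued_s(3600) / 60)  # -> 3.75
```

So even a machine that does literally nothing for an hour banks under four minutes of full-speed CPU - and a machine running at its baseline banks nothing.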

Performance vs. Shared CPU Economics

So now you're stuck choosing between broken or expensive:

  • shared-cpu-1x: $2/month but doesn't work
  • performance-1x: $7/month but actually functions

Do the math - that's roughly 3.5x more expensive for basic functionality that worked fine before their throttling disaster. AWS and Google do similar throttling, but they don't spring it on you without warning like Fly.io did.

Critical Performance Questions: CPU Quotas and Optimization

Q

Why did my deploy time jump from 30 seconds to 8+ minutes?

A

The January 2025 CPU quota enforcement throttles shared vCPUs to 1/16th of a core (6.25% baseline). Your application's startup process - Django loading models, Rails precompiling assets, Node.js warming V8 - now gets throttled during the most CPU-intensive phase. New machines start with only 5 seconds of burst balance, nowhere near enough for typical startup sequences. My Rails app went from about 45 seconds to deploy to over 10 minutes overnight.

Q

Should I upgrade all my shared CPUs to performance CPUs?

A

**For anything you actually care about: yes, because shared CPUs are now basically useless.** The math sucks but it's reality: shared-cpu-1x at $0.0027/hour takes 8+ minutes to deploy a simple Django app, while performance-1x at $0.01/hour actually fucking works. $7/month beats explaining to your users why your staging environment is down for 10 minutes during a "quick" deploy.

For hobby projects that you check once a month, shared CPUs might work with blue-green deployments (if you don't use volumes).

Q

Will scaling to shared-cpu-8x help with performance?

A

No, surprisingly not. The baseline CPU quota remains 6.25% regardless of the number of shared vCPUs. Users reported identical throttling on shared-cpu-8x as shared-cpu-1x. The additional vCPUs only help if your application can effectively use parallel processing while staying within the collective quota limits.

Q

Why does restarting my machine after deployment fix the slowness?

A

This is a documented workaround that proves their throttling system is buggy as hell.

I spent 3 hours debugging why our Rails 7.1.2 app was throwing `ActionController::UnknownFormat` errors and responding in 2+ seconds after deployment, then some random person on Discord mentioned restarting fixes it. Sure enough - restart the machine and suddenly it's fast again. The burst balance clearly doesn't reset properly during deployment, but Fly.io won't admit their system is broken so they call it a "feature."

Q

How can I monitor CPU throttling on my applications?

A

Check the Fly.io Metrics dashboard for these key indicators:

  • fly_instance_cpu_throttle - Shows throttling time in centiseconds
  • fly_instance_cpu_balance - Your current burst balance
  • fly_instance_cpu_baseline - Your baseline quota (should be 0.0625 for shared CPUs)

You can also access these via the Prometheus API or managed Grafana instance.
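If you'd rather alert on these programmatically than stare at Grafana, you can hit the Prometheus query endpoint directly. This is a minimal sketch - the endpoint shape (`api.fly.io/prometheus/<org>`) and bearer-token auth are my reading of Fly's metrics docs, and the org slug, token, and app name are placeholders you'd substitute:

```python
# Sketch: query the throttle metric via Fly's hosted Prometheus API.
# Endpoint shape and auth header are assumptions from Fly's metrics docs.
import json
import urllib.parse
import urllib.request

def build_query_url(org: str, promql: str) -> str:
    base = f"https://api.fly.io/prometheus/{org}/api/v1/query"
    return base + "?" + urllib.parse.urlencode({"query": promql})

def fetch_throttle(org: str, token: str, app: str) -> dict:
    # Rate of throttled centiseconds over the last 5 minutes, per instance.
    promql = f'rate(fly_instance_cpu_throttle{{app="{app}"}}[5m])'
    req = urllib.request.Request(
        build_query_url(org, promql),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (needs real credentials to actually run):
# data = fetch_throttle("my-org", "my-fly-api-token", "my-app")
```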

Q

Can I optimize my application to work better with CPU quotas?

A

Startup optimization strategies:

  • Lazy load heavy dependencies instead of loading everything at boot
  • Pre-build assets in your Docker image rather than at runtime
  • Use blue-green deployments to avoid health check timeouts (requires giving up volumes)
  • Consider splitting CPU-heavy initialization into background jobs

Runtime optimization:

  • Profile your application to identify CPU hotspots during normal operation
  • Implement efficient caching to reduce CPU-intensive operations
  • Use asynchronous processing for heavy workloads
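The caching bullet is the cheapest win. A minimal sketch of the idea - memoize a CPU-heavy computation so repeat requests don't burn through your quota (the report function here is purely illustrative):

```python
# Memoize an expensive computation so repeated requests cost ~zero CPU.
from functools import lru_cache

@lru_cache(maxsize=1024)
def render_report(report_id: int) -> str:
    # Stand-in for an expensive aggregation; now it runs once per id.
    return f"report-{report_id}-" + str(sum(i * i for i in range(10_000)))

first = render_report(7)   # pays the CPU cost once
second = render_report(7)  # served from cache, near-zero CPU
print(render_report.cache_info().hits)  # -> 1
```

At a 6.25% baseline, every CPU-second you don't spend twice is a CPU-second of burst balance you keep.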
Q

What about memory optimization - does that help with CPU throttling?

A

Memory won't fix CPU throttling, but optimizing both together improves overall performance and cost efficiency. High memory usage can trigger swapping, which increases CPU usage and makes throttling worse. Monitor fly_instance_memory_swap_free and ensure your application stays within allocated RAM limits.

Q

Is there any way to get more burst balance for startup?

A

Currently, Fly.io provides 5 seconds of initial burst balance, and they've indicated this might be adjusted based on feedback. The accrual rate is tied to your CPU quota - idle time below your baseline accumulates as burst balance, but at the low 6.25% baseline, accumulation is painfully slow.

Q

Should I migrate away from Fly.io because of these changes?

A

If you're running anything important, fuck yes. I moved two production apps to Railway after this disaster - no CPU throttling bullshit and the pricing is actually honest. Fly.io's global edge stuff is nice, but not when your app can't deploy reliably and you're paying roughly 3.5x more just to have basic functionality.

Q

Will Fly.io add intermediate CPU tiers between shared and performance?

A

Fly.io has indicated they're "quite likely" to add other vCPU options in the future, potentially offering something between the current 6.25% and 100% allocations. However, no timeline has been provided, and developers need solutions today, not promises for future improvements.

Practical Performance Optimization Strategies for Fly.io

The CPU quota disaster broke every optimization strategy that actually worked before January 2025. I spent 6 hours trying every Docker trick in the book and still got 6-minute deploy times for a basic Node.js app. Here's what actually works now that Fly.io throttled shared CPUs into oblivion.

Docker Tricks That Actually Help Now

I learned this the hard way - put all your heavy shit in the Dockerfile, don't run it at startup where Fly.io will throttle you to death. Pre-compile everything during the build phase.

Docker Multi-stage Build Optimization

# Good: Pre-build assets during image creation
RUN npm run build:production
RUN bundle exec rails assets:precompile

# Bad: These run every time the container starts
CMD ["sh", "-c", "npm run build && rails s"]

Lazy loading saved my ass - don't load everything at startup. Our Django app was loading dozens of models at boot and it would timeout. Now I lazy-load models only when needed and startup is actually reasonable.
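The lazy-loading pattern generalizes beyond Django. Here's a minimal sketch of the idea in plain Python - defer a heavy import until first use instead of paying for it at boot while throttled (the class and names are illustrative, and `json` stands in for your actual heavy dependency):

```python
# Lazy-load sketch: import a module on first access, not at startup.
import importlib

class LazyModule:
    """Proxy that imports the real module on first attribute access."""
    def __init__(self, name: str):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only called for attributes not set in __init__, i.e. real usage.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

# At boot this costs nothing; the module only loads when first touched.
heavy = LazyModule("json")
print(heavy.dumps({"ok": True}))  # -> {"ok": true}
```

The boot-time CPU cost moves to the first request that actually needs the dependency, after health checks have already passed.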

Background Processing: Move CPU-intensive startup tasks to background jobs that can run after the health checks pass. I use Sidekiq for Ruby apps, Celery for Python, and Bull for Node.js to offload the heavy shit from startup.
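You don't always need a full job queue for this. The core trick - respond to health checks immediately and do the heavy warmup in the background - fits in a few lines. A sketch of the pattern (the warmup body and endpoint are illustrative; in production you'd use Sidekiq/Celery/Bull as above):

```python
# Move CPU-heavy warmup into a background thread so health checks pass
# right away; the job-queue pattern in miniature.
import threading

warmed_up = threading.Event()

def heavy_warmup():
    # Stand-in for model loading / cache priming / JIT warmup.
    sum(i * i for i in range(100_000))
    warmed_up.set()

def health_check() -> str:
    # Respond OK immediately; report "warming" until warmup completes.
    return "ok" if warmed_up.is_set() else "ok (warming)"

threading.Thread(target=heavy_warmup, daemon=True).start()
print(health_check())   # responds instantly, possibly "ok (warming)"
warmed_up.wait(timeout=5)
print(health_check())   # -> ok
```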

Memory Tricks That Actually Help

Garbage Collection Tuning: Language-specific garbage collection can trigger CPU spikes that push you over quota limits. Configure GC settings for your runtime:

  • Node.js: Use `--max-old-space-size` to control V8 heap limits and reduce GC pressure
  • Ruby: I spent a weekend tuning Ruby GC settings and `MALLOC_ARENA_MAX=2` made the biggest difference - learned that the hard way after Rails kept timing out on startup
  • Python: disabling GC during Django startup actually helps with the throttling
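The Python trick mentioned above is just the stdlib `gc` module. A minimal sketch - pause the collector during the allocation-heavy startup phase, then restore it (the initializer is a stand-in for your real boot work):

```python
# Pause the garbage collector during allocation-heavy startup, then
# restore it. Leaving GC off in steady state would be a memory leak.
import gc

def run_startup(initializer):
    gc.disable()          # avoid GC churn while building lots of objects
    try:
        initializer()     # heavy boot work: imports, model loading, ...
    finally:
        gc.enable()       # always turn it back on
        gc.collect()      # one sweep to clean up startup garbage

run_startup(lambda: [object() for _ in range(100_000)])
print(gc.isenabled())  # -> True
```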

Memory Profiling: High memory usage triggers swapping, which increases CPU consumption and makes throttling worse. Watch fly_instance_memory_swap_free in the Fly.io metrics dashboard to catch swapping before it eats your CPU quota.

Use tools like Valgrind for C/C++, memory_profiler for Python, or Node.js built-in heap profiling to identify memory bottlenecks that contribute to CPU throttling.

How to Size Your Instances Without Going Broke

The new economics make instance sizing critical. Here's what I figured out after burning through my budget testing this shit:

  • Hobby projects: shared-cpu-1x with autostop enabled, accept slow deploys
  • Development/staging: shared-cpu-2x or shared-cpu-4x for slightly better startup, but consider blue-green deploys
  • Production: performance-1x minimum for reliable deployment and operation
  • High-traffic production: performance-2x or higher based on actual load testing

Regional Distribution: Spread instances across regions to reduce individual instance load. Fly.io's WireGuard mesh networking makes this operationally simple, and regional distribution can improve both performance and reliability.

Autoscaling Configuration: Set up autoscaling based on CPU metrics rather than just concurrent connections. Monitor fly_instance_cpu metrics and scale before hitting throttling limits.

Deployment Strategy Optimization

Blue-Green Deployments: If you don't use volumes, blue-green deployment strategy works great until you realize you lose persistent storage - which rules out most real applications. I tried this for our Rails app and it worked beautifully until we needed file uploads and realized we'd have to rebuild our entire storage architecture. Also breaks session storage if you're not using Redis or some external store.

[deploy]
  strategy = "bluegreen"

[http_service]
  processes = ["web"]
  internal_port = 8080
  force_https = true

  # Give instances more time to warm up before checks count against them
  [[http_service.checks]]
    grace_period = "30s"

Rolling Deployment Tuning: If you must use rolling deployments (volumes, stateful applications), adjust these settings:

  • Increase health check timeouts to accommodate slower startups
  • Reduce the number of concurrent replacements during deployment
  • Consider using fly deploy --strategy immediate for faster rollbacks

Monitoring and Alerting Setup

The metrics that actually matter (learned this the expensive way):

  • fly_instance_cpu_throttle - Throttling time indicates performance issues
  • fly_instance_cpu_balance - Low balance predicts future throttling
  • fly_app_http_response_time_seconds - End-user impact measurement
  • fly_edge_http_response_time_seconds - Global performance perspective

Set up alerts for:

  • CPU throttling over 10% of total time
  • Burst balance below 60 seconds
  • Response times above your SLA thresholds
  • Deploy times exceeding expected duration
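Those thresholds are easy to encode in whatever alerting glue you run. A sketch (the SLA value and alert names are illustrative, not Fly.io conventions):

```python
# The alert thresholds above as a checkable function.
def alerts(throttle_pct: float, burst_balance_s: float,
           p95_response_s: float, sla_s: float = 0.5) -> list[str]:
    fired = []
    if throttle_pct > 10:
        fired.append("cpu-throttling")      # throttled >10% of the time
    if burst_balance_s < 60:
        fired.append("low-burst-balance")   # <60s of burst banked
    if p95_response_s > sla_s:
        fired.append("slow-responses")      # over your SLA threshold
    return fired

print(alerts(throttle_pct=22, burst_balance_s=12, p95_response_s=0.3))
# -> ['cpu-throttling', 'low-burst-balance']
```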

Framework-specific bullshit I had to figure out (after 2 hours of prod downtime):

Django Applications:

  • Use uWSGI's --lazy-apps flag (or avoid Gunicorn's --preload) to delay model loading until workers fork
  • Pre-build static files in Docker image, not at startup
  • Consider django-extensions for profiling startup bottlenecks
  • Tune DATABASES['default']['CONN_MAX_AGE'] to reduce connection overhead
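For the CONN_MAX_AGE tip, the setting lives in Django's DATABASES config. A minimal settings sketch - the engine, database name, and 60-second value are illustrative, not recommendations:

```python
# Django settings sketch: keep DB connections alive so each request
# doesn't pay reconnect CPU. Values here are illustrative.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",
        # Reuse connections for up to 60s instead of the default 0,
        # which opens and closes a connection on every request.
        "CONN_MAX_AGE": 60,
    }
}
print(DATABASES["default"]["CONN_MAX_AGE"])  # -> 60
```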

Rails Applications:

  • Enable config.eager_load = true in production to front-load initialization
  • Use bootsnap gem for faster boot times
  • Pre-compile assets in Docker build, not at startup
  • Consider derailed_benchmarks gem for memory profiling

Node.js Applications:

  • Use --expose-gc flag and manually trigger garbage collection during idle periods
  • Implement clustering with PM2 or Node.js cluster module to distribute load
  • Pre-build and cache expensive computations in the Docker image
  • Monitor V8 heap usage with --heap-prof during development

Cost Optimization Strategies

Hybrid Instance Strategy: Use performance instances for critical services and shared instances for less demanding workloads like background workers or admin interfaces.

Autostop Configuration: For development and staging environments, autostop functionality can significantly reduce costs, despite the cold start penalty.

Resource Right-sizing: Regularly review your instance utilization. Over-provisioned performance instances are expensive, but under-provisioned shared instances are unreliable. Use the metrics data to find the sweet spot.

The new CPU quota reality has turned performance optimization from a nice-to-have into a mission-critical requirement. Applications that worked fine with previous Fly.io defaults now require careful tuning and potentially higher-tier instances to maintain acceptable performance and deployment reliability.

CPU Performance Tiers: Cost vs. Performance Analysis

| CPU Type | Baseline | Monthly Cost (24/7) | Deploy Time | Real-World Performance | Best Use Case |
|---|---|---|---|---|---|
| shared-cpu-1x | 6.25% (1/16 core) | $2.00 | 8-20 minutes (or longer if Mercury is in retrograde) | Severe throttling during startup, barely usable for production | Hobby projects, demos, services that can tolerate very long deploys |
| shared-cpu-2x | 6.25% (still 1/16 core) | $2.00 | maybe 8-15 minutes, hard to tell the difference | No meaningful improvement over 1x due to quota limits | Not recommended - same throttling, no cost savings |
| shared-cpu-4x | 6.25% (still 1/16 core) | $2.00 | 6-12 minutes if you're lucky | Marginal improvement if app can parallelize within quota | CPU-parallel workloads that can work within severe limits |
| shared-cpu-8x | 6.25% (still 1/16 core) | $2.00 | still painful, maybe 5-10 minutes | Slight improvement but still throttled heavily | Background workers, batch processing with patience |
| performance-1x | 100% (full core) | $7.30 | 30-60 seconds | Normal, reliable performance for most workloads | Production web applications, APIs, most real workloads |
| performance-2x | 200% (2 full cores) | $14.60 | 15-30 seconds | Fast deployment and operation, good for CPU-heavy apps | High-traffic applications, CPU-intensive processing |
| performance-4x | 400% (4 full cores) | $29.20 | 10-20 seconds | Excellent performance for demanding applications | Large production apps, heavy compute workloads |
| performance-8x | 800% (8 full cores) | $58.40 | 5-15 seconds | Maximum performance tier, handles extreme workloads | Enterprise applications, ML inference, heavy databases |
