ECS vs EKS - which one should I pick?

If you're already on AWS and just want containers to work without learning Kubernetes, use ECS. If you need portability or your team knows K8s, use EKS. ECS has no control plane costs but locks you into AWS. EKS costs [$0.10/hour per cluster](https://aws.amazon.com/eks/pricing/) but gives you standard Kubernetes.

Why does my Fargate task take forever to start?

Fargate has a 1-3 minute cold start time because AWS needs to provision the underlying infrastructure. This is just how it works. If you need faster startup, use EC2 launch type with pre-warmed instances, or keep your services scaled to at least 1 task so you have warm containers ready.

How much is this actually going to cost me?

Fargate pricing at [$0.04048 per vCPU-hour and $0.004445 per GB-hour](https://aws.amazon.com/fargate/pricing/) (US East rates) adds up fast. A small container (0.5 vCPU, 1GB RAM) costs about $18/month in compute alone if you run it 24/7. Don't forget about CloudWatch logs ($0.50/GB ingested), data transfer, and NAT Gateway costs for internet access. I've seen bills double because of logging.
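The $18 figure is easy to reproduce. This is a rough sketch using the two rates quoted above; the 730-hour month and the function name are my assumptions, and it deliberately ignores logs, data transfer, and NAT Gateway.

```python
HOURS_PER_MONTH = 730  # assumption: average month (8,760 hours / 12)

VCPU_HOUR_USD = 0.04048   # per-vCPU-hour rate quoted above
GB_HOUR_USD = 0.004445    # per-GB-hour rate quoted above


def fargate_monthly_cost(vcpu: float, memory_gb: float,
                         hours: float = HOURS_PER_MONTH) -> float:
    """Compute-only cost in USD of one Fargate task running for `hours`."""
    hourly = vcpu * VCPU_HOUR_USD + memory_gb * GB_HOUR_USD
    return hourly * hours


if __name__ == "__main__":
    cost = fargate_monthly_cost(vcpu=0.5, memory_gb=1)
    print(f"${cost:.2f} per month")  # prints "$18.02 per month"
```

Multiply by your task count before you decide the number looks small.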

Can I run Windows containers on ECS?

Yes. Windows containers run on EC2 Windows Server instances, and Fargate has supported Windows Server containers since late 2021. Either way they cost more than Linux because of Microsoft licensing. Also, Windows containers are about as fun as debugging JavaScript in Internet Explorer - they work, but you'll question your life choices.

My task just says "PENDING" forever, what's wrong?

Usually it's one of these: insufficient CPU/memory capacity in your cluster (error: `InsufficientCapacity`), ENI limits in your subnet (ENI provisioning failures), or network and security group issues that stop the task from pulling its image (error: `CannotPullContainerError`). Security groups that block the ALB health check are a close cousin: the task starts, fails the check, and gets replaced in a loop. I spent 2 hours once debugging this before realizing my security group wasn't allowing traffic on port 80. Check the ECS console events tab - it'll tell you exactly what's wrong instead of making you guess.
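If you'd rather not click through the console, the service events are also available from the API. Here's a minimal sketch, assuming you pass in a boto3 ECS client; the helper name and the cluster/service names are placeholders I made up. It relies on `describe_services`, which returns an `events` list with `createdAt` and `message` fields.

```python
from typing import Any


def recent_service_events(ecs_client: Any, cluster: str, service: str,
                          limit: int = 10) -> list[str]:
    """Return the newest service event messages, most recent first."""
    response = ecs_client.describe_services(cluster=cluster, services=[service])
    services = response.get("services", [])
    if not services:
        return []  # wrong cluster or service name
    events = sorted(services[0].get("events", []),
                    key=lambda event: event["createdAt"], reverse=True)
    return [event["message"] for event in events[:limit]]


# Usage (needs boto3 and AWS credentials; names are placeholders):
#   import boto3
#   for message in recent_service_events(boto3.client("ecs"), "my-cluster", "my-service"):
#       print(message)
```

Taking the client as a parameter keeps the helper easy to test with a stub.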

Why can't my containers talk to each other?

Service discovery DNS can take 30+ seconds to propagate, so your app might be trying to connect before the DNS record exists. Also check security groups - each Fargate task gets its own ENI, so the security group rules apply at the task level, not the instance level.
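The propagation delay is easiest to handle on the client side: retry the lookup for a while instead of failing on the first miss. This is a sketch using only the standard library; the function name, the 60-second budget, and the backoff cap are my own choices, not anything ECS prescribes.

```python
import socket
import time
from typing import Callable


def wait_for_dns(
    hostname: str,
    timeout_s: float = 60.0,
    initial_delay_s: float = 1.0,
    resolve: Callable[[str], str] = socket.gethostbyname,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry DNS resolution with capped exponential backoff; return the IP."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        try:
            return resolve(hostname)
        except OSError:  # socket.gaierror is a subclass of OSError
            if time.monotonic() + delay > deadline:
                raise TimeoutError(f"{hostname} did not resolve within {timeout_s}s")
            sleep(delay)
            delay = min(delay * 2, 10.0)  # cap the wait between attempts
```

Call it once at startup, before opening connections to the other service. The `resolve` and `sleep` parameters exist only so the loop can be exercised without real DNS.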

How do I handle secrets in ECS?

Use [Secrets Manager](https://aws.amazon.com/secrets-manager/) or [Parameter Store](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html) and reference them in your task definition. ECS pulls secrets at runtime and injects them as environment variables. Don't put secrets directly in your task definition - they'll show up in the console and logs.
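In the task definition this means using the container's `secrets` list (pairs of `name` and `valueFrom`) rather than plain `environment` entries. Below is a sketch of one container definition, built as a Python dict so you can dump it to JSON. Every name, account ID, and ARN here is a made-up placeholder.

```python
import json

# Hypothetical container definition; substitute your own image and ARNs.
container_definition = {
    "name": "web",
    "image": "example/web:latest",
    "secrets": [
        {
            # Environment variable name the container will see.
            "name": "DB_PASSWORD",
            # Placeholder Secrets Manager ARN.
            "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/db-password",
        },
        {
            "name": "API_KEY",
            # Placeholder Parameter Store ARN.
            "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/api-key",
        },
    ],
}

print(json.dumps(container_definition, indent=2))
```

The task execution role needs permission to read those values (for example `secretsmanager:GetSecretValue` and `ssm:GetParameters`), otherwise the task fails before your container starts.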

Should I run databases in ECS?

No. Just use [RDS](https://aws.amazon.com/rds/) or another managed database service. Running stateful services in containers is a pain in the ass - you'll spend more time managing storage and backups than solving actual problems. Save yourself the headache.

ECS vs plain EC2 - what's the point?

ECS gives you health monitoring, rolling deployments, load balancer integration, and service discovery out of the box. You could build all this yourself on EC2, but why? ECS costs the same as plain EC2 (for EC2 launch type) but handles all the orchestration complexity.

My deployment keeps failing, what now?

Check the service events in the ECS console first - they usually tell you exactly what's wrong. Common issues: health check failures (check your ALB target group settings - health check path `/health` returning 404), resource constraints (task definition requesting 4GB but instance only has 2GB free), or networking problems (security groups, subnets). The error messages are actually pretty helpful if you read them. I've debugged deployments that failed because the health check timeout was 5 seconds but the app took 8 seconds to start responding.

What are the actual limits I'll hit?

The [official limits](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-quotas.html) say 1,000 services per cluster and 5,000 tasks per service, but there's a catch: service discovery limits you to 1,000 tasks per service because of Cloud Map restrictions. You'll hit ENI limits in your subnets before hitting most other limits.
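The ENI ceiling is mostly subnet arithmetic: in awsvpc mode each task takes one private IP, and AWS reserves five addresses in every subnet. Here's a quick sketch for sizing subnets; the function name and the example CIDR blocks are just illustrations.

```python
import ipaddress

# AWS reserves the first four addresses and the last one in every subnet.
AWS_RESERVED_PER_SUBNET = 5


def max_tasks_in_subnet(cidr: str, ips_in_use: int = 0) -> int:
    """Upper bound on awsvpc tasks a subnet can hold (one IP per task ENI)."""
    network = ipaddress.ip_network(cidr)
    usable = network.num_addresses - AWS_RESERVED_PER_SUBNET
    return max(usable - ips_in_use, 0)


if __name__ == "__main__":
    print(max_tasks_in_subnet("10.0.1.0/24"))                 # 251
    print(max_tasks_in_subnet("10.0.2.0/27", ips_in_use=7))   # 20
```

Load balancers, NAT Gateways, and anything else in the subnet eat into the same pool, so treat the result as a ceiling.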

How do I do blue-green deployments?

ECS supports blue-green through [CodeDeploy](https://aws.amazon.com/codedeploy/) integration, but honestly, just use rolling deployments unless you have a specific reason not to. They're simpler and work fine for most use cases. Blue-green is overkill for most applications.

Can I use ECS on-premises?

[ECS Anywhere](https://aws.amazon.com/ecs/anywhere/) lets you run ECS on your own hardware for [$0.01025/hour per instance](https://aws.amazon.com/ecs/anywhere/pricing/). It works, but you're paying AWS to manage containers on your own servers. If you want on-premises container orchestration, Kubernetes might make more sense.

How do I debug what's happening in my containers?

Use [ECS Exec](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html) to shell into running containers - it's like SSH but goes through AWS Session Manager. Enable [Container Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html) for detailed metrics, but be prepared for the CloudWatch costs to add up.
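The actual call is `aws ecs execute-command`. Here's a small sketch that builds the command line; the helper name, cluster, task ID, and container name are all placeholders. It assumes the AWS CLI and the Session Manager plugin are installed locally, and that execute-command was enabled when the task was launched.

```python
import shlex


def ecs_exec_command(cluster: str, task: str, container: str,
                     command: str = "/bin/sh") -> list[str]:
    """Build the argv for an interactive ECS Exec session."""
    return [
        "aws", "ecs", "execute-command",
        "--cluster", cluster,
        "--task", task,
        "--container", container,
        "--interactive",
        "--command", command,
    ]


if __name__ == "__main__":
    argv = ecs_exec_command("my-cluster", "0123456789abcdef", "web")
    print(shlex.join(argv))  # paste into a terminal, or pass argv to subprocess.run
```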

What networking mode should I use?

Use awsvpc mode (default for Fargate) where each task gets its own ENI. It's more secure and easier to understand than bridge mode. Host mode is only useful for special cases where you need direct host network access.

Amazon ECS: AI-Optimized Technical Reference

What ECS Is

AWS container orchestration service that manages Docker containers without requiring Kubernetes expertise. Three main components:

Control Plane: AWS-managed scheduling and monitoring (vendor lock-in trade-off)
Data Plane: Container execution environment (EC2, Fargate, or ECS Managed Instances)
Task Definitions: JSON configuration files (verbose compared to Docker Compose)

Launch Types and Cost Reality

Fargate

Cost: $0.04048 per vCPU-hour, $0.004445 per GB-hour
Startup Time: 1-3 minutes (problematic for real-time applications)
Use Case: Teams wanting zero infrastructure management
Critical Limitation: No host-level access, no GPU support
Hidden Cost: Pay for allocated resources, not usage

EC2 Launch Type

Cost: EC2 pricing + no additional charges
Trade-off: Lower cost but requires server management
Failure Mode: Instance death kills all containers
Optimization: Use Reserved Instances and Spot for cost savings

ECS Managed Instances (New - Sept 2025)

Status: Too new for production (you become beta tester)
Promise: AWS handles patching while preserving EC2 flexibility
Risk: Pricing unknown, unproven in production

Production Failure Modes

Task Placement Issues

Spread Placement: Uneven distribution causes AZ overloading
Binpack Placement: Single instance failure affects multiple services
Reality: Custom constraints that can't be satisfied tend to fail with little or no useful error output

Scaling Limitations

Service Auto Scaling: 5+ minute CloudWatch metric lag makes reactive scaling ineffective
Capacity Provider Scaling: 2-5 minute instance provisioning creates PENDING state delays
Service Limits: 1,000 tasks per service with service discovery (Cloud Map restriction)
Cluster Limits: 1,000 services per cluster maximum

Networking Gotchas

ENI Limits: Each Fargate task consumes one ENI (subnet capacity planning critical)
DNS Propagation: 30+ second delays for service discovery
Security Groups: Applied per-task in Fargate (not instance-level)
Load Balancer Health Checks: Independent timeout settings can fail deployments

Cost Optimization Strategies

Spot Instance Usage

Fargate Spot: 70% savings but 2-minute termination notice
EC2 Spot: 90% savings but requires resilient application design
Interruption Rate: Varies by region and time

Regional Cost Differences

US East: $0.04048 per vCPU-hour baseline
São Paulo: $0.0696 per vCPU-hour (72% more expensive)
Impact: Significant for global deployments
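The 72% figure follows directly from the two vCPU rates listed above. A tiny sketch of the arithmetic (the function name is made up):

```python
def regional_premium_pct(baseline_rate: float, regional_rate: float) -> float:
    """Percentage by which a regional rate exceeds the baseline rate."""
    return (regional_rate / baseline_rate - 1) * 100


if __name__ == "__main__":
    premium = regional_premium_pct(0.04048, 0.0696)
    print(f"{premium:.1f}% more per vCPU-hour")  # prints "71.9% more per vCPU-hour"
```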

Hidden Costs

CloudWatch logs: $0.50 per GB ingested
NAT Gateway: Required for Fargate internet access
Data transfer charges
Container Insights: Additional CloudWatch costs

When ECS Makes Sense

Ideal Use Cases

Batch Processing: Tolerates 2-5 minute startup times
AWS-Native Shops: Already using RDS, S3, other AWS services
Teams Avoiding Kubernetes: Lack of container orchestration expertise
Financial/Healthcare: Simplified compliance through AWS shared responsibility

Performance Characteristics

Scientific Computing: Good for overnight processing, poor for real-time
Media Processing: Excellent with spot instances (80%+ cost savings)
AI Inference: Cold start times problematic, requires pre-warming
AI Training: SageMaker usually better choice

Decision Matrix: ECS vs Alternatives

| Requirement | ECS | EKS | Recommendation |
| --- | --- | --- | --- |
| AWS Lock-in Acceptable | ✓ | ✗ | Choose ECS |
| Multi-cloud Portability | ✗ | ✓ | Choose EKS |
| Kubernetes Expertise Available | ✗ | ✓ | Choose EKS |
| Simple Container Deployment | ✓ | ✗ | Choose ECS |
| Advanced Scheduling Needs | ✗ | ✓ | Choose EKS |
| Control Plane Cost Sensitivity | ✓ | ✗ ($0.10/hour) | Choose ECS |

Critical Configuration Warnings

Task Definition Gotchas

Resource Allocation: Pay for requested resources, not actual usage
Memory Limits: Exit code 137 (SIGKILL) usually means the container was killed for exceeding its memory limit
CPU Units: 1024 CPU units = 1 vCPU (non-intuitive scaling)
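The CPU unit conversion trips people up when they write task definitions by hand. Here's a sketch of the conversion plus the task-level size fields it feeds; the helper name is an assumption, and the 0.5 vCPU / 1 GB size is the small container used in the cost example earlier.

```python
def vcpu_to_cpu_units(vcpu: float) -> int:
    """Convert vCPUs to ECS CPU units (1024 units = 1 vCPU)."""
    return int(round(vcpu * 1024))


# Task-level sizes are strings in the task definition; memory is in MiB.
task_size = {
    "cpu": str(vcpu_to_cpu_units(0.5)),  # "512"
    "memory": str(1 * 1024),             # "1024" (1 GB)
}

if __name__ == "__main__":
    print(task_size)
```

Fargate only accepts specific cpu/memory combinations, so check the pairing against the docs rather than picking arbitrary values.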

Security Configuration

IAM Roles: Assign per-task, not per-service
Secrets Management: Use Parameter Store/Secrets Manager, never hardcode
ECS Exec: Must be enabled at service level for debugging access

Production Settings That Fail

Default Health Check: 5-second timeout often insufficient for application startup
Rolling Deployment: Default minimum healthy percent can cause downtime
Service Discovery: DNS caching issues with short TTL values
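To see how a rolling deployment setting turns into downtime, work out the window of task counts ECS may use during an update. This is a minimal sketch, assuming the rounding as I understand the ECS docs to describe it: minimum healthy percent rounds up, maximum percent rounds down. The function name is mine.

```python
def rolling_deployment_bounds(desired_count: int,
                              minimum_healthy_percent: int,
                              maximum_percent: int) -> tuple[int, int]:
    """Return (min tasks kept running, max tasks allowed) during a deployment."""
    # Integer arithmetic avoids float rounding surprises.
    min_running = -(-desired_count * minimum_healthy_percent // 100)  # ceiling
    max_running = desired_count * maximum_percent // 100              # floor
    return min_running, max_running


if __name__ == "__main__":
    print(rolling_deployment_bounds(4, 50, 200))   # (2, 8)
    print(rolling_deployment_bounds(1, 0, 100))    # (0, 1): old task stops first
    print(rolling_deployment_bounds(1, 100, 100))  # (1, 1): no room to start a new task
```

A lower bound of 0 means the service can briefly have nothing running. Equal lower and upper bounds mean the deployment has no headroom to swap tasks.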

Troubleshooting Common Issues

PENDING Tasks

InsufficientCapacity: Cluster lacks CPU/memory resources
ENI Provisioning Failed: Subnet ENI limits exceeded
CannotPullContainerError: Network/security group issues

Service Start Failures

Health Check Failures: Verify ALB target group configuration
Resource Constraints: Task definition exceeds available capacity
Security Group Rules: Check task-level network access

Performance Problems

Slow Response Times: CloudWatch metrics lag prevents effective scaling
Container Crashes: Memory limits too low, check Container Insights
Network Latency: Service discovery DNS propagation delays

Resource Requirements

Technical Expertise Needed

Minimal: Basic AWS services knowledge, Docker fundamentals
Learning Curve: 1-2 weeks for basic proficiency
Compared to Kubernetes: 10x easier to achieve production deployment

Time Investment

Initial Setup: 1-3 days for basic service
Production Hardening: 1-2 weeks for proper monitoring, scaling, security
Operational Overhead: Minimal ongoing maintenance vs self-managed K8s

Team Size Requirements

Minimum: 1 engineer with AWS experience
Optimal: 2-3 engineers for production workloads
DevOps Savings: No dedicated Kubernetes specialists required

Migration Considerations

From VM/Bare Metal

Containerization Effort: Major application refactoring likely needed
Stateful Services: Move to managed AWS services (RDS, ElastiCache)
Timeline: 3-6 months for typical enterprise application

From Kubernetes

Vendor Lock-in Risk: Complete AWS dependency
Feature Loss: Advanced scheduling, custom operators unavailable
Cost Change: Often 20-30% increase due to Fargate pricing

Exit Strategy

Portability: Minimal - requires complete rewrite for other platforms
Container Images: Portable, but orchestration configuration is not
Timeline: 6-12 months to migrate off ECS to another platform

Useful Links for Further Investigation

Resources That Actually Help

Link	Description
ECS Developer Guide	The official docs are actually decent. Start here for task definitions and service configuration. The troubleshooting section is surprisingly useful.
ECS API Reference	When you need to automate ECS with code. The examples are helpful, and the error codes section will save you time debugging.
Fargate Pricing Calculator	Essential for figuring out if Fargate will bankrupt you. Compare regions - pricing varies a lot.
ECS Troubleshooting Guide	Bookmark this. You'll need it when things inevitably break. Covers the most common "WTF is happening" scenarios.
ECS CloudFormation Reference Architecture	Working code examples for microservices deployment with ECS and CloudFormation. Much better than trying to piece together docs.
ECS FireLens Examples	Sample logging architectures for ECS and Fargate. Real patterns you can copy and adapt.
Containers on AWS Blog	Occasionally has useful real-world case studies. Skip the marketing posts, look for the technical deep-dives.
ECS Workshop	Hands-on tutorials that actually work. Good for learning beyond the basics.
AWS Copilot CLI	Command-line tool that makes ECS deployment less painful. Generates sensible defaults and handles a lot of the AWS complexity.
Terraform ECS Modules	If you're using Infrastructure as Code, these modules are solid. Better than writing Terraform from scratch.
ECS CLI (Deprecated but still useful)	AWS is deprecating this in favor of Copilot, but it still works for simple use cases.
AWS Community Forums	Official AWS community forums where real engineers ask real questions. Search for "ECS" to find solutions to problems you didn't know you'd have.
AWS re:Invent ECS Sessions	Getting up and running with Amazon ECS from re:Invent 2020. Skip the marketing sessions, watch the deep technical talks.
AWS Events YouTube Channel	Official AWS Events channel with re:Invent sessions and webinars. Search for "ECS" to find specific container talks.
Stack Overflow ECS Questions	When Google fails you, Stack Overflow probably has the answer. The ECS tag is pretty active.
Container Insights Setup	You'll need this for production. Just be prepared for the CloudWatch costs to add up quickly.
ECS Exec Documentation	How to shell into running containers when things go sideways. Much better than trying to debug through logs alone.
App2Container	AWS tool for containerizing legacy apps. Works better than expected, though you'll still need to do the hard work of making your app stateless.
