Amazon S3 - Object Storage That Actually Works

What S3 Actually Is (Skip the Marketing BS)

S3 launched in 2006 when AWS was still figuring out what "cloud" meant.

Nearly 20 years later, it's become the storage everyone copies because it solved the fundamental problem: reliable, scalable object storage without the nightmare of managing your own infrastructure. Gartner consistently ranks AWS as a leader in cloud storage services.

Here's what you need to know:

S3 stores files (they call them "objects") in buckets. Think of it as a giant key-value store where the key is the file path and the value is your data. Dead simple concept, but the implementation handles edge cases that would make you want to quit engineering.

Why Object Storage Doesn't Suck Like File Systems

Traditional file systems shit the bed when you hit certain limits. Try storing 100 million files in a directory and watch your server cry. S3 doesn't have this problem because it's not pretending to be a file system.

Every file is an "object" with a [key (the path) and metadata](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html).

No directories, no inodes, no filesystem corruption when your server decides to reboot during a write operation. Spotify stores millions of tracks using this exact architecture for their streaming platform.

[Figure: S3 Object Storage Architecture]

# This is how you think about S3 - just key-value pairs
my-bucket/users/123/profile.jpg
my-bucket/uploads/2025/01/18/document.pdf
my-bucket/backups/db-dump-20250118.sql.gz

The 99.999999999% durability sounds like marketing bullshit until you realize they replicate your data across multiple data centers automatically.

I've never actually lost data in S3, which is more than I can say for the RAID arrays I managed in 2010.

Storage Classes (Where Your Money Goes)

S3 has eight storage classes, which sounds complicated until you realize most people only use three:

Standard for stuff you need right now, Intelligent-Tiering for everything else, and Glacier Deep Archive for backups you hope to never touch.

Standard costs $0.023/GB per month for instant access. Glacier Deep Archive drops to $0.00099/GB per month, but a standard retrieval takes up to 12 hours (bulk retrievals up to 48).

The genius move is Intelligent-Tiering: it automatically shuffles data between tiers based on access patterns without you having to guess. Airbnb uses this approach to manage their data lake costs.

I learned this the hard way when our backup costs hit $50K/month because everything was in Standard.

Switching to Intelligent-Tiering cut that to $15K overnight.

AWS doesn't make this obvious in their pricing calculator.

Integrations (The Real Reason People Stick With AWS)

S3 works with pretty much everything AWS makes, which is both a blessing and a trap.

Need a CDN? CloudFront reads from S3.

Want to process files automatically? Lambda triggers when objects change.

Need to run SQL on CSV files? Athena queries S3 directly.

The integration ecosystem is AWS's moat.

Once you're storing everything in S3, using other AWS services becomes trivial. Want to switch to Google Cloud? Good luck migrating petabytes of data and rewriting all your Lambda functions.

I've built data pipelines where S3 events trigger Lambda functions that write to Kinesis streams that feed into EMR clusters that output back to S3. It works, but you're locked into AWS forever.
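For a feel of the first hop in that pipeline, here is a minimal sketch of a Lambda handler that pulls the bucket and key out of an S3 event. The sample event is hand-built and only contains the fields the handler reads; the bucket and key names are made up for illustration.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Return (bucket, key) pairs for every object named in an S3 event."""
    touched = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded (a space shows up as '+'), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        touched.append((bucket, key))
    return touched


# Hand-built sample shaped like an S3 notification (hypothetical names).
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-bucket"},
                "object": {"key": "uploads/2025/01/18/my+report.pdf"},
            }
        }
    ]
}

print(handler(sample_event, None))
```

A real handler would do something with each pair (fetch the object, push to a stream). The decoding step is the part people tend to forget until a filename with a space breaks production.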

Security (Don't Fuck This Up)

S3 security is layered, which means there are multiple ways to accidentally expose your data to the internet. [Bucket policies](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-policies.html) control who can access your bucket, [IAM policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_examples_s3.html) control what users can do, and if you mess up either one, your data is public.

The good news is [encryption is on by default](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingEncryption.html) now.

AWS learned from too many security breaches where people forgot to enable encryption. Still doesn't help if you make your bucket public, though.

For compliance nerds, S3 Object Lock prevents anyone (including you) from deleting or modifying objects for a set period.

Perfect for backups and regulatory requirements where "oops, I deleted everything" isn't an acceptable excuse. Financial institutions rely on this for SEC compliance and FINRA record retention.

Why It's Expensive But Worth It

S3 starts cheap with a 5GB free tier, then hits you with the real costs once you're hooked.

Storage is $0.023/GB in US regions, but the requests add up fast: $0.0004 per 1,000 GETs and $0.005 per 1,000 PUTs.

The real cost is data transfer.

Moving data out of S3 costs $0.09/GB, which gets painful fast if you're serving large files directly. That's why everyone puts CloudFront in front of S3.
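To see how lopsided this gets, here is a rough back-of-the-envelope estimator using the prices quoted above. It ignores the free tier, volume discounts, and regional differences, and the workload numbers are invented, so treat it as a sketch rather than a bill.

```python
# Prices quoted in this article for S3 Standard in US regions.
# They change over time; treat them as assumptions.
STORAGE_PER_GB_MONTH = 0.023
GET_PER_1000 = 0.0004
PUT_PER_1000 = 0.005
EGRESS_PER_GB = 0.09


def monthly_cost(stored_gb, gets, puts, egress_gb):
    """Rough monthly cost in USD, split by line item."""
    return {
        "storage": stored_gb * STORAGE_PER_GB_MONTH,
        "requests": gets / 1000 * GET_PER_1000 + puts / 1000 * PUT_PER_1000,
        "egress": egress_gb * EGRESS_PER_GB,
    }


# Hypothetical workload: 1 TB stored, 10M GETs, 1M PUTs, 5 TB served out.
bill = monthly_cost(stored_gb=1000, gets=10_000_000, puts=1_000_000, egress_gb=5000)
for item, usd in bill.items():
    print(f"{item:>8}: ${usd:,.2f}")
```

With these made-up numbers, egress comes to roughly twenty times the storage charge, which is the whole argument for putting a CDN in front.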

But here's the thing: try building equivalent reliability yourself.

Factor in the cost of hiring ops engineers, buying servers, managing backups, and dealing with data center outages. S3 suddenly looks reasonable. Stripe's engineering team estimates they save millions annually by using S3 instead of self-managed storage infrastructure.

S3 vs Top Cloud Storage Competitors

| Feature | Amazon S3 | Google Cloud Storage | Azure Blob Storage | DigitalOcean Spaces |
|---|---|---|---|---|
| Durability | 99.999999999% (11 9's) | 99.999999999% (11 9's) | 99.999999999% (11 9's) | 99.999999999% (11 9's) |
| Availability | 99.99% designed, 99.9% SLA (Standard) | 99.95% (Standard) | 99.9% (Hot) | 99.9% |
| Storage Classes | 8 classes | 4 classes | 3 tiers | 1 class |
| Max Object Size | 5 TB | 5 TB | 4.77 TB | 50 GB |
| Free Tier | 5 GB/month | 5 GB/month | None | None |

S3 Storage Classes: Choosing the Right Tier for Your Data

Getting S3 storage classes wrong costs companies thousands monthly. The eight classes aren't just pricing tiers - they're designed for specific access patterns and retrieval requirements.

[Figure: S3 Storage Classes Overview]

The Standard Tier Reality Check

S3 Standard at $0.023/GB serves frequently accessed data with sub-100ms latency. Use it for active application data, frequently accessed analytics, and content distribution. The pricing seems high until you factor in 99.99% availability and instant global accessibility.

Spotify keeps their most popular songs in S3 Standard for instant streaming worldwide, while older tracks automatically tier to cheaper storage.

Intelligent-Tiering: Set-and-Forget Optimization

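S3 Intelligent-Tiering charges $0.0025 per 1,000 objects monthly for monitoring, and automatically moves data between its Frequent Access ($0.023/GB) and Infrequent Access ($0.0125/GB) tiers based on access patterns. Objects not accessed for 30 consecutive days move to the Infrequent Access tier and return to Frequent Access when accessed. Objects smaller than 128KB are never monitored or moved; they stay at the Frequent Access rate.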

[Figure: S3 Intelligent-Tiering Flow]

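The math works for unpredictable workloads. If 50% of your data goes untouched monthly, you save about $0.00525/GB per month (half of the $0.0105 gap between the two tiers) before monitoring fees, which only start to hurt when you have lots of small objects.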
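The savings depend heavily on object size, because monitoring is billed per object while storage is billed per GB. A small sketch of that arithmetic follows. The tier prices are the ones quoted above; the monitoring fee of $0.0025 per 1,000 objects is an assumption taken from AWS's published pricing, so check the current page before trusting it.

```python
FREQUENT_PER_GB = 0.023               # Frequent Access tier, USD per GB-month
INFREQUENT_PER_GB = 0.0125            # Infrequent Access tier, USD per GB-month
MONITORING_PER_1000_OBJECTS = 0.0025  # assumption: verify against current pricing


def net_saving_per_gb(idle_fraction, avg_object_mb):
    """Monthly saving per GB versus leaving everything at the frequent rate."""
    gross = idle_fraction * (FREQUENT_PER_GB - INFREQUENT_PER_GB)
    objects_per_gb = 1024 / avg_object_mb
    monitoring = objects_per_gb / 1000 * MONITORING_PER_1000_OBJECTS
    return gross - monitoring


# Half the data idle, three hypothetical average object sizes.
for size_mb in (100, 1, 0.2):
    saving = net_saving_per_gb(0.5, size_mb)
    print(f"{size_mb:>5} MB objects: {saving:+.5f} USD per GB-month")
```

Under these assumptions, large objects keep nearly the full gross saving, while a bucket full of 200KB objects ends up paying more in monitoring than it saves.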

Archive Classes for Long-Term Storage

S3 Glacier Instant Retrieval ($0.004/GB): Archives with millisecond retrieval for compliance data accessed quarterly. Healthcare companies use this for medical records that must be instantly available during audits.

S3 Glacier Flexible Retrieval ($0.0036/GB): Retrieval takes 1-5 minutes if you pay for expedited, 3-5 hours for standard, and 5-12 hours for bulk. Media companies archive raw footage here - accessible when needed but not instant.

S3 Glacier Deep Archive ($0.00099/GB): Retrieval within 12 hours (up to 48 for bulk), built for seven-year retention requirements. Banks store transaction logs here for regulatory compliance.

Express One Zone: Performance for Analytics

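S3 Express One Zone delivers single-digit millisecond latency, and AWS claims data access up to 10x faster than S3 Standard. At $0.16/GB at launch (AWS has cut the price since, so check the current rate), it's expensive but crucial for high-performance analytics workloads. The trade-off is in the name: your data lives in a single Availability Zone.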

Financial firms use Express One Zone for real-time risk calculations where milliseconds matter for trading decisions.

Regional Availability and Lifecycle Policies

Not all classes are available in every region. S3 Express One Zone is limited to major regions, while Glacier classes are globally available. Lifecycle policies automate transitions:

Day 0: Upload to S3 Standard
Day 30: Transition to S3 IA (lifecycle rules go by object age, not by whether anything accessed it)
Day 90: Transition to Glacier Flexible Retrieval
Day 365: Transition to Glacier Deep Archive
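As a rough sketch, that schedule maps onto a lifecycle configuration like the one built below. The rule ID and the `backups/` prefix are hypothetical. The code only builds and prints the dictionary; with boto3 installed and credentials configured, the same dictionary could be passed as `LifecycleConfiguration` to `put_bucket_lifecycle_configuration`.

```python
import json


def build_lifecycle_rule(prefix):
    """Build one rule matching the Day 30 / 90 / 365 schedule above."""
    return {
        "ID": "tier-down-by-age",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},  # Glacier Flexible Retrieval
            {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }


# Hypothetical prefix; an empty prefix would apply the rule to the whole bucket.
config = {"Rules": [build_lifecycle_rule("backups/")]}
print(json.dumps(config, indent=2))
```

Transitions are driven by days since object creation, so an object that gets read every day still moves on schedule. If access patterns matter, Intelligent-Tiering is the tool, not lifecycle rules.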

Storage Class Gotchas That Cost Money

Minimum storage durations: IA classes charge for 30 days minimum, Glacier classes for 90-180 days. Delete an object early, pay for the full minimum period.

Minimum object sizes: IA classes charge for 128KB minimum. Store a 1KB file, pay for 128KB.

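Retrieval costs: Standard doesn't charge for retrievals, but IA and the Glacier classes do. Budget $0.01/GB for IA and for standard Glacier Flexible retrievals, more for Glacier Instant Retrieval and expedited requests.

Request pricing varies by class: Standard charges $0.0004 per 1,000 GETs, IA charges more per request, and the Glacier classes more again. Express One Zone has its own request pricing that also bills by request size, so check the pricing page rather than assuming Standard rates apply.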

The key is matching storage class to actual usage patterns, not optimizing for theoretical scenarios.
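The minimum-size and minimum-duration rules are easier to believe once you run the numbers. Below is a small sketch comparing Standard and IA for a single object, using the per-GB prices from this article and assuming a 30-day month. Retrieval and request fees are left out to keep it short.

```python
STANDARD_PER_GB = 0.023  # USD per GB-month
IA_PER_GB = 0.0125       # USD per GB-month
IA_MIN_OBJECT_KB = 128   # IA bills small objects as if they were 128KB
IA_MIN_DAYS = 30         # IA bills at least 30 days per object
KB_PER_GB = 1024 * 1024


def standard_cost(size_kb, days_kept):
    """Storage cost in Standard: pay for actual size and actual days."""
    return size_kb / KB_PER_GB * STANDARD_PER_GB * days_kept / 30


def ia_cost(size_kb, days_kept):
    """Storage cost in IA: size and duration are rounded up to the minimums."""
    billed_kb = max(size_kb, IA_MIN_OBJECT_KB)
    billed_days = max(days_kept, IA_MIN_DAYS)
    return billed_kb / KB_PER_GB * IA_PER_GB * billed_days / 30


cases = [(1, 30), (1024, 30), (1024, 10)]  # (size in KB, days kept)
for size_kb, days in cases:
    s, ia = standard_cost(size_kb, days), ia_cost(size_kb, days)
    cheaper = "IA" if ia < s else "Standard"
    print(f"{size_kb:>5} KB kept {days:>2} days -> cheaper in {cheaper}")
```

IA only wins for the 1MB object kept the full month. The 1KB object and the object deleted after 10 days both cost more in IA than in Standard.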

Questions You'll Actually Ask at 3 AM

Wait, where the hell are my folders?

S3 doesn't have folders. Those "folders" in the AWS console are lies: they're just part of the object key. The path documents/reports/2025/budget.pdf is a single string, not a directory structure. This will fuck with your head if you're used to file systems.

I spent two hours trying to "move" files between folders before realizing I needed to copy the object with a new key and delete the old one. There's no mv command because there's no actual directory to move within.
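If it helps, here is a toy in-memory model of what the console is doing. It is not the S3 API, just a plain dictionary standing in for a bucket: "folders" fall out of grouping keys by a prefix and a delimiter, and a "move" is a copy followed by a delete. The function names are invented for this sketch.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Mimic a delimiter-based listing: return (objects, common_prefixes)."""
    objects, common = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter is shown as a "folder".
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common)


def move(store, old_key, new_key):
    """A 'move' is a copy under the new key followed by deleting the old key."""
    store[new_key] = store[old_key]
    del store[old_key]


# A dict standing in for a bucket; keys are full paths, values are the data.
store = {
    "documents/reports/2025/budget.pdf": b"...",
    "documents/reports/2025/forecast.pdf": b"...",
    "documents/readme.txt": b"...",
}

print(list_level(store, prefix="documents/"))
move(store, "documents/readme.txt", "archive/readme.txt")
print(list_level(store))
```

The real listing API works on the same idea: you pass a prefix and a delimiter, and it hands back objects plus a list of common prefixes that the console renders as folders.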

How much shit can I actually store?

Individual files max out at 5TB. Buckets have no limits. You can store unlimited objects. Scale isn't the problem: your AWS bill is.

Will AWS lose my data?

Probably not.

I've never actually lost data in S3, but remember the February 2017 outage that broke half the internet for 4 hours? S3 isn't invincible.

The 99.999999999% durability stat is real though: they replicate your data across multiple data centers automatically. Still, if you're paranoid (and you should be), enable Cross-Region Replication.

Why are my S3 bills so high? (Everyone asks this eventually)

Because AWS billing is designed to surprise you:

Request charges: Every API call costs money. That script that lists your bucket every minute? It's costing you hundreds monthly.
Data transfer: Moving data out of S3 costs $0.09/GB. Serve a few videos directly from S3 and watch your bill explode.
Storage class mistakes: I once put 10TB of backups in Standard instead of Glacier and got a $2,000 surprise.
Small file bullshit: Moving tiny files to Infrequent Access actually costs more due to minimum billing sizes.

Pro tip: Check Cost Explorer every week or you'll get fucked.

Can I mount S3 like a normal drive?

No, and you shouldn't want to. S3 is object storage with REST APIs, not a file system. Every file operation becomes HTTP requests, which is slow as shit.

That said, AWS offers Storage Gateway and Mountpoint for Amazon S3 for masochists who insist on pretending S3 is a file system. Performance will disappoint you.

How do I not accidentally expose my data to the internet?

S3 security has more layers than an onion, and is just as likely to make you cry:

Block Public Access: Turn this on everywhere. I don't care what you think you need.
Bucket policies: JSON hell that controls bucket access
IAM policies: Different JSON hell that controls user access
VPC Endpoints: Keep traffic inside AWS so it can't leak

The Capital One breach happened because someone fucked up IAM roles, not S3 itself. The real threat is you misconfiguring something.
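Since misconfiguration is the real threat, a crude lint over your bucket policies is cheap insurance. The sketch below flags `Allow` statements that apply to everyone and carry no `Condition`. It is a rough heuristic, not a replacement for Block Public Access or IAM Access Analyzer, and the policy, account ID, and role name are made up.

```python
def public_statements(policy):
    """Return Allow statements open to everyone with no Condition attached."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    risky = []
    for stmt in statements:
        principal = stmt.get("Principal")
        is_everyone = principal == "*" or (
            isinstance(principal, dict) and principal.get("AWS") in ("*", ["*"])
        )
        if stmt.get("Effect") == "Allow" and is_everyone and "Condition" not in stmt:
            risky.append(stmt)
    return risky


# Hypothetical policy: one public-read statement, one scoped to a role.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-bucket/*",
        },
        {
            "Sid": "TeamWrite",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/uploader"},
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-bucket/*",
        },
    ],
}

print([stmt["Sid"] for stmt in public_statements(policy)])
```

A statement with a `Condition` can still be effectively public, so anything this flags is worth a look and anything it skips is not automatically safe.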

How do I move massive amounts of data without losing my mind?

For under 10TB, use DataSync and pray your internet connection doesn't die halfway through. For bigger moves, AWS will literally ship you hardware (AWS has been retiring parts of the Snow family, so check what's still offered):

Snowcone: 8TB (cute little box)
Snowball Edge: 80TB (briefcase-sized)
Snowmobile: 100PB (actual fucking truck)

Whatever AWS estimates for migration time, triple it. I've seen "2-week" migrations take 3 months when edge cases started crawling out of the woodwork.
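Before believing any estimate, do the bandwidth arithmetic yourself. This sketch converts a data size and link speed into days of transfer. The 80% efficiency factor is an assumption to account for protocol overhead and a link that is never really all yours.

```python
def transfer_days(data_tb, link_mbps, efficiency=0.8):
    """Days needed to push data_tb terabytes over a link of link_mbps megabits/s."""
    bits = data_tb * 1e12 * 8                       # decimal terabytes to bits
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits / usable_bits_per_second / 86400    # seconds to days


# Hypothetical sizes, all over a 1 Gbps link.
for tb in (10, 80, 1000):
    print(f"{tb:>5} TB over 1 Gbps: {transfer_days(tb, 1000):.1f} days")
```

Ten terabytes over a dedicated gigabit link is a bit over a day; a petabyte is measured in months, which is why shipping disks stops sounding ridiculous.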

Why is versioning both a blessing and a curse?

Versioning saves your ass when someone accidentally deletes production data. It also quietly costs you a fortune because every version is a separate billable object.

Enable it before you need it - you can't un-delete without versioning. But set up lifecycle policies immediately or you'll be paying for 50 versions of that 10GB file you keep overwriting.
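Here is roughly what that looks like in numbers, plus a sketch of a lifecycle rule that trims old versions. The rule ID, the 30-day window, and keeping 3 noncurrent versions are all hypothetical choices. The code only builds the rule dictionary, which is the shape the S3 lifecycle API expects for noncurrent version expiration.

```python
STANDARD_PER_GB = 0.023  # USD per GB-month, from the pricing above


def versioned_storage_cost(size_gb, versions):
    """Monthly storage cost when every version is billed as its own object."""
    return size_gb * versions * STANDARD_PER_GB


# Hypothetical rule: drop noncurrent versions after 30 days, keep the newest 3.
trim_versions_rule = {
    "ID": "trim-old-versions",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},  # empty prefix = whole bucket
    "NoncurrentVersionExpiration": {
        "NoncurrentDays": 30,
        "NewerNoncurrentVersions": 3,
    },
}

before = versioned_storage_cost(10, 50)    # 50 versions of a 10GB file
after = versioned_storage_cost(10, 1 + 3)  # current version + 3 retained
print(f"before: ${before:.2f}/month, after: ${after:.2f}/month")
```

One file is pocket change either way. Multiply it by every object your app overwrites daily and the rule pays for itself quickly.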

Can I run my database on S3?

No. Don't even think about it. S3 is object storage for files, not a database. It has no ACID transactions, no indexes, and every query is an HTTP request.

Use S3 for database backups, data lakes, and storing files your app serves. Not for your actual database. I've seen people try this. It ends badly.

Remember that time S3 broke the internet?

February 28, 2017. S3 in US-East-1 went down for 4 hours and took half the internet with it. Slack went dark. Websites showed blank pages. Even AWS's own status page broke because it stored its icons in S3.

Other notable shitshows:

November 2020: Multi-service outage
December 2021: Another US-East-1 disaster

Lesson: If your app can't survive an S3 outage, architect for it or accept the risk.

How do I make S3 fast?

Spread your requests: S3 handles roughly 3,500 writes and 5,500 reads per second per prefix, so don't hammer a single prefix or S3 will throttle you
Use CloudFront: For anything users download frequently
Multipart uploads: Required for files over 5GB, smart for anything over 100MB
Parallel everything: Multiple connections, multiple threads
Express One Zone: For when you need sub-10ms latency (and can afford it)
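For the multipart item, the limits that matter are a 5 MiB minimum part size (except the last part), a 5 GiB maximum part size, and at most 10,000 parts per upload. A minimal sketch of picking a part size within those limits follows; the 100 MiB default is just a starting point chosen for this example, not an AWS recommendation.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB   # smallest allowed part (the last part may be smaller)
MAX_PART = 5 * GIB   # largest allowed part
MAX_PARTS = 10_000   # most parts allowed in one upload


def plan_parts(object_size, preferred_part=100 * MIB):
    """Return (part_size, part_count) in bytes that fits the multipart limits."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size if the preferred one would need more than 10,000 parts.
    part_size = max(preferred_part, MIN_PART, math.ceil(object_size / MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("object too large for a single multipart upload")
    return part_size, math.ceil(object_size / part_size)


for size in (1 * GIB, 500 * GIB, 5 * 1024 * GIB):
    part_size, count = plan_parts(size)
    print(f"{size / GIB:>7.0f} GiB -> {count} parts of {part_size / MIB:.0f} MiB")
```

Each part can then be uploaded on its own connection, which is where the "parallel everything" advice comes in. A failed part gets retried on its own instead of restarting the whole file.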

How do I stop S3 costs from spiraling?

Use Intelligent-Tiering: Let AWS figure out storage classes for you
Lifecycle policies: Auto-delete old shit you don't need
Compress everything: Smaller files = lower costs
Stop listing buckets constantly: Every API call costs money
Monitor with Storage Lens: Before costs surprise you
Use S3 Select or Athena: Query data in place instead of downloading terabytes (S3 Select is closed to new customers, so Athena is the safer bet)

