What S3 Actually Is (Skip the Marketing BS)

S3 launched in 2006 when AWS was still figuring out what "cloud" meant.

Nearly 20 years later, it's become the storage everyone copies because it solved the fundamental problem: reliable, scalable object storage without the nightmare of managing your own infrastructure. Gartner consistently ranks AWS as a leader in cloud storage services.

Here's what you need to know:

S3 stores files (they call them "objects") in buckets. Think of it as a giant key-value store where the key is the file path and the value is your data. Dead simple concept, but the implementation handles edge cases that would make you want to quit engineering.

Why Object Storage Doesn't Suck Like File Systems

Traditional file systems shit the bed when you hit certain limits. Try storing 100 million files in a directory and watch your server cry. S3 doesn't have this problem because it's not pretending to be a file system.

Every file is an "object" with a [key (the path) and metadata](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html).

No directories, no inodes, no filesystem corruption when your server decides to reboot during a write operation. Spotify stores millions of tracks using this exact architecture for their streaming platform.

S3 Object Storage Architecture

# This is how you think about S3 - just key-value pairs
my-bucket/users/123/profile.jpg
my-bucket/uploads/2025/01/18/document.pdf
my-bucket/backups/db-dump-20250118.sql.gz

The 99.999999999% durability sounds like marketing bullshit until you realize they replicate your data across multiple data centers automatically.

I've never actually lost data in S3, which is more than I can say for the RAID arrays I managed in 2010.
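To put eleven nines in perspective, here's a rough back-of-the-envelope sketch. It assumes the durability figure can be read as an annual per-object loss probability of about 1e-11, which is a simplification for illustration; the function names are made up for this example.

```python
# Assumption: 99.999999999% annual durability is treated as a
# per-object loss probability of roughly 1e-11 per year.
ANNUAL_LOSS_PROBABILITY = 1e-11


def expected_objects_lost_per_year(object_count: int) -> float:
    """Average number of objects you'd expect to lose in one year."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_lost_object(object_count: int) -> float:
    """Average number of years between single-object losses."""
    return 1 / expected_objects_lost_per_year(object_count)


if __name__ == "__main__":
    n = 10_000_000
    print(f"{n:,} objects: about one loss every {years_per_lost_object(n):,.0f} years")
```

Under that reading, ten million objects works out to roughly one lost object every ten thousand years, which is why your own delete scripts are the bigger risk.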

Storage Classes (Where Your Money Goes)

S3 has eight storage classes, which sounds complicated until you realize most people only use three:

Standard for stuff you need right now, Intelligent-Tiering for everything else, and Glacier Deep Archive for backups you hope to never touch.

Standard costs $0.023/GB per month for instant access. Glacier Deep Archive drops to $0.00099/GB but takes up to 12 hours to retrieve your data (up to 48 hours on the cheaper bulk tier).

The genius move is Intelligent-Tiering: it automatically shuffles data between tiers based on access patterns without you having to guess. Airbnb uses this approach to manage their data lake costs.

I learned this the hard way when our backup costs hit $50K/month because everything was in Standard.

Switching to Intelligent-Tiering cut that to $15K overnight.

AWS doesn't make this obvious in their pricing calculator.

Integrations (The Real Reason People Stick With AWS)

S3 works with pretty much everything AWS makes, which is both a blessing and a trap.

Need a CDN? CloudFront reads from S3.

Want to process files automatically? Lambda triggers when objects change.

Need to run SQL on CSV files? Athena queries S3 directly.
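For the Lambda case, a minimal handler might look like the sketch below. The event layout (Records, then s3, then bucket and object) follows the S3 event notification format; the bucket name, key, and the choice to just return the pairs are placeholders for this example.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect (bucket, key) pairs from an S3 event notification."""
    touched = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in notifications (a space shows up as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        touched.append((bucket, key))
    return touched


# Hypothetical, trimmed-down event for local testing.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-bucket"},
                "object": {"key": "uploads/2025/01/18/my+document.pdf"},
            }
        }
    ]
}

if __name__ == "__main__":
    print(handler(sample_event, None))
```

Forgetting to decode the key is a classic way to get "NoSuchKey" errors on files with spaces in their names.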

The integration ecosystem is AWS's moat.

Once you're storing everything in S3, using other AWS services becomes trivial. Want to switch to Google Cloud? Good luck migrating petabytes of data and rewriting all your Lambda functions.

I've built data pipelines where S3 events trigger Lambda functions that write to Kinesis streams that feed into EMR clusters that output back to S3. It works, but you're locked into AWS forever.

Security (Don't Fuck This Up)

S3 security is layered, which means there are multiple ways to accidentally expose your data to the internet. [Bucket policies](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-policies.html) control who can access your bucket, [IAM policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_examples_s3.html) control what users can do, and if you mess up either one, your data is public.

The good news is [encryption is on by default](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingEncryption.html) now.

AWS learned from too many security breaches where people forgot to enable encryption. Still doesn't help if you make your bucket public, though.

For compliance nerds, S3 Object Lock prevents anyone (including you) from deleting or modifying objects for a set period.

Perfect for backups and regulatory requirements where "oops, I deleted everything" isn't an acceptable excuse. Financial institutions rely on this for SEC compliance and FINRA record retention.

Why It's Expensive But Worth It

S3 starts cheap with a 5GB free tier, then hits you with the real costs once you're hooked.

Storage is $0.023/GB in US regions, but the requests add up fast: $0.0004 per 1,000 GETs and $0.005 per 1,000 PUTs.

The real cost is data transfer.

Moving data out of S3 costs $0.09/GB, which gets painful fast if you're serving large files directly. That's why everyone puts CloudFront in front of S3.
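To see how those line items stack up, here's a small estimator using only the rates quoted above. Treat it as a sketch: real bills have free-tier allowances, tiered discounts, and regional differences that this ignores, and the workload numbers are invented.

```python
# Rates quoted in this article (US regions, S3 Standard).
STORAGE_PER_GB = 0.023
GET_PER_1000 = 0.0004
PUT_PER_1000 = 0.005
EGRESS_PER_GB = 0.09


def monthly_cost(stored_gb: float, gets: int, puts: int, egress_gb: float) -> dict:
    """Break a month's bill into storage, request, and transfer charges."""
    return {
        "storage": stored_gb * STORAGE_PER_GB,
        "requests": gets / 1000 * GET_PER_1000 + puts / 1000 * PUT_PER_1000,
        "egress": egress_gb * EGRESS_PER_GB,
    }


# Hypothetical workload: 1 TB stored, 10M GETs, 1M PUTs, 5 TB served out.
example = monthly_cost(stored_gb=1000, gets=10_000_000, puts=1_000_000, egress_gb=5000)

if __name__ == "__main__":
    for item, dollars in example.items():
        print(f"{item:>8}: ${dollars:,.2f}")
```

In this made-up workload, egress dwarfs storage and requests combined, which is the whole argument for CloudFront.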

But here's the thing: try building equivalent reliability yourself.

Factor in the cost of hiring ops engineers, buying servers, managing backups, and dealing with data center outages. S3 suddenly looks reasonable. Stripe's engineering team estimates they save millions annually by using S3 instead of self-managed storage infrastructure.

S3 vs Top Cloud Storage Competitors

| Feature | Amazon S3 | Google Cloud Storage | Azure Blob Storage | DigitalOcean Spaces |
|---|---|---|---|---|
| Durability | 99.999999999% (11 9's) | 99.999999999% (11 9's) | 99.999999999% (11 9's) | 99.999999999% (11 9's) |
| Availability SLA | 99.99% (Standard) | 99.95% (Standard) | 99.9% (Hot) | 99.9% |
| Storage Classes | 8 classes | 4 classes | 3 tiers | 1 class |
| Max Object Size | 5 TB | 5 TB | 4.77 TB | 50 GB |
| Free Tier | 5 GB/month | 5 GB/month | None | None |

S3 Storage Classes: Choosing the Right Tier for Your Data

Getting S3 storage classes wrong costs companies thousands monthly. The eight classes aren't just pricing tiers - they're designed for specific access patterns and retrieval requirements.

S3 Storage Classes Overview

The Standard Tier Reality Check

S3 Standard at $0.023/GB serves frequently accessed data with sub-100ms latency. Use it for active application data, frequently accessed analytics, and content distribution. The pricing seems high until you factor in 99.99% availability and instant global accessibility.

Spotify keeps their most popular songs in S3 Standard for instant streaming worldwide, while older tracks automatically tier to cheaper storage.

Intelligent-Tiering: Set-and-Forget Optimization

S3 Intelligent-Tiering charges $0.0025 per 1,000 objects monthly for monitoring, but automatically moves data between Standard ($0.023/GB) and Infrequent Access ($0.0125/GB) tiers based on access patterns. Objects not accessed for 30 days move to IA, returning to Standard when accessed.

S3 Intelligent-Tiering Flow

The math works for unpredictable workloads. If 50% of your data goes untouched monthly, the cold half saves $0.0105/GB, which averages out to roughly $0.005/GB across everything before you subtract the per-object monitoring fee.
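Here's that arithmetic as a sketch you can plug your own numbers into. It assumes only two tiers (Standard and Infrequent Access at the prices above) and a monitoring fee of $0.0025 per 1,000 objects; check the current pricing page before trusting any of these constants.

```python
STANDARD_PER_GB = 0.023
IA_PER_GB = 0.0125
# Assumption: monitoring fee per 1,000 objects per month.
MONITORING_PER_1000_OBJECTS = 0.0025


def intelligent_tiering_net_savings(total_gb: float, cold_fraction: float,
                                    object_count: int) -> float:
    """Monthly savings versus keeping everything in Standard."""
    gross = total_gb * cold_fraction * (STANDARD_PER_GB - IA_PER_GB)
    monitoring = object_count / 1000 * MONITORING_PER_1000_OBJECTS
    return gross - monitoring


if __name__ == "__main__":
    # Hypothetical bucket: 10 TB, half of it cold, one million objects.
    print(f"${intelligent_tiering_net_savings(10_000, 0.5, 1_000_000):.2f}/month")
```

Note that the monitoring fee scales with object count, not size, so a bucket full of tiny objects can wipe out the savings entirely.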

Archive Classes for Long-Term Storage

S3 Glacier Instant Retrieval ($0.004/GB): Archives with millisecond retrieval for compliance data accessed quarterly. Healthcare companies use this for medical records that must be instantly available during audits.

S3 Glacier Flexible Retrieval ($0.0036/GB): Retrieval takes 1-5 minutes if you pay for expedited, 3-5 hours on the standard tier. Media companies archive raw footage here - accessible when needed but not instant.

S3 Glacier Deep Archive ($0.00099/GB): 12-hour retrieval for seven-year retention requirements. Banks store transaction logs here for regulatory compliance.

Express One Zone: Performance for Analytics

S3 Express One Zone delivers single-digit millisecond latency and up to 10x faster data access than S3 Standard. At $0.16/GB, it's expensive but crucial for high-performance analytics workloads.

Financial firms use Express One Zone for real-time risk calculations where milliseconds matter for trading decisions.

Regional Availability and Lifecycle Policies

Not all classes are available in every region. S3 Express One Zone is limited to major regions, while Glacier classes are globally available. Lifecycle policies automate transitions:

Day 0: Upload to S3 Standard
Day 30: Transition to S3 Standard-IA (based on object age, not access)
Day 90: Transition to Glacier Flexible Retrieval
Day 365: Transition to Glacier Deep Archive
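That schedule maps fairly directly onto a lifecycle configuration. The sketch below builds the rule as a plain dictionary in the shape the S3 API expects; lifecycle transitions key off object age. The rule ID and bucket name are hypothetical, and the boto3 call is left as a comment so the snippet runs without AWS credentials.

```python
def tiering_rule(prefix: str = "") -> dict:
    """Build one lifecycle rule mirroring the 30/90/365-day schedule."""
    return {
        "ID": "tier-down-by-age",  # hypothetical rule name
        "Filter": {"Prefix": prefix},  # empty prefix applies to the whole bucket
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},  # Glacier Flexible Retrieval
            {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }


lifecycle_configuration = {"Rules": [tiering_rule()]}

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=lifecycle_configuration)

if __name__ == "__main__":
    for step in lifecycle_configuration["Rules"][0]["Transitions"]:
        print(step["Days"], step["StorageClass"])
```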

Storage Class Gotchas That Cost Money

Minimum storage durations: IA classes charge for 30 days minimum, Glacier classes for 90-180 days. Delete an object early, pay for the full minimum period.

Minimum object sizes: IA classes charge for 128KB minimum. Store a 1KB file, pay for 128KB.

Retrieval costs: IA and Glacier classes charge per GB retrieved. S3 Standard doesn't. Budget $0.01/GB for Glacier Flexible retrievals.

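Request pricing varies: each class has its own per-request rates, so the $0.0004 per 1,000 GETs you pay in Standard doesn't carry over to IA, Glacier, or Express One Zone. Check the pricing page for the class you're moving to.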
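The minimum-size and minimum-duration rules are easy to model. This sketch uses the 128KB and 30-day minimums and the per-GB prices quoted in this article; it treats a GB as 2^30 bytes for simplicity, and the function names are invented for the example.

```python
KIB = 1024
GIB = 1024 ** 3
IA_MIN_OBJECT_BYTES = 128 * KIB
IA_MIN_DAYS = 30
STANDARD_PER_GB = 0.023
IA_PER_GB = 0.0125


def ia_billable_bytes(size_bytes: int) -> int:
    """IA bills every object as if it were at least 128KB."""
    return max(size_bytes, IA_MIN_OBJECT_BYTES)


def ia_billable_days(days_stored: int) -> int:
    """Delete early and you still pay for the 30-day minimum."""
    return max(days_stored, IA_MIN_DAYS)


def monthly_storage_cost(object_count: int, size_bytes: int, in_ia: bool) -> float:
    """Monthly storage charge for many same-sized objects."""
    per_object = ia_billable_bytes(size_bytes) if in_ia else size_bytes
    rate = IA_PER_GB if in_ia else STANDARD_PER_GB
    return object_count * per_object / GIB * rate


if __name__ == "__main__":
    # One million 1KB files: compare Standard against IA.
    print(f"Standard: ${monthly_storage_cost(1_000_000, KIB, in_ia=False):.4f}")
    print(f"IA:       ${monthly_storage_cost(1_000_000, KIB, in_ia=True):.4f}")
```

For a pile of 1KB files, the "cheaper" class comes out far more expensive, which is why lifecycle rules should filter out small objects.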

The key is matching storage class to actual usage patterns, not optimizing for theoretical scenarios.

Questions You'll Actually Ask at 3 AM

Q

Wait, where the hell are my folders?

A

S3 doesn't have folders. Those "folders" in the AWS console are lies: they're just part of the object key. The path documents/reports/2025/budget.pdf is a single string, not a directory structure. This will fuck with your head if you're used to file systems.

I spent two hours trying to "move" files between folders before realizing I needed to copy the object with a new key and delete the old one. There's no native move operation (the CLI's aws s3 mv is just a copy followed by a delete) because there's no actual directory to move within.
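If it helps, here's a toy model of how "folders" fall out of flat keys. It imitates the prefix-and-delimiter grouping that S3 list calls do, using an in-memory dictionary instead of a real bucket; the function names and keys are made up for the example.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Split keys under a prefix into direct objects and pseudo-folders."""
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter is shown as a "folder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


def move(store: dict, old_key: str, new_key: str) -> None:
    """A 'move' is a copy under the new key followed by a delete."""
    store[new_key] = store[old_key]
    del store[old_key]


bucket = {
    "documents/readme.txt": b"hi",
    "documents/reports/2024/budget.pdf": b"...",
    "documents/reports/2025/budget.pdf": b"...",
}

if __name__ == "__main__":
    print(list_level(bucket, prefix="documents/"))
    move(bucket, "documents/readme.txt", "archive/readme.txt")
    print(sorted(bucket))
```

The "reports/" folder only exists because two keys happen to share that prefix. Delete those objects and the folder vanishes with them.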
Q

How much shit can I actually store?

A

Individual files max out at 5TB. Buckets have no limits. You can store unlimited objects. Scale isn't the problem: your AWS bill is.
Q

Will AWS lose my data?

A

Probably not.

I've never actually lost data in S3, but remember the February 2017 outage that broke half the internet for 4 hours? S3 isn't invincible. That was an availability failure, though, not data loss.

The 99.999999999% durability stat is real: they replicate your data across multiple data centers automatically. Still, if you're paranoid (and you should be), enable Cross-Region Replication.
Q

Why are my S3 bills so high? (Everyone asks this eventually)

A

Because AWS billing is designed to surprise you:

  • Request charges: Every API call costs money. That script that lists your bucket every minute? Each LIST call returns at most 1,000 keys, so on a bucket with millions of objects it's costing you hundreds monthly.
  • Data transfer: Moving data out of S3 costs $0.09/GB. Serve a few videos directly from S3 and watch your bill explode.
  • Storage class mistakes: I once put 10TB of backups in Standard instead of Glacier and got a $2,000 surprise.
  • Small file bullshit: Moving tiny files to Infrequent Access actually costs more due to minimum billing sizes.

Pro tip: Check Cost Explorer every week or you'll get fucked.

Q

Can I mount S3 like a normal drive?

A

No, and you shouldn't want to. S3 is object storage with REST APIs, not a file system. Every file operation becomes HTTP requests, which is slow as shit.

That said, AWS offers Storage Gateway and S3 Mountpoint for masochists who insist on pretending S3 is a file system. Performance will disappoint you.

Q

How do I not accidentally expose my data to the internet?

A

S3 security has more layers than an onion, and it's just as likely to make you cry:

  • Block Public Access: Turn this on everywhere. I don't care what you think you need.
  • Bucket policies: JSON hell that controls bucket access
  • IAM policies: Different JSON hell that controls user access
  • VPC Endpoints: Keep traffic inside AWS so it can't leak

The Capital One breach happened because someone fucked up IAM roles, not S3 itself. The real threat is you misconfiguring something.
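As a sanity check, you can scan a bucket policy for the most obvious foot-gun: an Allow statement with a wildcard principal and no conditions. The sketch below is a crude heuristic, not a replacement for AWS's own access analysis. The four Block Public Access setting names are the real ones; the policy and function name are invented for the example.

```python
# The four Block Public Access settings, all switched on.
BLOCK_PUBLIC_ACCESS = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def public_statements(policy: dict) -> list:
    """Return Sids of unconditional Allow statements open to everyone."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be in a list
        statements = [statements]
    flagged = []
    for stmt in statements:
        principal = stmt.get("Principal")
        aws = principal.get("AWS") if isinstance(principal, dict) else principal
        wildcard = aws == "*" or (isinstance(aws, list) and "*" in aws)
        if stmt.get("Effect") == "Allow" and wildcard and "Condition" not in stmt:
            flagged.append(stmt.get("Sid", "<no Sid>"))
    return flagged


# Hypothetical policy: one statement open to the world, one scoped to a role.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::my-bucket/*"},
        {"Sid": "AppRole", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::123456789012:role/app"},
         "Action": "s3:PutObject", "Resource": "arn:aws:s3:::my-bucket/*"},
    ],
}

if __name__ == "__main__":
    print(public_statements(policy))
```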

Q

How do I move massive amounts of data without losing my mind?

A

For under 10TB, use DataSync and pray your internet connection doesn't die halfway through. For bigger moves, AWS will literally ship you hardware:

  • Snowcone: 8TB (cute little box)
  • Snowball Edge: 80TB (briefcase-sized)
  • Snowmobile: 100PB (actual fucking truck)

Whatever AWS estimates for migration time, triple it. I've seen "2-week" migrations take 3 months when edge cases started crawling out of the woodwork.
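Before picking between the network and a box in the mail, it's worth running the raw numbers. This sketch estimates transfer time from data size and link speed; the 80% link efficiency is an assumed fudge factor, not a measured value, and it ignores retries, throttling, and small-file overhead.

```python
def transfer_days(terabytes: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Days needed to push the data over a link at the given efficiency."""
    bits = terabytes * 1e12 * 8  # decimal terabytes to bits
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits / usable_bits_per_second / 86_400


if __name__ == "__main__":
    print(f"10 TB over 1 Gbps:   {transfer_days(10, 1000):.1f} days")
    print(f"80 TB over 100 Mbps: {transfer_days(80, 100):.1f} days")
```

If the answer comes out in months, that's your cue to order hardware instead.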

Q

Why is versioning both a blessing and a curse?

A

Versioning saves your ass when someone accidentally deletes production data. It also quietly costs you a fortune because every version is a separate billable object.

Enable it before you need it - you can't un-delete without versioning. But set up lifecycle policies immediately or you'll be paying for 50 versions of that 10GB file you keep overwriting.
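Here's what that looks like in numbers, plus a lifecycle rule that caps the damage. The cost function uses the Standard price quoted in this article and assumes every overwrite stores a full copy. The rule ID and the retention numbers (30 days, 3 extra versions) are arbitrary choices for the example.

```python
STANDARD_PER_GB = 0.023


def versioned_monthly_cost(object_gb: float, versions: int) -> float:
    """Every stored version is billed as a full-size object."""
    return object_gb * versions * STANDARD_PER_GB


# Lifecycle rule: expire noncurrent versions after 30 days,
# but always keep the 3 most recent noncurrent ones.
noncurrent_cleanup_rule = {
    "ID": "expire-old-versions",  # hypothetical rule name
    "Filter": {"Prefix": ""},
    "Status": "Enabled",
    "NoncurrentVersionExpiration": {
        "NoncurrentDays": 30,
        "NewerNoncurrentVersions": 3,
    },
}

if __name__ == "__main__":
    print(f"50 versions of a 10GB file: ${versioned_monthly_cost(10, 50):.2f}/month")
    # Current version plus 3 retained noncurrent versions.
    print(f"After cleanup:              ${versioned_monthly_cost(10, 4):.2f}/month")
```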

Q

Can I run my database on S3?

A

No. Don't even think about it. S3 is object storage for files, not a database. It has no ACID transactions, no indexes, and every query is an HTTP request.

Use S3 for database backups, data lakes, and storing files your app serves. Not for your actual database. I've seen people try this. It ends badly.

Q

Remember that time S3 broke the internet?

A

February 28, 2017. S3 in US-East-1 went down for 4 hours and took half the internet with it. Slack went dark. Websites showed blank pages. Even AWS's own status page broke because it stored its icons in S3.

Other notable shitshows:

  • November 2020: Multi-service outage
  • December 2021: Another US-East-1 disaster

Lesson: If your app can't survive an S3 outage, architect for it or accept the risk.

Q

How do I make S3 fast?

A
  • Spread your requests: S3 handles about 3,500 writes or 5,500 reads per second per prefix, so don't hammer a single prefix or S3 will throttle you
  • Use CloudFront: For anything users download frequently
  • Multipart uploads: Required for files over 5GB, smart for anything over 100MB
  • Parallel everything: Multiple connections, multiple threads
  • Express One Zone: For when you need sub-10ms latency (and can afford it)
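For multipart uploads, the part size matters because S3 caps an upload at 10,000 parts, with each part between 5 MiB and 5 GiB (the last part can be smaller). The sketch below picks a part size that fits those limits; the 100 MiB default is an arbitrary starting point, and the function name is made up for the example.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART_BYTES = 5 * MIB
MAX_PART_BYTES = 5 * GIB
MAX_PARTS = 10_000


def plan_parts(object_bytes: int, preferred_part_bytes: int = 100 * MIB):
    """Return (part_size, part_count) that fits the multipart limits."""
    # Smallest part size that keeps the upload within 10,000 parts (integer ceiling).
    floor_for_count = -(-object_bytes // MAX_PARTS)
    part = max(preferred_part_bytes, MIN_PART_BYTES, floor_for_count)
    if part > MAX_PART_BYTES:
        raise ValueError("object too large for a single multipart upload")
    count = -(-object_bytes // part)
    return part, count


if __name__ == "__main__":
    for size in (1 * GIB, 1024 * GIB):
        part, count = plan_parts(size)
        print(f"{size / GIB:.0f} GiB -> {count} parts of {part / MIB:.1f} MiB")
```

Upload the parts in parallel and you get the "parallel everything" advice for free.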
Q

How do I stop S3 costs from spiraling?

A
  1. Use Intelligent-Tiering: Let AWS figure out storage classes for you
  2. Lifecycle policies: Auto-delete old shit you don't need
  3. Compress everything: Smaller files = lower costs
  4. Stop listing buckets constantly: Every API call costs money
  5. Monitor with Storage Lens: Before costs surprise you
  6. Use S3 Select or Athena: Query data in place instead of downloading terabytes (AWS has closed S3 Select to new customers, so check whether your account still has it)
