Why Your AWS Account Will Get Pwned (And It's Probably Your Fault)

AWS defaults are designed to get you up and running fast, not to keep you secure. Every default setting is basically "make it work first, secure it never." That EC2 instance you just launched? It's probably wide open. That S3 bucket? Public by accident. That IAM role? Has more permissions than your CEO.

I learned this the hard way when our startup got breached in 2023. Someone found an EC2 instance with 0.0.0.0/0 in its security group, moved laterally through overprivileged roles, and spent three weeks mining crypto on our bill before we noticed. Total damage: $47,000 in compute costs and about six months of trust rebuilding with customers.

The Brutal Reality of Cloud Security Stats

[Figure: AWS security is complex - that's the whole point of this guide]

Look, I don't have exact percentages on how many companies get fucked by AWS misconfigurations, but it's basically all of them. Recent cloud security reports show that 45% of data breaches happen in cloud environments, and most aren't sophisticated nation-state attacks - they're stupid shit like exposed databases and hardcoded API keys.

The attacks I've seen personally are walked through later in this guide, in the 3AM reality check section.

AWS Security Is Your Problem, Not Theirs

[Figure: How IAM policy evaluation actually works - this flowchart is why engineers cry]

The AWS Shared Responsibility Model is basically AWS saying "we'll keep the lights on, you figure out everything else." They secure the physical infrastructure, you secure:

  • Every fucking IAM policy (and IAM policies are about as readable as tax code written by drunk lawyers)
  • All your network configs (Security Groups with 0.0.0.0/0 are how production dies at 2am)
  • Data encryption (because storing plaintext customer data is career suicide)
  • Access logging (CloudTrail and GuardDuty alerts are 90% noise until you tune them for 6 months)
  • Monitoring everything (because if a crypto miner runs for three weeks without you noticing, you're doing it wrong)

The Enterprise Nightmare Gets Worse

Think securing one AWS account is hard? Try managing security across 200+ accounts with 500+ developers who just want to ship code and don't give a shit about your security policies.

Real problems from my consulting days:

  • Account sprawl: Found a fintech with well over 500 security group rules pointing to 0.0.0.0/0 - I stopped counting at 500 because it was too depressing
  • IAM role explosion: One client had roughly 12,000 IAM roles across their org. Could've been more - nobody was actually counting the mess
  • Compliance theater: Spent six months implementing SOC 2 controls that looked good on paper but broke production deployments twice a week
  • Cost explosion: Security logging increased their AWS bill by around 40%, but that's the price of not getting fired when your competitor gets breached

What Actually Works (Based on Getting My Hands Dirty)

This guide isn't theoretical bullshit from someone who's never debugged a security incident at 3am. Everything here comes from:

  • Five years of cleaning up breached AWS accounts - from startups to Fortune 500s
  • Real production incidents - including the time I accidentally locked everyone out of our production environment while implementing MFA
  • Compliance audits - where auditors find every shortcut you thought was clever
  • Cost optimization - because security controls that bankrupt your company aren't effective

Every section includes:

  • The exact CLI commands that actually work (copy-paste ready)
  • What breaks when you implement it (spoiler: everything, at first)
  • Time estimates based on reality (5 minutes if you're lucky, 2 hours if not)
  • The nuclear option when nothing else works ("delete node_modules and try again" equivalent for AWS)

The 3AM Security Reality Check

[Figure: All the AWS security services you'll need to actually secure your infrastructure]

Real security hardening means assuming your first line of defense will fail. When someone inevitably commits AWS keys to GitHub (they will), when a developer gives an EC2 instance admin privileges because "it was easier" (they will), when your monitoring fails to catch the obvious attack (it will) - what happens next?

Here are actual attack patterns I've responded to:

The GitHub Key Leak: Junior dev commits `.env` file with production AWS keys. Keys get scraped within 10 minutes by automated credential harvesting bots, attacker spins up GPU instances for crypto mining. We caught it because our AWS bill jumped $200/hour.

The Phishing Success: CFO clicks malicious link, attacker resets their AWS console password (no MFA, obviously). Escalates through overprivileged IAM roles, deletes our production databases for ransom. Restore took 18 hours thanks to RDS automated backups.

The Supply Chain Fuckery: Popular npm package gets compromised, includes code that searches for AWS credentials in environment variables. Steals credentials from container environments, accesses customer data through unrestricted S3 buckets.

Each of these could have been stopped with proper hardening. But the hardening has to survive contact with real developers, real deadlines, and real production emergencies. Theoretical security that breaks everything isn't security - it's job security for whoever has to clean up the mess.

Security Hardening Triage: What to Fix First (When Everything's On Fire)

| Security Area | Default Risk Level | Hardening Complexity | Business Impact | Implementation Timeline | Regulatory Impact |
|---|---|---|---|---|---|
| IAM & Access Management | 🔴 Critical | Medium | Low | 2-4 weeks | High (SOC2, GDPR, HIPAA) |
| Network Security | 🔴 Critical | High | Medium | 4-8 weeks | High (PCI DSS, SOC2) |
| Data Encryption | 🟡 High | Low | Low | 1-2 weeks | Critical (All compliance) |
| Logging & Monitoring | 🟡 High | Medium | Low | 2-3 weeks | Critical (All compliance) |
| Container Security | 🟡 High | High | Medium | 6-12 weeks | Medium (SOC2) |
| Secrets Management | 🔴 Critical | Low | Low | 1-2 weeks | High (All compliance) |
| Backup & Recovery | 🟡 High | Medium | High | 4-6 weeks | High (SOC2, HIPAA) |
| Incident Response | 🟡 High | High | Medium | 8-12 weeks | Critical (All compliance) |

IAM: The Circle of Hell Where Dreams Go to Die

[Figure: IAM - Identity and Access Management, where good intentions go to die in a maze of JSON policies]

IAM policies are about as readable as tax code written by drunk lawyers, but they're the most important thing you'll configure in AWS. Fuck up IAM and every other security control becomes meaningless because attackers will just use your overprivileged roles to bypass everything.

I've seen more companies get breached through IAM misconfigurations than all other attack vectors combined. Usually it's some combination of AWS keys committed to GitHub, overprivileged service accounts, and the classic "we'll tighten permissions later" that never happens.

Stop Using Root Accounts Like a Fucking Amateur

The Root Account Problem: Your AWS root account can do literally anything - delete your entire infrastructure, change your billing info, close your account. Using it for daily ops is like giving your intern the master key to the building and hoping for the best.

How to Fix It (Without Breaking Everything):

First, check if anyone's been stupid enough to create programmatic access keys for root:

## Root isn't an IAM user, so ask the account summary - anything other than 0 means root has access keys and you're already fucked
aws iam get-account-summary --query 'SummaryMap.AccountAccessKeysPresent'

Enable MFA on root right fucking now (the CLI can only check - enabling it for root happens in the console):

## Check if root has MFA (returns 1 if it does, which it probably doesn't)
aws iam get-account-summary --query 'SummaryMap.AccountMFAEnabled'

Create a break-glass admin user instead of using root for everything, and attach a policy like this so even that admin is useless without a fresh MFA session (present, and less than an hour old):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "Bool": {
          "aws:MultiFactorAuthPresent": "true"
        },
        "NumericLessThan": {
          "aws:MultiFactorAuthAge": "3600"
        }
      }
    }
  ]
}

Pro tip: Lock down root with SCPs that prevent root usage except during actual emergencies. I learned this after someone used root to accidentally delete our production RDS cluster at 2am on Black Friday.
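
A minimal sketch of that kind of SCP, assuming you're on AWS Organizations - it denies everything to the root principal of member accounts (add whatever exceptions your actual break-glass process needs):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyRootUser",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:root"
        }
      }
    }
  ]
}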

Least Privilege: Or How I Learned to Stop Worrying and Love Broken Deployments

The AdministratorAccess Problem: Every developer wants AdministratorAccess because it's easier than figuring out what permissions they actually need. This is how you end up with crypto miners running on your production instances.

Real talk: implementing least privilege will break shit initially. Plan for it.

Start Here (Don't Go Full Lockdown on Day 1) - the subnet ARN below is a placeholder for your actual dev subnet, and expect to iterate because RunInstances evaluates conditions per resource type:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "ec2:RunInstances",
        "ec2:StopInstances"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "ec2:InstanceType": ["t3.micro", "t3.small", "t3.medium"]
        },
        "ArnLike": {
          "ec2:Subnet": "arn:aws:ec2:us-east-1:123456789012:subnet/subnet-0dev1234example"
        }
      }
    }
  ]
}

Use Access Analyzer to see what people actually do:

## This will show you that 90% of granted permissions are never used
aws accessanalyzer start-policy-generation \
    --policy-generation-details principalArn=arn:aws:iam::123456789012:role/DeveloperRole
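
start-policy-generation returns a job ID; once the job finishes chewing through your CloudTrail history, pull the generated policy - the job ID below is a placeholder:

## Fetch the policy Access Analyzer generated from actual observed usage
aws accessanalyzer get-generated-policy --job-id YOUR-JOB-ID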

Reality check: We tried implementing least privilege and broke half our apps. Plan for 2-3 weeks of "why can't I deploy" tickets while you tune permissions.

MFA: The Simple Fix That Everyone Fucks Up

The MFA Reality: Everyone knows you need MFA. Nobody wants to implement it because it's annoying. Then they get breached and suddenly MFA doesn't seem so bad.

Force MFA or people will skip it:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "Bool": {
          "aws:MultiFactorAuthPresent": "false"
        }
      }
    }
  ]
}
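
One catch: applied exactly as written, that deny also blocks the calls a new user needs to set up MFA in the first place. AWS's published sample handles this with a NotAction carve-out - a sketch along those lines (double-check the action list against the current AWS docs before you ship it):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyAllExceptMFASetupIfNoMFA",
      "Effect": "Deny",
      "NotAction": [
        "iam:CreateVirtualMFADevice",
        "iam:EnableMFADevice",
        "iam:GetUser",
        "iam:ListMFADevices",
        "iam:ListVirtualMFADevices",
        "iam:ResyncMFADevice",
        "sts:GetSessionToken"
      ],
      "Resource": "*",
      "Condition": {
        "BoolIfExists": {
          "aws:MultiFactorAuthPresent": "false"
        }
      }
    }
  ]
}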

Get hardware keys for admins: Yubikeys cost about $50 and stop credential phishing cold, because a FIDO2 key won't authenticate to a spoofed domain. SMS MFA is fucking useless - SIM swapping exists and is trivial for motivated attackers.

CLI MFA (because developers will whine):

## Get temp creds with MFA
aws sts get-session-token \
    --serial-number arn:aws:iam::123456789012:mfa/username \
    --token-code 123456 \
    --duration-seconds 3600

Stop Using Long-Lived Keys

The Problem: AWS access keys that live forever in config files are like leaving your house keys under the doormat. Use IAM roles instead.

Cross-account role example:

## Assume role instead of hardcoded keys
aws sts assume-role \
    --role-arn arn:aws:iam::PROD:role/ReadOnlyRole \
    --role-session-name emergency-debug
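
assume-role spits out a temporary Credentials block; a quick sketch for loading it into your shell, assuming you have jq installed:

## Grab just the credentials and export them for subsequent CLI calls
CREDS=$(aws sts assume-role \
    --role-arn arn:aws:iam::PROD:role/ReadOnlyRole \
    --role-session-name emergency-debug \
    --query 'Credentials' --output json)

export AWS_ACCESS_KEY_ID=$(echo "$CREDS" | jq -r '.AccessKeyId')
export AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | jq -r '.SecretAccessKey')
export AWS_SESSION_TOKEN=$(echo "$CREDS" | jq -r '.SessionToken')

Or skip the ceremony entirely and put a role_arn/source_profile profile in ~/.aws/config so the CLI assumes the role for you.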

Regular IAM Cleanup (The Boring but Critical Stuff)

Find unused IAM users who've been sitting there for months:

## Generate cred report and find dormant accounts
aws iam generate-credential-report
aws iam get-credential-report --query 'Content' --output text | base64 -d > creds.csv
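
The CSV is handy for audits, but for a quick look at dormant access keys you can also ask IAM directly - a sketch (slow on big accounts, since it's one API call per key):

## For each user, show when each access key was last used (old dates = dormant)
for u in $(aws iam list-users --query 'Users[].UserName' --output text); do
    for k in $(aws iam list-access-keys --user-name "$u" --query 'AccessKeyMetadata[].AccessKeyId' --output text); do
        last=$(aws iam get-access-key-last-used --access-key-id "$k" \
            --query 'AccessKeyLastUsed.LastUsedDate' --output text)
        echo "$u $k last used: $last"
    done
done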

Use AWS Config rules to automatically flag the following (a sample rule follows the list):

  • IAM users without MFA
  • Unused access keys
  • Root account usage (fire whoever does this)
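
A sketch of the MFA check using an AWS managed Config rule (the rule name is yours to pick; IAM_USER_MFA_ENABLED is the managed rule identifier, and Config has to already be recording in the account/region for it to evaluate anything):

## Flag any IAM user without MFA via the AWS managed rule
aws configservice put-config-rule \
    --config-rule '{
        "ConfigRuleName": "iam-user-mfa-enabled",
        "Source": {
            "Owner": "AWS",
            "SourceIdentifier": "IAM_USER_MFA_ENABLED"
        }
    }'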

The brutal truth: Our security audit found 200+ unused IAM users, 47 hardcoded access keys in GitHub repos, and one developer who'd been using root credentials for two years. Cost went up 40% after implementing proper CloudTrail logging, but that's cheaper than getting breached.

AWS Security Hardening FAQ

Q: How long does AWS security hardening actually take?

A: If you're starting from scratch with typical AWS sprawl, plan for 6 months minimum, probably closer to a year if you want to do it right.

Here's the brutal reality:

  • Week 1: Audit existing mess, discover 500+ security violations
  • Months 2-3: Fix the obviously stupid stuff (root account usage, 0.0.0.0/0 security groups)
  • Months 4-6: Implement proper IAM, break half your applications, fix them
  • Months 7-12: Network hardening, monitoring that doesn't spam you, compliance theater

Add 3 months if you have compliance requirements. Add 6 months if your developers actively resist security changes.

Q: What's the biggest fuckup I see companies make?

A: Going full security lockdown on Day 1. I've watched this disaster movie too many times:

Day 1: Security team implements strict IAM policies across production
Day 2: Half the applications stop working
Day 3: Developers start spinning up shadow AWS accounts to get work done
Day 4: Security team gets blamed for "killing productivity"
Day 5: Management rolls back all security changes

Smart approach: Start with monitoring and logging (won't break anything), then gradually tighten screws. Test everything in dev first. If developers can't do their jobs, they'll find ways around your controls.

Q: How do I not break everything while adding security?

A: Golden rule: Monitor first, block later. Seriously.

  1. Turn on CloudTrail and GuardDuty first - they just watch, can't break anything
  2. Use --dry-run religiously:
## Test security group changes without applying
aws ec2 authorize-security-group-ingress \
    --group-id sg-12345678 \
    --protocol tcp --port 22 \
    --source-group sg-87654321 \
    --dry-run
  3. Deploy at 10am on Tuesday - never Friday, never late at night when you're tired
  4. Have a rollback plan - know exactly how to undo what you're about to do
  5. Start in dev/staging - if it breaks there, it'll break worse in production

Pro tip: The --dry-run flag is your best friend. Use it on everything.

Q: AWS security tools vs third-party - what actually works?

A: Start with AWS native, add third-party only when AWS sucks at something specific.

AWS tools that don't suck:

  • GuardDuty - decent threat detection, lots of false positives initially
  • Security Hub - good aggregation, terrible UI
  • Config - solid compliance checking, expensive at scale

When you need third-party:

  • Your SIEM team already knows Splunk/Elastic and doesn't want to learn new shit
  • You need multi-cloud support (Google/Azure)
  • AWS tools are missing specific compliance features your auditors want

Real talk: AWS security tools work fine for 90% of companies. Don't overcomplicate it unless you have a specific gap that AWS doesn't fill.

Q: What does this security stuff actually cost?

A: Painful truth: Plan for 10-15% of your AWS bill, plus consultant fees, plus internal time.

Reality check from our experience:

  • GuardDuty, Security Hub, Config: Added 12% to our AWS bill
  • Security consultant: $150K over 6 months
  • Internal engineering time: ~6 months of one senior engineer
  • Compliance audit prep: Another $50K in consultant fees

But consider the alternative: One decent breach will cost you $2-5 million in remediation, legal fees, and lost business. We're cheap compared to that.

Q: What about compliance bullshit?

A: If you're dealing with compliance auditors: SOC 2, HIPAA, and PCI DSS all want the same basic stuff - encryption, access logs, network segmentation, regular access reviews.

AWS Artifact has templates and reports that make auditors happy. Security Hub can generate compliance dashboards that look impressive in PowerPoint.

Q: How do I know if this security stuff is actually working?

A: Simple metrics that matter:

  • Time to detect weird shit happening (aim for under 10 minutes)
  • Number of "oh fuck" incidents per month (should go down)
  • False positive rate on alerts (should be under 20% or you'll ignore them)
  • How fast you can isolate a compromised instance (under 5 minutes with automation)

Reality check: If your monitoring system cries wolf every 20 minutes, nobody will pay attention when there's a real attack.

Q: What do I do when someone gets pwned?

A: The 3am incident response checklist:

  1. Don't panic (easier said than done)
  2. Isolate the compromised shit:
## Quarantine instance immediately
aws ec2 modify-instance-attribute \
    --instance-id i-compromised123 \
    --groups sg-quarantine
  3. Revoke all related credentials (sketch below)
  4. Check CloudTrail logs to see what the attacker did
  5. Fix the root cause after you contain the damage

Keep pre-written playbooks because at 3am your brain doesn't work.
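
A sketch of the credential side of that checklist - the user name and key ID below are placeholders for whatever actually leaked:

## Deactivate (don't delete) the leaked key so you keep it for forensics
aws iam update-access-key \
    --user-name compromised-user \
    --access-key-id AKIAIOSFODNN7EXAMPLE \
    --status Inactive

## Then see what that key was used for before you killed it
aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=AccessKeyId,AttributeValue=AKIAIOSFODNN7EXAMPLE \
    --max-results 50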

Network Security: Stop Lateral Movement Before It Kills You


Most attacks start with one compromised instance and spread through shitty network configs. I've seen attackers pivot from a compromised web server to production databases in under 10 minutes because someone used default VPC settings and called it "good enough."

The brutal reality: once someone's inside your network, they own everything unless you've properly segmented it. And most companies haven't.

Subnet Segmentation: The Basics Everyone Gets Wrong

The Default VPC Problem: AWS gives you a default VPC where every subnet is public and everything can talk to everything. This is like putting your cash register, your office computers, and your Wi-Fi guest network on the same network segment. It's fucking stupid.

How to Actually Design Network Zones:

Public DMZ: Only load balancers and NAT gateways go here. If you put app servers in public subnets, you deserve what happens next.

Private Application Tier: Your app servers live here. No direct internet access. They talk to the DMZ through load balancers only.

Database Tier: Isolated subnets. Only accepts connections from application tier. If your database is directly accessible from the internet, fire your architect.

## Create properly segmented VPC (DNS attributes can't be set on create - one modify call per attribute)
aws ec2 create-vpc \
    --cidr-block 10.0.0.0/16

aws ec2 modify-vpc-attribute --vpc-id vpc-12345678 --enable-dns-support '{"Value":true}'
aws ec2 modify-vpc-attribute --vpc-id vpc-12345678 --enable-dns-hostnames '{"Value":true}'

## DMZ subnet - internet-facing shit only
aws ec2 create-subnet \
    --vpc-id vpc-12345678 \
    --cidr-block 10.0.1.0/24 \
    --availability-zone us-east-1a

## App subnet - private, no internet access
aws ec2 create-subnet \
    --vpc-id vpc-12345678 \
    --cidr-block 10.0.10.0/24 \
    --availability-zone us-east-1a

## DB subnet - most restricted
aws ec2 create-subnet \
    --vpc-id vpc-12345678 \
    --cidr-block 10.0.20.0/24 \
    --availability-zone us-east-1a
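
New subnets are all private until you give them a route to an internet gateway, so the DMZ needs a few more steps - the IDs below are placeholders for whatever the earlier commands returned:

## Internet gateway + public route table for the DMZ subnet only
aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway --internet-gateway-id igw-11111111 --vpc-id vpc-12345678

aws ec2 create-route-table --vpc-id vpc-12345678
aws ec2 create-route \
    --route-table-id rtb-22222222 \
    --destination-cidr-block 0.0.0.0/0 \
    --gateway-id igw-11111111

## Associate only the 10.0.1.0/24 DMZ subnet with the public route table
aws ec2 associate-route-table --route-table-id rtb-22222222 --subnet-id subnet-aaaa1111

The app and database subnets stay on the main route table with no internet route; give the app tier a NAT gateway only if it genuinely needs outbound access.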

Security Groups: Where 0.0.0.0/0 Goes to Die

The 0.0.0.0/0 Problem: This CIDR block means "the entire fucking internet can access this port." If I audit your AWS account and find 0.0.0.0/0 in your security groups, we're having a serious conversation about your life choices.

Security Groups That Don't Suck:

Only allow what you actually need:

## Web servers only accept load balancer traffic
aws ec2 authorize-security-group-ingress \
    --group-id sg-web123 \
    --protocol tcp --port 80 \
    --source-group sg-alb456

## Databases only accept app server connections
aws ec2 authorize-security-group-ingress \
    --group-id sg-db789 \
    --protocol tcp --port 5432 \
    --source-group sg-web123

## SSH only from specific IP ranges (not the entire internet)
aws ec2 authorize-security-group-ingress \
    --group-id sg-mgmt101 \
    --protocol tcp --port 22 \
    --cidr 203.0.113.0/24  # Your office IP, not 0.0.0.0/0

Pro tip: If you see 0.0.0.0/0 in a production security group, fix it immediately or expect to be breached.
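
To find the offenders in the first place, describe-security-groups supports a filter on ingress CIDR - a quick sketch:

## List security groups with any ingress rule open to the entire internet
aws ec2 describe-security-groups \
    --filters Name=ip-permission.cidr,Values=0.0.0.0/0 \
    --query 'SecurityGroups[].[GroupId,GroupName]' \
    --output table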

The Essential Monitoring You Actually Need

[Figure: Monitoring - because if you can't see the attack happening, you can't stop it]

Turn on VPC Flow Logs:

## Enable flow logs to see who's talking to what
## (the role ARN is a placeholder - Flow Logs needs an IAM role it can use to write to CloudWatch Logs)
aws ec2 create-flow-logs \
    --resource-type VPC \
    --resource-ids vpc-12345678 \
    --traffic-type ALL \
    --log-destination-type cloud-watch-logs \
    --log-group-name VPCFlowLogs \
    --deliver-logs-permission-arn arn:aws:iam::123456789012:role/VPCFlowLogsRole
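
Once the logs are landing in CloudWatch, Logs Insights can answer "who's getting rejected where" - a sketch, assuming the VPCFlowLogs log group from above and the default flow log format:

## Last hour of rejected traffic, most recent first (GNU date; on macOS use: date -v-1H +%s)
aws logs start-query \
    --log-group-name VPCFlowLogs \
    --start-time $(date -d '1 hour ago' +%s) \
    --end-time $(date +%s) \
    --query-string 'fields @timestamp, srcAddr, dstAddr, dstPort, action | filter action = "REJECT" | sort @timestamp desc | limit 50'

## start-query returns a queryId - poll for results with it
aws logs get-query-results --query-id YOUR-QUERY-ID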

Enable GuardDuty (it's literally one command):

aws guardduty create-detector --enable
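
Once the detector exists, sanity-check that findings are actually flowing (this assumes one detector per region, which is the normal case):

## Grab the detector ID and list recent findings
DETECTOR_ID=$(aws guardduty list-detectors --query 'DetectorIds[0]' --output text)
aws guardduty list-findings --detector-id "$DETECTOR_ID" --max-results 10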

Set up AWS WAF if you have web apps - the AWS managed rule groups cover the common web-attack basics without much tuning; a sketch follows.
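
A hedged sketch of a regional web ACL running the AWS Common Rule Set managed group - the names and metric names are made up, and you still have to associate the ACL with your ALB or API Gateway afterwards (aws wafv2 associate-web-acl):

## Web ACL that allows by default and runs the AWS common managed rules
aws wafv2 create-web-acl \
    --name basic-web-acl \
    --scope REGIONAL \
    --default-action Allow={} \
    --visibility-config SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName=basicWebAcl \
    --rules '[{
        "Name": "aws-common-rules",
        "Priority": 0,
        "Statement": {
            "ManagedRuleGroupStatement": {
                "VendorName": "AWS",
                "Name": "AWSManagedRulesCommonRuleSet"
            }
        },
        "OverrideAction": { "None": {} },
        "VisibilityConfig": {
            "SampledRequestsEnabled": true,
            "CloudWatchMetricsEnabled": true,
            "MetricName": "awsCommonRules"
        }
    }]'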

The reality: We caught an attacker's lateral-movement attempt because VPC Flow Logs showed unusual database connections from compromised web servers. Without network segmentation, they would have owned our entire production environment in 10 minutes. With proper segmentation, they were stuck in the DMZ talking to nothing.

Time investment: 2-3 weeks to implement properly, 6 months to tune monitoring so it doesn't spam you with false positives.

Resources That Actually Help (Unlike AWS Documentation)