
Why Your K8s Cluster is Probably a Security Nightmare

K8s security out of the box is a fucking joke. I'm talking `--anonymous-auth=true` enabled by default on the kubelet until v1.10, etcd running without TLS, and the API server's insecure port 8080 wide open with zero authentication. The defaults scream "we care more about your developer experience than not getting breached."

kube-bench crawls through your cluster's guts and checks about 100 different ways your security is fucked. Learned this the hard way after spending 6 hours manually checking configs when a PCI audit flagged our payment processing cluster as "fundamentally insecure." The auditor literally said our k8s v1.8 cluster was "designed for attackers" because we had the default kubelet config from the getting started guide.


What Actually Gets Checked

Here's what kube-bench catches that will bite you in production:

- API Server Shit That'll Get You Owned
- kubelet Fuckups That Keep Me Up at Night
- etcd - The Crown Jewel Everyone Forgets to Secure
- Node-Level Ways to Get Pwned

This motherfucker runs about 100 checks total. Takes maybe 30 seconds unless you're on a potato cluster. About 40% of checks fail on EKS because Amazon won't let you see master node configs (learned this after 3 hours of debugging "permission denied" errors).

The CIS Kubernetes Benchmark is basically a 300-page PDF of paranoid security requirements that auditors cum over.

How kube-bench Stacks Up Against Other Security Tools (Brutally Honest Take)

| Tool | What It Actually Does | When to Use It | What Sucks About It |
|------|----------------------|----------------|---------------------|
| kube-bench | Tells you exactly how fucked your cluster config is | Run this first or you're flying blind | Won't fix jack shit automatically; nearly half the checks fail on EKS because AWS locks you out |
| kube-hunter | Tries to actually hack your cluster like a real attacker | When you want to terrify yourself | Takes 45 minutes to run, crashed our staging cluster twice, CISO banned it after it found RCE |
| Kubescape | Does vulnerability scanning, config checks, and makes your coffee | If you love complex tools that do everything poorly | 500MB binary, 20-page config file, crashes when it sees non-standard workloads |
| Falco | Watches syscalls and yells when sketchy shit happens | Production threat detection if you hate sleep | Kernel module breaks on every OS update, 1000 false positives per day until tuned |

Actually Running This Bastard (Without Losing Your Sanity)

There are like 5 different ways to run kube-bench, but most of them suck. Here's the one way that actually works without making you want to quit tech:


Kubernetes Job (The Only Way That Doesn't Suck)

Copy this YAML, apply it, pray to whatever deity you believe in:

apiVersion: batch/v1
kind: Job
metadata:
  name: kube-bench
spec:
  template:
    spec:
      hostPID: true
      restartPolicy: Never   # required for Jobs; the default (Always) makes the apply fail
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master   # pre-v1.24 clusters still use this taint
        operator: Exists
        effect: NoSchedule
      containers:
      - name: kube-bench
        image: aquasec/kube-bench:v0.12.0   # pin a version; :latest will surprise you
        command: ["kube-bench"]
        volumeMounts:
        - name: var-lib-etcd
          mountPath: /var/lib/etcd
          readOnly: true
        - name: var-lib-kubelet
          mountPath: /var/lib/kubelet
          readOnly: true
        - name: etc-kubernetes
          mountPath: /etc/kubernetes
          readOnly: true
      volumes:
      - name: var-lib-etcd
        hostPath:
          path: "/var/lib/etcd"
      - name: var-lib-kubelet
        hostPath:
          path: "/var/lib/kubelet"
      - name: etc-kubernetes
        hostPath:
          path: "/etc/kubernetes"

Run with: kubectl apply -f kube-bench-job.yaml, then grab the report with kubectl logs job/kube-bench once the job completes.
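The report lines below are a hypothetical sample of the `[PASS]/[FAIL]/[WARN]` format kube-bench prints; on a real cluster you'd save `kubectl logs job/kube-bench` to a file instead. Once you have the report, a quick grep gets you straight to the triage list.

```shell
# Hypothetical sample of kube-bench's text output format; on a real cluster,
# capture it with: kubectl logs job/kube-bench > report.txt
cat > report.txt <<'EOF'
[PASS] 1.2.1 Ensure that the --anonymous-auth argument is set to false
[FAIL] 1.2.6 Ensure that the --kubelet-certificate-authority argument is set as appropriate
[WARN] 1.2.12 Consider enabling the AlwaysPullImages admission plugin
EOF

# Show just the failures -- that's the triage list
grep '^\[FAIL\]' report.txt

# Count them for a quick gate
FAIL_COUNT=$(grep -c '^\[FAIL\]' report.txt)
echo "failures: $FAIL_COUNT"
```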

Shit That Will Ruin Your Day:

Direct Binary Installation (For Air-Gapped Hell)

If you're stuck in an air-gapped environment or need to run this directly on nodes, grab the latest release from GitHub:

# Download the binary (check the releases page for the latest version)
curl -L https://github.com/aquasecurity/kube-bench/releases/download/v0.12.0/kube-bench_0.12.0_linux_amd64.tar.gz -o kube-bench.tar.gz
tar xzf kube-bench.tar.gz
chmod +x kube-bench
sudo mv kube-bench /usr/local/bin/

# The tarball ships a cfg/ directory with the benchmark definitions
sudo mkdir -p /etc/kube-bench
sudo cp -r cfg /etc/kube-bench/

# Run it
sudo kube-bench --config-dir /etc/kube-bench/cfg

Error Messages That Made Me Rage-Quit:


Cloud Platform Reality Check

EKS (Amazon): 47 out of 100 checks fail by design because Bezos thinks you can't handle master node access. Wasted 4 hours trying to figure out why etcd checks returned "no such file or directory" before someone told me AWS manages etcd in a black box. Use `--benchmark eks-1.2.0` (or whichever EKS benchmark version your kube-bench ships) or enjoy explaining to auditors why everything's broken.

AKS (Azure): 39 checks fail because Microsoft's Container-Optimized OS moves configs to /etc/kubernetes/azure.json instead of standard paths. Kubelet configs get 755 permissions by default, which kube-bench flags as insecure (it's right).

GKE (Google): Surprisingly doesn't suck. Only 12 checks fail on standard GKE, mostly around file permissions. Autopilot is a different beast - 78% failure rate because Google locked down everything and you can't change it.

Self-managed clusters: All checks work but you get to deal with actually fixing everything it finds. Which is... a lot.

Real CI/CD Integration

If you want this in your pipeline (and you should), output JSON and fail on critical findings:

kube-bench --json | jq -r '.Totals.total_fail' | xargs test 0 -eq

For AWS Security Hub integration, use `--asff` flag but expect to spend time mapping finding types to your compliance framework.
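Raw fail counts hide which checks failed. Assuming your kube-bench version emits the JSON shape sketched below (Controls → tests → results; verify against your release), you can pull out the failing check IDs so the CI log is actually actionable. The inline REPORT here is a trimmed, hypothetical sample; in a pipeline you'd substitute real `kube-bench --json` output.

```shell
# Trimmed, hypothetical sample of `kube-bench --json` output
REPORT='{"Controls":[{"text":"API Server","tests":[{"results":[
  {"test_number":"1.2.6","test_desc":"Ensure that the --kubelet-certificate-authority argument is set as appropriate","status":"FAIL"},
  {"test_number":"1.2.1","test_desc":"Ensure that the --anonymous-auth argument is set to false","status":"PASS"}
]}]}],"Totals":{"total_fail":1}}'

# List failing checks with IDs and descriptions, not just a count
FAILED=$(printf '%s' "$REPORT" \
  | jq -r '.Controls[].tests[].results[] | select(.status=="FAIL") | "\(.test_number)  \(.test_desc)"')
echo "$FAILED"
```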

Questions People Actually Ask

Q: Does this actually work on cloud clusters or just shit the bed everywhere?

A: EKS fails 47 checks, AKS fails 39, GKE fails 12. That's not bugs, that's cloud providers treating you like a child who can't be trusted with root access. Spent an entire afternoon debugging EACCES: permission denied, open '/var/lib/etcd/member/snap/db' on an EKS v1.24 cluster before my colleague laughed and said "dude, that's AWS-managed etcd, you can't touch it." Same error on every EKS version since v1.20.

Q: How often should I torture myself with this thing?

A: Every time you touch cluster config, and definitely in CI/CD to catch the obvious shit before it explodes in prod. Weekly runs are for compliance theater; daily is just masochism. I run it after any RBAC changes, admission controller updates, or when someone claims they "just made a small security tweak."
Q: Why are there like 47 different benchmark versions?

A: Because CIS moves slower than a dead turtle. K8s 1.28 might use CIS benchmark 1.7.0 while K8s 1.25 uses 1.6.1. Auto-detection works most of the time, but when you get unknown test file 1.4.2.yaml errors, you need to manually specify --benchmark cis-1.23 because the tool guessed wrong. Spent 30 minutes last month debugging why kube-bench v0.12.0 kept using the 1.6.1 benchmark on our shiny new K8s 1.28 cluster; turns out the version detection logic failed and defaulted to some ancient benchmark.
Q: Can this thing fix anything or just bitch about everything?

A: Hell no, it just points and laughs at your broken config. You get a remediation guide that basically says "edit this file and restart kubelet." Which is probably for the best; imagine if it automatically disabled anonymous auth and broke your monitoring stack at 2am.
Q: How do I stop this thing from crying wolf constantly?

A: Edit the test config YAML files in /opt/kube-bench/cfg and mark tests that don't apply with `type: skip`. First thing I did was skip all etcd tests (1.1.1 through 1.1.20) on EKS because AWS hides that shit. Also skipped 4.2.1 (kubelet config permissions) on GKE because Google's COS puts configs in /home/kubernetes/kubelet-config.yaml instead of the expected path.
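For reference, here's roughly what a skip looks like in a controls file. The check ID and text are illustrative, and the exact file layout (e.g. cfg/cis-1.23/etcd.yaml) depends on your kube-bench version, so diff against your own cfg/ directory before editing.

```yaml
# Fragment of a kube-bench controls file -- marking a check as skipped.
# Verify the surrounding structure against the cfg/ directory your
# kube-bench version actually ships.
checks:
  - id: 1.1.11
    text: "Ensure that the etcd data directory permissions are set to 700 or more restrictive"
    type: skip   # reported as skipped instead of FAIL; on EKS, etcd is AWS's problem
```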

Q: What output format doesn't make me want to kill myself in CI/CD?

A: --json all the way. Gives you structured data with actual fail counts you can parse with jq. Text output is human-readable but useless for scripting. ASFF format is only worth it if you're balls deep in AWS Security Hub and love vendor lock-in.

Q: Will this thing tank my cluster performance?

A: Nah, it's basically a glorified file reader. Takes 20-30 seconds to run all checks, uses about 50MB of RAM, doesn't touch your workloads. It's not intercepting traffic or doing anything fancy; just reading files and checking process flags.
Q: How do I hack this thing to check my weird corporate bullshit?

A: Edit the YAML test files in /opt/kube-bench/cfg/. Each test has audit commands, tests, and remediation sections. I added a custom check for our internal CA requirement by copying an existing test and changing the grep pattern. Pro tip: test your regex on a real cluster first or you'll get mysterious "test failed to execute" errors.


Q: How do I make CI/CD shit itself when security checks fail?

A: Don't just check exit codes or you'll fail on every EKS cluster (47 failures is normal). Parse the JSON and set intelligent thresholds:

# Fail only on critical shit, not normal cloud provider limitations
FAILURES=$(kube-bench --json | jq -r '.Totals.total_fail')
if [ "$FAILURES" -gt 15 ]; then
  echo "Too many failures: $FAILURES"
  exit 1
fi

Took me 2 weeks to dial in the threshold. Started at 0 failures (build broke constantly), tried 5 (still broke on normal stuff), settled on 15 because that catches real problems without crying about AWS being overprotective.

Q: Is this made by the same people who do Trivy?

A: Yep, Aqua Security makes both. Trivy finds CVEs in your container images; kube-bench finds fuckups in your cluster config. Different problems, different tools. Use both or you're only seeing half the security picture.

Q: Why does this thing need root access to my entire cluster?

A: It has to read kubelet command-line args from /proc/$PID/cmdline, config files from /etc/kubernetes/, and check file permissions on sensitive directories. hostPID: true lets it see processes outside the container; host mounts give it access to the real filesystem. Security teams have panic attacks seeing this YAML, but it's the only way to check actual cluster security.
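To make the hostPID requirement concrete, here's a sketch of the kind of check it runs: read a process's command line and look for a flag. The cmdline string below is a hardcoded stand-in; on a real node the tool reads /proc/$PID/cmdline, which is exactly why it needs hostPID and host mounts.

```shell
# Simulated kubelet command line; the real one comes from /proc/$PID/cmdline,
# which requires hostPID: true to even see the process from inside a container
CMDLINE='/usr/bin/kubelet --anonymous-auth=false --client-ca-file=/etc/kubernetes/pki/ca.crt'

# Flag check in the style of a CIS test: is anonymous auth explicitly off?
case "$CMDLINE" in
  *--anonymous-auth=false*) RESULT="PASS" ;;
  *)                        RESULT="FAIL" ;;
esac
echo "kubelet anonymous-auth disabled: $RESULT"
```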

Q: Do they keep this shit up to date?

A: They're usually 2-3 months behind K8s releases, and CIS benchmarks lag even further. K8s 1.28 might use tests designed for 1.26. Check the release notes; if you're running bleeding-edge K8s, some tests might be outdated or missing entirely.


Q: Why does this motherfucker keep dying with "permission denied"?

A: Missing `privileged: true` in the securityContext, or your cluster uses non-standard paths. Spent 4 hours debugging EACCES: permission denied, open '/var/lib/kubelet/config.yaml' on GKE Autopilot before realizing Google locks down host filesystem access. Also happens when the job runs on nodes that don't have the expected directory structure; looking at you, Fargate.

Why Companies Pay Me to Run This Thing


Companies use kube-bench because auditors demand it and because it catches the dumb shit that gets you fired when there's a breach. It's not sexy work, but it keeps the lights on and the lawyers happy.

Checkbox Checking for Fun and Profit

Regulated industries worship at the altar of CIS compliance. kube-bench spits out the magic reports auditors cream themselves over:

SOC 2 audits: Auditors see "automated security controls" and immediately get hard. Running kube-bench weekly shows you have "continuous monitoring" which sounds way more impressive than it is.

PCI DSS: Payment card industry auditors want proof your infrastructure doesn't suck. CIS compliance + kube-bench reports = checkbox checked, auditor happy.

HIPAA: Healthcare companies wave kube-bench reports at auditors to prove they're "implementing appropriate safeguards for patient data infrastructure."

FedRAMP/Government: Federal auditors demand NIST compliance, which has enough overlap with CIS that kube-bench reports satisfy most requirements.


Horror Story #1 - The Auditor Who Didn't Understand Cloud:
Auditor spent 3 hours grilling me about 47 "FAIL" results in our EKS kube-bench report. Dude was frantically taking notes about "critical control plane vulnerabilities" and "unauthorized etcd access." I'm trying to explain that AWS manages the master nodes and we literally can't access etcd even if we wanted to. He keeps pointing at test 1.1.12 (etcd data directory permissions) and asking why we haven't "remediated this critical finding." Finally brought in a Solutions Architect to explain that AWS doesn't give customers etcd access. Auditor's response: "So Amazon could be reading your encrypted secrets?" Face, meet palm.

Horror Story #2 - The Security Fix That Broke Everything:
Decided to "fix" all our kube-bench failures before the next audit. Set `--anonymous-auth=false` on kubelet to pass test 4.2.1. Deployed to prod on a Friday at 4:30pm (I know, I fucking know). Got paged at 2:47am because all our Grafana dashboards were blank - every single panel showing "No data." Prometheus couldn't scrape kubelet metrics on port 10255 because it was using anonymous auth - had been for 2 years. Spent until 6am rolling back the change, explaining to the very pissed on-call manager why "fixing security" broke monitoring, and writing the 3-page incident report. The kicker? That kubelet flag was the only way our monitoring worked, and nobody documented it because "anonymous auth is fine for internal metrics." Lost 4 hours of production monitoring because I trusted a security checklist over understanding what the fuck the config actually did.

The AWS Security Hub integration is useful if you're already sending compliance data there. Otherwise it's just another dashboard to ignore.


Real DevSecOps Integration

CI/CD pipelines: Run kube-bench on staging clusters before promoting to production. Parse the JSON output to fail builds on critical findings. Most teams only fail on high-severity issues because fixing everything is impossible.
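A pipeline hookup can be as simple as the sketch below. This is GitLab CI syntax; the job name, stage, image, and threshold are all assumptions, so adapt them to your CI system and however you handle staging kubeconfigs.

```yaml
# Hypothetical GitLab CI job: run the kube-bench Job on the staging cluster,
# wait for it, then gate on the failure count in the text report
kube-bench-staging:
  stage: verify
  image: bitnami/kubectl:latest          # any image with kubectl works
  script:
    - kubectl apply -f kube-bench-job.yaml
    - kubectl wait --for=condition=complete --timeout=120s job/kube-bench
    - kubectl logs job/kube-bench | tee report.txt
    - test "$(grep -c '^\[FAIL\]' report.txt)" -le 15   # threshold from trial and error
    - kubectl delete job kube-bench      # clean up so the next run doesn't collide
```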

Infrastructure validation: Use it to check Terraform/Helm deployments against CIS standards. Catches configuration mistakes before they hit production, but doesn't replace proper security review.

Scheduled monitoring: Weekly runs are common to catch configuration drift. Daily is overkill unless you're in a regulated industry with paranoid auditors who want reports every time someone breathes near the cluster.

Reality of Running at Scale

Multi-cluster hell: If you have dozens of clusters, you need automation to deploy and collect results. Expect different cloud providers to have different failure patterns.

Resource overhead: Uses almost no resources - maybe 50MB RAM and runs for 30 seconds. The network calls to check external endpoints sometimes time out on slow networks.

Parallel scaling: You can run multiple instances simultaneously but watch out for rate limiting if you're hitting the same etcd or API server endpoints.

Support and Maintenance

Open source support: Active community on GitHub for issues and feature requests. Maintainers are responsive but don't expect 24/7 support for free.

Commercial support: Aqua Security offers paid support if you need SLAs and guaranteed response times. Most companies don't need this unless they're risk-averse enterprises.

Updates: Tool gets updated regularly for new Kubernetes and CIS benchmark versions. Usually lags a few months behind but catches up eventually.

Enterprise features: Aqua's commercial platform adds remediation guidance and integration with other security tools. Useful if you're already paying for their other products.

