Kubernetes security policies are like having that overprotective parent who locked you in your room until you were 18. Since v1.25, Pod Security Standards have replaced Pod Security Policies and somehow made things worse - instead of opting into the pain by writing policies, you get namespace labels that block half your shit outright, and plenty of managed platforms and security teams flip them on for you. The Pod Security admission controller enforces these standards at admission time, so workloads that deployed fine last month get rejected before they even schedule.
Three Security Systems That Will Eat Your Entire Afternoon
1. Pod Security Standards (PSS) - The New Sheriff in Town
Pod Security Standards define three security profiles that determine what privileges your pods can have. Since Kubernetes v1.25, these have replaced Pod Security Policies completely. The PSS implementation guide covers the technical details, while the cluster-level PSS tutorial shows how to apply them across your entire cluster. AWS EKS PSS implementation provides cloud-specific guidance.
The Three Profiles That Matter:
- Privileged: Unrestricted policy, anything goes (dangerous but sometimes necessary)
- Baseline: Minimally restrictive, prevents known privilege escalations
- Restricted: Heavily restricted, follows pod hardening best practices
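Profiles are applied per namespace with labels on the Namespace object rather than cluster-wide policy objects. A minimal sketch - the label keys are the real Pod Security Admission labels, the namespace name is a placeholder:
## Enforce "restricted", and also warn/audit against it
apiVersion: v1
kind: Namespace
metadata:
  name: your-namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
The enforce label is what actually rejects pods; warn and audit only complain, which makes them handy for trialing a stricter profile before you flip enforcement.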
What actually gets blocked:
## This pod will be REJECTED under baseline/restricted profiles
apiVersion: v1
kind: Pod
metadata:
  name: privileged-pod
spec:
  containers:
  - name: app
    image: nginx
    securityContext:
      privileged: true                # BLOCKED by baseline and restricted: privileged containers
      runAsUser: 0                    # BLOCKED by restricted: running as root
      allowPrivilegeEscalation: true  # BLOCKED by restricted: can gain privileges
Common violation errors you'll see:
violates PodSecurity "baseline:latest": privileged container is not allowed
violates PodSecurity "restricted:latest": runAsNonRoot != true
violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false
Quick diagnostic command:
## Check what security profile is enforced in your namespace
kubectl get namespace your-namespace -o yaml | grep -A 5 pod-security
2. Role-Based Access Control (RBAC) - The Permission Police
RBAC controls who can do what in your cluster. When RBAC blocks you, it's usually because someone tried to implement "least privilege" without understanding what privileges you actually need. The RBAC good practices guide outlines proper implementation, while advanced RBAC best practices covers complex scenarios. Google Cloud's RBAC guide provides cloud-specific implementation details, and the Dynatrace RBAC security guide explains security implications.
What RBAC violations look like:
## This error means your ServiceAccount lacks permissions
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:default:my-app" cannot create resource "pods"
## This means the user/service lacks the specific verb permission
Error from server (Forbidden): secrets is forbidden: User "developer" cannot get resource "secrets" in API group ""
The debugging process that actually works:
## Check what a user/service account can actually do
kubectl auth can-i create pods --as=system:serviceaccount:default:my-app
kubectl auth can-i get secrets --as=system:serviceaccount:default:my-app
## See all permissions for a service account
kubectl auth can-i --list --as=system:serviceaccount:default:my-app
## Find which ClusterRole or Role grants specific permissions
kubectl get clusterrolebindings -o json | jq -r '.items[] | select(.subjects[]?.name=="my-app") | .metadata.name'
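Once you've confirmed which permission is missing, the quickest fix is usually an imperative grant. A sketch for the first Forbidden error above - the Role and RoleBinding names here are made up:
## Let the ServiceAccount from the error create pods, then verify
kubectl create role pod-creator --verb=create --resource=pods -n default
kubectl create rolebinding my-app-pod-creator --role=pod-creator \
  --serviceaccount=default:my-app -n default
kubectl auth can-i create pods --as=system:serviceaccount:default:my-app -n default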
3. Network Policies - The Communication Killers
Network Policies act like firewalls between pods. The catch: as soon as any NetworkPolicy selects a pod, that pod becomes "default deny" for whichever direction (ingress, egress, or both) the policy declares, and you must explicitly allow every connection it needs. The NetworkPolicy debugging guide covers troubleshooting techniques, while the advanced debugging series handles complex scenarios. AWS EKS network policy troubleshooting provides cloud-specific debugging, and the Plural security guide offers comprehensive policy management strategies. Tigera's networking debugging guide explains the underlying networking concepts.
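The canonical starting point is a blanket deny - one short manifest that selects every pod in the namespace and allows no inbound traffic at all (namespace name is illustrative):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: your-namespace
spec:
  podSelector: {}   # empty selector = every pod in the namespace
  policyTypes:
  - Ingress         # no ingress rules listed, so all inbound traffic is denied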
The silent killer scenario:
Your pods are running fine, but services return connection refused or timeouts. No obvious error messages in logs.
## Check if NetworkPolicies are blocking traffic
kubectl get networkpolicies --all-namespaces
kubectl describe networkpolicy your-policy -n your-namespace
## Test connectivity between pods
kubectl run debug-pod --image=busybox --rm -it -- sh
## Inside pod: telnet your-service 80
What breaks when NetworkPolicies are applied:
- Health check probes fail (kubelet can't reach your pods)
- Service discovery stops working (DNS queries blocked - see the DNS egress rule after this list)
- Inter-service communication fails (API calls timeout)
- Ingress traffic gets blocked (external requests can't reach pods)
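The DNS one bites almost everyone. If any policy restricts egress, you need an explicit allowance back to the cluster DNS pods - a sketch that assumes your DNS pods carry the standard k8s-app: kube-dns label (true for stock CoreDNS deployments):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: your-namespace
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector: {}   # DNS lives in kube-system, so match across namespaces
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53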
Real-World Security Violation Scenarios
Scenario 1: The Spring Boot Weekend Killer
What happened: Worked perfectly fine in dev, then production decided to shit the bed Monday morning with some PodSecurity violation about running as root. Turns out our security team "hardened" prod over the weekend without telling anyone. Thanks for the heads up, security team. Really appreciate finding out via production alerts instead of, you know, a fucking email.
Root cause: The Spring Boot container was running as root because whoever wrote the Dockerfile copied some tutorial and never bothered to add a proper user. The dev cluster namespaces were still on the privileged profile, so it worked there, but prod got bumped to restricted and suddenly everything broke.
The fix:
## In your Dockerfile, add this before ENTRYPOINT (Alpine/BusyBox syntax; use groupadd/useradd on Debian-based images)
RUN addgroup -g 1001 -S appgroup && adduser -u 1001 -S appuser -G appgroup
USER 1001
## Or in your pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: spring-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
    runAsGroup: 1001
    seccompProfile:
      type: RuntimeDefault   # the restricted profile also requires an explicit seccomp profile
  containers:
  - name: spring-app
    image: my-spring-app
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
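Before you ship it, you can ask the API server whether the pod clears the profile without creating anything - both commands below are stock kubectl behavior; the manifest filename is whatever you saved it as:
## Server-side dry run goes through admission, so PodSecurity rejections and warnings show up here
kubectl apply --dry-run=server -f spring-app-pod.yaml
## Preview which existing workloads in a namespace would violate before tightening it
kubectl label --dry-run=server --overwrite namespace your-namespace \
  pod-security.kubernetes.io/enforce=restricted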
Scenario 2: The NetworkPolicy Black Hole That Ate Production
What happened: Someone applied a NetworkPolicy at 2 PM and broke everything. Took us three hours to figure out it was network shit because nothing was talking to anything else. Monitoring still showed green because health checks were cached, but every single API call was timing out.
Root cause: The NetworkPolicy was written by someone who'd never actually run a service mesh in production. It blocked everything including kubelet health checks, DNS resolution, and service-to-service communication. Even the ingress controller couldn't reach the pods.
The debugging process:
## 1. Verify NetworkPolicy exists and is blocking traffic
kubectl get networkpolicy -n production
## 2. Check if pods have any ingress rules
kubectl describe networkpolicy default-deny -n production
## 3. Test connectivity from inside the cluster
kubectl run nettest -n production --image=nicolaka/netshoot --rm -it -- bash
## Inside container: check DNS and the service itself, not just external egress
nslookup your-service.production.svc.cluster.local
curl -v http://your-service.production.svc.cluster.local:8080
curl https://httpbin.org/status/200
The fix that actually worked:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-access
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Ingress   # deliberately no Egress here - declaring Egress with no egress rules would deny all outbound traffic, DNS included
  ingress:
  # Allow traffic from other pods in the same namespace
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: production   # automatic namespace label, no manual labeling needed
    ports:
    - protocol: TCP
      port: 8080
  # CRITICAL: Allow health checks from kubelet
  - from: []   # Allow from any source
    ports:
    - protocol: TCP
      port: 8080
Scenario 3: The RBAC Permission Maze (AKA Why Our Deploys Broke for a Week)
What happened: After a security audit, some expensive consultant convinced management to implement "least privilege everywhere." Within 24 hours, every single CI/CD pipeline started failing with secrets is forbidden: User "system:serviceaccount:ci-cd:pipeline-sa" cannot get resource "secrets" in API group "". Deployments stopped working and nobody could figure out why because the consultant fucked off without documenting any of the changes.
Root cause: The consultant removed the overly broad ClusterRoleBinding that every ServiceAccount was using, then tried to implement "proper" RBAC by giving each service only the permissions it "should" need. Problem is, they had no clue what permissions our actual workloads required - like how our CI pipeline needed kubectl patch deployment to update image tags, or kubectl get configmap/build-config to read build settings - so they just guessed based on some best practices blog post they found.
The diagnostic journey:
## Check what the service account can actually do
kubectl auth can-i get secrets --as=system:serviceaccount:ci-cd:pipeline-sa -n ci-cd
## Find existing permissions
kubectl get rolebindings -n ci-cd -o json | jq -r '.items[] | select(.subjects[]?.name=="pipeline-sa")'
## Check cluster-wide permissions
kubectl get clusterrolebindings -o json | jq -r '.items[] | select(.subjects[]?.name=="pipeline-sa")'
The solution:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: ci-cd
  name: pipeline-secrets-reader
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get"]   # "list" is dropped: resourceNames only restricts requests that target a named object
  # Limit to specific secrets instead of all
  resourceNames: ["db-credentials", "api-keys", "registry-secret"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pipeline-secrets-binding
  namespace: ci-cd
subjects:
- kind: ServiceAccount
  name: pipeline-sa
  namespace: ci-cd
roleRef:
  kind: Role
  name: pipeline-secrets-reader
  apiGroup: rbac.authorization.k8s.io
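That covers the secrets error, but the root cause above also mentioned patching Deployments and reading the build ConfigMap. A companion Role along the same lines - the resource names come from the scenario, so treat them as illustrative:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: ci-cd
  name: pipeline-deployer
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "patch"]   # enough to update image tags
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get"]
  resourceNames: ["build-config"]
Bind it to pipeline-sa with another RoleBinding, exactly like the secrets one above.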
The Security Policy Violation Troubleshooting Workflow
Step 1: Identify the Policy Type (30 seconds)
## Check pod events for security violations
kubectl describe pod your-pod | grep -A 10 Events
## Look for these violation types:
## - "violates PodSecurity" = Pod Security Standards
## - "Forbidden" = RBAC issue
## - Connection timeouts/refused = NetworkPolicy
Step 2: Gather Context (2 minutes)
## Check namespace-level security policies
kubectl get namespace your-namespace -o yaml | grep -A 10 labels
## Check for NetworkPolicies that might be blocking traffic
kubectl get networkpolicies -n your-namespace
## Verify RBAC permissions for the ServiceAccount
kubectl auth can-i --list --as=system:serviceaccount:your-namespace:your-sa -n your-namespace
Step 3: Apply the Fix (5-30 minutes)
Based on the violation type:
- PSS violations: Modify pod security context or request profile exemption (see the label command after this list)
- RBAC denials: Add specific permissions or use a different ServiceAccount
- NetworkPolicy blocks: Add ingress/egress rules or test without policies temporarily
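For the PSS case, one pragmatic interim move is to drop the namespace to a looser enforce level while keeping warn at restricted, so pods deploy again but you still see exactly what needs fixing - assuming you're allowed to edit namespace labels:
## Enforce baseline for now, keep warning about restricted violations
kubectl label --overwrite namespace your-namespace \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/warn=restricted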
Here's the reality nobody tells you: security violations cascade like dominoes. Fix the Pod Security Standard issue and boom - RBAC error. Fix that and congratulations, now NetworkPolicies are blocking everything. Each fix just reveals the next layer of broken.
These three security layers combine to create a perfect storm of Saturday night pages. Your pod might pass Pod Security Standards, fail due to some RBAC bullshit, and then when you finally get it running, NetworkPolicies kill all the traffic. It's like security whack-a-mole, except each mole costs you 2 hours of debugging.
Next section covers the actual debugging process - because when production's on fire at 3 AM, you need something that finds the root cause instead of making you guess for three fucking hours.