Kubernetes security policies are like having that overprotective parent who locked you in your room until you were 18. Since v1.25, Pod Security Standards have replaced Pod Security Policies and somehow made things worse - instead of opting into the pain by writing policies, you get namespace labels that block half your shit outright, and plenty of managed platforms and security teams flip them on for you. The Pod Security admission controller enforces these standards at admission time, so workloads that deployed fine last month get rejected before they even schedule.
Three Security Systems That Will Eat Your Entire Afternoon
1. Pod Security Standards (PSS) - The New Sheriff in Town
Pod Security Standards define three security profiles that determine what privileges your pods can have. Since Kubernetes v1.25, these have replaced Pod Security Policies completely. The PSS implementation guide covers the technical details, while the cluster-level PSS tutorial shows how to apply them across your entire cluster. AWS EKS PSS implementation provides cloud-specific guidance.
The Three Profiles That Matter:
- Privileged: Unrestricted policy, anything goes (dangerous but sometimes necessary)
- Baseline: Minimally restrictive, prevents known privilege escalations
- Restricted: Heavily restricted, follows pod hardening best practices
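Profiles are applied per namespace with labels on the Namespace object rather than cluster-wide policy objects. A minimal sketch - the label keys are the real Pod Security Admission labels, the namespace name is a placeholder:
## Enforce "restricted", and also warn/audit against it
apiVersion: v1
kind: Namespace
metadata:
  name: your-namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
The enforce label is what actually rejects pods; warn and audit only complain, which makes them handy for trialing a stricter profile before you flip enforcement.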
What actually gets blocked:
## This pod will be REJECTED under baseline/restricted profiles
apiVersion: v1
kind: Pod
metadata:
  name: privileged-pod
spec:
  containers:
  - name: app
    image: nginx
    securityContext:
      privileged: true                # BLOCKED by baseline and restricted: privileged containers
      runAsUser: 0                    # BLOCKED by restricted: running as root
      allowPrivilegeEscalation: true  # BLOCKED by restricted: can gain privileges
Common violation errors you'll see:
violates PodSecurity "baseline:latest": privileged container is not allowed
violates PodSecurity "restricted:latest": runAsNonRoot != true
violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false
Quick diagnostic command:
## Check what security profile is enforced in your namespace
kubectl get namespace your-namespace -o yaml | grep -A 5 pod-security
2. Role-Based Access Control (RBAC) - The Permission Police
RBAC controls who can do what in your cluster. When RBAC blocks you, it's usually because someone tried to implement "least privilege" without understanding what privileges you actually need. The RBAC good practices guide outlines proper implementation, while advanced RBAC best practices covers complex scenarios. Google Cloud's RBAC guide provides cloud-specific implementation details, and the Dynatrace RBAC security guide explains security implications.
What RBAC violations look like:
## This error means your ServiceAccount lacks permissions
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:default:my-app" cannot create resource "pods"
## This means the user/service lacks the specific verb permission
Error from server (Forbidden): secrets is forbidden: User "developer" cannot get resource "secrets" in API group ""
The debugging process that actually works:
## Check what a user/service account can actually do
kubectl auth can-i create pods --as=system:serviceaccount:default:my-app
kubectl auth can-i get secrets --as=system:serviceaccount:default:my-app
## See all permissions for a service account
kubectl auth can-i --list --as=system:serviceaccount:default:my-app
## Find which ClusterRole or Role grants specific permissions
kubectl get clusterrolebindings -o json | jq -r '.items[] | select(.subjects[]?.name=="my-app") | .metadata.name'
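Once you've confirmed which permission is missing, the quickest fix is usually an imperative grant. A sketch for the first Forbidden error above - the Role and RoleBinding names here are made up:
## Let the ServiceAccount from the error create pods, then verify
kubectl create role pod-creator --verb=create --resource=pods -n default
kubectl create rolebinding my-app-pod-creator --role=pod-creator \
  --serviceaccount=default:my-app -n default
kubectl auth can-i create pods --as=system:serviceaccount:default:my-app -n default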
3. Network Policies - The Communication Killers
Network Policies act like firewalls between pods. The catch: as soon as any NetworkPolicy selects a pod, that pod becomes "default deny" for whichever direction (ingress, egress, or both) the policy declares, and you must explicitly allow every connection it needs. The NetworkPolicy debugging guide covers troubleshooting techniques, while the advanced debugging series handles complex scenarios. AWS EKS network policy troubleshooting provides cloud-specific debugging, and the Plural security guide offers comprehensive policy management strategies. Tigera's networking debugging guide explains the underlying networking concepts.
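The canonical starting point is a blanket deny - one short manifest that selects every pod in the namespace and allows no inbound traffic at all (namespace name is illustrative):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: your-namespace
spec:
  podSelector: {}   # empty selector = every pod in the namespace
  policyTypes:
  - Ingress         # no ingress rules listed, so all inbound traffic is denied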
The silent killer scenario:
Your pods are running fine, but services return connection refused or timeouts. No obvious error messages in logs.
## Check if NetworkPolicies are blocking traffic
kubectl get networkpolicies --all-namespaces
kubectl describe networkpolicy your-policy -n your-namespace
## Test connectivity between pods
kubectl run debug-pod --image=busybox --rm -it -- sh
## Inside pod: telnet your-service 80
What breaks when NetworkPolicies are applied:
- Health check probes fail (kubelet can't reach your pods)
- Service discovery stops working (DNS queries blocked - see the DNS egress rule after this list)
- Inter-service communication fails (API calls timeout)
- Ingress traffic gets blocked (external requests can't reach pods)
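The DNS one bites almost everyone. If any policy restricts egress, you need an explicit allowance back to the cluster DNS pods - a sketch that assumes your DNS pods carry the standard k8s-app: kube-dns label (true for stock CoreDNS deployments):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: your-namespace
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector: {}   # DNS lives in kube-system, so match across namespaces
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53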
Real-World Security Violation Scenarios
Scenario 1: The Spring Boot Weekend Killer
What happened: Worked perfectly fine in dev, then production decided to shit the bed Monday morning with some PodSecurity violation about running as root. Turns out our security team "hardened" prod over the weekend without telling anyone. Thanks for the heads up, security team. Really appreciate finding out via production alerts instead of, you know, a fucking email.
Root cause: The Spring Boot container was running as root because whoever wrote the Dockerfile copied some tutorial and never bothered to add a proper user. The dev cluster namespaces were still on the privileged profile, so it worked there, but prod got bumped to restricted and suddenly everything broke.
The fix:
## In your Dockerfile, add this before ENTRYPOINT (Alpine/BusyBox syntax; use groupadd/useradd on Debian-based images)
RUN addgroup -g 1001 -S appgroup && adduser -u 1001 -S appuser -G appgroup
USER 1001
## Or in your pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: spring-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
    runAsGroup: 1001
    seccompProfile:
      type: RuntimeDefault   # the restricted profile also requires an explicit seccomp profile
  containers:
  - name: spring-app
    image: my-spring-app
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
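Before you ship it, you can ask the API server whether the pod clears the profile without creating anything - both commands below are stock kubectl behavior; the manifest filename is whatever you saved it as:
## Server-side dry run goes through admission, so PodSecurity rejections and warnings show up here
kubectl apply --dry-run=server -f spring-app-pod.yaml
## Preview which existing workloads in a namespace would violate before tightening it
kubectl label --dry-run=server --overwrite namespace your-namespace \
  pod-security.kubernetes.io/enforce=restricted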
Scenario 2: The NetworkPolicy Black Hole That Ate Production
What happened: Someone applied a NetworkPolicy at 2 PM and broke everything. Took us three hours to figure out it was network shit because nothing was talking to anything else. Monitoring still showed green because health checks were cached, but every single API call was timing out.
Root cause: The NetworkPolicy was written by someone who'd never actually run a service mesh in production. It blocked everything including kubelet health checks, DNS resolution, and service-to-service communication. Even the ingress controller couldn't reach the pods.
The debugging process:
## 1. Verify NetworkPolicy exists and is blocking traffic
kubectl get networkpolicy -n production
## 2. Check if pods have any ingress rules
kubectl describe networkpolicy default-deny -n production
## 3. Test connectivity from inside the cluster
kubectl run nettest -n production --image=nicolaka/netshoot --rm -it -- bash
## Inside container: check DNS and the service itself, not just external egress
nslookup your-service.production.svc.cluster.local
curl -v http://your-service.production.svc.cluster.local:8080
curl https://httpbin.org/status/200
The fix that actually worked:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-access
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Ingress   # deliberately no Egress here - declaring Egress with no egress rules would deny all outbound traffic, DNS included
  ingress:
  # Allow traffic from other pods in the same namespace
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: production   # automatic namespace label, no manual labeling needed
    ports:
    - protocol: TCP
      port: 8080
  # CRITICAL: Allow health checks from kubelet
  - from: []   # Allow from any source
    ports:
    - protocol: TCP
      port: 8080
Scenario 3: The RBAC Permission Maze (AKA Why Our Deploys Broke for a Week)
What happened: After a security audit, some expensive consultant convinced management to implement "least privilege everywhere." Within 24 hours, every single CI/CD pipeline started failing with secrets is forbidden: User "system:serviceaccount:ci-cd:pipeline-sa" cannot get resource "secrets" in API group "". Deployments stopped working and nobody could figure out why because the consultant fucked off without documenting any of the changes.
Root cause: The consultant removed the overly broad ClusterRoleBinding that every ServiceAccount was using, then tried to implement "proper" RBAC by giving each service only the permissions it "should" need. Problem is, they had no clue what permissions our actual workloads required - like how our CI pipeline needed kubectl patch deployment to update image tags, or kubectl get configmap/build-config to read build settings - so they just guessed based on some best practices blog post they found.
The diagnostic journey:
## Check what the service account can actually do
kubectl auth can-i get secrets --as=system:serviceaccount:ci-cd:pipeline-sa -n ci-cd
## Find existing permissions
kubectl get rolebindings -n ci-cd -o json | jq -r '.items[] | select(.subjects[]?.name=="pipeline-sa")'
## Check cluster-wide permissions
kubectl get clusterrolebindings -o json | jq -r '.items[] | select(.subjects[]?.name=="pipeline-sa")'
The solution:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: ci-cd
  name: pipeline-secrets-reader
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get"]   # "list" is dropped: resourceNames only restricts requests that target a named object
  # Limit to specific secrets instead of all
  resourceNames: ["db-credentials", "api-keys", "registry-secret"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pipeline-secrets-binding
  namespace: ci-cd
subjects:
- kind: ServiceAccount
  name: pipeline-sa
  namespace: ci-cd
roleRef:
  kind: Role
  name: pipeline-secrets-reader
  apiGroup: rbac.authorization.k8s.io
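That covers the secrets error, but the root cause above also mentioned patching Deployments and reading the build ConfigMap. A companion Role along the same lines - the resource names come from the scenario, so treat them as illustrative:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: ci-cd
  name: pipeline-deployer
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "patch"]   # enough to update image tags
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get"]
  resourceNames: ["build-config"]
Bind it to pipeline-sa with another RoleBinding, exactly like the secrets one above.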
The Security Policy Violation Troubleshooting Workflow
Step 1: Identify the Policy Type (30 seconds)
## Check pod events for security violations
kubectl describe pod your-pod | grep -A 10 Events
## Look for these violation types:
## - "violates PodSecurity" = Pod Security Standards
## - "Forbidden" = RBAC issue
## - Connection timeouts/refused = NetworkPolicy
Step 2: Gather Context (2 minutes)
## Check namespace-level security policies
kubectl get namespace your-namespace -o yaml | grep -A 10 labels
## Check for NetworkPolicies that might be blocking traffic
kubectl get networkpolicies -n your-namespace
## Verify RBAC permissions for the ServiceAccount
kubectl auth can-i --list --as=system:serviceaccount:your-namespace:your-sa -n your-namespace
Step 3: Apply the Fix (5-30 minutes)
Based on the violation type:
- PSS violations: Modify pod security context or request profile exemption (see the label command after this list)
- RBAC denials: Add specific permissions or use a different ServiceAccount
- NetworkPolicy blocks: Add ingress/egress rules or test without policies temporarily
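For the PSS case, one pragmatic interim move is to drop the namespace to a looser enforce level while keeping warn at restricted, so pods deploy again but you still see exactly what needs fixing - assuming you're allowed to edit namespace labels:
## Enforce baseline for now, keep warning about restricted violations
kubectl label --overwrite namespace your-namespace \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/warn=restricted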
Here's the reality nobody tells you: security violations cascade like dominoes. Fix the Pod Security Standard issue and boom - RBAC error. Fix that and congratulations, now NetworkPolicies are blocking everything. Each fix just reveals the next layer of broken.
These three security layers combine to create a perfect storm of Saturday night pages. Your pod might pass Pod Security Standards, fail due to some RBAC bullshit, and then when you finally get it running, NetworkPolicies kill all the traffic. It's like security whack-a-mole, except each mole costs you 2 hours of debugging.
Next section covers the actual debugging process - because when production's on fire at 3 AM, you need something that finds the root cause instead of making you guess for three fucking hours.