Essential GKE Security Configuration That Actually Works

Here's what you need to configure to stop your cluster from becoming someone else's crypto mining rig. Skip this stuff and you'll be explaining to your boss why this morning's production standup got derailed by weird CPU spikes and a surprise compute bill.

Workload Identity: Stop Putting Secrets in Your Containers

[Diagram: Kubernetes security architecture]

Service account JSON keys are how most people fuck up K8s security. Someone always commits them to Git, stores them in ConfigMaps, or leaves them in container images. Had our staging cluster compromised because someone left keys in a public Docker image - honestly still not sure exactly how they found it. Could've been automated scanning, could've been dumb luck, could've been some asshole manually browsing Docker Hub. Took us like 3 days to even figure out that's how they got in. Maybe 4 days. Felt like a week.

Why This Matters (A Lot)

Service account keys don't rotate and they don't expire. Once they leak - and they will leak - attackers have access to your Google Cloud resources until you manually revoke them. Had to learn this the hard way when our service account key ended up in a Slack thread during debugging. That was a fun weekend.

Workload Identity lets pods authenticate without storing any credentials. The tokens expire automatically and rotate themselves, which is way better than hoping nobody commits secrets to Git.

Google finally started pushing Workload Identity harder after enough people got burned by service account key leaks. Took them long enough to admit it was a problem.

Setup That Actually Works

1. Enable Workload Identity (This Will Break Things First)

For existing clusters (expect 5-10 minutes of downtime):

gcloud container clusters update production-cluster \
    --location=us-central1 \
    --workload-pool=PROJECT_ID.svc.id.goog
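
Heads up: updating the cluster only flips the control plane. Existing node pools keep serving the old metadata endpoints until you move them to the GKE metadata server - and that's the part that actually recreates nodes. A sketch, assuming your pool is called default-pool:

gcloud container node-pools update default-pool \
    --cluster=production-cluster \
    --location=us-central1 \
    --workload-metadata=GKE_METADATA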

Warning: the node-pool migration restarts every node. Do it during your maintenance window or your pods get killed mid-request and your users start filing angry tickets. Our 20-node cluster took forever - a couple of nodes just got stuck with UpgradeInProgress status and never finished. Had to manually delete them with gcloud compute instances delete node-xyz --zone=us-central1-a. Probably took like 90 minutes total instead of the promised 15. Maybe longer, wasn't exactly timing it while I was panicking and getting Slack messages about the API being down.

For new clusters (much less painful):

gcloud container clusters create secure-cluster \
    --location=us-central1 \
    --workload-pool=PROJECT_ID.svc.id.goog \
    --enable-shielded-nodes

2. Connect the Accounts (Get This Wrong and Nothing Works)

This is where most people fuck up. The binding syntax is picky and if you get it wrong, your pods just hang forever trying to authenticate:

## Create Google Cloud IAM service account
gcloud iam service-accounts create gke-workload-sa \
    --display-name=\"GKE Workload Service Account\"

## Create Kubernetes service account in the right namespace
kubectl create serviceaccount webapp-ksa --namespace=production

## Bind them together (this is the magic sauce)
gcloud iam service-accounts add-iam-policy-binding \
    --role roles/iam.workloadIdentityUser \
    --member \"serviceAccount:PROJECT_ID.svc.id.goog[production/webapp-ksa]\" \
    gke-workload-sa@PROJECT_ID.iam.gserviceaccount.com

## Add the annotation (miss this and you get mystery failures)
kubectl annotate serviceaccount webapp-ksa \
    --namespace production \
    iam.gke.io/gcp-service-account=gke-workload-sa@PROJECT_ID.iam.gserviceaccount.com
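
None of this does anything until your pods actually run as that Kubernetes service account - the step everyone forgets. Set serviceAccountName in the pod spec; a minimal sketch (pod name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: webapp
  namespace: production
spec:
  ## Without this line the pod uses the default KSA and gets no Workload Identity
  serviceAccountName: webapp-ksa
  containers:
  - name: app
    image: gcr.io/PROJECT_ID/webapp:latest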

Common gotcha: That PROJECT_ID.svc.id.goog[namespace/service-account] syntax is picky as hell. Mistyped the namespace once (prodcution instead of production - fucking autocorrect) and spent half a day figuring out why pods just hung at startup with gke-metadata-server: PERMISSION_DENIED: Unable to authenticate to Google Cloud. kubectl logs showed nothing useful - had to dig into the audit logs with gcloud logging read to see the actual IAM_PERMISSION_DENIED errors. Pretty sure it was the syntax, but honestly could've been three different things wrong at once. Cost us like 4 hours of downtime while I debugged it.
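
Before wiring up real workloads, sanity-check the binding with a throwaway pod. A sketch - any image with gcloud in it works:

kubectl run wi-test --rm -it --restart=Never \
    --namespace=production \
    --image=google/cloud-sdk:slim \
    --overrides='{"spec":{"serviceAccountName":"webapp-ksa"}}' \
    -- gcloud auth list

If the active account comes back as gke-workload-sa@PROJECT_ID.iam.gserviceaccount.com, the binding works. If you see the Compute Engine default service account instead, your annotation or the node pool's metadata mode is wrong.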

3. Grant Minimal Required Permissions

Don't give everything Editor permissions like I did the first time:

## Grant specific Cloud Storage access
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:gke-workload-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/storage.objectViewer"

## Grant specific BigQuery access for data processing
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:gke-workload-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/bigquery.jobUser"

Private Clusters: The Nuclear Option That Actually Works

Why Your Nodes Need to Be Antisocial

Private clusters are non-negotiable for production. Public nodes are like leaving your front door open with a sign that says "free Bitcoin miners inside." Every time I've seen a cluster get owned, it started with attackers SSH'ing into public nodes.

gcloud container clusters create secure-private-cluster \
    --location=us-central1 \
    --enable-private-nodes \
    --master-ipv4-cidr=10.100.0.0/28 \
    --enable-ip-alias \
    --enable-shielded-nodes \
    --enable-autorepair \
    --enable-autoupgrade \
    --workload-pool=PROJECT_ID.svc.id.goog

Reality check: This breaks everything at first. Your CI/CD can't reach the cluster, kubectl fails from your laptop, and everyone blames you for "making everything complicated." That's exactly the point though.

Spent a weekend figuring out the networking. CI/CD needs authorized networks configured or it can't deploy anything - just hangs with Unable to connect to the server: dial tcp: connect: connection timed out. kubectl commands just time out until you set up a VPN or allowlist your office and runner IP ranges with --master-authorized-networks. Our GitLab runners couldn't reach the cluster for like 3 days while we sorted out firewall rules. Might've been longer - definitely felt like 3 weeks. DevOps team was not happy with me.
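
The allowlist itself is one update command - swap the example CIDRs for your actual office and CI egress ranges:

gcloud container clusters update secure-private-cluster \
    --location=us-central1 \
    --enable-master-authorized-networks \
    --master-authorized-networks=203.0.113.0/24,198.51.100.5/32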

Pro tip: Enable Private Google Access before you deploy anything, or your pods can't pull images from GCR. Took me 2 hours to figure out why every pod was stuck in ImagePullBackOff with Failed to pull image "gcr.io/myproject/app:latest": rpc error: code = Unknown desc = Error response from daemon: Get https://gcr.io/v2/: net/http: request canceled while waiting for connection. Obvious in hindsight, but not when you're staring at failing deployments.
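
Private Google Access is a subnet-level switch, not a cluster flag - run it against whichever subnet your nodes sit in (subnet name is a placeholder):

gcloud compute networks subnets update gke-nodes-subnet \
    --region=us-central1 \
    --enable-private-ip-google-access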

Shielded GKE Nodes

Shielded GKE Nodes protect against rootkits and bootkits by verifying the integrity of the boot sequence. Note that --enable-shielded-nodes is a cluster-level flag; on individual node pools you toggle secure boot and integrity monitoring:

gcloud container node-pools create shielded-pool \
    --cluster=production-cluster \
    --location=us-central1 \
    --shielded-secure-boot \
    --shielded-integrity-monitoring

Network Policies: Expect to Break Everything

Network policies are mandatory but will break your cluster until you get them right. K8s defaults to "everything can talk to everything," which is terrible for security but great for getting stuff working quickly.
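
One catch before any of the YAML below: on Standard clusters, NetworkPolicy objects are silently ignored unless enforcement is actually on - either legacy Calico via --enable-network-policy, or (better) Dataplane V2 at creation time:

gcloud container clusters create np-cluster \
    --location=us-central1 \
    --enable-dataplane-v2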

The Nuclear Option (Default Deny Everything):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Apply this and watch everything break. API can't reach the database, frontend can't call backend, monitoring dies. That's expected - now you add back only what you actually need.

First time I did this, our entire monitoring died. Prometheus couldn't scrape anything, Grafana showed flat lines, and alertmanager went silent. Took me way too long to realize the monitoring namespace was blocked by the default deny policy - kept getting context deadline exceeded errors in the Prometheus logs. Had to manually allow all the Prometheus service discovery traffic with kubectl apply -f monitoring-network-policy.yaml. We thought we fixed it twice before we actually got it working right. Spent like 6 hours troubleshooting before I realized the DNS policy was also fucked.
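
The DNS failure mode deserves its own callout: default-deny egress also blocks lookups to kube-dns, so everything fails name resolution before your real policies even get exercised. Carve DNS out explicitly - a sketch that relies on the kubernetes.io/metadata.name namespace label, which any recent Kubernetes version sets automatically:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53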

Allow Specific Service Communication:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080

Container Runtime Security

GKE Sandbox with gVisor

GKE Sandbox gives you kernel-level isolation using Google's gVisor. Useful for multi-tenant stuff or when you're running code you don't completely trust (like that sketchy third-party service).

gcloud container node-pools create sandbox-pool \
    --cluster=production-cluster \
    --location=us-central1 \
    --sandbox type=gvisor \
    --machine-type=n1-standard-2

Deploy workloads to the sandbox pool using node selectors:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: untrusted-app
spec:
  selector:
    matchLabels:
      app: untrusted-app
  template:
    metadata:
      labels:
        app: untrusted-app
    spec:
      runtimeClassName: gvisor
      nodeSelector:
        cloud.google.com/gke-sandbox: "true"
      containers:
      - name: app
        ## placeholder image - swap in your own
        image: gcr.io/PROJECT_ID/untrusted-app:latest

Pod Security Standards

Pod Security Standards stop containers from doing stupid shit like running as root or mounting the host filesystem:

apiVersion: v1
kind: Namespace
metadata:
  name: restricted-namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Secrets Management

Cloud KMS: Because Basic Encryption Isn't Paranoid Enough

If your compliance team is paranoid about encryption (and they should be), KMS integration is pretty straightforward:

gcloud container clusters update production-cluster \
    --location=us-central1 \
    --database-encryption-key=projects/PROJECT_ID/locations/us-central1/keyRings/gke-ring/cryptoKeys/gke-key

External Secrets Operator

If you've got secrets stored in Google Secret Manager and want to sync them into K8s without manually copying shit around, External Secrets Operator can handle the sync:

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: gcpsm-secret-store
spec:
  provider:
    gcpsm:
      projectId: \"PROJECT_ID\"
      auth:
        workloadIdentity:
          clusterLocation: us-central1
          clusterName: production-cluster
          serviceAccountRef:
            name: external-secrets-sa

This pulls secrets from Google Secret Manager into K8s secrets automatically. Beats manually copying database passwords around, plus it rotates when the source changes.
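
The SecretStore is only half the setup - an ExternalSecret picks which Secret Manager entries land in which Kubernetes Secret. A sketch (the secret names are hypothetical):

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: gcpsm-secret-store
    kind: SecretStore
  target:
    name: db-credentials
  data:
  - secretKey: password
    remoteRef:
      key: prod-db-password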

GKE Security Features Comparison Matrix

| Security Feature | Autopilot | Standard | Use Case | Implementation Complexity |
|------------------|-----------|----------|----------|---------------------------|
| Workload Identity Federation | ✅ Always enabled | ⚙️ Manual configuration required | Secure Google Cloud API access | Low |
| Binary Authorization | ⚙️ Optional configuration | ⚙️ Optional configuration | Container image verification | Medium |
| Shielded GKE Nodes | ✅ Enabled by default | ⚙️ Optional during node pool creation | Boot integrity protection | Low |
| Private Clusters | ✅ Nodes always private | ⚙️ Optional configuration | Network isolation | Medium |
| Network Policies | ✅ Supported | ✅ Supported | Pod-to-pod communication control | High |
| GKE Sandbox (gVisor) | ❌ Not available | ✅ Available | Workload isolation | Medium |
| Pod Security Standards | ✅ Enforced by default | ⚙️ Manual configuration | Pod security policies | Low |
| Cluster Encryption at Rest | ✅ Always enabled | ⚙️ Optional with CMEK | Data protection | Medium |
| Audit Logging | ✅ Enabled by default | ⚙️ Manual configuration | Compliance monitoring | Low |
| VPC-Native Networking | ✅ Always enabled | ⚙️ Optional | Advanced networking security | Medium |
| Container Image Scanning | ✅ Automatic | ✅ Automatic | Vulnerability detection | Low |
| Resource Quotas | ✅ Automatic right-sizing | ⚙️ Manual configuration | Resource limits enforcement | Medium |

GKE Security FAQ: Common Questions and Answers

Q: How quickly do attackers target new GKE clusters?

A: Pretty fast - usually hours, sometimes less. Bots are constantly scanning for new clusters. Last time I spun up a test cluster, saw connection attempts in our logs within like an hour. Could've been sooner - I wasn't watching it immediately.

The most common shit they try immediately:

  • Port scanning for exposed API servers (6443 and 8080 mostly)
  • Default service account token abuse
  • Looking for publicly accessible NodePort services
  • Brute forcing common RBAC misconfigurations

Protection: Configure security during cluster creation, not after you see weird traffic in your logs. By then it might be too late.

Q: What's the difference between Workload Identity and service account keys?

A: Service Account Keys are those JSON blobs with your GCP credentials that everyone stores in Kubernetes secrets like morons. If these keys get compromised (and they will), attackers have full access to your entire Google Cloud project.

Workload Identity Federation gets rid of storing any credentials at all. Your pods just magically authenticate using temporary tokens that Google handles behind the scenes. No more JSON files, no more wondering who committed secrets to Git.

Why this matters: Service account keys don't expire and don't rotate. Once they leak (and they will), you're screwed until you manually revoke them. I've seen keys in Docker images, Git repos, Slack messages. Workload Identity tokens expire automatically.

Q: Should I use Autopilot or Standard for security-sensitive workloads?

A: Autopilot does all the security shit for you automatically - private nodes, Workload Identity, security policies, the works. You can't break it even if you try.

Standard mode lets you shoot yourself in the foot with infinite customization options. Sure, you can theoretically make it more secure than Autopilot, but most people just make it less secure by accident.

Recommendation: Start with Autopilot unless you specifically need:

  • Custom CNI plugins for advanced networking
  • Windows containers
  • GPU workloads with custom drivers
  • Direct node access for debugging (which you shouldn't do in production anyway)

Security teams love Autopilot because they can't get blamed when developers fuck up the configuration - Google handles it all.

Q: How do I implement zero-trust networking in GKE?

A: Zero-trust means "trust nothing, verify everything" - basically assume every network connection is hostile until proven otherwise. Pain in the ass to implement but worth it:

1. Default Deny Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

2. Explicit Allow Rules
Only permit required communications between specific services.

3. Service Mesh with mTLS
Istio service mesh encrypts all service-to-service communication and provides fine-grained access controls.

4. Workload Identity for External Services
Use Workload Identity Federation instead of API keys for Google Cloud service access.

Implementation time: Takes 2-4 weeks for complete zero-trust setup, depending on how complex your microservices clusterfuck is. Could be way longer if you hit weird networking issues with service mesh - learned that the hard way when Istio decided our ingress was "unhealthy" for no apparent reason.

Q: What are the most critical security misconfigurations in GKE?

A: From what I've seen in production clusters, here are the common fuckups:

  1. Default service account with Editor permissions - gives every pod God mode over your entire Google Cloud project
  2. No network policies - every pod can talk to every other pod and the internet
  3. Public clusters - nodes have public IPs and can download crypto miners directly
  4. Privileged containers everywhere - because someone needed root access once and never removed it
  5. No image verification - deploying random Docker Hub images without checking what's in them

The easy fix: Use Autopilot mode. It fixes most of this automatically, though you'll lose some flexibility.
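
For reference, an Autopilot cluster is one command, and most of the hardening in this post comes baked in:

gcloud container clusters create-auto secure-autopilot \
    --region=us-central1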

Q: How do I secure container images and prevent supply chain attacks?

A: Supply chain attacks are getting nastier. Attackers compromise base images, inject code into build pipelines, or upload malware with friendly names to Docker Hub. Here's how to avoid getting owned:

1. Binary Authorization
Requires cryptographic signatures on all container images before deployment:

gcloud container binauthz policy import policy.yaml

2. Container Image Scanning
Automatically scans images for vulnerabilities and malware. Critical and high-severity vulnerabilities should block deployment.

3. Distroless Images
Use Google's distroless images that contain only your application and runtime dependencies, reducing the attack surface by 60-80%.

4. Admission Controllers
Implement OPA Gatekeeper or similar tools to enforce image policies at runtime.

Q: Can GKE integrate with existing enterprise security tools?

A: Yes, GKE provides extensive integration capabilities:

SIEM Integration

  • Cloud Audit Logs stream to Splunk, QRadar, or Chronicle via Cloud Logging sinks and Pub/Sub
  • Security Command Center findings export to SIEMs through Pub/Sub notifications

Vulnerability Management

  • Container Analysis API integrates with tools like Twistlock, Aqua Security, and Snyk
  • Custom vulnerability scanning via admission webhooks

Identity Integration

  • Google Cloud Directory Sync for Active Directory integration
  • SAML/OIDC federation for enterprise identity providers
  • Certificate-based authentication for service accounts

Network Security

  • Third-party firewall integration via VPC routing
  • DLP (Data Loss Prevention) scanning of traffic

Most enterprises get integration working within 2-3 weeks using APIs and webhooks, assuming security team requirements don't change mid-project.

Q: What's the cost impact of implementing comprehensive GKE security?

A: Security features have varying cost implications:

Free Security Features:

  • Workload Identity Federation
  • Network policies
  • Private clusters
  • Shielded nodes
  • Container image scanning
  • Basic audit logging

Paid Security Features:

  • Binary Authorization: $0.50 per 1,000 attestations
  • GKE Sandbox (gVisor): 10-20% compute overhead
  • Service mesh: 5-15% performance impact + additional resource usage
  • Advanced audit logging: Storage costs for retained logs
  • Cloud KMS for BYOK: $1 per key version per month

Total Cost: Comprehensive security typically adds 15-25% to base GKE costs but prevents expensive security incidents.

Q: How do I migrate from service account keys to Workload Identity?

A: Migration requires careful planning to avoid service disruptions:

Phase 1: Preparation (Week 1)

  1. Audit existing service account key usage
  2. Enable Workload Identity on clusters
  3. Create IAM service accounts with minimal permissions

Phase 2: Parallel Deployment (Weeks 2-3)

  1. Deploy applications with both methods enabled
  2. Test Workload Identity functionality
  3. Monitor for any access issues

Phase 3: Key Removal (Week 4)

  1. Remove service account keys from applications
  2. Delete unused Kubernetes secrets
  3. Audit access patterns to confirm successful migration

Timeline: 3-4 weeks for large deployments if everything goes smooth, which it won't. Smaller applications can migrate in 1-2 weeks unless you discover some legacy service that breaks spectacularly without its hardcoded credentials.
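
For the Phase 1 audit, start by listing which user-managed keys exist and how old they are - anything with a creation date measured in years is a prime suspect:

gcloud iam service-accounts keys list \
    --iam-account=gke-workload-sa@PROJECT_ID.iam.gserviceaccount.com \
    --managed-by=user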

Q: What monitoring should I implement for GKE security?

A: Essential security monitoring includes:

Control Plane Monitoring:

  • API server access patterns and unauthorized requests
  • RBAC permission changes
  • Certificate rotation events

Workload Monitoring:

  • Container behavior analysis for anomalies
  • Network traffic patterns between services
  • Resource usage spikes that might indicate cryptomining

Infrastructure Monitoring:

  • Node integrity validation (Shielded Nodes)
  • Persistent volume access patterns
  • Service mesh traffic encryption status

Integration Tools:

  • Falco for runtime security monitoring
  • Prometheus with security-focused alerts
  • Google Cloud Security Command Center for centralized visibility

Alert Fatigue Prevention: Start with the shit that actually matters (privilege escalation, crypto miners, external network access from restricted pods) before adding alerts for every little thing. Otherwise you'll just ignore everything when the alerts get too noisy.

Advanced GKE Security: The Stuff That Actually Stops Attacks

Basic security is table stakes. If you want to stop determined attackers (not just script kiddies), you need the advanced features most people skip because they look complicated. Here's how to implement security that works when someone really wants to mess with your cluster.

Binary Authorization: Your Container Bouncer

[Diagram: Binary Authorization workflow]

Supply chain attacks are getting nastier. Attackers compromise popular base images, inject code into build pipelines, or upload malware to Docker Hub with friendly names like helpful-nginx or secure-redis. Binary Authorization is your bouncer - it checks IDs before letting anything into your cluster.

Had some weird CPU spikes that took forever to track down - nodes hitting 90%+ usage with no obvious cause. Turned out to be sketchy base image from some random registry called secure-alpine-base that looked legit. Still not sure how long it was actually mining - could've been days, could've been weeks, could've been months. Maybe longer. We thought we had it figured out twice before we found the actual source with kubectl top nodes and docker exec into the containers. Binary Authorization would've told that unsigned image to fuck off since it wasn't from our trusted pipeline.

How To Actually Set This Shit Up

Binary Authorization is basically "no unsigned images allowed." You sign images during builds to prove they came from your pipeline instead of some random Docker Hub account.

1. Create Your "No Entry Without ID" Policy

Tell Binary Authorization what you'll actually accept:

admissionWhitelistPatterns:
- namePattern: gcr.io/PROJECT_ID/trusted-base-images/*
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
  requireAttestationsBy:
  - projects/PROJECT_ID/attestors/prod-attestor
clusterAdmissionRules:
  us-central1.production-cluster:
    evaluationMode: REQUIRE_ATTESTATION
    enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
    requireAttestationsBy:
    - projects/PROJECT_ID/attestors/prod-attestor

2. Wire It Into Your Build Pipeline (The Annoying Part)

Your CI/CD needs to sign every image or it gets blocked at deployment with Binary Authorization policy rejected image. Took me way too many tries to get the attestation commands right - kept getting INVALID_ARGUMENT: Failed to parse attestation errors because I had the keyversion format wrong:

## Create attestor for production deployments
gcloud container binauthz attestors create prod-attestor \
    --attestation-authority-note=projects/PROJECT_ID/notes/prod-note \
    --description=\"Production deployment attestor\"

## Create attestation during build
gcloud container binauthz attestations sign-and-create \
    --attestor=prod-attestor \
    --artifact-url=gcr.io/PROJECT_ID/app:latest \
    --keyversion=projects/PROJECT_ID/locations/global/keyRings/binauthz/cryptoKeys/attestor-key/cryptoKeyVersions/1

Blocking Images With Known Vulnerabilities

Binary Authorization can check with Container Analysis before letting images run. Handy when someone tries to deploy that nginx image that's been sitting in staging for 6 months with 47 CVEs:

## Configure vulnerability scanning policy
gcloud container binauthz policy import /dev/stdin <<EOF
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
  requireAttestationsBy:
  - projects/PROJECT_ID/attestors/vulnerability-attestor
admissionWhitelistPatterns: []
clusterAdmissionRules: {}
EOF

Images get scanned automatically and critical/high-severity vulnerabilities block deployment with Binary Authorization vulnerability policy check failed unless you override with an attestation. Don't override unless you're really sure - I've seen teams override "just this once" for some urgent hotfix and then forget about it for months. We did that ourselves once with a Redis image that had like 20 CVEs. Oops.

Catching Weird Shit While It's Running

GKE Audit Logs: Your Security Camera

Audit logging tracks everything happening in your cluster. When someone inevitably does something stupid (or malicious), you'll at least know who and when:

## audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
## Log sensitive resource changes at Request level
- level: Request
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
  - group: "rbac.authorization.k8s.io"
    resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
## Log all authentication and authorization decisions
- level: Metadata
  namespaces: ["kube-system", "kube-public"]
## Log pod exec and portforward at Metadata level
- level: Metadata
  resources:
  - group: ""
    resources: ["pods/exec", "pods/portforward", "pods/proxy"]

One catch: GKE manages the API server's audit policy for you - you can't hand it a custom audit-policy.yaml the way you can on self-managed clusters, so treat the policy above as a reference for what good coverage looks like. What you do control is which log streams get shipped to Cloud Logging:

gcloud container clusters create secure-cluster \
    --location=us-central1 \
    --logging=SYSTEM,WORKLOAD,API_SERVER

Falco: Your Runtime Paranoia Engine

Falco watches everything and yells when shit doesn't look right. Saved our ass when we had that crypto mining incident - though to be fair, it took us like 3 days to notice the Cryptocurrency mining detected alerts in Slack. We had alert fatigue and ignored them at first thinking they were false positives:

## falco-rules.yaml
- rule: Detect Cryptocurrency Mining
  desc: Detect cryptocurrency mining activities in containers
  condition: >
    spawned_process and (
      proc.name in (xmrig, cpuminer, ccminer) or
      proc.cmdline contains "stratum+tcp" or
      proc.cmdline contains "mining.pool"
    )
  output: "Cryptocurrency mining detected (user=%user.name command=%proc.cmdline)"
  priority: CRITICAL

- rule: Unexpected Network Connection
  desc: Detect unexpected external network connections
  condition: >
    outbound and (fd.typechar = 4 or fd.typechar = 6) and not fd.is_unix and
    not proc.name in (node, npm, apt, wget, curl) and
    not fd.rip in ("127.0.0.1", "::1")
  output: "Unexpected network connection (connection=%fd.name)"
  priority: WARNING
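
If Falco isn't running yet, the Helm chart is the least painful route in. A minimal install sketch - wiring your custom rules file into the chart's values is left to you:

helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm install falco falcosecurity/falco \
    --namespace falco --create-namespace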

Pod Security Standards: No Root For You

Pod Security Standards basically tell containers "no, you can't run as root and no, you can't mount the entire filesystem." About time someone made this the default:

## security-context-constraint.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
  namespace: production
spec:
  selector:
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: app
        ## placeholder image - swap in your own
        image: gcr.io/PROJECT_ID/secure-app:latest
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
        resources:
          limits:
            memory: "512Mi"
            cpu: "500m"
          requests:
            memory: "256Mi"
            cpu: "250m"

Service Mesh: When You Need Everything Encrypted

If you've got compliance people breathing down your neck about encrypting everything "in flight," Anthos Service Mesh handles the mTLS circus automatically. Pain in the ass to set up (took us like 2 weeks to get working right with all the certificate bullshit) but then it just works:

## istio-security-policy.yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: frontend-auth
  namespace: production
spec:
  selector:
    matchLabels:
      app: frontend
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/production/sa/api-gateway"]
  - to:
    - operation:
        methods: ["GET", "POST"]

Enable Anthos Service Mesh via fleet membership - the old --addons=Istio flag is gone (the GKE Istio add-on was deprecated and removed):

gcloud container clusters create mesh-cluster \
    --location=us-central1 \
    --enable-ip-alias \
    --workload-pool=PROJECT_ID.svc.id.goog

gcloud container fleet mesh enable

gcloud container fleet memberships register mesh-cluster \
    --gke-cluster=us-central1/mesh-cluster \
    --enable-workload-identity

gcloud container fleet mesh update \
    --management automatic \
    --memberships mesh-cluster

Compliance Checkbox Theater

SOC 2 and ISO 27001: Making Auditors Happy

If your auditors care about compliance certifications, GKE checks most of their boxes. Here's the minimum viable bureaucracy:

Data Encryption (because auditors love acronyms):

  • Envelope encryption with Cloud KMS (they'll ask for this)
  • Customer-managed keys if you're paranoid
  • TLS 1.2+ because apparently 1.1 is for peasants

gcloud container clusters update production-cluster \
    --location=us-central1 \
    --database-encryption-key=projects/PROJECT_ID/locations/us-central1/keyRings/gke-ring/cryptoKeys/etcd-key

Access Control Paper Trail (for when things go wrong):

  • RBAC policies that don't give everyone admin
  • Logs showing who did what when
  • Documentation explaining why Bob from marketing can't kubectl into prod

Change Management (cover your ass):

  • GitOps workflows so changes are tracked
  • Approval gates to slow down the cowboys
  • Attestations proving you didn't just kubectl apply random yaml

GDPR: European Data Paranoia

For European organizations or anyone handling EU citizen data (which is basically everyone now, thanks GDPR):

## Create cluster in EU region with data residency controls
gcloud container clusters create gdpr-cluster \
    --location=europe-west1 \
    --enable-private-nodes \
    --enable-ip-alias \
    --workload-pool=PROJECT_ID.svc.id.goog \
    --resource-usage-bigquery-dataset=gke_usage_eu

Configure data processing consent management:

## data-processing-consent.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: gdpr-config
  namespace: production
data:
  data_residency: "EU"
  retention_period: "90d"
  consent_required: "true"
  data_processor: "organization-name"

When You Get Owned Anyway

Security Event Correlation (Post-Mortem Prep)

Set up Cloud Security Command Center so you can figure out what went wrong after the incident:

gcloud security-center sources create \
    --organization=ORGANIZATION_ID \
    --display-name=\"GKE Security Events\" \
    --description=\"Security events from GKE clusters\"

Forensic Data Collection (CSI: Kubernetes)

When shit hits the fan, you'll need evidence. This job grabs everything useful for the post-incident "who fucked up" investigation:

## forensic-collector.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: forensic-collection
spec:
  template:
    spec:
      serviceAccountName: forensic-collector
      containers:
      - name: collector
        image: gcr.io/PROJECT_ID/forensic-tools:latest
        command:
        - /bin/sh
        - -c
        - |
          kubectl get events --all-namespaces --sort-by=.lastTimestamp > /tmp/events.log
          kubectl cluster-info dump --all-namespaces --output-directory=/tmp/cluster-dump
          kubectl get pods --all-namespaces -o yaml > /tmp/pod-manifests.yaml
      restartPolicy: Never
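
One more for the runbook: isolate before you collect. Cordon the node so nothing new schedules onto it, and cut the suspect pod off the network with a quarantine label instead of deleting it - kubectl delete destroys exactly the evidence you need. A sketch (node name is made up; assumes you keep a deny-all NetworkPolicy that matches the quarantine label):

## Stop new pods landing on the suspect node
kubectl cordon gke-production-cluster-pool-1-abc123
## Kill the pod's network but keep it alive for forensics
kubectl label pod suspicious-pod --namespace production quarantine=true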

That's the advanced security stuff that might actually stop someone who knows what they're doing. Most of it's a pain in the ass to set up properly, but way better than being the one in the conference room explaining to executives why your K8s cluster became a bitcoin farm and why the GCP bill is suddenly $50K higher this month. No guarantees though - determined attackers are getting scary good at this shit.
