RHACS Kubernetes Security Incident Response - AI-Optimized Reference
CRITICAL WARNINGS
UI Performance Limitations
- BREAKING POINT: Web interface times out with excessive violations
- IMPACT: Investigation blocked during critical incidents
- WORKAROUND: Use the roxctl CLI instead of the browser interface
- MEMORY REQUIREMENT: Network graph crashes without sufficient memory
- IMAGE SCANNING LIMIT: Scanner chokes on images larger than a few GB
Sensor Connectivity Failures
- SILENT FAILURE MODE: Sensors fail without notification when Central goes down
- DATA LOSS RISK: Pods show "Running" but stop collecting data
- STORAGE OVERFLOW: Sensor storage fills up without Central connectivity, causing pod restarts and data loss
- NO AUTOMATIC FAILOVER: Manual intervention required during incidents
Policy Configuration Reality
- DEFAULT POLICY PROBLEMS:
  - Init containers flagged as "Privilege Escalation"
  - Istio sidecars trigger "Unauthorized Network Flow" constantly
  - CI pipelines running apt update flagged as "Suspicious Process Execution"
- LEARNING PERIOD: Baseline learning takes 2-3 weeks minimum
- CDN FALSE POSITIVES: Cloudflare and AWS traffic flagged for 3-4 weeks until system learns
CONFIGURATION
Emergency CLI Commands
# When UI is failing
roxctl central violations list --severity=CRITICAL --limit=50
# Real threat detection - mining processes
roxctl central deployments get-processes --deployment=suspicious-app
# Look for: xmrig, cpuminer, stratum connections
# See all processes (UI truncates)
roxctl central deployments get-processes --deployment=compromised-app --limit=0
# Export violations before container restart
roxctl central violations list --deployment=compromised-app --output=json > violations-$(date +%Y%m%d-%H%M).json
# Keep auth token backed up (expires daily)
roxctl auth export --output=auth-token.json
# Direct API when roxctl breaks (-k skips TLS verification; the RHACS REST API lists violations as alerts under /v1/alerts)
curl -k -H "Authorization: Bearer $RHACS_TOKEN" "$RHACS_CENTRAL_URL/v1/alerts"
Quick Containment (Network Isolation)
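# Emergency quarantine - labels the pods; traffic is blocked only if a deny-all NetworkPolicy selecting quarantine=true exists in the namespace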
kubectl patch deployment compromised-app -p '{"spec":{"template":{"metadata":{"labels":{"quarantine":"true"}}}}}'
WARNING: Breaks health checks and load balancers immediately. Downstream services will fail.
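The label patch only tags the pods. Blocking traffic depends on a NetworkPolicy that selects that label, and patching the pod template also triggers a rollout that replaces the running pods, so evidence is best collected first. Below is a minimal sketch of a matching deny-all policy, assuming the quarantine=true label from the command above. The function name, the policy name quarantine-deny-all, and the namespace argument are illustrative and not part of RHACS.

```shell
# Print a deny-all NetworkPolicy for pods labelled quarantine=true.
# quarantine_policy, the policy name, and the namespace argument are illustrative.
quarantine_policy() {
  namespace="${1:-default}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine-deny-all
  namespace: ${namespace}
spec:
  podSelector:
    matchLabels:
      quarantine: "true"
  policyTypes:
    - Ingress
    - Egress
EOF
}

# Usage (needs a CNI that enforces NetworkPolicy):
#   quarantine_policy my-namespace | kubectl apply -f -
```

Listing both policy types with no allow rules denies all ingress and egress for the selected pods, which is why health checks and dependent services fail immediately.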
Evidence Collection Before Container Death
# Get process list while container exists
roxctl central deployments get-processes --deployment=compromised-app > processes-$(date +%Y%m%d-%H%M).txt
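The filename above is stamped with the workstation's local time, which makes exports from responders in different timezones hard to line up. A small sketch of a wrapper that stamps output files in UTC instead; capture_utc and its label argument are hypothetical names, and the wrapped command can be any of the commands in this section.

```shell
# Run a command and save its output under a UTC-stamped filename.
# capture_utc and the label argument are illustrative names.
capture_utc() {
  label="$1"; shift
  outfile="${label}-$(date -u +%Y%m%dT%H%M%SZ).txt"
  "$@" > "$outfile" 2>&1
  status=$?
  # Print the filename so callers can log or checksum it.
  echo "$outfile"
  return "$status"
}

# Usage:
#   capture_utc processes roxctl central deployments get-processes --deployment=compromised-app
```

The function returns the wrapped command's exit status, so a failed export is still visible to the calling script.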
Investigation Timeline Commands
# Timeline reconstruction
roxctl central violations list --deployment=$AFFECTED_DEPLOYMENT --sort=created_at --output=json | jq '.violations[] | {time: .created_at, policy: .policy.name}'
# Cross-cluster pattern detection
roxctl central violations list --policy-name="Suspicious Process Execution" --all-clusters
# Multi-cluster workaround when --all-clusters times out
for cluster in $(roxctl central clusters list --output=json | jq -r '.[].name'); do
roxctl central violations list --cluster="$cluster" --severity=HIGH
done
RESOURCE REQUIREMENTS
Time Investments
- Initial Setup: 2-3 weeks for baseline learning
- Policy Tuning: Weeks to months to reduce false positives
- Investigation Training: Significant time investment for security team proficiency
Expertise Requirements
- Essential: roxctl CLI familiarity for engineers
- Critical: RHACS training for security team
- Recommended: Regular tabletop exercises for muscle memory
Infrastructure Dependencies
- CNI Compatibility: Network policy enforcement requires a CNI that implements NetworkPolicy, such as Calico or Cilium (Flannel does not enforce policies)
- Memory Requirements: Network graph visualization needs sufficient memory
- Storage: Longer retention requires more storage; compliance may require multi-year retention
FAILURE MODES AND WORKAROUNDS
Data Retention Limits
- Process Data: Disappears after ~1 day
- Network Flows: Purged faster in large clusters
- Violation Data: Retained 1-2 months depending on settings
- CRITICAL: Container restarts wipe RHACS data completely
CLI Failures During Incidents
- Auth Expiry: roxctl auth expires daily, usually during incidents
- Rate Limiting: Multiple users trigger "429 Too Many Requests" errors
- Timeout: Responses on large datasets time out after exactly 30 seconds
- Version Issues: Dashboard links break between versions, saved searches don't survive upgrades
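Rate limiting and timeouts are usually transient, so wrapping CLI calls in a retry with exponential backoff can help during an incident. This is a generic sketch: retry_backoff is a hypothetical helper, the attempt count and initial delay are illustrative, and it retries on any non-zero exit status rather than inspecting the error for a 429.

```shell
# Retry a command with exponential backoff.
# retry_backoff is an illustrative helper; it retries on ANY non-zero exit.
retry_backoff() {
  max_attempts="$1"; delay="$2"; shift 2
  attempt=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))      # double the wait each round
    attempt=$((attempt + 1))
  done
}

# Usage: up to 5 attempts, waiting 2s, 4s, 8s, 16s between them
#   retry_backoff 5 2 roxctl central violations list --severity=CRITICAL --limit=50
```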
Detection Limitations
- Process Monitoring: 3-5 second delay, truncates commands >1024 characters in RHACS 4.8.x
- Network Detection: 15-30 minute delay, shows IPs but no packet contents
- Missing Coverage: Cannot detect memory-only attacks, node-level compromises, application-level attacks (SQL injection), cloud API attacks
Forensic Evidence Issues
- Timestamp Problems: Uses container timezone, not UTC
- Data Truncation: Long commands truncated, base64 payloads cut off
- Chain of Custody: Hard to maintain when multiple people access Central; no cryptographic checksums by default
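Since exports carry no checksums by default, one way to strengthen chain of custody is to record a SHA-256 digest for each evidence file at collection time and verify it before handing the files over. A minimal sketch, assuming GNU coreutils sha256sum is available on the responder's workstation; the function names and the manifest filename are illustrative.

```shell
# Record SHA-256 digests for evidence files and verify them later.
# seal_evidence, verify_evidence, and evidence.sha256 are illustrative names.
seal_evidence() {
  manifest="$1"; shift
  # Append one "digest  filename" line per evidence file.
  sha256sum "$@" >> "$manifest"
}

verify_evidence() {
  # Exits non-zero if any file no longer matches its recorded digest.
  sha256sum -c --quiet "$1"
}

# Usage:
#   seal_evidence evidence.sha256 violations-*.json processes-*.txt
#   verify_evidence evidence.sha256
```

The manifest itself should be stored somewhere the responders collecting evidence cannot silently modify, otherwise the digests prove little.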
ATTACK PATTERN IDENTIFICATION
Crypto Mining Indicators
- Process Signatures: xmrig, cpuminer, sustained high CPU usage (>80%)
- Network Patterns: Connections to ports 3333, 4444, 8080 (Stratum mining)
- Infrastructure: IPs from DigitalOcean, Linode, random AWS EC2 instances
Data Exfiltration Indicators
- Process Indicators: Shells in production containers (/bin/bash, /bin/sh), download tools (curl, wget), reverse shells (nc)
- Network Indicators: High volume external transfers, unusual outbound connections
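The process indicators for mining and exfiltration can be checked mechanically against an exported process list. The sketch below assumes the export has one command line per line, which is an assumption about the export format rather than a documented one; scan_processes and its tag names are hypothetical. Matches are leads for a human to review, not verdicts.

```shell
# Tag process-list lines that match the indicators listed above.
# Assumes one command line per input line; scan_processes and the tags are illustrative.
scan_processes() {
  awk '
    {
      line = tolower($0)
      tag = ""
      if (line ~ "xmrig|cpuminer|stratum[+]tcp")        tag = "MINER"
      else if (line ~ "(^|[ /])(nc|ncat)( |$)")         tag = "NETCAT"
      else if (line ~ "(^|[ /])(curl|wget)( |$)")       tag = "DOWNLOAD"
      else if (line ~ "(^|[ /])(bash|sh)( |$)")         tag = "SHELL"
      if (tag != "") print tag "\t" $0
    }
  ' "$@"
}

# Usage:
#   scan_processes processes-20250101-0200.txt
```

Each line gets at most one tag, checked in the order miner, netcat, download tool, shell, so a shell that launches curl is reported as DOWNLOAD.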
Supply Chain Compromise
- Image Analysis: Unscanned images from public registries, suspicious image provenance
- Detection: Scanner V4 vulnerability analysis, policy bypass indicators
CONTAINMENT REALITY
Network Policy Limitations
- No Automatic Application: RHACS cannot automatically apply network policies
- CNI Dependency: Calico works, Flannel doesn't support policies
- Impact Assessment: Even RHACS Sensor loses contact after network isolation
Historical Incident Example
- Database Quarantine: Quarantined database pod during incident, took down entire application for 2 hours because nothing could reach the DB
- Lesson: Plan containment impact on dependent services
SIEM INTEGRATION
AWS Integration
- Automatic: RHACS forwards findings to AWS Security Hub when configured
- Benefit: Correlates container incidents with CloudTrail events and IAM activity
Webhook Integration
# Forward violations to SIEM
roxctl central notifier create webhook --name="security-operations" --endpoint="https://your-siem.company.com/webhooks/rhacs"
PERFORMANCE IMPACT
Investigation Query Impact
- Database Load: Large forensic queries impact RHACS performance
- Timing: Plan complex investigations during maintenance windows
- Scalability: --all-clusters queries time out with many clusters
DECISION CRITERIA
Response Approach | RHACS Capabilities | Required Resources | Reality Check | Recommendation |
---|---|---|---|---|
RHACS Standalone | Built-in violation analysis, network graph, runtime monitoring | Just RHACS installation | Fast but limited context | Good starting point |
RHACS + SIEM | Policy violations, network flows, runtime events | Splunk/similar + storage costs | Better correlation but slower setup | Worth it with budget |
RHACS + Cloud Security | Container security findings | AWS GuardDuty, etc. | Good for single-cloud environments | Only if AWS-heavy |
Manual Everything | CLI tools and export functions | Time, expertise, patience | Slow but complete control | Only if budget-constrained |
INCIDENT RESPONSE READINESS
Team Preparation Requirements
- Technical Skills: roxctl CLI familiarity, RHACS violation investigation
- Procedures: Incident response playbooks, escalation procedures, evidence collection protocols
- Practice: Regular tabletop exercises for 2am muscle memory
- Resources: Dashboard bookmarks, pre-configured SIEM queries, automated evidence collection scripts
Common Scenarios Training
- Crypto Mining Detection: Process and network pattern recognition
- Data Exfiltration Investigation: External connection analysis
- Supply Chain Compromise: Image provenance verification
- Insider Threats: RBAC violation tracking
RISK ASSESSMENT
High-Risk Operational Situations
- Central Downtime: Sensors fail open after cache expiry (a few days)
- Certificate Expiry: Requires manual rotation on each cluster
- Database Corruption: Recovery time depends on database size
- False Positive Fatigue: Risk scores unreliable, sort by violation count instead
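Sorting by violation count needs only standard text tools once violations are exported. A sketch, assuming a tab-separated export with the deployment name in the first column and one violation per line; that layout and the name rank_by_violations are hypothetical, so the column selection would need adjusting for a real export.

```shell
# Rank deployments by number of violations, highest first.
# Assumes tab-separated input: deployment<TAB>policy, one violation per line.
rank_by_violations() {
  cut -f1 "$@" | sort | uniq -c | sort -k1,1nr -k2 | awk '{print $2 "\t" $1}'
}

# Usage:
#   rank_by_violations violations.tsv | head -20
```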
Critical Success Factors
- Policy Tuning: Essential for reducing false positives
- Baseline Learning: Allow 2-3 weeks minimum for network pattern learning
- Team Training: Security team must be proficient with roxctl CLI
- Integration: SIEM correlation significantly improves incident context
Useful Links for Further Investigation
Essential RHACS Incident Response Resources
Link | Description |
---|---|
RHACS 4.8 Operating Guide | Red Hat's official documentation for RHACS 4.8. The violation investigation section is particularly useful during incidents, focusing on runtime monitoring rather than general information. |
roxctl CLI Reference | This reference guide for the roxctl CLI is essential for incident response, providing critical commands like violations and network-graph that are used constantly during security investigations. |
Red Hat Advanced Cluster Security Workshop | An interactive workshop offering a practical alternative to documentation. Its violations and network security labs are highly recommended for effective incident response training and skill development. |
DO430 - Securing Kubernetes with RHACS | Red Hat's official, comprehensive training course for securing Kubernetes with RHACS. While expensive, it provides in-depth knowledge crucial for organizations building a robust security team. |
Red Hat Support Portal | The official Red Hat Support Portal for obtaining emergency assistance during critical incidents. Premium support tiers offer phone escalation options for urgent issues. |
RHACS Known Issues Database | A database of known issues for RHACS, often providing quicker resolutions than opening a support ticket for common problems. Recommended to check here first during troubleshooting. |
Kubernetes Security Slack | An active community Slack channel, specifically #security, where discussions frequently cover RHACS and container incident response, providing valuable peer support during challenging situations. |