RHACS alerts at 2am usually mean something's actually wrong or your policies are garbage. The web interface times out when you have too many violations, so use roxctl instead of fighting with the browser:
## When the UI is being useless
roxctl central violations list --severity=CRITICAL --limit=50
The network graph is slow as hell and will crash if you don't have enough memory. Image scanning chokes on anything bigger than a few GB.
Finding the Real Problems
Default policies flag every init container as "Privilege Escalation" because they don't understand that init containers need to do setup work. Istio sidecars trigger "Unauthorized Network Flow" constantly because RHACS doesn't know about service mesh traffic patterns. CI pipelines running apt update get flagged as "Suspicious Process Execution" because the policy doesn't understand that package managers are normal in build environments.
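Before touching the policy, confirm the flagged process actually ran in an init container. A quick kubectl check (my-app is a placeholder for the deployment named in the violation):
## Which containers are init containers vs regular ones?
## (my-app is a placeholder - use the deployment from the violation)
kubectl get deploy my-app -o jsonpath='{.spec.template.spec.initContainers[*].name}'; echo
kubectl get deploy my-app -o jsonpath='{.spec.template.spec.containers[*].name}'; echo
## If the process ran in an init container, add a scoped policy exclusion instead of muting the whole policy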
Crypto miners actually running vs build noise:
## Real threat - mining processes
roxctl central deployments get-processes --deployment=suspicious-app
## Look for: xmrig, cpuminer, stratum connections
## False positive - package installs
roxctl central violations list --policy="Privilege Escalation"
## Usually just containers needing CAP_CHOWN
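For the miner case, I pipe the process dump through a grep for the usual indicator strings. This assumes the get-processes output is plain text, and it's a triage step, not proof either way:
## Rough triage for common miner indicators (case-insensitive)
roxctl central deployments get-processes --deployment=suspicious-app \
  | grep -Ei 'xmrig|cpuminer|minerd|stratum|nicehash'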
Baseline learning takes 2-3 weeks at a minimum, longer if your traffic patterns are weird. CDN traffic from Cloudflare and AWS gets flagged as suspicious for 3-4 weeks until the system learns your patterns.
Process Analysis and Network Flows
Process detection is delayed by a few seconds. The UI truncates long process lists, so use --limit=0:
## See all processes, not just the first few
roxctl central deployments get-processes --deployment=compromised-app --limit=0
Look for shells (/bin/bash, /bin/sh) in production containers, download tools (curl, wget), or reverse shell staples like nc. Network flow detection is slower than process monitoring. External IP classification depends on threat feeds that update sporadically.
Containment Reality
RHACS can't automatically apply network policies. Network isolation depends on your CNI - Calico enforces NetworkPolicy, plain Flannel doesn't support it.
## Quarantine label - blocks nothing by itself, and patching the pod template triggers a rolling restart
kubectl patch deployment compromised-app -p '{"spec":{"template":{"metadata":{"labels":{"quarantine":"true"}}}}}'
Then create a network policy blocking all traffic to pods with quarantine: "true" (sketch below). This breaks health checks and load balancers immediately. Downstream services will fail. Even RHACS Sensor loses contact after network isolation. We once quarantined a database pod during an incident and took down the entire application for 2 hours because nothing could reach the DB.
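A minimal sketch of that deny-all policy, assuming your CNI enforces NetworkPolicy and using production as a placeholder namespace:
## Deny all ingress and egress for quarantined pods
## (production is a placeholder - use the compromised workload's namespace)
kubectl apply -n production -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine-deny-all
spec:
  podSelector:
    matchLabels:
      quarantine: "true"
  policyTypes:
  - Ingress
  - Egress
EOF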
Evidence Collection Before Container Restart
Container restarts wipe RHACS data. Collect evidence fast before liveness probes restart everything:
## Export violations before container dies
roxctl central violations list --deployment=compromised-app --output=json > violations-$(date +%Y%m%d-%H%M).json
## Get process list while it still exists
roxctl central deployments get-processes --deployment=compromised-app > processes-$(date +%Y%m%d-%H%M).txt
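While those run, grab what kubectl can still see - it disappears just as fast once the pod restarts:
## Container logs with timestamps (kubectl logs deploy/... only picks one pod - repeat per pod if needed)
kubectl logs deploy/compromised-app --all-containers=true --timestamps > logs-$(date +%Y%m%d-%H%M).txt
## Live object state - image digests, labels, restart counts
kubectl get deploy/compromised-app -o yaml > deploy-$(date +%Y%m%d-%H%M).yaml
kubectl describe deploy/compromised-app > describe-$(date +%Y%m%d-%H%M).txt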
Process data disappears after about a day. Network flows get purged faster in large clusters. Violation data sticks around for a month or two depending on retention settings.
RHACS Sensor Connectivity Issues
Sensors fail silently when Central goes down. Pods show "Running" but stop collecting data. No automatic failover exists.
## Check if sensors are actually working
roxctl sensor get-health
## Sensor logs usually have the real error
kubectl logs -n stackrox deploy/sensor | grep -i error
Sensor's local storage fills up without Central connectivity. The pod restarts and loses data when storage is full. Manual intervention is required during incidents.
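When roxctl can't reach Central at all, plain kubectl still tells you whether Sensor is crash-looping or just silently disconnected:
## Restart counts and readiness for the RHACS components
kubectl -n stackrox get pods -o wide
## Recent events - OOMKills, failed probes, image pull problems
kubectl -n stackrox get events --sort-by=.lastTimestamp | tail -n 20
## Logs from the previous sensor container, if it already restarted
kubectl -n stackrox logs deploy/sensor --previous | tail -n 50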
What RHACS Actually Records
Process execution shows command arguments but truncates long ones. Network flows show IPs but not packet contents. You get source/destination but no payload data - like seeing that someone walked into a building but not knowing what they did inside.
Process timestamps use container timezone, not UTC - learned this the hard way during a 3am investigation when everything was off by 8 hours. Parent-child relationships break on container restarts. File system monitoring needs privileged containers, which security policies usually block. No memory dumps - crashes leave no traces. RHACS 4.8.x has a bug where process arguments get truncated at 256 characters, so base64 payloads in environment variables get cut off right when you need them.
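To avoid the 8-hour surprise, check what timezone the container actually thinks it's in before lining its timestamps up against other logs (assumes the image ships date and /etc/timezone, which minimal images often don't):
## What timezone does the container think it's in?
kubectl exec deploy/compromised-app -- date
kubectl exec deploy/compromised-app -- cat /etc/timezone 2>/dev/null
## Compare against UTC on your side
date -u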
Common RHACS CLI Failures During Incidents
roxctl auth expires daily, usually during incidents. Token refresh fails during maintenance windows.
## Keep auth token backed up
roxctl auth export --output=auth-token.json
## Direct API when roxctl breaks - replace URL with your RHACS Central endpoint
curl -k -H "Authorization: Bearer $RHACS_TOKEN" \
$RHACS_CENTRAL_URL/v1/violations
API rate limiting kicks in fast. Multiple people running roxctl at once triggers the limits, producing "429 Too Many Requests" errors that don't say how long to wait. Responses time out on large datasets with "context deadline exceeded" after exactly 30 seconds.
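For large pulls during an incident, I fall back to curl with retries and a longer client timeout, reusing the token and URL from the block above. Whether Central tolerates long-running requests still depends on its own server-side timeouts:
## Retry on 429/5xx and give the request up to 5 minutes
curl -k --retry 5 --retry-delay 30 --max-time 300 \
  -H "Authorization: Bearer $RHACS_TOKEN" \
  "$RHACS_CENTRAL_URL/v1/violations" > violations-full.json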
Dashboard links break between versions. Saved searches don't survive upgrades. Use roxctl commands instead of UI bookmarks.