Why Engineers Actually Use Falco (And Why You Should Too)


Look, I've been burned by security tools before. I spent 6 months with Twistlock while it burned through 15% CPU and alerted every time someone ran ps aux. Falco is different - I've been running it in production for 2+ years, and the worst performance hit I've seen is 7% CPU on a database node that was running like shit anyway. The CNCF graduation (February 2024) gave our compliance team confidence, plus you can actually audit what it's doing since it's open source.

What Falco Actually Does

Falco sits on your Linux systems watching syscalls through eBPF (or kernel modules if you're stuck on older kernels). When someone tries to escape a container, escalate privileges, or run a reverse shell, Falco catches it in real-time and sends you an alert that actually means something. The modern eBPF driver uses CO-RE which means it works across kernel versions without recompilation.

The key difference from other tools: Falco knows the difference between your application doing legitimate work and someone trying to pwn your system. I've seen it catch everything from cryptominers to privilege escalation attempts that other tools missed completely. Check out the detection capabilities - it's not just another log parser.
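For reference, here's roughly what a rule looks like - a simplified sketch of the stock "Terminal shell in container" detection. The real rule in the rules repository builds its condition from shared macros, so treat this as illustrative:

# Simplified sketch - the upstream rule assembles this condition from macros
- rule: Terminal shell in container
  desc: A shell was spawned inside a container with an attached terminal
  condition: >
    evt.type = execve and container.id != host
    and proc.name in (bash, sh, zsh) and proc.tty != 0
  output: Shell spawned in container (user=%user.name container=%container.name cmdline=%proc.cmdline)
  priority: NOTICE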

Real Production Experience

CPU overhead: Despite the bullshit marketing claims, I've measured 1-3% CPU usage on busy nodes. Version 0.38.x had memory leaks with high syscall volumes - make sure you're on 0.39+ if you're monitoring chatty applications. The current stable version as of September 2025 is 0.41.x, which includes significant performance improvements and better eBPF probe reliability. The performance impact varies significantly based on your syscall volume.

Memory usage: Starts around 50MB but scales with your rule complexity. I've seen it climb to 200MB+ on nodes with aggressive custom rules and verbose logging enabled. Monitor the built-in metrics if you're running tight on resources - Falco exposes Prometheus endpoints starting from 0.38.
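If you want those numbers instead of guessing, the metrics endpoint is a few lines of config - a sketch assuming Falco 0.38+, so double-check the keys against your version's falco.yaml:

## In falco.yaml (or under the falco: key in your Helm values)
metrics:
  enabled: true
  interval: 15m
  output_rule: true
webserver:
  enabled: true
  prometheus_metrics_enabled: true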

Event volume: Can handle thousands of events per second, but your SIEM integration will be the bottleneck. Learned this when our intern deployed a crypto miner and Falco sent 47,000 alerts in 8 minutes, completely destroying our Splunk cluster. That was a fun Monday morning. Check the event dropping documentation if you're seeing ratelimit errors.

The Three Driver Options (And Which to Use)

Modern eBPF (recommended): Uses CO-RE technology so it works across kernel versions without recompilation. Requires kernel 5.8+ with BTF support enabled. This is what you want unless you have a specific reason not to use it. Default since Falco 0.38.0, with significant stability improvements in 0.40+.

Classic eBPF: Still requires kernel headers but more compatible than Modern eBPF. Use this if Modern eBPF doesn't work on your distro. The libs repository has performance comparisons between drivers.

Kernel Module: Maximum compatibility but requires root and kernel headers. Only use this if eBPF completely fails on your system. Check the host installation docs for kernel module setup.

Pro tip: The Modern eBPF driver constantly failed to load on our RHEL 7.6 nodes - I stared at bpf_map_create failed: Operation not permitted errors for 3 days before realizing our kernel was compiled without BTF support. Always have kernel headers installed as a fallback, and check the troubleshooting guide when you see those cryptic eBPF errors.
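Quick sanity check before you burn 3 days like I did - if the BTF file below doesn't exist, Modern eBPF isn't going to load:

## Modern eBPF needs BTF - this file must exist
ls /sys/kernel/btf/vmlinux

## Or check the kernel config directly
grep CONFIG_DEBUG_INFO_BTF /boot/config-$(uname -r)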

Integration Hell (And How to Avoid It)


Falco has 50+ output integrations through Falcosidekick, but most are community plugins of varying quality. Here's what actually works in production:

  • Slack/Teams: Works great for initial setup, becomes noise after a week
  • Elasticsearch: Solid if you already have ELK, pain in the ass to set up from scratch - see the Elastic integration guide
  • S3: Cheap storage for compliance logging
  • Webhook: Most flexible - build your own integration using the webhook documentation

Don't try to send every alert to your SIEM initially. Start with critical alerts only or you'll get buried in false positives. The rule adoption guide has good strategies for tuning.
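Here's a sketch of what "critical only" looks like in the Helm values - falcosidekick is a real subchart option, but the webhook address is obviously a placeholder:

falcosidekick:
  enabled: true
  config:
    webhook:
      address: "https://alerts.internal.example.com/falco"  # placeholder endpoint
      minimumpriority: "critical"  # nothing below critical goes out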

Kubernetes Deployment Reality

Speaking of getting things working properly, let's talk about the actual deployment process.

The Kubernetes operator is still in tech preview (as of 0.41.0) and I've seen it crash on YAML edge cases. Stick with the official Helm charts unless you like debugging operator logs at 3am.

## This actually works:
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm install falco falcosecurity/falco --namespace falco-system --create-namespace

The DaemonSet approach ensures every node gets monitored, but watch out for kernel header issues on mixed node types. Auto-scaling can break if new node AMIs don't have headers pre-installed. Check the EKS deployment guide for AWS-specific gotchas, or the falcoctl documentation for artifact management.

But getting Falco installed is just the beginning. The real challenge starts when you try to run it in production and discover all the ways it can break.

The Painful Truth About Deploying Falco in Production

Let's get into the real shit - actually deploying this thing in production without wanting to throw your laptop out the window.

Rule Tuning: Where Most People Give Up

The default Falco rules will flood you with false positives. Took me 3 weeks to tune them for our environment. The rules repository contains the source code if you need to understand what's triggering. Here's what actually works:

Start with these priorities only:

  • Terminal shell in container - catches obvious container escapes
  • Write below binary dir - detects people modifying system binaries
  • Create files below dev - spots privilege escalation attempts

Don't enable everything at once or you'll get 500 alerts about package managers doing normal shit.
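The chart's customRules value is the least painful way to ship that tuning - a sketch assuming a recent chart and Falco version (check falco_rules.yaml for the exact rule names and your version's override syntax):

customRules:
  tuning.yaml: |-
    # Kill the package-manager noise until the basics are tuned
    - rule: Launch Package Management Process in Container
      enabled: false
      override:
        enabled: replace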

Common Gotchas That Will Bite You

Kernel Headers Hell: Modern eBPF sounds great, but I've seen it fail on RHEL 7.x kernels compiled without BTF support, GKE hardened nodes that block eBPF features, and EKS Bottlerocket AMIs that ship without kernel headers.

Keep this handy for when eBPF fails:

## Check if headers exist
ls /lib/modules/$(uname -r)/build

## If missing, Falco falls back to kernel module
## which requires: apt-get install linux-headers-$(uname -r)

The startup troubleshooting guide covers these scenarios in detail.

Memory Leaks You'll Hit: Version 0.38.2 has a memory leak that will fuck your day up. Watched our database node's Falco pod climb from 80MB to 1.2GB over 6 hours before OOMKilling itself and taking down alerting for the entire cluster. Happened twice before I realized what was going on. Upgrade to 0.39+ or you'll be debugging this at 3am like I was. Monitor with the built-in Prometheus metrics or set aggressive memory limits.
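Until you're on a fixed version, cap the pod so a leak kills Falco instead of your node - these are the chart's standard resources values; pick limits that fit your nodes:

resources:
  requests:
    cpu: 100m
    memory: 512Mi
  limits:
    memory: 1Gi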

Container Runtime Issues: Falco integration breaks differently on each runtime:

  • Docker: Works fine, just make sure socket is accessible
  • containerd: Needs CRI socket configuration (--cri /run/containerd/containerd.sock)
  • CRI-O: Pain in the ass - requires proper SELinux contexts on RHEL
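For the containerd case, the chart's collectors values look roughly like this (socket paths vary by distro, so verify yours):

collectors:
  containerd:
    enabled: true
    socket: /run/containerd/containerd.sock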

Performance Reality Check

Once you get past the initial setup hurdles, you'll want to know what this thing actually costs you in terms of resources.

Don't believe the "2% CPU overhead" marketing bullshit. Here are real numbers from our 200-node cluster - similar to what the performance documentation shows:

CPU usage scales with syscall volume:

  • Idle nodes: 0.5-1% CPU
  • Application nodes: 1-3% CPU
  • Database/high-I/O nodes: 3-7% CPU

Memory grows with rule complexity:

  • Default rules: ~50MB
  • Custom rules + verbose logging: 150-200MB
  • Poorly written rules with regex: 500MB+ (I learned this the hard way)

Use the Grafana dashboard to track actual resource usage and the performance tuning guide for optimization.

Monitor these metrics or you'll get surprised:

kubectl top pod -n falco-system
## Watch for OOMKilled pods - your rules are probably too aggressive

Integration War Stories

Elasticsearch: Works fine until you get owned. During a cryptomining incident, Falco sent 47,853 alerts in 12 minutes and completely destroyed our 3-node ES cluster. Had to restore from backup and explain to management why our logging was down for 4 hours. Rate limiting is not optional.

Slack Integration: Started useful, became the most annoying thing on the planet within 72 hours. Our #alerts channel got 2,400 messages in one day about package managers doing normal shit. Only send CRITICAL and ERROR priority alerts to Slack or your team will mute the channel forever.

S3 Storage: Cheap compliance logging that isn't actually cheap. Hit $347/month in S3 costs before I realized we were storing 600GB of debug logs. Implement log rotation immediately or prepare to explain your AWS bill.
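A lifecycle rule would have saved me that $347/month - a sketch assuming a hypothetical falco-logs bucket and a 90-day retention window:

## Expire Falco logs after 90 days (adjust for your compliance requirements)
aws s3api put-bucket-lifecycle-configuration \
  --bucket falco-logs \
  --lifecycle-configuration '{"Rules":[{"ID":"expire-falco","Status":"Enabled","Filter":{"Prefix":"falco/"},"Expiration":{"Days":90}}]}'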

Cloud Deployment Gotchas

AWS EKS: Works fine but Graviton ARM nodes need different container images. Mixed architecture clusters will break your DaemonSet. Check the EKS-specific deployment guide for Bottlerocket compatibility.

GKE: Google's hardened nodes block some eBPF features. Use kernel module driver or switch to standard nodes if you need full functionality. The GKE setup tutorial covers network policy issues.

Azure AKS: SELinux policies can interfere with Falco. Disable SELinux on AKS node pools or you'll get permission errors. The container runtime documentation has AKS-specific notes.

The Plugin Ecosystem Reality

Falco's plugin ecosystem is hit-or-miss. The official plugin repository has varying quality:

Actually work in prod:

  • CloudTrail plugin: Solid for AWS API monitoring
  • Kubernetes audit logs: Essential if you're monitoring K8s API abuse

Buggy/experimental:

  • Okta plugin: Crashes on malformed events
  • GitHub plugin: Rate limiting issues with large orgs

Build your own: The plugin SDK documentation is decent if you know Go. We built a custom plugin for our internal API monitoring that's been rock solid for 18 months. Check the plugin development guide for getting started.
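For reference, wiring up the k8saudit plugin in falco.yaml looks like this - based on the plugin's documented defaults, so adjust the port and library paths for your packaging:

plugins:
  - name: k8saudit
    library_path: libk8saudit.so
    init_config: ""
    open_params: "http://:9765/k8s-audit"
  - name: json
    library_path: libjson.so

load_plugins: [k8saudit, json]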

Compliance and Enterprise Use

Despite the rough edges, Falco handles compliance well once tuned. We use it for:

  • SOC 2: Runtime monitoring and incident response
  • PCI-DSS: File integrity monitoring and access controls
  • GDPR: Data access auditing (with custom rules)

The CNCF graduation gives our compliance team confidence, plus having open source code means we can actually audit what it's doing. Check out the enterprise case studies and Incepto Medical's production deployment for real-world compliance examples.

After fighting through all these deployment challenges, you're probably wondering if you should just pay for a commercial alternative instead. The grass always looks greener when you're debugging kernel module issues at 3am. Let's look at how Falco stacks up against the competition.

Real Talk: Falco vs The Competition

Falco (Free but not cheap)

  • Cost: Free to download, expensive to maintain. Budget a full-time engineer or prepare for 3am debugging sessions.
  • Performance: 1-7% CPU depending on workload. Forget the "2%" marketing bullshit - I've seen 12% on busy database nodes.
  • Setup time: 1 week if you're lucky, 3 months if you actually want it tuned properly. Rule tuning never ends.
  • Support: Community Slack is decent - maintainers respond. Stack Overflow is a wasteland of outdated answers.
  • Best for: Teams with masochistic tendencies and strong Linux skills.

Sysdig Secure (Falco's commercial sibling)

  • Cost: Starts around $35/node/month as of 2025, scaling to $50+ with threat intelligence and response features.
  • Performance: Similar to Falco since it's built on the same engine, but with better resource management.
  • Setup time: 1-2 days for basic deployment - their UI is actually usable and includes drag-drop rule editing.
  • Support: Excellent - they created Falco so they know it inside-out. 24/7 support included.
  • Best for: Teams who want Falco's power without the maintenance headaches.

Aqua Security (Enterprise heavy)

  • Cost: Enterprise pricing (think $50K+ annually for decent cluster coverage).
  • Performance: ~3-5% resource overhead, more intrusive than Falco.
  • Setup time: A few hours with their installer, but lots of configuration needed.
  • Support: Good commercial support, comprehensive docs.
  • Best for: Large enterprises with compliance requirements and budget.

Datadog Security (If you're already on Datadog)

  • Cost: Adds ~$15/host to your existing Datadog bill.
  • Performance: Lightweight but limited compared to eBPF solutions.
  • Setup time: 30 minutes if you're already using Datadog agents.
  • Support: Same as Datadog - generally solid.
  • Best for: Teams already invested in the Datadog ecosystem.

Wiz (Cloud-first approach)

  • Cost: Enterprise pricing, expensive but comprehensive.
  • Performance: Mostly agentless, minimal impact.
  • Setup time: Quick for cloud resources, longer for runtime monitoring.
  • Support: New company but solid engineering team.
  • Best for: Cloud-native teams focused on CSPM + runtime.

Questions I Actually Get About Falco

Q: Why does Falco keep crashing with "can't load eBPF probe"?

A: This error makes me want to throw my laptop. It happens constantly in mixed environments, especially when some genius decides to mix kernel versions. Check these in order before you lose your mind:

## 1. Do you have kernel headers?
ls /lib/modules/$(uname -r)/build || echo "No headers found"

## 2. Is your kernel too old?
uname -r
## Modern eBPF needs 5.8+ with BTF, Classic eBPF needs 4.14+

## 3. Try forcing kernel module fallback
helm install falco falcosecurity/falco --set driver.kind=module

Most common culprit: AWS's brilliant EKS Bottlerocket nodes don't include kernel headers because fuck developers, I guess. Switch to Amazon Linux 2 AMIs or prepare to deal with kernel module compilation failures.

Q: Why am I getting 500 alerts per minute about normal operations?

A: Default rules are aggressive as hell. Start by disabling these noise generators:

## In your values.yaml - keep the standard rules loading order
falco:
  rules_file:
    - /etc/falco/falco_rules.yaml
    - /etc/falco/falco_rules.local.yaml
    - /etc/falco/rules.d

## Then disable the noise generators in falco_rules.local.yaml
## (check falco_rules.yaml for exact names; newer Falco versions
## also want an explicit override block - see the rules docs):
- rule: Write below etc
  enabled: false
- rule: Read sensitive file trusted after startup
  enabled: false
- rule: Package management process launched
  enabled: false

Took me 3 weeks to tune rules for our microservices environment and I still get alerts when npm runs postinstall scripts. Build up gradually or you'll be that person who turns off security alerts because they're annoying.

Q: How much will this actually cost my CPU and memory?

A: Forget the marketing numbers. Here's real production data from our 200-node cluster:

Memory: Starts at ~50MB, grows to 200MB+ once you add custom rules. Hit 847MB on one node because I wrote a regex rule that was basically .*.*.* - don't be that stupid.

CPU: Scales with syscall volume and how badly your rules suck:

  • Idle/batch nodes: 0.5-1% (best case)
  • Web applications: 1-3% (if you're lucky)
  • Databases/high-I/O: 3-7% (saw 14% on our shitty Cassandra node that was already dying)

Monitor with kubectl top pod and watch for OOMKilled restarts.

Q: Why does Falco keep dropping events?

A: Event dropping happens when your rules can't keep up with syscall volume. Fix it:

## In your Helm values - tune drop alerting, and actually increase the
## ring buffer if drops persist (syscall_buf_size_preset defaults to 4)
falco:
  syscall_event_drops:
    threshold: 0.1
    actions:
      - log
      - alert
    rate: 0.03333
    max_burst: 1000
  syscall_buf_size_preset: 6

Also check CPU throttling - Falco pods getting CPU-limited will drop events like crazy.
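If you have cAdvisor metrics in Prometheus, this is the fast way to confirm throttling - a standard kubelet metric, nothing Falco-specific:

## Non-zero rate here means CPU limits are strangling Falco
rate(container_cpu_cfs_throttled_seconds_total{namespace="falco-system"}[5m])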

Q: Does the Kubernetes operator actually work?

A: The operator is still tech preview (0.41.0) and it's buggy as shit. It crashed on me with failed to parse yaml: line 47: mapping values are not allowed in this context and other helpful error messages that tell you nothing.

Stick with Helm charts for production unless you enjoy debugging operator logs:

## This actually works reliably:
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm install falco falcosecurity/falco \
  --namespace falco-system \
  --create-namespace \
  --set falco.grpc.enabled=true

Q: How do I integrate with Elasticsearch without destroying my cluster?

A: Rate limiting is mandatory unless you like explaining to your CTO why Elasticsearch is down. We completely destroyed our 3-node ES cluster when Falco sent 47,853 alerts in 12 minutes during a cryptomining incident:

## In falcosidekick config - minimumpriority is your real throttle here,
## a custom header won't rate-limit anything
elasticsearch:
  hostport: "http://your-es-cluster:9200"
  index: "falco"
  type: "_doc"
  minimumpriority: "warning"  # Don't send everything
  mutualtls: false

Start with WARNING+ priority only, then tune down if needed.

Q: Why won't Falco work on my GKE hardened nodes?

A: Google's hardened GKE nodes block some eBPF functionality for security. You have two options:

  1. Switch to standard nodes (what we did)
  2. Use kernel module driver with --set driver.kind=module

The hardened nodes also have restricted filesystem access that breaks some file-monitoring rules.

Q: What's the deal with all these plugins?

A: The plugin ecosystem is hit-or-miss:

Actually stable:

  • CloudTrail: Works well for AWS API monitoring
  • Kubernetes audit logs: Essential if you're monitoring API abuse

Still buggy:

  • Okta plugin: Crashes on malformed API responses
  • GitHub plugin: Rate limiting issues with large orgs

Build your own: The Go SDK is decent. We built a custom plugin for internal API monitoring that's been solid for 18 months.

Q: Should I use Falco or just pay for Sysdig?

A: If you have strong platform engineering skills and time to maintain it: Falco.

If you want Falco's capabilities without the operational overhead: Sysdig Secure. It's literally Falco with professional support and a UI that doesn't suck.

If your team struggles with Kubernetes troubleshooting: Don't use Falco. You'll spend more time debugging it than the security value you'll get.

Q: How do I tune rules without going insane?

A: Start minimal and build up. Here's my proven approach:

  1. Only enable container escape detection
  2. Add privilege escalation rules
  3. Add file modification monitoring
  4. Custom rules based on your specific threats

The default rules are designed for demo environments, not production. Every rule will need tuning for your specific applications.

Alright, you've survived the FAQ gauntlet and hopefully gotten some useful answers. The docs linked throughout this guide - the troubleshooting guide, the rule tuning strategies, the official Helm charts - are the resources that will actually help you succeed with Falco. Skip the outdated tutorials and abandoned projects.

