How Clair Actually Works (And Where It'll Break Your Day)

Look, Clair does one thing really well: it scans your container images for known vulnerabilities before they hit production. That's it. No runtime monitoring, no behavioral analysis - just static analysis of what packages are sitting in your Docker layers.

The beauty is in its simplicity. You throw a container manifest at Clair's API, it downloads your image layers (yes, all of them), figures out what packages you've got installed, and matches them against every CVE database it knows about. Then it tells you which ones are fucked.
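
If you've never hit the API directly, the whole round trip is two HTTP calls. The sketch below uses the Clair v4 paths (/indexer/api/v1 and /matcher/api/v1); the Clair URL, registry URL, digests, and token are placeholders you'd fill in from your own setup:

```python
import requests

CLAIR = "http://clair.example.internal:6060"  # hypothetical Clair v4 endpoint

# A Clair "manifest" is just the image digest plus a fetchable URI (and auth
# headers) for every layer blob. Digests and the bearer token are placeholders.
manifest = {
    "hash": "sha256:aaaaaaaa...",  # image manifest digest
    "layers": [
        {
            "hash": "sha256:bbbbbbbb...",  # layer digest
            "uri": "https://registry.example.com/v2/myapp/blobs/sha256:bbbbbbbb...",
            "headers": {"Authorization": ["Bearer <registry-token>"]},
        },
    ],
}

# Step 1: ask the indexer to download and catalog the layers.
r = requests.post(f"{CLAIR}/indexer/api/v1/index_report", json=manifest, timeout=600)
r.raise_for_status()
print("index state:", r.json().get("state"))  # e.g. "IndexFinished"

# Step 2: ask the matcher which known vulnerabilities apply to that index.
report = requests.get(
    f"{CLAIR}/matcher/api/v1/vulnerability_report/{manifest['hash']}", timeout=60
).json()
print(f"{len(report.get('vulnerabilities', {}))} vulnerabilities matched")
```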

[Diagram: Clair architecture]

[Diagram: Clair AWS deployment architecture]

The Three-Phase Dance That'll Save Your Ass

Indexing is where Clair downloads your entire image and tears it apart layer by layer like a forensic autopsy. This is when you'll first notice your network bandwidth getting absolutely hammered - I clocked a 2GB ML container with 47 layers taking anywhere from 3 minutes to "holy shit it's been 20 minutes" depending on whether Docker Hub is having one of its mood swings.

Under the hood, ClairCore (the actual scanning engine) rips through every damn layer and catalogs every package, Python wheel, and that weird shell script you definitely copy-pasted from StackOverflow.

Here's the kicker: Clair is smart about layer deduplication. If you're using the same base Ubuntu image (I think we were on 20.04, can't remember exactly) across 200 containers, it only downloads and scans those base layers once. This is why teams that standardize on base images see massive performance improvements - like, actually fast scan times instead of waiting around for minutes.

Matching happens every time you ask for a vulnerability report. This is the genius part: instead of storing vulnerability data with each scan, Clair keeps live vulnerability databases that update continuously. So when that new OpenSSL vulnerability drops at 2am, you don't need to rescan everything - just re-query the matcher and it'll tell you which of your 400 images are affected.
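
In practice that 2am drill is just a loop over your already-indexed digests hitting the matcher - no re-download, no re-index. Something like this (field names per the v4 vulnerability report; adjust for your version):

```python
import requests

CLAIR = "http://clair.example.internal:6060"   # hypothetical endpoint
CVE = "CVE-2024-XXXX"                          # the advisory you care about
digests = ["sha256:aaa...", "sha256:bbb..."]   # your already-indexed images

affected = []
for digest in digests:
    report = requests.get(
        f"{CLAIR}/matcher/api/v1/vulnerability_report/{digest}", timeout=60
    ).json()
    # "vulnerabilities" maps internal IDs to vulnerability objects; check names.
    names = {v.get("name", "") for v in report.get("vulnerabilities", {}).values()}
    if any(CVE in name for name in names):
        affected.append(digest)

print(f"{len(affected)} of {len(digests)} images are affected by {CVE}")
```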

Notifications are where most people completely lose their shit trying to get webhooks working. You can set up webhooks to ping your Slack channel when new vulnerabilities hit your images, but I guarantee you'll spend at least 2 hours debugging why your JSON parsing keeps failing because the notification payload format is about as intuitive as assembly language.
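
For what it's worth, the payload isn't the vulnerability data itself - Clair's v4 notifier sends a notification ID plus a callback URL you're expected to page through. Here's a rough receiver sketch; the field names match the v4 notifier docs as I understand them, so verify against your version before you wire it into Slack:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class ClairWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        payload = json.loads(body or b"{}")
        # Clair v4 delivers {"notification_id": "...", "callback": "<url>"};
        # the affected manifests/vulnerabilities live behind the callback.
        callback = payload.get("callback")
        if callback:
            page = requests.get(callback, timeout=30).json()
            for n in page.get("notifications", []):
                # Each entry ties a manifest digest to a vulnerability; forward
                # these to Slack/your ticketing system however you like.
                print(n.get("manifest"), n.get("reason"),
                      n.get("vulnerability", {}).get("name"))
        self.send_response(200)
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), ClairWebhook).serve_forever()
```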

What Clair Actually Supports (2025 Reality Check)

Linux distros that won't make you cry:

  • Ubuntu (USN security notices)
  • Debian (DSA advisories)
  • Red Hat Enterprise Linux (RHSA advisories)
  • Alpine (apk packages)

Language ecosystems where Clair won't let you down:

  • Python packages via pip/requirements.txt analysis (solid since day one)
  • Go modules and dependencies (Clair v4.8+ - finally works right)
  • Java JARs and Maven dependencies (spotty but improving)
  • OS packages via apt, yum, apk package managers (this is where it shines)

What it still sucks at: JavaScript/Node.js dependencies are a mess, Ruby gems are hit-or-miss, and don't even think about that random shell script you downloaded from GitHub. For comprehensive language coverage, you still need Trivy or Grype to pick up the pieces.

Performance Reality: It's Fast Until It Isn't

Those fast scan times Red Hat loves to brag about? That's for a basic Ubuntu container with maybe 200 packages. Try scanning a fucking TensorFlow image with CUDA libraries and you're looking at anywhere from 2-3 minutes on a good day to "I'll come back after lunch" on a bad one - assuming your network doesn't completely shit itself trying to pull whatever crazy amount of image data these ML people think is reasonable.

I've seen Clair supposedly handle millions of images on Quay.io, but that's with dedicated PostgreSQL clusters, Redis caching, and probably more AWS credits than your entire engineering budget for the next decade. In the real world, plan for maybe one Clair instance per 10,000 images if you want sub-minute scan times, and even then you'll probably end up disappointed.

The real performance killer is vulnerability database updates, and this is where everything goes to hell. When Ubuntu releases their daily security updates, every matcher instance needs to rebuild its correlation data. This can lock up scanning for anywhere from 5 minutes to "holy shit it's been half an hour and nothing's working" during peak hours. There's no good way to predict when this will happen or how long it'll take.

Clair vs. The Competition (Real Talk)

Clair - use when you need registry integration and don't mind PostgreSQL bullshit

  • Works at massive scale (Quay.io handles millions of images)
  • Apache 2.0 license means no vendor lock-in bullshit
  • But you're signing up to manage PostgreSQL, Redis, and a handful of microservices
  • Setup takes a full day if you know what you're doing, three if you don't
  • Database updates can lock up scanning for 15 minutes during peak hours
  • Red Hat backing means it's not disappearing, but updates come slowly

Trivy - use when you want results in 30 seconds and don't care about perfect accuracy

  • One binary, immediate results - the dream for CI/CD pipelines
  • Covers JavaScript, Ruby, PHP - all the languages Clair pretends don't exist
  • Less accurate package detection than Clair, but fast enough you don't care
  • Perfect for "fail the build if anything critical" workflows
  • Free and actively maintained by Aqua Security

Snyk Container - use when your company has budget and wants pretty dashboards

  • Executives love showing the vulnerability trends to other executives
  • Fix suggestions are actually helpful (when they're not completely wrong)
  • Pricing starts reasonable then scales with your success (and nightmares)
  • Works great until you hit API rate limits during peak CI/CD hours
  • The sales team will definitely call you

Grype - use when you want Trivy's speed but with better accuracy

  • Faster than Clair, more accurate than Trivy (when it works)
  • Anchore's open source tool - the company knows container security
  • Still new enough that you'll hit edge cases regularly
  • Good compromise between accuracy and speed
  • Less mature than Trivy but improving fast

Aqua Security - use when someone else pays the bill and you need everything

  • Runtime protection, network policies, compliance reports - the full package
  • Enterprise support that actually responds to your tickets
  • Costs more than most engineer salaries
  • Overkill if you just want to scan containers
  • Don't use this in production unless you enjoy 2am phone calls about licensing costs

Actually Deploying Clair (Without Losing Your Mind)

The Docker Compose Trap Everyone Falls Into

Everyone starts with the official Docker Compose example because it looks so fucking simple. Just docker-compose up and boom, you're scanning containers like a pro, right?

Yeah, no. The compose file works great for about 5 minutes until you try to scan anything more complex than a "hello world" container, then you discover:

  • PostgreSQL runs out of connections at 100 concurrent scans
  • Redis memory limits kill the process during large image indexing
  • Clair containers restart-loop when the database goes down for 30 seconds

The docker-compose.yaml is a starting point, not a production deployment. You'll spend your first week fixing database connection pools, memory limits, and network timeouts. But hey, at least you learn how all the pieces fit together.

Kubernetes: Where Real Deployments Go to Die

Kubernetes deployment should be straightforward - Red Hat even provides Helm charts. Except those charts assume you know what you're doing with PostgreSQL clustering, Redis persistence, and ingress configuration.

The docs skip the painful reality:

Why Your Database Will Hate You: A single Clair indexer can absolutely destroy a basic PostgreSQL instance - learned this one the hard way when our "development" database started sweating bullets under a production workload. Plan for at least 4 CPU cores and 8GB RAM for the database, or you'll sit there watching scan queues back up like cars at a traffic jam while PostgreSQL wheezes through vulnerability correlation queries that should take milliseconds.

Network Timeouts Will Kill You: Clair downloads entire container images during indexing. That 5GB ML container with 73 layers? It's going to timeout on the default Kubernetes service mesh settings. Bump your ingress timeouts to 10+ minutes or prepare for endless "failed to index" errors.

Resource Limits Are Lies: The Helm chart suggests 1GB memory limits for indexer pods, which is like bringing a knife to a gun fight. I've personally watched single container scans spike to 3GB+ when trying to analyze some data scientist's TensorFlow monstrosity with 200+ Python packages. Set realistic limits or spend your afternoon wondering why your pods keep mysteriously dying mid-scan.

Registry Integration: The Holy Grail (That Actually Works)

This is where Clair shines. Instead of manually submitting images, registries can trigger scans automatically via webhooks. Harbor includes Clair as a built-in option, Quay.io runs it in production.

But webhook integration means debugging network connectivity between your registry and Clair instances. When scans randomly stop working, it's usually because:

  • Registry webhook timeouts (Clair indexing takes longer than webhook timeout)
  • Authentication failures between registry and Clair
  • Network policies blocking registry → Clair communication
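
When it happens, rule out the boring stuff first: can the registry's network see Clair at all, and can Clair's credentials actually pull blobs from the registry? A throwaway check along these lines (hosts, blob digest, and token are placeholders) answers both in a couple of minutes:

```python
import requests

CLAIR = "http://clair.example.internal:6060"    # placeholder
REGISTRY = "https://registry.example.com"       # placeholder
BLOB = "/v2/myapp/blobs/sha256:bbbbbbbb..."     # any known layer digest
TOKEN = "<registry-token>"                      # same creds Clair uses

# 1. Can the registry's network reach Clair at all? Any HTTP response will do.
try:
    r = requests.get(f"{CLAIR}/indexer/api/v1/index_report/sha256:deadbeef", timeout=10)
    print("registry -> Clair reachable, HTTP", r.status_code)
except requests.RequestException as exc:
    print("registry -> Clair blocked:", exc)

# 2. Can Clair's credentials actually pull layer blobs from the registry?
try:
    r = requests.head(f"{REGISTRY}{BLOB}",
                      headers={"Authorization": f"Bearer {TOKEN}"}, timeout=10)
    print("Clair -> registry blob fetch:", r.status_code)  # 200/307 good, 401/403 bad
except requests.RequestException as exc:
    print("Clair -> registry blocked:", exc)
```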

Configuration Hell: Where YAML Goes to Die

The config.yaml.sample file is 300+ lines of YAML that controls everything. Getting it wrong means nothing works, getting it half-right means intermittent failures that'll drive you crazy.

Vulnerability Data Sources: The default config enables Ubuntu USN, Debian DSA, Red Hat RHSA, and PyPI advisories. Each source hammers external APIs during updates. Too many sources = rate limiting. Too few = missed vulnerabilities.

Database Connection Nightmares: PostgreSQL connection string format is unforgiving. One typo in the SSL parameters and Clair won't start. The connection examples help, but you'll still waste 2 hours debugging sslmode=require vs sslmode=verify-full.
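
One trick that saves those 2 hours: test the exact DSN with a throwaway script before you hand it to Clair. The sketch assumes psycopg2 is lying around; the sslmode values are standard libpq behavior, nothing Clair-specific:

```python
import psycopg2

# Paste the exact connection string from Clair's config here. sslmode=require
# only encrypts; sslmode=verify-full also checks the server cert against
# sslrootcert and the hostname, which is usually what breaks.
DSN = "host=clair-db.example.internal port=5432 dbname=clair user=clair " \
      "password=<password> sslmode=verify-full sslrootcert=/etc/ssl/clair-ca.pem"

try:
    conn = psycopg2.connect(DSN, connect_timeout=5)
    with conn.cursor() as cur:
        cur.execute("SELECT version();")
        print("connected:", cur.fetchone()[0])
    conn.close()
except psycopg2.OperationalError as exc:
    # The libpq error here is the same one Clair would die with at startup.
    print("connection failed:", exc)
```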

Webhook Configuration: Notification webhooks use a JSON payload format that changes between versions. Your Slack integration will break silently, and you won't notice until someone asks why vulnerability alerts stopped weeks ago.

Production Gotchas That'll Ruin Your Weekend

Internet Access Requirements: Clair needs to download vulnerability databases from NVD, Ubuntu, Debian, etc. Air-gapped environments are possible but require vulnerability database mirroring - expect a weekend project to get it right.

Database Performance Cliffs: Vulnerability correlation queries get expensive fast. Once you hit 100,000+ indexed images, PostgreSQL query performance falls off a cliff without proper indexing and maintenance. I've watched deployments go from "working fine" to "completely fucked" over a weekend when this happens. Plan for dedicated database instances and regular VACUUM operations.
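
If you want a leading indicator before you fall off that cliff, dead-tuple counts on the busiest tables are a decent proxy. This is plain PostgreSQL catalog poking, nothing Clair-specific - the table names you'll see are whatever Clair's migrations created:

```python
import psycopg2

# Same DSN you'd point Clair at; placeholder values.
conn = psycopg2.connect("host=clair-db.example.internal dbname=clair user=clair "
                        "password=<password> sslmode=require")

with conn.cursor() as cur:
    # Tables with lots of dead tuples and stale autovacuum times are the ones
    # dragging correlation queries down.
    cur.execute("""
        SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
        FROM pg_stat_user_tables
        ORDER BY n_dead_tup DESC
        LIMIT 10;
    """)
    for relname, live, dead, last_vac in cur.fetchall():
        print(f"{relname}: {dead} dead / {live} live, last autovacuum {last_vac}")
conn.close()
```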

Memory Usage Spikes: Indexing large container images (looking at you, TensorFlow) can spike memory usage to 4GB+ per worker. Size your containers accordingly or watch them get killed by the OOM reaper mid-scan.

High Availability Complexity: Multiple Clair instances share PostgreSQL and Redis state, but coordinating updates and webhook deliveries gets tricky. The HA deployment guide makes it sound simple - it's not. Plan for load balancer configuration, database failover, and split-brain scenarios.

Questions Nobody Wants to Ask (But Everyone Should)

Q: Wait, Clair doesn't monitor running containers?

A: Nope, it's static analysis only. Clair scans your images before deployment and tells you what vulnerabilities exist in the packages. It won't catch someone exploiting those vulnerabilities at runtime - that's Falco's job or whatever expensive runtime security platform your CISO bought.

Think of it this way: Clair tells you your front door lock is broken, but it won't stop the burglar from walking through it.

Q: Can it scan our private registry that requires seventeen different authentication methods?

A: Probably, but you'll hate the configuration process. Clair supports basic auth, bearer tokens, and certificate-based auth for private registries like Harbor, Quay Enterprise, and AWS ECR.

The fun part is when your registry uses some custom auth proxy, or when certificates expire, or when your network team changes firewall rules without telling you. Expect to debug connectivity issues every few months.

Q: What happens when Clair finds hundreds of "critical" vulnerabilities?

A: Clair reports everything it finds - it doesn't decide what's actually critical for your environment. You'll get alerts for every CVE in your base Ubuntu image, including stuff for packages you don't even use.

Most teams implement Open Policy Agent or similar policy engines to filter results. Otherwise, you'll be drowning in alerts for theoretical vulnerabilities in packages your application never touches.
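
OPA means writing Rego, which is its own adventure. If all you need is "only page me for fixable Highs and Criticals that aren't on our exception list", a few lines of plain code over the vulnerability report gets you most of the way - treat the field names below as assumptions about the v4 report format:

```python
# Minimal triage over a Clair v4 vulnerability report (already fetched as JSON).
# The exception list and severity threshold are yours to maintain, not Clair's.
IGNORED_CVES = {"CVE-2016-XXXX"}      # accepted-risk exceptions
SEVERITIES = {"Critical", "High"}     # normalized_severity values worth paging on

def triage(report: dict) -> list[dict]:
    keep = []
    for vuln in report.get("vulnerabilities", {}).values():
        if vuln.get("normalized_severity") not in SEVERITIES:
            continue
        if vuln.get("name") in IGNORED_CVES:
            continue
        if not vuln.get("fixed_in_version"):
            continue  # nothing to upgrade to yet; park it on the exception list
        keep.append(vuln)
    return keep
```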

Q: How often do vulnerability databases update (and break everything)?

A: Vulnerability sources update constantly - Ubuntu USN daily, Debian DSA as needed, NVD multiple times per day. Each update can trigger matcher rebuilds that lock up scanning for 5-15 minutes.

I've seen production deployments get blocked because Ubuntu released a security update mid-deployment and Clair was rebuilding its correlation database. Fun times.

Q: Does it work in our air-gapped environment that can't talk to the internet?

A: Technically yes, practically no. You need to mirror vulnerability databases and container registries, sync them regularly, and pray nothing breaks. Red Hat's air-gapped deployment guide makes it sound simple - it's a nightmare.

Expect to spend weeks setting up database synchronization, only to discover you're missing critical vulnerability updates because one mirror failed silently.

Q: Why doesn't it support our Node.js/Ruby applications fully?

A: Because ClairCore development focuses on the most common container ecosystems first. As of v4.8+, Clair supports Python, Go modules, and Java JARs with varying degrees of completeness. JavaScript and Ruby analyzers are still limited.

For comprehensive language support, use Trivy or Grype instead. They're not as accurate for OS packages but cover way more languages.

Q: How do I deal with all the false positives?

A: You build a whitelist of accepted vulnerabilities and pray your security team doesn't audit it. Most organizations maintain exception lists for:

  • CVEs in packages that aren't exposed (database drivers in web-only containers)
  • Vulnerabilities with no available fixes
  • Low-priority issues that would take longer to fix than the container lifespan

Q: Can it find secrets or malware in our images?

A: Hell no. Clair only matches known CVE databases against detected packages. For secrets, use TruffleHog. For malware, hope your base images aren't compromised and run ClamAV if you're paranoid.

Q: Why is scanning so slow for our machine learning images?

A: Because your TensorFlow container is 8GB with 73 layers and Clair downloads every single byte to analyze package contents. Network bandwidth, layer deduplication, and package analysis all impact performance.

Smaller base images scan faster. Standardized base images (same layers across containers) scan much faster due to layer caching. Your monolithic ML image with custom-compiled everything will always be slow.

Q: How do I fix Clair when it randomly stops working?

A: Check these in order (a quick sanity check for the first few is sketched below):

  1. PostgreSQL connection pool exhaustion (most common)
  2. Vulnerability database update failures
  3. Network connectivity to external CVE sources
  4. Memory exhaustion during large image scanning
  5. Webhook authentication failures with your registry

The logs will tell you what's broken, but decoding the error messages takes practice. When in doubt, restart everything and see what breaks first.
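
Here's the kind of five-minute sanity script I keep around for those first few items - the Clair URL, DSN, and updater source URLs are placeholders you'd pull from your own config:

```python
import psycopg2
import requests

CLAIR = "http://clair.example.internal:6060"     # placeholder
DSN = "host=clair-db.example.internal dbname=clair user=clair password=<pw>"
UPDATER_SOURCES = [                              # pull these from your config
    "https://security.ubuntu.com/",
    "https://services.nvd.nist.gov/",
]

# 1. Connection pool: how close is PostgreSQL to max_connections?
conn = psycopg2.connect(DSN, connect_timeout=5)
with conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM pg_stat_activity;")
    in_use = cur.fetchone()[0]
    cur.execute("SHOW max_connections;")
    print(f"postgres connections: {in_use} / {cur.fetchone()[0]}")
conn.close()

# 2. Is Clair itself answering HTTP at all?
try:
    r = requests.get(f"{CLAIR}/indexer/api/v1/index_report/sha256:deadbeef", timeout=10)
    print("clair indexer: up, HTTP", r.status_code)
except requests.RequestException as exc:
    print("clair indexer: unreachable:", exc)

# 3. Can this host reach the external vulnerability sources?
for url in UPDATER_SOURCES:
    try:
        print(url, requests.head(url, timeout=10).status_code)
    except requests.RequestException as exc:
        print(url, "unreachable:", exc)
```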

Resources That Actually Help