Why does Trivy exit with code 137 when scanning large containers?

Exit code 137 means your container got OOMKilled. The kernel's OOM killer terminated Trivy because it exceeded the container's memory limit. Java apps are the worst offenders: they take forever because they have hundreds of dependency files scattered everywhere.

**Quick fix:** `docker run --memory=8g aquasec/trivy:latest image your-image:latest`

**Better fix:** Use server mode so you're not fighting Docker's memory limits every scan.
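Exit statuses above 128 conventionally mean "killed by signal (status minus 128)", so 137 is signal 9 (SIGKILL), which is what the OOM killer sends. A minimal sketch of a helper that turns a scan's exit status into a readable hint; `explain_exit` is a hypothetical name, not part of Trivy or Docker.

```bash
# Hypothetical helper: translate a process exit status into a short hint.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
explain_exit() {
  local status="$1"
  if [ "$status" -eq 0 ]; then
    echo "ok"
  elif [ "$status" -eq 137 ]; then
    # 137 = 128 + 9 (SIGKILL), the signal the kernel OOM killer sends.
    echo "killed by SIGKILL (likely OOM): raise the memory limit"
  elif [ "$status" -gt 128 ]; then
    echo "killed by signal $((status - 128))"
  else
    echo "exited with status $status"
  fi
}
```

Call it right after the scan, for example `docker run --memory=8g aquasec/trivy:latest image your-image:latest; explain_exit $?`.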

Trivy hangs on "Downloading vulnerability DB" for 20+ minutes. What's wrong?

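GitHub's API rate limiting kicked in. Without a token, you get 60 requests per hour. With proper auth, you get 5,000. Recent Trivy releases pull the databases as OCI artifacts from a container registry (ghcr.io by default), so on those versions the throttling may come from the registry rather than the REST API. Either way, corporate networks with shitty proxies will still time out the downloads.

**Copy this:** `export GITHUB_TOKEN="ghp_your_token_here"`

**Nuclear option:** Pre-download the databases: `trivy image --download-db-only && trivy image --skip-db-update your-image:latest`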
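Because flaky proxies tend to fail intermittently, wrapping the pre-download step in a retry loop often gets further than a single long attempt. The sketch below is one way to do that; `retry` is a hypothetical helper, and the attempt count and delays are assumptions to tune for your network.

```bash
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts and doubling the delay each time.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed, retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    n=$((n + 1))
  done
}
```

Usage: `retry 3 10 trivy image --download-db-only && trivy image --skip-db-update your-image:latest`.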

Why does scanning work locally but fail in CI/CD?

Resource constraints. Your laptop has 32GB RAM and fast internet. Your CI runner has 2GB RAM and routes through a proxy that times out after 5 minutes. Different environments, different problems.

**Check first:** Available memory in your CI environment

**Fix:** Use dedicated scanning instances or server mode
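To check what a CI job can actually use, read the cgroup limit rather than trusting the host's total RAM. This is a rough sketch for Linux runners; the function names are hypothetical, and the paths assume the standard cgroup v2 and v1 mount points.

```bash
# Print the memory limit that applies to this process, in bytes.
# Checks cgroup v2, then cgroup v1, then falls back to total system RAM.
mem_limit_bytes() {
  local v2=/sys/fs/cgroup/memory.max
  local v1=/sys/fs/cgroup/memory/memory.limit_in_bytes
  local limit=""
  if [ -r "$v2" ]; then
    limit="$(cat "$v2")"
  elif [ -r "$v1" ]; then
    limit="$(cat "$v1")"
  fi
  # "max" (v2) or a 19-digit value (v1) means no cgroup limit is set.
  if [ -n "$limit" ] && [ "$limit" != "max" ] && [ "${#limit}" -lt 16 ]; then
    echo "$limit"
  else
    local kb
    kb="$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"  # reported in kB
    echo $((kb * 1024))
  fi
}

# Return 0 if <limit_bytes> covers at least <required_mib> MiB.
has_enough_memory() {
  local limit_bytes="$1" required_mib="$2"
  [ $((limit_bytes / 1024 / 1024)) -ge "$required_mib" ]
}
```

A pipeline step could then fail fast with `has_enough_memory "$(mem_limit_bytes)" 4096 || echo "runner too small for this scan" >&2`.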

Can I scan without internet access?

Yes, but it's a pain in the ass. Download databases on a connected machine, copy them to your air-gapped environment. Our security team requires this but won't approve the process to update DBs; it took 3 months to fix that bureaucratic clusterfuck.

**Process:** Download with `--download-db-only`, copy cache directory, scan with `--skip-db-update`
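The copy step is the part that usually goes wrong, so it helps to script it. Below is a sketch of the transfer using plain `tar`; `pack_cache` and `unpack_cache` are hypothetical helpers, and the `./trivy-cache` path in the usage note is an assumption.

```bash
# Hypothetical helpers for moving a Trivy cache across an air gap.
# pack_cache   <cache_dir> <archive.tar.gz>   run on the connected machine
# unpack_cache <archive.tar.gz> <cache_dir>   run on the air-gapped machine
pack_cache() {
  local cache_dir="$1" archive="$2"
  # -C keeps paths inside the archive relative to the cache directory.
  tar -czf "$archive" -C "$cache_dir" .
}

unpack_cache() {
  local archive="$1" cache_dir="$2"
  mkdir -p "$cache_dir"
  tar -xzf "$archive" -C "$cache_dir"
}
```

On the connected machine: `trivy image --download-db-only --cache-dir ./trivy-cache && pack_cache ./trivy-cache trivy-cache.tar.gz`. On the air-gapped side: `unpack_cache trivy-cache.tar.gz ./trivy-cache && trivy image --skip-db-update --cache-dir ./trivy-cache your-image:latest`. Java images likely also need the Java DB downloaded and `--skip-java-db-update` on the scan.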

Trivy says "database not found" after downloading. What gives?

Database download was corrupted or interrupted. This happens with unreliable networks or aggressive proxy timeouts. Delete the cache directory and try again.

**Location:** `~/.cache/trivy/` on Linux, `~/Library/Caches/trivy/` on macOS

**Fix:** `rm -rf ~/.cache/trivy && trivy image --download-db-only`
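Before wiping the cache, a quick check can confirm whether the database files are actually missing or empty. This sketch assumes the usual cache layout (`db/trivy.db` and `db/metadata.json` under the cache directory); `db_looks_complete` is a hypothetical name, and it only checks presence, not integrity.

```bash
# Return 0 if the vulnerability DB files exist and are non-empty.
# Assumes the usual layout: <cache_dir>/db/trivy.db and db/metadata.json.
db_looks_complete() {
  local cache_dir="${1:-$HOME/.cache/trivy}"
  [ -s "$cache_dir/db/trivy.db" ] && [ -s "$cache_dir/db/metadata.json" ]
}
```

Usage: `db_looks_complete || { rm -rf ~/.cache/trivy && trivy image --download-db-only; }`.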

How long should a Trivy scan actually take?

Depends entirely on image complexity:

- Alpine Linux: 30 seconds
- Node.js app: 2-5 minutes
- Spring Boot app: 10-15 minutes
- TensorFlow/ML image: 30+ minutes (or failure)

If you're hitting timeout errors before these timeframes, you have infrastructure problems, not Trivy problems.

Trivy works fine for small images but crashes on anything over 2GB. Why?

Memory usage isn't proportional to image size; it's proportional to dependency complexity. A 500MB Spring Boot app will consume more memory than a 2GB Ubuntu base image because of JAR file analysis.

**Rule of thumb:**

- 2GB minimum for any real application
- 4GB for Java hell
- 8GB for ML clusterfucks
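If several pipelines share one scan script, the rule of thumb above can be encoded once instead of hard-coding `--memory` everywhere. A small sketch; `suggest_memory` and its `java`/`ml` labels are hypothetical, and the values are just the figures from the list.

```bash
# Map a coarse workload label to a Docker --memory value,
# using the rule-of-thumb figures (2GB / 4GB / 8GB).
suggest_memory() {
  case "$1" in
    java) echo "4g" ;;
    ml)   echo "8g" ;;
    *)    echo "2g" ;;
  esac
}
```

Usage: `docker run --memory="$(suggest_memory java)" aquasec/trivy:latest image your-image:latest`.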

Why does `--timeout` not fix my timeout issues?

Because timeout isn't your problem; resource exhaustion is. `--timeout 30m` doesn't magically give you more memory or CPU. It just tells Trivy to wait longer before giving up.

**Real fix:** Address the underlying resource constraint, not the timeout setting.

Can I exclude certain files to make scanning faster?

Yes, but be careful what you skip. Globs are matched against the full path, so use `**/` to match files at any depth:

```bash
trivy image --skip-files "**/*.log,**/*.tmp" --skip-dirs "/tmp,/var/cache" your-image:latest
```

Don't skip JAR files or package managers unless you want incomplete vulnerability data.

Trivy scanning breaks my Docker socket permissions. How do I fix this?

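Known issue in Docker 20.10.18. Either downgrade to 20.10.17 or fix permissions after scanning:

```bash
sudo chmod 666 /var/run/docker.sock
```

Mode 666 lets every local user control the Docker daemon, which is effectively root access, so treat it as a stopgap on single-user machines only.

**Better solution:** Run Trivy in a dedicated container with proper socket mounting.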

Trivy Container Scanning: Production Troubleshooting Guide

Critical Configuration Requirements

Memory Resource Allocation

Production Minimums:

Alpine Linux containers: 2GB RAM
Node.js/Python applications: 4GB RAM
Java/Spring Boot applications: 8GB RAM
ML frameworks (TensorFlow/PyTorch): 16GB+ RAM

Failure Pattern: Memory consumption spikes abruptly during the JAR analysis phase rather than climbing gradually. The container can sit at 512MB for 10 minutes, then jump to 6GB during dependency analysis.

Exit Code 137 Solution:

```bash
docker run --memory=8g --memory-swap=16g aquasec/trivy:latest image your-spring-boot-app:latest
```

Database Download Configuration

GitHub API Rate Limits:

Unauthenticated: 60 requests/hour
Authenticated: 5,000 requests/hour

Required Authentication:

```bash
export GITHUB_TOKEN="your-personal-access-token"
trivy image --timeout 20m your-image:latest
```

Pre-download Strategy (85% success rate):

```bash
# Separate database download from scanning
trivy image --download-db-only
trivy image --download-java-db-only
trivy image --skip-db-update your-image:latest
```

Failure Modes and Solutions

Memory Exhaustion (OOMKilled/Exit Code 137)

Success Rate: 90% with memory increase

Symptom: Container killed during JAR analysis phase
Root Cause: Resource limits exceeded during dependency tree analysis
Solution: Increase container memory limits or use server mode

Database Download Timeouts

Success Rate: 85% with pre-download strategy

Symptom: Hangs on "Downloading vulnerability DB" for 20+ minutes
Root Cause: Network timeouts, proxy interference, or API rate limiting
Solution: GitHub token authentication + pre-download databases

Network/Proxy Issues

Corporate Environment Failures:

SSL inspection breaks signature verification
VPN packet loss corrupts database cache
Proxy timeouts during database downloads

Configuration Fix:

```bash
export HTTP_PROXY="http://proxy.company.com:8080"
export HTTPS_PROXY="http://proxy.company.com:8080"
export NO_PROXY="localhost,127.0.0.1,.company.com"
```

Performance Characteristics

Scan Duration by Image Type

Alpine Linux: 30 seconds
Node.js applications: 2-5 minutes
Spring Boot applications: 10-15 minutes
TensorFlow/ML images: 30+ minutes (high failure risk)

Resource Usage Patterns

Memory consumption follows a predictable pattern:

Low usage during setup (512MB)
Massive spikes during JAR analysis (6-8GB)
Gradual decline during vulnerability matching

Production Architecture Solutions

Server Mode (95% Success Rate)

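Moves the vulnerability databases and vulnerability matching to a dedicated host. The client still pulls and analyzes the image, so it needs some memory headroom, but it no longer downloads or holds the databases: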

```bash
# Run dedicated scanning server
trivy server --listen 0.0.0.0:4954 --cache-dir /tmp/trivy-cache

# Client scanning against the server (--server expects a URL, including the scheme)
trivy image --server http://YOUR_TRIVY_SERVER:4954 your-image:latest
```

CI/CD Integration Requirements

Dedicated Scanning Infrastructure:

Separate scanning nodes from build infrastructure
Minimum t3.medium instances (4GB RAM)
Shared cache volumes for database reuse
Proper timeout configuration (20m+ for complex images)

Resource Planning Multipliers:

Base workload + 3-5x multiplier for complex images
Peak memory during dependency analysis phases
Account for scanning spikes during CI/CD peak hours
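As a worked example of the multiplier, the sketch below reads "3-5x" as base footprint times 3 up to base footprint times 5; that interpretation, and the `peak_range_mib` name, are assumptions.

```bash
# Print the planned peak memory range in MiB for a given base footprint,
# interpreting "3-5x multiplier" as base * 3 up to base * 5.
peak_range_mib() {
  local base_mib="$1"
  echo "$((base_mib * 3))-$((base_mib * 5))"
}
```

For a 2048 MiB baseline, `peak_range_mib 2048` prints `6144-10240`, so plan for roughly 6 to 10 GiB at peak.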

Critical Warnings

What Will Break in Production

t2.micro instances - Insufficient memory for real applications
Shared CI/CD resources - Resource contention causes failures
Default timeouts - Inadequate for complex dependency analysis
Missing GitHub tokens - Rate limiting will eventually fail database downloads
Corporate proxies without SSL bypass - Database corruption

Hidden Costs

Time Investment: 3+ months to resolve enterprise network issues
Expertise Requirements: DevOps + security team coordination
Infrastructure Costs: Dedicated scanning instances required
Maintenance Overhead: Daily database refresh automation needed

Alternative Solutions

When Trivy Fails

Fallback Scanning Tools:

Grype: Lower memory usage, faster scanning
Snyk: Better Docker Desktop integration
Anchore: Enterprise policy management

Decision Criteria:

Memory constraints → Grype
Enterprise policies → Anchore
Developer workflow → Snyk
Air-gapped environments → Custom database management

Monitoring and Prevention

Success Metrics

Scan completion rate: >95%
Zero OOMKilled failures per week
Database download success: >99%
Average scan time within 2x baseline

Infrastructure Monitoring

```bash
# Resource tracking during scans
docker stats --no-stream trivy_container_id

# Failure pattern logging
trivy image --timeout 30m your-image:latest 2>&1 | tee scan-$(date +%Y%m%d).log
```

Alert Thresholds

Memory usage exceeding 80% during scans
Scan duration exceeding 2x baseline
Persistent database download failures
Network timeout patterns
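The two numeric thresholds are easy to encode as checks a monitoring job can call. A minimal sketch, assuming the 80% memory figure and the 2x-baseline duration figure from the success metrics; both function names are hypothetical, and how you collect the inputs is up to your monitoring stack.

```bash
# Return 0 (alert) when memory use is above 80% of the limit.
mem_alert() {
  local used_mib="$1" limit_mib="$2"
  [ $((used_mib * 100)) -gt $((limit_mib * 80)) ]
}

# Return 0 (alert) when a scan took more than twice its baseline duration.
duration_alert() {
  local seconds="$1" baseline_seconds="$2"
  [ "$seconds" -gt $((baseline_seconds * 2)) ]
}
```

Usage: `mem_alert 7000 8192 && echo "memory above 80%"`. The memory figure can come from `docker stats --no-stream --format '{{.MemUsage}}'`, after parsing out the units.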

Enterprise Implementation Strategy

Phase 1: Infrastructure Preparation

Provision dedicated scanning instances (t3.large minimum)
Configure GitHub tokens and proxy settings
Implement shared cache volumes
Set up database pre-download automation

Phase 2: CI/CD Integration

Separate scanning from build pipelines
Implement proper error handling for exit codes
Configure resource monitoring and alerting
Establish fallback scanning options
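Exit-code handling and fallback can live in one small wrapper. The sketch below tries scanner functions in order until one succeeds; `scan_with_fallback`, `run_trivy`, and `run_grype` are hypothetical names, and it assumes a non-zero status means the scanner itself failed (if you pass `--exit-code` to Trivy, findings will also produce a non-zero status).

```bash
# Try each scanner function in order until one exits 0.
# Usage: scan_with_fallback <image> <scanner_fn> [<scanner_fn> ...]
scan_with_fallback() {
  local image="$1"
  shift
  local scanner status
  for scanner in "$@"; do
    if "$scanner" "$image"; then
      echo "scanned with $scanner" >&2
      return 0
    else
      status=$?
    fi
    echo "$scanner failed with status $status, trying next" >&2
  done
  return 1
}

# Thin wrappers around the real CLIs (not executed until called).
run_trivy() { trivy image --timeout 20m "$1"; }
run_grype() { grype "$1"; }
```

Usage: `scan_with_fallback your-image:latest run_trivy run_grype`.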

Phase 3: Optimization

Image layer optimization for faster scanning
Dependency caching strategies
Performance baseline establishment
Automated scaling based on scan volume

Resource Requirements Summary

Minimum Viable Production Setup

Compute: t3.medium (4GB RAM) minimum
Storage: 50GB for database cache
Network: Direct internet or properly configured proxy
Authentication: GitHub personal access token
Monitoring: Resource usage tracking and alerting

Cost-Performance Trade-offs

Bigger instances: Higher cost, higher reliability
Server mode: Infrastructure complexity, better resource isolation
Pre-download databases: Network overhead, scanning reliability
Alternative scanners: Tool learning curve, different vulnerability coverage

Useful Links for Further Investigation

Essential Resources and Documentation

| Link | Description |
| --- | --- |
| Trivy Official Documentation | The source of truth for configuration options and supported features. Actually useful, unlike most security tool docs. |
| Trivy GitHub Repository | Issues section is gold for troubleshooting. Search for your specific error message - someone's already filed a bug report. |
| Trivy Troubleshooting Guide | Official troubleshooting documentation. Covers the basics but lacks real-world production scenarios. |
| Trivy GitHub Discussions | Active community discussing configurations, performance issues, and integration challenges. |
| Stack Overflow - Trivy Tag | Real solutions from engineers who've debugged this stuff in production. Sort by votes, ignore the theoretical answers. |
| DevOps StackExchange | Search for "Trivy" to find unfiltered war stories and solutions from DevOps engineers dealing with Trivy in enterprise environments. |
| Docker Memory and Resource Constraints | Essential reading for understanding why your scans get OOMKilled. |
| Docker Daemon Configuration | Daemon configuration options that affect scanning performance and resource allocation. |
| GitHub Personal Access Tokens | Create tokens with appropriate scopes for Trivy database downloads. Use classic tokens, not fine-grained ones. |
| GitHub API Rate Limiting | Understanding rate limits prevents database download failures. |
| Grype by Anchore | Faster scanning with lower memory usage. Good fallback when Trivy can't handle your images. |
| Snyk Container CLI | Enterprise-grade scanning with better support for complex dependency trees. |
| Docker Scout | Docker's built-in scanning. Less comprehensive but integrates well with Docker workflows. |
| Prometheus Docker Metrics | Monitor container resource usage during scanning to identify bottlenecks. |
| cAdvisor | Container resource monitoring. Essential for understanding scan resource consumption patterns. |
| Trivy in CI/CD Pipelines | Integration guides for major CI/CD platforms including Jenkins, GitLab, Azure DevOps, and GitHub Actions. |
| Harbor Integration with Trivy | Harbor's built-in vulnerability scanning uses Trivy. Configuration affects scanning performance. |
| Trivy GitHub Issues | Search existing issues for specific error patterns and community solutions. |
| Docker System Information | `docker system info` reveals resource constraints and configuration issues affecting Trivy. |
| NIST Container Security Guidelines | Best practices for container security scanning in regulated environments (NIST SP 800-190). |
| CIS Docker Benchmark | Security hardening guidelines that affect how Trivy integrates with Docker. |
