
FastMCP Production Deployment: Technical Reference Guide

Overview

FastMCP production deployment requires specific containerization, orchestration, and monitoring approaches due to unique failure modes including memory leaks, connection pool exhaustion, and transport protocol limitations.

Docker Containerization

Critical Requirements

Multi-stage builds are mandatory - single-stage builds ship the entire compiler toolchain into the runtime image, causing:

  • Image sizes that exceed storage budgets
  • CI timeouts while pushing and pulling oversized layers during deployment
  • Performance degradation from bloated images and slower cold starts

Transport limitations:

  • STDIO transport: Effectively broken in standalone containers - it assumes a parent client process attached to stdin/stdout, which container orchestrators don't provide
  • HTTP transport: Only reliable option for production
  • SSE transport: Causes timeout issues, inconsistent performance

Production Dockerfile Configuration

# Build stage - install dependencies
FROM python:3.11-slim as builder
WORKDIR /app

# Python 3.12 breaks asyncio with SSL context errors
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Build toolchain for FastMCP's native dependencies (build-essential already includes gcc)
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    curl \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade pip \
    && pip install --no-cache-dir -r requirements.txt

# Runtime stage - minimal image
FROM python:3.11-slim
WORKDIR /app

COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Security requirement - non-root user mandatory for audits
RUN groupadd -r mcpuser && useradd -r -g mcpuser mcpuser \
    && chown -R mcpuser:mcpuser /app
USER mcpuser

COPY --chown=mcpuser:mcpuser . .

# Health check using Python (curl not available in slim images)
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

EXPOSE 8000
# CRITICAL: Bind to 0.0.0.0 or container networking fails
CMD ["python", "server.py", "--transport", "http", "--port", "8000", "--host", "0.0.0.0"]
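The CMD above assumes a `server.py` entrypoint that accepts those flags. A minimal sketch of what that file might look like - the `fastmcp` import path and the `run(transport=..., host=..., port=...)` signature are assumptions about the installed FastMCP version, so check yours before copying:

```python
# server.py - hypothetical entrypoint matching the Dockerfile CMD above
import argparse


def parse_args(argv=None):
    """Parse the transport/host/port flags the container CMD passes in."""
    parser = argparse.ArgumentParser(description="FastMCP server entrypoint")
    parser.add_argument("--transport", default="http", choices=["http", "sse", "stdio"])
    parser.add_argument("--host", default="0.0.0.0")  # must not be 127.0.0.1 in containers
    parser.add_argument("--port", type=int, default=8000)
    return parser.parse_args(argv)


def main():
    args = parse_args()
    # Assumed API - recent FastMCP releases expose run() with these keyword arguments
    from fastmcp import FastMCP
    mcp = FastMCP("production-server")
    mcp.run(transport=args.transport, host=args.host, port=args.port)
```

Invoke `main()` under the usual `if __name__ == "__main__":` guard; keeping the FastMCP import inside `main()` lets argument parsing be tested without the package installed.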

Memory Management Issues

Known problem: FastMCP has memory leak issues in long-running containers

  • Containers slowly consume memory during heavy traffic
  • Eventually triggers OOMKill events
  • Garbage collection issues with long-running tool calls

Mitigation strategies:

# Dockerfile: allocator and crash-reporting tweaks
ENV PYTHONMALLOC=malloc
ENV MALLOC_TRIM_THRESHOLD_=100000
ENV PYTHONFAULTHANDLER=1

# Shell: enforce a hard memory ceiling at runtime (OOM kill enabled is the default, stated explicitly)
docker run -m 1g --oom-kill-disable=false your-fastmcp-server

Operational workaround: Restart containers every 24 hours via CronJob
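That workaround can be automated in-cluster. A hedged sketch of such a CronJob - the name, schedule, image, and the `deployment-restarter` ServiceAccount (which needs RBAC permission to patch Deployments) are illustrative assumptions, not FastMCP requirements:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: fastmcp-nightly-restart   # hypothetical name
  namespace: mcp-production
spec:
  schedule: "0 4 * * *"           # 04:00 daily, before traffic ramps up
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: deployment-restarter  # assumed SA with rollout-restart rights
          restartPolicy: Never
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command:
            - kubectl
            - rollout
            - restart
            - deployment/fastmcp-server
            - -n
            - mcp-production
```

`kubectl rollout restart` respects the Deployment's RollingUpdate strategy, so pods recycle one at a time instead of all at once.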

Resource Requirements by Use Case

| Use Case | Memory | CPU | Notes |
|---|---|---|---|
| Simple tools (file ops, basic APIs) | 256-512MB | 200m | Adjust CPU for load |
| Database operations | 512MB-1GB | Variable | CPU depends on query complexity |
| ML/AI tools | 1-2GB+ | High | Memory intensive, model-dependent CPU |
| High-traffic APIs | 2GB+ | Multiple cores | Scale resources generously |

Security Configuration

# Remove attack vectors - purge download tools; note apt cannot cleanly uninstall
# itself via apt-get, so strip it with dpkg if your audit requires that too
RUN apt-get purge -y --auto-remove curl wget && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /var/cache/apt/*

# Read-only root filesystem limits what a compromised process can write
VOLUME ["/tmp"]
# Run as nobody (inline comments after USER are invalid Dockerfile syntax)
USER 65534:65534

# Runtime security (shell command, not part of the Dockerfile)
docker run \
  --read-only \
  --tmpfs /tmp \
  --security-opt=no-new-privileges \
  --cap-drop=ALL \
  --user 65534:65534 \
  your-fastmcp-server

Kubernetes Orchestration

Production Deployment Manifest

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastmcp-server
  namespace: mcp-production
spec:
  replicas: 3  # Minimum for high availability
  selector:
    matchLabels:
      app: fastmcp-server  # Required field - must match the template labels below
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: fastmcp-server
      annotations:
        # Memory leak mitigation - Helm template forcing a rollout on every release
        # (Helm 3 syntax; .Release.Time was removed in Helm 3)
        rollme: "{{ now | date \"20060102-1504\" }}"
    spec:
      securityContext:
        fsGroup: 65534
        runAsUser: 65534
        runAsNonRoot: true  # Required for security audits
      containers:
      - name: fastmcp-server
        image: your-registry/fastmcp-server:v1.2.0
        imagePullPolicy: Always
        resources:
          requests:
            memory: "512Mi"
            cpu: "200m"
          limits:
            memory: "2Gi"    # Account for memory leaks
            cpu: "1000m"
        # Health checks are critical - K8s fails without them
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
          failureThreshold: 3
        startupProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 2
          failureThreshold: 30  # 60 seconds total startup time

Resource Limit Configuration

Quality of Service Classes:

# Guaranteed QoS - predictable but wasteful
resources:
  requests:
    memory: "1Gi"
    cpu: "500m"
  limits:
    memory: "1Gi"     # Hard limit - pod dies if exceeded
    cpu: "500m"       # CPU throttling at this level

# Burstable QoS - recommended for production
resources:
  requests:
    memory: "512Mi"   # Guaranteed minimum
    cpu: "200m"
  limits:
    memory: "2Gi"     # Can burst up to 2GB
    cpu: "1000m"      # Can burst up to 1 CPU core

Critical failure scenario: Memory limits set too low (512MB) resulted in OOMKill events during traffic spikes, causing 45-minute service outage.

Horizontal Pod Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fastmcp-hpa
spec:
  scaleTargetRef:          # Required - points the HPA at the Deployment
    apiVersion: apps/v1
    kind: Deployment
    name: fastmcp-server
  minReplicas: 3
  maxReplicas: 20  # Prevent cost overruns
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Sweet spot for CPU
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80  # Higher threshold due to leaks
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100        # Double pods when needed
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 minutes
      policies:
      - type: Percent
        value: 10         # Scale down 10% at a time
        periodSeconds: 60
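The HPA's scaling decision is a simple proportional rule: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A quick sketch of that math:

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization):
    """Proportional scaling formula used by the Kubernetes HPA controller."""
    return math.ceil(current_replicas * current_utilization / target_utilization)


# 3 pods averaging 140% CPU against a 70% target -> scale to 6 pods
print(desired_replicas(3, 140, 70))
```

The `scaleUp`/`scaleDown` behavior policies above then cap how quickly the controller is allowed to move toward that number.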

Network Security

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: fastmcp-network-policy
spec:
  podSelector:
    matchLabels:
      app: fastmcp-server
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: fastmcp-client
    ports:
    - protocol: TCP
      port: 8000
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: postgres
    ports:
    - protocol: TCP
      port: 5432

Monitoring and Observability

Critical Metrics

from prometheus_client import Counter, Histogram, Gauge

# Essential metrics for production
mcp_tool_calls_total = Counter('mcp_tool_calls_total', 'Total MCP tool calls', ['tool_name', 'status'])
mcp_tool_duration = Histogram('mcp_tool_duration_seconds', 'MCP tool execution time', ['tool_name'],
                            buckets=[0.1, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, float('inf')])
mcp_active_connections = Gauge('mcp_active_connections', 'Active MCP connections')
mcp_memory_usage = Gauge('mcp_memory_usage_bytes', 'Memory usage in bytes')
mcp_protocol_errors = Counter('mcp_protocol_errors_total', 'MCP protocol errors', ['error_type'])
mcp_connection_pool_size = Gauge('mcp_connection_pool_size', 'Database connection pool size')
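Defining metrics is only half the job - every tool call has to update them. A decorator sketch of that pattern; the Prometheus metric objects are swapped for plain callables here so the example stands alone, but in the real server `record_call` and `record_duration` would wrap `mcp_tool_calls_total.labels(...).inc()` and `mcp_tool_duration.labels(...).observe(...)`:

```python
import time
from functools import wraps


def track_tool(tool_name, record_call, record_duration):
    """Wrap a tool function so every call updates count and latency metrics.

    record_call(tool_name, status) and record_duration(tool_name, seconds)
    stand in for the Prometheus counter/histogram calls named above.
    """
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            status = "success"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "error"
                raise  # re-raise so the MCP error path still runs
            finally:
                # finally ensures both metrics update on success AND failure
                record_call(tool_name, status)
                record_duration(tool_name, time.monotonic() - start)
        return wrapper
    return decorator
```

Recording in `finally` is the important part: error paths are exactly the calls you most want latency and count data for.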

Production Alert Thresholds

Based on production experience:

# High error rate alert
- alert: FastMCPHighErrorRate
  expr: rate(mcp_tool_calls_total{status="error"}[5m]) / rate(mcp_tool_calls_total[5m]) > 0.1
  for: 5m
  labels:
    severity: warning

# High latency alert
- alert: FastMCPHighLatency
  expr: histogram_quantile(0.95, rate(mcp_tool_duration_seconds_bucket[5m])) > 10
  for: 5m
  labels:
    severity: warning

# Memory leak detection - critical alert
- alert: FastMCPMemoryLeak
  expr: increase(mcp_memory_usage_bytes[1h]) > 1073741824  # 1GB/hour
  for: 0m  # Fire immediately
  labels:
    severity: critical
    escalate: "true"

Health Check Implementation

from fastapi import FastAPI, HTTPException

app = FastAPI()

@app.get("/health/ready")
async def readiness_check():
    """Kubernetes readiness probe"""
    # check_database_connection() is an app-specific helper
    db_health = await check_database_connection()
    if db_health["status"] != "healthy":
        raise HTTPException(status_code=503, detail="Database not available")
    return {"status": "ready", "database": db_health}

@app.get("/health/live")
async def liveness_check():
    """Kubernetes liveness probe"""
    # get_system_metrics() is an app-specific helper returning e.g. memory_percent
    system_metrics = get_system_metrics()

    # Fail if memory usage too high (memory leak detection)
    if system_metrics["memory_percent"] > 90:
        raise HTTPException(status_code=503, detail="High memory usage detected")

    return {"status": "alive", "system": system_metrics}
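The memory_percent value should come from the container's own cgroup rather than host-wide stats, so the liveness probe measures against the same limit the OOM killer enforces. A sketch assuming cgroup v2 file paths (locations vary by runtime and cgroup version, hence the best-effort fallback):

```python
from pathlib import Path


def read_cgroup_memory(base="/sys/fs/cgroup"):
    """Best-effort read of (current_bytes, limit_bytes) from cgroup v2 files."""
    try:
        current = int(Path(base, "memory.current").read_text())
        raw_max = Path(base, "memory.max").read_text().strip()
        limit = None if raw_max == "max" else int(raw_max)  # "max" means unlimited
        return current, limit
    except (OSError, ValueError):
        return None, None  # not cgroup v2, or not in a container


def memory_percent(current, limit):
    """Usage as a percentage of the cgroup limit; 0.0 when either is unknown."""
    if not current or not limit:
        return 0.0
    return round(current / limit * 100, 1)
```

Returning 0.0 when the limit is unknown keeps the probe from flapping on hosts where the cgroup files aren't readable.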

Database Connection Management

import os

from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

# Production database configuration (DATABASE_URL typically injected via a Kubernetes secret)
DATABASE_URL = os.environ["DATABASE_URL"]

engine = create_engine(
    DATABASE_URL,
    poolclass=QueuePool,
    pool_size=20,        # Connections to keep open
    max_overflow=10,     # Additional connections when needed
    pool_timeout=30,     # Seconds to wait for connection
    pool_recycle=3600,   # Recycle connections every hour
    pool_pre_ping=True   # Validate connections before use
)

Production Deployment Options Comparison

| Method | Monthly Cost | Complexity | Scalability | Reliability | Best For |
|---|---|---|---|---|---|
| Single Docker | $20-50 | Low | Manual only | Single failure point | Proof of concept |
| Docker Compose | $50-100 | Low-Medium | Limited horizontal | Host-dependent | Small teams |
| Kubernetes (Managed) | $200-1000 | High | Excellent auto-scaling | High availability | Enterprise |
| Kubernetes (Self-Managed) | $100-500 | Very High | Excellent auto-scaling | Setup-dependent | Advanced teams |
| Serverless | $10-500 | Medium | Automatic | Provider-managed | Cold-start-tolerant workloads |
| Cloud Run | $30-200 | Low-Medium | Automatic | Provider-managed | Simple services |

Common Production Issues and Solutions

Memory-Related Failures

Issue: Containers get OOMKilled randomly

  • Cause: Memory leaks in long-running processes
  • Solution: Set appropriate memory limits, restart containers every 24 hours
  • Prevention: Monitor memory growth > 100MB/hour

Connection Pool Exhaustion

Issue: Database connections disappear, new requests fail

  • Cause: Connection pool misconfiguration, timeout mismatches
  • Solution: Monitor connection pool metrics, implement aggressive timeouts
  • Detection: Alert when the pool has zero available connections (every connection checked out)

Transport Protocol Issues

Issue: Server works locally but fails in Kubernetes

  • Common causes:
    • Using STDIO transport (broken in containers)
    • Binding to 127.0.0.1 instead of 0.0.0.0
    • Missing health check endpoints
  • Solution: Use HTTP transport with proper host binding

Performance Degradation

Issue: Response times increase from 50ms to 2+ seconds

  • Cause: CPU throttling when hitting resource limits
  • Solution: Monitor CPU utilization, adjust limits before hitting 70%
  • Prevention: Set CPU limits at 1000m+ for production workloads
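CPU throttling is directly measurable, not just inferable from latency: cgroup v2 exposes throttling counters in cpu.stat (read from /sys/fs/cgroup/cpu.stat inside the container). A parser sketch using the field names from the cgroup v2 documentation:

```python
def parse_cpu_stat(text):
    """Parse 'key value' lines from a cgroup v2 cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[key] = int(value)
    return stats


def throttled_ratio(stats):
    """Fraction of scheduler periods in which the container was throttled."""
    periods = stats.get("nr_periods", 0)
    if not periods:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


sample = """usage_usec 1000000
nr_periods 200
nr_throttled 30
throttled_usec 450000"""
# 30 of 200 periods throttled -> 0.15
print(throttled_ratio(parse_cpu_stat(sample)))
```

A sustained throttled ratio above a few percent is the signal to raise CPU limits - well before p95 latency makes the problem obvious.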

Security Considerations

Container Security Requirements

  • Run as non-root user (required for security audits)
  • Read-only filesystem with tmpfs for /tmp
  • Drop all capabilities
  • Use distroless or minimal base images
  • Regular security scanning of images

Network Security

  • Implement NetworkPolicies for pod-to-pod communication
  • Use service mesh (Istio) for advanced traffic management
  • Enable mutual TLS between services
  • Restrict egress to necessary endpoints only

Secret Management

# Use Kubernetes secrets, not environment variables
apiVersion: v1
kind: Secret
metadata:
  name: fastmcp-secrets
type: Opaque
stringData:
  database-url: "postgresql://user:password@host/db"
  api-key: "your-secret-key"

Resource Requirements and Constraints

Minimum Production Requirements

  • Memory: 512MB minimum, 2GB+ recommended for stability
  • CPU: 200m minimum, 1000m+ for production workloads
  • Storage: Ephemeral storage sufficient for stateless deployments
  • Network: Ingress controller and load balancer required

Scaling Thresholds

  • Scale up: CPU > 70%, Memory > 80%, Request rate > 10 RPS per pod
  • Scale down: Stabilization window of 5 minutes to prevent flapping
  • Maximum replicas: Set based on cost constraints and traffic patterns

This technical reference provides the operational intelligence needed for successful FastMCP production deployments, including specific failure scenarios, resource requirements, and proven configurations that prevent common production issues.

Useful Links for Further Investigation

Production Resources (The Stuff That Actually Helps)

| Link | Description |
|---|---|
| FastMCP Production Deployment Guide | The official docs are solid. Skip the basics and go to the production section. Good coverage of HTTP transport and working health check examples. |
| FastMCP Server Logging | Useful logging setup with good structured logging examples that work in production environments. |
| MCP Inspector | Essential tool for testing MCP servers locally before production deployment. Catches configuration issues and protocol problems early. |
| FastMCP GitHub Repository | The source of truth. The issues section is pure gold for production gotchas - other people have already fucked up so you don't have to. Read the closed issues before you deploy anything. |
| Building Production-Ready MCP Servers | Practical tutorial from people who've actually deployed this in production. Good security hardening section with real-world considerations. |
| Docker MCP Server in Python | Decent practical tutorial. The multi-stage build examples are pretty good and the production optimizations seem legit. Worth checking out the Dockerfile approach. |
| FastMCP Docker Deployment Tutorial | Complete end-to-end walkthrough that goes from "I have code" to "it's running in the cloud." The cloud platform sections are actually useful instead of just saying "deploy to AWS." |
| Dockerized SSE MCP Servers | Guide for deploying FastMCP servers with SSE transport in Docker containers, including networking and configuration considerations. |
| KMCP - Enterprise MCP Development | Kubernetes-native toolkit that's actually built for enterprise instead of just claiming it. If you're stuck with K8s and need proper CRDs and operators, this is your lifeline. Skip if you're not already balls-deep in Kubernetes hell. |
| KMCP Documentation | Comprehensive documentation for deploying MCP servers to Kubernetes using the KMCP CLI and operators. |
| Microsoft MCP Gateway | Kubernetes-native reverse proxy and management layer for MCP servers. Enables scalable, session-aware routing and lifecycle management. |
| Kubernetes MCP Server Implementation | Detailed guide for building MCP servers that interact with Kubernetes clusters, including RBAC and security considerations. |
| FastMCP SRE Agent Implementation | Real-world case study of building production monitoring and alerting systems using FastMCP with Kubernetes integration. |
| MCP Production Monitoring Guide | Actually practical monitoring advice without the usual "just use Prometheus" handwaving. Their alerting thresholds are realistic - I've used their error rate alerts for months without getting paged for bullshit. |
| Building AI-Powered Applications with MCP and Docker | Comprehensive metrics collection and monitoring setup for production MCP deployments with Docker and Kubernetes. |
| Securing MCP: From Vulnerable to Fortified | Security best practices for production MCP deployments, including HTTPS, OAuth, authentication, and vulnerability mitigation. |
| MCP Security Best Practices | Enterprise security considerations for MCP deployments, covering attack vectors, threat modeling, and security controls. |
| Enterprise MCP Tools Collection | Curated collection of enterprise-focused MCP tools, platforms, and deployment resources for production environments. |
| Deploy Production MCP Server with Docker | Step-by-step guide for deploying MCP servers to production using Docker and cloud platforms with proper CI/CD pipelines. |
| FastMCP Remote SSE Deployment | Real-world deployment of remote SSE MCP servers to cloud infrastructure with production considerations and lessons learned. |
| MCP Production Deployment with Ray Serve | Advanced deployment patterns using Ray Serve for scalable MCP server deployments with custom monitoring and logging. |
| FastMCP Performance Optimization | Performance tuning and middleware implementation for FastMCP servers, including authentication, logging, and monitoring optimizations. |
| MCP Multi-Agent Architecture | Scalable architecture patterns for multi-agent MCP deployments with service provisioning and real-time monitoring. |
| MCP Automation Platforms for Enterprise | Enterprise automation platforms and deployment strategies for large-scale MCP server implementations. |
| FastMCP Documentation | Complete documentation covering production deployment, authentication, and advanced patterns for FastMCP servers. |
| MCP Development and Production Workflow | Complete development to production workflow covering testing, deployment strategies, and operational considerations. |
| FastMCP Proxy Server | Production-ready proxy server implementation for FastMCP with load balancing, health checks, and monitoring capabilities. |
| MCP Servers Directory | Community-maintained directory of production MCP server implementations with architecture diagrams and deployment guides. |
| 30+ MCP Production Examples | Comprehensive collection of production-ready MCP server examples with complete source code and deployment instructions. |
| Docker MCP Developer Guide | Developer-focused guide covering Docker containerization, deployment patterns, and production best practices for MCP servers. |
