
Containerizing FastMCP: What Actually Works (And What Will Ruin Your Weekend)

The Docker Problems Nobody Talks About

Been running FastMCP servers for over a year, and container deployment has some nasty surprises. Here's what actually works and what'll break in ways you didn't expect.

Multi-stage builds are mandatory - learned this when Docker images got huge and deployments crawled to a halt. FastMCP pulls in tons of dependencies - ML libraries, database drivers, auth modules. Your CI will timeout, storage costs explode, and everyone gets mad.

Single-stage builds are career suicide. The Docker multi-stage build docs are fine for theory, but here's what actually keeps production running.

Dockerfile That Won't Make You Cry

This Dockerfile survived three different companies and their production nightmares. It follows Docker's official best practices, but more importantly, it won't break at 2am:

## Build stage - install dependencies
FROM python:3.11-slim as builder
WORKDIR /app

## Still on Python 3.11 because 3.12 broke our asyncio shit with some bullshit SSL context error
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

## Install build dependencies (gcc is required for some FastMCP deps)
RUN apt-get update && apt-get install -y \
    build-essential \
    curl \
    gcc \
    && rm -rf /var/lib/apt/lists/* \
    # Clean up immediately or your image will be huge
    && apt-get clean

## Install Python dependencies first (better caching)
COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade pip \
    && pip install --no-cache-dir -r requirements.txt

## Runtime stage - minimal image
FROM python:3.11-slim
WORKDIR /app

## Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

## Create non-root user (security audits will fail without this)
RUN groupadd -r mcpuser && useradd -r -g mcpuser mcpuser \
    && chown -R mcpuser:mcpuser /app
USER mcpuser

## Copy application code
COPY --chown=mcpuser:mcpuser . .

## Health check that actually works
## Note: curl might not be installed in slim image - use python instead
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

## CRITICAL: Bind to 0.0.0.0 or container networking won't work
EXPOSE 8000
CMD ["python", "server.py", "--transport", "http", "--port", "8000", "--host", "0.0.0.0"]

Why Each Line Matters (Learned the Hard Way):

  • Multi-stage build: Images used to be massive, now they're much smaller. Build tools in production containers are security holes.
  • Non-root user: Security scans will fail without this. DevSecOps will reject your deployment and you'll look like an amateur.
  • Health checks: Without proper health checks, Kubernetes thinks dying containers are healthy. Use Python instead of curl - slim images don't include curl by default.
  • HTTP transport: STDIO is useless in containers. HTTP is the only transport that works reliably.
  • Bind to 0.0.0.0: If you bind to 127.0.0.1, external traffic can't reach your container. This will break staging in frustrating ways.

Environment Configuration

FastMCP servers need different configurations between development and production. Here's how I actually configure this shit in production (learned this after fucking around with config files for weeks):

import os
from fastmcp import FastMCP

## Configuration that won't break at 3am
mcp = FastMCP(
    name=os.getenv("SERVER_NAME", "production-server"),
    version=os.getenv("SERVER_VERSION", "1.0.0")
)

## Configure based on environment
if os.getenv("ENVIRONMENT") == "production":
    # Production logging
    import structlog
    structlog.configure(
        processors=[
            structlog.stdlib.filter_by_level,
            structlog.stdlib.add_logger_name,
            structlog.stdlib.add_log_level,
            structlog.stdlib.PositionalArgumentsFormatter(),
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.StackInfoRenderer(),
            structlog.processors.format_exc_info,
            structlog.processors.JSONRenderer()
        ],
        logger_factory=structlog.stdlib.LoggerFactory(),
        wrapper_class=structlog.stdlib.BoundLogger,
        cache_logger_on_first_use=True,
    )
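
The same image can then be flipped between modes purely through environment variables at run time. A minimal example (the image name is the placeholder used throughout this guide, the variables mirror the snippet above):

docker run -p 8000:8000 \
  -e ENVIRONMENT=production \
  -e SERVER_NAME=fastmcp-prod \
  -e SERVER_VERSION=1.2.0 \
  your-fastmcp-server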

Memory Management: The Silent Killer

FastMCP has memory leak issues. Containers slowly consume more memory and eventually get OOMKilled during heavy traffic. Long-running tool calls seem to hold onto memory - it's a garbage collection issue but the exact cause isn't clear.

Memory configuration that actually works:

## These settings help with memory management
ENV PYTHONMALLOC=malloc
ENV MALLOC_TRIM_THRESHOLD_=100000
ENV PYTHONFAULTHANDLER=1
## Enable this if you want to debug memory leaks (adds overhead)
## ENV PYTHONMALLOC=debug

## Never set memory limits too low - containers will die randomly
## 512MB is barely enough for anything useful
docker run -m 1g --oom-kill-disable=false your-fastmcp-server

What These Actually Do:

  • PYTHONMALLOC=malloc: Python's default allocator is garbage for long-running processes
  • MALLOC_TRIM_THRESHOLD_: Forces the OS to reclaim memory instead of hoarding it
  • Container memory limits: Without this, one container can kill your entire host (happened to us twice)

Pro tip: Restart your containers every 24 hours with a CronJob. It's hacky but it works. Memory leaks are a fact of life with current FastMCP versions.
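
If you go the scheduled-restart route, the CronJob is small. A rough sketch - the ServiceAccount name and the kubectl image are placeholders, and the account needs RBAC permission to patch the deployment:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: fastmcp-nightly-restart
  namespace: mcp-production
spec:
  schedule: "0 4 * * *"          # 04:00 daily - pick your lowest-traffic window
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: deployment-restarter   # needs rollout-restart (patch) rights
          restartPolicy: Never
          containers:
          - name: restart
            image: bitnami/kubectl:latest            # any image that ships kubectl works
            command:
            - kubectl
            - rollout
            - restart
            - deployment/fastmcp-server
            - -n
            - mcp-production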

Transport Selection: What Actually Works

| Transport | Container Viability | Performance | Debugging | Reality Check |
|-----------|---------------------|-------------|-----------|---------------|
| STDIO | ❌ Completely broken | N/A | N/A | Don't waste your time |
| HTTP | ✅ Only sane choice | Good enough | Easy to debug | Use this or suffer |
| SSE | ⚠️ Timeout hell | Inconsistent | Pain in the ass | Avoid unless forced |

HTTP transport that won't break:

if __name__ == "__main__":
    import argparse
    import sys

    parser = argparse.ArgumentParser()
    parser.add_argument("--transport", default="http")
    parser.add_argument("--port", type=int, default=8000)
    # NEVER bind to 127.0.0.1 in containers - learned this the hard way
    parser.add_argument("--host", default="0.0.0.0")
    args = parser.parse_args()

    if args.transport == "http":
        # Add some basic error handling because this WILL fail sometimes
        try:
            mcp.run_http(host=args.host, port=args.port)
        except Exception as e:
            print(f"Failed to start HTTP server: {e}")
            sys.exit(1)
    else:
        # Don't even bother with other transports in containers
        print("Use HTTP transport in containers or you'll have a bad time")
        sys.exit(1)

Listen carefully: If you bind to 127.0.0.1 in a container, external traffic can't reach it. This seems obvious but I've seen senior engineers make this mistake.

Security Hardening (Or: How Not to Get Pwned)

Production FastMCP containers will get attacked. Google's distroless images reduce attack surface, but Container Registry shuts down March 18, 2025 so plan accordingly:

## Distroless is great until you need to debug something at 3am
FROM gcr.io/distroless/python3-debian11

## Or keep debugging tools (recommended for sanity)
FROM python:3.11-slim

## Remove anything attackers can use
RUN apt-get remove -y apt curl wget && \
    apt-get autoremove -y && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /var/cache/apt/*

## Read-only filesystem prevents most attacks
VOLUME ["/tmp"]
## nobody:nobody - security scans love this (comment kept on its own line; Dockerfile instructions don't allow trailing comments)
USER 65534:65534

Runtime security (DevSecOps will check this):

## Don't mount your entire file system as a volume (yes, people do this)
docker run \
  --read-only \
  --tmpfs /tmp \
  --security-opt=no-new-privileges \
  --cap-drop=ALL \
  --user 65534:65534 \
  --volume /app/data:/app/data:ro \
  your-fastmcp-server

Real talk: Most security breaches happen because someone mounted / as a volume or ran as root. Don't be that person.

Image Optimization: Lessons from Our Storage Bill Horror Stories

I learned image optimization when Docker registry costs got out of hand from building oversized images. Here's what actually works:

Layer optimization lessons:

  • Group RUN commands or every line creates a new layer (learned this when builds got painfully slow)
  • Use .dockerignore - accidentally including large files will bloat every image (see the example after this list)
  • Multi-stage builds saved our ass - build tools stayed in build stage, runtime stayed clean
  • Pin package versions or Docker will re-download everything on every build
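
A starting-point .dockerignore - the entries are the usual suspects, adjust to your repo layout:

## .dockerignore - keep the build context (and every image layer) small
.git
.venv/
venv/
__pycache__/
*.pyc
*.log
.env
tests/
docs/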

Size optimization experience:

  • Base Python image: Around 1GB (eats storage budget)
  • With FastMCP + dependencies: Images get huge, 1.5GB+ easily
  • After multi-stage build: Much smaller and deployments are faster
  • With distroless base: Smaller still but debugging becomes difficult

Reality check: Smaller images deploy faster and cost less to store. But if you can't debug in production, you'll spend hours trying to figure out why things break. Pick your poison.

Container Resource Requirements

From production experience:

  • Simple tools (file operations, basic APIs): Start with 256-512MB memory, adjust CPU as needed
  • Database work: 512MB-1GB memory works well, CPU matters for complex queries
  • ML/AI tools: Memory intensive - 1-2GB+ required, CPU depends on model complexity
  • High-traffic APIs: Scale resources generously - 2GB+ memory, multiple CPU cores

Memory leak diagnostics (use these while investigating, not as permanent defaults):

## Debug-oriented allocator settings - they surface leaks but add overhead
ENV PYTHONMALLOC=pymalloc_debug
ENV PYTHONFAULTHANDLER=1
ENV PYTHONUNBUFFERED=1

## Development mode enables extra runtime checks (more overhead)
ENV PYTHONDEVMODE=1

Development vs Production Image Patterns

Separate images for development and production environments:

Development Dockerfile:

FROM python:3.11-slim
RUN apt-get update && apt-get install -y curl vim htop
COPY requirements-dev.txt .
RUN pip install -r requirements-dev.txt
## Include development tools, debuggers, etc.

Production Dockerfile:

FROM python:3.11-slim as production
## Minimal dependencies only
## No development tools
## Security hardening
## Performance optimizations
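
If maintaining two Dockerfiles gets annoying, the same split fits in one file with named stages selected via --target at build time. A sketch along the lines of the multi-stage build above:

FROM python:3.11-slim AS base
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM base AS development
RUN apt-get update && apt-get install -y curl vim htop && rm -rf /var/lib/apt/lists/*
COPY requirements-dev.txt .
RUN pip install --no-cache-dir -r requirements-dev.txt
COPY . .

FROM base AS production
COPY . .
USER 65534:65534
CMD ["python", "server.py", "--transport", "http", "--port", "8000", "--host", "0.0.0.0"]

Build with docker build --target production -t fastmcp-server:prod . (swap the target for the development image).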

Monitoring Container Health

FastMCP containers need comprehensive health monitoring. Follow Kubernetes health check best practices with proper liveness and readiness probes:

import time

from fastapi import FastAPI
from fastmcp import FastMCP

## Add FastAPI for health endpoints
app = FastAPI()
mcp = FastMCP(name="production-server")

@app.get("/health")
async def health_check():
    """Basic health check"""
    return {"status": "healthy", "timestamp": time.time()}

@app.get("/health/ready")
async def readiness_check():
    """Kubernetes readiness probe"""
    # Check database connections, external dependencies
    return {"status": "ready"}

@app.get("/health/live")
async def liveness_check():
    """Kubernetes liveness probe"""
    # Check if server is responsive
    return {"status": "alive"}

## Mount FastMCP on subpath
app.mount("/mcp", mcp.app)

For comprehensive health check implementation, consider using fastapi-healthchecks for structured dependency checking.

Container Security Considerations

Recent security research has identified critical vulnerabilities in MCP integrations where attackers can hijack AI agents through prompt injection. Production deployments must implement proper input validation and sandboxing.
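
What "input validation" means for an MCP tool is inevitably application-specific; here is a minimal sketch of the idea, with hypothetical limits and a placeholder execute_query helper standing in for the real database call:

import re

from fastmcp import FastMCP

mcp = FastMCP(name="hardened-server")

## Hypothetical limits - tune them to what the tool is actually allowed to do
SAFE_QUERY = re.compile(r"^\s*SELECT\s", re.IGNORECASE)
MAX_QUERY_LENGTH = 4096

@mcp.tool
def run_report_query(query: str) -> str:
    """Read-only report query with basic allow-list validation."""
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError("Query too long")
    # Reject anything that isn't a single SELECT statement
    if not SAFE_QUERY.match(query) or ";" in query.rstrip("; \n"):
        raise ValueError("Only single SELECT statements are allowed")
    return execute_query(query)  # placeholder for the real database call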

This containerization foundation enables the Kubernetes orchestration patterns covered in the next section. Without proper Docker practices, Kubernetes deployments fail unpredictably in production.

Kubernetes Orchestration: How to Not Hate Your Life at Scale

From "It Works on My Machine" to Production Reality

Running FastMCP in Kubernetes is where dreams go to die. Sure, you can write a basic Deployment manifest, but enterprise K8s is a different beast entirely. You need service mesh integration, custom resource management, horizontal scaling that doesn't bankrupt you, and operational patterns that survive traffic spikes without waking you up at 3am.

Follow the official Kubernetes security best practices if you want to keep your job.

Kubernetes Cluster Architecture

I've deployed FastMCP on several K8s clusters - from basic cloud setups to enterprise environments. Here's what actually works when you have real traffic and users.

Production Kubernetes Manifests That Actually Work

Deployment that won't randomly die:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastmcp-server
  namespace: mcp-production
  labels:
    app: fastmcp-server
    version: v1.2.0
spec:
  replicas: 3  # Always run at least 3 - you'll have node failures
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # Never kill more than 1 at once
      maxSurge: 1        # Only create 1 extra during updates
  selector:
    matchLabels:
      app: fastmcp-server
  template:
    metadata:
      labels:
        app: fastmcp-server
        version: v1.2.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
        prometheus.io/path: "/metrics"
        # Helm-templated annotation - it changes on every release, so each deploy rolls the pods
        # (pair it with a CronJob restart if you want the 24-hour memory-leak workaround)
        rollme: '{{ date "20060102-1504" .Release.Time }}'
    spec:
      serviceAccountName: fastmcp-service-account
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsUser: 65534
        runAsNonRoot: true  # Security audits will fail without this
      containers:
      - name: fastmcp-server
        image: your-registry/fastmcp-server:v1.2.0
        imagePullPolicy: Always  # "latest" bit me in the ass when different nodes had different images cached
        ports:
        - containerPort: 8000
          name: http
          protocol: TCP
        env:
        - name: ENVIRONMENT
          value: "production"
        - name: SERVER_NAME
          value: "fastmcp-production"
        - name: LOG_LEVEL
          value: "INFO"  # DEBUG will kill your performance
        - name: PYTHONUNBUFFERED
          value: "1"   # Without this, logs disappear into the void
        envFrom:
        - secretRef:
            name: fastmcp-secrets
        - configMapRef:
            name: fastmcp-config
        resources:
          requests:
            memory: "512Mi"  # Don't set this too low
            cpu: "200m"      # 0.2 CPU cores minimum
          limits:
            memory: "2Gi"    # Memory leaks will hit this eventually
            cpu: "1000m"     # 1 CPU core max - tune based on load
        # Health checks are CRITICAL - K8s is dumb without them
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8000
          initialDelaySeconds: 30    # Give it time to start
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3        # Don't restart on single failures
        startupProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 2
          timeoutSeconds: 1
          failureThreshold: 30       # 60 seconds total startup time
        volumeMounts:
        - name: tmp-volume
          mountPath: /tmp
        - name: config-volume
          mountPath: /app/config
          readOnly: true             # Never mount configs as writable
      volumes:
      - name: tmp-volume
        emptyDir: {}                 # Some apps need /tmp to be writable
      - name: config-volume
        configMap:
          name: fastmcp-config
      nodeSelector:
        kubernetes.io/os: linux      # Don't accidentally deploy to Windows nodes
      # Spot instances are cheap but unreliable - plan accordingly
      tolerations:
      - key: "node-role.kubernetes.io/spot"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      # Spread pods across nodes so one node failure doesn't kill everything
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - fastmcp-server
              topologyKey: "kubernetes.io/hostname"

Service configuration for load balancing:

apiVersion: v1
kind: Service
metadata:
  name: fastmcp-service
  namespace: mcp-production
  labels:
    app: fastmcp-server
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8000
    protocol: TCP
    name: http
  selector:
    app: fastmcp-server
  sessionAffinity: None

Horizontal Pod Autoscaling (HPA): When Things Go Sideways

FastMCP autoscaling based on CPU alone is like driving blindfolded - you'll either under-scale and crash, or over-scale and bankrupt yourself. The Kubernetes HPA docs cover the basics, but custom metrics are where the magic happens.

Here's what I learned after HPA scaled aggressively during a memory leak incident (AWS bill was painful):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fastmcp-hpa
  namespace: mcp-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fastmcp-server
  minReplicas: 3        # Never go below 3 - single points of failure suck
  maxReplicas: 20       # Keep this reasonable or costs get out of hand
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70    # 70% CPU is the sweet spot
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80    # Memory is trickier due to leaks
  - type: Pods
    pods:
      metric:
        name: mcp_requests_per_second
      target:
        type: AverageValue
        averageValue: "10"        # Tune this based on your load
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60   # Don't scale up instantly
      policies:
      - type: Percent
        value: 100        # Double pods when needed (was 200%, too aggressive)
        periodSeconds: 15
      - type: Pods
        value: 4          # Or add 4 pods max per scale event
        periodSeconds: 15
      selectPolicy: Max   # Max picks whichever policy allows the larger scale-up
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 minutes before scaling down
      policies:
      - type: Percent
        value: 10         # Only scale down 10% at a time
        periodSeconds: 60
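
One gotcha: the mcp_requests_per_second pods metric doesn't exist until something serves it through the custom metrics API. A hedged sketch of a prometheus-adapter rule that derives it from the mcp_tool_calls_total counter defined in the monitoring section (Helm values format - verify the layout against your adapter version):

rules:
  custom:
  - seriesQuery: 'mcp_tool_calls_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^mcp_tool_calls_total$"
      as: "mcp_requests_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'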

ConfigMaps and Secrets Management

Production FastMCP servers need secure configuration management:

ConfigMap for non-sensitive configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fastmcp-config
  namespace: mcp-production
data:
  server.yaml: |
    server:
      name: "FastMCP Production Server"
      version: "1.2.0"
      timeout: 30
      max_connections: 1000
    logging:
      level: "INFO"
      format: "json"
      structured: true
    features:
      enable_metrics: true
      enable_tracing: true
      enable_auth: true
    database:
      pool_size: 20
      max_overflow: 10
      pool_timeout: 30

Secret for sensitive data:

apiVersion: v1
kind: Secret
metadata:
  name: fastmcp-secrets
  namespace: mcp-production
type: Opaque
stringData:
  DATABASE_URL: "postgresql://user:password@postgres-service:5432/mcpdb"
  API_KEY: "your-secret-api-key"
  JWT_SECRET: "your-jwt-secret"
  REDIS_PASSWORD: "your-redis-password"
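
In practice you rarely commit that manifest with real values - the secret is usually created imperatively or synced from a secret manager. The imperative equivalent (values are obviously placeholders):

kubectl create secret generic fastmcp-secrets \
  --namespace mcp-production \
  --from-literal=DATABASE_URL='postgresql://user:password@postgres-service:5432/mcpdb' \
  --from-literal=API_KEY='your-secret-api-key' \
  --from-literal=JWT_SECRET='your-jwt-secret' \
  --from-literal=REDIS_PASSWORD='your-redis-password'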

Network Policies for Security

Kubernetes network policies restrict communication between pods. Implement network policies as part of your zero-trust security architecture:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: fastmcp-network-policy
  namespace: mcp-production
spec:
  podSelector:
    matchLabels:
      app: fastmcp-server
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: fastmcp-client
    - namespaceSelector:
        matchLabels:
          name: monitoring
    ports:
    - protocol: TCP
      port: 8000
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: postgres
    ports:
    - protocol: TCP
      port: 5432
  - to:
    - podSelector:
        matchLabels:
          app: redis
    ports:
    - protocol: TCP
      port: 6379
  - to: []  # Allow DNS
    ports:
    - protocol: UDP
      port: 53

Service Mesh Integration with Istio

For enterprise deployments, service mesh provides traffic management, security, and observability. Istio's service mesh enables advanced traffic management with VirtualService and DestinationRule:

Istio VirtualService for traffic routing:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: fastmcp-virtual-service
  namespace: mcp-production
spec:
  hosts:
  - fastmcp-service
  - mcp.company.com
  gateways:
  - fastmcp-gateway
  http:
  - match:
    - uri:
        prefix: "/v1/"
    route:
    - destination:
        host: fastmcp-service
        port:
          number: 80
      weight: 90
    - destination:
        host: fastmcp-service-canary
        port:
          number: 80
      weight: 10
    fault:
      delay:
        percentage:
          value: 0.1
        fixedDelay: 5s
    retries:
      attempts: 3
      perTryTimeout: 10s

Istio DestinationRule for load balancing:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: fastmcp-destination-rule
  namespace: mcp-production
spec:
  host: fastmcp-service
  trafficPolicy:
    loadBalancer:
      simple: LEAST_CONN
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 100
        maxRequestsPerConnection: 10
        maxRetries: 3
        idleTimeout: 30s
    outlierDetection:
      consecutiveGatewayErrors: 5
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
      minHealthPercent: 50

Custom Resource Definitions (CRDs)

For enterprise FastMCP management, custom resources provide declarative configuration:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: fastmcpservers.mcp.company.com
spec:
  group: mcp.company.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              image:
                type: string
              replicas:
                type: integer
                minimum: 1
                maximum: 100
              resources:
                type: object
                properties:
                  requests:
                    type: object
                  limits:
                    type: object
              tools:
                type: array
                items:
                  type: object
                  properties:
                    name:
                      type: string
                    config:
                      type: object
          status:
            type: object
            properties:
              replicas:
                type: integer
              readyReplicas:
                type: integer
              conditions:
                type: array
                items:
                  type: object
  scope: Namespaced
  names:
    plural: fastmcpservers
    singular: fastmcpserver
    kind: FastMCPServer
    shortNames:
    - fmcp
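
With that CRD installed, each server becomes a small declarative object. A sketch of an instance that matches the schema above (the name is made up, and the controller that reconciles it is whatever operator you build around the CRD):

apiVersion: mcp.company.com/v1
kind: FastMCPServer
metadata:
  name: billing-tools
  namespace: mcp-production
spec:
  image: your-registry/fastmcp-server:v1.2.0
  replicas: 3
  resources:
    requests:
      memory: "512Mi"
      cpu: "200m"
    limits:
      memory: "2Gi"
      cpu: "1000m"
  tools:
  - name: database_query
    config:
      pool_size: 20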

Production Deployment Strategies

Blue-Green Deployments using Argo Rollouts:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: fastmcp-rollout
  namespace: mcp-production
spec:
  replicas: 10
  strategy:
    blueGreen:
      activeService: fastmcp-service-active
      previewService: fastmcp-service-preview
      autoPromotionEnabled: false
      scaleDownDelaySeconds: 30
      prePromotionAnalysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: fastmcp-service-preview.mcp-production.svc.cluster.local
      postPromotionAnalysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: fastmcp-service-active.mcp-production.svc.cluster.local
  selector:
    matchLabels:
      app: fastmcp-server
  template:
    metadata:
      labels:
        app: fastmcp-server
    spec:
      containers:
      - name: fastmcp-server
        image: your-registry/fastmcp-server:latest
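
The rollout references a success-rate AnalysisTemplate that isn't shown above. A hedged sketch of what it typically looks like with the Prometheus provider, reusing the mcp_tool_calls_total metric from the monitoring section (the Prometheus address, threshold, and label selector are placeholders):

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
  namespace: mcp-production
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 1m
    count: 5
    successCondition: result[0] >= 0.95   # block promotion below 95% success
    provider:
      prometheus:
        address: http://prometheus.monitoring.svc.cluster.local:9090
        query: |
          # adjust the label selector to however your Prometheus labels preview vs. active pods
          sum(rate(mcp_tool_calls_total{status="success",service="{{args.service-name}}"}[5m]))
          /
          sum(rate(mcp_tool_calls_total{service="{{args.service-name}}"}[5m]))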

Persistent Storage for Stateful Components

While FastMCP servers should be stateless, some deployments require persistent storage:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fastmcp-storage
  namespace: mcp-production
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: fast-ssd
---
## Mount in deployment
        volumeMounts:
        - name: persistent-storage
          mountPath: /app/data
      volumes:
      - name: persistent-storage
        persistentVolumeClaim:
          claimName: fastmcp-storage

Resource Limits: Don't Let K8s Kill Your Pods

Resource limits in Kubernetes are where good intentions go to die. Get them wrong and your pods will be randomly evicted, throttled, or OOMKilled. Configure proper resource requests and limits or suffer:

## Guaranteed QoS - requests = limits (predictable but wasteful)
resources:
  requests:
    memory: "1Gi"     # You pay for this even if unused
    cpu: "500m"       # 0.5 CPU cores reserved
  limits:
    memory: "1Gi"     # Hard limit - pod dies if exceeded
    cpu: "500m"       # CPU throttling kicks in here

## Burstable QoS - requests < limits (recommended for production)
resources:
  requests:
    memory: "512Mi"   # Guaranteed minimum
    cpu: "200m"       # 0.2 CPU cores reserved
  limits:
    memory: "2Gi"     # Can burst up to 2GB
    cpu: "1000m"      # Can burst up to 1 CPU core

## Best-effort QoS - no requests/limits (career suicide in production)
## Don't even think about it

Reality check from production hell: I once set memory limits to 512MB thinking "FastMCP is lightweight, right?" Wrong. During a traffic spike, every single pod got OOMKilled and our entire service went down for 45 minutes. Turns out some tool was caching massive responses in memory.

CPU limits are even trickier - set them too low and your pods get throttled to death. One time our response times went from 50ms to 2 seconds because CPU throttling kicked in at 80% usage. Customers were pissed, and it took me hours to figure out it wasn't our code - it was Kubernetes being "helpful."

Memory requests are equally evil. Set them too low and during node resource pressure, your pods get evicted randomly. I watched perfectly healthy pods die because another team's deployment needed resources and K8s decided our "low priority" pods had to go.

Multi-Cluster Deployments

Enterprise FastMCP deployments often span multiple clusters:

Cross-cluster service mesh:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: control-plane
spec:
  values:
    pilot:
      env:
        ENABLE_CROSS_CLUSTER_WORKLOAD_ENTRY: true
  components:
    pilot:
      k8s:
        env:
        - name: PILOT_ENABLE_CROSS_CLUSTER_WORKLOAD_ENTRY
          value: "true"

Cross-cluster load balancing:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: fastmcp-remote-service
  namespace: mcp-production
spec:
  hosts:
  - fastmcp.remote-cluster.local
  ports:
  - number: 80
    name: http
    protocol: HTTP
  location: MESH_EXTERNAL
  resolution: DNS
  endpoints:
  - address: fastmcp-service.mcp-production.svc.cluster.local
    network: cluster-2
    ports:
      http: 80

RBAC and Security Context

Production deployments require strict RBAC and security contexts. Follow RBAC best practices and the principle of least privilege:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fastmcp-service-account
  namespace: mcp-production
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: mcp-production
  name: fastmcp-role
rules:
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: fastmcp-role-binding
  namespace: mcp-production
subjects:
- kind: ServiceAccount
  name: fastmcp-service-account
  namespace: mcp-production
roleRef:
  kind: Role
  name: fastmcp-role
  apiGroup: rbac.authorization.k8s.io

[Diagram: Kubernetes cluster architecture for FastMCP deployments]

This Kubernetes foundation enables comprehensive monitoring and observability, which is critical for maintaining production FastMCP deployments at scale. For comprehensive implementation guidance, see the complete Kubernetes service mesh guide.

Production Deployment Options Comparison

| Deployment Method | Cost | Complexity | Scalability | Reliability | Security | Maintenance | Best For |
|---|---|---|---|---|---|---|---|
| Single Docker Container | Low ($20-50/mo) | Low | Manual scaling only | Single point of failure | Basic container security | Low | Proof of concept, development |
| Docker Compose | Low ($50-100/mo) | Low-Medium | Limited horizontal scaling | Depends on host reliability | Host-based security | Medium | Small teams, simple services |
| Docker Swarm | Medium ($100-300/mo) | Medium | Good horizontal scaling | Built-in redundancy | Swarm secrets, overlay networks | Medium | Mid-size deployments, Docker expertise |
| Kubernetes (Managed) | Medium-High ($200-1000/mo) | High | Excellent auto-scaling | High availability built-in | RBAC, network policies, secrets | High | Enterprise, high availability |
| Kubernetes (Self-Managed) | Medium ($100-500/mo) | Very High | Excellent auto-scaling | Depends on setup quality | Full control, complex to secure | Very High | Masochists with unlimited time |
| Serverless (AWS Lambda) | Variable ($10-500/mo) | Medium | Automatic, excellent | Managed by provider | Provider security model | Low | If you enjoy cold start delays |
| Cloud Run / Container Apps | Low-Medium ($30-200/mo) | Low-Medium | Automatic scaling | High, provider-managed | Managed security | Low-Medium | Simple services, cost-sensitive |

Monitoring FastMCP: Or How I Learned to Stop Worrying and Love Alerts

When Everything Looks Fine But Nothing Actually Works

FastMCP servers are sneaky bastards. They'll happily return 200 OK on health checks while quietly shitting the bed on every single tool call. Memory usage creeps up like that toxic ex who slowly destroys your life until BAM - OOMKill. Connection pools disappear into the void for reasons that make no fucking sense. Without proper observability, you'll spend your entire weekend debugging ghosts.

You need metrics, logs, and traces - the holy trinity of not wanting to jump out a window during production incidents. But FastMCP isn't your typical REST API that fails predictably. Oh no. This thing has creative new ways to break that'll make you question everything you thought you knew about distributed systems.

After dealing with production alerts for a while, monitoring FastMCP requires specific approaches. MCP servers have unique failure modes - tool timeouts that look like network issues, connection problems that appear hours later, and protocol errors that don't show up in standard HTTP metrics.

The Metrics That Actually Matter

Here's which metrics help when things break:

from prometheus_client import Counter, Histogram, Gauge
import time
import functools
import psutil
import os

## The metrics that matter (learned this the hard way)
mcp_tool_calls_total = Counter('mcp_tool_calls_total', 'Total MCP tool calls', ['tool_name', 'status'])
mcp_tool_duration = Histogram('mcp_tool_duration_seconds', 'MCP tool execution time', ['tool_name'],
                             # Custom buckets - default ones suck for MCP
                             buckets=[0.1, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0, float('inf')])
mcp_active_connections = Gauge('mcp_active_connections', 'Active MCP connections')
mcp_memory_usage = Gauge('mcp_memory_usage_bytes', 'Memory usage in bytes')
mcp_protocol_errors = Counter('mcp_protocol_errors_total', 'MCP protocol errors', ['error_type'])
## This one saved us during the great connection pool disaster of 2024
mcp_connection_pool_size = Gauge('mcp_connection_pool_size', 'Database connection pool size')

def monitor_tool_calls(func):
    """Decorator that actually works in production"""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        tool_name = func.__name__

        # Update memory usage (important for detecting leaks)
        process = psutil.Process(os.getpid())
        mcp_memory_usage.set(process.memory_info().rss)

        try:
            result = func(*args, **kwargs)
            mcp_tool_calls_total.labels(tool_name=tool_name, status='success').inc()
            return result
        except TimeoutError as e:
            mcp_tool_calls_total.labels(tool_name=tool_name, status='timeout').inc()
            mcp_protocol_errors.labels(error_type='timeout').inc()
            raise
        except ConnectionError as e:
            mcp_tool_calls_total.labels(tool_name=tool_name, status='connection_error').inc()
            mcp_protocol_errors.labels(error_type='connection').inc()
            raise
        except Exception as e:
            mcp_tool_calls_total.labels(tool_name=tool_name, status='error').inc()
            mcp_protocol_errors.labels(error_type=type(e).__name__.lower()).inc()
            raise
        finally:
            duration = time.time() - start_time
            mcp_tool_duration.labels(tool_name=tool_name).observe(duration)

    return wrapper

## Example usage (this is how you actually do it)
@mcp.tool
@monitor_tool_calls
def database_query(query: str) -> str:
    """Execute database query with proper monitoring"""
    # Update connection pool metrics
    mcp_connection_pool_size.set(get_connection_pool_size())
    return execute_query(query)

Here's the Prometheus config that actually works (learned this after the old config broke spectacularly and took down monitoring for 3 hours):

[Diagram: Prometheus architecture]

## prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "fastmcp_alerts.yml"

scrape_configs:
  - job_name: 'fastmcp-servers'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name

Alerting rules that will save your sanity:

## fastmcp_alerts.yml - These are battle-tested
groups:
- name: fastmcp_alerts
  rules:
  # Don't make this threshold too low - you'll get paged constantly
  - alert: FastMCPHighErrorRate
    expr: rate(mcp_tool_calls_total{status="error"}[5m]) / rate(mcp_tool_calls_total[5m]) > 0.1
    for: 5m  # Wait 5 minutes - temporary spikes are normal
    labels:
      severity: warning
      service: fastmcp
    annotations:
      summary: "High error rate detected for {{ $labels.tool_name }}"
      description: "Error rate for tool {{ $labels.tool_name }} is {{ $value | humanizePercentage }} over the last 5 minutes. Check logs immediately."

  # 10 seconds is generous - tune based on your SLA
  - alert: FastMCPHighLatency
    expr: histogram_quantile(0.95, rate(mcp_tool_duration_seconds_bucket[5m])) > 10
    for: 5m
    labels:
      severity: warning
      service: fastmcp
    annotations:
      summary: "High latency detected for {{ $labels.tool_name }}"
      description: "95th percentile latency for {{ $labels.tool_name }} is {{ $value }}s. Users are probably complaining."

  # This alert has saved us multiple times - memory leaks are real
  - alert: FastMCPMemoryLeak
    expr: delta(mcp_memory_usage_bytes[1h]) > 1073741824  # 1GB growth per hour is concerning (delta, not increase - this is a gauge)
    for: 0m  # Fire immediately - memory leaks accelerate
    labels:
      severity: critical
      service: fastmcp
      escalate: "true"  # Wake someone up
    annotations:
      summary: "Memory leak detected in {{ $labels.instance }}"
      description: "Memory usage increased by {{ $value | humanize1024 }}B in the last hour. Pod will OOMKill soon."

  # This usually means your load balancer health check failed
  - alert: FastMCPNoActiveConnections
    expr: mcp_active_connections == 0
    for: 10m  # Give it time - temporary connection drops are normal
    labels:
      severity: warning
      service: fastmcp
    annotations:
      summary: "No active MCP connections on {{ $labels.instance }}"
      description: "FastMCP server has no active connections for 10 minutes. Check networking."

  # The nuclear alert - something is seriously wrong
  - alert: FastMCPServerDown
    expr: up{job="fastmcp-servers"} == 0
    for: 2m  # Don't wait 5 minutes when the server is completely dead
    labels:
      severity: critical
      service: fastmcp
      escalate: "true"
    annotations:
      summary: "FastMCP server is completely down"
      description: "FastMCP server {{ $labels.instance }} unreachable for 2+ minutes. Revenue impact likely."

  # Bonus alert that saved us during Black Friday
  - alert: FastMCPConnectionPoolExhaustion
    expr: mcp_connection_pool_size == 0
    for: 1m
    labels:
      severity: critical
      service: fastmcp
    annotations:
      summary: "Database connection pool exhausted"
      description: "No available database connections on {{ $labels.instance }}. New requests will fail."

Structured Logging (Or Why I Stopped Using Print Statements)

Debugging FastMCP with print statements and tail -f doesn't scale. You need structured logging that works with whatever log aggregation system you have. This becomes important when you need to correlate logs across multiple pods:

import hashlib
import logging
import sys
import time
import uuid

import structlog
from pythonjsonlogger import jsonlogger

## Configure structured logging
def configure_logging():
    """Configure production-ready structured logging"""

    # JSON formatter for log aggregation
    json_handler = logging.StreamHandler(sys.stdout)
    formatter = jsonlogger.JsonFormatter(
        fmt='%(asctime)s %(name)s %(levelname)s %(message)s',
        datefmt='%Y-%m-%dT%H:%M:%S'
    )
    json_handler.setFormatter(formatter)

    # Attach the JSON handler to the root logger so stdlib logging flows through it too
    root_logger = logging.getLogger()
    root_logger.handlers = [json_handler]
    root_logger.setLevel(logging.INFO)

    # Configure structlog
    structlog.configure(
        processors=[
            structlog.contextvars.merge_contextvars,
            structlog.processors.add_log_level,
            structlog.processors.StackInfoRenderer(),
            structlog.dev.set_exc_info,
            structlog.processors.JSONRenderer()
        ],
        wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
        logger_factory=structlog.stdlib.LoggerFactory(),
        cache_logger_on_first_use=True,
    )

## Context-aware logging
logger = structlog.get_logger()

@mcp.tool
def monitored_database_tool(query: str) -> str:
    """Database tool with comprehensive logging"""
    request_id = str(uuid.uuid4())

    # Add context to all log messages
    log = logger.bind(
        tool_name="database_query",
        request_id=request_id,
        query_length=len(query),
        user_id=get_current_user_id()
    )

    log.info("Tool call started", query_hash=hashlib.sha256(query.encode()).hexdigest()[:8])

    start_time = time.time()
    try:
        result = execute_database_query(query)
        duration = time.time() - start_time

        log.info(
            "Tool call completed successfully",
            duration=duration,
            result_length=len(result),
            rows_affected=get_rows_affected()
        )
        return result

    except DatabaseTimeoutError as e:
        log.error(
            "Database timeout error",
            error_type="timeout",
            timeout_duration=e.timeout,
            duration=time.time() - start_time
        )
        raise

    except DatabaseConnectionError as e:
        log.error(
            "Database connection error",
            error_type="connection",
            connection_pool_size=get_pool_size(),
            active_connections=get_active_connections()
        )
        raise

Distributed Tracing (Because One Day You'll Need It)

Distributed tracing feels like overkill until you have a request that touches 5 different services and you can't figure out which one is the slow piece of shit. FastMCP requests hop around more than you'd think - tool calls trigger database queries, API calls, file system operations. When it breaks, you need to see the whole flow or you'll be guessing forever:

import requests

from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor

## Initialize tracing
def setup_tracing():
    trace.set_tracer_provider(TracerProvider())
    tracer = trace.get_tracer(__name__)

    # Configure Jaeger exporter
    jaeger_exporter = JaegerExporter(
        agent_host_name="jaeger-agent",
        agent_port=6831,
        collector_endpoint="http://jaeger-collector:14268/api/traces",
    )

    span_processor = BatchSpanProcessor(jaeger_exporter)
    trace.get_tracer_provider().add_span_processor(span_processor)

    # Auto-instrument libraries
    RequestsInstrumentor().instrument()
    SQLAlchemyInstrumentor().instrument()

    return tracer

tracer = setup_tracing()

@mcp.tool
def traced_api_call(endpoint: str) -> str:
    """API call with distributed tracing"""
    with tracer.start_as_current_span("mcp_api_call") as span:
        span.set_attribute("mcp.tool_name", "api_call")
        span.set_attribute("mcp.endpoint", endpoint)
        span.set_attribute("mcp.request_id", get_request_id())

        try:
            with tracer.start_as_current_span("external_api_request") as api_span:
                api_span.set_attribute("http.url", endpoint)
                response = requests.get(endpoint)
                api_span.set_attribute("http.status_code", response.status_code)

            span.set_attribute("mcp.response_size", len(response.text))
            span.set_status(trace.Status(trace.StatusCode.OK))
            return response.text

        except Exception as e:
            span.set_status(trace.Status(trace.StatusCode.ERROR, str(e)))
            span.set_attribute("mcp.error_type", type(e).__name__)
            raise

Health Checks That Don't Suck

Basic health checks are useless - they just return 200 OK while your app shits itself in the background. I learned this when our load balancer kept routing traffic to a server that was completely fucked but still responded to /health. Here's how to actually check if your shit is working:

from typing import Dict, Any

import asyncio
import time

import psutil
import requests
from fastapi import FastAPI, HTTPException

app = FastAPI()

## Placeholder map of downstream dependencies checked by the readiness logic below
EXTERNAL_SERVICES = {"auth-service": "http://auth-service:8080"}

class HealthChecker:
    def __init__(self):
        self.startup_time = time.time()

    async def check_database_connection(self) -> Dict[str, Any]:
        """Check database connectivity and performance"""
        try:
            start_time = time.time()
            result = await execute_query("SELECT 1")
            query_time = time.time() - start_time

            return {
                "status": "healthy",
                "query_time_ms": query_time * 1000,
                "connection_pool_size": get_pool_size(),
                "active_connections": get_active_connections()
            }
        except Exception as e:
            return {
                "status": "unhealthy",
                "error": str(e),
                "error_type": type(e).__name__
            }

    async def check_external_dependencies(self) -> Dict[str, Any]:
        """Check external API dependencies"""
        dependencies = {}

        for service_name, endpoint in EXTERNAL_SERVICES.items():
            try:
                start_time = time.time()
                # requests is blocking - run it in a worker thread so the event loop isn't stalled
                response = await asyncio.wait_for(
                    asyncio.to_thread(requests.get, f"{endpoint}/health"),
                    timeout=5.0
                )
                response_time = time.time() - start_time

                dependencies[service_name] = {
                    "status": "healthy" if response.status_code == 200 else "degraded",
                    "response_time_ms": response_time * 1000,
                    "status_code": response.status_code
                }
            except asyncio.TimeoutError:
                dependencies[service_name] = {
                    "status": "timeout",
                    "error": "Health check timeout after 5 seconds"
                }
            except Exception as e:
                dependencies[service_name] = {
                    "status": "error",
                    "error": str(e)
                }

        return dependencies

    def get_system_metrics(self) -> Dict[str, Any]:
        """Get system resource metrics"""
        return {
            "cpu_percent": psutil.cpu_percent(interval=1),
            "memory_percent": psutil.virtual_memory().percent,
            "memory_available_mb": psutil.virtual_memory().available / 1024 / 1024,
            "disk_usage_percent": psutil.disk_usage('/').percent,
            "uptime_seconds": time.time() - self.startup_time,
            "process_memory_mb": psutil.Process().memory_info().rss / 1024 / 1024
        }

health_checker = HealthChecker()

@app.get("/health")
async def basic_health():
    """Basic health check for load balancers"""
    return {"status": "healthy", "timestamp": time.time()}

@app.get("/health/ready")
async def readiness_check():
    """Kubernetes readiness probe - check if ready to serve traffic"""
    db_health = await health_checker.check_database_connection()

    if db_health["status"] != "healthy":
        raise HTTPException(status_code=503, detail="Database not available")

    return {
        "status": "ready",
        "database": db_health,
        "timestamp": time.time()
    }

@app.get("/health/live")
async def liveness_check():
    """Kubernetes liveness probe - check if container should be restarted"""
    system_metrics = health_checker.get_system_metrics()

    # Fail liveness if memory usage is too high (potential memory leak)
    if system_metrics["memory_percent"] > 90:
        raise HTTPException(status_code=503, detail="High memory usage detected")

    # Fail if disk usage is critical
    if system_metrics["disk_usage_percent"] > 95:
        raise HTTPException(status_code=503, detail="Disk space critical")

    return {
        "status": "alive",
        "system": system_metrics,
        "timestamp": time.time()
    }

@app.get("/health/detailed")
async def detailed_health():
    """Comprehensive health check for monitoring systems"""
    db_health = await health_checker.check_database_connection()
    dependencies = await health_checker.check_external_dependencies()
    system_metrics = health_checker.get_system_metrics()

    overall_status = "healthy"
    if db_health["status"] != "healthy":
        overall_status = "degraded"

    unhealthy_deps = [name for name, dep in dependencies.items()
                     if dep["status"] not in ["healthy", "degraded"]]
    if unhealthy_deps:
        overall_status = "degraded"

    return {
        "status": overall_status,
        "database": db_health,
        "dependencies": dependencies,
        "system": system_metrics,
        "mcp_metrics": {
            # Unlabelled gauges expose their value via the (private) _value attribute;
            # labelled counters have to be aggregated through collect()
            "active_connections": mcp_active_connections._value.get(),
            "total_tool_calls": sum(
                sample.value
                for metric in mcp_tool_calls_total.collect()
                for sample in metric.samples
                if sample.name.endswith("_total")
            ),
            "memory_usage_bytes": mcp_memory_usage._value.get()
        },
        "timestamp": time.time()
    }

Log Aggregation (Or How to Find Needles in Haystacks)

Debugging issues across many containers when logs are scattered is difficult. Proper log aggregation becomes essential. The ELK stack is complex but effective for finding logs quickly during incidents:

Fluentd configuration for Kubernetes:

## fluentd-fastmcp.conf
<source>
  @type tail
  path /var/log/containers/fastmcp-server-*.log
  pos_file /var/log/fluentd-fastmcp.log.pos
  tag kubernetes.fastmcp.*
  read_from_head true
  <parse>
    @type json
    keep_time_key true
  </parse>
</source>

<filter kubernetes.fastmcp.**>
  @type kubernetes_metadata
  @id kubernetes_metadata
</filter>

<filter kubernetes.fastmcp.**>
  @type record_transformer
  <record>
    service_name "fastmcp"
    environment "production"
    log_level ${record['level']}
  </record>
</filter>

## Forward to Elasticsearch
<match kubernetes.fastmcp.**>
  @type elasticsearch
  host elasticsearch.logging.svc.cluster.local
  port 9200
  index_name fastmcp-logs
  type_name _doc
  include_tag_key true
  tag_key @log_name
  <buffer>
    @type file
    path /var/log/fluentd-buffers/fastmcp
    flush_mode interval
    retry_type exponential_backoff
    flush_thread_count 2
    flush_interval 5s
    retry_forever
    retry_max_interval 30
    chunk_limit_size 2M
    queue_limit_length 8
    overflow_action block
  </buffer>
</match>

Dashboards That Actually Help When You're Fucked

I've built a lot of Grafana dashboards that looked pretty but were useless when things broke. Here are the panels that I actually look at during incidents (and the alert thresholds that won't wake you up for bullshit):

The panels that matter:

  1. Request Rate: rate(mcp_tool_calls_total[1m])
  2. Error Rate: rate(mcp_tool_calls_total{status="error"}[1m]) / rate(mcp_tool_calls_total[1m])
  3. Response Time: histogram_quantile(0.95, rate(mcp_tool_duration_seconds_bucket[5m]))
  4. Memory Usage: mcp_memory_usage_bytes
  5. Active Connections: mcp_active_connections
  6. CPU Usage: rate(container_cpu_usage_seconds_total[1m]) * 100

Alert thresholds based on production experience:

  • Error rate > 5% for 5 minutes: Warning
  • Error rate > 15% for 2 minutes: Critical
  • Response time p95 > 10 seconds: Warning
  • Memory growth > 100MB/hour: Warning
  • No active connections for 10 minutes: Warning

Modern Observability Architecture

The 2024 Observability Survey shows 89% of organizations use Prometheus and 85% use OpenTelemetry in production. This observability foundation provides the visibility needed to maintain FastMCP servers in production environments where failures must be detected and resolved quickly.

For comprehensive monitoring implementation, consider production logging best practices and advanced Python logging patterns.

FastMCP Production Deployment FAQ

Q: My FastMCP containers keep getting OOMKilled - what's happening?

A: FastMCP can have memory leak issues that become worse in containers. Set appropriate memory limits and monitor usage:

## Set memory limits
ENV PYTHONMALLOC=malloc
ENV MALLOC_TRIM_THRESHOLD_=100000

## In Kubernetes
resources:
  limits:
    memory: "1Gi"  # Set appropriate limit
  requests:
    memory: "512Mi"

Also restart containers periodically with a CronJob - I restart every 24 hours in production to prevent memory buildup.

Q: Why does my FastMCP server work locally but not in Kubernetes?

A: Common issues:

  1. Transport problems: Using STDIO transport doesn't work in containers. Switch to HTTP.
  2. Network binding: Binding to 127.0.0.1 won't work in containers. Use 0.0.0.0 so K8s can reach it.
  3. Environment configuration: Different configs between dev/prod, especially database URLs and API keys
  4. Health check failures: K8s kills pods with failing health checks - make sure your endpoints are correct

Debug this by running the exact same image locally first: docker run -p 8000:8000 your-image. If it breaks there too, it's your container. If it works, it's K8s being K8s.

Q: How do I handle database connections in production FastMCP?

A: Use connection pooling and proper configuration for production database loads:

from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

## Production database configuration
engine = create_engine(
    DATABASE_URL,
    poolclass=QueuePool,
    pool_size=20,        # Connections to keep open
    max_overflow=10,     # Additional connections when needed
    pool_timeout=30,     # Seconds to wait for connection
    pool_recycle=3600,   # Recycle connections every hour
    pool_pre_ping=True   # Validate connections before use
)

Monitor connection pool usage and adjust based on actual load. Set database timeouts shorter than tool timeouts to prevent hanging connections.
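
For the "database timeouts shorter than tool timeouts" part, one way to do it with PostgreSQL is a server-side statement timeout passed through connect_args - a sketch (the options string is psycopg2/libpq-specific, check your driver):

from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

## 20s statement timeout < ~25s tool timeout, so a slow query errors out
## before the tool (and its pooled connection) hangs
engine = create_engine(
    DATABASE_URL,
    poolclass=QueuePool,
    pool_size=20,
    pool_pre_ping=True,
    connect_args={"options": "-c statement_timeout=20000"},  # milliseconds
)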

Q: My FastMCP server dies randomly - how do I debug this?

A: This is usually connection pool exhaustion from misconfigured timeouts, or deadlock issues. Add monitoring to see what's happening:

import threading
import time

def connection_monitor():
    """Monitor connection pool status"""
    while True:
        pool = engine.pool
        logger.info("Connection pool status",
                   size=pool.size(),
                   checked_in=pool.checkedin(),
                   checked_out=pool.checkedout(),
                   overflow=pool.overflow())
        time.sleep(60)

threading.Thread(target=connection_monitor, daemon=True).start()

Also enable deadlock detection and set aggressive timeouts. Trust me - I've seen asyncio.wait_for hang forever because of some SQLAlchemy connection pool bullshit.

Q: How do I scale FastMCP servers horizontally?

A: FastMCP servers should be stateless for horizontal scaling. Use Kubernetes HPA with custom metrics:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fastmcp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fastmcp-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: mcp_requests_per_second
      target:
        type: AverageValue
        averageValue: "10"

Use session affinity only if absolutely required - stateless is better for scaling.

Q: What's the best way to handle secrets in production?

A: Don't hardcode passwords in your code. Production databases get compromised when API keys are committed to repos. Use Kubernetes secrets or external secret managers instead of environment variables for sensitive data:

apiVersion: v1
kind: Secret
metadata:
  name: fastmcp-secrets
type: Opaque
stringData:
  database-url: "postgresql://user:password@host/db"
  api-key: "your-secret-key"
---
## Mount as volume
      volumeMounts:
      - name: secrets
        mountPath: "/app/secrets"
        readOnly: true
      volumes:
      - name: secrets
        secret:
          secretName: fastmcp-secrets

For external secrets, integrate with AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault.

Q: How do I implement blue-green deployments for FastMCP?

A: Use Argo Rollouts or similar tools for zero-downtime deployments:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: fastmcp-rollout
spec:
  replicas: 10
  strategy:
    blueGreen:
      activeService: fastmcp-active
      previewService: fastmcp-preview
      autoPromotionEnabled: false
      prePromotionAnalysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: fastmcp-preview

Include health checks and automated rollback on failure. Test database schema migrations separately.

Q: My FastMCP tools timeout in production but work in development - why?

A: Production networks and databases have different latency characteristics. Implement proper timeout hierarchies:

import asyncio

## Tool timeout should be shorter than the client timeout,
## and the downstream call timeout shorter than the tool timeout
@mcp.tool
async def slow_operation(data: str) -> str:
    async with asyncio.timeout(25):  # Tool deadline: 25s (asyncio.timeout needs Python 3.11+)
        result = await external_api_call(data, timeout=20)  # downstream API deadline: 20s
        return result

Also check if resource limits are causing performance degradation. CPU throttling causes unexpected timeouts.
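If you suspect throttling, read the cgroup counters from inside the container instead of guessing. A quick sketch, assuming a cgroup-v2 node (the file lives elsewhere on cgroup v1):

from pathlib import Path

## Sketch: read CPU throttling counters from inside the container (cgroup v2 only)
def cpu_throttle_stats() -> dict:
    stats = {}
    for line in Path("/sys/fs/cgroup/cpu.stat").read_text().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return {
        "nr_throttled": stats.get("nr_throttled", 0),
        "throttled_seconds": stats.get("throttled_usec", 0) / 1_000_000,
    }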

Q

How do I monitor FastMCP performance in production?

A

Implement comprehensive monitoring with Prometheus metrics:

from prometheus_client import Counter, Histogram, Gauge

mcp_tool_calls = Counter('mcp_tool_calls_total', 'Total tool calls', ['tool', 'status'])
mcp_response_time = Histogram('mcp_response_time_seconds', 'Response time', ['tool'])
mcp_active_connections = Gauge('mcp_active_connections', 'Active connections')

## Alert on key metrics
## - Error rate > 5%
## - P95 response time > 10s
## - Memory usage growing > 100MB/hour
## - No active connections for 10+ minutes
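Defining metrics does nothing until your tools actually record them. A hedged sketch of wiring the counter and histogram above into a tool - fetch_report and load_report are made up for illustration:

import time

@mcp.tool
async def fetch_report(report_id: str) -> str:
    """Example tool instrumented with the metrics defined above"""
    start = time.monotonic()
    try:
        result = await load_report(report_id)  # placeholder for the real work
        mcp_tool_calls.labels(tool="fetch_report", status="success").inc()
        return result
    except Exception:
        mcp_tool_calls.labels(tool="fetch_report", status="error").inc()
        raise
    finally:
        mcp_response_time.labels(tool="fetch_report").observe(time.monotonic() - start)

You also need to expose the metrics endpoint (for example with prometheus_client.start_http_server) so Prometheus can actually scrape it.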

Use Grafana dashboards and set up proper alerting (PagerDuty, OpsGenie, etc.) for critical issues. Better to catch problems early than discover them from user complaints.

Q

What happens if my FastMCP server becomes CPU or memory constrained?

A

Resource constraints cause degraded performance and eventual failures:

CPU constraints:

  • Tools take longer to execute
  • Connection timeouts increase
  • GIL contention in Python becomes worse

Memory constraints:

  • Python garbage collection pauses increase
  • Risk of OOMKill
  • Memory allocation failures

Set appropriate resource requests and limits, monitor usage patterns, and scale before hitting limits.
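One cheap way to monitor usage from inside the container is to log the process's own memory so you see growth before the OOMKiller does. A rough sketch - the threshold is made up, tune it to your limits, and note that ru_maxrss is the peak RSS in KB on Linux:

import resource
import threading
import time

## Sketch: log peak RSS every minute and warn when close to the limit
def memory_watchdog(limit_mb: int = 1024):
    while True:
        rss_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
        logger.info("Process memory", rss_mb=round(rss_mb, 1), limit_mb=limit_mb)
        if rss_mb > limit_mb * 0.9:
            logger.warning("Within 10% of memory limit - expect OOMKill soon")
        time.sleep(60)

threading.Thread(target=memory_watchdog, daemon=True).start()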

Q

How do I handle long-running tasks in FastMCP tools?

A

Break long tasks into smaller chunks or use background processing:

from celery import Celery

## Celery needs a broker and a result backend; the Redis URLs are just examples
celery_app = Celery('fastmcp-tasks',
                    broker='redis://redis:6379/0',
                    backend='redis://redis:6379/1')

@celery_app.task
def process_large_dataset(task_id: str) -> str:
    """Heavy work runs in a Celery worker process, not in the MCP server"""
    ...  # actual processing goes here

@mcp.tool
async def start_long_task(task_id: str) -> str:
    """Start long-running task and return immediately"""
    task = process_large_dataset.delay(task_id)
    return f"Task started with ID: {task.id}"

@mcp.tool
async def check_task_status(task_id: str) -> str:
    """Check status of long-running task"""
    result = celery_app.AsyncResult(task_id)
    return f"Task status: {result.status}"

For very long operations, consider webhook-based completion notifications.
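Webhook-based completion can be as simple as the worker POSTing to a callback URL when it finishes. A rough sketch on top of the Celery setup above - the callback URL, payload shape, and do_the_actual_work are made up:

import requests

@celery_app.task
def process_and_notify(task_id: str, callback_url: str) -> None:
    """Do the heavy work, then tell the caller it finished"""
    result = do_the_actual_work(task_id)  # placeholder for the real processing
    requests.post(
        callback_url,
        json={"task_id": task_id, "status": "completed", "result": result},
        timeout=10,
    )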

Q

Can I run multiple FastMCP servers behind a load balancer?

A

Yes, but ensure servers are stateless and session affinity is configured correctly:

apiVersion: v1
kind: Service
metadata:
  name: fastmcp-service
spec:
  sessionAffinity: ClientIP  # If sessions matter
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600
  ports:
  - port: 80
    targetPort: 8000
  selector:
    app: fastmcp-server

For HTTP transport, most load balancers work fine. SSE transport requires sticky sessions.

Q

How do I backup and restore FastMCP server data?

A

FastMCP servers should be stateless, but if you need to back up state:

## Backup database
kubectl exec -n production postgres-0 -- pg_dump mcpdb > backup.sql

## Backup persistent volumes
kubectl get pv fastmcp-storage -o yaml > pv-backup.yaml

## Backup secrets and configs
kubectl get secret fastmcp-secrets -o yaml > secrets-backup.yaml
kubectl get configmap fastmcp-config -o yaml > config-backup.yaml

Store backups in object storage (S3, GCS) with encryption and retention policies.
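Shipping the dump to S3 with server-side encryption is one boto3 call; retention is better handled by a bucket lifecycle rule than by your script. A minimal sketch - the bucket and key names are placeholders:

import boto3

## Sketch: upload the pg_dump output with server-side encryption enabled
s3 = boto3.client("s3")
s3.upload_file(
    "backup.sql",
    "my-backup-bucket",               # placeholder bucket
    "fastmcp/backups/backup.sql",     # placeholder key
    ExtraArgs={"ServerSideEncryption": "AES256"},
)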

Q

My FastMCP deployment works but clients can't connect - networking issues?

A

Check network policies and service mesh configuration:

## Test internal connectivity (service name resolves within cluster)
kubectl exec -it test-pod -- curl fastmcp-service:8000/health

## Check DNS resolution
kubectl exec -it test-pod -- nslookup fastmcp-service

## Check network policies
kubectl get networkpolicy -n production
kubectl describe networkpolicy fastmcp-policy

Common issues: restrictive NetworkPolicies, service mesh misconfiguration, or firewall rules blocking traffic.

Q

How do I implement disaster recovery for FastMCP?

A

Multi-region deployment with automated failover:

## Primary region deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastmcp-primary
  annotations:
    config.linkerd.io/proxy-cpu-request: "100m"
    config.linkerd.io/proxy-memory-request: "20Mi"
spec:
  replicas: 5
  # ... deployment spec

---
## Secondary region (standby)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastmcp-secondary
  annotations:
    argocd.argoproj.io/sync-wave: "2"
spec:
  replicas: 0  # Scale up during failover
  # ... deployment spec

Use external DNS and health checks to route traffic to healthy regions automatically.
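The health checks that drive failover should prove the region can actually serve traffic, not just that the process is alive. A rough sketch of a deep check you could wire into whatever /health route your server exposes, assuming the async SQLAlchemy engine from earlier:

import asyncio
from sqlalchemy import text

## Sketch: only report healthy if the database in this region answers quickly
async def deep_health_check() -> bool:
    try:
        async with asyncio.timeout(5):
            async with async_engine.connect() as conn:
                await conn.execute(text("SELECT 1"))
        return True
    except Exception:
        return False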
