Look, container orchestration is a pain in the ass. Kubernetes is like learning to fly a spaceship when you just want to drive to the grocery store. Docker Swarm is the sedan - boring, reliable, and you won't spend three months reading documentation to deploy a simple web app.
## The Reality of Running Swarm in Production
I've deployed Swarm clusters that have been running for years without drama. The secret is understanding what you're getting into. Swarm has manager nodes that make decisions and worker nodes that do the actual work. Sounds simple, right? It mostly is, until networking decides to have a personality.
The Docker Swarm mode documentation covers the basics, and the swarm mode key concepts explain the node roles in detail.
The architecture is straightforward: managers handle cluster state and scheduling decisions, while workers run your actual containers. Managers use Raft consensus to stay in sync, which is just a fancy way of saying they vote on decisions instead of having one dictator node.
In a typical Docker Swarm cluster, you'll have 3-5 manager nodes (for fault tolerance) and any number of worker nodes that actually run your containers. The managers coordinate everything - from load balancing to container placement - while workers just follow orders.
Manager nodes use the Raft consensus algorithm, which means you need an odd number (3, 5, 7) to avoid split-brain scenarios. I learned this the hard way when two managers in different data centers couldn't talk to each other and the cluster basically had a nervous breakdown.
Here's what actually happens: your cluster works fine with one manager until that node dies and takes your entire orchestration with it. Run at least three managers unless you enjoy 3am phone calls. The Docker best practices guide recommends odd numbers to maintain quorum, and this Stack Overflow discussion covers real-world scenarios.
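Standing up a three-manager cluster takes about four commands. A minimal sketch (the 10.0.0.1 address is made up; substitute your first manager's IP):

```shell
# On the first manager (10.0.0.1 is a placeholder address)
docker swarm init --advertise-addr 10.0.0.1

# Print the join command for additional managers
docker swarm join-token manager

# Run the printed join command on two more nodes, then verify quorum:
docker node ls
# A healthy 3-manager cluster shows one Leader and two Reachable managers
```

With three managers, the cluster tolerates one manager failure and keeps scheduling; lose two and the remaining manager refuses to make changes until quorum returns.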
## Services vs Containers: The Thing That Trips Everyone Up
Forget everything you know about `docker run`. In Swarm, you create services, not containers directly. A service is like saying "I want 3 copies of nginx running somewhere in this cluster, and I don't really care where."
The Docker service documentation explains the difference thoroughly, and this Digital Ocean guide shows practical service creation examples.
When you create a service, the manager node takes your service definition, breaks it into individual tasks (container instances), and schedules them across available worker nodes. If a worker node fails, the manager detects it and reschedules those tasks on healthy nodes.
```yaml
version: '3.8'
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 128M
      restart_policy:
        condition: on-failure
```

Note that `replicas` lives under `deploy:`, not at the service's top level — Swarm ignores it (or the stack deploy rejects it) otherwise, and you end up with one replica wondering where the other two went.
The beauty is that Swarm will keep trying to maintain 3 replicas even if nodes fail. The pain is debugging when services won't start and the error message is "failed to start" with zero useful context. The Docker Compose file reference shows all available options, and this troubleshooting guide helps with common service startup issues.
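When a service does get stuck in that "failed to start" loop, a few CLI commands recover most of the missing context. A sketch, assuming a service named `web`:

```shell
# See where each task landed and why any of them died
# (--no-trunc stops Docker from cutting off the error column)
docker service ps --no-trunc web

# Tail the service's logs across all replicas
docker service logs --follow web

# Watch scheduling decisions as they happen
docker events --filter type=service
```

Nine times out of ten, the real error is hiding in the truncated column that `--no-trunc` reveals.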
## Networking: Where Dreams Go to Die
Swarm's routing mesh is brilliant when it works. Every node can accept traffic for any service, even if that service isn't running on that node. The mesh routes requests to healthy containers automatically.
The routing mesh essentially creates a virtual load balancer that spans all nodes. Hit any node on port 8080, and it'll route your request to a healthy container running your service, even if that container is on a different node entirely.
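You can see the mesh in action with a single published service. A sketch (the node IP is hypothetical):

```shell
# Publish port 8080 on every node via the routing mesh
docker service create --name web --replicas 2 \
  --publish published=8080,target=80 nginx:alpine

# Any node answers on 8080 — even nodes not running a web container,
# because ingress routing forwards the request to a healthy task
curl http://10.0.0.3:8080/
```

This is also why port conflicts in Swarm are cluster-wide: a published port is claimed on every node, not just the ones running the service.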
When it breaks? Good fucking luck. I've spent entire weekends debugging overlay network issues where containers couldn't talk to each other because of some arcane iptables rule or kernel version incompatibility. The official networking docs help, but they assume your network isn't a disaster. This GitHub issue thread documents common overlay networking problems, and Docker's network troubleshooting guide provides debugging steps.
Pro tip: Use `docker network ls` and `docker network inspect` obsessively. Half of Swarm debugging is network debugging.
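Before blaming iptables, confirm the basics. A sketch (the network name `my_overlay` is illustrative):

```shell
# List networks; overlay networks show scope "swarm"
docker network ls

# --verbose adds per-service and peer info for swarm-scoped networks
docker network inspect --verbose my_overlay

# Overlay networking needs these ports open between ALL nodes:
#   2377/tcp          cluster management
#   7946/tcp and udp  node-to-node gossip
#   4789/udp          VXLAN data traffic
```

A firewall silently eating 4789/udp is the classic "containers can ping some nodes but not others" failure mode.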
## Security That Actually Works
Here's the one area where Swarm doesn't disappoint. When you run `docker swarm init`, it generates certificates and encrypts everything automatically. Node-to-node communication is secured with mutual TLS, certificates rotate every 90 days by default, and the management-plane traffic between nodes is encrypted out of the box. (One caveat: application traffic on an overlay network is only encrypted if you create the network with `--opt encrypted`.)
Swarm's built-in PKI (Public Key Infrastructure) handles certificate generation, distribution, and rotation completely automatically. Each node gets its own certificate signed by the cluster's root CA, and everything just works without you having to manage certificate files or expiration dates.
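The PKI is also tunable when "just works" isn't enough. A sketch of the relevant commands:

```shell
# Rotate the cluster root CA, e.g. after a suspected compromise
docker swarm ca --rotate

# Shorten the node certificate validity window (default is 90 days)
docker swarm update --cert-expiry 720h

# Inspect the trust root a node's certificate chains up to
docker node inspect --format '{{ .Description.TLSInfo.TrustRoot }}' self
```

Rotating the CA re-issues every node certificate in the cluster, so expect a brief burst of re-joining chatter in the logs.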
Docker secrets actually work properly - sensitive data gets encrypted and only delivered to containers that need it. Unlike trying to manage secrets with environment variables like a barbarian. Check out this practical secrets tutorial and security best practices for production deployments.
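The workflow is two commands. A sketch (the secret name, value, and `myorg/api` image are all made up):

```shell
# Store a secret in the encrypted Raft log (value piped via stdin)
printf 'hunter2' | docker secret create db_password -

# Grant it to a service; inside the container it appears as an
# in-memory file at /run/secrets/db_password, never as an env var
docker service create --name api --secret db_password myorg/api:latest
```

Services that weren't granted the secret simply never see it, and it never touches disk on the workers.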
## Is Swarm Dead? Not Really, But...
As of September 2025, Docker still ships Swarm with Docker Engine 28.4.0 (the latest stable release) and maintains active development. Recent updates include improved device file support, better multi-platform handling, and enhanced security patches for container isolation. The Docker team continues maintaining it with regular security patches, and companies like those discussed in recent Medium articles still run serious production workloads on it.
But let's be honest - the ecosystem moved to Kubernetes. Finding Swarm-specific tools, monitoring solutions, or expert help is harder than it was in 2018. If you're starting fresh and have the resources, Kubernetes is probably the safer long-term bet. This comparison article and Hacker News thread show current community sentiment about Swarm vs K8s.
That said, if you have 5 services and 3 servers, Swarm will get you running faster than you can spell "YAML indentation error."