I've deployed Nix to production in three different environments over the past 4 years. Each approach has its place, but they're not interchangeable.
The Simple Way: Direct nixos-rebuild
This is how you start. SSH into your server and run nixos-rebuild switch. Your configuration lives in /etc/nixos/configuration.nix, and you edit it directly on the server.
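The whole workflow is a couple of commands (a minimal sketch; the hostname and user are hypothetical):
ssh admin@prod01.example.com
sudo nano /etc/nixos/configuration.nix   # edit the config in place
sudo nixos-rebuild switch                # build and activate the new generation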
I used this for my first production NixOS server in 2021. It worked fine for a single-server Rails app with low traffic.
When this works:
- Single server or very few servers
- You don't mind SSHing into production to deploy
- Configuration changes are infrequent
- Team is small (1-2 people max)
When this breaks:
- Multiple servers need identical configs
- You want deployment history and rollbacks
- Team growth means multiple people touching production
- Compliance requires audit trails of who changed what
The moment you have two servers, direct editing becomes a nightmare. Trust me, I've been there. You make a change on server A, forget to apply it to server B, and spend 2 hours debugging why they behave differently.
The Remote Way: nixos-rebuild with --build-host
This is the middle ground. Your configuration lives in version control, and you build remotely but deploy from your local machine:
nixos-rebuild switch \
--build-host build-server.example.com \
--target-host prod-server.example.com \
--use-remote-sudo
The --build-host flag is crucial for production. Building Firefox from source on a 1-CPU production server will kill your site for 3 hours. Build on a separate machine with more cores and push the result.
When this works:
- 2-10 servers that need coordinated updates
- You have a beefy build server
- Manual deployment process is acceptable
- Want version control for configurations
When this starts sucking:
- Deploys take forever (hitting servers one by one like it's 2005)
- More than one person trying to deploy causes chaos
- Rolling back means manually SSHing into each server
- Binary cache misconfiguration means you're building Firefox from source during peak traffic
I used this approach for a client with 8 NixOS servers. Deployments took 15 minutes because I had to hit each server sequentially. The binary cache saved us from recompiling, but the serial deployment was painful.
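The deployment loop was roughly this (a sketch with hypothetical hostnames and a per-host config layout, not the client's actual script):
for host in web01 web02 web03 api01; do
  nixos-rebuild switch \
    -I nixos-config="$PWD/hosts/$host.nix" \
    --build-host build-server.example.com \
    --target-host "$host.prod.example.com" \
    --use-remote-sudo
done
Each iteration blocks until that server finishes activating, which is where the 15 minutes went.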
The Production Way: Deploy-rs and Flakes
This is how you do it when you're serious. Deploy-rs treats deployment as a first-class problem with proper tooling.
Your flake.nix defines everything:
{
  deploy.nodes.web-server = {
    hostname = "web01.prod.example.com";
    profiles.system = {
      user = "root";
      path = deploy-rs.lib.x86_64-linux.activate.nixos
        self.nixosConfigurations.web-server;
    };
  };

  deploy.nodes.api-server = {
    hostname = "api01.prod.example.com";
    profiles.system = {
      user = "root";
      path = deploy-rs.lib.x86_64-linux.activate.nixos
        self.nixosConfigurations.api-server;
    };
  };
}
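The two nodes above differ only in name and hostname; past a handful of servers you would generate them instead. A sketch using nixpkgs' lib.genAttrs (the hostnames attrset is a hypothetical mapping you maintain yourself, and node names are assumed to match your nixosConfigurations):
deploy.nodes = nixpkgs.lib.genAttrs [ "web-server" "api-server" ] (name: {
  hostname = hostnames.${name};
  profiles.system = {
    user = "root";
    path = deploy-rs.lib.x86_64-linux.activate.nixos
      self.nixosConfigurations.${name};
  };
});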
Deploy everything with deploy . and it runs in parallel. Magic rollback means that if you break SSH access, the server reverts automatically after 30 seconds.
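Day-to-day use is one command, optionally scoped to a single node (the node names match the flake above; --skip-checks is the same flag the CI workflow below uses):
deploy .                 # every node in the flake, in parallel
deploy .#web-server      # just one node
deploy . --skip-checks   # skip flake checks, e.g. when CI already ran them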
Why this is better:
- Parallel deployments: 20 servers finish as fast as 1 server
- Atomic rollbacks: If any server fails, everything rolls back
- Interactive mode: Preview changes before deployment
- Multi-profile support: Deploy apps without root access
- Proper error handling: Clear failures, not silent corruption
I've used this for clients with 50+ servers. A full deployment finishes in under 5 minutes, including application updates and OS configuration changes.
Binary Caches: Don't Build in Production
Here's the thing nobody tells you: binary caches are not optional for production. They're mandatory.
Without a cache, every deployment compiles everything from source. I've seen production deployments take 4 hours because someone modified a low-level dependency.
Your options:
- cache.nixos.org: Free, covers 95% of nixpkgs
- Cachix: $45/month, handles custom packages
- FlakeHub Cache: Enterprise solution with private flakes
- Self-hosted: Attic or nix-serve
For production, I recommend Cachix for the convenience, or self-hosted Attic if you need full control. FlightAware uses self-hosted caches because they need guaranteed availability.
The cache hit rate for standard nixpkgs is usually 90%+. For custom applications, you'll build once and cache forever. This turns 2-hour deployments into 2-minute deployments.
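Wiring a cache into your NixOS configuration is only a few lines (a sketch; mycompany is a hypothetical Cachix cache, and its key is a placeholder you'd copy from your cache's settings):
nix.settings = {
  substituters = [
    "https://cache.nixos.org"
    "https://mycompany.cachix.org"
  ];
  trusted-public-keys = [
    "cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY="
    "mycompany.cachix.org-1:REPLACE-WITH-YOUR-CACHE-PUBLIC-KEY="
  ];
};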
CI/CD Integration That Actually Works
Don't try to adapt Docker-based CI/CD to Nix. Build a Nix-native pipeline instead.
Our GitHub Actions workflow looks like this:
- uses: DeterminateSystems/nix-installer-action@v4
- uses: DeterminateSystems/magic-nix-cache-action@v2
- name: Build system configurations
  run: nix build '.#nixosConfigurations.web-server.config.system.build.toplevel'
- name: Deploy to production
  run: deploy . --skip-checks
  env:
    SSH_PRIVATE_KEY: ${{ secrets.DEPLOY_SSH_KEY }}
The Magic Nix Cache speeds up CI builds dramatically. Combined with deploy-rs, you get proper deployment automation.
Key insights from production use:
- Build everything in CI, never on production servers
- Use --skip-checks in automated deployments (checks already ran in CI)
- Set up proper SSH key management for deploy access (see the sketch after this list)
- Monitor deployment times - anything over 10 minutes needs investigation
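For the SSH key piece, one approach is a small setup step in the workflow (a sketch; the hostnames are hypothetical and the secret name matches the workflow above):
- name: Set up deploy SSH key
  run: |
    mkdir -p ~/.ssh
    echo "$SSH_PRIVATE_KEY" > ~/.ssh/id_ed25519
    chmod 600 ~/.ssh/id_ed25519
    ssh-keyscan -H web01.prod.example.com api01.prod.example.com >> ~/.ssh/known_hosts
  env:
    SSH_PRIVATE_KEY: ${{ secrets.DEPLOY_SSH_KEY }}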
The whole process from git push to production deployment takes 5-8 minutes for our largest clients. Compare that to Docker-based pipelines that take 20-30 minutes for similar complexity.
Companies like Shopify and Tweag use variations of this approach for hundreds of servers.