GitOps: Because Manual Deployments Are for Masochists

GitOps means Git controls your deployments - no more logging into servers to run random kubectl commands at 2am when shit breaks. The core stack is Docker + Kubernetes + ArgoCD + Prometheus. When it works, it's actually pretty sweet. When it doesn't, you'll burn 6 hours debugging why ArgoCD is stuck syncing.

The Stack That'll Make You Question Your Life Choices

Look, here's the deal with GitOps: Git is your source of truth, which sounds great until ArgoCD decides it wants to take a coffee break and stops syncing for no fucking reason. I've spent more 3am nights debugging "why isn't this deploying" than I care to admit.

Docker: Containers are supposed to solve "works on my machine" but they just move the problem to "works in my container but not in prod." You'll spend hours debugging why your Alpine Linux container breaks when you need glibc libraries, or why your multi-stage builds work fine locally but fail in CI/CD pipelines.

Kubernetes: K8s is like that friend who's really smart but explains things in the most complicated way possible. Sure, it orchestrates everything beautifully, but try debugging why your pods are stuck in Pending state at 2am. The official troubleshooting guide won't help when you're dealing with resource quotas that somebody forgot to configure properly.
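When pods are stuck in Pending, the events almost always name the real blocker (unschedulable, quota exceeded, no nodes matching selectors). For what it's worth, the first commands to run — pod and namespace names are placeholders:

```shell
# Events on the pod itself usually say why the scheduler gave up
kubectl describe pod <pod-name> -n <namespace> | tail -n 20

# Recent events in the namespace, newest last
kubectl get events -n <namespace> --sort-by=.lastTimestamp | tail

# The resource quota somebody forgot to configure properly
kubectl describe resourcequota -n <namespace>
```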

ArgoCD: The GitOps controller that's supposed to watch your Git repos and deploy changes automatically. Works great until it doesn't sync, shows "OutOfSync" for no reason, or gets stuck on that one namespace deletion that's been running for 3 hours. The ArgoCD troubleshooting docs are helpful until you hit edge cases that require diving into application resource management or understanding sync phases.

Prometheus: The monitoring stack that'll consume more RAM than your actual applications. Great for metrics until you realize you're storing high-cardinality data and your storage costs just doubled.

When It Actually Works (Sometimes)

GitOps automates deployments so you don't have to SSH into production servers and manually run kubectl commands like some kind of caveman. Until ArgoCD breaks, then you're back to manual debugging anyway.

Drift Detection: ArgoCD is supposed to keep your cluster in sync with Git. In theory, this prevents the clusterfuck of "who changed what in production." In practice, ArgoCD sometimes thinks your ServiceMonitor is out of sync even when it's not. Understanding drift detection mechanisms and sync policies becomes essential when dealing with server-side apply conflicts.
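When ArgoCD keeps flagging a ServiceMonitor that hasn't meaningfully changed, the usual escape hatch is enabling server-side apply and telling ArgoCD to ignore the fields controllers mutate after deployment. A minimal sketch, with placeholder names:

```yaml
# Hypothetical Application excerpt: ServerSideApply reduces diff noise,
# ignoreDifferences excludes fields that operators rewrite post-deploy.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: monitoring
  namespace: argocd
spec:
  syncPolicy:
    syncOptions:
      - ServerSideApply=true
  ignoreDifferences:
    - group: monitoring.coreos.com
      kind: ServiceMonitor
      jsonPointers:
        - /metadata/annotations
```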

Monitoring Integration: Prometheus scrapes metrics from everything, including ArgoCD itself. Cool until you realize your monitoring stack is using more resources than the apps you're monitoring.

Multi-Cluster Pain: Sure, you can manage multiple clusters with one ArgoCD instance. Just be prepared for network timeouts, authentication issues, and that one cluster that randomly loses connection during your demo. The cluster management docs won't prepare you for debugging RBAC permissions across environments.

Real-World Implementation (AKA Where Dreams Die)

Most teams start with the app-of-apps pattern because it looks clean in diagrams. Then you realize managing 50+ applications through a single ArgoCD UI is like trying to herd cats through molasses.

Secret Management: Never put secrets in Git. Use External Secrets Operator to pull from Vault or AWS Secrets Manager. This works great until your vault is down and nothing can start. Pro tip: your monitoring won't help because the monitoring needs secrets too. Learn about Kubernetes secrets and secret management best practices before you fuck up production.

Repository Structure: Separate your app configs from ArgoCD configs. Sounds obvious until you're 6 months in and your monorepo has become an unmaintainable mess of YAML files that nobody wants to touch.

Advanced Deployments: Argo Rollouts gives you canary deployments and blue-green releases. It's actually pretty sweet when it works. Just don't expect the rollback to work perfectly when your canary deployment takes down production.
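For the record, a canary with Argo Rollouts looks roughly like this — app name and image are placeholders:

```yaml
# Sketch of a Rollout canary: shift 20% of traffic, pause for manual
# promotion, then ramp to 60% with a timed pause before full rollout.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 5
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: {}              # waits for manual promotion
        - setWeight: 60
        - pause: {duration: 10m}
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:v2       # placeholder image
```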

Bottom line: GitOps is better than manually deploying shit, but it's not magic. You'll still spend weekends debugging why your app won't start, except now you get to debug Kubernetes, ArgoCD, AND your application.

The real question isn't whether you should adopt GitOps - it's how to implement it without losing your sanity. Before you jump in, you need to understand the different approaches and what actually works in production environments where uptime matters and stakeholders are watching.

GitOps Stack Implementation Approaches

| Implementation Approach | GitOps Playground | Helm-Based Setup | Custom Manifests | Enterprise Platform |
|---|---|---|---|---|
| Setup Complexity | Single command deployment | Moderate Helm chart management | High (manual YAML creation) | Low (managed service) |
| Initial Setup Time | 15-30 minutes | 2-4 hours | 8-16 hours | 1-2 hours configuration |
| Production Readiness | Development/learning focused | Production-ready with customization | Fully customizable for production | Enterprise-grade out of box |
| Customization Level | Limited to provided options | High via Helm values | Complete control | Platform-specific options |
| Multi-Cluster Support | Single cluster focus | Manual multi-cluster setup | Custom multi-cluster implementation | Built-in multi-cluster |
| Component Versions | Pre-selected stable versions | Latest stable versions | Any version you choose | Vendor-managed versions |
| Repository Structure | Predefined GitOps layout | Flexible Helm structure | Completely custom | Platform conventions |
| Secret Management | Basic External Secrets | External Secrets Operator integration | Custom secret solutions | Enterprise secret management |
| Monitoring Stack | kube-prometheus-stack included | kube-prometheus-stack v77.5.0 | Custom Prometheus setup | Vendor monitoring integration |
| ArgoCD Configuration | Basic ArgoCD setup | ArgoCD v3.1.4 with custom config | Fully customized ArgoCD | Managed ArgoCD service |
| Learning Curve | Beginner-friendly | Intermediate Kubernetes knowledge | Advanced Kubernetes expertise | Platform-specific training |
| Operational Overhead | Minimal (automated setup) | Moderate (Helm maintenance) | High (manual maintenance) | Low (vendor managed) |
| Update Management | Playground script updates | Helm chart version management | Manual component updates | Automated vendor updates |
| Cost Structure | Free (infrastructure only) | Free tools + infrastructure | Free tools + infrastructure | Enterprise licensing + infrastructure |
| Support Options | Community documentation | Community + vendor docs | Community support only | Enterprise support included |
| Best For | Learning and prototyping | Small to medium production | Large enterprise with specific needs | Enterprise with budget |

Production Reality: Where Tutorials Go to Die

Every GitOps tutorial makes this shit look easy. "Just deploy kube-prometheus-stack and you're done!" Sure. Here's what actually happens when you try to run this in production.

The Shit That Actually Breaks

The "Too Long" Annotation Error: kube-prometheus-stack will fail with a "metadata too long" error that tells you absolutely nothing useful. Took me 4 hours to figure out it was the CRD size limit. ArgoCD stores the entire manifest in annotations, and Prometheus CRDs are fucking huge.

Fix: Deploy CRDs separately with Replace=true, then use skipCrds: true for the main chart. This is not documented anywhere obvious.
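A sketch of the split, assuming the prometheus-community charts — verify the CRD chart name and pin real versions for your setup (77.5.0 matches the chart version mentioned elsewhere in this guide):

```yaml
# App 1: CRDs only, synced with Replace so ArgoCD doesn't blow the
# 262144-byte annotation limit doing a client-side apply.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus-operator-crds
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: prometheus-operator-crds
    targetRevision: x.y.z        # pin to a released chart version
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    syncOptions:
      - Replace=true             # kubectl replace instead of apply
---
# App 2: the main chart, with CRD installation skipped entirely.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kube-prometheus-stack
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: kube-prometheus-stack
    targetRevision: 77.5.0
    helm:
      skipCrds: true             # CRDs handled by the app above
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
```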

Dependency Hell: ArgoCD doesn't care about deployment order by default. Your app will try to start before its ConfigMap exists, then crash in a loop while you wonder what's wrong.

Use sync waves: infrastructure gets -1, core services get 0, apps get 1+. Obvious in hindsight, not so much when you're debugging at midnight.
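Sync waves are just annotations; lower numbers sync first. A minimal sketch with placeholder names:

```yaml
# ConfigMap lands in wave -1, the Deployment that reads it in wave 1,
# so the config exists before the app can crash-loop without it.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
data:
  LOG_LEVEL: info
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  replicas: 1
  selector:
    matchLabels: {app: app}
  template:
    metadata:
      labels: {app: app}
    spec:
      containers:
        - name: app
          image: app:latest      # placeholder image
          envFrom:
            - configMapRef: {name: app-config}
```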

Secrets Are Still A Pain: Never put secrets in Git. Use External Secrets Operator to pull from Vault. This works great until your vault is unreachable and nothing can start because everything needs a secret to initialize.
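A hypothetical ExternalSecret wired to Vault through a ClusterSecretStore — the store name and Vault path are assumptions, not defaults:

```yaml
# ESO pulls the value from Vault and materializes a regular Kubernetes
# Secret in-cluster; nothing sensitive ever touches Git.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend          # assumed store name
  target:
    name: db-credentials         # Secret created in-cluster
  data:
    - secretKey: password
      remoteRef:
        key: secret/data/db      # assumed Vault path
        property: password
```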

Scale Problems You Didn't Expect

Single ArgoCD Gets Slow As Hell: One ArgoCD works fine until you hit 50+ apps, then the UI becomes painfully slow and sync operations start timing out. You'll need to shard or deploy separate ArgoCD instances per environment.

ApplicationSets help template apps across clusters, but good luck debugging when one of your 20 templated applications is broken.
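For reference, an ApplicationSet with the cluster generator stamps out one Application per registered cluster — repo URL and path here are placeholders:

```yaml
# One templated Application per cluster ArgoCD knows about. Debugging
# tip: a broken template breaks every generated app at once.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: guestbook
  namespace: argocd
spec:
  generators:
    - clusters: {}               # all clusters registered with ArgoCD
  template:
    metadata:
      name: 'guestbook-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://example.com/org/apps.git
        targetRevision: HEAD
        path: guestbook
      destination:
        server: '{{server}}'
        namespace: guestbook
```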

Prometheus Will Eat All Your RAM: The monitoring stack uses more resources than the actual apps you're monitoring. Prometheus memory usage scales with cardinality, so avoid labels like user_id or request_id unless you want your monitoring to OOM.

I've seen Prometheus consume 16GB of RAM just to monitor a cluster of 10 applications at default scrape intervals with 30-day retention. Set retention policies and be ruthless about which metrics you actually keep.
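A hedged starting point via kube-prometheus-stack Helm values — the numbers are illustrative, not a recommendation:

```yaml
# values.yaml excerpt: cap retention and memory before Prometheus
# eats the node. Tune against your actual cardinality.
prometheus:
  prometheusSpec:
    retention: 7d                # shorter retention = less disk and RAM
    retentionSize: 50GB          # hard cap on TSDB size
    scrapeInterval: 60s          # default 30s; halves sample volume
    resources:
      requests:
        memory: 4Gi
      limits:
        memory: 8Gi              # OOM here beats OOMing the whole node
```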

Repository Structure Hell: Start with separate repos per environment or you'll hate life. Monorepos become unmaintainable messes of YAML that nobody wants to touch. Use Kustomize for environment-specific configs, Helm for templates.
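If you go the Kustomize route, a hypothetical prod overlay pulling a shared base looks like this — directory names and image tag are made up:

```yaml
# overlays/prod/kustomization.yaml: reuse base manifests, patch in
# prod-only settings instead of copy-pasting YAML per environment.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                   # shared Deployment/Service manifests
patches:
  - path: replica-patch.yaml     # prod replica count, resources, etc.
images:
  - name: my-app
    newTag: v1.4.2               # placeholder tag
```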

The Production Pain Points

Meta-Monitoring Problems: You need to monitor your monitoring, but what monitors the thing monitoring your monitoring? I've seen entire teams spend a day debugging why alerts weren't firing, only to discover Prometheus was down and Alertmanager couldn't reach anything.

Run separate monitoring for your GitOps infrastructure. Use external services for critical "is my cluster dead" alerts.

Disaster Recovery Is An Afterthought: Your Git repos are backed up, right? What about ArgoCD's configuration? Or the etcd cluster state?

Document your recovery procedures and test them. The 3am outage is not the time to learn that your backups don't actually work.

Security Theatre vs Reality: Default ArgoCD runs with cluster-admin privileges. Cool. Implement RBAC, use OPA for policies, enable audit logging. Your security team will thank you.
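One concrete starting point is ArgoCD's own RBAC ConfigMap: default everyone to read-only and grant sync to a named group. The group name below is an assumption:

```yaml
# argocd-rbac-cm: policy.default applies to any user not matched by
# policy.csv; the csv grants sync on all apps to one group only.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.default: role:readonly
  policy.csv: |
    p, role:deployer, applications, sync, */*, allow
    g, platform-team, role:deployer
```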

The dirty secret: most "production-ready" GitOps setups are held together with duct tape and prayers. Plan for failure, because it's not if, it's when.

Speaking of failure - here are the questions you'll be frantically Googling at 3am when everything's broken, your pager is going off, and you need answers that actually work instead of another "have you tried turning it off and on again" response.

FAQ: The Shit Nobody Tells You About GitOps

Q

Why does my kube-prometheus-stack keep failing with some cryptic "too long" error?

A

Because ArgoCD stores your entire manifest in annotations and Prometheus CRDs are massive. Kubernetes caps total annotation size at 256KiB (262144 bytes). You'll get this exact useless error: metadata.annotations: Too long: must have at most 262144 bytes, and waste hours figuring out what the fuck that means.

Fix: Split CRD deployment from the main chart. Deploy CRDs with Replace=true, then deploy the main chart with skipCrds: true. This should be the default but isn't.

Q

Why does my app keep crashing with "ConfigMap not found" even though I deployed it?

A

ArgoCD deploys things in random order by default. Your app starts before its ConfigMap exists.

Use sync waves: argocd.argoproj.io/sync-wave: "-1" for infrastructure, "0" for services, "1" for apps. Should be obvious but apparently isn't.

Q

How do I handle secrets without putting them in Git?

A

Don't be an idiot and put secrets in Git. Use External Secrets Operator for Vault/AWS/Azure integration, or Sealed Secrets if you're lazy.

Both work until your secret provider is down and nothing can start. Always fun at 3am.

Q

Why does ArgoCD think my perfectly fine deployment is "OutOfSync"?

A

ArgoCD gets confused by status fields that controllers add after deployment. It's especially bad with ServiceMonitors and CRDs.

Enable Server-Side Apply with ServerSideApply=true. Should fix most false positives.

Q

ArgoCD is slow as shit with lots of apps. How do I fix it?

A

Single ArgoCD instances choke around 50+ applications. The UI becomes unusable and sync operations time out.

Shard ArgoCD with multiple replicas or deploy separate instances per environment. ApplicationSets help template across clusters.

Q

How much resources does this monitoring nightmare actually need?

A

More than you think:

  • ArgoCD: 2-4 cores, 4-8GB RAM (more with lots of apps)
  • Prometheus: 4-8 cores, 8-16GB RAM (scales with metric cardinality)
  • Grafana: 1-2 cores, 2-4GB RAM
  • Everything else: 2-4 cores, 4-8GB RAM

Expect 15+ cores and 30+ GB RAM just for monitoring on a production cluster with 50+ services and 30-day metric retention. High-cardinality metrics will double this.

Q

ArgoCD is stuck syncing forever. What now?

A

Usual suspects:

  1. Competing operators fighting over resources
  2. Admission webhooks timing out (looking at you, OPA)
  3. RBAC problems - service account can't do shit
  4. Jobs stuck in Running - delete them manually

Try argocd app sync --force, but figure out why it happened or it'll repeat.

Q

Helm or raw YAML manifests?

A

Helm for standard stuff like kube-prometheus-stack. ArgoCD's Helm support works fine.

Raw YAML when you need complete control or Helm charts are broken (which happens).

Reality: Mix of Helm for common components, raw YAML for custom shit, Kustomize for environment differences.

Q

How do I backup this clusterfuck?

A

Your disaster recovery plan better be solid:

  1. Git repos: Multiple remotes, mirror everything
  2. ArgoCD config: Backup the namespace, CRDs, secrets
  3. etcd: Automated backups of cluster state
  4. Prometheus data: Remote write to external storage

Test your recovery procedures. The outage is not the time to learn they don't work.
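A back-of-the-napkin runbook, assuming the stock argocd and etcdctl CLIs — endpoints and certificate paths are placeholders, verify against your own cluster before trusting a 3am restore to this:

```shell
# 1. ArgoCD state: export Applications, Projects, and settings
argocd admin export -n argocd > argocd-backup.yaml

# 2. etcd snapshot (run wherever etcdctl can reach the endpoint)
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# 3. Restore drill: re-import into a scratch cluster, diff against Git
argocd admin import -n argocd - < argocd-backup.yaml
```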

Q

What's the difference between push-based and pull-based GitOps?

A

Pull-based (ArgoCD): Agents in clusters pull changes from Git repositories. More secure because no external access to clusters is required, but needs an agent in each cluster.

Push-based (traditional CI/CD): External systems push changes to clusters. Simpler for single clusters, but requires secure access to production environments and doesn't give you drift detection.

GitOps traditionally refers to pull-based approaches, which offer a better security posture and drift detection.

Q

How do I handle GitOps with multiple environments and promotion workflows?

A

Implement environment progression through:

  1. Branch-based: Separate branches per environment with promotion PRs
  2. Repository-based: Separate repos per environment with automated promotion
  3. Overlay-based: Kustomize overlays with shared base configurations

Each approach has trade-offs. Most organizations start with branch-based and migrate to repository-based as complexity increases.

Q

Why is Prometheus eating all my RAM?

A

Cardinality is a bitch. Every unique label combination = more memory.

Avoid labels like user_id, request_id, session_id. Set retention policies, reduce scrape frequency, use recording rules.

Or just throw more RAM at it like everyone else.
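If the Prometheus Operator manages your scrape configs, relabeling in the ServiceMonitor is the usual place to drop cardinality before it's ingested. Label and metric names below are examples, not anything your app actually exposes:

```yaml
# metricRelabelings run at ingestion: kill per-request labels and
# whole debug metric families before they hit the TSDB.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      metricRelabelings:
        - action: labeldrop
          regex: "request_id|session_id"   # per-request labels
        - action: drop
          sourceLabels: [__name__]
          regex: "debug_.*"                # drop entire metric families
```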

Q

How do I monitor my monitoring?

A

Meta-monitoring is required but painful:

  • Expose ArgoCD metrics via ServiceMonitor
  • Run separate monitoring for GitOps health
  • Define SLIs/SLOs for sync success rates
  • External alerting for "is my cluster dead" scenarios

Because nothing's worse than discovering your monitoring was down during an outage.
