
Docker Registry Architecture

The Reality of GitLab Container Registry

GitLab's container registry solves one of the most annoying problems in DevOps: managing separate credentials for your code repo and your Docker images. Before this, you'd have to juggle Docker Hub credentials, set up separate authentication in CI/CD, and pray that the tokens didn't expire during a critical deployment.

The registry runs alongside your GitLab instance and uses the same permissions. If you can push code to the repo, you can push images to the registry. No more "docker login" commands scattered across your CI files, no more service accounts with mysterious permissions, no more authentication failures that break your builds at 2am.

It's built on Docker Distribution, which means it talks the same Registry HTTP API V2 as everything else. Your existing docker commands work: docker push, docker pull, all the shit you're already used to. The difference is authentication just works because it's integrated with GitLab's JWT system.
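Inside a GitLab CI job, that integration looks roughly like this - a minimal sketch using GitLab's predefined CI variables, assuming a Dockerfile sits in the repo root:

```bash
# Log in with the job's short-lived JWT, build, and push.
# CI_REGISTRY, CI_REGISTRY_IMAGE, CI_JOB_TOKEN, and CI_COMMIT_SHORT_SHA are
# predefined GitLab CI variables available in every job.
echo "$CI_JOB_TOKEN" | docker login -u gitlab-ci-token --password-stdin "$CI_REGISTRY"
docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
```

The token is scoped to the project and expires when the job ends, which is exactly why you stop caring about rotating registry credentials.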

How It Actually Works (And Where It Breaks)

The registry runs as a separate service, usually on port 5000, talking to GitLab through JWT tokens. When you docker push, GitLab generates a token with your repo permissions and hands it to the registry. Simple enough, until clock skew between servers makes tokens invalid and you get hit with HTTP 401 Unauthorized: authentication required errors at 3am. Clock drift of more than 5 minutes breaks JWT validation completely.
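If you're chasing those 401s, a quick sanity check is to compare clocks on the GitLab host and the registry host and confirm NTP is actually running - a rough sketch, assuming systemd hosts with chrony installed:

```bash
# Compare UTC time on each host; more than a few minutes of drift
# and JWT validation starts failing.
date -u

# Confirm the host is actually syncing to NTP (systemd systems).
timedatectl status | grep -i "synchronized"

# If chrony is installed, show the measured offset from the NTP source.
chronyc tracking
```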

Storage is where things get expensive fast. You can run it on local filesystem for dev, but production means S3 or equivalent. Enable lifecycle policies immediately or your storage bill will make you cry. I've seen orgs hit $50k/month in S3 costs because no one set up cleanup policies and developers kept pushing 2GB images with every commit. Use multi-stage builds and proper storage optimization to avoid this nightmare.
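Project-level cleanup policies can be set through the API as well as the UI. A hedged sketch against a hypothetical self-hosted instance (host, token, and project ID are placeholders):

```bash
# Set a weekly cleanup policy: keep the 10 newest tags per image,
# delete anything older than 30 days.
curl --request PUT --header "PRIVATE-TOKEN: <your_token>" \
  --data "container_expiration_policy_attributes[enabled]=true" \
  --data "container_expiration_policy_attributes[cadence]=7d" \
  --data "container_expiration_policy_attributes[keep_n]=10" \
  --data "container_expiration_policy_attributes[older_than]=30d" \
  --data "container_expiration_policy_attributes[name_regex_delete]=.*" \
  "https://gitlab.example.com/api/v4/projects/<project_id>"
```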

The registry authenticates through GitLab's main auth system, which sounds great until LDAP is down and no one can deploy. At least with Docker Hub, when their auth breaks, it's their problem.

The Metadata Database Finally Fixes the Garbage Collection Nightmare

GitLab 17.3 introduced a metadata database that moves registry metadata from object storage into PostgreSQL. This finally fixes the garbage collection problem that's been making ops teams miserable for years.

Before this, cleaning up old images meant taking the entire registry offline - coordinating with every team, setting maintenance windows, and inevitably someone's deployment would break because they didn't get the memo. Online garbage collection runs in the background now, cleaning up orphaned layers without downtime.

The migration to the metadata database is scary as hell for production systems, though. You're basically moving the registry's brain from file-based storage to PostgreSQL. The migration process can take hours or days depending on how much shit you've accumulated, and there's no rollback if something goes wrong.

The metadata database also gives you storage usage metrics that actually work. Before this, figuring out which projects were eating your storage budget meant parsing S3 logs like a caveman. You finally get real-time storage reports, project-level usage breakdowns, and automated cleanup alerts instead of surprise billing.
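A rough way to see what each project is holding is the registry repositories API - a sketch with placeholder host, token, and project ID:

```bash
# List a project's container registry repositories with per-repository
# tag counts via the GitLab REST API.
curl --header "PRIVATE-TOKEN: <your_token>" \
  "https://gitlab.example.com/api/v4/projects/<project_id>/registry/repositories?tags_count=true"
```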

Security Scanning (And the False Positive Hell)

Security scanning runs automatically with Trivy built in. It'll find vulnerabilities in your images, then you'll spend 3 hours figuring out which ones actually matter and which are false positives. Container scanning happens during CI builds and dumps results into GitLab's security dashboard.
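If you want to see what the CI scan will complain about before it blocks a merge request, you can run Trivy against the pushed image yourself - a sketch with a placeholder image path:

```bash
# Log in first so Trivy can pull from the private registry, then scan
# for high/critical findings only.
docker login registry.gitlab.com
trivy image --severity HIGH,CRITICAL \
  registry.gitlab.com/mygroup/myproject/myimage:latest
```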

The good news is access control just works with GitLab's existing permissions. If you can push to the repo, you can push images. If you can read the project, you can pull images. No separate ACL bullshit to maintain. Project-level permissions control registry access automatically.

The bad news is vulnerability scanning can slow down your builds significantly, especially for large images. You can disable it, but then security teams get grumpy. You can configure it to only scan certain branches, but then you miss vulnerabilities in development. There's no perfect solution - just different levels of pain. The scanner finds CVE-2023-44487 (HTTP/2 Rapid Reset) in every image using Alpine 3.17, but you can't fix it because updating breaks your application dependencies. Configure security policies and vulnerability management for the full security theater experience.

GitLab Container Registry vs Major Alternatives

| Feature | GitLab Container Registry | Harbor | JFrog Artifactory | Docker Hub | AWS ECR | Azure ACR |
|---|---|---|---|---|---|---|
| OCI Compliance | Full OCI v1.1 | Full OCI v2.0 | Full OCI v1.1 | Full OCI | Full OCI | Full OCI v1.1 |
| Deployment Model | SaaS + Self-hosted | Self-hosted only | Self-hosted + Cloud | SaaS only | SaaS only | SaaS only |
| Vulnerability Scanning | Trivy (built-in) | Trivy, Clair | Xray (commercial) | Snyk (paid) | Inspector | Qualys VMDR |
| CI/CD Integration | Native GitLab CI/CD | Webhook-based | Multiple platforms | Limited | AWS native | Azure native |
| Storage Backend | File/S3/GCS/Azure | File/S3/GCS/Azure | Multiple backends | Proprietary | S3-based | Azure Blob |
| Access Control | GitLab RBAC | Project-based RBAC | Complex ACL system | Org/Team based | AWS IAM | Azure AD/RBAC |
| Pricing Model | Usage-based/Free tier | Open source | Per user/feature | Free/Pro tiers | Pay per usage | Pay per usage |
| Multi-format Support | Container images only | OCI artifacts | 30+ package types | Container images | Container images | Multi-format |
| Image Signing | Built-in (metadata DB) | Built-in | Built-in | Third-party | Third-party | Third-party |
| Garbage Collection | Online (zero-downtime) | Manual/scheduled | Automatic | N/A | Lifecycle rules | Lifecycle rules |
| Geographic Distribution | Limited | Manual replication | Global CDN | Global CDN | Regional | Global replication |
| API Compatibility | Docker Registry v2 | Docker Registry v2 | Multiple APIs | Docker Registry v2 | Docker Registry v2 | Docker Registry v2 |

Production Reality: Storage Bills and Performance Hell

Running GitLab's registry in production means dealing with two main problems: storage costs spiraling out of control and performance degrading as you scale. You can run it as SaaS on GitLab.com (their problem), or self-hosted (your problem). Most enterprises end up self-hosted because of compliance requirements, which means you get to deal with all the operational complexity.

[Diagram: Docker system components]

Storage Backend Hell (And How to Not Go Broke)

Local filesystem storage works for dev environments. Production needs cloud object storage - S3, GCS, or Azure Blob - which is where your storage bill starts growing like a cancer.

Here's what happens: developers push 2GB images for every commit, CI builds create temporary images that never get cleaned up, and multi-stage builds leave intermediate layers scattered everywhere. Without cleanup policies, you're looking at exponential storage growth. I've seen companies hit $200k/year in S3 costs before anyone noticed.

The v2 storage drivers are supposed to fix performance issues, but they're still in beta and have been breaking production deployments. The S3 v2 driver broke in GitLab 17.2 when using IAM roles, throwing NoCredentialProviders: no valid providers in chain errors that took three days to debug.

Enable lifecycle policies on your S3 bucket from day one. Set up storage quota limits per project. Configure automated cleanup policies, monitor storage usage analytics, implement retention policies, and check out cost optimization strategies. Your future self will thank you when the storage bill arrives.
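One lifecycle rule that's safe to add on day one is expiring abandoned multipart uploads, which pile up invisibly whenever pushes get interrupted. A sketch with a placeholder bucket name (don't expire live blobs with lifecycle rules; leave that to registry garbage collection):

```bash
# Abort incomplete multipart uploads after 7 days - a common silent
# cost leak on registry buckets.
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-registry-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "abort-stale-multipart-uploads",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
    }]
  }'
```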

Performance Degrades as You Scale (Shocking!)

If your registry is slow, it's probably because you're using filesystem storage and didn't configure the metadata database. This will bite you when you hit 100+ repositories and docker pull starts taking 30 seconds because it's enumerating layers through object storage API calls.

The metadata database actually works well once you get through the migration process. Tag listing becomes fast, cleanup policies run without timing out, and you get real storage metrics instead of "contact your S3 admin" nonsense.

For scaling beyond a single instance, you can run multiple registry instances behind a load balancer. This works until you hit database contention, and then your registry is slow because PostgreSQL is the bottleneck. Redis caching helps with reads, but doesn't solve the fundamental issue that everyone's hitting the same database.

Network performance is where things get weird. The registry can redirect downloads to S3 directly, which reduces your bandwidth costs but breaks in air-gapped environments. You can configure CDN integration and proxy caching, but you're choosing between bandwidth costs and operational complexity.

Enterprise Features (Good Luck Getting IT Approval for the Database)

Protected container repositories in GitLab 17.8 let you lock down who can push to production registries. This is actually useful when you want to prevent developers from directly pushing to prod images and bypassing your entire CI/CD process.

The compliance features integrate with GitLab's audit logging, which generates massive amounts of logs that no one ever reads until the security audit. SBOM generation happens automatically, creating JSON files full of software bill of materials data that satisfies checkbox compliance but doesn't actually improve security.

Cleanup policies can be set at project, group, or instance level, which sounds great until you realize different teams need different retention policies and you're stuck maintaining a complex hierarchy of rules. The policies work with online garbage collection, assuming you've migrated to the metadata database and haven't hit any of the migration edge cases that leave your registry in an inconsistent state. You'll end up dealing with compliance frameworks and enterprise authentication headaches.

Questions from Engineers Who Actually Use This Shit

Q

Why does my docker push randomly fail with "unauthorized" errors?

A

Your GitLab token expired, or there's clock skew between your GitLab server and the registry. The exact error is Error response from daemon: Head https://registry.gitlab.com/v2/myproject/myimage/manifests/latest: unauthorized: HTTP Basic: Access denied. Try docker login registry.gitlab.com again, or if you're using CI, check that your CI runner time is synchronized. Clock drift over 5 minutes breaks JWT validation completely.

Q

Why is my storage bill $50k this month?

A

Because you didn't set up cleanup policies and your developers have been pushing 2GB images for every commit. Self-hosted instances have no storage limits by default, which is great until your S3 bill arrives. Set up lifecycle policies on your S3 bucket immediately, configure project-level retention policies, and educate developers about multi-stage builds to reduce image size.

Q

Why is vulnerability scanning slowing down my builds by 10 minutes?

A

Container scanning with Trivy runs during your CI builds and scans every layer of your image for vulnerabilities. It's thorough but slow, especially for large images. You can speed it up by scanning only on main branch, using smaller base images, or disabling it for development branches. The security team won't be happy, but your developers will stop complaining about slow builds.

Q

Can I use this with Jenkins/GitHub Actions/other CI systems?

A

Yes, it speaks the same Docker Registry HTTP API V2 as everything else. You'll need to create deploy tokens or personal access tokens for authentication. External CI systems work fine, but you lose the tight integration that makes GitLab's registry actually useful.
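From an external CI system the flow is just a deploy token and a normal docker login - a sketch with placeholder token variables and image path:

```bash
# DEPLOY_TOKEN_USERNAME / DEPLOY_TOKEN are placeholders for a deploy token
# created with read_registry and write_registry scopes.
echo "$DEPLOY_TOKEN" | docker login registry.gitlab.com \
  -u "$DEPLOY_TOKEN_USERNAME" --password-stdin
docker push registry.gitlab.com/mygroup/myproject/myimage:1.2.3
```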

Q

Should I migrate to the metadata database? (Spoiler: Yes, but it's scary)

A

The metadata database migration is scary as hell but necessary. You're moving the registry's entire brain from object storage to PostgreSQL. The migration can take hours or days, there's no rollback, and if it fails you might be fucked. But garbage collection finally works without downtime, and performance actually improves. Plan for a maintenance window anyway, despite what the docs say about online migration.

Q

Why can't I delete this damn image?

A

Probably because other images reference the same layers, or you have dangling manifests that aren't cleaned up yet. Online garbage collection runs every 24 hours, not immediately. You can manually trigger cleanup, but it might take multiple runs to actually free up space. This is why storage bills grow faster than you can delete images.
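For reference, deleting a single tag through the API looks like this (host, token, and IDs are placeholders). It removes the tag reference immediately, but the underlying layers hang around until garbage collection gets to them:

```bash
# Delete one tag from a project's registry repository.
curl --request DELETE --header "PRIVATE-TOKEN: <your_token>" \
  "https://gitlab.example.com/api/v4/projects/<project_id>/registry/repositories/<repository_id>/tags/<tag_name>"
```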

Q

How do I migrate 500 images from Docker Hub without losing my sanity?

A

Use Skopeo for bulk migrations: skopeo copy docker://docker.io/myimage docker://registry.gitlab.com/myproject/myimage. Don't use docker pull/tag/push for large migrations unless you enjoy waiting hours and hitting rate limits. Plan for downtime anyway because something always breaks during migration.
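For a few hundred images, a loop over skopeo copy is the least painful approach - a sketch assuming a plain-text image list and placeholder credentials and paths:

```bash
# images.txt holds one "name:tag" per line. --all copies every
# architecture in a manifest list, not just the host's platform.
while read -r image; do
  skopeo copy --all \
    --src-creds "$HUB_USER:$HUB_TOKEN" \
    --dest-creds "$GITLAB_USER:$GITLAB_TOKEN" \
    "docker://docker.io/${image}" \
    "docker://registry.gitlab.com/mygroup/myproject/${image}"
done < images.txt
```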

Q

Which storage backend should I use?

A

Local filesystem for dev/testing. S3 for production, unless you want your registry to die when the disk fills up. The v2 storage drivers are supposed to be better but they're still beta and have authentication quirks. Stick with the legacy drivers unless you're feeling adventurous.

Q

Why are my cleanup policies not freeing up space?

A

Because cleanup policies only mark images for deletion; garbage collection actually removes the data. If you're not on the metadata database, garbage collection requires downtime. If you are, it runs daily but can take multiple cycles to clean up shared layers. Storage bills lag behind actual cleanup by weeks.

