Platform engineering was supposed to solve the DevOps chaos. Instead of developers clicking around AWS consoles at 3 AM or writing one-off Terraform that works exactly once, platform teams would build internal developer platforms that actually let developers self-serve infrastructure.
The theory was beautiful. The practice has been a goddamn nightmare.
Turns out what I've been saying for two years is finally sinking in: most Backstage installations collect dust while engineers still SSH into production to fix shit. I've seen this personally - organizations blow 12-18 months building developer portals that nobody uses. There's a massive "Backstage backlash" happening as teams realize they've been building expensive tech demos instead of platforms.
The Fundamental Problem: Portal-First Thinking
Most platform teams start by asking "what portal should we build?" This is fucking backwards. Portals are just the frontend. Building a platform is like building an application - you need both a frontend and a robust backend. Starting with the UI is like building a house by putting up the front door first.
I've watched teams spend 8 months building a gorgeous Backstage catalog that lists services nobody can actually deploy. The "Create New Service" button submits a Jira ticket to the ops team. That's not self-service - that's a $300K form.
The portal-first approach fails because:
Infrastructure Anarchy: Without standardized infrastructure building blocks, developers still click around cloud consoles or write ad-hoc scripts that break in production. The portal becomes a pretty wrapper around the same manual chaos.
Backend Logic in Frontend: Teams cram business logic into Backstage plugins or custom portal code, violating basic application architecture principles. I've debugged Backstage plugins at 2 AM - when everything is in the frontend, nothing works reliably. Ever try to figure out why a TypeScript scaffolding template is generating malformed YAML? It's a special kind of hell.
No Real Self-Service: True self-service requires programmable infrastructure primitives that can be composed into higher-level abstractions. Without this foundation, "self-service" actually means "please submit a ticket and wait 3 days."
Operations Nightmare: Platform teams become glorified ticket handlers, manually provisioning infrastructure through the portal. The ops team workload increases instead of decreasing - I've seen platform teams that handle way more tickets after building their "self-service" portal.
The Missing Foundation: Infrastructure-First Architecture
Successful platform engineering starts with the infrastructure layer, not the portal layer. You need standardized, reusable infrastructure components that can be consumed programmatically before you worry about user interfaces.
Most teams get this completely backwards.
The infrastructure-first approach means:
- Standardized Building Blocks: Create reusable infrastructure components that encapsulate best practices, security policies, and operational requirements
- Programmable APIs: Enable infrastructure consumption through code, not just UI clicks
- Golden Paths: Provide opinionated templates that guide developers toward best practices while maintaining flexibility
- Policy Enforcement: Embed compliance and security rules directly into the infrastructure components
Only after establishing this foundation should you add portal interfaces on top.
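To make the building-block idea concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: `Bucket`, `Database`, and `WebService` are illustration names, not the API of any real tool, and the defaults (private, encrypted, 7-day backups) are assumed stand-ins for whatever your organization's policies actually are. The point is only the shape: primitives carry secure defaults, and a golden path composes them so developers describe what they need instead of how to build it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """Hypothetical storage primitive with secure defaults baked in."""
    name: str
    owner: str
    public: bool = False
    encrypted: bool = True


@dataclass(frozen=True)
class Database:
    """Hypothetical database primitive; private and backed up by default."""
    name: str
    owner: str
    publicly_accessible: bool = False
    backup_retention_days: int = 7  # assumed default, not from the article


@dataclass(frozen=True)
class WebService:
    """Golden path: one opinionated way to ask for a web service."""
    name: str
    team: str
    needs_database: bool = False
    needs_assets_bucket: bool = False

    def resources(self) -> list:
        """Compose primitives into the resources this service needs."""
        out = []
        if self.needs_database:
            out.append(Database(name=f"{self.name}-db", owner=self.team))
        if self.needs_assets_bucket:
            out.append(Bucket(name=f"{self.name}-assets", owner=self.team))
        return out


# A developer states intent in code; the building blocks fill in the rest.
checkout = WebService(name="checkout", team="payments", needs_database=True)
plan = checkout.resources()
```

Because the request is just data produced by code, a portal, a CLI, or a CI job can all call the same `WebService` definition. That is what makes the portal an optional frontend rather than the platform itself.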
Platform Engineering Finally Gets Its Shit Together
The platform engineering space is finally maturing past the "let's install Backstage" phase. Here's what's actually happening:
Portal Backlash is Real: Teams are realizing that Backstage is not your platform. Portals are interfaces to platforms, not platforms themselves. I've watched three different companies spend a year customizing Backstage plugins just to discover they still can't actually deploy anything.
Console Access is Going Away: Developers are losing direct access to infrastructure. The days of unrestricted cloud console access are ending - partly because security finally got tired of explaining why the intern has Admin access to production S3 buckets.
Everyone Has to Win: Platform initiatives that only help developers while making ops teams' lives worse are failing. You can't just move the complexity around - you have to actually eliminate it.
Real Platform Engineering Success Stories
Organizations that get platform engineering right share common patterns:
BMW scaled their platform to support 11,000+ developers and hundreds of thousands of builds daily using hybrid cloud infrastructure. They abstracted complex infrastructure into standardized, repeatable components instead of letting each team build their own solutions.
Unity reduced deployment time from weeks to hours - an 80% improvement - by implementing standardized infrastructure components that developers could self-serve. The key insight: they built reusable infrastructure libraries once instead of each team rolling their own solutions.
Mercedes-Benz eliminated most of their manual infrastructure operations by building reusable components that developers could self-serve through code. Note: they still handle the edge cases manually, but the common 80% of use cases is now automated.
The common thread: these organizations started with standardized infrastructure building blocks and added user interfaces later, not the other way around.
The Cost of Getting It Wrong
I've watched platform engineering initiatives burn money like a dumpster fire. Here's the real cost breakdown:
- Engineer Time: Senior engineers spend way too many hours every week building DIY solutions instead of shipping features. When you're paying senior engineer salaries for infrastructure firefighting, that adds up fast.
- Infrastructure Tickets: Development teams submit 50+ infrastructure tickets per month, each taking 2-4 hours to fulfill. Platform teams hire more ops engineers to handle the "self-service" workload.
- Security Incidents: Manual processes and inconsistent configurations invite mistakes. I've seen a company get breached because someone fat-fingered an S3 bucket policy in the "self-service" portal, making customer data public for 6 hours before anyone noticed.
- Cloud Waste: Unmanaged resources and oversized instances pile up. Developers provision c5.4xlarge instances for development because the portal doesn't have guardrails, then forget they exist until the monthly AWS bill shows up.
Real talk: a typical failed platform engineering project burns through anywhere from half a mil to a couple million just in engineer salaries. That's not counting the business impact when features get delayed by months, or the fallout from a security incident like an exposed S3 bucket. All for a platform that ends up being a glorified bookmark page.
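The ticket numbers above are worth running through a calculator. This back-of-the-envelope sketch uses the article's figures (50 tickets a month, 2-4 hours each); the hourly rate is a made-up placeholder, so swap in your own loaded cost before quoting the dollar result to anyone.

```python
def ticket_hours(tickets_per_month: int, hours_low: float, hours_high: float):
    """Return the (low, high) range of ops hours spent on tickets per month."""
    return tickets_per_month * hours_low, tickets_per_month * hours_high


def yearly_cost(hours_per_month: float, hourly_rate: float) -> float:
    """Annualize a monthly hour count at a given hourly rate."""
    return hours_per_month * 12 * hourly_rate


# Figures from the list above: 50 tickets a month at 2 to 4 hours each.
low, high = ticket_hours(50, 2, 4)  # 100 to 200 hours per month

# ASSUMPTION: a hypothetical loaded hourly rate, not a number from the article.
ASSUMED_HOURLY_RATE = 100.0
cost_low = yearly_cost(low, ASSUMED_HOURLY_RATE)
cost_high = yearly_cost(high, ASSUMED_HOURLY_RATE)
```

At the low end that is 100 hours a month of ticket handling, which is most of one full-time engineer doing nothing but fulfilling "self-service" requests.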
The Infrastructure-First Alternative
Platform engineering done right eliminates these costs by focusing on what actually matters:
Build Once, Use Everywhere: Instead of 47 different ways to deploy a web app, you have one standardized component that works. Security policies and operational knowledge are baked in, not documented in a wiki that nobody reads.
Actual Self-Service: Developers can provision what they need without submitting tickets. The key word is "actual" - not just a fancy form that creates a ticket behind the scenes.
Policy as Code: Instead of hoping people follow the security guidelines, the infrastructure components enforce the rules. You can't create an internet-facing database by accident because the component won't let you.
Treat Infrastructure Like Software: Version control, testing, code review, CI/CD. Apply the same engineering rigor to infrastructure that you apply to applications.
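Here is a rough sketch of what "the component won't let you" can look like, again in plain Python with hypothetical names (`provision_database`, `provision_instance`, `PolicyViolation`) rather than any specific tool's API. The dev instance allow-list is an assumed example policy. The two small test functions at the bottom show the "treat infrastructure like software" half: the rules are code, so they can be reviewed and run in CI like anything else.

```python
from dataclasses import dataclass


class PolicyViolation(ValueError):
    """Raised when a request breaks a rule the platform enforces."""


# ASSUMPTION: an example allow-list; pick sizes that fit your own budget.
ALLOWED_DEV_INSTANCE_TYPES = {"t3.small", "t3.medium"}


@dataclass(frozen=True)
class DatabaseRequest:
    name: str
    environment: str
    internet_facing: bool = False


def provision_database(req: DatabaseRequest) -> dict:
    """Return a database definition, refusing internet-facing requests."""
    if req.internet_facing:
        raise PolicyViolation(f"{req.name}: databases may not be internet-facing")
    return {
        "name": req.name,
        "environment": req.environment,
        "publicly_accessible": False,
        "encrypted": True,
    }


def provision_instance(environment: str, instance_type: str) -> dict:
    """Return an instance definition, capping sizes in dev environments."""
    if environment == "dev" and instance_type not in ALLOWED_DEV_INSTANCE_TYPES:
        raise PolicyViolation(f"{instance_type} is not allowed in dev")
    return {"environment": environment, "instance_type": instance_type}


def test_public_database_is_rejected():
    try:
        provision_database(DatabaseRequest("orders", "prod", internet_facing=True))
    except PolicyViolation:
        return
    raise AssertionError("expected PolicyViolation")


def test_oversized_dev_instance_is_rejected():
    try:
        provision_instance("dev", "c5.4xlarge")
    except PolicyViolation:
        return
    raise AssertionError("expected PolicyViolation")
```

The design choice that matters is where the check lives: inside the component every request passes through, not in a wiki page or a portal form. A developer who asks for a c5.4xlarge in dev gets an error immediately instead of a surprise on the monthly bill.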
This infrastructure-first approach is exactly what Pulumi IDP was designed to enable. Instead of starting with portals and working backward to infrastructure, you start with solid infrastructure foundations and build user experiences on top.
The result: platform engineering that actually works at scale, with developer adoption that justifies the investment.