The Security Team Wants Everything, Yesterday
Remember when you showed the security team Temporal's basic auth? Yeah, that went about as well as suggesting we store passwords in plaintext. Now they want mTLS certificates for everything, SAML integration with the 47 different identity providers we somehow accumulated, and compliance docs that prove we're not running workflows on post-it notes.
I've been through this dance at three companies now. The first time I estimated 2 months and delivered in 11. The second time I was smarter but still underestimated certificate rotation hell. Third time? Still got burned by SCIM edge cases nobody documents.
The SCIM integration dropped in early 2025, which is great because manually provisioning users sucks. The improved API key system went GA and actually works now (unlike the beta, which had weird permission bugs).
The Four-Layer Security Stack That Won't Get You Fired
Here's the authentication stack that passed audit at two Fortune 500s:
1. mTLS for Infrastructure (Pain Level: Weekend Destroyer)
Mutual TLS sounds simple - every connection gets a client certificate. I thought so too. Then I spent a Saturday debugging why workers couldn't connect, only to discover that one fucking intermediate CA certificate wasn't in the trust store on exactly 3 out of 47 containers. The error message? "certificate verification failed" - super helpful, thanks OpenSSL.
2. API Keys for Applications (Pain Level: Manageable)
The GA API key system actually works now, which shocked me after the beta's permission fuckery. No more certificate dance for every SDK call. Still need to coordinate key rotation across 23 services when they expire, but at least rotation works without downtime.
3. SAML SSO for Humans (Pain Level: Depends on Your IdP)
SAML integration with Okta/Azure AD works fine... until someone enables conditional access and suddenly nobody can log in from the office. I spent 3 hours debugging this before realizing Azure's conditional access policy was blocking the SAML flow.
4. SCIM for User Lifecycle (Pain Level: Surprisingly Low)
SCIM actually works, unlike that shitshow we tried with Jenkins. When Bob from accounting gets fired, his access disappears automatically instead of lingering for 6 months. Takes 5-15 minutes to sync, which is way better than our old "email the ops team" process.
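If you're wondering what "his access disappears automatically" looks like on the wire, deprovisioning in SCIM 2.0 generally comes down to the IdP sending a PATCH that flips the user's `active` flag. Here's a minimal sketch of that request body in the generic form RFC 7644 defines. Whether your IdP sends exactly this shape (some put the flag inside `value` with no `path`) is an assumption to verify in its provisioning logs; this is an illustration, not Temporal's documented payload.

```python
import json

# Schema URN for PATCH requests, from RFC 7644 (SCIM 2.0 protocol).
PATCH_OP_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def build_deactivate_patch() -> dict:
    """Build a generic SCIM 2.0 PATCH body that marks a user as inactive."""
    return {
        "schemas": [PATCH_OP_SCHEMA],
        "Operations": [
            {"op": "replace", "path": "active", "value": False},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_deactivate_patch(), indent=2))
```

Knowing the shape helps when a sync silently fails: search the IdP's provisioning log for the PATCH and check what came back.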
Identity Provider Integration - What Actually Breaks
SAML SSO Reality Check
SAML setup works with Azure AD, Okta, and Google Workspace. Here's what the docs don't tell you:
- Azure AD: Works perfectly in testing, then production goes live and BAM - nobody can log in from the office because someone quietly enabled conditional access. The SAML response just says "access denied" with zero useful context. Pro tip: open browser dev tools and look at the actual SAML assertion to see what Azure is complaining about.
- Okta: Device trust policies fail randomly for MacBook users, especially anyone who dared to upgrade macOS. Error message is "Authentication failed" - real fucking helpful. Don't bother looking at Temporal's logs, the actual error is buried in Okta's system logs 3 clicks deep in their admin console.
- Google Workspace: The only IdP that actually works reliably, which is terrifying. Session timeouts are aggressive though - users get logged out every 12 hours by default.
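When a login fails and a SAML response does come back, the `SAMLResponse` form field in the browser's network tab is just base64-encoded XML. Here's a rough sketch for pulling the status code and message out of it, assuming the HTTP-POST binding and an unencrypted response. The function name is mine, and `xml.etree` is fine for poking at your own traffic but is not a hardened parser for untrusted input.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# XML namespace for SAML 2.0 protocol elements (OASIS standard).
SAMLP = "urn:oasis:names:tc:SAML:2.0:protocol"


def saml_status(saml_response_b64: str) -> Tuple[str, Optional[str]]:
    """Return (status code URI, optional status message) from a SAMLResponse.

    Expects the base64 form value used by the HTTP-POST binding.
    """
    xml_bytes = base64.b64decode(saml_response_b64)
    root = ET.fromstring(xml_bytes)
    code = root.find(f"{{{SAMLP}}}Status/{{{SAMLP}}}StatusCode")
    if code is None:
        raise ValueError("no StatusCode element found")
    message = root.find(f"{{{SAMLP}}}Status/{{{SAMLP}}}StatusMessage")
    text = message.text if message is not None else None
    return code.get("Value", ""), text
```

Anything other than a status ending in `:Success` means the IdP refused, and the message (when the IdP bothers to include one) is usually more specific than what the login page shows.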
Group mapping to Temporal roles works fine, just remember that role changes take up to 5 minutes to propagate.
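It helps to write the group-to-role mapping down as data before clicking through any admin console, mostly so you can answer "what happens when someone is in two groups?" ahead of time. A minimal sketch of that idea follows. The group names and role names are hypothetical, and the "highest role wins" rule is my assumption for illustration, not a statement of how Temporal resolves overlapping groups.

```python
# Hypothetical IdP group names mapped to illustrative role names.
# Neither side is taken from Temporal's docs; substitute your real values.
GROUP_TO_ROLE = {
    "temporal-admins": "admin",
    "temporal-developers": "developer",
    "temporal-viewers": "read-only",
}

# Higher number wins when a user belongs to several mapped groups.
ROLE_RANK = {"read-only": 0, "developer": 1, "admin": 2}


def resolve_role(groups):
    """Return the highest-ranked role for the user's groups, or None."""
    roles = [GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE]
    if not roles:
        return None  # unmapped users get no access rather than a default
    return max(roles, key=ROLE_RANK.__getitem__)


if __name__ == "__main__":
    print(resolve_role(["temporal-viewers", "temporal-admins"]))
    print(resolve_role(["accounting"]))
```

Keeping this table in version control also gives auditors a reviewable record of who was supposed to have which role.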
Service Accounts for Automation
Service accounts are how your CI/CD and monitoring systems authenticate. Key gotchas:
- API keys expire. Set calendar reminders or you'll get paged when deployments break.
- Namespace-level permissions are weird - you can't grant cross-namespace access even if you think you need it.
- The audit trail is actually useful for tracking down which system did what.
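Calendar reminders get ignored, so a scheduled check that yells about soon-to-expire keys works better. Below is a small sketch of that check. It assumes you already have an inventory of key names and expiry timestamps from somewhere (a hypothetical dict here); how you export that inventory is up to your tooling.

```python
from datetime import datetime, timedelta, timezone


def keys_expiring_soon(keys, now, warn_days=14):
    """Return names of keys that expire within warn_days, soonest first.

    keys maps a key name to its timezone-aware expiry datetime.
    Already-expired keys are included, since they need attention too.
    """
    cutoff = now + timedelta(days=warn_days)
    due = [(expiry, name) for name, expiry in keys.items() if expiry <= cutoff]
    return [name for _, name in sorted(due)]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical inventory for illustration.
    inventory = {
        "ci-deploy": now + timedelta(days=3),
        "monitoring": now + timedelta(days=60),
    }
    print(keys_expiring_soon(inventory, now))
```

Run it daily from the same CI system that depends on the keys, so the warning lands where the breakage would.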
Private Network Setup - The Expensive Peace of Mind
PrivateLink/Private Service Connect Setup
Your network team saw "workflows going over the public internet" in the architecture review and nearly had a stroke. Now you need AWS PrivateLink or Google Private Service Connect. It's $400/month of pure compliance theater, but it stops the network team from asking stupid questions.
Temporal takes 2-3 business days to provision the connection. I learned the hard way to test from EVERY subnet before going live. One subnet couldn't reach Temporal because of route table nonsense that took me and two network engineers 6 hours to figure out. The fix was changing one route priority.
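"Test from every subnet" is easier when the test is one tiny script you can drop on a host in each subnet. Here's a minimal sketch that only checks whether a TCP connection opens. It says nothing about TLS or auth, which fail differently, but it would have caught the route table problem.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False
```

Call it as `can_connect("<your private endpoint hostname>", 7233)`. 7233 is Temporal's usual gRPC port, but use whatever hostname and port your private connection actually exposes.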
Namespace Isolation
Namespaces are your security boundary. Each namespace has separate auth, separate workers, separate everything. Don't try to share namespaces between teams - it gets messy fast.
Certificate-Based Worker Auth
Workers use client certificates signed by your CA. This is where mTLS gets complicated:
- Certificate chain validation is picky about intermediate CAs
- Clock skew >5 minutes breaks everything
- Certificate rotation requires coordinated deployments across all workers
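The clock skew item is the sneaky one: a freshly issued certificate can be valid on the machine that minted it and "not yet valid" on a worker whose clock runs behind. A rough sketch of a check that flags certificates sitting too close to either end of their validity window is below. The 5-minute tolerance mirrors the number above, and the category names are mine.

```python
import ssl
from datetime import datetime, timedelta, timezone


def parse_cert_time(text):
    """Parse a time like 'Jan  5 09:34:43 2018 GMT' into an aware datetime."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(text), tz=timezone.utc)


def check_validity(not_before, not_after, now, skew=timedelta(minutes=5)):
    """Classify a certificate's validity window against a possibly skewed clock.

    Returns 'not-yet-valid', 'expired', 'near-boundary', or 'ok'.
    'near-boundary' means the cert is valid by this clock, but a peer whose
    clock is off by up to `skew` could reject it.
    """
    if now < not_before:
        return "not-yet-valid"
    if now > not_after:
        return "expired"
    if now - not_before < skew or not_after - now < skew:
        return "near-boundary"
    return "ok"
```

The timestamp format accepted by `parse_cert_time` is the `notBefore`/`notAfter` string format Python's `ssl` module reports for peer certificates, so you can feed it values straight from `getpeercert()`.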
Compliance Documentation - What Auditors Actually Want
SOC 2 Type II Reports
Temporal maintains a SOC 2 Type II attestation. The audit report is 120+ pages of security controls. Your auditors want to see this plus your own risk assessment of using Temporal.
GDPR and Data Residency
Multi-region options let you keep EU data in EU regions. But here's the gotcha: workflow execution data might temporarily traverse other regions during failover. Get this clarified in writing from Temporal.
Client-side encryption is mandatory for PII. Set this up early - retrofitting encryption is a nightmare.
Audit Logs for SIEM Integration
Audit logs capture everything and generate a shitload of data. Budget for log storage costs. The logs are JSON and integrate fine with Splunk/ELK/Datadog, but you'll need custom parsing rules.
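"Custom parsing rules" mostly means flattening nested JSON into the dotted field names your SIEM likes to index. Here's a minimal sketch, assuming the logs arrive as one JSON object per line. The field names in the usage example are made up, so check a real log sample before writing index mappings.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def parse_audit_lines(lines):
    """Yield one flat dict per JSON object line; skip blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # in production, count and alert on these instead
        if isinstance(event, dict):
            yield flatten(event)


if __name__ == "__main__":
    # Hypothetical event shape for illustration only.
    raw = ['{"operation": "UpdateUser", "principal": {"type": "user", "name": "bob"}}']
    for event in parse_audit_lines(raw):
        print(event)
```

Silently dropping malformed lines is fine for a sketch, but auditors will ask about gaps, so track the skip count in anything real.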
Real Implementation Costs Nobody Warns You About
Certificate Management Is Expensive
mTLS certificate lifecycle management isn't free:
- Certificate authority licensing: $10k+/year
- HSM for key storage: $5k+/month
- Staff time for rotation procedures: 4-6 hours per rotation
- Emergency rotation during incidents: 2-4 hours downtime
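To put the list above into one number for the budget meeting, here's the arithmetic as a tiny sketch. The dollar figures are the lower bounds from the list. The rotations-per-year count is not something I've given you, so it's a parameter you fill in yourself.

```python
CA_LICENSE_PER_YEAR = 10_000   # "$10k+/year" from the list above (lower bound)
HSM_PER_MONTH = 5_000          # "$5k+/month" from the list above (lower bound)
HOURS_PER_ROTATION = (4, 6)    # staff time range per rotation


def annual_floor_usd():
    """Lower bound on yearly spend for CA licensing plus HSM."""
    return CA_LICENSE_PER_YEAR + 12 * HSM_PER_MONTH


def annual_staff_hours(rotations_per_year):
    """(low, high) staff hours per year for a given rotation count."""
    low, high = HOURS_PER_ROTATION
    return low * rotations_per_year, high * rotations_per_year


if __name__ == "__main__":
    print(annual_floor_usd())        # 10,000 + 12 * 5,000
    print(annual_staff_hours(4))     # assuming quarterly rotation
```

That works out to at least $70k/year before anyone touches a certificate, plus 16-24 staff hours if you rotate quarterly.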
Performance Impact Is Real
Authentication adds 20-50ms per workflow start. Sounds trivial until you're doing 10k workflows/minute. We saw a 15% performance degradation with full mTLS compared to API keys.
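Here's why 20-50ms stops being trivial at that rate, as a back-of-the-envelope sketch. It multiplies the start rate by the per-start latency and uses Little's law (rate times latency) to estimate how many starts are stuck in authentication at any moment. It's a rough model that assumes the overhead is paid once per start, not a benchmark.

```python
def auth_overhead(starts_per_minute, latency_ms):
    """Return (cumulative auth seconds per minute, average starts in auth at once)."""
    seconds_per_minute = starts_per_minute * latency_ms / 1000
    in_flight = seconds_per_minute / 60  # Little's law: rate x latency
    return seconds_per_minute, in_flight


if __name__ == "__main__":
    for ms in (20, 50):
        total, in_flight = auth_overhead(10_000, ms)
        print(f"{ms}ms: {total:.0f}s of auth per minute, ~{in_flight:.1f} starts in auth")
```

At 10k starts/minute that is 200-500 seconds of cumulative auth time every minute, or roughly 3 to 8 starts waiting on authentication at any instant. If your client concurrency is capped near those numbers, the overhead eats directly into throughput.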
API Key Rotation Automation
90-day rotation sounds reasonable until you realize you have 47 services that need coordinated key updates. Automate this or you'll spend weekends rotating keys manually.
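The automation itself is mostly about ordering: create the new key, roll it out everywhere, confirm each service, and only then revoke the old one. A minimal sketch of that ordering is below. All four callables are hypothetical hooks you'd wire to your own secret store, deploy tooling, and key management API, and the approach assumes old and new keys can both be valid during the rollout.

```python
def rotate_key(service_names, create_key, deploy_key, verify, revoke_key, old_key_id):
    """Rotate one API key across services without a gap in valid credentials.

    create_key, deploy_key, verify, and revoke_key are hypothetical hooks
    supplied by the caller; nothing here talks to a real API.
    """
    new_key_id = create_key()  # old and new keys are both valid from here on
    for name in service_names:
        deploy_key(name, new_key_id)
        if not verify(name):
            # Stop before revoking anything: the old key still works everywhere.
            raise RuntimeError(f"{name} failed verification; old key left active")
    revoke_key(old_key_id)  # only after every service is confirmed
    return new_key_id
```

The important design choice is that a failed verification aborts before revocation, so a half-finished rotation leaves you with two valid keys instead of zero.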
The authentication stack above handles the "how do we secure access" problem. But once auditors show up, you'll discover that technical security is just the foundation. Real enterprise security means proving you're compliant with a dozen regulations you've never heard of.