I've been debugging Azure AI deployments for the past year and here's what nobody tells you about Foundry. Microsoft took their scattered AI services - OpenAI, Computer Vision, Speech, Document Intelligence - and crammed them into one platform. The good news: you don't have to manage 15 different service endpoints anymore. The bad news: your Azure bill is about to make your mortgage look reasonable.
How The New Architecture Actually Works
Instead of juggling separate services, Azure AI Foundry gives you one resource that handles everything. Under the hood it's a Microsoft.CognitiveServices/accounts resource with kind "AIServices" - think of it as a hub that connects to all the AI models and services you need.
The smart part is project isolation. Each AI app gets its own sandbox with dedicated managed identities and resource allocation. This fixes the biggest clusterfuck with shared services - when one project goes down, it doesn't kill everything else.
The Agent Service is Microsoft's attempt at building orchestration so you don't have to. It handles the coordination between models, tools, and data sources. Works great until you need to debug why it's doing something stupid. For complex scenarios, consider custom orchestration with Semantic Kernel or LangChain.
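If you do go the custom route, the core of it is just a dispatch loop that you control end to end. A minimal sketch - the tool names and dispatch shape here are made up for illustration, not anything from the Foundry SDK:

```python
# Minimal custom tool-routing loop: the kind of explicit control you give up
# when you let Agent Service decide which tool to invoke. All tool names and
# return values below are illustrative placeholders.
from typing import Callable

def lookup_order(order_id: str) -> str:
    # Stand-in for a real data-source call
    return f"Order {order_id}: shipped"

def search_docs(query: str) -> str:
    # Stand-in for a real knowledge-source call
    return f"Top doc hit for '{query}'"

TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_order": lookup_order,
    "search_docs": search_docs,
}

def dispatch(tool_name: str, arg: str) -> str:
    """Route a model-requested tool call; fail loudly instead of guessing."""
    if tool_name not in TOOLS:
        raise ValueError(f"Model requested unknown tool: {tool_name}")
    return TOOLS[tool_name](arg)
```

When the model does something stupid here, you can see exactly which branch it took - which is the whole argument for self-hosting the orchestration.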
GPT-5: The Reality Check
GPT-5 landed in Azure AI Foundry and it's actually impressive, but the regional limitations will make you question Microsoft's geographic strategy.
The Geographic Nightmare
GPT-5 needs special approval and only works in East US 2 and Sweden Central. That's it. No other regions. So if your users are in Asia or your compliance team wants EU data residency, you're looking at cross-region latency and extra costs.
The 20,000 tokens per minute limit means GPT-5 isn't viable for anything with real traffic. Peak hours? Good luck getting capacity. Microsoft doesn't guarantee anything, so your app might just time out with a helpful "RateLimitExceeded: Request was throttled. Please retry after 5 seconds" error.
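At minimum, wrap your calls in retry logic with exponential backoff and jitter. A rough sketch - the string match on the error message is an illustrative stand-in for whatever exception type your SDK actually raises:

```python
import random
import time

def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a throttled call with exponential backoff plus jitter.

    `fn` is any zero-argument callable. The 'RateLimitExceeded' substring
    check is a placeholder -- swap it for your SDK's actual throttling
    exception type in real code.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            # Re-raise immediately for non-throttling errors or on the last try
            if "RateLimitExceeded" not in str(exc) or attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

Honor the `Retry-After` value from the response when your SDK surfaces it; blind backoff is the fallback, not the ideal.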
Which Model Actually Makes Sense
There are four GPT-5 variants and the pricing will hurt:
- gpt-5: Costs more than a weekend in Vegas - $10 per million output tokens. Save this for when you actually need the brain power.
- gpt-5-mini: Still pricey but manageable - probably what you want for most stuff
- gpt-5-nano: Actually affordable for once - good for simple tasks where speed matters
- gpt-5-chat: Same wallet-draining pricing as regular GPT-5, just optimized for conversations
Use gpt-5-mini for 80% of your use cases. The expensive models will bankrupt your project if you're not careful.
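A quick back-of-envelope helper for comparing variants before you commit. Only the $10/1M output figure for gpt-5 (and gpt-5-chat, per the list above) comes from this article; the mini and nano numbers are placeholders you should swap for the current price sheet:

```python
# Output-token cost comparison across the GPT-5 variants. The $10.00/1M
# figures come from the pricing notes above; the mini/nano prices are
# placeholder assumptions -- replace them with your region's actual rates.
OUTPUT_PRICE_PER_M = {
    "gpt-5": 10.00,       # from the pricing note above
    "gpt-5-chat": 10.00,  # same pricing as gpt-5, per the note above
    "gpt-5-mini": 2.00,   # placeholder assumption
    "gpt-5-nano": 0.40,   # placeholder assumption
}

def monthly_output_cost(model: str, output_tokens_per_day: int) -> float:
    """Estimated monthly spend on output tokens alone (30-day month)."""
    return OUTPUT_PRICE_PER_M[model] * output_tokens_per_day * 30 / 1_000_000
```

Run your actual daily volume through this before picking a default model; at 5M output tokens a day, the gap between gpt-5 and gpt-5-mini is four figures a month.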
Security: Where Microsoft's Docs Fall Short
Microsoft's reference architecture is comprehensive but misses the important operational shit that will bite you in production:
Private Endpoints Are Mandatory
Your security team will demand private endpoints and they're right. This means:
- No public internet access to your AI services (good)
- Azure Firewall to control what your agents can reach (necessary but painful)
- Hub-and-spoke network topology (expensive but proper)
- DNS private zones that you'll spend a day configuring
The gotcha: your developers can't access the portal without VPN once you enable private endpoints. Prepare for help desk tickets on Monday morning when half your team forgets.
Identity Management Pain Points
The standard agent setup gets messy fast. Each project gets a managed identity that has broad permissions on its Cosmos DB, Storage, and AI Search. This is both a blessing and a curse.
Your AI Foundry Setup:
├── Customer Service Project
│   ├── Managed Identity → can read/write customer Cosmos DB
│   └── Cosmos DB (customer-service-cosmos)
├── HR Analytics Project
│   ├── Managed Identity → can read/write HR Cosmos DB
│   └── Cosmos DB (hr-analytics-cosmos)
└── Shared Key Vault (because someone has to store secrets)
The problem: those managed identities have elevated permissions they probably don't need. Don't share resources between projects or you'll regret it when one project brings down another.
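A cheap way to keep yourself honest is a CI check that flags any backing resource referenced by more than one project. A sketch with made-up project and resource names, where the shared Key Vault gets an explicit exemption:

```python
# Flag backing resources shared between projects -- the isolation rule above.
# Project and resource names here are illustrative placeholders.
from collections import defaultdict

def find_shared_resources(projects: dict, allowed_shared=frozenset()) -> dict:
    """Map each resource used by 2+ projects to the offending project list,
    skipping anything on the explicit allowlist (e.g. a shared Key Vault)."""
    users = defaultdict(list)
    for project, resources in projects.items():
        for res in resources:
            users[res].append(project)
    return {
        res: sorted(p)
        for res, p in users.items()
        if len(p) > 1 and res not in allowed_shared
    }

# Mirrors the layout sketched above
layout = {
    "customer-service": {"customer-service-cosmos", "shared-kv"},
    "hr-analytics": {"hr-analytics-cosmos", "shared-kv"},
}
```

Run it against your IaC output on every deploy; accidental sharing tends to creep in through copy-pasted templates.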
Where Your Azure Bill Gets Scary
Microsoft's pricing calculator doesn't show the full picture. Here's where your money goes:
Pay-per-token vs Reserved Capacity
| Deployment Type | When to Use | Cost Structure | Reality Check |
|---|---|---|---|
| Standard | Development, testing | $0.25-$10.00/1M tokens | Bills spike unexpectedly, gets throttled |
| Provisioned (PTU) | Production with steady load | ~$1-50/hour per model | 70% cheaper if you can predict usage |
| Data Zone | Compliance requirements | 10% more expensive | Limited regions, necessary evil |
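The Standard-vs-PTU call is just arithmetic: PTU wins once your token volume makes the flat hourly rate the cheaper option. A sketch with assumed rates - plug in your actual negotiated numbers:

```python
# Break-even check between Standard (pay-per-token) and Provisioned (PTU)
# deployments. The rates passed in are assumptions pulled from the ranges
# in the table above, not quoted prices.
HOURS_PER_MONTH = 730

def standard_monthly(tokens_per_month: int, price_per_m: float) -> float:
    """Pay-per-token monthly cost."""
    return tokens_per_month * price_per_m / 1_000_000

def ptu_monthly(hourly_rate: float) -> float:
    """Flat provisioned-throughput monthly cost."""
    return hourly_rate * HOURS_PER_MONTH

def ptu_is_cheaper(tokens_per_month: int, price_per_m: float,
                   hourly_rate: float) -> bool:
    return ptu_monthly(hourly_rate) < standard_monthly(tokens_per_month, price_per_m)
```

The catch, as the table says: PTU only pays off if you can actually predict that volume. Over-provisioning a PTU is the most expensive way to serve zero traffic.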
The Infrastructure Tax
Agent Service drags along these dependencies whether you want them or not:
- Cosmos DB: Starts around $200/month, easily hits $800-1,200 for real workloads
- AI Search: You need S1 minimum for production - budget $400-600/month
- Storage: Depends on how much crap your users upload, usually $100-300/month
- Networking: Private Link + VPN + Firewall adds up to $500-700/month
Plan on dropping at least 2 grand a month, probably closer to 4-5K, before you even touch any AI models. Yeah, it'll make your CFO cry into their spreadsheets.
Deployment Patterns That Scale
Understanding the infrastructure costs is just the first step. Now let's talk about how to actually architect these deployments for production reliability without bankrupting your organization.
Multi-Environment Strategy
Development Environment:
- Single AI Foundry account
- Standard deployments
- Shared dependencies
- Cost: ~$300-500/month
Production Environment:
- Dedicated AI Foundry account
- Provisioned throughput (PTU)
- Zone-redundant dependencies
- Private endpoints + Firewall
- Cost: ~$2,000-8,000/month
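It's worth encoding that split as a guardrail so nobody ships a production environment with dev-grade settings. A tiny sketch - the feature-flag names are illustrative, not real Foundry properties:

```python
# Refuse a "production" environment that's missing the controls listed
# above. Flag names are illustrative placeholders for your own IaC fields.
REQUIRED_FOR_PROD = {"provisioned_throughput", "private_endpoints", "zone_redundant"}

def validate_env(name: str, features: set) -> list:
    """Return the production requirements this environment is missing.
    Non-production environments get a pass (dev runs cheap on purpose)."""
    if name != "production":
        return []
    return sorted(REQUIRED_FOR_PROD - features)
```

Wire it into your deployment pipeline as a pre-flight check; a failed assertion is cheaper than a post-incident review.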
High Availability Configuration
For enterprise SLAs, deploy across availability zones with proper disaster recovery planning:
- Azure Cosmos DB: Zone redundancy enabled
- AI Search: 3+ replicas across zones
- Storage: Zone-redundant storage (ZRS)
- Models: Global deployment + Data Zone fallback
Consider using Azure Site Recovery for critical workloads and Azure Backup for data protection.
Performance and Reliability Considerations
Agent Service Limitations
Foundry Agent Service provides codeless orchestration but with trade-offs:
- Nondeterministic behavior: Agents may invoke any connected tool/knowledge source unpredictably
- Limited customization: No fine-grained control over orchestration flow
- Token usage: No built-in limits on max_tokens or request throttling
- Debugging complexity: Less visibility into orchestration decisions
For applications requiring deterministic behavior or cost control, consider self-hosted orchestration using Semantic Kernel.
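If you self-host the orchestration, you can at least enforce the spend cap the platform won't. A minimal running-budget sketch; the cap and cutoff policy are yours to choose:

```python
# Simple running token budget for a self-hosted orchestration loop --
# the cost control Agent Service doesn't give you. Numbers are examples.
class TokenBudget:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        """Record usage; raise *before* the budget is exceeded, not after."""
        if self.used + tokens > self.max_tokens:
            raise RuntimeError(
                f"Budget exhausted: {self.used} used, {tokens} requested, "
                f"cap {self.max_tokens}"
            )
        self.used += tokens
```

Charge the budget from the usage block each model response returns, and decide per-application whether exhaustion means hard-fail or degrade to a cheaper model.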
Monitoring and Observability
Production deployments require comprehensive monitoring beyond basic metrics:
- Token usage tracking across all models and projects
- Agent invocation patterns and tool usage analytics
- Performance monitoring for all dependencies (Cosmos DB RU consumption, AI Search query latency)
- Security monitoring for potential jailbreak attempts via Azure AI Content Safety
The reality: Most enterprises underestimate the operational overhead of monitoring distributed AI services at scale.
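The token-tracking piece doesn't need to be fancy to be useful. A minimal per-project accounting sketch - project and model names are illustrative, and in production this would feed your metrics pipeline rather than an in-memory dict:

```python
# Minimal per-project, per-model token accounting -- the kind of tracking
# the platform doesn't hand you for free. In-memory for illustration only.
from collections import defaultdict

class UsageTracker:
    def __init__(self):
        self._usage = defaultdict(int)  # (project, model) -> total tokens

    def record(self, project: str, model: str, tokens: int) -> None:
        self._usage[(project, model)] += tokens

    def by_project(self, project: str) -> int:
        """Total tokens across all models for one project."""
        return sum(t for (p, _), t in self._usage.items() if p == project)
```

Per-project totals are what make chargeback conversations with other teams possible; per-model totals are what catch an agent quietly routing everything to the expensive deployment.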
Migration Strategy from Legacy Azure AI Services
If you're currently using individual Azure AI Services (Azure OpenAI, Computer Vision, etc.), migration to Azure AI Foundry requires careful planning:
Assessment Phase (2-4 weeks)
- Inventory existing services and their regional deployments
- Map current authentication patterns (keys vs managed identity)
- Document integration points and custom orchestration logic
- Calculate current monthly costs vs Foundry pricing
Migration Phase (6-12 weeks)
- Establish Foundry account in target regions
- Set up private networking and security controls
- Deploy parallel infrastructure without cutting over traffic
- Test Agent Service vs custom orchestration performance
- Gradual traffic migration with rollback capability
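The gradual-cutover step is usually just weighted routing with a kill switch. A sketch - deterministic hashing keeps each user pinned to one side, and setting the weight to 0 is your rollback:

```python
# Deterministic percentage rollout for the gradual-migration step above.
# Hashing the user ID keeps each user consistently on one stack; dropping
# foundry_percent to 0 rolls everyone back instantly.
import hashlib

def route_to_foundry(user_id: str, foundry_percent: int) -> bool:
    """True if this user's traffic goes to the new Foundry stack."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < foundry_percent
```

Start at a single-digit percentage, watch latency and cost on the new stack, and only then ratchet the weight up.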
Key decision point: Organizations with complex custom orchestration logic should evaluate whether Agent Service meets their needs before committing to migration.
The decision ultimately comes down to operational complexity vs. cost control. Microsoft unified their AI services and created a decent platform, but expect your baseline costs to increase significantly. The platform simplifies management at the cost of flexibility and predictable pricing - a trade-off most enterprises find acceptable once they factor in reduced operational overhead.